I ended up considering n8n for something else (a workflow for extracting data from documents), so your video is a great fit: I can push the extracted data into SuiteCRM to create leads (and upload the processed files).
One thing I would be very interested in is a way to go through my SuiteCRM file library and (re)classify the documents. Thousands of them were uploaded before this became a possibility, so going through the old records and trying to classify them would be great.
Running it once on the existing library would be nice, and then running the workflow on all newly uploaded files.
For the extraction, I’m doing something similar in several places, for CVs and invoices. Make sure the context window is big enough if the documents get longer.
And depending on how / what you’d like to extract, check out BERT models or maybe spaCy.
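On the context window point: if a document is too long for the model, one common workaround is to split it into overlapping chunks and run the extraction on each chunk. Here is a rough Python sketch of that idea. The function name `split_into_chunks` and the default sizes are made up for illustration, and it counts characters rather than tokens, so you would need to pick limits that match your model.

```python
def split_into_chunks(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a value sitting on a
    chunk boundary still appears whole in at least one chunk.
    Sizes are in characters, not tokens (an assumption; adjust for your model).
    """
    if max_chars <= 0 or overlap < 0 or overlap >= max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")

    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break  # reached the end of the document
        start = end - overlap  # step back so the next chunk overlaps this one
    return chunks
```

A short document comes back as a single chunk, so the same code path works for both short and long files.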
As for the initial load of existing data:
How about adding a dropdown field or a date-time field that your workflow fills in for each record it has completed?
Then your workflow can trigger on a cron schedule every 5 minutes or so, fetch all records where that dropdown or date-time field is empty, take the top 20 records and call a subworkflow with each record ID.
The subworkflow would be your standard workflow which executes your extraction logic.
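To make the batching logic concrete, here is a minimal Python sketch of one scheduled run. It is not n8n or SuiteCRM code: the records are plain dicts, `processed_at` is a hypothetical name for the date-time marker field, and `extract` stands in for the subworkflow call.

```python
from datetime import datetime, timezone
from typing import Callable

BATCH_SIZE = 20  # "top 20 records" per scheduled run


def run_batch(
    records: list[dict],
    extract: Callable[[str], None],
    batch_size: int = BATCH_SIZE,
) -> list[str]:
    """Process one batch of unprocessed records and return their IDs."""
    # Records whose marker field is still empty have not been handled yet.
    pending = [r for r in records if not r.get("processed_at")]

    done: list[str] = []
    for record in pending[:batch_size]:
        extract(record["id"])  # stand-in for calling the subworkflow
        # Fill the marker only after extraction returned without error.
        record["processed_at"] = datetime.now(timezone.utc).isoformat()
        done.append(record["id"])
    return done
```

Each run picks up where the previous one stopped, so the old library drains 20 records at a time, and newly uploaded files are caught by the same loop because their marker field starts out empty. A record is only marked after extraction returns, so one that fails stays in the queue for a later run.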