Hello,
I want to start off by saying that I am not a programmer...and have very little knowledge in this area.
What I would like to know is whether Apache would be capable of doing the following:
Take an extensive list (A) of strings of unique words (these are titles, anywhere from 4 to 30 words long) saved in either an Excel worksheet or a text file, and search for instances where they can be found in PDF files (B) saved on a hard drive (over 100k files). The search would need to use fuzzy matching rather than exact matching, and the output would be an Excel file listing the unique string found (A), the name of the file in which the match was made (B), the page number where the match was made, and the surrounding text on either side of the match.
As well, would this be a complicated program, or would it be usable by novices coached in the process necessary to input the title file (A) and direct the search to the relevant folder containing the PDF files (B)?
I eagerly await (hopefully) an affirmative answer.
Cheers!