May 7, 2013 at 10:29 pm
OCR (optical character recognition) is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. Some PDFs are scans, so OCR would be required for those. The PDF format is well documented, but PDFs can have multiple columns, so extracting the text needs a mature, structured PDF reading app.
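A rough way to separate the scanned PDFs (which need OCR) from the ones that already carry a text layer, sketched under some assumptions: Poppler's pdftotext command-line tool is installed (the thread does not name a tool), and a scan has little or no text layer, so pdftotext returns almost nothing but page breaks. The function names and the 25-characters-per-page threshold are illustrative choices, not tuned values.

```python
import subprocess
import sys
from pathlib import Path


def extract_text(pdf_path: Path) -> str:
    """Return the PDF's text layer via pdftotext; "-" sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise if pdftotext reports an error
    )
    return result.stdout


def needs_ocr(extracted_text: str, min_chars_per_page: int = 25) -> bool:
    """Heuristic: very little non-whitespace text per page suggests a scan.

    The threshold is an assumption; adjust it for your documents.
    """
    pages = extracted_text.split("\f")  # pdftotext separates pages with form feeds
    if len(pages) > 1 and pages[-1] == "":
        pages = pages[:-1]  # drop the empty segment after the final form feed
    non_ws = sum(len("".join(page.split())) for page in pages)
    return non_ws / len(pages) < min_chars_per_page


if __name__ == "__main__":
    # Hypothetical usage: python triage.py file1.pdf file2.pdf ...
    for name in sys.argv[1:]:
        text = extract_text(Path(name))
        print(name, "-> send to OCR" if needs_ocr(text) else "-> text layer OK")
```

Files flagged this way would go to an OCR engine; the rest can go straight to text extraction.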
May 8, 2013 at 1:11 am
dawnbrown243 (5/7/2013)
...PDF have multiple columns and the extraction of pdf text needs to use a mature and structure pdf reading app.
Even though this thread is getting old, it was never fully resolved and remains interesting.
Are you able to suggest how to "use a mature and structure (sic.) pdf reading app" in SSIS to solve this problem?
May 9, 2013 at 9:51 am
Phil Parkin (5/8/2013)
Are you able to suggest how to "use a mature and structure (sic.) pdf reading app" in SSIS to solve this problem?
The first thing I would try is running the text-extraction application from the command line (I would expect a "mature" PDF processing application to have a command-line interface), probably with an Execute Process task. It would likely be easiest to have the app write the text to flat files, then use the appropriate connection managers, etc. to ETL those files.
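A minimal sketch of that approach, assuming Poppler's pdftotext as the command-line extractor and a small Python script as the Execute Process task's executable. The thread names neither, and the script name, folder arguments and function names below are hypothetical. Each PDF in an input folder becomes a .txt file that a Flat File connection manager can then read.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Iterable, List


def build_command(pdf: Path, out_dir: Path) -> List[str]:
    """pdftotext command writing <out_dir>/<name>.txt as UTF-8, keeping the page layout."""
    txt = out_dir / (pdf.stem + ".txt")
    return ["pdftotext", "-layout", "-enc", "UTF-8", str(pdf), str(txt)]


def convert_all(pdfs: Iterable[Path], out_dir: Path,
                run: Callable = subprocess.run) -> int:
    """Convert each PDF to a text file and return the number of failures."""
    failures = 0
    for pdf in pdfs:
        if run(build_command(pdf, out_dir)).returncode != 0:
            failures += 1
            print(f"pdftotext failed for {pdf}", file=sys.stderr)
    return failures


# Only runs when given exactly two arguments: <input_folder> <output_folder>.
if __name__ == "__main__" and len(sys.argv) == 3:
    in_dir, out_dir = Path(sys.argv[1]), Path(sys.argv[2])
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = convert_all(sorted(in_dir.glob("*.pdf")), out_dir)
    # A non-zero exit code lets the Execute Process task treat the run as failed.
    sys.exit(1 if failed else 0)
```

In the Execute Process task, the executable would be python.exe and the arguments something like pdf_to_text.py C:\in C:\out (hypothetical paths). The -layout flag keeps columns side by side on the same lines, which suits fixed-width parsing of report-style PDFs; without it pdftotext tries to emit text roughly in reading order, which may work better for multi-column prose.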
Jason Wolfkill