SEARCHABLE PDF'S
Posted Monday, March 31, 2014 8:45 AM
Forum Newbie

Hi Guys,

I have a SQL Server 2005 table that stores a list of small PDF articles; there are over 1,900 of them. The table has Title and Author fields, plus a Location field that stores a link to the file. I can search for a keyword in the Title and Author fields, but I want to be able to search the content of the PDFs themselves. Can I bulk insert the PDFs into the database and make that field searchable, or would that be too slow?

Thanks for any suggestions you may have.
Post #1556505
Posted Monday, March 31, 2014 9:50 AM


SSChampion

In reply to JK 80940 (3/31/2014):


You will have to actually open the PDF and read it. If you insert it into the database you will have a byte array, and the contents will look something like

0x255044462D312E340D0A25........

That might technically be searchable, but it is highly unlikely to return the desired results, because the text inside a PDF is usually stored in compressed streams rather than as plain characters. The other option is to open each file one at a time and scan it for the text you are looking for. I would look to CLR for something like this, as plain SQL is really not the right tool for the task at hand.
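
The point about raw bytes can be checked with a small sketch. It is written in Python rather than CLR purely for illustration. It decodes the hex prefix shown above, shows that a keyword is no longer visible once text has been deflate-compressed (as PDF content streams often are), and outlines the "open each file and scan it" loop. The names `decode_header`, `visible_after_deflate`, `find_matching_files` and the `extract_text` callable are all hypothetical; a real extractor would wrap a PDF parsing library.

```python
import zlib
from typing import Callable, Iterable, List

# Leading bytes from the example above, without the "0x" prefix.
HEADER_HEX = "255044462D312E340D0A25"


def decode_header(hex_digits: str) -> bytes:
    """Turn the hex digits SQL Server displays back into raw bytes."""
    return bytes.fromhex(hex_digits)


def visible_after_deflate(text: bytes, keyword: bytes) -> bool:
    """Report whether keyword still appears verbatim after zlib compression."""
    return keyword in zlib.compress(text)


def find_matching_files(
    paths: Iterable[str],
    keyword: str,
    extract_text: Callable[[str], str],
) -> List[str]:
    """Return the paths whose extracted text contains keyword (case-insensitive).

    extract_text is an assumed callable that opens one PDF and returns its text.
    """
    needle = keyword.lower()
    return [path for path in paths if needle in extract_text(path).lower()]


print(decode_header(HEADER_HEX))  # b'%PDF-1.4\r\n%'

# Made-up page text standing in for one PDF content stream.
page_text = b"An article about SQL Server indexing"
print(b"SQL Server" in page_text)  # True
print(visible_after_deflate(page_text, b"SQL Server"))
```

The decoded prefix is just the PDF file header, which says nothing about the article text. A byte-level search only becomes useful after the text has been extracted, which is what the `extract_text` step stands in for.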


Post #1556561
Posted Monday, March 31, 2014 9:55 AM
Forum Newbie

Thanks for the reply, I appreciate your time.
Post #1556564
Posted Monday, March 31, 2014 9:57 AM


SSChampion

Have you considered adding a full-text index on the PDFs themselves? You would need to change how you search once that is in place, for example by querying with CONTAINS instead of LIKE.

This is the first example I found when searching for "SQL Server full text index PDFs":

http://stackoverflow.com/questions/7690921/sql-server-pdf-full-text-search-not-working-on-filestream-pdf-file
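
To illustrate why a full-text index changes how you search, here is a conceptual sketch in Python. It is not SQL Server's implementation, and every name in it (`build_index`, `search`, the sample articles) is made up for the example. It builds a tiny inverted index that maps words to document ids, which is the general idea behind word-based lookups as opposed to substring scans.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_index(docs: Iterable[Tuple[int, str]]) -> Dict[str, Set[int]]:
    """Map each lowercased word to the set of document ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return the ids of documents that contain every word in query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up article text standing in for extracted PDF content.
articles = [
    (1, "Rebuilding indexes in SQL Server"),
    (2, "Server hardware for reporting"),
    (3, "Index maintenance basics"),
]
idx = build_index(articles)
print(search(idx, "server"))      # {1, 2}
print(search(idx, "sql server"))  # {1}
print(search(idx, "index"))       # {3}
```

Note that the last query matches only the whole word "index" and not "indexes", whereas a substring pattern such as '%index%' would match both. That difference in matching rules is one reason the search queries have to be rewritten once a word-based index is in place.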


Lowell

Post #1556566
Posted Wednesday, April 30, 2014 12:33 AM
Forum Newbie

In reply to JK 80940 (3/31/2014):



See whether the following three links offer any useful information.

http://www.ehow.com/how_7447329_store-pdf-files-database.html

http://www.rootschat.com/forum/index.php?topic=606461.0

http://stackoverflow.com/questions/10854858/best-practices-for-searchable-archive-of-thousands-of-documents-pdf-and-or-xml
Post #1566224