|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, March 25, 2013 4:09 AM
Points: 5,
Visits: 17
|
|
Hi
I am a programmer and need to do the following.
Currently we are using SQL Server 2000 and we are handling lots of resume/cv in .DOC and PDF formats. we use some 3rd party tools to extracts the contents of the file and store it in NTEXT column which is full-text indexed.
Now we are planning to go with SQL Server 2012 Standard and i would like to get some advice on this scenario.
1. I dont want to use the text extracter (3rd party tools). Should i store the files as single BLOB and full-text index on it. This is main point as we search these very often to find the best matched candidate for a job requirement.
Are is there any other best method to handle this situation?
Thanks.
|
|
|
|
|
Hall of Fame
       
Group: General Forum Members
Last Login: Today @ 9:01 PM
Points: 3,572,
Visits: 5,108
|
|
kavidha (3/20/2013) Hi
I am a programmer and need to do the following.
Currently we are using SQL Server 2000 and we are handling lots of resume/cv in .DOC and PDF formats. we use some 3rd party tools to extracts the contents of the file and store it in NTEXT column which is full-text indexed.
Now we are planning to go with SQL Server 2012 Standard and i would like to get some advice on this scenario.
1. I dont want to use the text extracter (3rd party tools). Should i store the files as single BLOB and full-text index on it. This is main point as we search these very often to find the best matched candidate for a job requirement.
Are is there any other best method to handle this situation?
Thanks.
1) are you sure you need Ntext? Almost every time I see this at clients there is no need for it. Unless you are or plan to store languages that require 2-byte characters, save 50% of the storage cost by choosing an non-Unicode datatype.
2) I would look into a purpose-built 3rd party device for text indexing/searching. They can provide MUCH more performance and importantly for you flexibility/power in your search capabilities.
3) Having said that, SQL 2012 FTS is really very good. There is also the new Semantic Search capabilities you probably should investigate. There are also various iFilters you can use to enable indexing various document types such as pdf, doc, etc.
Best,
Kevin G. Boles SQL Server Consultant SQL MVP 2007-2012 TheSQLGuru at GMail
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, March 25, 2013 4:09 AM
Points: 5,
Visits: 17
|
|
Hi Kevin,
Thanks a lot for your reply.
1. No i dont want to use NText...
2. I am not sure about the Single BLOB...so for my scenario, can i use it? is it better solution?
Thanks.
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Thursday, May 16, 2013 3:45 AM
Points: 572,
Visits: 1,157
|
|
|
|
|
|
Hall of Fame
       
Group: General Forum Members
Last Login: Today @ 9:01 PM
Points: 3,572,
Visits: 5,108
|
|
kavidha (3/20/2013) Hi Kevin,
Thanks a lot for your reply.
1. No i dont want to use NText...
2. I am not sure about the Single BLOB...so for my scenario, can i use it? is it better solution?
Thanks.
Yes, you can store SOME types of files as blobs (either filestream or not) and then have Full Text Indexing shred them. Some file types will not be FTS-aware. You may wish to consult with an FTS expert to see what is best for your needs. Personally I would set up tests of several different mechanisms and see which works/works best for YOUR needs.
Best,
Kevin G. Boles SQL Server Consultant SQL MVP 2007-2012 TheSQLGuru at GMail
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, March 25, 2013 4:09 AM
Points: 5,
Visits: 17
|
|
Thanks for you suggestion Roshan Joe.
But I heard that filestream is at its best if the size is over 1gb...
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, March 25, 2013 4:09 AM
Points: 5,
Visits: 17
|
|
Thanks Kevin. Will try to execute with different mechanism. Thanks once again.
|
|
|
|
|
Hall of Fame
       
Group: General Forum Members
Last Login: Today @ 9:01 PM
Points: 3,572,
Visits: 5,108
|
|
kavidha (3/22/2013)
Thanks for you suggestion Roshan Joe. But I heard that filestream is at its best if the size is over 1gb... 
http://www.sqlskills.com/blogs/paul/sql-server-2008-filestream-performance/ http://www.sqlskills.com/blogs/paul/high-performance-filestream-tips-and-tricks/
Best,
Kevin G. Boles SQL Server Consultant SQL MVP 2007-2012 TheSQLGuru at GMail
|
|
|
|
|
SSC-Dedicated
           
Group: Administrators
Last Login: Today @ 9:03 PM
Points: 31,406,
Visits: 13,723
|
|
If you are going to 2012, think about FileTable. In some ways the development is easier than Filestream.
The filters for blobs are based on the iFilter standard. You can purchase third party filters for some types, or that work better. The Office filters work pretty well, but if you are Office 2010 or later, you'll need to download and install them separately. The native iFilters for FTS are 2007 based.
For file sizes, in general, if you are 256kb or smaller, use varbinary and stick them in there. If your files are 1MB or larger, go Filestream/Filetable. In between, test, but if you don't have lots of updates (mostly inserts), I'd just do Filestream/Filetable and not worry about it.
I wouldn't look at Semantic Search in 2012. It is a neat idea, but it only supports uni-grams (single terms) and isn't terribly useful, IMHO. FTS searches work better. If they get to n-grams in 20xx, then I'd revisit this. With that in mind, separate and contain your search logic so you can replace it if things improve or you want to change things. The NEAR and proximity searches are improved in 2012. You also get property searches, so the "tags" and metadata you see in the file properties or explorer windows can be searched in 2012. That's pretty cool stuff if your sources use it.
Follow me on Twitter: @way0utwest
 Forum Etiquette: How to post data/code on a forum to get the best help
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Monday, March 25, 2013 4:09 AM
Points: 5,
Visits: 17
|
|
|
|
|