Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Blob data handling - best option? Expand / Collapse
Author
Message
Posted Wednesday, March 20, 2013 2:31 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, March 25, 2013 4:09 AM
Points: 5, Visits: 17
Hi

I am a programmer and need to do the following.

Currently we are using SQL Server 2000 and we are handling lots of resume/cv in .DOC and PDF formats. we use some 3rd party tools to extracts the contents of the file and store it in NTEXT column which is full-text indexed.

Now we are planning to go with SQL Server 2012 Standard and i would like to get some advice on this scenario.


1. I dont want to use the text extracter (3rd party tools). Should i store the files as single BLOB and full-text index on it. This is main point as we search these very often to find the best matched candidate for a job requirement.

Are is there any other best method to handle this situation?

Thanks.




Post #1433059
Posted Wednesday, March 20, 2013 7:38 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 12:11 PM
Points: 4,343, Visits: 6,150
kavidha (3/20/2013)
Hi

I am a programmer and need to do the following.

Currently we are using SQL Server 2000 and we are handling lots of resume/cv in .DOC and PDF formats. we use some 3rd party tools to extracts the contents of the file and store it in NTEXT column which is full-text indexed.

Now we are planning to go with SQL Server 2012 Standard and i would like to get some advice on this scenario.


1. I dont want to use the text extracter (3rd party tools). Should i store the files as single BLOB and full-text index on it. This is main point as we search these very often to find the best matched candidate for a job requirement.

Are is there any other best method to handle this situation?

Thanks.



1) are you sure you need Ntext? Almost every time I see this at clients there is no need for it. Unless you are or plan to store languages that require 2-byte characters, save 50% of the storage cost by choosing an non-Unicode datatype.

2) I would look into a purpose-built 3rd party device for text indexing/searching. They can provide MUCH more performance and importantly for you flexibility/power in your search capabilities.

3) Having said that, SQL 2012 FTS is really very good. There is also the new Semantic Search capabilities you probably should investigate. There are also various iFilters you can use to enable indexing various document types such as pdf, doc, etc.



Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1433207
Posted Wednesday, March 20, 2013 8:44 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, March 25, 2013 4:09 AM
Points: 5, Visits: 17
Hi Kevin,

Thanks a lot for your reply.

1. No i dont want to use NText...

2. I am not sure about the Single BLOB...so for my scenario, can i use it? is it better solution?


Thanks.



Post #1433564
Posted Wednesday, March 20, 2013 9:00 PM


SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 3:08 AM
Points: 617, Visits: 1,273
Filestream is the most efficient way to store unstructured data, and is available since 2008
You need to enable filestream for the database and add a filestream filegroup.
Alter the table column to varbinary.

See the below links


http://msdn.microsoft.com/en-us/library/gg471497.aspx
http://msdn.microsoft.com/en-us/library/cc645923.aspx
http://msdn.microsoft.com/en-us/library/cc645585.aspx


-- Roshan Joe
*******************************************
Jeff Moden -Forum Etiquette: How to post data/code on a forum to get the best help
Custom cleanup script for backups
Post #1433566
Posted Thursday, March 21, 2013 8:37 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 12:11 PM
Points: 4,343, Visits: 6,150
kavidha (3/20/2013)
Hi Kevin,

Thanks a lot for your reply.

1. No i dont want to use NText...

2. I am not sure about the Single BLOB...so for my scenario, can i use it? is it better solution?


Thanks.


Yes, you can store SOME types of files as blobs (either filestream or not) and then have Full Text Indexing shred them. Some file types will not be FTS-aware. You may wish to consult with an FTS expert to see what is best for your needs. Personally I would set up tests of several different mechanisms and see which works/works best for YOUR needs.


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1433822
Posted Friday, March 22, 2013 3:43 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, March 25, 2013 4:09 AM
Points: 5, Visits: 17
Thanks for you suggestion Roshan Joe.

But I heard that filestream is at its best if the size is over 1gb...



Post #1434170
Posted Friday, March 22, 2013 3:44 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, March 25, 2013 4:09 AM
Points: 5, Visits: 17
Thanks Kevin. Will try to execute with different mechanism. Thanks once again.


Post #1434171
Posted Friday, March 22, 2013 8:27 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 12:11 PM
Points: 4,343, Visits: 6,150
kavidha (3/22/2013)
Thanks for you suggestion Roshan Joe.

But I heard that filestream is at its best if the size is over 1gb...


http://www.sqlskills.com/blogs/paul/sql-server-2008-filestream-performance/
http://www.sqlskills.com/blogs/paul/high-performance-filestream-tips-and-tricks/


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1434299
Posted Friday, March 22, 2013 10:13 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: Administrators
Last Login: Today @ 10:40 AM
Points: 33,169, Visits: 15,303
If you are going to 2012, think about FileTable. In some ways the development is easier than Filestream.

The filters for blobs are based on the iFilter standard. You can purchase third party filters for some types, or that work better. The Office filters work pretty well, but if you are Office 2010 or later, you'll need to download and install them separately. The native iFilters for FTS are 2007 based.

For file sizes, in general, if you are 256kb or smaller, use varbinary and stick them in there. If your files are 1MB or larger, go Filestream/Filetable. In between, test, but if you don't have lots of updates (mostly inserts), I'd just do Filestream/Filetable and not worry about it.

I wouldn't look at Semantic Search in 2012. It is a neat idea, but it only supports uni-grams (single terms) and isn't terribly useful, IMHO. FTS searches work better. If they get to n-grams in 20xx, then I'd revisit this. With that in mind, separate and contain your search logic so you can replace it if things improve or you want to change things. The NEAR and proximity searches are improved in 2012. You also get property searches, so the "tags" and metadata you see in the file properties or explorer windows can be searched in 2012. That's pretty cool stuff if your sources use it.







Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1434373
Posted Monday, March 25, 2013 4:11 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, March 25, 2013 4:09 AM
Points: 5, Visits: 17
Thank you Steve.


Post #1434805
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse