SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Blob data handling - best option?


Blob data handling - best option?

Author
Message
kavidha
kavidha
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 17
Hi

I am a programmer and need to do the following.

Currently we are using SQL Server 2000 and we are handling lots of resume/cv in .DOC and PDF formats. we use some 3rd party tools to extracts the contents of the file and store it in NTEXT column which is full-text indexed.

Now we are planning to go with SQL Server 2012 Standard and i would like to get some advice on this scenario.


1. I dont want to use the text extracter (3rd party tools). Should i store the files as single BLOB and full-text index on it. This is main point as we search these very often to find the best matched candidate for a job requirement.

Are is there any other best method to handle this situation?

Thanks.



TheSQLGuru
TheSQLGuru
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12333 Visits: 8546
kavidha (3/20/2013)
Hi

I am a programmer and need to do the following.

Currently we are using SQL Server 2000 and we are handling lots of resume/cv in .DOC and PDF formats. we use some 3rd party tools to extracts the contents of the file and store it in NTEXT column which is full-text indexed.

Now we are planning to go with SQL Server 2012 Standard and i would like to get some advice on this scenario.


1. I dont want to use the text extracter (3rd party tools). Should i store the files as single BLOB and full-text index on it. This is main point as we search these very often to find the best matched candidate for a job requirement.

Are is there any other best method to handle this situation?

Thanks.



1) are you sure you need Ntext? Almost every time I see this at clients there is no need for it. Unless you are or plan to store languages that require 2-byte characters, save 50% of the storage cost by choosing an non-Unicode datatype.

2) I would look into a purpose-built 3rd party device for text indexing/searching. They can provide MUCH more performance and importantly for you flexibility/power in your search capabilities.

3) Having said that, SQL 2012 FTS is really very good. There is also the new Semantic Search capabilities you probably should investigate. There are also various iFilters you can use to enable indexing various document types such as pdf, doc, etc.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
kavidha
kavidha
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 17
Hi Kevin,

Thanks a lot for your reply.

1. No i dont want to use NText...

2. I am not sure about the Single BLOB...so for my scenario, can i use it? is it better solution?


Thanks.



Roshan Jospeh
Roshan Jospeh
SSCommitted
SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)

Group: General Forum Members
Points: 1605 Visits: 2087
Filestream is the most efficient way to store unstructured data, and is available since 2008
You need to enable filestream for the database and add a filestream filegroup.
Alter the table column to varbinary.

See the below links


http://msdn.microsoft.com/en-us/library/gg471497.aspx
http://msdn.microsoft.com/en-us/library/cc645923.aspx
http://msdn.microsoft.com/en-us/library/cc645585.aspx

----------------------------------------------------------------------------------------------------------------------------------------------------
Roshan Joe

Jeff Moden -Forum Etiquette: How to post data/code on a forum to get the best help
TheSQLGuru
TheSQLGuru
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12333 Visits: 8546
kavidha (3/20/2013)
Hi Kevin,

Thanks a lot for your reply.

1. No i dont want to use NText...

2. I am not sure about the Single BLOB...so for my scenario, can i use it? is it better solution?


Thanks.


Yes, you can store SOME types of files as blobs (either filestream or not) and then have Full Text Indexing shred them. Some file types will not be FTS-aware. You may wish to consult with an FTS expert to see what is best for your needs. Personally I would set up tests of several different mechanisms and see which works/works best for YOUR needs.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
kavidha
kavidha
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 17
Thanks for you suggestion Roshan Joe.

But I heard that filestream is at its best if the size is over 1gb...Unsure



kavidha
kavidha
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 17
Thanks Kevin. Will try to execute with different mechanism. Thanks once again.



TheSQLGuru
TheSQLGuru
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12333 Visits: 8546
kavidha (3/22/2013)
Thanks for you suggestion Roshan Joe.

But I heard that filestream is at its best if the size is over 1gb...Unsure


http://www.sqlskills.com/blogs/paul/sql-server-2008-filestream-performance/
http://www.sqlskills.com/blogs/paul/high-performance-filestream-tips-and-tricks/

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Steve Jones
Steve Jones
SSC Guru
SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)SSC Guru (62K reputation)

Group: Administrators
Points: 62104 Visits: 19101
If you are going to 2012, think about FileTable. In some ways the development is easier than Filestream.

The filters for blobs are based on the iFilter standard. You can purchase third party filters for some types, or that work better. The Office filters work pretty well, but if you are Office 2010 or later, you'll need to download and install them separately. The native iFilters for FTS are 2007 based.

For file sizes, in general, if you are 256kb or smaller, use varbinary and stick them in there. If your files are 1MB or larger, go Filestream/Filetable. In between, test, but if you don't have lots of updates (mostly inserts), I'd just do Filestream/Filetable and not worry about it.

I wouldn't look at Semantic Search in 2012. It is a neat idea, but it only supports uni-grams (single terms) and isn't terribly useful, IMHO. FTS searches work better. If they get to n-grams in 20xx, then I'd revisit this. With that in mind, separate and contain your search logic so you can replace it if things improve or you want to change things. The NEAR and proximity searches are improved in 2012. You also get property searches, so the "tags" and metadata you see in the file properties or explorer windows can be searched in 2012. That's pretty cool stuff if your sources use it.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
kavidha
kavidha
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 17
Thank you Steve.



Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search