
Key Word Searches
Posted Monday, February 28, 2011 8:11 AM


Actually, I wrote about this just a week ago. The CONTAINS function is often not sufficient for end-user search: it doesn't handle misspellings, suggested searches, or an implied "AND", and it breaks when a noise word is used.

FTS is not simply an inverted index of words; it depends on the query pipeline that runs before the index is accessed.
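
For readers who haven't hit this, here is a minimal sketch of the behaviour (the dbo.Product table and Description column are made up for the example, and exact noise-word behaviour varies by SQL Server version):

-- Works as expected for a normal term (assumes a full-text index on Description).
SELECT ProductID, Description FROM dbo.Product WHERE CONTAINS(Description, 'gasket');

-- Multiple terms are not implicitly AND-ed; the operator has to be written out.
SELECT ProductID, Description FROM dbo.Product WHERE CONTAINS(Description, '"head" AND "gasket"');

-- The problem case: a condition made up only of noise/stop words.
-- On SQL Server 2005 this typically fails with an "ignored words" error rather than returning rows.
SELECT ProductID, Description FROM dbo.Product WHERE CONTAINS(Description, 'the');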



Post #1070551
Posted Monday, February 28, 2011 8:43 AM
KermitTheRock (2/28/2011)
Actually, I wrote about this just a week ago. The CONTAINS function is often not sufficient for end-user search: it doesn't handle misspellings, suggested searches, or an implied "AND", and it breaks when a noise word is used.

FTS is not simply an inverted index of words; it depends on the query pipeline that runs before the index is accessed.





Hmmm... well, I've spent some time getting it to work reasonably well for a somewhat specialized database with lots of weird product names like "M&Ms". I'm not sure I understood what the consensus was on the previous thread: was it "only Google has the resources to do it", or "we'll only do it when we're paid to do it"?
Post #1070594
Posted Monday, February 28, 2011 8:56 AM
Thanks.

Going to have to play with this. I have a huge database on a SQL 2005 server with full-text indexing enabled, and it can really be a PITA a lot of the time. Sometimes simple searches are extremely slow for no obvious reason, and inserting large amounts of data into an FT-enabled table is a major fail, at least with 2005: I have to disable FT, run my inserts, and then enable it again. Maintaining the full-text index is also a PITA, as I've had a few corruptions caused by the backup running during the FT maintenance window.
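
For reference, that disable-and-reload pattern can be scripted; this is only a rough sketch, assuming a table dbo.EventLog that already has a full-text index (the name is invented for the example), and the exact options are worth checking against your version:

-- Suspend full-text processing on the table before the bulk load.
ALTER FULLTEXT INDEX ON dbo.EventLog SET CHANGE_TRACKING OFF;
ALTER FULLTEXT INDEX ON dbo.EventLog DISABLE;

-- ... run the bulk inserts here ...

-- Bring the index back; switching change tracking to AUTO kicks off a repopulation.
ALTER FULLTEXT INDEX ON dbo.EventLog ENABLE;
ALTER FULLTEXT INDEX ON dbo.EventLog SET CHANGE_TRACKING AUTO;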


Post #1070603
Posted Monday, February 28, 2011 9:10 AM
alen teplitsky (2/28/2011)
Thanks.

Going to have to play with this. I have a huge database on a SQL 2005 server with full-text indexing enabled, and it can really be a PITA a lot of the time. Sometimes simple searches are extremely slow for no obvious reason, and inserting large amounts of data into an FT-enabled table is a major fail, at least with 2005: I have to disable FT, run my inserts, and then enable it again. Maintaining the full-text index is also a PITA, as I've had a few corruptions caused by the backup running during the FT maintenance window.


Just curious, how big is "huge"? I haven't had any of the above problems on 2005/8.
Post #1070610
Posted Monday, February 28, 2011 9:16 AM
About 400 GB or so for 30 days of data.

Every day I have VB scripts dump security and application log data from domain controllers, SQL Servers, and other servers into a central database. There are three tables with an average of 100 million rows each. I tried having one table per server, but it was a major PITA with joins, so I use one table for DCs, one for SQL Servers, and one for everything else. Every morning SSRS reports get fired off to people with app log errors, locked accounts, any data about anyone doing any kind of account manipulation in AD, etc.
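
As a rough illustration of that layout (the table and column names below are guesses for the example, not the poster's actual schema), each of the three central tables might look something like:

-- Hypothetical shape of one per-category log table; the daily SSRS reports
-- then just filter on EventTimeUTC, EventID and AccountName.
CREATE TABLE dbo.DomainControllerEventLog (
    EventLogID   bigint IDENTITY(1,1) PRIMARY KEY,
    ServerName   sysname       NOT NULL,
    LogName      varchar(32)   NOT NULL,   -- 'Security', 'Application', ...
    EventID      int           NOT NULL,
    EventTimeUTC datetime      NOT NULL,
    AccountName  sysname       NULL,
    Message      nvarchar(max) NOT NULL
);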



https://plus.google.com/100125998302068852885/posts?hl=en
http://twitter.com/alent1234
x-box live gamertag: i am null
[url=http://live.xbox.com/en-US/MyXbox/Profile[/url]
Post #1070617
Posted Monday, February 28, 2011 11:16 AM
#1 This will be faster than full text search for large loads (hundreds of queries per minute) on a db with millions of records/documents to search.

I did this exact thing in Oracle in 1999. We loaded the entire U.S. Books In Print database from the Library of Congress and built a search engine on it for a website and a cash-register/inventory-management system for small booksellers.

Full text search was too slow for a high-load environment, and I'd bet money that SQL Server is the same. You just can't push hundreds of full text searches per minute against a large number of documents. The method here will be faster.

------------------------------------------------
#2 This method doesn't allow wildcards

We dynamically built a SQL query for each keyword. This allows for wildcards, and for special processing of wildcard suffixes. (See #3.)

------------------------------------------------
#3 By reversing the keywords in the keywords table, you get a new feature no database has (without a full table scan)

One additional thing I did was to reverse all the keywords in the keywords table. Since wildcard LIKE searches were allowed on the keywords table (as in full-text search), having each keyword spelled backwards as well allowed a wonderful feature:

Search keywords: "hold her %ly"
[search for cheesy love novels: "hold her gently", "hold her gingerly", "hold her tightly"]

Normally, for a wildcard search to use an index, the wildcard can't be at the beginning (e.g. giv%, matching give, giving, giver); otherwise the keyword table gets a full scan. With a keyword table containing hundreds of thousands of rows, that slows things down considerably. Reversing the keywords in the keyword table allows the keyword parser to detect such wildcards and construct a where clause that uses the reversed keyword:

select <cols> from document where docid in (
select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where keyword LIKE 'hold')
union
select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where keyword LIKE 'her')
union
select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where reversekeyword LIKE 'yl%') -- note: keyword spelled backwards
)

There are a number of different ways to write the query above; I just presented the easiest to grok. One other interesting way is to do a GROUP BY:

select <cols> from document d where docid in (
select docid from dockeyword dkw
inner join keyword kw on dkw.keywordid = kw.keywordid
where kw.keyword = 'hold' or kw.keyword = 'her' or kw.reversekeyword like 'yl%'
group by docid
having count(docid) = 3 -- assumes each search term matches exactly one keyword row
)
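
To make the reversed-keyword idea concrete, here is a minimal sketch of how the keyword table and the rewrite of a leading-wildcard term might look; the object names are invented for the example, not the poster's actual schema. REVERSE() is deterministic, so the reversed spelling can be kept as a persisted computed column and indexed:

-- Keyword table with the reversed spelling maintained automatically.
CREATE TABLE keyword (
    keywordid      int IDENTITY(1,1) PRIMARY KEY,
    keyword        varchar(100) NOT NULL,
    reversekeyword AS REVERSE(keyword) PERSISTED
);
CREATE INDEX IX_keyword_keyword        ON keyword (keyword);
CREATE INDEX IX_keyword_reversekeyword ON keyword (reversekeyword);

-- The keyword parser turns a term with a leading wildcard, e.g. '%ly', into a
-- trailing-wildcard search on the reversed column, which can seek on
-- IX_keyword_reversekeyword instead of scanning the table.
SELECT keywordid
FROM keyword
WHERE reversekeyword LIKE REVERSE('ly') + '%';   -- i.e. LIKE 'yl%'

A trailing-wildcard term such as giv% still goes against the normal keyword column, so both shapes of search stay index-friendly.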

Post #1070698
Posted Monday, February 28, 2011 11:28 AM
Thanks, everyone, for your comments. I should have mentioned Full Text search in SQL Server in the article.

As one poster mentioned, it's kind of a pain in the neck to start with. It's a can of worms that doesn't necessarily need to be opened for just one simple application where you know exactly the type of user input you're going to get - i.e. very structured user input. In 12 years of programming SQL Server I have only had one client that even had Full Text enabled. One auto parts dealer I worked with is really huge, and they didn't have it.

This technique certainly won't handle every kind of search and wouldn't be appropriate for many applications. For one, if the data has a lot of noise punctuation and you don't really know everything that's in it, you can't predict it.

As I mentioned in the article, this is for item/product type descriptions. This would be in a very controlled environment where most of the noise words are known ahead of time.
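
As a rough illustration of that kind of controlled setup (the table and column names below are invented for the example, not taken from the article), a known noise-word list can simply be filtered out while the keyword table is built:

-- Hypothetical noise-word list for item/product descriptions.
CREATE TABLE dbo.NoiseWord (Word varchar(50) NOT NULL PRIMARY KEY);
INSERT INTO dbo.NoiseWord (Word)
SELECT 'and' UNION ALL SELECT 'the' UNION ALL SELECT 'for' UNION ALL SELECT 'with';

-- When keywords are extracted (dbo.ItemDescriptionWord here stands in for the
-- per-word output of whatever parser is used), anything on the noise list is
-- skipped before it reaches the keyword table.
INSERT INTO dbo.Keyword (Keyword)
SELECT DISTINCT w.Word
FROM dbo.ItemDescriptionWord AS w
WHERE NOT EXISTS (SELECT 1 FROM dbo.NoiseWord AS n WHERE n.Word = w.Word);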

In any case, I'm grateful for the discussion.
Todd Fifield
Post #1070706
Posted Monday, February 28, 2011 11:33 AM
tfifield (2/28/2011)
Thanks, everyone, for your comments. I should have mentioned Full Text search in SQL Server in the article.

As one poster mentioned, it's kind of a pain in the neck to start with. It's a can of worms that doesn't necessarily need to be opened for just one simple application where you know exactly the type of user input you're going to get - i.e. very structured user input. In 12 years of programming SQL Server I have only had one client that even had Full Text enabled. One auto parts dealer I worked with is really huge, and they didn't have it.

It certainly won't handle every kind of search and wouldn't be appropriate for many applications. For one, if the data has a lot of noise punctuation and you don't really know everything that's in it, you can't predict it.

As I mentioned in the article, this is for item/product type descriptions. This would be in a very controlled environment where most of the noise words are known ahead of time.

In any case, I'm grateful for the discussion.
Todd Fifield


I'm still interested to know how you're handling plurals, trade names, abbreviations, etc.
Post #1070712
Posted Monday, February 28, 2011 11:34 AM


quickdraw (2/28/2011)

#3 By reversing the keywords in the keywords table, you get a new feature no database has (without a full table scan)

One additional thing I did was to reverse all the keywords in the keywords table. Since wildcard LIKE searches were allowed on the keywords table (as in full-text search), having each keyword spelled backwards as well allowed a wonderful feature:

...

Normally, for a wildcard search to use an index, the wildcard can't be at the beginning (e.g. giv%, matching give, giving, giver); otherwise the keyword table gets a full scan. With a keyword table containing hundreds of thousands of rows, that slows things down considerably. Reversing the keywords in the keyword table allows the keyword parser to detect such wildcards and construct a where clause that uses the reversed keyword:

select <cols> from document where docid in (
select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where keyword LIKE 'hold')
union
select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where keyword LIKE 'her')
union
select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where reversekeyword LIKE 'yl%') -- note: keyword spelled backwards
)



That is a great suggestion, reversing keywords. Nice.




Post #1070715
Posted Monday, February 28, 2011 11:41 AM
The product I support actually works somewhat similarly to the way the article says to do it. One of the major differences is that the list of excluded words is coded into the proc, not stored in a table. Another is that we do a sequential check on the keywords, i.e. get a list of items with keyword 1, then compare it to a list of items with keyword 2, and so on. It'll be interesting to see what kind of performance bump comes from doing the check in a set-based manner instead of procedurally. We do have one advantage in that the list of terms is delivered to us quarterly with the updates already split out, so no work is required on our side for that.
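
For comparison, a set-based version of that sequential check might look like the sketch below (the item/keyword table names are invented for the example); INTERSECT keeps only the items that match every term and lets the optimizer plan the whole comparison at once:

-- Hypothetical set-based equivalent of "list for keyword 1, compare to list for keyword 2, ...".
SELECT ik.ItemID
FROM dbo.ItemKeyword AS ik
JOIN dbo.Keyword AS k ON k.KeywordID = ik.KeywordID
WHERE k.Keyword = 'brake'

INTERSECT

SELECT ik.ItemID
FROM dbo.ItemKeyword AS ik
JOIN dbo.Keyword AS k ON k.KeywordID = ik.KeywordID
WHERE k.Keyword = 'ceramic';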
Post #1070720