Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Key Word Searches


Key Word Searches

Author
Message
antonio.collins
antonio.collins
SSC-Addicted
SSC-Addicted (454 reputation)SSC-Addicted (454 reputation)SSC-Addicted (454 reputation)SSC-Addicted (454 reputation)SSC-Addicted (454 reputation)SSC-Addicted (454 reputation)SSC-Addicted (454 reputation)SSC-Addicted (454 reputation)

Group: General Forum Members
Points: 454 Visits: 921
quickdraw (3/1/2011)
@ Antonio,

Just so I can file this away in my brain for later:

What is your max sustained throughput (total users and max queries per minute)? How many CPUs?

Thanks.


trying to track just full text queries is tough for me since our app performs a lot of caching. i'll see if a dba can expose something via tracing. however, we typically have 150+ users on at any given time and searching is a common task.
KermitTheRock
KermitTheRock
SSC Rookie
SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)

Group: General Forum Members
Points: 31 Visits: 100
Sharepoint Search service uses SQL Server FTS. Based on results listed

http://technet.microsoft.com/en-us/library/cc262574(office.12).aspx

you should be able to do 20 requests per second on a 15 million item library with one server.


20 RPS is quite a bit, even for a public website. If your getting that many, you probably got a datacenter or two and a team of DBAs
Mark Hill-285393
Mark Hill-285393
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 28
While the discussion has been about comparing this technique to words and text, it is also a way to handle leading wildcard searches on part numbers. I have done something like this by building an index of all reasonable suffix substrings of a part number, clipping off one leading character at a time until there's nothing "significant" left. Then if a user wants to search for, say, '%74%138%', I clip off the leading wild characters and do a LIKE on the suffix index. (I could further optimize by guessing that '138%' would be a shorter scan than '74%', and applying the original '%74%138%' pattern to the original part number.) I quickly get all the matches like 'MC74AC138N', 'SN74LS138D', etc. I don't imagine that FTS would be applicable to this problem.

While it sounds like I'd be multiplying the size of my index by the number of characters in a part number, it's really not so bad. Short part numbers have fewer suffix substrings, and you can add some restrictions if you disallow putting a wildcard character in the middle of certain sequences, like a series of digits. For about 125,000 part numbers I have less than 600,000 suffixes to part number mappings.
quickdraw
quickdraw
Grasshopper
Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)

Group: General Forum Members
Points: 24 Visits: 107
We loaded something like 150 million records from Library of Congress and had a co-located 2 post rack running a quad processor something-or-other with a RAID 5 setup on Oracle 8i. Users ranged into the thousands online at once doing queries. RPS peaked over 50 per second if memory serves (this was 1999).

We filtered stop words, and plurals, but not misspellings nor synonyms.

Using materialized views is very smart. Be aware of disk space with lots of records. I'm on a project now that has over a billion records in one table. We have to plan very carefully for index growth and maintenance (rebuilds). Adding a materialized view that copies and concatenates text columns (and then builds FTS indexes on that, which might be in addition to FTS indexes on individual columns) could take up considerable space.

For 15 million records with an average row length of 65 characters you're talking close to a gigabyte of space for the materialized view. Not a problem for an in-house server, but for a hosted server or a cloud database, that could cost you actual coin per month.

My point there isn't to talk you out of materialized views. I love that idea. Just look at the whole picture when designing things, and plan for growth/scale.
cliffb
cliffb
Say Hey Kid
Say Hey Kid (681 reputation)Say Hey Kid (681 reputation)Say Hey Kid (681 reputation)Say Hey Kid (681 reputation)Say Hey Kid (681 reputation)Say Hey Kid (681 reputation)Say Hey Kid (681 reputation)Say Hey Kid (681 reputation)

Group: General Forum Members
Points: 681 Visits: 438
I liked the article. A new technique to add to my toolset. I did play around with it to remove the Temp Tables and at least on my dev server, there was quite a performance increase.


DECLARE @Keywords VARCHAR(8000)
SET @Keywords = 'The Red Rider Carbon Action 200 shot range model air rifle' ;

SELECT
DISTINCT
P.ProductID
,P.Name
,P.Color
,P.Size
,P.ListPrice
FROM
-- convert the keywords to a table. Replaces #KW
dbo.DelimitedSplit8KNEW(REPLACE(REPLACE(@Keywords, '/', ' '), ',',
''), ' ') searchterms
LEFT JOIN dbo.KeyWordsToExclude xcl
ON xcl.KeyWord = searchterms.ItemValue
INNER JOIN dbo.ProductKeyWordItems prodkeys
ON [prodkeys].[KeyWord] = searchterms.ItemValue
INNER JOIN [Production].[Product] P
ON P.[ProductID] = prodkeys.[ProductID]
WHERE
xcl.KeyWord IS NULL -- include only records where the keyword search hits on a non-excluded word.



Here are the stats:

Table 'KeyWordsToExclude'. Scan count 1, logical reads 79, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ProductKeyWordItems'. Scan count 11, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Product'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 9 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.



Mauve
Mauve
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1246 Visits: 2049
There is a lot of reinvention of the wheel. I've worked with full-text in both Oracle (9i, 10g) and SQL Server (2000, 2005, 2008) as well as external, i.e., not-integrated, full-text systems such as AltaVista, etc.

Now that SQL Server, in 2008, has an integrated full-text engine similar to what Oracle had in 9i (circa 2000), full-text in SQL Server is pretty robust. However, it still lacks a lot of Oracle's functionality.

We use it for plain text, HTML, and binary (Microsoft Office files, PDFs, etc.).

Eliminate the Stop (noise) word filtering as it is so 90's. I.e., index everything.

As for performance and capacity, just look at what Iron Mountain is doing.


(PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.
belgarion
belgarion
Valued Member
Valued Member (65 reputation)Valued Member (65 reputation)Valued Member (65 reputation)Valued Member (65 reputation)Valued Member (65 reputation)Valued Member (65 reputation)Valued Member (65 reputation)Valued Member (65 reputation)

Group: General Forum Members
Points: 65 Visits: 249
Another technique to try ...

On tables where you have columns that you'd otherwise build a FTS index, instead create a trigger that populates a single table where you string together exactly what you want. I.e. if searching has a set of filter criteria as well as the text, add in 'keywords' for the filters.

E.g. you want all people whose name are like 'Fred Smith' that live in California and were born in 1969. You'd create a FTS string like 'Fred Smith zClfrnz zDOB19690501' where zClfrnz and zDOB19690501 are the keywords created by the trigger from attributes that are probably already in memory.

When you search your FTS contains clause looks like this: 'Fred AND Smith AND zClfrnz AND zDOB1969*'. This is blisteringly fast.

Obviously, this technique falls down when a range search is also required. But not anything like as bad as you'd think. Say you want to search on 'Fred Smith's in California whose age is between 45 and 50. You'd add a column to your new FTS table that inlcuded the DOB, possibly with a specific index or a covering index that has some of the most common range value searches if they are commonly used in conjunction. Another approach where the number of items in a rang is small is simply to string together the range of interest, e.g. 'Fred AND Smith AND zClfrnz AND (zDOB1967* or zDOB1968* or zDOB1969* or zDOB1970* or zDOB1971*)'

Another thing that works well is 'joining' FTS searches. In this situation you still have just one FTS table but have triggers on multiple entity tables populating it. I.e. you want to find all sales people whose name is 'Joe Patel' who work in California and who are servicing a 'Fred Smith'. The salesPerson table creates a 'Joe Patel zslsprsnz zClfrnz' and the fred smith row is extended to include this relationship, i.e. red Smith zClfrnz zDOB19690501 zslsprsnzJoeSmith'.

At any time to want to add in new search keywords or searchable attributes, you run a dummy update on the source table and the trigger regenerates the search string value. Note you need to get a bit funky finding the existing search value if you're doing updates rather than deletes and inserts.

A 2 vCPU, 8GB SQL Server was easily handling 100 AJAX trns per second over a DB with 750k rows in the search table. Two different search application types too. One for building products and one for real estate.

The above won't work for all situations but if your searchable data is in database columns it generally works a treat as you can combine the best of relational DB with the best of FTS.
Thomas Keller
Thomas Keller
Valued Member
Valued Member (52 reputation)Valued Member (52 reputation)Valued Member (52 reputation)Valued Member (52 reputation)Valued Member (52 reputation)Valued Member (52 reputation)Valued Member (52 reputation)Valued Member (52 reputation)

Group: General Forum Members
Points: 52 Visits: 158
One reason for doing this instead of built-in Full-Text Search, is that once you have isolated your keywords, you can suggest alternatives when a user has misspelled one. See my article on fuzzy-string matching. (I linked this article from the discussion on mine.)
duvvit
duvvit
Forum Newbie
Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)

Group: General Forum Members
Points: 8 Visits: 89
Doesn't accommodate for wildcards though. It is only parsing and matching FULL words.

He% and Nu% would not work.
If putting a LIKE in-between KWI.KW and #KW.Keyword, you may think it works, but it is not matching BOTH keywords, only one of the above.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search