Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1234

Key Word Searches Expand / Collapse
Author
Message
Posted Wednesday, March 9, 2011 1:31 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, October 8, 2014 8:52 AM
Points: 438, Visits: 909
quickdraw (3/1/2011)
@ Antonio,

Just so I can file this away in my brain for later:

What is your max sustained throughput (total users and max queries per minute)? How many CPUs?

Thanks.


trying to track just full text queries is tough for me since our app performs a lot of caching. i'll see if a dba can expose something via tracing. however, we typically have 150+ users on at any given time and searching is a common task.
Post #1075873
Posted Wednesday, March 9, 2011 2:18 PM


Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, July 30, 2013 8:55 AM
Points: 21, Visits: 96
Sharepoint Search service uses SQL Server FTS. Based on results listed

http://technet.microsoft.com/en-us/library/cc262574(office.12).aspx

you should be able to do 20 requests per second on a 15 million item library with one server.


20 RPS is quite a bit, even for a public website. If your getting that many, you probably got a datacenter or two and a team of DBAs

Post #1075896
Posted Friday, September 7, 2012 7:29 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, October 29, 2013 12:12 PM
Points: 5, Visits: 28
While the discussion has been about comparing this technique to words and text, it is also a way to handle leading wildcard searches on part numbers. I have done something like this by building an index of all reasonable suffix substrings of a part number, clipping off one leading character at a time until there's nothing "significant" left. Then if a user wants to search for, say, '%74%138%', I clip off the leading wild characters and do a LIKE on the suffix index. (I could further optimize by guessing that '138%' would be a shorter scan than '74%', and applying the original '%74%138%' pattern to the original part number.) I quickly get all the matches like 'MC74AC138N', 'SN74LS138D', etc. I don't imagine that FTS would be applicable to this problem.

While it sounds like I'd be multiplying the size of my index by the number of characters in a part number, it's really not so bad. Short part numbers have fewer suffix substrings, and you can add some restrictions if you disallow putting a wildcard character in the middle of certain sequences, like a series of digits. For about 125,000 part numbers I have less than 600,000 suffixes to part number mappings.
Post #1355971
Posted Friday, September 7, 2012 9:13 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, March 12, 2014 2:56 PM
Points: 12, Visits: 107
We loaded something like 150 million records from Library of Congress and had a co-located 2 post rack running a quad processor something-or-other with a RAID 5 setup on Oracle 8i. Users ranged into the thousands online at once doing queries. RPS peaked over 50 per second if memory serves (this was 1999).

We filtered stop words, and plurals, but not misspellings nor synonyms.

Using materialized views is very smart. Be aware of disk space with lots of records. I'm on a project now that has over a billion records in one table. We have to plan very carefully for index growth and maintenance (rebuilds). Adding a materialized view that copies and concatenates text columns (and then builds FTS indexes on that, which might be in addition to FTS indexes on individual columns) could take up considerable space.

For 15 million records with an average row length of 65 characters you're talking close to a gigabyte of space for the materialized view. Not a problem for an in-house server, but for a hosted server or a cloud database, that could cost you actual coin per month.

My point there isn't to talk you out of materialized views. I love that idea. Just look at the whole picture when designing things, and plan for growth/scale.


Post #1356055
Posted Friday, September 7, 2012 11:52 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 10:41 AM
Points: 1,945, Visits: 3,127
I liked the article, but if I have to search docments, I do not use SQL (gasp!). It is like cutting wood with a screwdriver and not a saw. Tables are not documetns and do not store bulk data in a format that searches easily, thre is no semantic web support, etc. Lexis, Nexis and West Law ar enot in SQL.

In addition to full text searches, I like KWIC (Keyword in Context) and keyword lists from a limited vocabulary.


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1356153
Posted Friday, September 7, 2012 1:02 PM
Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Saturday, September 27, 2014 12:26 PM
Points: 676, Visits: 433
I liked the article. A new technique to add to my toolset. I did play around with it to remove the Temp Tables and at least on my dev server, there was quite a performance increase.

DECLARE @Keywords VARCHAR(8000)
SET @Keywords = 'The Red Rider Carbon Action 200 shot range model air rifle' ;

SELECT
DISTINCT
P.ProductID
,P.Name
,P.Color
,P.Size
,P.ListPrice
FROM
-- convert the keywords to a table. Replaces #KW
dbo.DelimitedSplit8KNEW(REPLACE(REPLACE(@Keywords, '/', ' '), ',',
''), ' ') searchterms
LEFT JOIN dbo.KeyWordsToExclude xcl
ON xcl.KeyWord = searchterms.ItemValue
INNER JOIN dbo.ProductKeyWordItems prodkeys
ON [prodkeys].[KeyWord] = searchterms.ItemValue
INNER JOIN [Production].[Product] P
ON P.[ProductID] = prodkeys.[ProductID]
WHERE
xcl.KeyWord IS NULL -- include only records where the keyword search hits on a non-excluded word.

Here are the stats:

Table 'KeyWordsToExclude'. Scan count 1, logical reads 79, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ProductKeyWordItems'. Scan count 11, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Product'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 9 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.



Post #1356195
Posted Friday, September 7, 2012 1:27 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, November 10, 2014 7:22 AM
Points: 1,187, Visits: 1,984
There is a lot of reinvention of the wheel. I've worked with full-text in both Oracle (9i, 10g) and SQL Server (2000, 2005, 2008) as well as external, i.e., not-integrated, full-text systems such as AltaVista, etc.

Now that SQL Server, in 2008, has an integrated full-text engine similar to what Oracle had in 9i (circa 2000), full-text in SQL Server is pretty robust. However, it still lacks a lot of Oracle's functionality.

We use it for plain text, HTML, and binary (Microsoft Office files, PDFs, etc.).

Eliminate the Stop (noise) word filtering as it is so 90's. I.e., index everything.

As for performance and capacity, just look at what Iron Mountain is doing.



(PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.
Post #1356213
Posted Sunday, September 9, 2012 5:19 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Thursday, May 8, 2014 4:43 PM
Points: 57, Visits: 249
Another technique to try ...

On tables where you have columns that you'd otherwise build a FTS index, instead create a trigger that populates a single table where you string together exactly what you want. I.e. if searching has a set of filter criteria as well as the text, add in 'keywords' for the filters.

E.g. you want all people whose name are like 'Fred Smith' that live in California and were born in 1969. You'd create a FTS string like 'Fred Smith zClfrnz zDOB19690501' where zClfrnz and zDOB19690501 are the keywords created by the trigger from attributes that are probably already in memory.

When you search your FTS contains clause looks like this: 'Fred AND Smith AND zClfrnz AND zDOB1969*'. This is blisteringly fast.

Obviously, this technique falls down when a range search is also required. But not anything like as bad as you'd think. Say you want to search on 'Fred Smith's in California whose age is between 45 and 50. You'd add a column to your new FTS table that inlcuded the DOB, possibly with a specific index or a covering index that has some of the most common range value searches if they are commonly used in conjunction. Another approach where the number of items in a rang is small is simply to string together the range of interest, e.g. 'Fred AND Smith AND zClfrnz AND (zDOB1967* or zDOB1968* or zDOB1969* or zDOB1970* or zDOB1971*)'

Another thing that works well is 'joining' FTS searches. In this situation you still have just one FTS table but have triggers on multiple entity tables populating it. I.e. you want to find all sales people whose name is 'Joe Patel' who work in California and who are servicing a 'Fred Smith'. The salesPerson table creates a 'Joe Patel zslsprsnz zClfrnz' and the fred smith row is extended to include this relationship, i.e. red Smith zClfrnz zDOB19690501 zslsprsnzJoeSmith'.

At any time to want to add in new search keywords or searchable attributes, you run a dummy update on the source table and the trigger regenerates the search string value. Note you need to get a bit funky finding the existing search value if you're doing updates rather than deletes and inserts.

A 2 vCPU, 8GB SQL Server was easily handling 100 AJAX trns per second over a DB with 750k rows in the search table. Two different search application types too. One for building products and one for real estate.

The above won't work for all situations but if your searchable data is in database columns it generally works a treat as you can combine the best of relational DB with the best of FTS.
Post #1356517
Posted Tuesday, September 25, 2012 4:29 PM


SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, November 14, 2014 6:26 AM
Points: 32, Visits: 124
One reason for doing this instead of built-in Full-Text Search, is that once you have isolated your keywords, you can suggest alternatives when a user has misspelled one. See my article on fuzzy-string matching. (I linked this article from the discussion on mine.)
Post #1364351
Posted Tuesday, October 21, 2014 2:59 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Saturday, October 25, 2014 2:18 AM
Points: 4, Visits: 63
Doesn't accommodate for wildcards though. It is only parsing and matching FULL words.

He% and Nu% would not work.
If putting a LIKE in-between KWI.KW and #KW.Keyword, you may think it works, but it is not matching BOTH keywords, only one of the above.
Post #1627265
« Prev Topic | Next Topic »

Add to briefcase «««1234

Permissions Expand / Collapse