Keyword Searching in SQL Server

  • Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/mAhmadi/2875.asp

  • Hi,

    I'm a bit of a newbie. I've seen that SQL Server has the ability to do Full Text Searching. I've never used it, but why did you not use it here? Just curious, as I'll need to implement something like this soon.

    Thanks

  • If you want to refine the search, you can put all articles into an n-dimensional vector space, where n is the number of distinct keywords in your keyword table. Each entry of your keyword log (LogEntry_Keyword) can then be understood as a vector. Vectors that are close in Euclidean distance will then presumably contain related content. Have fun.
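The vector-space idea above can be sketched in a few lines. This is a minimal illustration in Python rather than T-SQL, and the entry names and keyword sets are made up for the example; it is not code from the article.

```python
import math

# Hypothetical data: each log entry is described by a set of keywords.
entries = {
    "entry_1": {"sql", "index", "performance"},
    "entry_2": {"sql", "index", "tuning"},
    "entry_3": {"backup", "restore"},
}

# One dimension per distinct keyword, in a fixed order.
vocabulary = sorted(set().union(*entries.values()))


def to_vector(keywords):
    """Map a keyword set to a 0/1 vector over the vocabulary."""
    return [1 if word in keywords else 0 for word in vocabulary]


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(entry_id):
    """Return the id of the other entry whose vector is closest."""
    target = to_vector(entries[entry_id])
    candidates = [
        (euclidean_distance(target, to_vector(keywords)), other_id)
        for other_id, keywords in entries.items()
        if other_id != entry_id
    ]
    return min(candidates)[1]


print(nearest("entry_1"))
```

Here entry_1 and entry_2 differ in only two dimensions, so they come out as neighbours, while entry_3 shares no keywords with either.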

  • Don't get the impression that the methodology described here is by any means a substitute for full text search - it certainly is not. In the article we are simply defining a keyword as the text between space characters and we are in control of what tags we use to describe a log entry. In a real-world scenario you are more likely going to need the abilities of the full text engine's parser (word breaker) as well as the efficiency and versatility of full text querying. This is in fact a self-contained solution to the problem of keyword search, but it is a bare-bones solution. I use it mainly for keeping track of information that can be described with a handful of keywords.

    Mike

  • This is an interesting idea, and I think I get the author's point - why use the FTS "sledgehammer" for simpler tasks that don't require all the extra functionality? One thing that might add value to the tokenization/matching the author presents when compared to FTS is the ability to do approximate matching: phonetic, edit distance, n-gram, or common substring matching (or maybe some combination of these). You can actually get very good performance and good accuracy matches from a set-based n-gram solution.
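As a rough illustration of the n-gram matching mentioned above, here is a small Python sketch (not T-SQL, and not the set-based implementation the poster has in mind). It compares character trigrams with a Dice coefficient; the function names, the padding scheme, and the 0.5 threshold are assumptions chosen for the example.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a padded, lower-cased string."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Dice coefficient over the two n-gram sets (1.0 means identical sets)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_matches(term, keywords, threshold=0.5):
    """Keywords whose similarity to term meets the threshold, best first."""
    scored = [(similarity(term, keyword), keyword) for keyword in keywords]
    return [keyword for score, keyword in sorted(scored, reverse=True) if score >= threshold]


# A misspelled search term still finds the closest stored keywords.
print(best_matches("triger", ["trigger", "triggers", "index", "cursor"]))
```

In a database, the same idea would store one row per (keyword, n-gram) pair and count shared n-grams with a join, which is what makes a set-based version possible.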

  • This is a brilliant concept, and I'm amazed I have not seen it anywhere else!

    There are some known issues with full-text search:

    - when searching, you need to look for "Words", you cannot use arbitrary substrings

    - you cannot (easily?) "partition/order" a full-text index by a key, e.g. "UserID" or "ClientID". In a shared-tenant architecture (a SaaS environment with many clients in a single DB) this can be a very serious issue!

    - Administration/Maintenance is very painful in SQL Server 2000 and earlier (have not tried 2005 but reputedly much better)

    If, instead of using a trigger to do the tokenizing in this solution, you used a scheduled job, along with a trigger that maintains an "UpdateRequired" flag of some sort on the record for the job to look at, you would basically be building your own "text search light" system, suitable for all sorts of uses...
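The flag-plus-job pattern described above might look roughly like this. The sketch uses Python with an in-memory SQLite database purely as a stand-in for SQL Server, and every table, column, and trigger name in it is hypothetical; a real version would be a T-SQL trigger plus a SQL Server Agent job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log_entry (
    id INTEGER PRIMARY KEY,
    tags TEXT NOT NULL,
    update_required INTEGER NOT NULL DEFAULT 1  -- new rows start out flagged
);
CREATE TABLE log_entry_keyword (
    log_entry_id INTEGER NOT NULL,
    keyword TEXT NOT NULL
);
-- The trigger only raises the flag; it does no tokenizing itself.
CREATE TRIGGER trg_flag_log_entry AFTER UPDATE OF tags ON log_entry
BEGIN
    UPDATE log_entry SET update_required = 1 WHERE id = NEW.id;
END;
""")


def run_tokenize_job(conn):
    """Stand-in for the scheduled job: re-tokenize every flagged row."""
    flagged = conn.execute(
        "SELECT id, tags FROM log_entry WHERE update_required = 1"
    ).fetchall()
    for entry_id, tags in flagged:
        conn.execute("DELETE FROM log_entry_keyword WHERE log_entry_id = ?", (entry_id,))
        conn.executemany(
            "INSERT INTO log_entry_keyword (log_entry_id, keyword) VALUES (?, ?)",
            [(entry_id, word) for word in tags.split()],
        )
        conn.execute("UPDATE log_entry SET update_required = 0 WHERE id = ?", (entry_id,))
    conn.commit()
    return len(flagged)


conn.execute("INSERT INTO log_entry (id, tags) VALUES (1, 'sql index')")
first_run = run_tokenize_job(conn)   # picks up the newly inserted row
conn.execute("UPDATE log_entry SET tags = 'sql trigger job' WHERE id = 1")
second_run = run_tokenize_job(conn)  # picks up the edited row
third_run = run_tokenize_job(conn)   # nothing is flagged any more
keywords = [row[0] for row in conn.execute(
    "SELECT keyword FROM log_entry_keyword WHERE log_entry_id = 1 ORDER BY keyword"
)]
print(first_run, second_run, third_run, keywords)
```

The point of the design is that inserts and updates stay cheap, because the trigger only sets a flag, while the heavier tokenizing work is batched in the job.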

    It does have major disadvantages of course:

    - will use much more space for the tokens than full-text search would

    - will be less efficient when tokenizing

    - will be less powerful when tokenizing (no word root identification etc)

    - will probably/possibly be slower when returning matches on entire table (but faster on subset by a key that you specify)

    All in all a great option to keep in mind though I think - does anyone see other major disadvantages (or advantages) that I am missing?

    Thanks,

    Tao

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.

  • There appears to be a bug in the search stored procedure: it only searches on the first term given to it in the search string.

    It also erodes the string by one extra character on every loop, so it breaks once the padding string has been completely eroded away.

    To fix both problems, change this line:

    SET @kws = SUBSTRING(@kw, CHARINDEX(' ', @kws) + 1, LEN(@kws) - CHARINDEX(' ', @kws) - 1)

    To this (the source string is now @kws, and the -1 at the end is removed):

    SET @kws = SUBSTRING(@kws, CHARINDEX(' ', @kws) + 1, LEN(@kws) - CHARINDEX(' ', @kws))
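To see why the trailing -1 erodes the string, the loop can be emulated outside the database. The Python sketch below is not the article's stored procedure: the helper functions only approximate the T-SQL functions they are named after, and the surrounding loop is an assumed shape for a space-delimited tokenizer.

```python
def charindex(needle, text):
    """1-based position of needle in text, or 0 if absent (like CHARINDEX)."""
    return text.find(needle) + 1


def substring(text, start, length):
    """1-based substring of at most `length` characters (like SUBSTRING).

    Negative lengths are clamped to an empty result for simplicity.
    """
    return text[start - 1:start - 1 + max(length, 0)]


def tsql_len(text):
    """Length ignoring trailing spaces (like LEN)."""
    return len(text.rstrip(" "))


def split_keywords(kws, extra_trim=0):
    """Peel space-delimited keywords off the front of kws.

    extra_trim=1 reproduces the stray '- 1' from the original line;
    extra_trim=0 corresponds to the corrected line.
    """
    tokens = []
    while True:
        pos = charindex(" ", kws)
        if pos == 0:
            if kws:
                tokens.append(kws)
            return tokens
        tokens.append(substring(kws, 1, pos - 1))
        # Everything after the first space is exactly LEN(kws) - pos characters.
        kws = substring(kws, pos + 1, tsql_len(kws) - pos - extra_trim)


print(split_keywords("sql server index", extra_trim=1))
print(split_keywords("sql server index", extra_trim=0))
```

With the extra -1, each pass drops one more character from the end of the remaining string, so the last keyword arrives truncated; without it, all three keywords come through intact.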

  • Fascinating concept.

    Which versions of SQL Server is this intended for? I am getting an error from SQL Query Analyzer 2000 when I try to execute the trigger section on a database running on SQL Server 2005:

    Server: Msg 207, Level 16, State 1, Procedure trgInsertLogEntry, Line 21

    Invalid column name 'tags'.

    The error points to this line:

    SET @tags = (SELECT tags FROM INSERTED)

    Where does the table "INSERTED" get created?
