Full Text: AND vs. OR

  • I'm working on a website where I'm trying to implement Full Text (SQL Server 2005) searching capabilities against a few columns in one particular table.

    For the sake of my question, let's just assume that the the front end simply allows the user to enter keywords in a text box.

    I pass the keywords into a stored procedure containing a query which looks something like this:

    select l.* from Listings l

    join freetexttable(Listings, *, @keywords) ft on l.List_Key = ft.[Key]

    ORDER BY ft.Rank desc, UPPER(l.Company)

    This works OK if there's one keyword submitted. It's when there's more than one word that things don't behave as I'd like. For example, if the user types 'wool', I get ~50 results, but if the type 'organic wool', I get ~1325 results.

    So seemingly, the results are being derived by taking rows that match either keyword, not both (which is what I would prefer). Is there a (simple) way to control that behavior?

  • Consider the CONTAINS and CONTAINSTABLE vs. FREETEXT.

    See the BOL for details. Also look at Hilary Cotter's articles:

    http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/

    Your UI will need to provide either a radio button or separate fields (similar to Google's "advanced search" page) with the choices:

    - Any of the terms entered (i.e., an OR codition)

    - All of the terms entered (i.e, an AND condition)

    - An exact phrase (i.e., a phrase search -- quoted string)

    Then the code will need to construct an appropriate SQL query. Watch out for SQL Injection! Variables can be used but there are limitations.

    That is what we do along with a set of checkboxes where the user can pick which fields to search. E.g., Book Title, Book Author, Book contents.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • We make extensive use of free text searching where I work. John is right, you need to construct a dynamic SQL query using the data supplied by the user.

    For example, if the user enters "organic wool" you will need to construct a query saying "organic AND wool". If you want to weight the results, then it becomes "ISABOUT('organic', 0.5) AND ISABOUT('wool', 0.7)" or whatever you need.

    If this suddenly looks quite complex compared to what you originally thought, then sorry but that is what needs to be done to really exploit FTS.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • So are you saying that, for instance if the code doing the search for this example was in ASP.NET, that you'd have to dynamically construct the SQL statement within that code, and then just run it through ASP.NET instead of calling a stored procedure?

    Or can you just dynamically pass a chunk of the SQL query to a stored proc and then concatenate a "hard-coded" stored proc with what you passed in to make sure your AND or OR is working as it should?

  • EdVassie (1/16/2008)


    We make extensive use of free text searching where I work. John is right, you need to construct a dynamic SQL query using the data supplied by the user.

    For example, if the user enters "organic wool" you will need to construct a query saying "organic AND wool". If you want to weight the results, then it becomes "ISABOUT('organic', 0.5) AND ISABOUT('wool', 0.7)" or whatever you need.

    If this suddenly looks quite complex compared to what you originally thought, then sorry but that is what needs to be done to really exploit FTS.

    lou-612048 (3/6/2010)


    So are you saying that, for instance if the code doing the search for this example was in ASP.NET, that you'd have to dynamically construct the SQL statement within that code, and then just run it through ASP.NET instead of calling a stored procedure?

    Or can you just dynamically pass a chunk of the SQL query to a stored proc and then concatenate a "hard-coded" stored proc with what you passed in to make sure your AND or OR is working as it should?

    Ed, Lou: What you are both describing is, so far, the operational definition of SQL Injection, which is a very bad thing. I am hoping that you will be adding instructions here for anyone reading this on how to prevent that in this approach.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • And that was the reason for my post, because, you know, I'm confused.

    I have a full-text search that works in an ASP environment that takes whatever's entered into the search field, doing some sanitization, then breaking apart the search phrase into words and dynamically creating a SQL query which is then run from the code (rather than calling a stored proc).

    I know full well that this is not the right way to do it, but I've been unable to find any reference on how to do what I'm doing purely within the stored proc.

    So this is where I am, basically:

    SELECT DISTINCT Products.ID, Products.Name, Suppliers.Name AS SupplierName, Categories.Name AS CatName

    FROM (Products INNER JOIN Suppliers ON Products.SupplierID = Suppliers.ID) INNER JOIN Categories ON Products.CatID = Categories.ID

    WHERE CONTAINS(Products.*, 'FORMSOF(INFLECTIONAL, black) AND FORMSOF(INFLECTIONAL, boxes)')

    This returns records that contain both of the words 'black' and 'boxes' wherever they are in the FT index, even if they're not together.

    And this is the behavior I want, except for the fact that the app code is running this rather than calling a stored proc.

    So the question is, how do I do, in SQL, what my ASP code is doing? Pass a sanitized phrase (such as 'black boxes') to a parameterized stored proc and have it do the same thing? My understanding is there are no arrays in SQL Server, so I can't break apart the phrase within the stored proc into an array and just loop through the array to build the query dynamically entirely within SQL.

    Keep in mind that I don't know if the search is going to be for 'boxes' or 'black boxes' or 'rectangular black box' etc. so I can't hard-code the number of possible search words to be searched for.

  • lou-612048 (3/6/2010)


    ...

    I have a full-text search that works in an ASP environment that takes whatever's entered into the search field, doing some sanitization, then breaking apart the search phrase into words and dynamically creating a SQL query which is then run from the code (rather than calling a stored proc).

    I know full well that this is not the right way to do it, but I've been unable to find any reference on how to do what I'm doing purely within the stored proc.

    ...

    So the question is, how do I do, in SQL, what my ASP code is doing? ...

    OK, I see. Well, good question. I'm glad you had the judgement to ask and not just ignore the issue.

    Pass a sanitized phrase (such as 'black boxes') to a parameterized stored proc and have it do the same thing?

    This is the method used by most applications programmers, I believe because it is the one most easily retro-fitted from the application code after the fact. I do not recommend it however for two reasons: 1) I do not believe that it is truly secure in the way that "secure" is usually meant and required by security specialists. It is merely "believed to be" secure, and even then, "only if you do it right". In other words, no one has ever formally demonstrated that it is secure, nor has the vendor (Microsoft) ever affirmed or documented a fail-proof way of doing it, implementers just "think" that they have got every possible trick covered (I frankly doubt it). Plus, if you get it wrong (forget something, make a mistake), then you're definitely NOT secure.

    The other reason I dislike this approach is that technically, it breaks your application. It means that there's all kinds of legitimate things that you users cannot search for, and I doubt that that's how the functional spec describes your app.

    My understanding is there are no arrays in SQL Server, so I can't break apart the phrase within the stored proc into an array and just loop through the array to build the query dynamically entirely within SQL.

    There's no such things as *explicit* arrays, but there are at least three different acceptable ways to emulate them (temporary tables, XML, and delimited strings). But I wouldn't recommend looping in any event. Here's an article by a really smart guy who explains one of these techniques (delimited strings): Passing Parameters as (almost) 1, 2, and 3 Dimensional Arrays[/url]

    Keep in mind that I don't know if the search is going to be for 'boxes' or 'black boxes' or 'rectangular black box' etc. so I can't hard-code the number of possible search words to be searched for.

    Understood. Could you possibly post this code? Then I believe I could make a better and more specific recommendation, as currently there is no single solution that is best for all cases.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Oops, never mind about the code request. I infer from the discussion above, that you do not yet have the full features done for the code side either?

    If that's the case, then I'd say go read the Arrays article that I posted above, and then let us know if that is sufficient for you or if you still have more questions.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks Barry...much appreciated.

    Yes, the code is done and I'll post a modified version of it maybe tomorrow (getting ready to have friends over for dinner and the peppers are almost done roasting 🙂

  • Ok, so here's the code that works in ASP. As you can see, there's a part there where I take the basic query and append additional AND FORMSOF... elements for each word in the search that a user has performed.

    <%

    SearchTerm = Request.Form("SearchTerm")

    SearchTerm = Replace(SearchTerm, "&", " ")

    arrTerm = Split(SearchTerm, " ")

    '--------------------------------------

    'Build the Search string from the array

    '--------------------------------------

    bitFirst = True

    strPhrase = ""

    For Each strWord In arrTerm

    '------------------------------------

    'This function does some sanitization

    '------------------------------------

    strWord = StripSearch(strWord)

    '----------------------------------------------------------------------------

    'If this word has an apostrophe in it, remove everything after the apostrophe

    '----------------------------------------------------------------------------

    If InStr(strWord, "''") Then

    arrWord = Split(strWord, "''")

    strWord = arrWord(0)

    End If

    '--------------------

    'Noise word test here

    '--------------------

    If LCase(strWord) = "of" OR LCase(strWord) = "and" Then

    'Do nothing

    Else

    '----------------------------------------------------------

    'Add elements to the query for multiple words in the search

    '----------------------------------------------------------

    If bitFirst = True Then

    If Len(strWord) > 1 Then

    strPhrase = "'FORMSOF(INFLECTIONAL, " & strWord & ")"

    End If

    bitFirst = False

    Else

    If Len(strWord) > 1 Then

    strPhrase = strPhrase & " AND FORMSOF(INFLECTIONAL, " & strWord & ")"

    End If

    End If

    End If

    Next

    If strPhrase <> "" Then

    strPhrase = strPhrase & "'"

    Else

    strPhrase = "':'"

    End If

    '-------------------------------------------------------------

    'See if we'll have to add something to the sql statement based

    'on the area selected for the search

    '-------------------------------------------------------------

    If AreaID <> 0 Then

    sqlLoc = " AND Suppliers.TownID=" & AreaID

    sqlA = "SELECT Name FROM Neighborhoods WHERE ID=" & AreaID

    Set rsA = conn.Execute(sqlA)

    AreaName = rsA("Name")

    rsA.Close

    Set rsA = Nothing

    SearchTerm = SearchTerm & " in " & AreaName

    Else

    sqlLoc = ""

    End If

    '-----------------

    'Get a count first

    '-----------------

    sql = "SELECT COUNT(DISTINCT Products.ID) " &_

    "FROM (Products INNER JOIN Suppliers ON Products.SupplierID = Suppliers.ID) " &_

    "INNER JOIN Categories ON Products.CatID = Categories.ID " &_

    "WHERE CONTAINS(Products.*, " & strPhrase & ") " &_

    "AND Products.Hide=0 AND Suppliers.Active<>0" & sqlLoc

    Set rs = conn.Execute(sql)

    If NOT rs.EOF Then

    intTotalRecs = rs(0)

    End If

    rs.Close

    '----------------------------------------

    'Get the items that match the search term

    '----------------------------------------

    sql = "SELECT DISTINCT Products.ID AS ProdID, Products.Name AS ProdName, Products.Description, " &_

    "Categories.ID AS CatID, Categories.Name AS CatName, Suppliers.Name AS SupplierName, Suppliers.Logo, " &_

    "Products.Image, Categories.Priority, Categories.Depth, Suppliers.ID AS SupplierID, Suppliers.SubID, Suppliers.SubStart " &_

    "FROM (Products INNER JOIN Suppliers ON Products.SupplierID = Suppliers.ID) " &_

    "INNER JOIN Categories ON Products.CatID = Categories.ID " &_

    "WHERE CONTAINS(Products.*, " & strPhrase & ") " &_

    "AND Products.Hide=0 AND Suppliers.Active<>0" & sqlLoc

    strMatch = "items"

    strPType = "SF"

    'This ORDER BY statement orders alphabetically by supplier, then product name

    sql = sql & " ORDER BY Suppliers.SubID DESC, Suppliers.Name, Products.Name"

    Set rs = conn.Execute(sql)

    Do While NOT rs.EOF

    ID = rs("ProdID")

    Name = rs("ProdName")

    lngRID = rs("SupplierID")

    strRName = rs("SupplierName")

    Response.Write Name & " (" & ID & ") from " & strRName & " (" & lngRID & ")

    " & chr(13)

    rs.MoveNext

    Loop

    rs.Close

    dbClose()

    %>

  • What is "FORMSOF(INFLECTIONAL. "?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Oh I see, it's a CONTAINS operator...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Well I think I've got it. You can call the following procedure from your code, it should do what you need:

    CREATE PROCEDURE FTSearchProducts(@strPhrase as VARCHAR(MAX), @AreaID as Int = NULL)

    AS

    IF @AreaID is NULL

    BEGIN

    SELECT DISTINCT

    Products.ID AS ProdID,

    Products.Name AS ProdName,

    Products.Description,

    Categories.ID AS CatID,

    Categories.Name AS CatName,

    Suppliers.Name AS SupplierName,

    Suppliers.Logo,

    Products.Image,

    Categories.Priority,

    Categories.Depth,

    Suppliers.ID AS SupplierID,

    Suppliers.SubID,

    Suppliers.SubStart,

    COUNT(*) OVER(PARTITION BY -1) AS TotalRecs

    FROM Products

    INNER JOIN Suppliers ON Products.SupplierID = Suppliers.ID

    INNER JOIN Categories ON Products.CatID = Categories.ID

    WHERE CONTAINS(Products.*, @strPhrase )

    AND Products.Hide=0

    AND Suppliers.Active<>0

    ORDER BY Suppliers.SubID DESC, Suppliers.Name, Products.Name

    END

    ELSE

    BEGIN

    SELECT DISTINCT

    Products.ID AS ProdID,

    Products.Name AS ProdName,

    Products.Description,

    Categories.ID AS CatID,

    Categories.Name AS CatName,

    Suppliers.Name AS SupplierName,

    Suppliers.Logo,

    Products.Image,

    Categories.Priority,

    Categories.Depth,

    Suppliers.ID AS SupplierID,

    Suppliers.SubID,

    Suppliers.SubStartm,

    COUNT(*) OVER(PARTITION BY -1) AS TotalRecs

    FROM Products

    INNER JOIN Suppliers ON Products.SupplierID = Suppliers.ID

    INNER JOIN Categories ON Products.CatID = Categories.ID

    WHERE CONTAINS(Products.*, @strPhrase )

    AND Products.Hide=0

    AND Suppliers.Active<>0

    AND Suppliers.TownID = @AreaID

    ORDER BY Suppliers.SubID DESC, Suppliers.Name, Products.Name

    END

    As it turns out, you code was already pretty close: the '<contains_search_condition>' of the CONTAINS function is a safe parameter, so you can basically pass it in the way you are building it now strPhrase, that is). If you pass the AreaID in it will take care of that as well, and I added TotalRecs as the last column (obviously, if no records come back then it should be 0).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Really? Wow, that's a little silly of me then not to have simply tried that before posting here haha.

    I genuinely appreciate your help on this and the time you took to read, give it thought, reply and even work up a solution in SQL.

    Thanks much...I'll post back after I have a chance to implement it.

    Have a great week 🙂

  • A few constructive criticisms, Barry:

    Running a full-text search with a variable as a parameter can be problematic. You might have noticed that the full-text engine can provide very good quality estimates to the optimizer when a literal string is used, even if the full-text query is complex.

    When a variable is used to provide the search terms, the optimizer resorts to a guess - which might be the ubiquitous 10,000 rows - I don't recall off-hand.

    If the query optimizer uses a guess, it might mistakenly produce a plan that it driven by some other input, and ends up calling the full-text query many, many times. I have seen this happen on production systems.

    In any case, since the full-text search is generally the most expensive part of a plan, we usually want to ensure that it is performed only once. This can be done by using CONTAINSTABLE rather than CONTAINS, and forcing the order of the joins.

    Alternatively, run the FT CONTAINSTABLE query first and save the KEY values returned in a table (usually a variable rather than a #temp table). This default cardinality estimate of '1' for table variables is normally sufficient for SQL Server to choose it as the driving table in the main query.

    It may also be possible to rely on OPTION (RECOMPILE) for the monolithic query, but that seems less robust to me, and has problems of its own in SQL Server 2008 up to SP1. I do not recall whether RECOMPILE is enough to get accurate statistics from the full-text service, though I would expect it is.

    Dynamic SQL is also an option here, of course.

    COUNT(*) OVER (PARTITION BY -1) is a little redundant: COUNT(*) OVER () is equivalent. Aside from that small syntactical point, COUNT(*) OVER produces a plan which features one of those stacked Table-Spool arrangements reminiscent of recursive CTEs (in fact they run in a different mode, but you see my point). It can be more efficient to use ROW_NUMBER twice, as shown in the following demonstration code:

    USE tempdb;

    GO

    DROP TABLE dbo.Sample;

    GO

    CREATE TABLE dbo.Sample

    (

    row_id INTEGER IDENTITY PRIMARY KEY,

    group_id INTEGER NOT NULL,

    padding CHAR(200) NOT NULL DEFAULT(SPACE(200))

    );

    GO

    WITH Numbers (n)

    AS (

    SELECT TOP (50000)

    ROW_NUMBER() OVER (ORDER BY (SELECT 0))

    FROM master.sys.columns C1,

    master.sys.columns C2,

    master.sys.columns C3

    )

    INSERT dbo.Sample

    (group_id)

    SELECT n % 25 + 1

    FROM Numbers;

    GO

    DECLARE @Bitbucket INTEGER;

    SET STATISTICS IO, TIME ON;

    SELECT @Bitbucket = COUNT(*) OVER ()

    FROM dbo.Sample;

    SELECT @Bitbucket =

    -1 +

    ROW_NUMBER() OVER (ORDER BY row_id ASC) +

    ROW_NUMBER() OVER (ORDER BY row_id DESC)

    FROM dbo.Sample

    SET STATISTICS IO, TIME OFF;

    DROP TABLE dbo.Sample;

    Results:

    -- COUNT(*) OVER ()

    Table 'Worktable'. Scan count 3, logical reads 100781, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Sample'. Scan count 1, logical reads 1358, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    CPU time = 219 ms, elapsed time = 224 ms.

    -- Double ROW_NUMBER()

    Table 'Sample'. Scan count 1, logical reads 1358, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    CPU time = 78 ms, elapsed time = 190 ms.

    Finally, the search criteria string is NVARCHAR, so should be passed as such to avoid an implicit conversion. It is possible for things to go 'wobbly' over such seemingly small details...

    Paul

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply