Full Text Search and words with symbols

  • Hi,

    Is there a way to set up full-text search so that it supports search terms containing symbols, such as C#, C++ etc.?

    I see that even on this forum, if I search for C++, it returns everything where the letter c exists. So I guess it's not possible?

    Regards

  • Hi,

    I used the function CONTAINSTABLE(myTable, MyColumn, 'C++')

    It works fine with 'C++', 'C#' ...

    You can see this

    http://technet.microsoft.com/en-us/library/ms189760.aspx

    Best regards

  • There are lots of posts suggesting you can't do this but it seems to work fine in 2012.

    EG:

    USE IMPORT;

    -- CREATE FULLTEXT CATALOG ft AS DEFAULT;

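    IF OBJECT_ID('test', 'U') IS NOT NULL DROP TABLE test; -- avoids an error on the first run, when the table does not exist yet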

    CREATE TABLE test (ID int not null,string varchar(max));

    INSERT INTO test VALUES (1,'this is a bit of text with the term C# somewhere in it'),(2,'and this one uses C++ instead.'),(3,'whilst this just has a c in it somewhere');

    CREATE UNIQUE INDEX ixID ON test (ID);

    CREATE FULLTEXT INDEX ON test (string) KEY INDEX ixID;

    SELECT * FROM test WHERE CONTAINS(string,'C#');

    SELECT * FROM test WHERE CHARINDEX('C#',string)>0;

    Returns:

    ID  string

    1   this is a bit of text with the term C# somewhere in it

    ID  string

    1   this is a bit of text with the term C# somewhere in it

    However (BIG however).....

    If you change the above script and replace all instances of C# with, say, X# and all instances of C++ with X++ then you'll find that it doesn't work any more. What this suggests is that Microsoft must have a list of "words" that full text indexing is able to pick up on: if your search terms, as in the OP, are included in that list then you're fine, but if they are not then you're no further forward. Someone with more experience of full text indexing can probably tell you if and how you might control the "word list"........
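
    One way to check how the word breaker actually splits these terms is the sys.dm_fts_parser function, available from SQL Server 2008 onwards. The following is only a sketch: the sample phrase is made up for illustration, 1033 is the LCID for US English, the first 0 selects the system stoplist, and the final 0 turns accent sensitivity off. Running it normally requires sysadmin membership.

    ```sql
    -- Show the tokens the English word breaker produces for a sample phrase.
    -- The phrase is wrapped in double quotes so it is parsed as a single phrase.
    SELECT occurrence, special_term, display_term
    FROM sys.dm_fts_parser(N'"C# and X# and C++ and X++"', 1033, 0, 0)
    ORDER BY occurrence;
    ```

    If C# comes back as a single display_term while X# comes back as a bare x, that would be consistent with the behaviour described above. Comparing the output on a 2008 and a 2012 instance should show where the two versions differ.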

  • I just tried a query like

    SELECT ID, string FROM test

    INNER JOIN containstable(test, string, 'C#')

    AS KEY_TBL ON test.ID = KEY_TBL.

    It is very interesting that it returns correct results on SQL Server 2012, but ignores # on SQL Server 2008.
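
    For reference, a complete form of this kind of join might look like the sketch below. It assumes the test table and full-text index from the earlier script; the aliases t and k are only illustrative. CONTAINSTABLE returns two columns, KEY and RANK, and the join is made on KEY, which holds the value of the full-text key column (ID here).

    ```sql
    -- Join the full-text matches back to the base table and order by relevance.
    SELECT t.ID, t.string, k.[RANK]
    FROM test AS t
    INNER JOIN CONTAINSTABLE(test, string, '"C#"') AS k
        ON t.ID = k.[KEY]
    ORDER BY k.[RANK] DESC;
    ```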

    andyscott (2/10/2014)


    However (BIG however).....

    If you change the above script and replace all instances of C# with, say, X# and all instances of C++ with X++ then you'll find that it doesn't work any more. What this suggests is that Microsoft must have a list of "words" that full text indexing is able to pick up on: if your search terms, as in the OP, are included in that list then you're fine, but if they are not then you're no further forward. Someone with more experience of full text indexing can probably tell you if and how you might control the "word list"........

    Maybe there is some internal English dictionary which contains all "real" words.

    It looks like all words that I need are included in that list, so I am fine with it.

    Thanks guys! 😎

  • Maybe this can help:

    Creating Custom Dictionaries for special terms to be indexed 'as-is' in SQL Server 2008 Full-Text Indexes

    Luis C.

  • I suspect that on 2008 (R2) the # is treated as a wildcard for numeric characters, which might explain why C# is handled differently by 2008 and 2012. By default SQL Server strips leading zeros off numbers when word breaking, so searching for 00123 would find documents containing 123. To get around this in 2008 you can use a custom dictionary that includes "0#" (without quotes); 00123 is then treated as a word rather than a number, and the word breaker does not strip the leading zeros. However, this behaviour has changed in 2012.

    Sorry, no real references to back this up; it's mostly from trial and error. 🙁 It would be nice if this was all documented somewhere, but I haven't been able to find much at all.
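
    If anyone wants to repeat the trial and error, something like the following might help. It is only a sketch: 00123 is the example value from above, 1033 is assumed as the LCID (US English), and the stoplist and accent sensitivity arguments are both left at 0.

    ```sql
    -- List the word breakers registered on this instance, with their versions.
    EXEC sp_help_fulltext_system_components 'wordbreaker';

    -- Show how a number with leading zeros is tokenized by the English word breaker.
    SELECT occurrence, special_term, display_term
    FROM sys.dm_fts_parser(N'00123', 1033, 0, 0);
    ```

    Running both statements on a 2008 and a 2012 instance, before and after adding a custom dictionary, should show whether the leading zeros survive word breaking.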
