Killing FTS

  • Comments posted to this topic are about the item Killing FTS

  • a good first step in my book would be to just build in a dll with a good set of the .Net regular expression engine & functions already.

    are there good .Net clones of Lucene?

    Sure I can Google some code and roll my own dll, and plead with the dba to allow just this one exception to be loaded into the SQLCLR... but why? MS finally added TRIM(), so... ?

    How long has Oracle included dbms_regex?

  • Corey, you may want to take a look at https://www.sqlsharp.com/ by Solomon Rutzky.

    I think Steve is right.  It's time to replace FTS.  I did an experiment to try and get it to handle phonetic matching but you are still left with an outdated product.  I think I made a dog miaow.

    I think that the new graph extensions and columnar storage provide some interesting possibilities for basing a replacement capability.  In some cases it is good enough to say that a term searched for has a high probability of existing in a document so hyperloglog indexing would make sense.  I've lost my bookmark to an article that actually did implement hyperloglog in SQL Server.

    ElasticSearch is a good product though I wouldn't call it the easiest thing to query when you go beyond simple questions.

    A couple of years back there was a white paper for PostGres that described a new index type that sounded ideal for free text searching.  There were a couple of Polish mathematicians that were working on it.  If anyone can find a reference to that then please share.

     

  • I think this is unfair for at least three reasons:

    1. Microsoft added Semantic Search to SQL Server 2012, this adds a lot of new functionality (see https://docs.microsoft.com/en-us/sql/relational-databases/search/semantic-search-sql-server?view=sql-server-2017 )

    2. In principle you can load PDF files, DOC files, DOCX files and other formats into a folder which is linked to a SQL Server FILETABLE. You can then use all the FTS commands (and Semantic Search commands) to directly search these files, using the wonders of FILESTREAM and FILETABLE.

    3. Also you can do this:

    SELECT * FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL,"go")', 1033, 0, 0)

    to find the alternate forms of a word.

    • This reply was modified 4 years, 10 months ago by  William Rayer.
  • We've only ever used FTS for our website's search feature and I don't have any other experience for comparison, but I do know we'd miss the thesaurus feature (\Program Files\Microsoft SQL\MSSQL11.MSSQLSERVER\FTData\tsenu.xml).  This is where we can define terms that span the whole 'Items' table instead of relying on meta tags for each item we add to our database.  For example, we sell "magnet cable" and in our industry, the old dogs refer to this as "baloney cable".  I've defined this in our thesaurus so that "magnet" and "baloney" and "bologna" are interchangeable (granted, we do not sell any meat products so this is a safe arrangement in our world).  This is much easier than remembering to add these terms when we put a new magnet cable into our items table.

    Is this a feature in other offerings?

  • Good article. I tried implementing FTS in a database at my previous job, with very limited success. I concluded that it was my own fault for not understanding how to do it (which was true) and other priorities took my attention. I hadn't even heard of Elasticsearch, until I read your article, Steve. Thanks for pointing that out! Will look into it.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • For Azure SQL databases, there is Azure Search. There is also a pattern for integrating on-premises SQL Server with Azure Search using Azure Data Factory.

    https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-azure-search-connector#json-example-copy-data-from-on-premises-sql-server-to-azure-search-index

     

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I feel somewhat prophetic:

    http://jonmorisissqlblog.blogspot.com/2016/03/text-searchingprocessing.html

    "I can't help but wonder if there's a strategy to get Azure full-text indexes both stored on and hitting Hadoop on the back-end.  How would such a solution compare to the many big data search options that are currently available?"

    If you need search, just use something built off of Lucene, it's clearly the best of breed.

    Is Azure Search Lucene based?

    https://docs.microsoft.com/en-us/azure/search/search-lucene-query-architecture

    Can anyone verify that?

    It would be nice if they rolled those features into on-prem, I know Oracle has a Lucene based full text search component.

  • Jon.Morisi wrote:

    ...

    If you need search, just use something built off of Lucene, it's clearly the best of breed. Is Azure Search Lucene based? ...

    Azure Search supports Lucene compatible API and indexes in a similar way that Azure CosmosDB supports APIs that are based on SQL, MongoDB, Cassandra, or Gremlin.

    https://docs.microsoft.com/en-us/azure/search/search-lucene-query-architecture

    https://docs.microsoft.com/en-us/azure/search/query-lucene-syntax

     

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply