Using LIKE for text search

  • Due to customer request, we moved an Oracle8 database to MSSQL2008. Everything work fine, except some surprising performance issues.

    The database contains linked tables xxx (130000 recs) and yyy (3000 recs).

    I've read a lot about inefficient MSSQL text queries (especially with leading %), but:

    - to me, the measured processing time don't make any sense

    - Oracle processed these requests in 10 millisecs !!

    Answer time : 600 millisecs for searching 3000 records

    SELECT ... FROM xxx AS A INNER JOIN yyy AS B ON A.ID = B.ID WHERE A.NORM LIKE '%xyz%'

    Answer time : 650 millisecs for searching 130000 records

    SELECT ... FROM xxx AS A WHERE A.NORM LIKE '%xyz%'

    Answer time : 30 millisecs for searching 3000 records

    SELECT ... FROM xxx AS A WHERE (A.ID <= 3000) AND (A.NORM LIKE '%xyz%')

    Answer time : 25 millisec for the JOIN and recordset transfer alone

    SELECT ... FROM xxx AS A INNER JOIN yyy AS B ON A.ID = B.ID

    In other words:

    Searching 3000 records costs just as much as searching 130000 records.

    On the other hand, searching 3000 records without INNER JOIN needs acceptable 3 * Oracle.

    The INNER JOIN itself seems to need something like 10 to 15 millisecs.

    I would expect the 1st SQL statement to be processed in the order

    - find 3000 records by INNER JOIN

    - search text within these 3000 records

    Looking at the measured times, it seems to me, that MSSQL reversed that order

    - search text with 130000 table A records

    - INNER JOIN the result with table B

    Both strategies would result in the same answer, but with enormous performance differences.

    Not being an MSSQL expert, can anyone explain this behaviour to me ?

    Your help would be greatly appreciated.

    Thanx in advance

    Peter

  • First thoughts were related to indexes, have they been re-indexed lately? Also was the server "cold" in that it had been recently restarted? Also does the SQL Server have sufficient memory.

    You can't tell how a query will be executed just by looking at it, MSSQL does not use syntax based optimization, it uses cost based. You will have to show the query plan to find out exactly how it executed, this will also tell you how and what indexes were used.

    How wide are the columns, if <900 characters they could be indexed themselves using normal indexes. Also, LIKE operations are highly inefficient and usually generate a lot of IO, you might look at full-text indexes on those columns.

    Just the thoughts off the top of my head..

    CEWII

  • Thank you for your quick reply.

    Yes, indexes are defined for all relevant fields as ID, NORM, etc.

    The indexes are recalculated every night.

    The NORM field is varchar2(200).

    Both mentioned tables are part of the (relatively static) master data.

    This data type works with a very low change rate.

    MSSQL runs on ESX with 2 kernels and 3.6 GB.

    The Windows2003 server shows 1 GB free memory.

    The virtual host is a brandnew HP server.

    I experimentally used the optimizer tool to check indexes and create statistics, but without success.

    The 600ms penalty seems to be carved in stone...

    Peter

  • Per the query plan how is the LIKE operator being accomplished? What operation is it saying?

    CEWII

  • Also, what does your execution plan look like? and io stats?



    Shamless self promotion - read my blog http://sirsql.net

  • Using the German version of MSSQL, I might run into translation problems...

    The menu "show estimated execution plan" shows:

    SELECT cost 0%

    Nested Loops (Inner Join) cost 0%

    Clustered Index seek cost 1% Nested Loops (Inner Join) cost 13%

    Index Scan (nonclustered) cost 86%

    Key Search (clustered) cost 1%

    Hope, this clarifies

    Peter

  • if you set statistics io on and execute the query what do you get as regards scan counts and reads?



    Shamless self promotion - read my blog http://sirsql.net

  • I'm not sure if I understand correctly...

    Executing the SQL in Management Studio produces the following trace info:

    'Worktable'-table. Scan count 0, logical read operations 0, physical read operations 0, Read-Ahead-read operations 0, logical LOB-read operations 0, physical LOB-read operations 0, Read-Ahead-LOB-read operations 0.

    'ACTIONTEMPLATES'-table. Scan count 1, logical read operations 17, physical read operations 0, Read-Ahead-read operations 0, logical LOB-read operations 0, physical LOB-read operations 0, Read-Ahead-LOB-read operations 0.

    'STATICNODES'-table. Scan count 1, logical read operations 822, physical read operations 0, Read-Ahead-read operations 0, logical LOB-read operations 0, physical LOB-read operations 0, Read-Ahead-LOB-read operations 0.

    Is this the information, you need ?

  • It is a start. You might want to screen print the output from a "Include Actual Query Plan". I'm interested in the access method of the LIKE operation, I'm wondering if it is doing a table scan, or an index scan, or and index seek or something.

    CEWII

  • Please excuse my late response, but I've been off for a while with other projects.

    I just want to give some followup to the discussed problem.

    Not being able to find the reason for the very slow LIKE execution, I implemented an intermediate problem solution.

    Removing the LIKE clause from the SQL query and moving it to the client-side filter reduced execution time from 600 to 60 msec.

    Additionally implementing a client-side cache brought a further reduction down to 2 to 5 msec.

    Thus, these simple optimization steps brought an overall 99.5% reduction in processing time.

    As a nice side effect, these steps reduced the server load significantly.

    This may not be the solution, I originally had in mind.

    But then: When the customer is happy, I'm happy.

    Thanks a lot for your support.

    Peter

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply