Relational Table SQL Query Optimization/Performance Issue?

  • ahmad

    Valued Member

    Points: 62

    Hi Every One

    I have query optimization issue as follows.

    I have two related tables.

    Commodity (This table has 75 Thousand records)

    CommodityID (PK)

    CommodityName

    CommoditySupplier (This table has approx 800 Million records)

    CommoditySupplierID (PK)

    SupplierID (FK)

    CommodityID (FK)

    I am executing following query to list SupplierIDs having CommodityName 'SearchString'

    SELECT SuppllierID FROM CommoditySupplier WHERE CommodityID IN (SELECT CommodityID FROM Commodity WHERE CommodityName Like '%SearchString%')

    Problem is, it is extremely slow. It takes more than 30-40 minutes to execute. Sometimes even more than that.

    please guide me to optimize the query. Or Should I change the structure in someway?

    Thanks in advance.

  • DesNorton

    SSC-Insane

    Points: 22848

    I would start by using an INNER JOIN, rather than a sub-query

    SELECT      cs.SuppllierID
    FROM CommoditySupplier AS cs
    INNER JOIN Commodity AS c
    ON c.CommodityID = cs.CommodityID
    WHERE c.CommodityName LIKE '%SearchString%';

    However, your LIKE '%xxxx%' will always force a table scan, which will make the query slow on such large tables.

    You need to stop searching where CommodityName name "contains" some value, and start searching where CommodityName "startsWith" some value.

    SELECT      cs.SuppllierID
    FROM CommoditySupplier AS cs
    INNER JOIN Commodity AS c
    ON c.CommodityID = cs.CommodityID
    WHERE c.CommodityName LIKE 'SearchString%'; -- NOTE no leading wild card

    You can then add an index to assist with your search

    CREATE NONCLUSTERED INDEX ix_Commodity_CommodityName
    ON Commodity (CommodityName)
    INCLUDE (CommodityID);
  • DesNorton

    SSC-Insane

    Points: 22848

    Because the logical query processing order will process the join before the where, it MIGHT improve performance slightly if you search for the Commodities before searching for the Suppliers.

    IF OBJECT_ID(N'tempdb..#Commodity') IS NOT NULL
    BEGIN
    DROP TABLE #Commodity;
    END;
    CREATE TABLE #Commodity (CommodityID int NOT NULL PRIMARY KEY CLUSTERED);

    INSERT INTO #Commodity ( CommodityID )
    SELECT CommodityID
    FROM Commodity
    WHERE CommodityName LIKE '%SearchString%'; -- THIS IS A BAD IDEA that will cause a full table scan

    SELECT cs.SuppllierID
    FROM #Commodity AS c
    INNER JOIN CommoditySupplier AS cs
    ON c.CommodityID = cs.CommodityID
    GROUP BY cs.SuppllierID;
  • ScottPletcher

    SSC Guru

    Points: 98206

    To help resolve this problem across the board, for all your queries, you really need to re-cluster the CommoditySupplier table.  This is a classic case of the "default clustering by identity column" problem.  Unfortunately, since it's such a large table, this will take quite a while.  You'd almost certainly want to try it first in a test environment anyway.  Change the clustered index as shown below, then retry the query.  Review the query plan to make sure SQL is searching the Commodity table first, then doing (joined) seeks on the CommoditySupplier table.

    --first, drop all nonclustered indexes (if any)

    ALTER TABLE dbo.CommoditySupplier DROP CONSTRAINT PK_CommoditySupplier;

    CREATE UNIQUE CLUSTERED INDEX CL_CommoditySupplier ON dbo.CommoditySupplier

    ( CommodityID, SuppliedID /*, CommoditySupplierID */

    /*Add CSID *ONLY* IF *REQUIRED* TO MAKE THE FIRST TWO COLUMNS UNIQUE*/ )

    WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 98, SORT_IN_TEMPDB = ON )

    ON [<same_filegroup_name_as_original_PK_was_on>];

    --re-create the PK, but nonclustered

    ALTER TABLE dbo.CommoditySupplier ADD CONSTRAINT PK_CommoditySupplier PRIMARY KEY NONCLUSTERED ( CommoditySupplierID ) WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 99, ... ) ON [...same_fg_name...];

    --recreate all nonclustered indexes (if any)

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them.

  • Jeff Moden

    SSC Guru

    Points: 995102

    ScottPletcher wrote:

    To help resolve this problem across the board, for all your queries, you really need to re-cluster the CommoditySupplier table.  This is a classic case of the "default clustering by identity column" problem.

    Yeah... totally agree with that.

    But... even that isn't going to help this query because the search criteria is using a leading wildcard character.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • ScottPletcher

    SSC Guru

    Points: 98206

    True, but that table only has 75K rows.  Create an index with only CommodityID and CommodityName and, yes, SQL will still have to do an index scan, but the overall query still shouldn't take anywhere close to 30 mins, unless there are an unusually large number of matching IDs.

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them.

  • The Dixie Flatline

    SSC Guru

    Points: 53231

    Might be worthwhile to first query the commodities table and put the results into a #temp table indexed on commodityID, then join to the #temp table.   Depends on how many results are usually returned from the %search%.

    Apologies, Des.   I didn't notice the PRIMARY KEY in your create table statement.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Jonathan AC Roberts

    SSCoach

    Points: 16992

    First I would try adding this index to table CommoditySupplier:

    CREATE INDEX IX_CommoditySupplier_CommodityID_INC_SupplierID
    ON CommoditySupplier(CommodityID) INCLUDE (SupplierID);

    If that doesn't improve performance enough I would, in addition to the above index, execute the query like this with the assistance of a temporary table.

    IF OBJECT_ID('tempdb..#CommodityID','U') IS NOT NULL DROP TABLE #CommodityID

    SELECT DISTINCT CommodityID
    INTO #CommodityID
    FROM Commodity
    WHERE CommodityName Like '%SearchString%'

    CREATE UNIQUE INDEX IX_#CommodityID_1 ON #CommodityID(CommodityID)

    SELECT cs.SupplierID
    FROM CommoditySupplier cs
    INNER JOIN #CommodityID c
    ON c.CommodityID = cs.CommodityID

    DROP TABLE #CommodityID

     

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply