Relational Table SQL Query Optimization/Performance Issue?

Question

Relational Table SQL Query Optimization/Performance Issue?

ahmad

Valued Member

Points: 62
More actions
November 11, 2019 at 12:39 pm

#3696682

Hi Every One
I have query optimization issue as follows.
I have two related tables.
Commodity (This table has 75 Thousand records)
CommodityID (PK)
CommodityName
CommoditySupplier (This table has approx 800 Million records)
CommoditySupplierID (PK)
SupplierID (FK)
CommodityID (FK)
I am executing following query to list SupplierIDs having CommodityName 'SearchString'
SELECT SuppllierID FROM CommoditySupplier WHERE CommodityID IN (SELECT CommodityID FROM Commodity WHERE CommodityName Like '%SearchString%')
Problem is, it is extremely slow. It takes more than 30-40 minutes to execute. Sometimes even more than that.
please guide me to optimize the query. Or Should I change the structure in someway?
Thanks in advance.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

DesNorton SSC-Insane Points: 24639 More actions · Answer 1

I would start by using an INNER JOIN, rather than a sub-query

SELECT      cs.SuppllierID
FROM        CommoditySupplier AS cs
INNER JOIN  Commodity         AS c
        ON  c.CommodityID = cs.CommodityID
WHERE       c.CommodityName LIKE '%SearchString%';

However, your LIKE '%xxxx%' will always force a table scan, which will make the query slow on such large tables.

You need to stop searching where CommodityName name "contains" some value, and start searching where CommodityName "startsWith" some value.

SELECT      cs.SuppllierID
FROM        CommoditySupplier AS cs
INNER JOIN  Commodity         AS c
        ON  c.CommodityID = cs.CommodityID
WHERE       c.CommodityName LIKE 'SearchString%'; -- NOTE no leading wild card

You can then add an index to assist with your search

CREATE NONCLUSTERED INDEX ix_Commodity_CommodityName
ON Commodity (CommodityName)
INCLUDE (CommodityID);

How to post data/code on a forum to get the best help.

DesNorton SSC-Insane Points: 24639 More actions · Answer 2

Because the logical query processing order will process the join before the where, it MIGHT improve performance slightly if you search for the Commodities before searching for the Suppliers.

IF OBJECT_ID(N'tempdb..#Commodity') IS NOT NULL
BEGIN
  DROP TABLE #Commodity;
END;
CREATE TABLE #Commodity (CommodityID int NOT NULL PRIMARY KEY CLUSTERED);

INSERT INTO #Commodity ( CommodityID )
SELECT CommodityID
FROM   Commodity
WHERE  CommodityName LIKE '%SearchString%'; -- THIS IS A BAD IDEA that will cause a full table scan

SELECT      cs.SuppllierID
FROM        #Commodity        AS c
INNER JOIN  CommoditySupplier AS cs
        ON  c.CommodityID = cs.CommodityID
GROUP BY    cs.SuppllierID;

How to post data/code on a forum to get the best help.

ScottPletcher SSC Guru Points: 101223 More actions · Answer 3

To help resolve this problem across the board, for all your queries, you really need to re-cluster the CommoditySupplier table. This is a classic case of the "default clustering by identity column" problem. Unfortunately, since it's such a large table, this will take quite a while. You'd almost certainly want to try it first in a test environment anyway. Change the clustered index as shown below, then retry the query. Review the query plan to make sure SQL is searching the Commodity table first, then doing (joined) seeks on the CommoditySupplier table.

--first, drop all nonclustered indexes (if any)

ALTER TABLE dbo.CommoditySupplier DROP CONSTRAINT PK_CommoditySupplier;

CREATE UNIQUE CLUSTERED INDEX CL_CommoditySupplier ON dbo.CommoditySupplier

( CommodityID, SuppliedID /*, CommoditySupplierID */

/*Add CSID *ONLY* IF *REQUIRED* TO MAKE THE FIRST TWO COLUMNS UNIQUE*/ )

WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 98, SORT_IN_TEMPDB = ON )

ON [<same_filegroup_name_as_original_PK_was_on>];

--re-create the PK, but nonclustered

ALTER TABLE dbo.CommoditySupplier ADD CONSTRAINT PK_CommoditySupplier PRIMARY KEY NONCLUSTERED ( CommoditySupplierID ) WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 99, ... ) ON [...same_fg_name...];

--recreate all nonclustered indexes (if any)

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 4

ScottPletcher wrote:

To help resolve this problem across the board, for all your queries, you really need to re-cluster the CommoditySupplier table. This is a classic case of the "default clustering by identity column" problem.

Yeah... totally agree with that.

But... even that isn't going to help this query because the search criteria is using a leading wildcard character.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101223 More actions · Answer 5

True, but that table only has 75K rows. Create an index with only CommodityID and CommodityName and, yes, SQL will still have to do an index scan, but the overall query still shouldn't take anywhere close to 30 mins, unless there are an unusually large number of matching IDs.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 6

Might be worthwhile to first query the commodities table and put the results into a #temp table indexed on commodityID, then join to the #temp table. Depends on how many results are usually returned from the %search%.

Apologies, Des. I didn't notice the PRIMARY KEY in your create table statement.

This reply was modified 6 years, 1 months ago by The Dixie Flatline.
This reply was modified 6 years, 1 months ago by The Dixie Flatline.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Jonathan AC Roberts SSCoach Points: 18407 More actions · Answer 7

First I would try adding this index to table CommoditySupplier:

CREATE INDEX IX_CommoditySupplier_CommodityID_INC_SupplierID
          ON CommoditySupplier(CommodityID) INCLUDE (SupplierID);

If that doesn't improve performance enough I would, in addition to the above index, execute the query like this with the assistance of a temporary table.

IF OBJECT_ID('tempdb..#CommodityID','U') IS NOT NULL DROP TABLE #CommodityID

SELECT DISTINCT CommodityID 
  INTO #CommodityID
  FROM Commodity 
 WHERE CommodityName Like '%SearchString%'

 CREATE UNIQUE INDEX IX_#CommodityID_1 ON #CommodityID(CommodityID)

 SELECT cs.SupplierID 
   FROM CommoditySupplier cs
  INNER JOIN #CommodityID c
          ON c.CommodityID = cs.CommodityID
  
DROP TABLE #CommodityID