Need index suggestion

Question

Need index suggestion

samantha.sqldba

SSC Journeyman

Points: 80
More actions
November 29, 2018 at 7:08 am

#376494

Hi All,
Need index suggestion for the below query which can improve the performance. (Assume that table has enough data)
SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1’ Order by productname;
I Would like to create index as below. Am I correct or anything needs to be changed?
CREATE INDEX <IDX___> ON dbo.Sales(ProductID) INCLUDE (SalesDate, SalesPersonID, productname)
Thanks,
Sam

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Sam

Looks like you need to do a bit of normalising first - you shouldn't have ProductID and productname in a table of sales. Assuming changing that isn't an option, get rid of the ORDER BY clause, since all rows will have the same productname, won't they. Since you've taken that column out of the query, you no longer need it in the index, either.

John

Jonathan AC Roberts SSCoach Points: 18263 More actions · Answer 2

samantha.sqldba - Thursday, November 29, 2018 7:08 AM
Hi All,
Need index suggestion for the below query which can improve the performance. (Assume that table has enough data)
SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1â€™ Order by productname;
I Would like to create index as below. Am I correct or anything needs to be changed?
CREATE INDEX <IDX___> ON dbo.Sales(ProductID) INCLUDE (SalesDate, SalesPersonID, productname)
Thanks,
Sam

CREATE INDEX <IDX___> ON dbo.Sales(ProductID,productname ) INCLUDE (SalesDate, SalesPersonId)

samantha.sqldba SSC Journeyman Points: 80 More actions · Answer 3

samantha.sqldba

SSC Journeyman

Points: 80

November 29, 2018 at 7:54 pm

#2015038

Thank you all.

Jeff Moden SSC Guru Points: 1003872 More actions · Answer 4

John Mitchell-245523 - Thursday, November 29, 2018 7:16 AM
Sam
Looks like you need to do a bit of normalising first - you shouldn't have ProductID and productname in a table of sales. Assuming changing that isn't an option, get rid of the ORDER BY clause, since all rows will have the same productname, won't they. Since you've taken that column out of the query, you no longer need it in the index, either.
John

I'm thinking that having ProductID and ProductName in a sales table would actually be a form of denormalization for the purpose of rapid display. Since sales tables border on the fringe of being a history table, it's not a problem but it's also not normalization.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003872 More actions · Answer 5

Jonathan AC Roberts - Thursday, November 29, 2018 7:51 AM
samantha.sqldba - Thursday, November 29, 2018 7:08 AM
Hi All,
Need index suggestion for the below query which can improve the performance. (Assume that table has enough data)
SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1â€™ Order by productname;
I Would like to create index as below. Am I correct or anything needs to be changed?
CREATE INDEX <IDX___> ON dbo.Sales(ProductID) INCLUDE (SalesDate, SalesPersonID, productname)
Thanks,
Sam
CREATE INDEX <IDX___> ON dbo.Sales(ProductID,productname ) INCLUDE (SalesDate, SalesPersonId)

I'm curious. If you have the ProductID, why do you believe that ProductName needs to be a Key Column rather than just as an INCLUDE?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18263 More actions · Answer 6

Jeff Moden - Friday, November 30, 2018 8:02 AM
Jonathan AC Roberts - Thursday, November 29, 2018 7:51 AM
samantha.sqldba - Thursday, November 29, 2018 7:08 AM
Hi All,
Need index suggestion for the below query which can improve the performance. (Assume that table has enough data)
SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1â€™ Order by productname;
I Would like to create index as below. Am I correct or anything needs to be changed?
CREATE INDEX <IDX___> ON dbo.Sales(ProductID) INCLUDE (SalesDate, SalesPersonID, productname)
Thanks,
Sam
CREATE INDEX <IDX___> ON dbo.Sales(ProductID,productname ) INCLUDE (SalesDate, SalesPersonId)
I'm curious. If you have the ProductID, why do you believe that ProductName needs to be a Key Column rather than just as an INCLUDE?

It's needed for the "Order by productname"

Jeff Moden SSC Guru Points: 1003872 More actions · Answer 7

Jonathan AC Roberts - Friday, November 30, 2018 8:21 AM
Jeff Moden - Friday, November 30, 2018 8:02 AM
Jonathan AC Roberts - Thursday, November 29, 2018 7:51 AM
samantha.sqldba - Thursday, November 29, 2018 7:08 AM
Hi All,
Need index suggestion for the below query which can improve the performance. (Assume that table has enough data)
SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1â€™ Order by productname;
I Would like to create index as below. Am I correct or anything needs to be changed?
CREATE INDEX <IDX___> ON dbo.Sales(ProductID) INCLUDE (SalesDate, SalesPersonID, productname)
Thanks,
Sam
CREATE INDEX <IDX___> ON dbo.Sales(ProductID,productname ) INCLUDE (SalesDate, SalesPersonId)
I'm curious. If you have the ProductID, why do you believe that ProductName needs to be a Key Column rather than just as an INCLUDE?
It's needed for the "Order by productname"

Understood but I don't believe it needs to be a part of the key to be effective for sorting. The good part is, you got me thinking about this and I'm going to play with it a bit over the weekend. Thanks, Jonathan.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 8

No, I agree - it would need to be the first column in the key. But like I said before, it's not needed at all, unless the product ID PD1 has more than one product name.

John

Jonathan AC Roberts SSCoach Points: 18263 More actions · Answer 9

John Mitchell-245523 - Friday, November 30, 2018 8:43 AM
No, I agree - it would need to be the first column in the key. But like I said before, it's not needed at all, unless the product ID PD1 has more than one product name.
John

Because ProductId is specified in the SQL as a fixed value, SQL Server can find the product key and read the values in order to end up with the sort already done. At least the optimiser should be able to work that out. So I think the order of the columns in the index is correct.
SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1â€™ Order by productname;
You're right about it not being needed if there is only one productname per ProductId, but then I thought to myself the Order by would not needed either. So I assumed that there is more than one productname per productId.

Jonathan AC Roberts SSCoach Points: 18263 More actions · Answer 10

John Mitchell-245523 - Friday, November 30, 2018 8:43 AM
No, I agree - it would need to be the first column in the key. But like I said before, it's not needed at all, unless the product ID PD1 has more than one product name.
John

This demonstrates it works, it takes a few minutes to execute though:

if OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL drop table #TestTable go select t2.N ProductId,convert(varchar(50),NEWID()) productName, getdate() SalesDate, t1.N SalesPersonId into #TestTable from dbo.tally t1 inner join dbo.tally t2 on t2.N <= 10 inner join dbo.tally t3 on t3.N <= 10 inner join dbo.tally t4 on t4.N <= 100 inner join dbo.tally t5 on t5.N <= 10 inner join dbo.tally t6 on t6.N <= 10 inner join dbo.tally t7 on t7.N <= 10 where t1.N <= 10

create index IX_#TestTable_1 on #TestTable(ProductId) INCLUDE (productName, SalesDate, SalesPersonId) set statistics io,time on select top(100) * from #TestTable WHERE ProductId = 3 ORDER BY productName set statistics io,time off drop index IX_#TestTable_1 on #TestTable

create index IX_#TestTable_2 on #TestTable(ProductId, productName) INCLUDE (SalesDate, SalesPersonId) set statistics io,time on select top(100) * from #TestTable WHERE ProductId = 3 ORDER BY productName

Query with index keyed only on ProductId
SQL Server Execution Times:
CPU time = 3293 ms, elapsed time = 1036 ms.

Query with index keyed on ProductId, productName
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 35 ms.

Jeff Moden SSC Guru Points: 1003872 More actions · Answer 11

Heh... ok. I'm confused. Here's the original query posted by the OP.

SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1’ Order by productname;

Can someone please tell me why anyone would find the output given by the above query to be more useful because it was sorted by ProductName? Without having the ProductName in the output, the sort is totally useless to anyone having to work with the output.

--================================================================================================= -- Create the test table to simulate the real one according to what we know. -- Please refer to the following articles for how to quickly make gobs of random but -- constrained data. -- http://www.sqlservercentral.com/articles/Data+Generation/87901/ -- http://www.sqlservercentral.com/articles/Test+Data/88964/ --================================================================================================= --===== If the test table already exists, drop it to make reruns in SSMS easier. IF OBJECT_ID('tempdb..#Sales','U') IS NOT NULL DROP TABLE #Sales ; --===== Create and populate the test table on-the-fly. -- Depending on your machine, this will take somewhere between 24 and 40 seconds to create the -- 5 million rows. Details are in the tail comments. WITH cteGenData AS ( SELECT TOP 5000000 SalesDate = RAND(CHECKSUM(NEWID())) -- 2010-01-01 up to 2020-01-01 * DATEDIFF(dd,'2010','2020') -- (Temporal Closed/Open) + CONVERT(DATETIME,'2010') ,SalesPersonID = ABS(CHECKSUM(NEWID())%100)+1 -- 1 thru 100 ,ProductNo = ABS(CHECKSUM(NEWID())%50)+1 -- 1 thru 50 FROM sys.all_columns ac1 --Cross Join forms a "Pseudo Cursor" row source which CROSS JOIN sys.all_columns ac2 --replaces a loop ) --=== Finalize the output for the test table. -- The ISNULLs make the columns NOT NULL in the test table. -- Note that the ProductID and ProductName will always match, just like in real life. SELECT SalesDate = ISNULL(SalesDate,0) ,SalesPersonID = ISNULL(SalesPersonID,0) ,ProductID = ISNULL( CHAR(ProductNo%26+65) -- First letter based on ProductNo + CONVERT(VARCHAR(10),ProductNo) ,'') ,ProductName = ISNULL( CHAR(ProductNo%26+65) -- First letter based on ProductNo +'-Product Name # ' -- Literal + CONVERT(VARCHAR(10),ProductNo)-- ProductNo converted to characters ,'') ,OtherColumns = CONVERT(CHAR(100),'Simto other columns bulk') INTO #Sales FROM cteGenData ORDER BY SalesDate ; --===== Let's see the first 100 rows so we know what we're dealing with. SELECT TOP 100 * FROM #Sales ; --================================================================================================= -- The following is identical to the original query except for the literal for ProductID. -- So, here's my question... DOES IT MAKE ANY SENSE AT ALL TO SORT THIS OUTPUT BY PRODUCTNAME? -- Without anything having to do with the product in the SELECT list, ordering by ProductName -- makes no sense to me what-so-ever! --================================================================================================= SELECT SalesDate ,SalesPersonID FROM #Sales WHERE ProductID = 'U20' ORDER BY ProductName ;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 12

Jonathan AC Roberts - Friday, November 30, 2018 10:11 AM
John Mitchell-245523 - Friday, November 30, 2018 8:43 AM
No, I agree - it would need to be the first column in the key. But like I said before, it's not needed at all, unless the product ID PD1 has more than one product name.
John
This demonstrates it works, it takes a few minutes to execute though:
if OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL drop table #TestTable go select t2.N ProductId,convert(varchar(50),NEWID()) productName, getdate() SalesDate, t1.N SalesPersonId into #TestTable from dbo.tally t1 inner join dbo.tally t2 on t2.N <= 10 inner join dbo.tally t3 on t3.N <= 10 inner join dbo.tally t4 on t4.N <= 100 inner join dbo.tally t5 on t5.N <= 10 inner join dbo.tally t6 on t6.N <= 10 inner join dbo.tally t7 on t7.N <= 10 where t1.N <= 10
create index IX_#TestTable_1 on #TestTable(ProductId) INCLUDE (productName, SalesDate, SalesPersonId) set statistics io,time on select top(100) * from #TestTable WHERE ProductId = 3 ORDER BY productName set statistics io,time off drop index IX_#TestTable_1 on #TestTable
create index IX_#TestTable_2 on #TestTable(ProductId, productName) INCLUDE (SalesDate, SalesPersonId) set statistics io,time on select top(100) * from #TestTable WHERE ProductId = 3 ORDER BY productName
Query with index keyed only on ProductId
SQL Server Execution Times:
CPU time = 3293 ms, elapsed time = 1036 ms.
Query with index keyed on ProductId, productName
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 35 ms.

Mmmm... good shout. One test, a thousand opinions and all that!

John

Jeff Moden SSC Guru Points: 1003872 More actions · Answer 13

Still looking for why anyone would find the sorting to be useful considering the output and the fact that there's noting to preserve that order if used elsewhere.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18263 More actions · Answer 14

Jeff Moden - Monday, December 3, 2018 8:41 AM
Still looking for why anyone would find the sorting to be useful considering the output and the fact that there's noting to preserve that order if used elsewhere.

You are correct. The OP's accepted answer is: "get rid of the ORDER BY clause, since all rows will have the same productname"