Need index suggestion

  • Hi All,

    Need index suggestion for the below query which can improve the performance. (Assume that table has enough data)

    SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1’ Order by productname;

    I Would like to create index as below. Am I correct or anything needs to be changed?
    CREATE INDEX <IDX___> ON dbo.Sales(ProductID) INCLUDE (SalesDate, SalesPersonID, productname)

    Thanks,

    Sam

  • Sam

    Looks like you need to do a bit of normalising first - you shouldn't have ProductID and productname in a table of sales.  Assuming changing that isn't an option, get rid of the ORDER BY clause, since all rows will have the same productname, won't they.  Since you've taken that column out of the query, you no longer need it in the index, either.

    John

  • samantha.sqldba - Thursday, November 29, 2018 7:08 AM

    Hi All,

    Need index suggestion for the below query which can improve the performance. (Assume that table has enough data)

    SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1’ Order by productname;

    I Would like to create index as below. Am I correct or anything needs to be changed?
    CREATE INDEX <IDX___> ON dbo.Sales(ProductID) INCLUDE (SalesDate, SalesPersonID, productname)

    Thanks,

    Sam

    CREATE INDEX <IDX___> ON dbo.Sales(ProductID,productname ) INCLUDE (SalesDate, SalesPersonId)

  • Thank you all.

  • John Mitchell-245523 - Thursday, November 29, 2018 7:16 AM

    Sam

    Looks like you need to do a bit of normalising first - you shouldn't have ProductID and productname in a table of sales.  Assuming changing that isn't an option, get rid of the ORDER BY clause, since all rows will have the same productname, won't they.  Since you've taken that column out of the query, you no longer need it in the index, either.

    John

    I'm thinking that having ProductID and ProductName in a sales table would actually be a form of denormalization for the purpose of rapid display.  Since sales tables border on the fringe of being a history table, it's not a problem but it's also not normalization.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jonathan AC Roberts - Thursday, November 29, 2018 7:51 AM

    samantha.sqldba - Thursday, November 29, 2018 7:08 AM

    Hi All,

    Need index suggestion for the below query which can improve the performance. (Assume that table has enough data)

    SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1’ Order by productname;

    I Would like to create index as below. Am I correct or anything needs to be changed?
    CREATE INDEX <IDX___> ON dbo.Sales(ProductID) INCLUDE (SalesDate, SalesPersonID, productname)

    Thanks,

    Sam

    CREATE INDEX <IDX___> ON dbo.Sales(ProductID,productname ) INCLUDE (SalesDate, SalesPersonId)

    I'm curious.  If you have the ProductID, why do you believe that ProductName needs to be a Key Column rather than just as an INCLUDE?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, November 30, 2018 8:02 AM

    Jonathan AC Roberts - Thursday, November 29, 2018 7:51 AM

    samantha.sqldba - Thursday, November 29, 2018 7:08 AM

    Hi All,

    Need index suggestion for the below query which can improve the performance. (Assume that table has enough data)

    SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1’ Order by productname;

    I Would like to create index as below. Am I correct or anything needs to be changed?
    CREATE INDEX <IDX___> ON dbo.Sales(ProductID) INCLUDE (SalesDate, SalesPersonID, productname)

    Thanks,

    Sam

    CREATE INDEX <IDX___> ON dbo.Sales(ProductID,productname ) INCLUDE (SalesDate, SalesPersonId)

    I'm curious.  If you have the ProductID, why do you believe that ProductName needs to be a Key Column rather than just as an INCLUDE?

    It's needed for the "Order by productname"

  • Jonathan AC Roberts - Friday, November 30, 2018 8:21 AM

    Jeff Moden - Friday, November 30, 2018 8:02 AM

    Jonathan AC Roberts - Thursday, November 29, 2018 7:51 AM

    samantha.sqldba - Thursday, November 29, 2018 7:08 AM

    Hi All,

    Need index suggestion for the below query which can improve the performance. (Assume that table has enough data)

    SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1’ Order by productname;

    I Would like to create index as below. Am I correct or anything needs to be changed?
    CREATE INDEX <IDX___> ON dbo.Sales(ProductID) INCLUDE (SalesDate, SalesPersonID, productname)

    Thanks,

    Sam

    CREATE INDEX <IDX___> ON dbo.Sales(ProductID,productname ) INCLUDE (SalesDate, SalesPersonId)

    I'm curious.  If you have the ProductID, why do you believe that ProductName needs to be a Key Column rather than just as an INCLUDE?

    It's needed for the "Order by productname"

    Understood but I don't believe it needs to be a part of the key to be effective for sorting.  The good part is, you got me thinking about this and I'm going to play with it a bit over the weekend.  Thanks, Jonathan.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No, I agree - it would need to be the first column in the key.  But like I said before, it's not needed at all, unless the product ID PD1 has more than one product name.

    John

  • John Mitchell-245523 - Friday, November 30, 2018 8:43 AM

    No, I agree - it would need to be the first column in the key.  But like I said before, it's not needed at all, unless the product ID PD1 has more than one product name.

    John

    Because ProductId is specified in the SQL as a fixed value, SQL Server can find the product key and read the values in order to end up with the sort already done. At least the optimiser should be able to work that out. So I think the order of the columns in the index is correct.
    SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1’ Order by productname;
    You're right about it not being needed if there is only one productname per ProductId, but then I thought to myself the Order by would not needed either. So I assumed that there is more than one productname per productId.

  • John Mitchell-245523 - Friday, November 30, 2018 8:43 AM

    No, I agree - it would need to be the first column in the key.  But like I said before, it's not needed at all, unless the product ID PD1 has more than one product name.

    John

    This demonstrates it works, it takes a few minutes to execute though:

    if OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL
      drop table #TestTable
    go
    select t2.N ProductId,convert(varchar(50),NEWID()) productName, getdate() SalesDate, t1.N SalesPersonId
    into #TestTable
    from dbo.tally t1
    inner join dbo.tally t2 on t2.N <= 10
    inner join dbo.tally t3 on t3.N <= 10
    inner join dbo.tally t4 on t4.N <= 100
    inner join dbo.tally t5 on t5.N <= 10
    inner join dbo.tally t6 on t6.N <= 10
    inner join dbo.tally t7 on t7.N <= 10
    where t1.N <= 10

    create index IX_#TestTable_1 on #TestTable(ProductId) INCLUDE (productName, SalesDate, SalesPersonId)
    set statistics io,time on
    select top(100) * from #TestTable WHERE ProductId = 3 ORDER BY productName
    set statistics io,time off
    drop index IX_#TestTable_1 on #TestTable

    create index IX_#TestTable_2 on #TestTable(ProductId, productName) INCLUDE (SalesDate, SalesPersonId)
    set statistics io,time on
    select top(100) * from #TestTable WHERE ProductId = 3 ORDER BY productName

    Query with index keyed only on ProductId
    SQL Server Execution Times:
     CPU time = 3293 ms, elapsed time = 1036 ms.

    Query with index keyed on ProductId, productName
    SQL Server Execution Times:
     CPU time = 0 ms, elapsed time = 35 ms.

  • Heh... ok.  I'm confused.  Here's the original query posted by the OP. 

    SELECT SalesDate, SalesPersonID FROM dbo.Sales WHERE ProductID='PD1’ Order by productname;

    Can someone please tell me why anyone would find the output given by the above query to be more useful because it was sorted by ProductName?  Without having the ProductName in the output, the sort is totally useless to anyone having to work with the output.


    --=================================================================================================
    --      Create the test table to simulate the real one according to what we know.
    --      Please refer to the following articles for how to quickly make gobs of random but
    --      constrained data.
    --      http://www.sqlservercentral.com/articles/Data+Generation/87901/
    --      http://www.sqlservercentral.com/articles/Test+Data/88964/
    --=================================================================================================
    --===== If the test table already exists, drop it to make reruns in SSMS easier.
         IF OBJECT_ID('tempdb..#Sales','U') IS NOT NULL
       DROP TABLE #Sales
    ;
    --===== Create and populate the test table on-the-fly.
         -- Depending on your machine, this will take somewhere between 24 and 40 seconds to create the
         -- 5 million rows. Details are in the tail comments.
       WITH cteGenData AS
    (
     SELECT TOP 5000000
             SalesDate      = RAND(CHECKSUM(NEWID()))           -- 2010-01-01 up to 2020-01-01
                            * DATEDIFF(dd,'2010','2020')        -- (Temporal Closed/Open)
                            + CONVERT(DATETIME,'2010')
            ,SalesPersonID  = ABS(CHECKSUM(NEWID())%100)+1      -- 1 thru 100
            ,ProductNo      = ABS(CHECKSUM(NEWID())%50)+1       -- 1 thru 50
       FROM      sys.all_columns ac1  --Cross Join forms a "Pseudo Cursor" row source which
      CROSS JOIN sys.all_columns ac2  --replaces a loop
    ) --=== Finalize the output for the test table.
         -- The ISNULLs make the columns NOT NULL in the test table.
         -- Note that the ProductID and ProductName will always match, just like in real life.
     SELECT  SalesDate      = ISNULL(SalesDate,0)
            ,SalesPersonID  = ISNULL(SalesPersonID,0)
            ,ProductID      = ISNULL(
                                CHAR(ProductNo%26+65)           -- First letter based on ProductNo
                                + CONVERT(VARCHAR(10),ProductNo)
                              ,'')
            ,ProductName    = ISNULL(
                                CHAR(ProductNo%26+65)           -- First letter based on ProductNo
                                +'-Product Name # '             -- Literal
                                + CONVERT(VARCHAR(10),ProductNo)-- ProductNo converted to characters
                              ,'')
            ,OtherColumns   = CONVERT(CHAR(100),'Simto other columns bulk')
       INTO #Sales
       FROM cteGenData
      ORDER BY SalesDate
    ;
    --===== Let's see the first 100 rows so we know what we're dealing with.
     SELECT TOP 100 *
       FROM #Sales
    ;
    --=================================================================================================
    --      The following is identical to the original query except for the literal for ProductID.
    --      So, here's my question... DOES IT MAKE ANY SENSE AT ALL TO SORT THIS OUTPUT BY PRODUCTNAME?
    --      Without anything having to do with the product in the SELECT list, ordering by ProductName
    --      makes no sense to me what-so-ever!
    --=================================================================================================
     SELECT  SalesDate
            ,SalesPersonID
       FROM #Sales
      WHERE ProductID = 'U20'
      ORDER BY ProductName
    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jonathan AC Roberts - Friday, November 30, 2018 10:11 AM

    John Mitchell-245523 - Friday, November 30, 2018 8:43 AM

    No, I agree - it would need to be the first column in the key.  But like I said before, it's not needed at all, unless the product ID PD1 has more than one product name.

    John

    This demonstrates it works, it takes a few minutes to execute though:

    if OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL
      drop table #TestTable
    go
    select t2.N ProductId,convert(varchar(50),NEWID()) productName, getdate() SalesDate, t1.N SalesPersonId
    into #TestTable
    from dbo.tally t1
    inner join dbo.tally t2 on t2.N <= 10
    inner join dbo.tally t3 on t3.N <= 10
    inner join dbo.tally t4 on t4.N <= 100
    inner join dbo.tally t5 on t5.N <= 10
    inner join dbo.tally t6 on t6.N <= 10
    inner join dbo.tally t7 on t7.N <= 10
    where t1.N <= 10

    create index IX_#TestTable_1 on #TestTable(ProductId) INCLUDE (productName, SalesDate, SalesPersonId)
    set statistics io,time on
    select top(100) * from #TestTable WHERE ProductId = 3 ORDER BY productName
    set statistics io,time off
    drop index IX_#TestTable_1 on #TestTable

    create index IX_#TestTable_2 on #TestTable(ProductId, productName) INCLUDE (SalesDate, SalesPersonId)
    set statistics io,time on
    select top(100) * from #TestTable WHERE ProductId = 3 ORDER BY productName

    Query with index keyed only on ProductId
    SQL Server Execution Times:
     CPU time = 3293 ms, elapsed time = 1036 ms.

    Query with index keyed on ProductId, productName
    SQL Server Execution Times:
     CPU time = 0 ms, elapsed time = 35 ms.

    Mmmm... good shout.  One test, a thousand opinions and all that!

    John

  • Still looking for why anyone would find the sorting to be useful considering the output and the fact that there's noting to preserve that order if used elsewhere.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, December 3, 2018 8:41 AM

    Still looking for why anyone would find the sorting to be useful considering the output and the fact that there's noting to preserve that order if used elsewhere.

    You are correct. The OP's accepted answer is: "get rid of the ORDER BY clause, since all rows will have the same productname"

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply