Recommended index causing query to run twice as long / HASH vs NESTED LOOP

  • Hey all,

    Been working on some performance tuning and running into a wall at this point. I've taken some initial passes at rewriting the query in question and have reduced run time without touching indexing by more than half (from 40 to 16 seconds). When viewing the missing index recommendations in all three spots (DTA, DMV's, execution plan), SQL is telling me that I have an opportunity for a better index for this query.

    After review of said recommendation, it's pretty straight forward that it's a better index than the existing (or at least I thought). The existing index that is used is twice as large and the order isn't as ideal; whereas the new index is much more focused on the query at hand. So, seems like a no brainer to go ahead and add this index yeah? I've done so and have experimented a multitude of ways; however now that the optimizer wants to use this new index, run time is now averaging around 33 seconds!?

    Looking at the execution plans, there is a distinct change ... with the new index, it has added an additional Nested Loop to the mix vs. in the old, there were just the two Hash Match (Aggregate)'s. I can't say I'm overly educated on all the differences; however I tried adding JOIN hints and specifying each type only resulted in an even longer query time.

    At this stage, I'm simply trying to better understand why my plan looks worse with a better index and why the nested loop was introduced into the mix.

    Thank you!

    Execution plans (top is pre index, bottom is post index):

    [/URL]

    Query:

    SELECT

    YEAR(i.[InvoiceDate])AS [Year]

    ,MONTH(i.[InvoiceDate])AS [TimePeriod]

    ,SUM(i.[Amount] * ISNULL(cur.[Rate],1))AS [Revenue]

    FROM dbo.[ftInvoices] i

    JOIN dbo.[DimCurrency] cur

    ON i.[CurrencyID] = cur.[CurrencyID]

    WHERE i.[Amount] > 0

    AND i.[Quantity] > 0

    AND i.[Cost] > 0

    AND i.[InvoiceDate] BETWEEN '01/01/0001' AND '12/31/9999'

    GROUP BY YEAR(i.[InvoiceDate]), MONTH(i.[InvoiceDate])

    Original Index:

    CREATE INDEX [ix_InvoiceDate_Quantity_Amount] ON [dbo].[ftInvoices]

    (

    [InvoiceDate] ASC,

    [Quantity] ASC,

    [Amount] ASC

    )

    INCLUDE ( [ContractPricingFlag],

    [Cost],

    [CurrencyID],

    [CustID],

    [DimFourId],

    [DimOneId],

    [DimThreeId],

    [DimTwoId],

    [ExchangeRate],

    [Exclude],

    [ExtendedCost],

    [InvoiceNum],

    [LineNum],

    [ListPrice],

    [ProdID],

    [UOMID])

    New Index:

    CREATE NONCLUSTERED INDEX [IX__ftInvoices__CurrencyID__InvoiceDate__Quantity__Amount__Cost]

    ON [dbo].[ftInvoices] ([CurrencyID],[InvoiceDate],[Quantity],[Amount],[Cost])

  • How many records is this query returning?

    Also, please post the actual execution plans, table definitions and sample data please.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Without the execution plan to look at details, it's hard to know. Recommended indexes are just suggestions. First, I'd take a look at the SELECT operator to see if you're looking at full optimization or something else. Calculations such as you're doing for the GROUP BY will prevent efficient use of the indexes. If you need to group in this fashion, I would suggest storing the data in columns already in the format you need. Then you can index those columns.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the responses. The execution plans were and are in the original post... Do you not see them? Once my way in to work, but I see them on my mobile. Anyway I'll add more response when I get to my desk.

  • That's a screen capture of an execution plan. Not the execution plan. There are properties behind all those operators that tell a more complete store than just the graphical pictures.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Sorry for the confusion, see attached.

    Thanks

  • Grant Fritchey (3/13/2013)


    Without the execution plan to look at details, it's hard to know. Recommended indexes are just suggestions. First, I'd take a look at the SELECT operator to see if you're looking at full optimization or something else. Calculations such as you're doing for the GROUP BY will prevent efficient use of the indexes. If you need to group in this fashion, I would suggest storing the data in columns already in the format you need. Then you can index those columns.

    Hi Grant, can you elaborate on your statement about "storing the data in columns already in the format you need"? You're referencing using the GROUP BY YEAR/MONTH correct? I'm curious to better understand how I can improve this.

    Thanks

  • Also, I have been successful with recreating the performance of the original index simply by moving the InvoiceDate to the first position of the index. Typically I have targeted the join column as the first position; however in this situation, the performance tells otherwise. I'm assuming this is because of the date functions on that column?

  • What I meant was store the Month & Day values as those, either during the insert/update process or as calculated columns. Performing functions on columns in any kind of criteria situation (JOIN, WHERE, GROUP BY, ORDER BY) prevents good index use. So if you need data in that format, store it that way.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (3/13/2013)


    What I meant was store the Month & Day values as those, either during the insert/update process or as calculated columns. Performing functions on columns in any kind of criteria situation (JOIN, WHERE, GROUP BY, ORDER BY) prevents good index use. So if you need data in that format, store it that way.

    To elaborate a bit, looking at your query and index this looks like it is part of a data warehouse? A typical data warehouse has a DimDate table, which would have columns for the calendar date, calendar year, calendar month, maybe simmilar columns for accounting year and month, etc. that describe this date. Then, joining from the Invoices table to the DimDate table would make it easier to do the grouping without having functions in your GROUP BY clause, and you'd only need the index on Year,Month once on DimDate instead of each fact table that had a date you might want to group on.

    For a great resource on dimensional modeling, check out this and other Kimball Group articles:

    http://www.kimballgroup.com/2009/05/29/the-10-essential-rules-of-dimensional-modeling/

  • Chris Harshman (3/13/2013)


    Grant Fritchey (3/13/2013)


    What I meant was store the Month & Day values as those, either during the insert/update process or as calculated columns. Performing functions on columns in any kind of criteria situation (JOIN, WHERE, GROUP BY, ORDER BY) prevents good index use. So if you need data in that format, store it that way.

    To elaborate a bit, looking at your query and index this looks like it is part of a data warehouse? A typical data warehouse has a DimDate table, which would have columns for the calendar date, calendar year, calendar month, maybe simmilar columns for accounting year and month, etc. that describe this date. Then, joining from the Invoices table to the DimDate table would make it easier to do the grouping without having functions in your GROUP BY clause, and you'd only need the index on Year,Month once on DimDate instead of each fact table that had a date you might want to group on.

    For a great resource on dimensional modeling, check out this and other Kimball Group articles:

    http://www.kimballgroup.com/2009/05/29/the-10-essential-rules-of-dimensional-modeling/

    Thanks for that explanation. That makes a lot of sense, and I hadn't considered that. I hope the OP finds it as useful.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Chris Harshman (3/13/2013)


    Grant Fritchey (3/13/2013)


    What I meant was store the Month & Day values as those, either during the insert/update process or as calculated columns. Performing functions on columns in any kind of criteria situation (JOIN, WHERE, GROUP BY, ORDER BY) prevents good index use. So if you need data in that format, store it that way.

    To elaborate a bit, looking at your query and index this looks like it is part of a data warehouse? A typical data warehouse has a DimDate table, which would have columns for the calendar date, calendar year, calendar month, maybe simmilar columns for accounting year and month, etc. that describe this date. Then, joining from the Invoices table to the DimDate table would make it easier to do the grouping without having functions in your GROUP BY clause, and you'd only need the index on Year,Month once on DimDate instead of each fact table that had a date you might want to group on.

    For a great resource on dimensional modeling, check out this and other Kimball Group articles:

    http://www.kimballgroup.com/2009/05/29/the-10-essential-rules-of-dimensional-modeling/

    Oh the wonderful world of a "typical" environment 🙂 We're actively underway with a revamp; however for now, there is no Date dimension ... soon!

  • I'm already quite far passed this original thread as I'm exploring many other options for performance; however my original question does still stand that I'm still cloudy on. I posted the execution plans too ... curious if either of you have any feedback on why the major difference. My assumption at this time is really the order of the columns on the index. What SQL was telling me vs. what performs better, are definitely different. Historically I would usually focus my first column of an index based on the join, which is what I and SQL wanted to do here; however performance is much better if it's on the date which I believe is due to the datepart functions and grouping.

  • Adam Bean (3/13/2013)


    ...curious if either of you have any feedback on why the major difference. My assumption at this time is really the order of the columns on the index. What SQL was telling me vs. what performs better, are definitely different. Historically I would usually focus my first column of an index based on the join, which is what I and SQL wanted to do here; however performance is much better if it's on the date which I believe is due to the datepart functions and grouping.

    My best guess based on the execution plans and indexes you've shared is that the performance difference is because of the importance of the InvoiceDate. Since you are using that date in the GROUP BY(even if through functions) SQL Server can better utilize the data in its indexed order if InvoiceDate is first than if CurrencyID is first. While it is generally true that indexes on the join columns will improve performance, in this case the limmiting factor of what rows are in the resultset and how the resultset is computed is through the WHERE and GROUP BY clause references to ftInvoices table, and the DimCurrency table is just describing the data from ftInvoice. If there were WHERE clause data on DimCurrency that significantly reduced the expected resultset then it would be optimal to have the CurrencyID as the first column in the index on ftInvoice.

  • Adam hit it right on the head.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply