Baffling query plan on slow query

  • ScottPletcher (12/2/2014)


    Robert Frasca (12/2/2014)


    ScottPletcher (12/2/2014)


    Robert Frasca (12/2/2014)


    ScottPletcher (12/1/2014)


    I guess the SourceFactTimeCard table is fairly wide?

    I suggest creating a covering index on:

    SourceFactTimeCard ( TimecardIndex, EmployeeID )

    If SQL then uses that covering index -- it should -- but still does a loop/"lookup" join rather than a MERGE join between s and t, I'd try forcing a MERGE join just to see whether you get better performance, i.e.:

    ...

    FROM StageLedgDesc s

    INNER MERGE JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex

    ...
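    Spelled out, that suggestion would look something like the sketch below (the index name is just a placeholder):

    -- Minimal sketch of the suggested covering index on the join and grouping columns;
    -- the index name is a placeholder, not from the thread.
    CREATE NONCLUSTERED INDEX IX_SourceFactTimeCard_TimecardIndex_EmployeeID
        ON [RG_TestDataMart].[dbo].[SourceFactTimeCard] ( TimecardIndex, EmployeeID );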

    This worked perfectly. Thanks for the suggestion.

    Which? Just the covering index? Or did explicitly specifying "MERGE" also gain performance? I'd like to know to better my understanding of SQL plans/performance.

    Edit: Also, what was the new query elapsed time?

    The covering index worked. The new execution time on one of the big queries dropped to 11 minutes, including returning all 19 million rows to the screen. (I was on a machine with more memory.) I'm running the full job now, including the part that writes the rows to the fact table, and the smaller 2 million row query loaded in about 4 minutes soup to nuts.

    This is a dev server without much memory or storage. We have all solid-state storage in production, so I expect it to run much faster there. I have high hopes that the bigger table will take on the order of an hour and the biggest table on the order of three hours in dev. Unfortunately, I have no control over storage in the dev environment, as it is a virtual server. I'm actually using this job to lobby management for solid-state storage in the test environment to mirror production.

    It's funny: the original plan said it was doing an index seek, and the new query plan does the same thing, but now it's using the covering index. My best guess is that it's what you suggested: the SourceFactTimeCard table was fairly wide, as it contained about 30 measures. If this hadn't worked, I was going to try creating a new version of SourceFactTimeCard with just the primary key and the employee ID, an informal bridge table as it were. The fact table that I'm building is a factless fact table designed to capture narratives associated with a time card and nothing else, but I was trying to reuse SourceFactTimeCard as that is the source for several other subject-matter-specific fact tables.
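    For illustration, that narrow "bridge" copy could be built something like this (the new table and index names are hypothetical, and it assumes TimecardIndex is the unique primary key):

    -- Hypothetical sketch of an informal bridge table: just the key and the employee ID.
    SELECT TimecardIndex, EmployeeID
    INTO   [RG_TestDataMart].[dbo].[BridgeTimeCardEmployee]
    FROM   [RG_TestDataMart].[dbo].[SourceFactTimeCard];

    -- Clustering on the join key keeps the subsequent join narrow and ordered.
    CREATE UNIQUE CLUSTERED INDEX IX_BridgeTimeCardEmployee_TimecardIndex
        ON [RG_TestDataMart].[dbo].[BridgeTimeCardEmployee] ( TimecardIndex );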

    Interesting.

    Did you try forcing a MERGE join just to see the results? I'd be curious to know if SQL can use a MERGE join and, if so, how it performs. Thanks for following up on this.

    And if you do that, will it use the index in an ordered fashion, or will it put a Sort operation in? If the latter, I suspect you'll see much worse performance. But I'm very interested too.
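    One way to check would be to capture the actual plans for both variants and look for a Sort operator feeding the Merge Join, roughly as sketched below (the column list is illustrative; only the join hint differs between the two queries):

    -- Sketch: capture actual plans for both shapes, then inspect the Merge Join inputs for a Sort.
    SET STATISTICS XML ON;

    SELECT s.lindex, t.EmployeeID           -- optimizer's choice of join
    FROM StageLedgDesc s
    INNER JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex;

    SELECT s.lindex, t.EmployeeID           -- forced MERGE join
    FROM StageLedgDesc s
    INNER MERGE JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex;

    SET STATISTICS XML OFF;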

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (12/2/2014)


    ScottPletcher (12/2/2014)



    Interesting.

    Did you try forcing a MERGE join just to see the results? I'd be curious to know if SQL can use a MERGE join and, if so, how it performs. Thanks for following up on this.

    And if you do that, will it use the index in an ordered fashion, or will it put a Sort operation in? If the latter, I suspect you'll see much worse performance. But I'm very interested too.

    I tried the MERGE JOIN on the smaller (2 million row) dataset and it was about 10 percent slower than the fastest run. As Grant said, it did throw a Sort operation in, and that was listed as 86% of the effort. This is with the covering index still in place.

    The size of the dataset is a significant component. While the SSIS job is running faster, it is still too slow. As I said, the smaller dataset loaded in about 4 minutes soup to nuts. The larger dataset has been running nearly two hours but has only stored about 3 million rows. The SSIS service is sucking up about 25% of the CPU on a steady basis, but there isn't much activity in SQL Server. I'm starting to think that a significant part of this is an underpowered virtual server/BIOS. Of course, proving that is the hard part...

    "Beliefs" get in the way of learning.

  • Robert Frasca (12/2/2014)



    I tried the MERGE JOIN on the smaller (2 million row) dataset and it was about 10 percent slower than the fastest run. As Grant said, it did throw a Sort operation in, and that was listed as 86% of the effort. This is with the covering index still in place.

    The size of the dataset is a significant component. While the SSIS job is running faster, it is still too slow. As I said, the smaller dataset loaded in about 4 minutes soup to nuts. The larger dataset has been running nearly two hours but has only stored about 3 million rows. The SSIS service is sucking up about 25% of the CPU on a steady basis, but there isn't much activity in SQL Server. I'm starting to think that a significant part of this is an underpowered virtual server/BIOS. Of course, proving that is the hard part...

    That could be an issue with log growth. Did you make sure to pre-allocate enough formatted log space before beginning the massive INSERT? Alternatively, maybe you could load in batches of 2M and back up the log between each batch, if you need log backups.
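    As a rough illustration only (the target fact table, narrative column, log file's logical name, and backup path are all hypothetical placeholders, and it assumes lindex/TimecardIndex is a unique, increasing key), that could look something like:

    -- Pre-grow the log up front to avoid repeated autogrow events (size and logical name are illustrative).
    ALTER DATABASE RG_TestDataMart
        MODIFY FILE ( NAME = RG_TestDataMart_log, SIZE = 50GB );

    -- Hypothetical sketch: load ~2M rows per batch and back up the log between batches.
    DECLARE @BatchSize int = 2000000,
            @LastKey   bigint = 0,
            @Rows      int = 1;

    WHILE @Rows > 0
    BEGIN
        INSERT INTO dbo.FactTimeCardNarrative (TimecardIndex, EmployeeID, Narrative)
        SELECT TOP (@BatchSize) s.lindex, t.EmployeeID, s.Narrative
        FROM StageLedgDesc s
        INNER JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex
        WHERE s.lindex > @LastKey
        ORDER BY s.lindex;

        SET @Rows = @@ROWCOUNT;

        IF @Rows > 0
        BEGIN
            -- Assumes only this loop is loading the fact table, so its max key marks our progress.
            SELECT @LastKey = MAX(TimecardIndex) FROM dbo.FactTimeCardNarrative;
            BACKUP LOG RG_TestDataMart TO DISK = N'D:\Backups\RG_TestDataMart_log.trn';
        END;
    END;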

    SQL DBA, SQL Server MVP (07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
