December 1, 2014 at 9:13 am
I have a table with just under two million rows, one column of which is a varchar(max). (I'm using this table because it's the smallest.) I need to retrieve the data from this table, perform dimensional lookups (joins), and load the results into a fact table.
I'm running this query once with the varchar(max) column commented out and once including it. In both cases, according to the query plan, it is spending a huge percentage of its time doing a clustered index seek. According to STATISTICS IO, it is performing over 8,000,000 logical reads with the varchar(max) column and slightly less than that without it. ViewCubeDimMatter and ViewCubeDimEmployeeAll have filtered non-clustered indexes on the natural key and are performing an inconsequential number of logical reads (less than 1,000).
I've attached the two query plans produced: WithMax and WithoutMax, respectively.
This is the query; the only difference is that the WithoutMax version has the ", s.Notes" column commented out.
In both cases the query is ridiculously slow, but the "without" version is slightly faster and does choose to use parallelism, although it also arbitrarily adds a Sort to the plan. (As you can see, there is no ORDER BY clause.)
I have two other tables, with 29 million and 48 million rows respectively, that I need to load using this identical query, and it takes 12 to 20 hours each to load their fact tables in SSIS. I have determined that the bottleneck is the query, not the write to the fact table. This is just a stupid amount of time, to the point that this solution isn't viable, yet the query is simple and the amount of data is relatively small. Any thoughts?
The table SourceFactTimeCard contains a row for each of the (48 + 29 + 2) million rows, and TimecardIndex is its clustered index key.
SELECT t.TimecardIndex AS TimeCardPK
, COALESCE(m.MatterPK,-1) AS MatterPK
, COALESCE(e.EmployeePK,-1) AS EmployeePK
, s.Notes
FROM StageLedgDesc s
JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex
LEFT JOIN dbo.ViewCubeDimMatter m ON m.ClientMatterSymbol = s.tmatter AND m.IsActiveRecord = 1
LEFT JOIN dbo.ViewCubeDimEmployeeAll e ON e.EmployeeID = t.EmployeeID AND e.IsActiveRecord = 1
OPTION (MAXDOP 3)
"Beliefs" get in the way of learning.
December 1, 2014 at 9:26 am
1) I don't see attached query plans
2) what are the estimated and actual rows throughout the query?
3) is the 8M-read seek from a nested loop join or some other form such as a single hit on the table?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 1, 2014 at 9:36 am
That's weird. I'll try attaching them again. I did the attach and upload. I'm not sure whether there's another step.
"Beliefs" get in the way of learning.
December 1, 2014 at 10:49 am
So it's a seek. And actual ~= estimated rows.
Therefore my guess is that the table is so fat that the cost of scanning is less than the iterative seeks. You can test that by forcing a hash join on the table and seeing what the estimated and perhaps actual cost of the plan is. Duration too.
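For example, something like this (just a sketch; the join-level hint forces a hash join for that one join, but note it also locks in the written join order):
...
FROM StageLedgDesc s
INNER HASH JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex
...
Then compare estimated cost, reads, and duration against the loop-join plan.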
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 1, 2014 at 10:54 am
The pain point on the WithMAX plan is the fact that it's doing a Loops join against 1.9 million rows. That's effectively a cursor. But, it looks like your full query is hidden inside those views. I suspect one of them has JOIN hints or something that is killing your performance here, but I can't be sure.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 1, 2014 at 11:05 am
Grant Fritchey (12/1/2014)
The pain point on the WithMAX plan is the fact that it's doing a Loops join against 1.9 million rows. That's effectively a cursor. But, it looks like your full query is hidden inside those views. I suspect one of them has JOIN hints or something that is killing your performance here, but I can't be sure.
Those aren't actual views. Those are tables that were persisted from views. Left the name the same to keep from confusing the existing developers although that will change soon.
I don't understand why it's doing loops; that's really what my dilemma is here. This is a pretty straightforward INNER JOIN on the primary key.
"Beliefs" get in the way of learning.
December 1, 2014 at 11:26 am
Robert Frasca (12/1/2014)
Grant Fritchey (12/1/2014)
The pain point on the WithMAX plan is the fact that it's doing a Loops join against 1.9 million rows. That's effectively a cursor. But, it looks like your full query is hidden inside those views. I suspect one of them has JOIN hints or something that is killing your performance here, but I can't be sure.
Those aren't actual views. Those are tables that were persisted from views. Left the name the same to keep from confusing the existing developers although that will change soon.
I don't understand why it's doing loops; that's really what my dilemma is here. This is a pretty straightforward INNER JOIN on the primary key.
Oops. Missed the object name the first time I looked at it. Sorry.
Well, it's a simple query. You don't have any choice except scans since there's nothing filtering anything. I'm unsure why it chose to put a Loop there, but heck, I'd try a query hint, HASH JOIN, to see what you get. I would also try running it once without the COALESCE statements, just as a test. Those Compute Scalar operators are not costed, but they're not free either. Other than the query hint, you're kind of dependent on hardware resources on this one because of the nature of your query.
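For example, both tests rolled into one (a sketch only; OPTION (HASH JOIN) applies to every join in the statement, and dropping the COALESCEs means unmatched dimension rows will come back NULL instead of -1):
SELECT t.TimecardIndex AS TimeCardPK
, m.MatterPK
, e.EmployeePK
, s.Notes
FROM StageLedgDesc s
JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex
LEFT JOIN dbo.ViewCubeDimMatter m ON m.ClientMatterSymbol = s.tmatter AND m.IsActiveRecord = 1
LEFT JOIN dbo.ViewCubeDimEmployeeAll e ON e.EmployeeID = t.EmployeeID AND e.IsActiveRecord = 1
OPTION (HASH JOIN, MAXDOP 3)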
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 1, 2014 at 1:06 pm
Grant Fritchey (12/1/2014)
Robert Frasca (12/1/2014)
Grant Fritchey (12/1/2014)
The pain point on the WithMAX plan is the fact that it's doing a Loops join against 1.9 million rows. That's effectively a cursor. But, it looks like your full query is hidden inside those views. I suspect one of them has JOIN hints or something that is killing your performance here, but I can't be sure.
Those aren't actual views. Those are tables that were persisted from views. Left the name the same to keep from confusing the existing developers although that will change soon.
I don't understand why it's doing loops; that's really what my dilemma is here. This is a pretty straightforward INNER JOIN on the primary key.
Oops. Missed the object name the first time I looked at it. Sorry.
Well, it's a simple query. You don't have any choice except scans since there's nothing filtering anything. I'm unsure why it chose to put a Loop there, but heck, I'd try a query hint, HASH JOIN, to see what you get. I would also try running it once without the COALESCE statements, just as a test. Those Compute Scalar operators are not costed, but they're not free either. Other than the query hint, you're kind of dependent on hardware resources on this one because of the nature of your query.
Well, I tried the Hash Join and that doubled the execution time. I tried it without the COALESCE, but that seemed to have no impact. I've tried running one of the queries that uses a larger source table, but if I return the output to my session it eventually sucks up all of the memory on my machine and dies (after about 4 hours).
I tried adding TOP n clauses, but the performance degrades dramatically at about TOP 5,000,000. I tried running it in SSIS and, instead of writing to a table, just used a Row Count transform. After six hours against the 29,000,000-row table I just killed the job.
Just for grins, I've also attached the Estimated Query Plan, which looks almost the same except it seems to understand that the index seek should only return one row.
"Beliefs" get in the way of learning.
December 1, 2014 at 3:09 pm
So the big fat table sucked hind-tit on the table scan, eh?? Not surprising. 🙂
How many reads was it?
Iterative (logical IO) hits on the same 8K page (via nested loop join or bookmark lookup) can sometimes be a big win over scan/hash.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
December 1, 2014 at 4:43 pm
I guess the SourceFactTimeCard table is fairly wide?
I suggest creating a covering index on:
SourceFactTimeCard ( TimecardIndex, EmployeeID )
If SQL then uses that covering index -- it should -- but still does a loop/"look up" join rather than a MERGE join between s and t, I'd try forcing a MERGE join just to see if you get better performance, i.e.:
...
FROM StageLedgDesc s
INNER MERGE JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex
...
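Something like this is what I have in mind (the index name is just an example; alternatively, key on TimecardIndex alone and INCLUDE EmployeeID, since only TimecardIndex needs to be ordered for the join):
CREATE NONCLUSTERED INDEX IX_SourceFactTimeCard_Covering
ON [RG_TestDataMart].[dbo].[SourceFactTimeCard] ( TimecardIndex, EmployeeID );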
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 1, 2014 at 5:51 pm
Nah, the other plan knew that it was only returning a single row from the seek too.
Pretty sure Scott's onto something. Indexes sure won't hurt, and if you can get ordered returns, a merge is an excellent join operation.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 2, 2014 at 7:35 am
ScottPletcher (12/1/2014)
I guess the SourceFactTimeCard table is fairly wide?
I suggest creating a covering index on:
SourceFactTimeCard ( TimecardIndex, EmployeeID )
If SQL then uses that covering index -- it should -- but still does a loop/"look up" join rather than a MERGE join between s and t, I'd try forcing a MERGE join just to see if you get better performance, i.e.:
...
FROM StageLedgDesc s
INNER MERGE JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex
...
This worked perfectly. Thanks for the suggestion.
"Beliefs" get in the way of learning.
December 2, 2014 at 8:43 am
Robert Frasca (12/2/2014)
ScottPletcher (12/1/2014)
I guess the SourceFactTimeCard table is fairly wide?
I suggest creating a covering index on:
SourceFactTimeCard ( TimecardIndex, EmployeeID )
If SQL then uses that covering index -- it should -- but still does a loop/"look up" join rather than a MERGE join between s and t, I'd try forcing a MERGE join just to see if you get better performance, i.e.:
...
FROM StageLedgDesc s
INNER MERGE JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex
...
This worked perfectly. Thanks for the suggestion.
Which? Just the covering index? Or did explicitly specifying "MERGE" also gain performance? I'd like to know to better my understanding of SQL plans/performance.
Edit: Also, what was the new query elapsed time?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 2, 2014 at 10:32 am
ScottPletcher (12/2/2014)
Robert Frasca (12/2/2014)
ScottPletcher (12/1/2014)
I guess the SourceFactTimeCard table is fairly wide?
I suggest creating a covering index on:
SourceFactTimeCard ( TimecardIndex, EmployeeID )
If SQL then uses that covering index -- it should -- but still does a loop/"look up" join rather than a MERGE join between s and t, I'd try forcing a MERGE join just to see if you get better performance, i.e.:
...
FROM StageLedgDesc s
INNER MERGE JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex
...
This worked perfectly. Thanks for the suggestion.
Which? Just the covering index? Or did explicitly specifying "MERGE" also gain performance? I'd like to know to better my understanding of SQL plans/performance.
Edit: Also, what was the new query elapsed time?
The covering index worked. The new execution time on one of the big queries dropped to 11 minutes including returning all 19 million rows to the screen. (I was on a machine with more memory.) I'm running the full job now including the part that writes the rows to the fact table and the smaller 2 million row query loaded in about 4 minutes soup-to-nuts. This is a dev server without much memory or storage. We have all solid state storage in production so I expect that to run much faster there. I have high hopes that the bigger table will take on the order of an hour and the biggest table to take on the order of three hours in dev. Unfortunately, I have no control over storage in the dev environment as it is a virtual server. I'm actually using this job to lobby management for solid state storage in the test environment to mirror production.
It's funny: the original query plan said it was doing an index seek, and the new plan does the same thing, but now it's using the covering index. My best guess is what you suggested: the SourceFactTimeCard table was fairly wide, as it contained about 30 measures. If this hadn't worked, I was going to try creating a new version of SourceFactTimeCard with just the primary key and the employee ID, an informal bridge table as it were. The fact table that I'm building is a factless fact table designed to capture narratives associated with a time card and nothing else, but I was trying to reuse SourceFactTimeCard as that is the source for several other subject-matter-specific fact tables.
"Beliefs" get in the way of learning.
December 2, 2014 at 10:36 am
Robert Frasca (12/2/2014)
ScottPletcher (12/2/2014)
Robert Frasca (12/2/2014)
ScottPletcher (12/1/2014)
I guess the SourceFactTimeCard table is fairly wide?
I suggest creating a covering index on:
SourceFactTimeCard ( TimecardIndex, EmployeeID )
If SQL then uses that covering index -- it should -- but still does a loop/"look up" join rather than a MERGE join between s and t, I'd try forcing a MERGE join just to see if you get better performance, i.e.:
...
FROM StageLedgDesc s
INNER MERGE JOIN [RG_TestDataMart].[dbo].[SourceFactTimeCard] t ON t.TimecardIndex = s.lindex
...
This worked perfectly. Thanks for the suggestion.
Which? Just the covering index? Or did explicitly specifying "MERGE" also gain performance? I'd like to know to better my understanding of SQL plans/performance.
Edit: Also, what was the new query elapsed time?
The covering index worked. The new execution time on one of the big queries dropped to 11 minutes including returning all 19 million rows to the screen. (I was on a machine with more memory.) I'm running the full job now including the part that writes the rows to the fact table and the smaller 2 million row query loaded in about 4 minutes soup-to-nuts. This is a dev server without much memory or storage. We have all solid state storage in production so I expect that to run much faster there. I have high hopes that the bigger table will take on the order of an hour and the biggest table to take on the order of three hours in dev. Unfortunately, I have no control over storage in the dev environment as it is a virtual server. I'm actually using this job to lobby management for solid state storage in the test environment to mirror production.
It's funny: the original query plan said it was doing an index seek, and the new plan does the same thing, but now it's using the covering index. My best guess is what you suggested: the SourceFactTimeCard table was fairly wide, as it contained about 30 measures. If this hadn't worked, I was going to try creating a new version of SourceFactTimeCard with just the primary key and the employee ID, an informal bridge table as it were. The fact table that I'm building is a factless fact table designed to capture narratives associated with a time card and nothing else, but I was trying to reuse SourceFactTimeCard as that is the source for several other subject-matter-specific fact tables.
Interesting.
Did you try forcing a MERGE join just to see the results? I'd be curious to know if SQL can use a MERGE join and, if so, how it performs. Thanks for following up on this.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".