SQL Server TempDB ballooning to more than 75 GB

  • Hi,

    In one of our databases, a procedure serving important business logic executes a query with 11 joins, then a filter on a specific column and a sort (ORDER BY) on another column. Looking at the join conditions separately, millions of rows are fetched. While it runs, TempDB balloons to 75 GB, which is all the available disk space, and the query fails.

    What could be the possible reason:

    - Is TempDB behaving incorrectly, or can I really expect it to reach that kind of size (75 GB)?

    - If the TempDB behavior is correct, is there something I can do to mitigate the situation? Will making the query execute faster (better statistics, relevant indexes, more seeking than scanning of indexes/tables) help solve it?

    I know one option would be to relocate TempDB to a disk with much more space, but I want to explore the tuning options first, since I do not know how large TempDB can bloat.
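
    For what it is worth, here is how I have been checking the current tempdb file sizes and growth caps (a minimal check; file layout will differ per install):

    SELECT name,
           size / 128 AS current_size_mb,   -- size is stored in 8 KB pages
           CASE max_size WHEN -1 THEN NULL  -- -1 means grow until the disk is full
                ELSE max_size / 128 END AS max_size_mb
    FROM tempdb.sys.database_files;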

    Any relevant suggestion would be great.

    thanks,

    Mrinal

  • mrinal.technology (7/30/2013)


    In one of our databases, a procedure serving important business logic executes a query with 11 joins, a filter, and an ORDER BY; while it runs, TempDB balloons to 75 GB, which is all the available disk space, and the query fails. [...]

    The usual cause of this is an inadvertent Cartesian product, where you don't have one-to-one joins but many-to-many joins that blow the row counts through the roof (billions of total rows are easy to hit with relatively few rows in each table).

    LOTS of things use tempdb, and you probably have quite a few in this query. Sorting, HASH joins, aggregates, table spools, interim work tables, etc.
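
    If it helps, you can watch which sessions are allocating tempdb pages while the query runs with something along these lines (a rough sketch; "internal objects" covers the sorts, hashes and spools I mentioned):

    SELECT session_id,
           internal_objects_alloc_page_count / 128 AS internal_mb,  -- sorts, hash tables, spools
           user_objects_alloc_page_count / 128 AS user_mb           -- temp tables, table variables
    FROM sys.dm_db_task_space_usage
    WHERE internal_objects_alloc_page_count > 0
    ORDER BY internal_objects_alloc_page_count DESC;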

    I would also look for bad cardinality estimates, ensure stats are up to date, etc. SO many things could be in play here.
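
    A quick way to eyeball statistics freshness is something like this (sketch only; dbo.YourBigTable is a placeholder for each table in the query):

    SELECT s.name AS stats_name,
           STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID(N'dbo.YourBigTable');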

    Oh, I would also STRONGLY recommend that you hire a performance tuning professional for a bit of help (and mentoring). WIN(company gets better performance)-WIN(you learn about performance tuning/refactoring)-WIN(consultant gets some money).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    Thanks Kevin, we will look into performance tuning sometime in the future, once we have all the database components/schema in place; right now I am focusing on this one query.

    Following is the query, for reference:

    SELECT
        S.SHIPMENTID,
        LMapped.LANEID AS ServID,
        LOrig.LANEID AS OrigServID,
        SCOrig.DOWID AS DepDay,
        DATEPART(HOUR, S.DEPCUTOFFTIME) * 60 + DATEPART(MI, S.DEPCUTOFFTIME) AS DepTime,
        SCOrig.DAYSINROUTE AS DIR,
        (((SCOrig.DOWID + SCOrig.DAYSINROUTE - 1) % 7) + 1) AS ArrDay,
        DATEPART(HOUR, S.ARRCUTOFFTIME) * 60 + DATEPART(MI, S.ARRCUTOFFTIME) AS ArrTime,
        S.TOTALWEIGHT AS ComSWeight,
        CEILING(DATEDIFF(MI, S.DEPCUTOFFTIME, DATEADD(MI, (OT.TIMEOFFSET - DT.TIMEOFFSET) * 60 + SCOrig.DAYSINROUTE * 1440, S.ARRCUTOFFTIME))) AS ComSMinute
    FROM dbo.PLANSHIPMENT AS S
    JOIN dbo.SERVICECOMMITMENT AS SCOrig
        ON SCOrig.COMMITMENTID = S.COMMITMENTID AND SCOrig.SCENARIOID = S.SCENARIOID
    JOIN dbo.LANE AS LOrig
        ON LOrig.LANEID = SCOrig.LANEID AND LOrig.SCENARIOID = SCOrig.SCENARIOID
    JOIN dbo.PARTNERGATEWAY AS GW
        ON GW.ORIGTERMINALID = LOrig.ORIGTERMINALID AND
           GW.DESTTERMINALID = LOrig.DESTTERMINALID AND
           GW.SCENARIOID = LOrig.SCENARIOID
    JOIN dbo.LANE AS LMapped
        ON LMapped.ORIGTERMINALID = GW.ORIGGWTERMINALID AND
           LMapped.DESTTERMINALID = GW.DESTGWTERMINALID AND
           LMapped.SCENARIOID = GW.SCENARIOID
    JOIN dbo.SERVICECOMMITMENT AS SCMapped
        ON SCMapped.LANEID = LMapped.LANEID AND
           SCMapped.DOWID = SCOrig.DOWID AND
           SCMapped.SCENARIOID = LMapped.SCENARIOID
    JOIN dbo.TERMINAL AS MappedOrig
        ON MappedOrig.NODEID = GW.ORIGGWTERMINALID AND MappedOrig.SCENARIOID = GW.SCENARIOID
    JOIN dbo.TERMINAL AS MappedDest
        ON MappedDest.NODEID = GW.DESTGWTERMINALID AND MappedDest.SCENARIOID = GW.SCENARIOID
    JOIN dbo.NODE AS Orig
        ON Orig.NODEID = MappedOrig.NODEID AND Orig.SCENARIOID = MappedOrig.SCENARIOID
    JOIN dbo.NODE AS Dest
        ON Dest.NODEID = MappedDest.NODEID AND Dest.SCENARIOID = MappedDest.SCENARIOID
    JOIN dbo.TIMEZONE AS OT
        ON Orig.TIMEZONEID = OT.TIMEZONEID
    JOIN dbo.TIMEZONE AS DT
        ON Dest.TIMEZONEID = DT.TIMEZONEID
    WHERE S.SCENARIOID = @p_scenarioid  -- integer parameter
    ORDER BY S.SHIPMENTID

    Also, per my analysis, the tables are providing data via a Cartesian product: the tables are independent of each other, which I assume could be why such a huge amount of data is processed. Is my analysis correct? Also, looking at the estimated execution plan, a clustered index scan takes 45% of the cost, a nonclustered index seek 35%, and a nested loops join 16%, but the clustered index scan doesn't specify the exact table. Could it be on an intermediate table that was generated? That is my guess.

  • Please provide the actual execution plan with this query so we can offer better advice.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

    It doesn't look like you have Cartesian joins; you have join criteria defined. A Cartesian product happens when no join criteria are defined and every row is joined to every row.

    Instead, guessing since I can't see the execution plan: you're returning, as you stated, millions of rows, and you have an ORDER BY. For millions of rows, especially with no index to help with the ordering, it's pretty likely that the sort is what's causing your ballooning. Why are you returning millions of rows? Is this a report of some sort? I'd suggest getting a better filtering mechanism in place to ensure fewer rows are returned; that will help all by itself. If this is a data load process, you may want to look at other mechanisms.
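
    For example, if most of the work against the main table is the scenario filter plus the ordering, an index along these lines might let the engine skip the giant sort (a sketch based on the query you posted; the index name and INCLUDE list are my guesses):

    CREATE NONCLUSTERED INDEX IX_PLANSHIPMENT_Scenario_Shipment
        ON dbo.PLANSHIPMENT (SCENARIOID, SHIPMENTID)
        INCLUDE (COMMITMENTID, DEPCUTOFFTIME, ARRCUTOFFTIME, TOTALWEIGHT);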

    All guesses though without seeing the execution plan.

    Oh, and one other thing: the costs in an execution plan are just estimated values, not necessarily an accurate reflection of real execution metrics.
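
    If you want real numbers instead of estimates, run the query with runtime statistics switched on and compare (simple sketch):

    SET STATISTICS IO, TIME ON;
    -- run the query here, then read the logical reads and CPU time from the Messages tab
    SET STATISTICS IO, TIME OFF;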

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

    Following is the estimated execution plan:

    https://www.dropbox.com/s/f69nsh46eij3ht0/LoadComExecutionPlan.sqlplan

    I think we have now reached the crux of the issue. This db was ported from Oracle, and we noticed a lot of number fields that had been converted from Oracle's NUMBER type to numeric(38) in SQL Server when they could have been int or even tinyint. After making those changes, the query executes much faster and no longer bloats tempdb.
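
    For anyone hitting the same migration issue, we located the columns with something like the query below; the ALTER is a hypothetical example for a single column (it will fail if the column is part of an index or key, so those have to be dropped and recreated first):

    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE DATA_TYPE = 'numeric' AND NUMERIC_PRECISION = 38;

    ALTER TABLE dbo.PLANSHIPMENT ALTER COLUMN SCENARIOID int NOT NULL;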

  • So much for my stupid Cartesian product statement. You do have that problem. First warning right at the top of the plan indicates you don't have JOIN criteria. Apologies Kevin.

    I'd fix that right away.

    And you have another warning: one of your tables has no statistics on a column. Do you have auto create/auto update stats turned off? Unless you have a very well tested and documented reason, I'd turn them back on. If you do have such a reason, then you need to create a set of stats on the column indicated, per that other warning.
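
    You can check those settings, and create the missing stats by hand if you really must keep auto-create off (the table and column names below are placeholders for whatever the warning points at):

    SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
    FROM sys.databases
    WHERE name = DB_NAME();

    CREATE STATISTICS st_manual ON dbo.SomeTable (SomeColumn);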

    Also, this query did not go through a full optimization pass; it timed out in the optimizer instead. That means the plan you have could be unstable. You might want to look for ways to simplify the query, although fixing the missing statistics and the missing join criteria could help on their own.

    You say this is returning millions of rows? The optimizer thinks, based on statistics, that it's only returning 2. Again, check whether auto update stats is turned off, or whether you need to run sp_updatestats or even UPDATE STATISTICS ... WITH FULLSCAN on some of the tables. If your statistics are off, you will get very poorly performing queries.
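
    Concretely, something along these lines (FULLSCAN on the big tables can take a while, so off-hours is safer):

    EXEC sp_updatestats;  -- whole database, sampled

    UPDATE STATISTICS dbo.PLANSHIPMENT WITH FULLSCAN;
    UPDATE STATISTICS dbo.SERVICECOMMITMENT WITH FULLSCAN;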

    That's where I'd start. Later, you might want to look to possibly address the key lookup and other potential issues.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

    Thanks Kevin, we will look into performance tuning sometime in the future, once we have all the database components/schema in place; right now I am focusing on this one query.

    That is honestly a REALLY, REALLY bad position to take!!! The earlier in the process you "start doing things right" the:

    1) easier it is to fix things

    2) less risky it is to fix things

    3) less time it takes to fix things (i.e. COST to pay the consultant)

    4) easier it is to FIND actual root causes of issues in the first place

    5) etc.

    SOOOO many times I have gone to new clients that were TOTALLY FUBAR'D and it took WAY more effort to find/fix problems than it should have. And honestly it often happens that broken stuff simply CANNOT be fixed because it is either too risky or the required downtime is unacceptable.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    One more thing. If statistics staleness isn't in play here, as Grant suggests, the other possibilities are:

    a) data value skew, which could be helped by OPTION (RECOMPILE) on the statement

    b) you simply have too much joined together. Beasts like this very often benefit from one or sometimes several temporary tables (NOT table variables!!) holding interim result sets, which lets the optimizer get better overall estimates for what is REALLY going to happen and thus avoid DISASTROUSLY bad query plans.
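
    To make (b) concrete, a rough, untested sketch using the names from the query you posted:

    SELECT S.SHIPMENTID, S.COMMITMENTID, S.SCENARIOID,
           S.DEPCUTOFFTIME, S.ARRCUTOFFTIME, S.TOTALWEIGHT
    INTO #Shipments
    FROM dbo.PLANSHIPMENT AS S
    WHERE S.SCENARIOID = @p_scenarioid;

    CREATE CLUSTERED INDEX cx_Shipments ON #Shipments (SHIPMENTID);

    -- the big SELECT then joins #Shipments instead of dbo.PLANSHIPMENT, so the
    -- optimizer sees real row counts; add OPTION (RECOMPILE) to that statement
    -- if data value skew (item a) is suspected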

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Grant Fritchey (7/31/2013)


    So much for my stupid Cartesian product statement. You do have that problem. First warning right at the top of the plan indicates you don't have JOIN criteria.

    I am not so sure. I think the optimizer produces that cross join on its own initiative.

    Note that the scan is on an indexed view named _dta_mv_20, which looks like something the DTA suggested.
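
    You can inventory what DTA left behind with something like this (a sketch; DTA prefixes its creations with _dta_):

    SELECT o.name, o.type_desc
    FROM sys.objects AS o
    WHERE o.name LIKE N'[_]dta[_]%'
    UNION ALL
    SELECT i.name, N'INDEX'
    FROM sys.indexes AS i
    WHERE i.name LIKE N'[_]dta[_]%';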

    It would help to see the actual execution plan, as that can reveal bad estimates.

    Then again, things improved when they replaced all the numeric(38) columns with better data types, although it is not clear to me whether the result was good enough.

    Erland Sommarskog, SQL Server MVP, www.sommarskog.se

  • Erland Sommarskog (7/31/2013)


    I am not so sure. I think the optimizer produces that cross join on its own initiative. Note that the scan is on an indexed view named _dta_mv_20, which looks like something the DTA suggested. [...]

    Yeah, you could be right. There's no indication of that object within the query. There are a bunch of other DTA indexes; it looks like the DTA was run and all its suggestions were implemented. Is it possible the view being referenced just has a cross join inside it?
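
    Easy enough to check, assuming the view lives in dbo:

    SELECT OBJECT_DEFINITION(OBJECT_ID(N'dbo._dta_mv_20')) AS view_definition;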

    As with so many other things, I think we're back to statistics. And possibly bad DTA recommendations.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
