Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

Odd performance query behavior on very large 40 TB partitioned table in distributed view, maybe it's tipping? Expand / Collapse
Author
Message
Posted Thursday, July 18, 2013 8:44 PM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 2:37 PM
Points: 4,438, Visits: 6,343
Hey Erland!! What are you doing here - are you lost? Welcome to SSC.com!

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1475256
Posted Thursday, July 18, 2013 8:54 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Yesterday @ 12:17 PM
Points: 49, Visits: 205
Scripts attached. Did you want ALL indexes for EACH monthly shard as well? They should all be identical. I have only included JUL2013 indexes, but here are all table definitions and the view. I have also attached the calling stored procedure I came up with.

Please let me know if I have omitted anything important.

I greatly appreciate any insights!


  Post Attachments 
VIEW tblDWProbe.txt (4 views, 9.32 KB)
CLUSTERED INDEX ClusteredIndexReportStamp.txt (4 views, 294 bytes)
NONCLUSTERED PRIMARY KEY PK_tblDWProbe.txt (3 views, 334 bytes)
NONCLUSTERED INDEX IX1_tblDWProbe_vehId_repSt_paramId_paramValue.txt (4 views, 381 bytes)
NONCLUSTERED INDEX IX1_tblDWProbe_vehicleId.txt (3 views, 298 bytes)
JUL2013 TABLE tblDWProbe.txt (3 views, 1.65 KB)
JUN2013 TABLE tblDWProbe.txt (2 views, 1.65 KB)
MAY2013 TABLE tblDWProbe.txt (1 view, 1.65 KB)
APR2013 TABLE tblDWProbe.txt (1 view, 1.64 KB)
MAR2013 TABLE tblDWProbe.txt (1 view, 1.64 KB)
FEB2013 TABLE tblDWProbe.txt (2 views, 1.64 KB)
JAN2013 TABLE tblDWProbe.txt (1 view, 1.65 KB)
DEC2012 TABLE tblDWProbe.txt (1 view, 1.65 KB)
NOV2012 TABLE tblDWProbe.txt (2 views, 1.64 KB)
OCT2012 TABLE tblDWProbe.txt (1 view, 1.64 KB)
SEP2012 TABLE tblDWProbe.txt (1 view, 1.64 KB)
AUG2012 TABLE tblDWProbe.txt (1 view, 1.64 KB)
JUL2012 TABLE tblDWProbe.txt (1 view, 1.64 KB)
JUN2012 TABLE tblDWProbe.txt (1 view, 1.64 KB)
MAY2012 TABLE tblDWProbe.txt (1 view, 1.64 KB)
APR2012 TABLE tblDWProbe.txt (1 view, 1.64 KB)
MAR2012 TABLE tblDWProbe.txt (1 view, 1.64 KB)
FEB2012 TABLE tblDWProbe.txt (2 views, 1.64 KB)
JAN2012 TABLE tblDWProbe.txt (2 views, 1.64 KB)
sp_GetProbeParameterForRangeByVehicleId.txt (3 views, 1.97 KB)
Post #1475260
Posted Friday, July 19, 2013 2:55 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 3:28 PM
Points: 817, Visits: 743
TheSQLGuru (7/18/2013)
Hey Erland!! What are you doing here - are you lost?


Yes, I lost an NNTP bridge to the MSDN forums. I figured that if I have to answer questions from a web UI, I might as well do it here.


Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1475692
Posted Friday, July 19, 2013 3:43 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 3:28 PM
Points: 817, Visits: 743
To be able to troubleshoot something, I need full information, yes. Here I was wondering whether the only the JUL2013 was indexed - that certainly asks for trouble. And I would have appreciated to get everything in one script.

Anyway, I looked closer at the complex plan, and this is a good plan. If you look at the Filter operators in each branch, they have a startup expression, which means that the shard in question is only visited if the date falls within the interval for the shard.

I don't understand the indexing. Why is there a separate clustered index on reportStamp, and then a NC primary key on reportStamp and Id? Why not just make the PK clustered?

Nothing good can come out of these hints. NOLOCK is a hint which is very difficult to use and which can lead to all sorts of nasty surprises. And you should not need it for a query that reads a single row. And FORCESEEK is also pointless here. There is a index all set up to use.

So why is this query slow some times? We would need to see the plan from such an occasion. If someone fiddles with the constraints so that they are not trusted anymore, the startup expression will disappear. However, it is not impossible that the inappropriate definition of the index is the culprit. It should really be:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [dbo].[tblDWProbe]
(
[parameterId],
[vehicleId],
[reportStamp])
INCLUDE ([paramValue])

Since there is equality conditions on parameterId and vehicleId, they should come before reportStamp. Thus, I would recommend that you change the index on all shards.


Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1475700
Posted Friday, July 19, 2013 9:17 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Yesterday @ 12:17 PM
Points: 49, Visits: 205
Erland Sommarskog (7/19/2013)
And I would have appreciated to get everything in one script.


Apologies, I gathered these around midnight local time, and this simple idea of a single file completely escaped me. My bad.

Erland Sommarskog (7/19/2013)
I don't understand the indexing. Why is there a separate clustered index on reportStamp, and then a NC primary key on reportStamp and Id? Why not just make the PK clustered?


I don't either! I've been here about a month, and its one of those situations where the team has turned over a few times and documentation is non-existent. I created the July shard, but at the time I didn't yet know enough to feel comfortable making any changes, so I created July as a simple clone of June. I will create the clustered index as you suggest for the August shard.

Given we do a lot of date range based queries, is it more optimal to create a clustered primary key as:

ALTER TABLE [dbo].[tblDWProbe] ADD  CONSTRAINT [PK_tblDWProbe] PRIMARY KEY CLUSTERED 
(
[reportStamp] ASC,
[dwProbeId] ASC
)

Or

ALTER TABLE [dbo].[tblDWProbe] ADD  CONSTRAINT [PK_tblDWProbe] PRIMARY KEY CLUSTERED 
(
[dwProbeId] ASC,
[reportStamp] ASC
)

Or does the key column order not really matter when used in scans?

Erland Sommarskog (7/19/2013)

Nothing good can come out of these hints. NOLOCK is a hint which is very difficult to use and which can lead to all sorts of nasty surprises. And you should not need it for a query that reads a single row. And FORCESEEK is also pointless here. There is a index all set up to use.


The leading application is always parsing binary files and inserting into this table. I have been told that SELECT with READ COMMITTED blocks the inserts, but have not verified the real impact. It is on my list of things to do to enable READ COMMITTED SNAPSHOT, but the disk subsystem TempDB and its log resides on will need to be shored up before I can enable.

Erland Sommarskog (7/19/2013)
Since there is equality conditions on parameterId and vehicleId, they should come before reportStamp. Thus, I would recommend that you change the index on all shards.


This is an indexing nuance I am not familiar with. So I understand correctly, are you saying for composite indexes, key columns used in equality comparisons are more efficient if they are listed first?

For our business use case we sometimes query for vehicleId within a date range (usually last 7 days) and never a specific date, without also knowing the parameterId.

So I will probably create something like:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [dbo].[tblDWProbe]
(
[vehicleId],
[reportStamp],
[parameterId]
)
INCLUDE ([paramValue]) ON PSchemeName([reportStamp])

Am I missing out if I do this?

Also, do I need to explicitly declare the partition scheme when creating this index if the table is already partitioned? Finding conflicting advice online.

Thanks again!
Post #1475724
Posted Saturday, July 20, 2013 2:55 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 3:28 PM
Points: 817, Visits: 743
jallmond (2013-07-19)
[quote]
I will create the clustered index as you suggest for the August shard.

In the long run, you probably want the same across the board, but of course it will take a maintenance window to make those changes.

Given we do a lot of date range based queries, is it more optimal to create a clustered primary key as:

ALTER TABLE [dbo].[tblDWProbe] ADD  CONSTRAINT [PK_tblDWProbe] PRIMARY KEY CLUSTERED 
(
[reportStamp] ASC,
[dwProbeId] ASC
)

Or

ALTER TABLE [dbo].[tblDWProbe] ADD  CONSTRAINT [PK_tblDWProbe] PRIMARY KEY CLUSTERED 
(
[dwProbeId] ASC,
[reportStamp] ASC
)

Or does the key column order not really matter when used in scans?

In scans it does not matter, but you don't want scans, you want seeks! If your typical query is "give me all rows between certain timestamps", then you want reportStamp first. If you look up by dwProbeId, then you should have an index where dwProbeId is first. But you don't seem to have an index like that today.

The leading application is always parsing binary files and inserting into this table. I have been told that SELECT with READ COMMITTED blocks the inserts,

And? With the correct index, this is a single-row read. You can take the blocking, no problem. Better be blocked than get incorrect results or error 601.

This is an indexing nuance I am not familiar with. So I understand correctly, are you saying for composite indexes, key columns used in equality comparisons are more efficient if they are listed first?

Yes. Just consider that you are looking at an output with thousands of rows, and you want to find the rows for a certain vehicle, parameter and date range. If the data is sorted by vehicle, parameter and date, you have the data in a single contiguous interval. If the data is sorted on vehicle, date, parameter, you need to filter each date value for the parameter which takes longer time and is more error-prone.

For our business use case we sometimes query for vehicleId within a date range (usually last 7 days) and never a specific date, without also knowing the parameterId.

Then you need an index on vehicle and reportStamp only as well. Now, I don't know how many different parameter values there are for a specific vehicle at a certain date, so the exact impact is difficult to measure. But there can certainly be a problem. Particularly if some parameter values come in frequently, and others only twice a day and you are looking for one of the latter.

Also, do I need to explicitly declare the partition scheme when creating this index if the table is already partitioned? Finding conflicting advice online.

So partitioned views, distributed or not, are funny creatures. Nowhere do you explicily say "this is a partitioned view" and SQL Server gives you an error message if you have messed up. You just create the tables and the views, and if you did not do everything right, SQL Server will simply handle the view as a regular view with no magic.

Partitioned tables are another matter. Here you need to explicitly define a partition function and a partition scheme.


Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1475738
Posted Thursday, August 1, 2013 8:35 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, September 10, 2014 5:46 AM
Points: 316, Visits: 910
I think your problems might be associated with statistics.

It seems that you have automated statistics updating enabled. This means that when 20% of the rows in a table are updated, the statistics for the table is automatically updated.

This update can take a lot of time for very large tables. This might be one reason why your query is sometimes slow.

I Suggest that you turn off automatic statistics updating for this table, and instead add a job that manually updates statistics every night (for the active shard).
Post #1479972
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse