RE: Odd performance query behavior on very large 40 TB partitioned table in distributed view, maybe it's tipping?

Ten Centuries

Points: 1181

July 19, 2013 at 9:17 pm

Erland Sommarskog (7/19/2013)
And I would have appreciated to get everything in one script.

Apologies, I gathered these around midnight local time, and this simple idea of a single file completely escaped me. My bad.

Erland Sommarskog (7/19/2013)
I don't understand the indexing. Why is there a separate clustered index on reportStamp, and then a NC primary key on reportStamp and Id? Why not just make the PK clustered?

I don't either! 🙂 I've been here about a month, and its one of those situations where the team has turned over a few times and documentation is non-existent. I created the July shard, but at the time I didn't yet know enough to feel comfortable making any changes, so I created July as a simple clone of June. I will create the clustered index as you suggest for the August shard.

Given we do a lot of date range based queries, is it more optimal to create a clustered primary key as:

ALTER TABLE [dbo].[tblDWProbe] ADD CONSTRAINT [PK_tblDWProbe] PRIMARY KEY CLUSTERED

(

[reportStamp] ASC,

[dwProbeId] ASC

)

ALTER TABLE [dbo].[tblDWProbe] ADD CONSTRAINT [PK_tblDWProbe] PRIMARY KEY CLUSTERED

(

[dwProbeId] ASC,

[reportStamp] ASC

)

Or does the key column order not really matter when used in scans?

Erland Sommarskog (7/19/2013)
Nothing good can come out of these hints. NOLOCK is a hint which is very difficult to use and which can lead to all sorts of nasty surprises. And you should not need it for a query that reads a single row. And FORCESEEK is also pointless here. There is a index all set up to use.

The leading application is always parsing binary files and inserting into this table. I have been told that SELECT with READ COMMITTED blocks the inserts, but have not verified the real impact. It is on my list of things to do to enable READ COMMITTED SNAPSHOT, but the disk subsystem TempDB and its log resides on will need to be shored up before I can enable.

Erland Sommarskog (7/19/2013)
Since there is equality conditions on parameterId and vehicleId, they should come before reportStamp. Thus, I would recommend that you change the index on all shards.

This is an indexing nuance I am not familiar with. So I understand correctly, are you saying for composite indexes, key columns used in equality comparisons are more efficient if they are listed first?

For our business use case we sometimes query for vehicleId within a date range (usually last 7 days) and never a specific date, without also knowing the parameterId.

So I will probably create something like:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [dbo].[tblDWProbe]

(

[vehicleId],

[reportStamp],

[parameterId]

)

INCLUDE ([paramValue]) ON PSchemeName([reportStamp])

Am I missing out if I do this?

Also, do I need to explicitly declare the partition scheme when creating this index if the table is already partitioned? Finding conflicting advice online.

Thanks again!