Performance of clustered index and clustered primary key

Question

Performance of clustered index and clustered primary key

N_Muller

SSCarpal Tunnel

Points: 4037
More actions
March 30, 2017 at 10:37 am

#336687

I have a table with millions of rows that stores measurement data. Data get inserted in batches of thousand of records every 5 minutes as we received web service messages. I have no control over the data submitter, so it is possible that I have duplicate or multiple data sets for the same widget. I'm storing the data in a table that structurally looks like:
CREATE TABLE MyTable (
DataInsertTime DATETIME2(2),
WidgetID INT,
Value INT
)
Because of the possibility of duplicate data, I can't create a PK, unless I add an IDENTITY column or some other value that guarantees uniqueness. I always query the data for a given widget and point in time, so my queries always include DataInsertTime. My question is what would perform better? The table with no PK and a clustered index in DataInsertTime and WidgetID, or add the IDENTITY column and a clustered PK on DataInsertTime, WidgetID, ID (IDENTITY COLUMN)?
I have always stayed away from creating the PK if I didn't have to. The PK gives me nothing and I always thought that there would be an adverse impact on performance if SQL Server needs to assure uniqueness and manage an index with a additional column. Someone told me that adding the PK would always improve performance, even if I retrieve data from non-unique portion of the PK, such as DataInsertTime and WidgetID.
I would appreciate your comments. Thanks

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 100949 More actions · Answer 1

You don't need a PK per se, but you should definitely create a clustered index headed by DateInsertTime. I think in this case you'll also need an IDENTITY column to make the clustering key unique yourself, otherwise I fear you'll end up with too many ghost rows if/when you delete data from the table; like you, I'd rather avoid the overhead of that, but it's probably best to do it in this case.

/*If data compression isn't available / you don't use it, naturally remove that part.*/
CREATE UNIQUE CLUSTERED INDEX MyTable__CL ON dbo.MyTable ( DateInsertTime, WidgetID, ID ) WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 99 ) ON [PRIMARY]; /*change filegroup name if needed*/

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

Even if you don't add an explicit unique value to the clustered index, SQL Server will add a uniquifier to the clustered index if the values of the clustered index aren't unique. This is a requirement of the clustered index.
If you decide to a identity column make it the trailing column of the clustered index. You then have two choices, make the clustered index your primary key (yes, a primary key may have more than one column) or use the identity column as the primary key but be sure to make it a nonclustered index.

N_Muller SSCarpal Tunnel Points: 4037 More actions · Answer 3

Thanks for the response. Good point about data compression, I didn't think about that. Compression will slow down data inserts, but it will speed up data retrieval as it will reduce I/O.

N_Muller SSCarpal Tunnel Points: 4037 More actions · Answer 4

Again on data compression - eventually the data will have to be archived in a database on a different server. If both the operational and archived tables are compressed, how will SQL Server transfer the data from operational to archival? Will it decompress during the select and re-compress on the insert, or is it smart enough to know that both the source and destination tables on the insert/select are compressed and it doesn't decompress/re-compress? Again, the archive database is on a different server and connected through linked server. The archival is triggered from the archive server.

Any additional info is appreciated.

ScottPletcher SSC Guru Points: 100949 More actions · Answer 5

N_Muller - Thursday, March 30, 2017 12:25 PM
Thanks for the response. Good point about data compression, I didn't think about that. Compression will slow down data inserts, but it will speed up data retrieval as it will reduce I/O.

Definitely on data retrieval. And honestly, in many cases for us, it has actually sped up inserts. I'm certain that's because of (1) reduced I/O for writes, and (2) our CPUs have spare capacity most of the time, so we don't see what for many could be a hit to CPU responsiveness.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

ScottPletcher SSC Guru Points: 100949 More actions · Answer 6

N_Muller - Thursday, March 30, 2017 12:39 PM
Again on data compression - eventually the data will have to be archived in a database on a different server. If both the operational and archived tables are compressed, how will SQL Server transfer the data from operational to archival? Will it decompress during the select and re-compress on the insert, or is it smart enough to know that both the source and destination tables on the insert/select are compressed and it doesn't decompress/re-compress? Again, the archive database is on a different server and connected through linked server. The archival is triggered from the archive server.
Any additional info is appreciated.

I believe SQL decompresses and re-compresses, as I believe it copies rows rather than pages, but I can't 100% say for sure.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.