I recently was attending a presentation where the presenter stated that the statistic with stats_id = 1 is always for the primary key (PK). The presenter made this statement based on the behavior that he has seen. However, is it true?
The relationship between index_id and stats_id
The first thing that needs understood is the relationship between indexes and statistics. In the sys.stats documentation, it states that every index has a statistics row with index_id = stats_id, as seen in this screen shot from that article:
Is the index with index_id = 1 special?
So now we need to consider whether there is any special meaning to an index with an index_id of 1? The index_id values are in the sys.indexes documentation, as seen in this screen shot:
Here we can see that index_id 1 means that the index is a clustered index. Note that there is no mention of a primary key. So where did this observed behavior come from? We need to dig a bit further into how SQL Server makes a primary key constraint.
Primary Key constraints
In the constraint documentation, in the remarks section, the second paragraph states that the primary key and unique constraints have an index created. In fact, unique indexes enforce these two constraints. At the top of this article, it discusses when those constraints create clustered or non-clustered indexes:
PK defaults to clustered. If there is a clustered index already, PK defaults to non-clustered. Note that this is the default behavior. You can specify that it should be created as non-clustered.
Now we can understand why the presenter’s observed behavior was that a PK always creates a statistic with stats_id of 1. This is the default behavior if that table doesn’t already have a clustered index.
Let’s run through a few examples showing these conditions. Let’s create a table with a million rows, and run some tests against it:
USE tempdb; IF OBJECT_ID('dbo.Test') IS NOT NULL DROP TABLE dbo.Test; CREATE TABLE dbo.Test ( Col1 INTEGER NOT NULL, Col2 INTEGER NOT NULL); GO WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2), Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3), Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions) INSERT INTO dbo.Test (Col1, Col2) SELECT N, N FROM Tally;
Now that we have the test data, we will run a simple query to create a statistic:
SELECT * FROM dbo.Test WHERE Col2 = 3; SELECT OBJECT_NAME(object_id) AS ObjectName, name AS StatsName, stats_id FROM sys.stats WHERE OBJECT_ID = OBJECT_ID('dbo.Test');
This shows the following statistic:
This doesn’t say anything about the PK being stats_id 1 since there isn’t a PK. So, let’s create a clustered index, then add a PK:
-- add a CI and a PK CREATE CLUSTERED INDEX IX1 ON dbo.Test (Col2); ALTER TABLE dbo.Test ADD CONSTRAINT PK_Test PRIMARY KEY (Col1); -- look at the stats again SELECT OBJECT_NAME(object_id) AS ObjectName, name AS StatsName, stats_id FROM sys.stats WHERE OBJECT_ID = OBJECT_ID('dbo.Test');
Here we can see that the clustered index (IX1) does have stats_id of 1, and the PK has stats_id of 3. This proves that when there is an existing clustered index, creating the PK will create a statistic with a stats_id other than 1.
If you were to look at sys.indexes, you would see that these two indexes use index_id values of 1 and 3. The value 2 is skipped. It’s not because there used to be an index that was deleted after the index_id 3 index was created. It’s simply because of the relationship that index_id = stats_id, and there is already a statistic with stats_id = 2. When creating the index for the primary key, index_id 2 had to be skipped.
Now let’s recreate the test table (run the above code), and then add a non-clustered PK.
ALTER TABLE dbo.Test ADD CONSTRAINT PK_Test PRIMARY KEY NONCLUSTERED (Col1); -- look at the stats again SELECT OBJECT_NAME(object_id) AS ObjectName, name AS StatsName, stats_id FROM sys.stats WHERE OBJECT_ID = OBJECT_ID('dbo.Test');
This shows that the non-clustered PK (index_id > 1) created a statistics with a corresponding stats_id that is also > 1.
The only thing left to prove is that creating a PK where there is not a clustered index, and allowing the use of defaults, will create a stats_id = 1:
ALTER TABLE dbo.Test DROP CONSTRAINT PK_Test; ALTER TABLE dbo.Test ADD CONSTRAINT PK_Test PRIMARY KEY (Col1); -- look at the stats again SELECT OBJECT_NAME(object_id) AS ObjectName, name AS StatsName, stats_id FROM sys.stats WHERE OBJECT_ID = OBJECT_ID('dbo.Test');
This shows us that the statistic for the PK has a stats_id of 1:
When creating a primary key, it creates an underlying index and statistic for that index. The index will be the clustered index on the table if there is not a clustered index on the table already, and if the statement to create the primary key did not specify non-clustered. If there is an existing clustered index on the table, or it was specified to create a non-clustered PK, then a non-clustered index is created for the PK. The clustered index will have an index_id (and stats_id) of 1, and all other non-clustered indexes will have a >1 value.