RE: Index Creation Guidelines

SSC Guru

Points: 125444

October 21, 2005 at 7:20 am

Actually not sure about the 80% value but there is a threshold in which statistically the index will not be a value. But in your example you mention M, F, null and a military database. No matter the percentage of F in the system the statistics are still based on the overall uniqueness of the data in the index. So if you have said index and it has a million rows then the statistical uniqueness is .0003 percent. Due to this, more often than not it will choose a table scan or clustered index scan no matter how many times you run it. That said there are some conditional situations where things will behave differently than the statement leads to believe.

Ex. -- Try building an querying against this table.

CREATE TABLE dbo.MilPer (idx int identity(1,1) not null primary key,

sex char(1) null

)

CREATE NONCLUSTERED INDEX IX_MilPer ON dbo.MilPer

(

sex

&nbsp

set nocount on

--Insert 1,000,000 records all M

declare @x int

set @x = 0

while @x < 1000000

begin

insert dbo.MilPer (sex) values ('M')

set @x = @x + 1

end

--50000 updated to F

UPDATE

SET

sex = 'F'

FROM

dbo.MilPer M

INNER JOIN

(SELECT TOP 50000 X.idx FROM dbo.MilPer X ORDER BY NewId()) Y

M.idx = Y.idx

--20000 updated to null

UPDATE

SET

sex = Null

FROM

dbo.MilPer M

INNER JOIN

(SELECT TOP 20000 X.idx FROM dbo.MilPer X WHERE sex != 'F' ORDER BY NewId()) Y

M.idx = Y.idx

--Verify counts

select sex, count(*) from dbo.MilPer group by sex

-----Do your testing here.--------

--Now change the table to this.

ALTER TABLE dbo.MilPer ADD

[alt] char(1) NOT NULL CONSTRAINT DF_MilPer_at DEFAULT 'A'

-----Do your testing here.--------

You'll find that when sex and idx were the only columns it will use the non-clustered index. This is because it also happens that the non-clustered index will work as a covering index for SELECT * in that case because the idx column is tucked inside the non-clustered index because of the way clustered indexes relate to non-clustered.

However once you add the alt column you will find it will do a clustered index seek. This is because the non-clustered index no longer will work as a covering index and the statistic density (based on uniqueness) isn't going to pose high enough to have the system consdier it. If you instead do SELECT idx, sex you will get the non-clustered seek again. That said however neither method executes faster than the other because the data itself in the case isn't dense enough for thorough testing.