Index Types and Primary Keys versus Space Used

  • I recently received a script from a developer where there were comments through the object creations with comments such as "A CLUSTERED INDEX ON A TABLE WITH JUST ONE RECORD WILL TAKE UP MORE SPACE AND IS NOT NECESSARY" and "TABLE IS NOT USED IN QUERIES, AND THE VALUES IN THE COLUMNS ARE NOT SEQUENTIAL" (maybe not for the developer or user, but needed for tech support!!!).

    Anyway, I decided to create three examples to see what the impact on space was:

    CREATE TABLE [dbo].[FilesImportedNonClust](

    [FileName] [varchar](100) NOT NULL,

    [ImportDate] [datetime] NULL,

    [FileFormat] [varchar](10) NOT NULL,

    CONSTRAINT [PK_FilesImportedNonClust] PRIMARY KEY NONCLUSTERED

    (

    [FileName] ASC,

    [FileFormat] ASC

    )WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]

    CREATE TABLE [dbo].[FilesImportedClust](

    [FileName] [varchar](100) NOT NULL,

    [ImportDate] [datetime] NULL,

    [FileFormat] [varchar](10) NOT NULL,

    CONSTRAINT [PK_FilesImportedNonClust] PRIMARY KEY CLUSTERED

    (

    [FileName] ASC,

    [FileFormat] ASC

    )WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]

    CREATE TABLE [dbo].[FilesImportedNoKey](

    [FileName] [varchar](100) NOT NULL,

    [ImportDate] [datetime] NULL,

    [FileFormat] [varchar](10) NOT NULL

    ) ON [PRIMARY]

    I then put one record into each table:

    INSERT INTO dbo.FilesImportedNoKey VALUES ('MyFile', GetDate(), 'MyFormat')

    INSERT INTO dbo.FilesImportedNonClust VALUES ('MyFile', GetDate(), 'MyFormat')

    INSERT INTO dbo.FilesImportedClust VALUES ('MyFile', GetDate(), 'MyFormat')

    Then ran a modified version of the query from Eder Dias script, http://www.sqlservercentral.com/scripts/Administration/61183/:

    Select

    obj.name as TableName,

    (

    Select Sum(((used * 8192.00)/1024))

    from

    sys.sysindexes

    where

    id in (select objChild.id from sys.sysobjects objChild where objChild.name = obj.name)

    and indid in(0,1) --> Only Table Size

    ) TableSizeKb,

    (

    Select

    Sum(((used * 8192.00)/1024))

    from

    sys.sysindexes

    where

    id in (select objChild.id from sys.sysobjects objChild where objChild.name = obj.name)

    and indid not in (255,0,1) --> Only indexes

    ) IndexSizeKb

    From

    sys.sysobjects obj

    Where obj.type = 'U' --> only user tables

    AND obj.name LIKE 'FilesImport%'

    Order By

    TableSizeKb desc;

    The results are as follows:

    TableName TableSizeKbIndexSizeKb

    FilesImportedNonClust3216

    FilesImportedNoKey16NULL

    FilesImportedClust 16NULL

    I then put an additional 300 records into each table, with the results as follows:

    TableNameTableSizeKbIndexSizeKb

    FilesImportedNonClust5632

    FilesImportedClust 32NULL

    FilesImportedNoKey24NULL

    My questions:

    1) If a Clustered index is not specified, doesn't SQL Server create its own clustered index behind the scenes?

    2) Why is the NonClusterered table (with a primary key) versus the Clustered table (with a primary key) about 2x the size?

    3) Why does the table without the Primary Key take up less space than the table that is Clustered?

    4) Can anyone point me to online resource (or something I missed in BOL) to read up on this?

    Thanks! 🙂

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • Very interesting.

    SQL does mark the physical records in a heap, AFAIK, with something to be sure they are unique. I have heard before that the optimizer likes clustered indexes, so they are recommended.

    A table with one (or few rows), might take up more space with a clustered index, but the time it takes to discuss this isn't worth the space used. Just create a clustered index. It isn't enough space to worry about.

    The nonclustered index for the PK means you create a heap, and then another index for the PK. A clustered PK is just the heap converted to clustered, no extra index.

  • Add a hearty "me too" to everything Steve said.

    This was just posted on Brad Magehee's blog today:

    http://technet.microsoft.com/en-us/library/cc917672.aspx

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Steve Jones - Editor (2/20/2009)


    SQL does mark the physical records in a heap, AFAIK, with something to be sure they are unique.

    RID (Row Identifier). An 8 byte combination of File ID, Page ID and Slot Index. It's stored in nonclusters as the row's address, not in the heap itself.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I did finally find some information in BOL (Creating Nonclustered Indexes) which helps explain things for me. For the benefit of people who do not know what a "heap" is (or are programmers new to SQL Server): simply (not getting into Extends, Pages, B-Trees, etc) a heap is just an "un-ordered" storage of data on the hard drive, versus a clustered index storing the data in a specified order.

    [FROM BOL]: Nonclustered indexes are implemented in the following ways:

    PRIMARY KEY and UNIQUE constraints

    When you create a PRIMARY KEY constraint, a unique clustered index on the column or columns is automatically created if a clustered index on the table does not already exist and you do not specify a unique nonclustered index. The primary key column cannot allow NULL values.

    When you create a UNIQUE constraint, a unique nonclustered index is created to enforce a UNIQUE constraint by default. You can specify a unique clustered index if a clustered index on the table does not already exist. For more information, see PRIMARY KEY Constraints and UNIQUE Constraints.

    Index independent of a constraint

    By default, a nonclustered index is created if clustered is not specified. The maximum number of nonclustered indexes that can be created per table is 249. This includes any indexes created by PRIMARY KEY or UNIQUE constraints, but does not include XML indexes.

    Nonclustered index on an indexed view

    After a unique clustered index has been created on a view, nonclustered indexes can be created. For more information, see Creating Indexed Views.

    Thanks for all of the inputs everyone.... it helped me do more search(s) online for data to explain this! 🙂 🙂

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • Your Question 1):

    If a Clustered index is not specified, doesn't SQL Server create its own clustered index behind the scenes?

    answer:

    if you miss to specify the Index Type, there is no implicit speicifation from Mircosoft to create a Clustered/NonClustered Index behind the screen as it is purely depends on the user and requirement of the table creation, so simply it will be a "Heap" as you told.

    2) Why is the NonClusterered table (with a primary key) versus the Clustered table (with a primary key) about 2x the size?

    answer:

    as like "Gail" & "Steve Jones" told you, Non Clustered Index would occupy more space as it holds one Index and one Row Identifier

    whereas the clustered index dont have that structure so it will occupy less space and faster too..

    --- XXX ---

    I believe I have just given my opinion and not more anything better than the three experts above (Steve,Grant,Gail)

    Regards,

    Prabhu

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply