clustered index

  • The point in question is that how many copies of data you want to keep...if it is one....you can store it in only one order physically.....now by definition a clustered index is associated with the physical order.....the index illustrated by EdVassie is merely a clustered index with two dimensions...

    as Last Name, then First name in the case of an address book....wherein for his case it is type of object, then objectid or something.....these are basics friends.....

    well....logically or functionally you can have n number of indexes at place for your database, but physically it depends upon how many copies of data you want to keep....

    For more complex ways of data storage and retrieval, check out cubes in SQL Server Analysis Services.

    It is good to argue, but it is useless if you don't accept.

    Chandrachurh Ghosh
    DBA – MS SQL Server
    Ericsson India Global Services Limited
    Quality is not an act, it is a habit.

  • we cannot have any non-clustered index if we don't have a clustered index on a table....also a primary key is by default clustered indexed and by definition unique.....

    well...from your discussion I got that you need to retrieve data as fast as a clustered index but in different ways.....MS is clever enough to let you features to handle situations like this.......if all the columns required in a query is present in a non-clustered index the SQL server fetches the values from the NC index pool itself. It does not require to go to the main table. Also, we can include columns in NC indexes which also has the same effect. NC indexes are more versatile than indexed views.

    Hope this helps.....

    Chandrachurh Ghosh
    DBA – MS SQL Server
    Ericsson India Global Services Limited
    Quality is not an act, it is a habit.

  • chandrachurhghosh (4/16/2008)


    we cannot have any non-clustered index if we don't have a clustered index on a table

    That's not the case. There's no requirement in SQL server for a table to have a clustered index. Nonclustered indexes can be created on a heap just as they can on a cluster.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (4/16/2008)


    Primary key never affects the row order in data pages.

    *Snip*

    Does that help?

    A lot thanks I assumed that was what primary key implied.

    I understand the index page id doesn't need to be sequential ( on index defrag it will end up being though ? ) and it is better to link in a new page than shuffle all entries down their corresponding pages. I also understand that if the index pages are mostly cached not physically retrivied whether it is page id 301 or 62 doesn't have much effect on the performance retrieval using that index.

    I was confused about srienstr's post and the different B-Tree implmentation in DB2 and SQL server

    This site (link) states that DB2 only guarantees that a clustered index is initially clustered, clustering is not maintained. While this saves time on inserts (no page splits), I'm glad SQL Server allows index fragmentation instead.

    Why wouldn't the index fragment in exactly the same manner over time for DB2 Type-1 and 2 indexes or is he talking specifically about MDC ?

    Probably worth drawing a distinction between the clustering of the index only pages inside a SQL server clustered or non-clustered index and the clustering of the data pages as being two different things.

  • in 2005, you don't need an indexed view to get what amounts to a second clustered index on a table - you can get the same effect with included columns (new feature in 2005).

    create a non-clustered index that also includes every non-key column in the table. This will be a copy of the table that's physically stored in key order.

    not sure if anyone mentioned this about included columns already - didn't see it anyway.

    ---------------------------------------
    elsasoft.org

  • Ok GilaMonster.....I am giving an example....can you please tell me why this happens?

    CREATE TABLE [dbo].[test1](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [test] [varchar](255) NULL

    )

    INSERT INTO test1([test])

    VALUES('1')

    INSERT INTO test1([test])

    VALUES('11')

    CREATE NONCLUSTERED INDEX [TEST1]

    ON [dbo].[test1]

    (

    [test] ASC

    )

    WITH (PAD_INDEX = OFF,

    STATISTICS_NORECOMPUTE = OFF,

    SORT_IN_TEMPDB = OFF,

    IGNORE_DUP_KEY = OFF,

    DROP_EXISTING = OFF,

    ONLINE = OFF,

    ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    SELECT * FROM test1

    WHERE [test]='11'

    Check the query execution plan for the above query. It still does a table scan. GilaMonster, can you tell me why?

    Chandrachurh Ghosh
    DBA – MS SQL Server
    Ericsson India Global Services Limited
    Quality is not an act, it is a habit.

  • Because the index is not covering and the table scan is cheaper

    To do an index seek in this case requires 3 reads. 2 of the index tree and one of the table to fetch the column ID which is not in the index. A table scan just requires 1 read.

    If the index had ID either in the key or as an include, or the query just returned test, you'd get an index seek.

    To continue on your example...

    SELECT test FROM test1 WHERE test = '11' -- index seek on index TEST1

    CREATE NONCLUSTERED INDEX Test_Covering ON Test1 (Test) INCLUDE (ID)

    SELECT * FROM test1

    WHERE [test]='11' -- Index seek on index Test_Covering

    Does that make sense?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Now drop the index and create the following index on the table -

    CREATE NONCLUSTERED INDEX [TEST1]

    ON [dbo].[test1]

    (

    [test] ASC

    )

    INCLUDE ([id])

    WITH (PAD_INDEX = OFF,

    STATISTICS_NORECOMPUTE = OFF,

    SORT_IN_TEMPDB = OFF,

    IGNORE_DUP_KEY = OFF,

    DROP_EXISTING = OFF,

    ONLINE = OFF,

    ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    Now check the query execution plan for the following query -

    SELECT * FROM test1

    WHERE [test]='11'

    I will tell you why.....it is no longer heap....a heap by definition is not indexed other than its reference in the IAM.....with the above non-clustered index you are saving another copy of the data in order as specified in the definition of the index.....

    Hence, even though you can create a non-clustered index on a heap...it is of no use, it will always result in a table scan unless the columns involved in the query are present in the index definition .....we already have a non-clustered index - the Index allocation Map (IAM)

    Chandrachurh Ghosh
    DBA – MS SQL Server
    Ericsson India Global Services Limited
    Quality is not an act, it is a habit.

  • That is not true.

    The definition of a heap is a table that does not have a clustered index, not a table with no index at all.

    It is very possible to have a heap with a nonclustered index and a query that refereces some columns not in the nonclustered index tree do an index seek (with Bookmark/RID lookup) on a heap.

    All it requires is for the optimiser decides that the seek/lookup is cheaper than the scan. That requires that the query affects something less than 1% of the table.

    Let's see this example....

    -- the PK is created as nonclustered. This is a heap with a single nonclustered index on it.

    CREATE TABLE [dbo].[IndexTest](

    [ID] [int] identity,

    [CreationDate] [datetime]

    CONSTRAINT [pk_IndexTest] PRIMARY KEY NONCLUSTERED ([ID] ASC)

    )

    -- Now, lots of data.. I do mean, lots.

    INSERT INTO IndexTest (CreationDate)

    SELECT DATEADD(dd,col1.column_id + col2.column_id,'2008/01/01') FROM sys.columns col1, sys.columns col2

    -- On my test DB, this inserted 484416 rows.

    -- Now, going to see if I can get an index seek, on a heap, even though I'm querying a column not in the NC index.

    SELECT * FROM IndexTest WHERE ID BETWEEN 2000 AND 3000 -- 1001 rows affected. Table scan

    SELECT * FROM IndexTest WHERE ID BETWEEN 200 AND 300 -- 101 rows affected. Index seek + Rid lookup

    The IAM is not an index. The IAM is a allocation mechanism within SQL that says, for each heap, cluster or NC index in the system, which extents are allocated to the object. It's a way for the storage engine to quickly tell which extents within the database file belong to which objects.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I accept that this aspect was unknown to me, that a table need not have a Clustered index to have a NC index...thanks for that....but does this change my basic statement that -

    The point in question is that how many copies of data you want to keep...if it is one....you can store it in only one order physically.....the index illustrated by EdVassie is merely a clustered index with two dimensions...

    as Last Name, then First name in the case of an address book....wherein for his case it is type of object, then objectid or something.....these are basics friends.....

    well....logically or functionally you can have n number of indexes at place for your database, but physically it depends upon how many copies of data you want to keep....

    Isn't this valid?

    Thanks for the discussion.

    Chandrachurh Ghosh
    DBA – MS SQL Server
    Ericsson India Global Services Limited
    Quality is not an act, it is a habit.

  • The OP asked why we can only have one clustered index per table.

    A lot of people responded why this is the case for SQL Server. I posted about DB2 MDC indexes to show there are other ways to tackle this problem.

    I repeat that the explanation I gave for MDC is very simplified, and that it really does achive what appears to be impossible - a row is stored once only and physically clustered in multiple dimensions. If anyone has further questions about how this is done, please find a DB2 forum to ask these questions.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Well, to support my opinion about storage and clustered index, try out the following....

    create a view with schema binding......then try to create a non-clustered index on it......

    Chandrachurh Ghosh
    DBA – MS SQL Server
    Ericsson India Global Services Limited
    Quality is not an act, it is a habit.

  • chandrachurhghosh (4/17/2008)


    well....logically or functionally you can have n number of indexes at place for your database, but physically it depends upon how many copies of data you want to keep....

    Isn't this valid?

    Sure. In SQL Server, an index is a stores a seperate set of data for the columns in it. That's why too many indexes can have an effect on insert/update performance.

    The distinction is that a clustered index always has all the columns of the table in the leaaf pages (because it is the table) while a nonclustered index doesn't (but in SQL 2005, because of the include columns, it can)

    The other reason that a table cannot have 2 clustered indexes is because the clustering key is used as the row's 'address'

    In a heap, the RID identifies the location of the row and is composed of the fileID, the page ID and the slot index. This value is used by nonclustered idnees to locate the row, should a lookup be required.

    Once the table gets a clustered index, the RID falls away and the clustering key becomes the row identifier. The clustereing key is hence added to all nonclustered indexes in place of the RID

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • chandrachurhghosh (4/17/2008)


    Well, to support my opinion about storage and clustered index, try out the following....

    create a view with schema binding......then try to create a non-clustered index on it......

    Indexed views are a whole nother story. The requirements for creating an indexed view are that the first index is clustered. That is specific to indexed views and has no bearing on tables (heaps or otherwise).

    Very simple reason. A view without a clustered index does not have persisted data. It's just a saved select statement. Creating a clustered index on it means that SQL will persist the results of that view to disk, as if it were a table. Once its persisted, then NC indexes can be created on it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I merely gave it as a support for my opinion that for a different order of data, SQL Server needs to store it in the order specified by the index definition........your concerns are true.....

    Chandrachurh Ghosh
    DBA – MS SQL Server
    Ericsson India Global Services Limited
    Quality is not an act, it is a habit.

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply