pages clustered/non-clustered index

  • quote:


    Darn it brain is fried with the new baby here. What do you mean by covering index, jpipes? For some reason my mind has gone blank (happening a lot recently too).


    isn't a covered index an index that contains all fields of a table, or the SELECT statement?

    Frank

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • See the way I recall it a covered index is one that pretty much gets the data rounded up you will always query agaisnt. However Non-Clustered indexes always contain the matching clustered index value unless there is no clustered index, then it contains the filegroup, page, and rowid of the values instead.

  • See

    http://www.sqlteam.com/item.asp?ItemID=412

    http://sql-server-performance.com/jc_sql_server_quantative_analysis5c.asp

    for covered indexes

    Frank

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • quote:


    See

    http://www.sqlteam.com/item.asp?ItemID=412

    http://sql-server-performance.com/jc_sql_server_quantative_analysis5c.asp

    for covered indexes

    Frank


    Ok I wasn't totaly brain dead here. Just slightly off. Thanks

  • Actually, it's quite simple, and doesn't have much to do with whether there is a clustered index on the table or not, except if the clustering key is actually needed by the query. A covering index is a non-clustered index in which the data in the leaf of the index can be used to complete the entire portion of the join or select or where condition without a need for the corresponding bookmark lookup to the clustered index's (or heap's data page). For instance, consider the following simple table structure:

    
    
    CREATE TABLE WorkOrder
    (
    OrderID INT NOT NULL IDENTITY(1,1)
    , Project INT NOT NULL
    , EndUser VARCHAR(50) NOT NULL
    , PurchaseOrder VARCHAR(20) NULL
    , CompletedOn SMALLDATETIME NULL
    , Location VARCHAR(35) NOT NULL
    , State CHAR(2) NOT NULL
    , Zip CHAR(6) NOT NULL
    , LastEditedOn SMALLDATETIME NOT NULL
    , LastEditedBy INT NOT NULL
    -- more fields
    )
    --
    CREATE TABLE Project
    (
    ProjectID INT NOT NULL IDENTITY(1,1)
    , Name VARCHAR(100) NOT NULL
    , StartedOn SMALLDATETIME NOT NULL
    , EndedOn SMALLDATETIME NULL
    , Status TINYINT NOT NULL
    -- More fields
    )

    Assume a primary key and clustered index on both the IDENTITY fields in each table. Also assume aforeign key constraint on ProjectID to Project.

    --

    For kicks, assume also, a non-clustered index is placed on the WorkOrder table on the Project field:

    
    
    CREATE NONCLUSTERED INDEX
    IDX_WorkOrder_Project
    ON WorkOrder (Project)

    The width of a row in this index, since the clustered index is on OrderID will be 8 (2 INT fields). Without the clustered index, it would be 20 (I think...) since a uniqueifier (16 bytes) would have to added to the nonclustered index in order for a bookmark lookup to be possible.

    --

    Suppose the following query:

    
    
    SELECT Project.Name, COUNT(*)
    FROM Project
    INNER JOIN WorkOrder
    ON ProjectID = Project
    WHERE WorkOrder.State = 'OH'
    GROUP BY Project.Name

    In this scenario, the query optimizer needs to find 2 basic pieces of information from the WorkOrder table: the Project and the State. The optimizer would likely do one of two things: either a) clustered index scan on the WorkOrder table, a hash match aggregate to group the WorkOrder records into Project, State buckets, then do a nested loop join to the Project table for the final output. Or, b) a bookmark lookup from the nonclustered index to the clustered index data page to find the state, the nested loop joins to the Project table to grap the Project name. The reason it has to do this is because the data the query needs (State) is in the data page of the clustered index.

    --

    Now, if there is a nonclustered index on Project and State:

    
    
    CREATE NONCLUSTERED INDEX
    IDX_WorkOrder_ProjectState
    ON WorkOrder(Project, State)

    the query optimizer can retrieve all the information regarding WorkOrders that it needs for the query from the index's leaf page without any bookmark lookup to the data page of the clustered index. Everything needed by the query can be found in the nonclustered index's leaf pages.

    Therefore, the execution plan generated by the same query will produce the following: an index scan for the Project table (since there is no where clause limiting it), followed by a series of nested loop joins to (tada!) the nonclustered index we just created. There are no bookmark lookups, and the entire index (a width of 10 bytes / row) will be spooled into cache at a rate more than 10 times the rate of the clustered index's data pages (or more if there were more fields...). Total IO is reduced, and on my test machines, the difference in execution cost ranged from (99.06 without covering index vs. .04% with) to (97.03% without vs. 2.97% with). Pretty remarkable differences when those pesky data pages don't need to be read.

    --

    There are of course disadvantages to having more indexes on your tables (increased cost for INSERT/UPDATES), so choose covering indexes wisely by profiling your stored procs and scripts and using qualified judgments based on your most heavily taxed queries...

    --

    Antares, I know you knew most of this stuff already, but just wanted to clarify on an earlier point...

    Cheers,

    Jay

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply