Clustered scan vs Non-clustered Seek with lookup

  • Hi

    I think the answer to this question will be "it depends" but i'm interested to know what the variables might be.

    I have a table with 18 million rows, which is large for us. At the moment it has a single clustered index and there is a lot of blocking going on, on it.

    The missing indexes DMV is recommending that I create a non clustered index on 2 non-unique columns, and include the remaining 9 columns of the table. THere are a lot of queries that use these predicates and want to return all of the rows in the table. This strikes me as a but mad. Generally would I  be better off

    • doing nothing and letting the clustered index take the strain
    • Creating the non-clustered index as recommended by the DMV
    • Creating a non-clustered index on just the 2 predicates and having an additional key lookup on the clustered index to return the additional rows.
    Any recommendations?

    Alex

  • It would be better for you to find this yourself by creating a load on your Dev  server.
    With out the included columns the plan may even ignore the new index.

    One thing I would consider is the overall benefit comparing write overhead. If this is a table that is mostly read from, there would be no harm in creating the big index. But instead of considering the missing index dmv , why don't you review top queries to see get more input to decide on the right indexing?

    ----------------------------------------------------------------------------------------------------------------------------------------------------
    Roshan Joe

    Jeff Moden -Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • You're right, the answer depends heavily on your system and queries.

    I will say, the statement below is rather confusing:

    THere are a lot of queries that use these predicates and want to return all of the rows in the table.

    It seems to be implying that you've got queries that filter / don't filter on a couple particular columns, and still return every row in the table?
    If that's incorrect (they actually ARE filtering out rows based on the two columns in question,) then it might be worthwhile to set up a test copy of the database, snag the queries in question, and test the performance with and without indexes.

    As you say they still return all the columns, it probably wouldn't be worth it to add the remaining columns as included columns, that would likely significantly increase the amount of storage (and work required during INSERT / UPDATE / DELETE operations) required vs SQL finding the RIDs and going back to the clustered index for the data.

    But again, test, test, test, test!

  • Thanks for the replies guys

    Oops I got my columns are rows confused typing.   I meant returns all the columns not all the rows.

    The predicates filter the rows but the queries request all columns are returned.

  • Change the clustering index to the 2 columns that are used most often / nearly always as predicates.  If you prefer, you can add a unique column as a 3rd key to make the index unique.  If one column is used significantly more often than the other to filter on, put that column first.  If they are (nearly) always used together, put the more highly selective column first.  Don't use too low a fillfactor and don't allow it to default -- explicitly specify 90 and then we can adjust from there.

    Here are sample commands to do this:

    DROP INDEX <existing_clustered_index_name> ON <table_name>;
    <OR>
    ALTER TABLE <table_name> DROP CONSTRAINT <existing_clustering_constraint>

    Then:
    CREATE CLUSTERED INDEX table_name__CL ON table_name ( col1, col2 ) WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 90, SORT_IN_TEMPDB = ON ) ON [PRIMARY]; /*change file group name for index if needed*/

    That's almost certainly the only really efficient way to address this issue, based on what you're described.  I would have to look at index stats to 100% confirm, but I suggest go ahead and try this approach first, since it should be an improvement either way.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I'd make sure that col1 is immutable, unique, and ever-increasing or there will be performance hell to pay on inserts and updates.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, March 14, 2017 11:32 AM

    I'd make sure that col1 is immutable, unique, and ever-increasing or there will be performance hell to pay on inserts and updates.

    That's not necessarily true, and it's irrelevant here anyway, since that's how you do lookups on the table, and you're using all columns from the table.

    Get the best clustered index on the table first, and then we can tune it from there.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Tuesday, March 14, 2017 4:35 PM

    Jeff Moden - Tuesday, March 14, 2017 11:32 AM

    I'd make sure that col1 is immutable, unique, and ever-increasing or there will be performance hell to pay on inserts and updates.

    That's not necessarily true, and it's irrelevant here anyway, since that's how you do lookups on the table, and you're using all columns from the table.

    Get the best clustered index on the table first, and then we can tune it from there.

    Your quite right.  "It Depends".  Smaller narrow reference or dimension tables don't fit the normal bill of "ever-increasing" and don't need to because they're usually static or suffer only seriously infrequent changes throughout their life time.  I'll also state that it's possible that a clustered index may provide the benefits of performance if the key isn't immutable or unique but I've not run across any yet.  Do you have an example of where the best clustered index is mutable and/or not unique?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi

    All interesting replies........

    Currently the clustered index is on an identity column and there are no non-clustered index's. There is select, insert and update activity on the table. Neither of the columns (lets call them col1 and col2) are unique and can have over 500 entries for any given value.

    After Scott's post I thought it might work to create on clustered index on col1, col2 with the identity column to make it unique. Given that there aren't any non-clustered indexes, will it make much difference that the clustering key is big? The primary key is currently the identify column but I can't see any reason to change that if there are no non-clustered indexes.

    Does that sound worth testing or does it present any obvious pitfalls?

    Alex

  • Since you already have an identity column, go ahead and add that to make the clustering unique.  SQL would "uniquify" it, but it's better to do it yourself to avoid ghost rows.

    Are the searches always on both col1 and col2?  Then put the more selective value first.  But if some searches are only on, say, col2, then put that column first.

    For best performance overall, don't blindly follow a very generalized "rule" such as "clus key should 'always' be a single, int value".  Instead, you should use that specific table's actual usage to determine the best clus key for it, period.

    I've tuned thousands of dbs over thirty years, and I've literally seen only a handful or so where the majority of tables were properly clustered on an identity column.  There's almost always a better clus key: such as a datetime (log tables), the parent's key first (child tables), etc..

    Similarly, don't assume you will need to create a separate nonclus index for most every major query.  Often getting the clus index right drastically reduces the number of nonclus indexes you end up needing.

    Unquestionably there are many cases where an identity makes a great clus key.  But don't assume that from the start.  Use actual index stats to make that determination.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply