Choosing columns for a clustered index: why is it so important?

  • I want to write an article on choosing the right columns for a clustered index and why that choice is so important. It becomes extremely important for a read/write-intensive database.

    It will cover:

    1. What is a clustered index?

    2. Why always have a clustered index? What can happen without one?

    3. Locking behavior in the absence of a clustered index

    4. Hard delete vs. soft delete: which to choose and why?

    5. (Key lookup + covering indexes) vs. (clustered index)

    6. Requirements vs. best practices in choosing the right columns for the clustered index

    Please share your feedback and let me know whether I should go ahead with this.

  • I'd also mention the issue of page splitting, fragmentation, fill factor, and how it all relates to choosing the optimal clustering key combination.

    Imagine you're stacking books on a shelf sorted by author and you leave little or no room in between. Then one day you receive a shipment of 100 books all by the author Nora Roberts. What must you do to make room while maintaining the correct order? You must shuffle other books to new locations. In terms of row store tables, page splitting is the equivalent, and it can result in significant I/O overhead.
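To put rough numbers on the bookshelf analogy, here is a minimal toy model, not SQL Server's actual storage engine. It assumes a hypothetical page that holds only four keys (`PAGE_CAPACITY`, which must be at least 2), and `insert_key` and `total_rows_moved` are illustrative names. Appending past the end of the last page starts a fresh page, while inserting into a full page in the middle splits it and relocates the upper half.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical rows per page, chosen small for illustration


def insert_key(pages, key, capacity=PAGE_CAPACITY):
    """Insert key, keeping pages in sorted order; return how many existing rows moved."""
    # Pick the last page whose first key is <= the new key (or page 0).
    target = 0
    for i, page in enumerate(pages):
        if page and key >= page[0]:
            target = i
    page = pages[target]
    if len(page) < capacity:
        bisect.insort(page, key)  # room on the shelf: nothing else moves
        return 0
    if target == len(pages) - 1 and key > page[-1]:
        pages.append([key])  # appending past the end: start a fresh page
        return 0
    # Full page in the middle: split it and relocate the upper half.
    mid = len(page) // 2
    moved = page[mid:]
    del page[mid:]
    pages.insert(target + 1, moved)
    bisect.insort(page if key < moved[0] else moved, key)
    return len(moved)


def total_rows_moved(keys, capacity=PAGE_CAPACITY):
    """Insert all keys into an empty table and total the rows relocated by splits."""
    pages = [[]]
    return sum(insert_key(pages, key, capacity) for key in keys)


if __name__ == "__main__":
    print(total_rows_moved(range(100)))         # ever-increasing key: prints 0
    print(total_rows_moved(range(99, -1, -1)))  # every insert lands at the front
```

In this sketch an ever-increasing key never relocates a row, while keys that keep landing in front of existing rows trigger repeated splits. Leaving free space on each page, which is what fill factor does, delays the splits but does not remove them.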

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • The subject is interesting, but you're trying to cover a wide set of topics. Try to get a series of articles instead of just one. At least, that's my opinion.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

  • Yeah, you are right :-)

    I will break it into multiple articles and try to link them.

    Thanks for your feedback. I am new to writing on a public forum. I thought I would share my experience and learning, which may be helpful to others.

    Thank you once again!

    Can I go ahead and start it?

  • Who is stopping you? Go ahead.

  • I have written my own blog

  • Brahmanand Shukla - Thursday, September 27, 2018 9:01 AM

    I have written my own blog

    I will disagree with #6: "Object, Column and Variable name should be in Title case."

    You should code these as they are declared in the database. If the column name is declared as 'object_id', then you should code it as 'object_id' (check the system tables). My rule of thumb: code as if you are working in a database or instance where the collation is case sensitive, even if it isn't. You never know when your code may get used in such an environment. This is just a defensive coding method I have chosen to use, and it has saved my butt numerous times.

  • I also disagree with #17: "Tables whose columns are not used in query should not be there in joins."

    I have used columns in joins even if they aren't used elsewhere in the query. Depending on the data and the tables, the JOIN may be the only place a table's columns are used in a query.

  • And yes, I also disagree with #21: "Don't use CURSOR. Use WHILE loop in place of cursor."

    Yes, avoid cursors where possible. But where appropriate, a fire hose cursor can perform better than a WHILE loop with temporary tables that require maintenance to support the loop. Plus, even a cursor requires a WHILE loop.

  • And #29: "Avoid Dynamic SQL."

    Dynamic SQL is a tool. Used appropriately it is good; used inappropriately it is evil. There are times when using dynamic SQL is necessary. When using dynamic SQL, be sure to code defensively to avoid SQL injection. Use EXEC sp_executesql so you can pass values to the dynamic SQL as parameters where this makes sense. Table-valued parameters (TVPs) also make sense here for sending multiple values where needed.
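The difference between concatenating input into SQL text and passing it as a parameter can be sketched outside SQL Server. The example below is only an analogy: it uses Python's standard `sqlite3` module as a stand-in database, and the `users` table and the function names are hypothetical. Placeholder binding plays the role that parameters passed to sp_executesql play in T-SQL: the value travels as data and is never parsed as SQL.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_users_unsafe(conn, name):
    # String concatenation: the input becomes part of the SQL text itself.
    sql = "SELECT id FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_users_safe(conn, name):
    # Placeholder binding: the input is sent as a value, never as SQL text.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_users_unsafe(conn, payload))  # the injected OR clause matches every row
    print(find_users_safe(conn, payload))    # no user has that literal name
```

The injected string changes the meaning of the concatenated query but is treated as an ordinary, non-matching name by the parameterized one.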

  • I reread number 1, and it is an agree/strongly disagree: "SET NOCOUNT ON and TRANSACTION ISOLATION LEVEL READ UNCOMMITTED should be there at the beginning of Stored Procedure."

    You do know the problems that you can experience when using transaction isolation level READ UNCOMMITTED, don't you? It allows dirty reads and phantom reads. If the data MUST BE RIGHT, you don't want this.

  • I kinda disagree with #27 and #30, though I guess it depends on what version of SQL Server we're talking about. I wouldn't use @@ERROR; I'd use ERROR_NUMBER() and the other ERROR* functions instead. Plus, I'd wrap things in BEGIN TRY/BEGIN CATCH and deal with the error and any pending transaction in the CATCH (oh, that reminds me: you don't mention how to check for or deal with an uncommittable transaction).

    In #30, what does "properly return" mean? Output parameters? A custom error message? Why bother? The client will receive the error message (unless you swallow it in a CATCH, which you probably shouldn't do...I generally re-THROW the original error).
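The shape of that pattern (do the work in a transaction, undo it on failure, then let the original error reach the caller) can be sketched as follows. This is an analogy rather than T-SQL: it uses Python's standard `sqlite3` module, the `accounts` table and function names are hypothetical, `try`/`except` stands in for TRY/CATCH, and the bare `raise` stands in for re-THROW. The `in_transaction` check is only a loose counterpart to inspecting the transaction state in a CATCH block.

```python
import sqlite3


def make_accounts_db():
    """Create a hypothetical accounts table that forbids negative balances."""
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 0)]
    )
    return conn


def transfer(conn, from_id, to_id, amount):
    """Credit to_id, then debit from_id; undo both if either statement fails."""
    try:
        conn.execute("BEGIN")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id)
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        # Roll back any pending work, then re-raise the original error unchanged.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

An overdraft fails on the second UPDATE, the earlier credit is rolled back, and the caller still receives the original constraint error rather than a custom return value.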

    Maybe you should qualify this as being best practices for you. 🙂

  • Hey, it's been a long time since anyone responded on this thread, but in case anyone is interested, I've recently published a blog post dealing with this issue of clustered indexes:
