Best practice when to use an index

  • quote:


    that's what the tables are. 2, maybe 3 columns.

    the data inserted hardly never changes. From year to year I guess.


    If the table is only a couple of columns wide and the average row length is, say, 100 bytes (assuming you have an integer field and a varchar() field of an average 96 bytes) then you can get nearly 80 rows in a single page!

    Allow perhaps one or two more pages for each index, and we really aren't talking much I/O here or even processor cycles. Not a big thing, and probably not the bottleneck in your application!

    Regards

    Si

  • Hi Si,

    quote:


    If the table is only a couple of columns wide and the average row length is, say, 100 bytes (assuming you have an integer field and a varchar() field of an average 96 bytes) then you can get nearly 80 rows in a single page!

    Allow perhaps one or two more pages for each index, and we really aren't talking much I/O here or even processor cycles. Not a big thing, and probably not the bottleneck in your application!


    I'm not looking for bottlenecks, just for some more or less scientific explanation whether to use this or that.

    But, you are right. In these tables is mostly an int field and a varchar(50). So more than enough will fit into one page. So, why use an index anyway, that's what I was originally asking. As in most thread it turned into something different and has opened new alternatives to think seriously over.

    Cheers,

    Frank

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • quote:


    So, why use an index anyway, that's what I was originally asking.


    In my book, indexes are for data integrity or performance.

    Since performance is unlikely to be an issue here, the question is whether you need to use an index for data integrity I guess. Am I right in thinking that a UNIQUE constraint in SQL Server is an index anyway?

  • Hi Si,

    quote:


    In my book, indexes are for data integrity or performance.


    please don't get me wrong, but it seems that a lot of people posting in this forum are writing books! Am I missing something?

    quote:


    Since performance is unlikely to be an issue here, the question is whether you need to use an index for data integrity I guess. Am I right in thinking that a UNIQUE constraint in SQL Server is an index anyway?


    that's a tricky question!

    From my understanding a UNIQUE constraint only enforces uniqueness (like a PrimaryKey), but as you are able to define such a constraint on a column that allows null values, it is an NOT index per definition.

    Cheers,

    Frank

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • quote:


    quote:


    In my book, indexes are for data integrity or performance.


    please don't get me wrong, but it seems that a lot of people posting in this forum are writing books! Am I missing something?


    "In my book" in an English idiom - it roughly translates to "to my knowledge" or "in my opinion".

    quote:


    From my understanding a UNIQUE constraint only enforces uniqueness (like a PrimaryKey), but as you are able to define such a constraint on a column that allows null values, it is an NOT index per definition.


    Just checked in Books Online, and...

    Primary Keys do not allow NULLs, but UNIQUE constraints and indexes do. Creating either one implicitly creates an index.

    --

    Si Chan

    Database Administrator

  • Hi Si,

    quote:


    "In my book" in an English idiom - it roughly translates to "to my knowledge" or "in my opinion".


    Well, it hits me again

    quote:


    Just checked in Books Online, and...

    Primary Keys do not allow NULLs, but UNIQUE constraints and indexes do. Creating either one implicitly creates an index.


    You're right, I've found it, too

    Cheers,

    Frank

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • someone said

    I tend to save my clustered index for something that would benefit from it. Normally this would be the primary key but I don't feel there is much to be gained from clustering an identity column.

    Um, clustering is awesome for performance when you are joining tables.

    If I have a reference table (id int, name varchar) I am most definitely going to cluster index and primary key the id column. Watch the join performance increase. And since joining is the whole purpose of a reference table then of course this is best practice.

    Always, always, always cluster/primary key the ident in a reference table.


    Michael R. Schmidt
    Developer

  • quote:


    Always, always, always cluster/primary key the ident in a reference table.


    OK, I have a table with umpteen fields one of which is a date field. The most common query on this table is using ranges of dates. For this reason I put my clustered index on the date column.

    In another example my table contains a user entered alphanumeric reference to a document. This reference has to be unique but my primary key has been defined as an int identity. In this case I put my clustered index on the unique document reference column rather than the primary key.

    Why don't I put my primary key on the document reference? Its a VARCHAR(20), integer primary keys take up less room and are quicker.

    If my table is a simple look-up table then, yes, the identity column is both primary key and clustered.

  • David, quick question. When you select from this table are you joining to it or from it? When you join is it based on the identity column?

    If you are joining to it with a where on the varchar(20) unique column I would suggest trying a covering index on identity/varchar, this would allow the join to happen quickly and also allow the where clause to evaluate with the same index.

    Michael R. Schmidt

    Developer


    Michael R. Schmidt
    Developer

  • quote:


    David, quick question. When you select from this table are you joining to it or from it? When you join is it based on the identity column?


    If I am going to join then it will probably be on the identity column, which will be the primary key.

    I would say that in 80% of the time I would be joining either to or from the table.

    I'm not exactly sure what you mean by a covering index?

    I have a unclustered primary key on the identity and a unique clustered index on the varchar(20). The users are in blissful ignorance of the existence of the primary key column, as far as they are concerned their data is identified by ZZ9 plural alpha or what-ever hence their search.

    There is a pattern to their varchar(20) identifiers and yes, I did try and identify a rule that would make it a calculated value but there were so many exceptions that I left it as user entered field.

  • If you are 80% of the time joining on the identity then you will probably want that to be your primary key

    If you are joining on identity and using the varchar(20) field in a where clause at the same time you should try the following primary/covering index

    identity

    varchar(20)

    or maybe in reverse

    varchar(20)

    identity

    haven't thought through that entirely - my brain is still coming up to speed after spending two nights in the brush with the family sleeping on some very hard ground.

    I'd like to see how it performs. If it is a small table but is joined to from a large table then the speedup may be significant since it will be repetitive lookups.

    Michael R. Schmidt

    Developer


    Michael R. Schmidt
    Developer

  • My identity column is ALWAYS my primary key, otherwise there would be no point in having it. It tends to be my clustered index but won't necessarily be so.

    If I am understanding you correctly then you are suggesting that I place both my primary key and the unique VARCHAR(20) field into a clustered index.

    The primary key on my identity column would enforce uniqueness.

    The clustered compound index would aid joins and searches as the leaf nodes hold data not pointers to data.

    Is that it?

  • Something like that. I like to have the joined element clustered, it definitely helps join performance. Also, when you are using a where clause it is helpful to include that column in the index that is being used, which in this case would be the primary clustered index/key.

    Michael R. Schmidt

    Developer


    Michael R. Schmidt
    Developer

  • David:

    quote:


    My identity column is ALWAYS my primary key, otherwise there would be no point in having it.


    I don't understand your point? Why do you so forcefully express that your Identity column HAVE to be your primary key?

    Mikey:

    Say you have like in David's example a big amout of records containing dates... Say that the joins on that table still always have a WHERE or a JOIN on a date range? How do you think the strategy on always clustering your Ident column (if you are talking about single column indices) will work out performance wise? Say that the range is identifying 5% of the records, would it be faster to do a tablescan (100% of records checked) through your 'range' of Ident values to find the rows to join containing dates that match? Or would it be faster with an index seek on the range and use just those 5% rows to do your join? But if you start talking composit indices it's another story. Say you would cluster <DateColumn>,<Ident> and your table is wide and no other data in the table is used, you would most definitly have a hugh performance increase due to the fact that SS 2000 would just use the narrow Index when handling the data and not going down to table data level. So I would say it is quite impossible to have a 'Always Do Your Indices Like That' point of view. The amout of appliable different cases are to big.

    Regards, Hans!

    P.S Ready for Flame

    Edited by - HansLindgren on 07/09/2003 2:53:57 PM

  • quote:


    Mikey:

    Say you have like in David's example a big amout of records containing dates... Say that the joins on that table still always have a WHERE or a JOIN on a date range? How do you think the strategy on always clustering your Ident column (if you are talking about single column indices) will work out performance wise?


    Joins on dates are a bad idea. Anyway you can replace your dates with ints?

    There was a good article posted on this topic before

    http://www.sqlservercentral.com/columnists/bsyverson/sqldatetime.asp

    Michael R. Schmidt

    Developer

    Edited by - mikeymikey on 07/09/2003 6:37:37 PM


    Michael R. Schmidt
    Developer

Viewing 15 posts - 31 through 45 (of 53 total)

You must be logged in to reply to this topic. Login to reply