QOD 11/21

  • quote:


    quote:


    That means that you now have to have 1 million unique last order dates in your table.


    Since when did clustered indices have to be unique?


  • quote:


    ... I just didn't expect people to read as much into the question.


    bad, bad mistake as certain threads in the past have shown.

    Frank

    http://www.insidesql.de

    http://www.familienzirkus.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • quote:


    quote:


    That means that you now have to have 1 million unique last order dates in your table.


    Since when did clustered indices have to be unique?


    Not sure why my response didn't post with the original reply. Anyway, clustered indexes do not have to be unique. You are correct.

    Jason

  • quote:


    ejdelaune:

    quote:


    So, that also means that only one customer could place an order from you per day without violating the constraints on the table. BTW - 1 million unique days from today back into history is 2739 years (with a little left over), which would cause an overflow based on SQL Server 2000's current datetime data type.


    Here you go assuming things 😉 Why does the LAST_ORDER_DT have to contain DATES WITHOUT TIME? Is there some physical law in your country that makes columns with the postfix _DT ignore the time part in a DateTime field?

    Regards, Hans!


    No, we still have time included with the datetime data type. However, if you had placed a unique constraint on the last_order_dt column, isn't it possible that 2 customers placed an order at the same exact time (again assuming, already a bad decision I know, that the column is actually being updated), which would break the constraint? Again, my assumption was wrong by placing a unique constraint on the column.

    Jason

  • ejdelaune:

    Yes, you are right about that two customers could order at exactly the same time, I wanted just to point out that you wrote

    quote:


    only one customer could place an order from you per day


    TGIF!

    Cheers everyone and I am on my way to the pub!

  • quote:


    ejdelaune:

    Yes, you are right about that two customers could order at exactly the same time, I wanted just to point out that you wrote

    quote:


    only one customer could place an order from you per day


    TGIF!

    Cheers everyone and I am on my way to the pub!


    Yes, I forgot to account for the time factor. I guess I was looking forward to the Date datatype in Yukon where we don't have to worry about time anymore. Oh well, have a great time at the pub and drink a few for me!

    Jason

  • I am amazed with how many people here have associated a clustered index with a primary key... That's the way the gui does it by default, but by no means does the clustered index have to be on a primary key. You can still have a primary key that isn't clustered on a table with a clustered index.

    But, if the field(s) are not unique in the clustered index then a unique identifier will be added to the index. You just won't see it. This adds space to the table. General rule is to have your clustered indexes unique so the unique identifier isn't added. Remember too, that secondary indexes on a table with a clustered index point to the 'key value' in the clustered index (Great for index rebuilds). So any secondary indexes created on the table will have to have an extra column to provide for the unique identifier in the clustered index by date.

    Basically, there isn't enough info provided to decide what the best choice would be. If there are 50 columns I would go with a covered secondary index. Etc. There are a lot of factors that should be included when deciding where, when, and how to create an index.


    "Keep Your Stick On the Ice" ..Red Green

  • quote:


    I am amazed with how many people here have associated a clustered index with a primary key...


    I'm guilty of this statement. For a long time, I always thought that clustered index and primary key were synonymous. In fact, as you can tell by my first posting today, I still confuse the two.

    So, is it a general consensus that DBAs should, most of the time, create clustered indexes on the primary key column(s), and if so, when is it acceptable to create a clustered index on a column that is not part of the primary key?

    Jason

  • So... let me get this straight. No clustered indexes on tables that receive updates? Are clustered indexes then to be left only for dss and olap?

    Yes. It's true. Adding an index, clustered or not, has a negative impact on the performance of inserts, updates and deletes. It also can have a positive impact on selects. Does one outweigh the other? Maybe! In general, databases have mixed duty for both reads and writes. This can lead to lengthy discussion and testing to determine the best indexing strategies to use.

    However, the question stated that the query - a select - was either the only query in use, or at the very least the most frequently used query (as mentioned in the email version). This would seem to indicate that the job at hand is to optimize this table for read operations and not for writes.

    Further, has everyone forgotten the value of setting a fill factor appropriate to your environment? It is possible that you could find an appropriate fill factor that would allow all inserts to occur without splits and still maintain fast selects. Regular maintenance could check the fragmentation of this index and then defrag it as necessary.

    --just my 2 bits...

    -Dan B

  • A lot of debate, and good debate (IMO) where it comes to this question. Looking it over, I see that I made some assumptions about this table as well. Specifically, I made the subconcious assumption that the LAST_ORDER_DT / START_DT field [e-mail / web page] was being used Date only, not Date and Time. If this was true, then in almost all cases, a non-clustered index will not be used due to lack of selectivity.

    I also made the assumption that there are no dates in the table greater than the date in the source query (> '12/1/2003'). Did anyone else?

    Now, in the real world, the answer to this question will not be what we have here, since I can think of no cases where a customer table will have the given query as the only SELECT run against the table. But remember -- these are not the real world. These are "what if" scenarios (what if the moon was made of cheeze-whiz?), not reality 🙂

  • quote:


    So... let me get this straight. No clustered indexes on tables that receive updates?


    Not exactly. What I said was no clustered indexes on COLUMNS that are updated. What they cause is this. When a clustered index column is updated, the record is no longer in the proper location in the index. SQL Server must move the updated record to the correct location in the clustered index. If the data page at the new location is too full to accept the record, it must be spit. Fill factor settings mitigate this somewhat, but to be effective in this kind of situation, they would introduce a lot of wasted space. So you probably wind of with a lot of page splitting.

    Also, if there are non-clustered indexes, they all reference the cluster key value. So every update to a cluster key value requires every non-clustered index on that table to be updated as well.

    I suppose there could be exceptions to this rule (isn't there always?), but off-hand, none come to mind.

    quote:


    So, is it a general consensus that DBAs should, most of the time, create clustered indexes on the primary key column(s), and if so, when is it acceptable to create a clustered index on a column that is not part of the primary key?


    That is my usual method. The primary key is a surrogate key, so nobody is tempted to update it. I almost always use an integer with the identity property. Being only 4 bytes, it makes for more compact non-clustered indexes thereby improving performance. A GUID would also work, but is less compact.

    I then make the primary key clustered. Actually, SQL Server makes primary keys clustered by default, if a clustered index does not already exist. This is good, because my databases are Very relational, so foreign keys refering to this primary key can do so through the clustered index.

    This leads up to my answer of the question, when not to make the primary key clustered?

    I sometimes make foreign key columns clustered for child tables. For example, you have an Order table with OrderID as the primary key. You also have an OrderDetail table with a primary key OrderDetailID and a foreign key OrderID. Because access to the OrderDetail table is usually from joining to the Order table, sometimes with order ranges, it makes sense to make the foreign key (OrderID) in the OrderDetail table the clustered index. This eliminates the bookmark operation that you frequently see in execution plans for join operations. For those unfamiliar with the bookmark operation, this is where SQL Server has retrieved the record from the non-clustered index and now must locate the actual data in the clustered index.

    Eliminating the bookmark operation can be a substantial performance boost, especially if you can do this across multiple joins, say customer-order-orderdetail.

    My $.02 worth.

    Larry Ansley

    Atlanta, GA


    Larry Ansley
    Atlanta, GA

  • quote:


    If the data page at the new location is too full to accept the record, it must be spit.


    And this is very vulgar, especially in public.

    Larry Ansley

    Atlanta, GA


    Larry Ansley
    Atlanta, GA

  • I think i misunderstand the clustered index!

    I have a table with the following structure

    CREATE TABLE [Tb_Amz_UnitChange81] (

    [Fd_Ucg_Term] [char] (8) ,

    [Fd_Ucg_Regno] [int] NOT NULL ,

    [Fd_Ucg_ClassId] [int] NOT NULL ,

    [Fd_Ucg_CourseId] [int] NOT NULL ,

    [Fd_Ucg_Action] [smallint] NOT NULL ,

    [Fd_Ucg_FormType] [char] (8) ,

    [Fd_Ucg_PermitNumber] [char] (15) ,

    [Fd_Ucg_PermitDate] [char] (10) ,

    [Fd_Ucg_SaveUserName] [char] (50) ,

    [Fd_Ucg_FormNumber] [smallint] NOT NULL ,

    [Fd_Ucg_Year] [smallint] NOT NULL ,

    CONSTRAINT [PK_Tb_Amz_UnitChange81] PRIMARY KEY CLUSTERED

    (

    [Fd_Ucg_Term],[Fd_Ucg_Regno],[Fd_Ucg_ClassId],[Fd_Ucg_CourseId],

    [Fd_Ucg_Action],[Fd_Ucg_FormType],[Fd_Ucg_FormNumber],[Fd_Ucg_Year]

    ) )

    we have many insert and delete against the table but no update .and

    most of our query like the following :

    select * from Tb_Amz_UnitChange81

    where Fd_Ucg_Year=xxxx

    and Fd_Ucg_Term='xxxxxxxx'

    and Fd_Ucg_Regno=xxxxxx

    I have two questions:

    1-should I change primary key from clustered to non clustered?since we don't need any presort on table .

    2-On a table we have a varchar (30) column for family .most of searching and sorting is based on this column ,

    is it a good idea to create clustered index on this column?

    I read somewhere when we use clustered index ,all nonclustered indexes carry value of clustered index as the key

    so it is not recommended clustered index on large column.

    any idea would be appreciate.

  • First remark: the phrasing of the correct answer was misleading. At first I thought it said to make a new primary key on the datetime column. That would have been rubbish.

    Secondly, one important piece of information was missing: approximatly how many rows the query returns.

    It it returns a large part of the table, SQL Server would alway do a table scan and the last answer would be right. If not, adding an index on the datetime column would help.

    Making that one clustered would help the query, but would slowdown updates on that column.

    Please provide more information next time for this kind of questions. Performance is influenced by more than one or two things.

    Erik

  • Am I missing something? Wouldn't creating an index on the three field in this query improve the performance more than using a clustered index, without the side effects and complications mentioned in many posts? IOW, create an index on LAST_ORDER_DT, CUSTOMER_NM, and COMPANY_NM. Although I don't usualy create indexes for specfic queries, it might be in order for one being executed once per second


    Sloan

    If you can't do it in SQL, get a bigger hammer!

Viewing 15 posts - 31 through 45 (of 54 total)

You must be logged in to reply to this topic. Login to reply