Primary Key / Index Question

  • SQL Server 2008 R2

    With the following example table

    UploadID int IDENTITY(1,1) NOT NULL,

    PeriodNo int NOT NULL,

    UploadType tinyint NOT NULL,

    col1 varchar(12) NULL,

    col2 varchar(2) NULL,

    col3 varchar(2) NULL,

    col4 varchar(2) NULL,

    col5 varchar(2) NULL,

    col6 varchar(8) NULL,

    col7 varchar(8) NULL,

    col8 varchar(8) NULL,

    col9 varchar(8) NULL,

    col10 varchar(8) NULL,

    col11 varchar(12) NULL,

    col12 varchar(4) NULL,

    col13 varchar(4) NULL,

    col14 varchar(2) NULL,

    col15 varchar(2) NULL,

    col16 varchar(3) NULL,

    col17 varchar(2) NULL,

    col18 varchar(20) NULL,

    col19 varchar(20) NULL,

    col20 varchar(11) NULL

    Insertions approx 1000-2000 per month

    col1-col20 may or may not contain values

    col1-col20 not suitable for primary key

    Inserts by PeriodNo/UploadType

    Deletes by PeriodNo, PeriodNo/UploadType or UploadID

    Select by PeriodNo/UploadType

    Rows can be deleted and reinserted many times and the data may or may not be the same

    The question is what would you suggest for a primary key and indexes to reduce fragmentation and aid performance?

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I could make a wild guess but my best answer, since you haven't provided enough information to do more than just guess, is "it depends".

    Some of the questions that need answering to get more than a guess are:-

    1) How frequent are deletes?

    2) How frequent are updates? Or are there none?

    3) What does Row mean in the statement "Rows can be deleted and reinserted many times and the data may or may not be the same"? The statement implies that there is some way of determining whether two inserts at different times are inserting (and reinserting) the same row, ie there is at least one natural key. So what natural keys are there? Is there only one, or more than one?

    It might also be useful to know something about value ranges - for example does uploadType take on all 256 possible distinct values or just a dozen?

    Tom

  • Since UploadID is an identity, I assume deleted rows won't be reinserted so pages will generally become empty instead of split.

    Personally I wouldn't care about fragmentation on such a tiny table with tiny rows. I'd probably make it a heap table. Create some indexes to handle SELECTS on the two main columns. Defrag it once every few months if the empty pages bugs you.

    Spend 80% of your time optimizing your largest and most active tables.

  • Thanks for the feedback Tom

    1) How frequent are deletes?

    Unknown as the user can run a process any number of times or not at all

    2) How frequent are updates? Or are there none?

    None, data is only inserted and deleted, ie delete rows for matching PeriodNo and UploadType then reinsert

    3) What does Row mean in the statement "Rows can be deleted and reinserted many times and the data may or may not be the same"?

    There can be any number of rows per PeriodNo/UploadType combination or none at all

    PeriodNo indicates a fiscal month ie 12 per fiscal year

    There are currently 4 Upload Types (eg 1,2,3,4) but this may increase in the future

    This table is part of a monthly process, at the start of the process rows are deleted matching the Period being run

    The user has the option of inserting data for each Upload Type,

    this process deletes rows for the selected Upload Type for the Period being run

    then inserts data from other tables for the selected Upload Type for the Period being run

    col1 may have the same data repeated in several rows or could be null

    this is the same for col2 to col20

    it is most likely that the combination of col1 to col20 would be unique but cannot be guaranteed

    (Note that the user will have the ability to delete individual rows or a selection of rows)

    The reason for this is that if the user creates the data but finds a problem with it they would correct the originating data and recreate the data as outlined above

    This can be reapeated any number of times at any frequency for any of the PeriodNo/UploadType combinations

    Or the user may elect to start the whole process from the beginning

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Thanks for teh extra informtion. Having seen that I think I would go for using UploadID as UNCLUSTERED primary key and have a CLUSTERED index with key (PeriodNo, UploadType). But there are people around here with far more expertise on choosing indexes than I have, so I'll ask one of them to step in and comment.

    Tom

  • Thanks Tom 🙂

    Far away is close at hand in the images of elsewhere.
    Anon.

  • If UploadID is used for anything at all, I might make it part of a composite clustered index. But, as Tom suggested, make it a separate UNCLUSTERED PK first.

    Or I might add a non-clustered Index which includes all three afore mentioned columns.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (6/3/2015)


    If UploadID is used for anything at all, I might make it part of a composite clustered index. But, as Tom suggested, make it a separate UNCLUSTERED PK first.

    Or I might add a non-clustered Index which includes all three afore mentioned columns.

    Thanks Brandie

    Because the nature of the data in the additional columns and the need to be able to delete a single row or a selection of rows I need UploadID to make rows unique

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David Burrows (6/3/2015)


    Brandie Tarvin (6/3/2015)


    If UploadID is used for anything at all, I might make it part of a composite clustered index. But, as Tom suggested, make it a separate UNCLUSTERED PK first.

    Or I might add a non-clustered Index which includes all three afore mentioned columns.

    Thanks Brandie

    Because the nature of the data in the additional columns and the need to be able to delete a single row or a selection of rows I need UploadID to make rows unique

    I know the feeling. We have the same issue in several of our databases. Sometimes we use the IDENTITY as a JOIN condition, sometimes it's just to keep things straight, especially when it comes to potentially duplicated data.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • TomThomson (6/3/2015)


    Thanks for teh extra informtion. Having seen that I think I would go for using UploadID as UNCLUSTERED primary key and have a CLUSTERED index with key (PeriodNo, UploadType). But there are people around here with far more expertise on choosing indexes than I have, so I'll ask one of them to step in and comment.

    If the PeriodNo is ever increasing, then yeah, I agree.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/3/2015)


    If the PeriodNo is ever increasing, then yeah, I agree.

    Thanks Jeff

    Yes PeriodNo will always be increasing

    I will go with Tom's suggestion 🙂

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David Burrows (6/3/2015)


    Jeff Moden (6/3/2015)


    If the PeriodNo is ever increasing, then yeah, I agree.

    Thanks Jeff

    Yes PeriodNo will always be increasing

    I will go with Tom's suggestion 🙂

    Just one more addition... if the two columns that Tom suggested do NOT make for a UNIQUE clustered index, consider adding the UploadID column as a 3rd column to the clustered index to make it unique and then define it that way. SQL Server has a mighty big love for unique clustered indexes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/3/2015)


    David Burrows (6/3/2015)


    Jeff Moden (6/3/2015)


    If the PeriodNo is ever increasing, then yeah, I agree.

    Thanks Jeff

    Yes PeriodNo will always be increasing

    I will go with Tom's suggestion 🙂

    Just one more addition... if the two columns that Tom suggested do NOT make for a UNIQUE clustered index, consider adding the UploadID column as a 3rd column to the clustered index to make it unique and then define it that way. SQL Server has a mighty big love for unique clustered indexes.

    Thanks Jeff 🙂

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Jeff Moden (6/3/2015)


    David Burrows (6/3/2015)


    Jeff Moden (6/3/2015)


    If the PeriodNo is ever increasing, then yeah, I agree.

    Thanks Jeff

    Yes PeriodNo will always be increasing

    I will go with Tom's suggestion 🙂

    Just one more addition... if the two columns that Tom suggested do NOT make for a UNIQUE clustered index, consider adding the UploadID column as a 3rd column to the clustered index to make it unique and then define it that way. SQL Server has a mighty big love for unique clustered indexes.

    Which is what I was saying in my first post.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Braindead comment removed. Must remember to switch brain on before applying fingers to keyboard.

    Tom

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply