Best primarykey and index

  • Hi all, i've to refactor a database with several tables with around 100.000.000 records each one.

    Each table has the following relevant columns:

    - ID (bigint)

    - Tenant (actually i have 10k tenants)

    - Year (all info has Always a filter search by year)

    - other 10/20 simple columns

    I was thinking to have:

    1) pk not clustered on ID

    2) clustered on (tenant,year,id)

    3) otger accessory not clustered index

    In alternative, i was thinking to apply a partition by year in each table.

    What do you think Will be the optimal architetture for thia database?

    All queries are by tenant and year.

    All updates are by id.

    The growth Will be of ~8.000.000 records year on many tables.

    Thanks

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • I have to wonder why the Updates are being done by "Id" rather than by tenant and year.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi, the scenario most of the times is:

    1)Loading data in a grid (by tenant,year)

    2)user perform update on some rows, update by id

  • mto89 wrote:

    Hi, the scenario most of the times is:

    1)Loading data in a grid (by tenant,year)

    2)user perform update on some rows, update by id

    if you do your clustered index as you mentioned update by tenant, year, id may work better than just id.

  • mto89 wrote:

    1)Loading data in a grid (by tenant,year)

    Just one tenant at a time?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes all Is segregated by single tenant

  • Sounds to me like Tenant/Year might be the way to go for the queries, then.  Don't be alarmed by the high fragmentation that will occur on that clustered index.  It's just not going to matter and, unless you have "ExpAnsive" Updates involved (where rows become longer from things like a NULL Modified_BY or other variable width column being updated to a larger value), you won't have a thing to worry about when it comes to page density.  Such is the nature of "Sequential Siloed" indexes, which is what this will be.  I'd likely never defrag it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This sounds more like a learning exercise than a real world situation.

    If it is a real situation you will have existing table structures and indexes. These will have usage stats to say how often they are used. If you have turned on Query Store (QS) you will have stats on the most frequent and worst performing queries. All of this information must be taken into account when doing a restructure.  Therefore the first place to start is looking at the stats. If QS is not turned on then get it active and capture at least 2 weeks stats before doing any design work.  Using real-world data will get a more usable and performant result than using predicted usage.

    If this is a learning exercise then you do not have any stats and are dependent on the very sparse details given in the exercise. All you have to go on is predicted usage which may differ greatly from what the real world gives.  To a large extent the design you suggest for a learning exercise is not overly important. The most important aspect is to present your reasons for choosing your design.

    Show why you chose a particular clustered index, a non-clustered index, or a columnstore index.  Highlight the tasks needed to do the refactoring of the various size tables, and also show why your design will need to be reviewed when real-world data is available.  Hopefully the examiner will also be more interested in your reasoning than if your design matches what they have as a hypothetical 'right' answer.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • This was removed by the editor as SPAM

  • it is a good idea to carefully analyze the workload on the database and the types of queries that will be run before deciding on the optimal architecture. You may want to consider using tools such as indexes, partitioning, and materialized views to optimize the performance of your database. It is also a good idea to monitor the performance of the database over time and make adjustments as needed to ensure that it continues to meet the needs of your application.

  • To be absolutely clear, you want a:

    2) unique clustered [index] on (tenant,year,id)

    And if you partition the data (which typically wouldn't be necessary here), partition it by (tenant, year) not just tenant.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    To be absolutely clear, you want a:

    2) unique clustered [index] on (tenant,year,id)

    And if you partition the data (which typically wouldn't be necessary here), partition it by (tenant, year) not just tenant.

    Supposedly, "Tenant" has more than 10 thousand members to it.  Why would you partition on it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    ScottPletcher wrote:

    To be absolutely clear, you want a:

    2) unique clustered [index] on (tenant,year,id)

    And if you partition the data (which typically wouldn't be necessary here), partition it by (tenant, year) not just tenant.

    Supposedly, "Tenant" has more than 10 thousand members to it.  Why would you partition on it?

    I would so that it matches the clustered index (leading column) and how you query the table.  Again, you typically wouldn't cluster this table, but if you did ...

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    Jeff Moden wrote:

    ScottPletcher wrote:

    To be absolutely clear, you want a:

    2) unique clustered [index] on (tenant,year,id)

    And if you partition the data (which typically wouldn't be necessary here), partition it by (tenant, year) not just tenant.

    Supposedly, "Tenant" has more than 10 thousand members to it.  Why would you partition on it?

    I would so that it matches the clustered index (leading column) and how you query the table.  Again, you typically wouldn't cluster this table, but if you did ...

    taking in consideration the known fact that there are 10k tenants on the table, partitioning by tenant and year is a very bad advise as they would only be able to have a single year worth of data on that table due to the again know limit of 15k partitions per table.

    as the OP clearly stated that the filtering is ALWAYS by Tenant and Year having a partition by year would work well without reaching the 15k partition limit, would do partition elimination as desired, and then further filtering based on tenant.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply