Best primary key and index

mto89 SSC Enthusiast Points: 179 · November 24, 2022 at 8:45 pm

Hi all, I have to refactor a database with several tables of around 100,000,000 records each.
Each table has the following relevant columns:
- ID (bigint)
- Tenant (I currently have 10k tenants)
- Year (every search always filters by year)
- 10 to 20 other simple columns
I was thinking of having:
1) a nonclustered PK on ID
2) a clustered index on (tenant, year, id)
3) other supporting nonclustered indexes
Alternatively, I was thinking of partitioning each table by year.
What do you think would be the optimal architecture for this database?
All queries are by tenant and year.
All updates are by id.
Growth will be about 8,000,000 records per year on many tables.
Thanks
Jeff Moden SSC Guru Points: 1003872 · Answer 2

I have to wonder why the Updates are being done by "Id" rather than by tenant and year.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mto89 SSC Enthusiast Points: 179 · Answer 3

Hi, most of the time the scenario is:

1) Data is loaded into a grid (by tenant, year)

2) The user updates some rows; each update is by id

frederico_fonseca SSCoach Points: 16023 · Answer 4

mto89 wrote:

Hi, most of the time the scenario is:
1) Data is loaded into a grid (by tenant, year)
2) The user updates some rows; each update is by id

If you build the clustered index as you described, updating by (tenant, year, id) may work better than updating by id alone.
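
A minimal sketch of the two update styles, assuming a hypothetical table dbo.Invoice that is clustered on (Tenant, [Year], ID) and has a nonclustered primary key on ID. The table, column, and variable names are illustrative and do not come from the original post.

```sql
-- Hypothetical names; assumes the clustered index is (Tenant, [Year], ID).
DECLARE @Tenant int = 42, @Year smallint = 2022, @ID bigint = 1234567;

-- Update by ID only: the row is first located through the nonclustered PK on ID,
-- then reached in the clustered index through the clustering key stored in that PK.
UPDATE dbo.Invoice
SET    Amount = 100.00
WHERE  ID = @ID;

-- Update by the full clustered key: a single seek on the clustered index.
-- The grid already knows Tenant and Year because it loaded the rows by them.
UPDATE dbo.Invoice
SET    Amount = 100.00
WHERE  Tenant = @Tenant
  AND  [Year] = @Year
  AND  ID     = @ID;
```

Comparing the actual execution plans of both statements on real data is the way to confirm whether the difference matters for a given workload.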

Jeff Moden SSC Guru Points: 1003872 · Answer 5

mto89 wrote:

1)Loading data in a grid (by tenant,year)

Just one tenant at a time?

mto89 SSC Enthusiast Points: 179 · Answer 6 · November 27, 2022 at 11:50 am

Yes, everything is segregated by single tenant.

Jeff Moden SSC Guru Points: 1003872 · Answer 7

Sounds to me like Tenant/Year might be the way to go for the queries, then. Don't be alarmed by the high fragmentation that will occur on that clustered index. It's just not going to matter and, unless you have "ExpAnsive" Updates involved (where rows become longer from things like a NULL Modified_BY or other variable width column being updated to a larger value), you won't have a thing to worry about when it comes to page density. Such is the nature of "Sequential Siloed" indexes, which is what this will be. I'd likely never defrag it.
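
For anyone who wants to watch fragmentation and page density rather than take them on faith, here is a rough sketch using sys.dm_db_index_physical_stats. The table name dbo.Invoice is a placeholder, not something from this thread.

```sql
-- Hypothetical table name; run in the database that owns the table.
-- SAMPLED mode is used because avg_page_space_used_in_percent is NULL in LIMITED mode.
SELECT  i.name                            AS index_name,
        ps.avg_fragmentation_in_percent,  -- logical fragmentation
        ps.avg_page_space_used_in_percent, -- page density
        ps.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID(N'dbo.Invoice'), NULL, NULL, 'SAMPLED') AS ps
JOIN    sys.indexes AS i
          ON  i.object_id = ps.object_id
          AND i.index_id  = ps.index_id
ORDER BY i.index_id;
```

If the page density figure stays high over time while fragmentation climbs, that matches the behavior described above for this kind of index.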

EdVassie SSC Guru Points: 60445 · Answer 8

This sounds more like a learning exercise than a real world situation.

If it is a real situation you will have existing table structures and indexes. These will have usage stats to say how often they are used. If you have turned on Query Store (QS) you will have stats on the most frequent and worst performing queries. All of this information must be taken into account when doing a restructure. Therefore the first place to start is looking at the stats. If QS is not turned on then get it active and capture at least 2 weeks stats before doing any design work. Using real-world data will get a more usable and performant result than using predicted usage.
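
As a starting point for gathering those stats, something like the following could be used. It is only a sketch: it assumes SQL Server 2016 or later (for Query Store) and that it is run in the database being refactored.

```sql
-- Turn on Query Store for the current database and let it collect data.
ALTER DATABASE CURRENT SET QUERY_STORE = ON;
ALTER DATABASE CURRENT SET QUERY_STORE (OPERATION_MODE = READ_WRITE);

-- How often each index has been read and written.
-- These counters are reset when the instance restarts.
SELECT  OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
        OBJECT_NAME(i.object_id)        AS table_name,
        i.name                          AS index_name,
        us.user_seeks,
        us.user_scans,
        us.user_lookups,
        us.user_updates
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
          ON  us.object_id   = i.object_id
          AND us.index_id    = i.index_id
          AND us.database_id = DB_ID()
WHERE   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, i.index_id;
```

Indexes that show NULL or zero reads but many user_updates are candidates for review once enough activity has been captured.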

If this is a learning exercise then you do not have any stats and are dependent on the very sparse details given in the exercise. All you have to go on is predicted usage which may differ greatly from what the real world gives. To a large extent the design you suggest for a learning exercise is not overly important. The most important aspect is to present your reasons for choosing your design.

Show why you chose a particular clustered index, a non-clustered index, or a columnstore index. Highlight the tasks needed to do the refactoring of the various size tables, and also show why your design will need to be reviewed when real-world data is available. Hopefully the examiner will also be more interested in your reasoning than if your design matches what they have as a hypothetical 'right' answer.

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Sasuke2690 Newbie Points: 2 · Answer 9

It is a good idea to carefully analyze the workload on the database and the types of queries that will be run before deciding on the optimal architecture. You may want to consider using tools such as indexes, partitioning, and indexed views (SQL Server's form of materialized views) to optimize the performance of your database. It is also a good idea to monitor the performance of the database over time and make adjustments as needed to ensure that it continues to meet the needs of your application.

ScottPletcher SSC Guru Points: 100949 · Answer 10

To be absolutely clear, you want a:

2) unique clustered [index] on (tenant,year,id)

And if you partition the data (which typically wouldn't be necessary here), partition it by (tenant, year) not just tenant.
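
A minimal sketch of that layout, with a nonclustered primary key on ID and a unique clustered index on (tenant, year, id). The table name, index names, and the data types of Tenant and Year are assumptions made for illustration; only the bigint ID comes from the original post.

```sql
-- Hypothetical table; only the key columns from the thread are shown.
CREATE TABLE dbo.Invoice
(
    ID     bigint   NOT NULL,
    Tenant int      NOT NULL,   -- data type is an assumption
    [Year] smallint NOT NULL,   -- data type is an assumption
    -- ... the other 10 to 20 columns would go here ...
    CONSTRAINT PK_Invoice PRIMARY KEY NONCLUSTERED (ID)
);

-- The clustered index matches how the data is queried: by tenant, then year.
-- ID is added as the last key column so the index can be declared UNIQUE.
CREATE UNIQUE CLUSTERED INDEX CIX_Invoice_Tenant_Year_ID
    ON dbo.Invoice (Tenant, [Year], ID);

-- Typical grid load: a range seek on the two leading clustered key columns.
DECLARE @Tenant int = 42, @Year smallint = 2022;

SELECT ID, Tenant, [Year]
FROM   dbo.Invoice
WHERE  Tenant = @Tenant
  AND  [Year] = @Year;
```

Declaring the index UNIQUE spares SQL Server from adding a hidden uniquifier to duplicate clustered keys.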

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Jeff Moden SSC Guru Points: 1003872 · Answer 11

ScottPletcher wrote:

To be absolutely clear, you want a:
2) unique clustered [index] on (tenant,year,id)
And if you partition the data (which typically wouldn't be necessary here), partition it by (tenant, year) not just tenant.

Supposedly, "Tenant" has more than 10 thousand members to it. Why would you partition on it?

ScottPletcher SSC Guru Points: 100949 · Answer 12

Jeff Moden wrote:

Supposedly, "Tenant" has more than 10 thousand members to it. Why would you partition on it?

I would so that it matches the clustered index (leading column) and how you query the table. Again, you typically wouldn't partition this table, but if you did ...

frederico_fonseca SSCoach Points: 16023 · Answer 13

ScottPletcher wrote:

I would so that it matches the clustered index (leading column) and how you query the table.

Taking into consideration the known fact that there are 10k tenants in the table, partitioning by tenant and year is very bad advice. With one partition per tenant per year, the table could hold only a single year's worth of data because of the known limit of 15k partitions per table: 10,000 tenants × 2 years would already need 20,000 partitions.

As the OP clearly stated that the filtering is ALWAYS by tenant and year, partitioning by year would work well without reaching the 15k partition limit. Partition elimination would happen as desired, followed by further filtering on tenant.
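
A rough sketch of partitioning by year only. The object names, the smallint type for Year, the boundary years, and the single-filegroup placement are all assumptions made for illustration.

```sql
-- One boundary per year. With RANGE RIGHT, each boundary value starts a new
-- partition, so 4 boundaries give 5 partitions (everything before 2020 is in the first).
CREATE PARTITION FUNCTION pfYear (smallint)
    AS RANGE RIGHT FOR VALUES (2020, 2021, 2022, 2023);

-- Map every partition to the PRIMARY filegroup to keep the example simple.
CREATE PARTITION SCHEME psYear
    AS PARTITION pfYear ALL TO ([PRIMARY]);

-- Hypothetical table; only the key columns from the thread are shown.
CREATE TABLE dbo.InvoiceByYear
(
    ID     bigint   NOT NULL,
    Tenant int      NOT NULL,   -- data type is an assumption
    [Year] smallint NOT NULL    -- data type is an assumption
);

-- Building the clustered index on the partition scheme partitions the table by [Year].
CREATE UNIQUE CLUSTERED INDEX CIX_InvoiceByYear
    ON dbo.InvoiceByYear (Tenant, [Year], ID)
    ON psYear ([Year]);
```

One thing to keep in mind with this design: a unique index on ID alone cannot be aligned with the partition scheme, because a partitioned unique index must include the partitioning column in its key. Such an index would either need Year added to its key or be created non-aligned on a regular filegroup.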