Indexing Issue

  • Ok, from what you're saying I can see several key issues.

    1. It's all in a single table. Big mistake. Huge.

    Manufacturer references, vendor catalogue data, sales quantities and pricing, rep details, etc. have no business in the same table.

    2. Incorrect choice of clustered index.

    Because of that, logically connected parts are scattered all over the table, and a transaction affects pages containing data for totally unrelated parts.

    3. Updating such a fat table creates a long transaction with heavy locking. You lock every page holding data from the affected row(s) and every index referencing that data. And it does not matter how many fields you actually update - the whole record gets rewritten anyway.

    I doubt you need to update the Manufacturer Part Number very often. But keeping it in a single fat record causes it to be locked by any pricing update, possibly even for totally unrelated parts.

    4. When you select "whatever columns the user selects" you still have to read the whole record(s) matching the criteria.

    It still has to apply a shared lock covering every column in the recordset. So, if some other user is updating a price, it will block you from searching manufacturer part numbers.
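    You can see it for yourself (table and column names here are hypothetical): leave a price update open in one session and try reading an unrelated column in another - under the default READ COMMITTED isolation (no read committed snapshot), the read blocks:

    -- Session 1: a pricing update, transaction deliberately left open
    BEGIN TRAN;
    UPDATE dbo.Catalogue_product
    SET UnitPrice = 9.99
    WHERE productkey = 42;

    -- Session 2: blocked, even though it reads a "different" column -
    -- the shared lock it needs conflicts with the exclusive lock on the row
    SELECT ManufacturerPartNumber
    FROM dbo.Catalogue_product
    WHERE productkey = 42;

    -- Session 3: watch the waiting lock request
    SELECT request_session_id, resource_type, request_mode, request_status
    FROM sys.dm_tran_locks
    WHERE resource_database_id = DB_ID()
      AND request_status = 'WAIT';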

    Splitting the data into several tables will allow you to have multiple clustered indexes over the same recordset, and it will localise the effect of an update to the part that is actually updated. Not to mention smaller IO load, which means faster completion and a shorter locking period.

    If the subset of columns a user selects comes from one or two tables within the joined group, SQL Server may choose to read only the tables which contain the requested data. Updating a delivery status will never block another user from searching part descriptions in the Catalogue.
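    A minimal sketch of what such a split could look like - every name here is invented for illustration:

    -- Core product identity: one row per part, clustered on the surrogate key
    CREATE TABLE dbo.Product (
        ProductKey       int         NOT NULL
            CONSTRAINT PK_Product PRIMARY KEY CLUSTERED,
        VendorPartNumber varchar(50) NOT NULL
            CONSTRAINT UQ_Product_VendorPartNumber UNIQUE
    );

    -- Pricing lives apart: a price change locks pages here, not the catalogue
    CREATE TABLE dbo.ProductPricing (
        ProductKey         int   NOT NULL
            CONSTRAINT FK_ProductPricing_Product REFERENCES dbo.Product (ProductKey),
        EffectiveStartDate date  NOT NULL,
        Price              money NOT NULL,
        CONSTRAINT PK_ProductPricing PRIMARY KEY CLUSTERED (ProductKey, EffectiveStartDate)
    );

    -- Manufacturer references clustered on the part number,
    -- so manufacturer lookups never touch the pricing pages
    CREATE TABLE dbo.ManufacturerPart (
        ProductKey             int         NOT NULL
            CONSTRAINT FK_ManufacturerPart_Product REFERENCES dbo.Product (ProductKey),
        ManufacturerKey        int         NOT NULL,
        ManufacturerPartNumber varchar(50) NOT NULL
    );
    CREATE CLUSTERED INDEX CIX_ManufacturerPart
        ON dbo.ManufacturerPart (ManufacturerPartNumber);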

    All your problems and performance issues come from violating those boring normalisation rules.

    Now - to all the haters of Joe Celko - tell me he is wrong. 🙂

    _____________
    Code for TallyGenerator

  • Sergiy - Wednesday, November 15, 2017 2:34 PM

    Thanks Sergiy!

    A few more notes --

    1. I am going to delay comment on this point until later in this response...

    2. Clustered index - every operation the user can perform on this data is in the form "update #Catalogue_product set ... where productkey = X".  So we thought that assigning productkey as the clustered index was the correct thing to do: locate the row needing to be updated quickly and do the update.  Given that every update uses only the productkey in the where clause, how would you recommend we structure the clustered index?

    3. I get your point here -- for ease of coding we decided to update the whole record, all the fields, regardless of what the user actually changed.  Do you think moving to a solution that only updates the fields that changed will decrease the overall end-to-end time for the user?  I am asking not just about the time to execute the queries, but also the time to execute the code that builds all the individual SQL update statements on a product-by-product basis.  In most cases, the user is simply uploading a spreadsheet with all these values on it, and we are not bothering to determine the deltas - we just update everything.

    4. I understand your point here and it is giving me reason to rethink this all.  So going back to point #1 - let's assume I broke the one table into several smaller tables, each with a 1-to-1 mapping to the "main table".  That would mean each table would have the product key as its clustered index, correct?  Now from an application standpoint, if I chose to be lazy and not actually determine what has changed in the product, and decided to update all the rows in all the tables, I would be in the same position I am in today, correct?

    It seems like I might be in this mess because we took the easy way out and didn't determine the true delta of the changes so that only the fields that actually need updating get touched.  In other words, even if I normalize the table but update the entire row, I am locking the whole darn thing up anyway.  Correct?

    This is a classic catch-all query.
    I had this exact same issue to deal with: a search screen with an almost infinite number of possibilities.

    I captured the parameters and values for each execution of the proc behind this search screen.
    The data showed that over 90% of the searches returned all fields, with part number as the only parameter being used.  The second most common case was no parameters at all - return everything.

    Armed with that info: if no parameters were supplied, we called the "return all" proc, but limited it to 25% of the total records.  If the part number was supplied and all the columns were picked, we called a proc tuned specifically for that.
    For any other combination, we called a dynamic SQL proc, as sketched below.
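    The dispatch might look something like this (proc names and parameters invented for the sketch):

    CREATE PROCEDURE dbo.SearchParts
        @PartNumber  varchar(50)   = NULL,
        @AllColumns  bit           = 1,
        @OtherFilter nvarchar(200) = NULL
    AS
    BEGIN
        SET NOCOUNT ON;

        IF @PartNumber IS NOT NULL AND @AllColumns = 1
            EXEC dbo.SearchParts_ByPartNumber @PartNumber;  -- the tuned proc
        ELSE IF @PartNumber IS NULL AND @OtherFilter IS NULL
            EXEC dbo.SearchParts_ReturnAll;                 -- capped "return all"
        ELSE
            EXEC dbo.SearchParts_Dynamic @PartNumber, @OtherFilter;  -- dynamic SQL catch-all
    END;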

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John - Wednesday, November 15, 2017 3:20 PM

    Thanks Mike - what would a "proc tuned specifically for that" be?  I am specifically interested in the tuning part.

  • tfeuz - Wednesday, November 15, 2017 3:14 PM

    2. When you pick a record "where productkey = X", clustering does not matter at all.
    A bookmark lookup for a single record adds minimal overhead (assuming productkey has a unique constraint).
    So, productkey does not have to be clustered; I'd say it has to be non-clustered.

    Clustering is important when you select records by range.
    In your case good candidates for clustering would be:
    - EffectiveEndDate (for filtering "current" parts for any particular date);
    - EditStatus (not sure, I'd need to know the actual meaning of it);
    - SupplierKey (only if you search for parts from different suppliers separately);
    - VendorPartNumber.

    ManufacturerPartNumber must be part of the clustered index in the ManufacturerPart table, with ManufacturerKey as a non-clustered PK.
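    Something along these lines - a sketch only, with the column list cut down to the candidates above:

    CREATE TABLE dbo.CatalogueProduct (
        ProductKey       int     NOT NULL,
        EffectiveEndDate date    NOT NULL,
        EditStatus       char(1) NOT NULL,
        -- productkey stays unique, but non-clustered
        CONSTRAINT PK_CatalogueProduct PRIMARY KEY NONCLUSTERED (ProductKey)
    );

    -- cluster on the range column the searches actually filter by,
    -- so the "current" parts for any given date sit together
    CREATE CLUSTERED INDEX CIX_CatalogueProduct_EffectiveEndDate
        ON dbo.CatalogueProduct (EffectiveEndDate);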

    I hope this gives you an idea.

    3. For ease of coding you may create a view based on those joined tables and update the view.
    If the view does not have aggregations or other complications (see Updatable Views in BOL), updates against it will work as well as against your table. SQL Server will figure out which tables are affected by the update and apply exclusive locks only against those tables, not against every table involved in the view.
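    A sketch, reusing the hypothetical Product and ProductPricing tables from earlier in the thread (note an update through a join view may touch columns of only one base table per statement):

    CREATE VIEW dbo.ProductForEdit
    AS
    SELECT p.ProductKey, p.VendorPartNumber, pr.EffectiveStartDate, pr.Price
    FROM dbo.Product AS p
    INNER JOIN dbo.ProductPricing AS pr ON pr.ProductKey = p.ProductKey;
    GO

    -- Only ProductPricing columns are touched, so only ProductPricing is X-locked
    UPDATE dbo.ProductForEdit
    SET Price = 12.50
    WHERE ProductKey = 42
      AND EffectiveStartDate = '20171101';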

    4. No, there should not be a 1-to-1 mapping.
    Each VendorPartNumber should be recorded in your database once, and only once.
    It's a natural key, a unique identifier generated by the Vendor, and you must have a unique key on this column in the VendorPart table.

    Read about normalization rules, identify your entities and create a proper relational design for your database.
    It may take a day or two, but I bet you've already spent much more resolving the performance issues caused by saving those two days.

    _____________
    Code for TallyGenerator

  • tfeuz - Wednesday, November 15, 2017 3:28 PM

    Specifically, the indexes that were put in place were specific to the part number query.  If I remember correctly, that ended up being the clustered index. 
    That also covered the "return all" query, which contained a top XXX clause with the order by on the part number. 
    The dynamic queries were evaluated, and appropriate indexes were created where we could.  Covering every possibility was essentially impossible, however, given the number of combinations.
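    In sketch form (proc names and the TOP cap are illustrative):

    -- the clustered index on PartNumber serves both procs
    CREATE PROCEDURE dbo.SearchParts_ByPartNumber
        @PartNumber varchar(50)
    AS
        SELECT *                      -- seeks straight into the clustered index
        FROM dbo.Parts
        WHERE PartNumber = @PartNumber;
    GO

    CREATE PROCEDURE dbo.SearchParts_ReturnAll
    AS
        SELECT TOP (5000) *           -- ordered scan of the same index, capped early
        FROM dbo.Parts
        ORDER BY PartNumber;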

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/
