Two tables, exact same structure but with different names, behave differently while performing mass updates.

Question

Two tables, exact same structure but with different names, behave differently while performing mass updates.

Jay-45986

Old Hand

Points: 386
More actions
December 26, 2017 at 12:38 pm

#330649

Hi all,
This has me stumped. I'm performing mass updates to a table. I'm using Data Masker by Redgate to de-identify the data. It's basically a wide update, involving 8+ columns. The table has around 662K+ rows. I have the constraints disabled, along with the indexes, except for the clustered indexed pk column. The tool takes around 22 rows/sec executation rate to updates all the data. Now, what has me stumped is that when I make an EXACT copy of the same table, with the exact number of indexes, rows and constraints and try the exactly same update, it takes 645 rows/sec execution rate.
So to summarize. One table, 662K rows, 22 rows/sec to update all the records. Exact same copy of the aforementioned table, the performance boost is ten folds and it's executing at much much higher rate of 645 rows/sec. Could there be a buffer issue related to the existing table? or anything else you all could think of?
Thanks.
December 26, 2017 at 10:55 pm

This was removed by the editor as SPAM
December 27, 2017 at 8:53 am

This was removed by the editor as SPAM

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

Jay-45986 - Tuesday, December 26, 2017 12:38 PM
Hi all,
This has me stumped. I'm performing mass updates to a table. I'm using Data Masker by Redgate to de-identify the data. It's basically a wide update, involving 8+ columns. The table has around 662K+ rows. I have the constraints disabled, along with the indexes, except for the clustered indexed pk column. The tool takes around 22 rows/sec executation rate to updates all the data. Now, what has me stumped is that when I make an EXACT copy of the same table, with the exact number of indexes, rows and constraints and try the exactly same update, it takes 645 rows/sec execution rate.
So to summarize. One table, 662K rows, 22 rows/sec to update all the records. Exact same copy of the aforementioned table, the performance boost is ten folds and it's executing at much much higher rate of 645 rows/sec. Could there be a buffer issue related to the existing table? or anything else you all could think of?
Thanks.

You could still be comparing apples to oranges. The data in the "fast" table may actually be stored differently than the data in the "slow" table. We have very little information other than the number of rows of data and updates involving 8+ columns. There could be a lot of data movement in the "slow" table and little data movement in the "fast" table.

Jay-45986 Old Hand Points: 386 More actions · Answer 2

Lynn Pettis - Tuesday, December 26, 2017 1:00 PM
Jay-45986 - Tuesday, December 26, 2017 12:38 PM
Hi all,
This has me stumped. I'm performing mass updates to a table. I'm using Data Masker by Redgate to de-identify the data. It's basically a wide update, involving 8+ columns. The table has around 662K+ rows. I have the constraints disabled, along with the indexes, except for the clustered indexed pk column. The tool takes around 22 rows/sec executation rate to updates all the data. Now, what has me stumped is that when I make an EXACT copy of the same table, with the exact number of indexes, rows and constraints and try the exactly same update, it takes 645 rows/sec execution rate.
So to summarize. One table, 662K rows, 22 rows/sec to update all the records. Exact same copy of the aforementioned table, the performance boost is ten folds and it's executing at much much higher rate of 645 rows/sec. Could there be a buffer issue related to the existing table? or anything else you all could think of?
Thanks.
You could still be comparing apples to oranges. The data in the "fast" table may actually be stored differently than the data in the "slow" table. We have very little information other than the number of rows of data and updates involving 8+ columns. There could be a lot of data movement in the "slow" table and little data movement in the "fast" table.

Thank you for your reply Lynn.

To provide you more information. This is in a database that's on a sandbox and no other users and/or applications accessing it. It's just myself and I 🙂 To add more, both of these tables belong to the exact same file groups as well. There is 0 other activity in this database and into these tables, except for when I run queries to load and test.

Same # of rows, same exact structure (pks, indexes, constraints, filegroups etc.), 0 activity. Just the names are different.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

Jay-45986 - Tuesday, December 26, 2017 1:18 PM
Lynn Pettis - Tuesday, December 26, 2017 1:00 PM
Jay-45986 - Tuesday, December 26, 2017 12:38 PM
Hi all,
This has me stumped. I'm performing mass updates to a table. I'm using Data Masker by Redgate to de-identify the data. It's basically a wide update, involving 8+ columns. The table has around 662K+ rows. I have the constraints disabled, along with the indexes, except for the clustered indexed pk column. The tool takes around 22 rows/sec executation rate to updates all the data. Now, what has me stumped is that when I make an EXACT copy of the same table, with the exact number of indexes, rows and constraints and try the exactly same update, it takes 645 rows/sec execution rate.
So to summarize. One table, 662K rows, 22 rows/sec to update all the records. Exact same copy of the aforementioned table, the performance boost is ten folds and it's executing at much much higher rate of 645 rows/sec. Could there be a buffer issue related to the existing table? or anything else you all could think of?
Thanks.
You could still be comparing apples to oranges. The data in the "fast" table may actually be stored differently than the data in the "slow" table. We have very little information other than the number of rows of data and updates involving 8+ columns. There could be a lot of data movement in the "slow" table and little data movement in the "fast" table.
Thank you for your reply Lynn.
To provide you more information. This is in a database that's on a sandbox and no other users and/or applications accessing it. It's just myself and I 🙂 To add more, both of these tables belong to the exact same file groups as well. There is 0 other activity in this database and into these tables, except for when I run queries to load and test.
Same # of rows, same exact structure (pks, indexes, constraints, filegroups etc.), 0 activity. Just the names are different.

Well, we can't see what you see, and if EVERYTHING is identical, I don't buy it. Can you duplicate the problem in another sandbox database on the same server, a different server? Don't use detach/attach or backup/restore. Create the tables from scripts and move the data using T-SQL (same server) and Import/Export Wizard (SSIS, different servers).

Jay-45986 Old Hand Points: 386 More actions · Answer 4

These are quiet large databases, terabytes of data. So recreating it by scripting out the objects and then loading the data using SSIS or TSQL can be extremely time consuming and error-prone.

The idea is to de-identify this data in the lower environments (dev, staging, uat etc) I will be given a backup of a prod db, which is this exact database as the db/table(s) in question. I restore it and then use the tool to de-identify any PII or PHI data. I simply do not have the luxury of recreating and reloading everything, as these multiple requests would be coming in down the pipeline to restore/purge data on daily basis.

I have however, restored the exact same db and repeated the steps detailed in the original post on totally different box, with the same results.

I just wanted to know, whether anybody has dealt with something similar in the past? So let's just say for arguments sake that I am correct with having two tables (one existing, one new), same exact structure, same # of rows, but they behave differently while being mass loaded. Does it make sense? I just want to make sure that I haven't left a stone unturned with respect to troubleshooting this.

If you would like to see the DDL of the tables, I could share that as well.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 5

The only time I have seen two identical tables behave differently came down to the fact the data was not stored identical between the two tables. The data distribution was different even though the data was the same.

Jay-45986 Old Hand Points: 386 More actions · Answer 6

Lynn Pettis - Tuesday, December 26, 2017 2:03 PM
The only time I have seen two identical tables behave differently came down to the fact the data was not stored identical between the two tables. The data distribution was different even though the data was the same.

Would you mind explaining why should that make any difference?

To answer your question, no it's not stored any differently. Both of the tables have the data stored ordered by the bigint PK. The rows are ordered and match between the two tables based on the identity column.

Jay-45986 Old Hand Points: 386 More actions · Answer 7

subramaniam.chandrasekar - Tuesday, December 26, 2017 10:55 PM
Jay-45986 - Tuesday, December 26, 2017 1:56 PM
These are quiet large databases, terabytes of data. So recreating it by scripting out the objects and then loading the data using SSIS or TSQL can be extremely time consuming and error-prone.
The idea is to de-identify this data in the lower environments (dev, staging, uat etc) I will be given a backup of a prod db, which is this exact database as the db/table(s) in question. I restore it and then use the tool to de-identify any PII or PHI data. I simply do not have the luxury of recreating and reloading everything, as these multiple requests would be coming in down the pipeline to restore/purge data on daily basis.
I have however, restored the exact same db and repeated the steps detailed in the original post on totally different box, with the same results.
I just wanted to know, whether anybody has dealt with something similar in the past? So let's just say for arguments sake that I am correct with having two tables (one existing, one new), same exact structure, same # of rows, but they behave differently while being mass loaded. Does it make sense? I just want to make sure that I haven't left a stone unturned with respect to troubleshooting this.
If you would like to see the DDL of the tables, I could share that as well.
Were they any rebuild indexes that had happened on the original table ? (22 rows/sec), If not, Check for any backup jobs or maintenance jobs activity was happened at the same time of execution ? Could you please retry your operation and let us know whether you've got the same results on Original (22 rows/sec) & Duplicate ( 645 rows/sec).

Indexes are disabled on BOTH of the tables, including any and all fk constraints.

drew.allen SSC Guru Points: 76945 More actions · Answer 8

Jay-45986 - Wednesday, December 27, 2017 7:17 AM
Lynn Pettis - Tuesday, December 26, 2017 2:03 PM
The only time I have seen two identical tables behave differently came down to the fact the data was not stored identical between the two tables. The data distribution was different even though the data was the same.
Would you mind explaining why should that make any difference?
To answer your question, no it's not stored any differently. Both of the tables have the data stored ordered by the bigint PK. The rows are ordered and match between the two tables based on the identity column.

You're forgetting about fragmentation. The old table is likely to be fragmented and the new table is not. Even with exactly the same indexes and rows, the two tables are not the same.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Jay-45986 Old Hand Points: 386 More actions · Answer 9

drew.allen - Wednesday, December 27, 2017 7:30 AM
Jay-45986 - Wednesday, December 27, 2017 7:17 AM
Lynn Pettis - Tuesday, December 26, 2017 2:03 PM
The only time I have seen two identical tables behave differently came down to the fact the data was not stored identical between the two tables. The data distribution was different even though the data was the same.
Would you mind explaining why should that make any difference?
To answer your question, no it's not stored any differently. Both of the tables have the data stored ordered by the bigint PK. The rows are ordered and match between the two tables based on the identity column.
You're forgetting about fragmentation. The old table is likely to be fragmented and the new table is not. Even with exactly the same indexes and rows, the two tables are not the same.
Drew

Indexes are defragmented, including the clust index on the old table.

ZZartin SSC-Dedicated Points: 30896 More actions · Answer 10

Jay-45986 - Wednesday, December 27, 2017 7:54 AM
drew.allen - Wednesday, December 27, 2017 7:30 AM
Jay-45986 - Wednesday, December 27, 2017 7:17 AM
Lynn Pettis - Tuesday, December 26, 2017 2:03 PM
The only time I have seen two identical tables behave differently came down to the fact the data was not stored identical between the two tables. The data distribution was different even though the data was the same.
Would you mind explaining why should that make any difference?
To answer your question, no it's not stored any differently. Both of the tables have the data stored ordered by the bigint PK. The rows are ordered and match between the two tables based on the identity column.
You're forgetting about fragmentation. The old table is likely to be fragmented and the new table is not. Even with exactly the same indexes and rows, the two tables are not the same.
Drew
Indexes are defragmented, including the clust index on the old table.

Just defragmented or did you rebuild the clustered index? If not did you update statistics after doing the defrags?

Jay-45986 Old Hand Points: 386 More actions · Answer 11

ZZartin - Wednesday, December 27, 2017 8:23 AM
Jay-45986 - Wednesday, December 27, 2017 7:54 AM
drew.allen - Wednesday, December 27, 2017 7:30 AM
Jay-45986 - Wednesday, December 27, 2017 7:17 AM
Lynn Pettis - Tuesday, December 26, 2017 2:03 PM
The only time I have seen two identical tables behave differently came down to the fact the data was not stored identical between the two tables. The data distribution was different even though the data was the same.
Would you mind explaining why should that make any difference?
To answer your question, no it's not stored any differently. Both of the tables have the data stored ordered by the bigint PK. The rows are ordered and match between the two tables based on the identity column.
You're forgetting about fragmentation. The old table is likely to be fragmented and the new table is not. Even with exactly the same indexes and rows, the two tables are not the same.
Drew
Indexes are defragmented, including the clust index on the old table.
Just defragmented or did you rebuild the clustered index? If not did you update statistics after doing the defrags?

I rebuilt the clustered index.

Jay-45986 Old Hand Points: 386 More actions · Answer 12

subramaniam.chandrasekar - Wednesday, December 27, 2017 8:53 AM
ZZartin - Wednesday, December 27, 2017 8:23 AM
Jay-45986 - Wednesday, December 27, 2017 7:54 AM
drew.allen - Wednesday, December 27, 2017 7:30 AM
Jay-45986 - Wednesday, December 27, 2017 7:17 AM
Lynn Pettis - Tuesday, December 26, 2017 2:03 PM
The only time I have seen two identical tables behave differently came down to the fact the data was not stored identical between the two tables. The data distribution was different even though the data was the same.
Would you mind explaining why should that make any difference?
To answer your question, no it's not stored any differently. Both of the tables have the data stored ordered by the bigint PK. The rows are ordered and match between the two tables based on the identity column.
You're forgetting about fragmentation. The old table is likely to be fragmented and the new table is not. Even with exactly the same indexes and rows, the two tables are not the same.
Drew
Indexes are defragmented, including the clust index on the old table.
Just defragmented or did you rebuild the clustered index? If not did you update statistics after doing the defrags?
So far we're spending the time on investigating within SQL Server, why cant we investigate on data masker and its issue with Original table (22 rows/sec) ? Worth spending ?

I'm doing that in parallel with the vendor as we speak. I just posed this question here, in case anyone in this forum has dealt with something similar before.