Help with performance - aggregating a lot of data

  • Evil Kraig F (11/1/2011)


    Jeff Moden (10/31/2011)


    I was simply going to add the date, EmpID, and 3 calculated columns to the detail table and hide them from the app using a passthrough view with the same name as the original table. Of course, the name of the detail table would change, and the app would do all its work through the passthrough view, none the wiser to the underlying change that was made. And, yes... there'd be a key change to the indexing... the date would become the new clustered index column.

    Now I grok. I was going the other direction, expecting you to use a passthrough for the new process, not sidestepping app issues with it and changing the base schema. Vendor updates will break if you do that, because their code changes are going to come through with an ALTER TABLE script, not an ALTER VIEW. Yes, you could reverse engineer it each time, but that is ugly. At least, I'm assuming this is a vendor app, because of this statement:

    I'm dealing with source data where the table structure is fixed, and TPTB want this data in real time, so moving to a data warehouse isn't an option at the moment.

    ... heh, so, sorry I wasn't following your chain of thought there. I shall blame it on ... umm... umm... Look, an elephant! *POOF*

    Yes, it's a vendor app - but I know the insides of its database and update process pretty well. Managing upgrades with this type of alteration in place is a reasonable trade-off for getting better data out to my user community.

    And that elephant over there? Isn't that my assumptions exploding into a pile of mist because Jeff lobbed a pork chop at them?

    -Ki
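
    [Editor's note: a minimal sketch of the passthrough-view approach Jeff describes above. All object and column names (dbo.TimeDetail, PunchDT, RegularHrs, and so on) are hypothetical stand-ins for the vendor's actual schema, and the base table is created from scratch rather than renamed with sp_rename, just to keep the example self-contained and runnable.]

    -- Hypothetical stand-in for the renamed vendor detail table.
    CREATE TABLE dbo.TimeDetailBase
    (
        TimeDetailID int IDENTITY(1,1) NOT NULL,
        EmpID        int           NOT NULL,
        PunchDT      datetime      NOT NULL,
        RegularHrs   decimal(9,2)  NOT NULL,
        OvertimeHrs  decimal(9,2)  NOT NULL,
        -- The added columns: the work date and a persisted calculated
        -- column, both computed once at write time instead of per query.
        WorkDate     AS CAST(PunchDT AS date)      PERSISTED,
        TotalHrs     AS (RegularHrs + OvertimeHrs) PERSISTED
    );

    -- The date becomes the new clustered index column, per the plan above.
    CREATE CLUSTERED INDEX CIX_TimeDetailBase_WorkDate
        ON dbo.TimeDetailBase (WorkDate, EmpID);
    GO
    -- Passthrough view with the original table name; the app selects the
    -- original column list and is none the wiser.
    CREATE VIEW dbo.TimeDetail
    AS
    SELECT TimeDetailID, EmpID, PunchDT, RegularHrs, OvertimeHrs
    FROM   dbo.TimeDetailBase;
    GO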

  • Evil Kraig F (11/1/2011)


    Jeff Moden (10/31/2011)


    I was simply going to add the date, EmpID, and 3 calculated columns to the detail table and hide them from the app using a passthrough view with the same name as the original table. Of course, the name of the detail table would change, and the app would do all its work through the passthrough view, none the wiser to the underlying change that was made. And, yes... there'd be a key change to the indexing... the date would become the new clustered index column.

    Now I grok. I was going the other direction, expecting you to use a passthrough for the new process, not sidestepping app issues with it and changing the base schema. Vendor updates will break if you do that, because their code changes are going to come through with an ALTER TABLE script, not an ALTER VIEW. Yes, you could reverse engineer it each time, but that is ugly. At least, I'm assuming this is a vendor app, because of this statement:

    I'm dealing with source data where the table structure is fixed, and TPTB want this data in real time, so moving to a data warehouse isn't an option at the moment.

    ... heh, so, sorry I wasn't following your chain of thought there. I shall blame it on ... umm... umm... Look, an elephant! *POOF*

    Yep... I've been through vendor app updates, as well. Ask yourself... how often does the vendor change the table, and is it worth the little bit of hassle to use the view to copy a month's worth of data to a "properly named" table until the vendor is done with the update, and then drop the table and re-instantiate the view?

    Of course, what should really happen is that once you've proven the performance gain to the vendor, you should have a high-velocity porkchop session with them until they change their app.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
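
    [Editor's note: one reading of the update-window workaround Jeff describes above, again with hypothetical names, continuing the dbo.TimeDetail / dbo.TimeDetailBase sketch from earlier: swap a real table back in under the original name before the vendor update, then drop it and re-instantiate the view afterwards.]

    -- Before the vendor update: replace the passthrough view with a real,
    -- "properly named" table holding a month's worth of data.
    DROP VIEW dbo.TimeDetail;
    GO
    SELECT TimeDetailID, EmpID, PunchDT, RegularHrs, OvertimeHrs
    INTO   dbo.TimeDetail
    FROM   dbo.TimeDetailBase
    WHERE  PunchDT >= DATEADD(MONTH, -1, GETDATE());
    GO
    -- ... the vendor's ALTER TABLE scripts run against dbo.TimeDetail ...

    -- After the update: drop the temporary table and re-instantiate the
    -- view (any schema changes the vendor made would be folded into the
    -- base table and the view by hand at this point).
    DROP TABLE dbo.TimeDetail;
    GO
    CREATE VIEW dbo.TimeDetail
    AS
    SELECT TimeDetailID, EmpID, PunchDT, RegularHrs, OvertimeHrs
    FROM   dbo.TimeDetailBase;
    GO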

  • Kiara (11/1/2011)


    Oh, I'm game to experiment. I'm not afraid of my test environment in the slightest, and I won't move something into production unless I'm convinced it will work well.

    THAT's the spirit. And, no, I agree... absolutely no changes in production until full regression testing has been done on the app with the new structure AND management buys into the change AND it's well documented as to what must happen if the app vendor wants to make a change.

    Before you do anything else, let me first prove that the changes will make the massive performance improvements that I think they will (no reason for any change if that doesn't work), and then we can work on trying to prove that it doesn't blow the app up. In the process, we might also be able to partition this large table. You said it's on Standard Edition, correct?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
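
    [Editor's note: the edition question above matters because table partitioning proper was Enterprise-only on this era of SQL Server, so on Standard Edition the nearest equivalent is a local partitioned view over per-period tables. A minimal sketch, with hypothetical names and an arbitrary monthly split:]

    CREATE TABLE dbo.TimeDetail_201110
    (
        EmpID    int          NOT NULL,
        WorkDate date         NOT NULL
                 CHECK (WorkDate >= '20111001' AND WorkDate < '20111101'),
        TotalHrs decimal(9,2) NOT NULL,
        CONSTRAINT PK_TimeDetail_201110 PRIMARY KEY CLUSTERED (WorkDate, EmpID)
    );

    CREATE TABLE dbo.TimeDetail_201111
    (
        EmpID    int          NOT NULL,
        WorkDate date         NOT NULL
                 CHECK (WorkDate >= '20111101' AND WorkDate < '20111201'),
        TotalHrs decimal(9,2) NOT NULL,
        CONSTRAINT PK_TimeDetail_201111 PRIMARY KEY CLUSTERED (WorkDate, EmpID)
    );
    GO
    -- The optimizer uses the CHECK constraints to skip whole member tables
    -- when a query filters on WorkDate.
    CREATE VIEW dbo.TimeDetailPartitioned
    AS
    SELECT EmpID, WorkDate, TotalHrs FROM dbo.TimeDetail_201110
    UNION ALL
    SELECT EmpID, WorkDate, TotalHrs FROM dbo.TimeDetail_201111;
    GO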

  • Jeff Moden (11/1/2011)


    Kiara (11/1/2011)


    Oh, I'm game to experiment. I'm not afraid of my test environment in the slightest, and I won't move something into production unless I'm convinced it will work well.

    THAT's the spirit. And, no, I agree... absolutely no changes in production until full regression testing has been done on the app with the new structure AND management buys into the change AND it's well documented as to what must happen if the app vendor wants to make a change.

    Before you do anything else, let me first prove that the changes will make the massive performance improvements that I think they will (no reason for any change if that doesn't work), and then we can work on trying to prove that it doesn't blow the app up. In the process, we might also be able to partition this large table. You said it's on Standard Edition, correct?

    *chuckle* If I were afraid to test changes, I'd be sitting on an almost non-responsive SQL Server - since that's what I inherited when I started here. It's been a fascinating process - one that can happily be documented just by the sheer reduction in recurring deadlocks. (I'm down to one major recurring deadlock, from a crazy number of them - and that one is part of an overnight job that needs some major redesign. I just haven't tackled that yet.)

    Standard Edition, yes. For now. Unless I can convince my boss that I need Enterprise Edition for the holidays. (Change Data Capture, oh how I dream of you...)

    -Ki
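
    [Editor's note: for reference, since Change Data Capture comes up above as the Enterprise-only wish-list item, enabling it is just two system procedure calls (sys.sp_cdc_enable_db and sys.sp_cdc_enable_table, the documented SQL Server 2008+ API). The table name is the hypothetical one from the earlier sketches; run this in the target database.]

    -- Enable CDC at the database level, then for the hypothetical table.
    EXEC sys.sp_cdc_enable_db;
    GO
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'TimeDetailBase',
         @role_name     = NULL;   -- NULL = no gating role for this sketch
    GO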

    I just got done with some testing of my theory that having temporal-based, persisted calculated columns in the detail table would help this problem. Sure, I get an Index Seek with it, but there was no performance gain... it's still summing and grouping more than 8 million rows of data for the example given.

    So, the schema changes I was going to suggest aren't worth a hoot for query performance in this case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
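
    [Editor's note: for context, a sketch of the query shape being tested here, using the hypothetical columns from the earlier sketches. The persisted WorkDate column turns the range predicate into an Index Seek, but every qualifying detail row still has to be read and summed, which is where the time goes.]

    SELECT   EmpID,
             WorkDate,
             SUM(TotalHrs) AS TotalHrs
    FROM     dbo.TimeDetailBase
    WHERE    WorkDate >= '20110101'   -- seek on the persisted column...
      AND    WorkDate <  '20111101'
    GROUP BY EmpID, WorkDate;         -- ...but 8M+ rows still get aggregated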

  • Jeff Moden (11/2/2011)


    I just got done with some testing of my theory that having temporal-based, persisted calculated columns in the detail table would help this problem. Sure, I get an Index Seek with it, but there was no performance gain... it's still summing and grouping more than 8 million rows of data for the example given.

    So, the schema changes I was going to suggest aren't worth a hoot for query performance in this case.

    Likewise, adding empid and weekEnding to the child table and clustering over them offers little improvement in performance - you've still gotta hoik out 4M rows and aggregate them, and the hoiking out costs over 90% of the total. I'd recommend plan B - a rollup table, batch-updated with new and changed rows only.

    You could lose the latency by calculating a "safe" historical date before which rows are unlikely to be edited, and using it as the cutoff for the batch update. Query the rollup and UNION ALL it with rows aggregated from the live table after the cutoff.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY (Part I and Part II) - Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden
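
    [Editor's note: a minimal sketch of plan B as Chris describes it, continuing the hypothetical names from the earlier sketches. The rollup table holds pre-aggregated history up to a "safe" rolling cutoff; the batch job re-aggregates only a small recent slice; and the query unions the rollup with a live aggregation of rows after the cutoff. The 30-day and 7-day windows are arbitrary placeholders.]

    -- One pre-aggregated row per employee per day, for history only.
    CREATE TABLE dbo.TimeRollup
    (
        EmpID    int           NOT NULL,
        WorkDate date          NOT NULL,
        TotalHrs decimal(18,2) NOT NULL,
        CONSTRAINT PK_TimeRollup PRIMARY KEY CLUSTERED (WorkDate, EmpID)
    );
    GO
    -- Batch refresh: rows this old are assumed not to change any more.
    DECLARE @Cutoff date = DATEADD(DAY, -30, CAST(GETDATE() AS date));

    -- Re-aggregate only the slice that has crossed the cutoff (or may
    -- have been edited just before it) since the last run.
    DELETE dbo.TimeRollup
    WHERE  WorkDate >= DATEADD(DAY, -7, @Cutoff);

    INSERT dbo.TimeRollup (EmpID, WorkDate, TotalHrs)
    SELECT   EmpID, WorkDate, SUM(TotalHrs)
    FROM     dbo.TimeDetailBase
    WHERE    WorkDate >= DATEADD(DAY, -7, @Cutoff)
      AND    WorkDate <  @Cutoff
    GROUP BY EmpID, WorkDate;

    -- Query time: cheap rollup history, UNION ALL'd with a live
    -- aggregation of just the recent, still-editable rows.
    SELECT EmpID, WorkDate, TotalHrs
    FROM   dbo.TimeRollup
    WHERE  WorkDate < @Cutoff
    UNION ALL
    SELECT   EmpID, WorkDate, SUM(TotalHrs)
    FROM     dbo.TimeDetailBase
    WHERE    WorkDate >= @Cutoff
    GROUP BY EmpID, WorkDate;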

  • Jeff Moden (11/2/2011)


    I just got done with some testing of my theory that having temporal-based, persisted calculated columns in the detail table would help this problem. Sure, I get an Index Seek with it, but there was no performance gain... it's still summing and grouping more than 8 million rows of data for the example given.

    So, the schema changes I was going to suggest aren't worth a hoot for query performance in this case.

    That's ok - I still have a new tool in my bag o' tricks, and I really, really appreciate the assistance.

    -Ki

  • ChrisM@Work (11/2/2011)


    Jeff Moden (11/2/2011)


    I just got done with some testing of my theory that having temporal-based, persisted calculated columns in the detail table would help this problem. Sure, I get an Index Seek with it, but there was no performance gain... it's still summing and grouping more than 8 million rows of data for the example given.

    So, the schema changes I was going to suggest aren't worth a hoot for query performance in this case.

    Likewise, adding empid and weekEnding to the child table and clustering over them offers little improvement in performance - you've still gotta hoik out 4M rows and aggregate them, and the hoiking out costs over 90% of the total. I'd recommend plan B - a rollup table, batch-updated with new and changed rows only.

    You could lose the latency by calculating a "safe" historical date before which rows are unlikely to be edited, and using it as the cutoff for the batch update. Query the rollup and UNION ALL it with rows aggregated from the live table after the cutoff.

    Thanks much. Fortunately, for this particular dataset, that "safe" historical date is fairly easy to figure - I can do it as a set period of time prior to the current date to make it effectively a rolling cutoff, as well.

    As I said to Jeff, your assistance with this has been invaluable, and is much appreciated.

    -Ki

  • Kiara (11/2/2011)


    ChrisM@Work (11/2/2011)


    Jeff Moden (11/2/2011)


    I just got done with some testing of my theory that having temporal-based, persisted calculated columns in the detail table would help this problem. Sure, I get an Index Seek with it, but there was no performance gain... it's still summing and grouping more than 8 million rows of data for the example given.

    So, the schema changes I was going to suggest aren't worth a hoot for query performance in this case.

    Likewise, adding empid and weekEnding to the child table and clustering over them offers little improvement in performance - you've still gotta hoik out 4M rows and aggregate them, and the hoiking out costs over 90% of the total. I'd recommend plan B - a rollup table, batch-updated with new and changed rows only.

    You could lose the latency by calculating a "safe" historical date before which rows are unlikely to be edited, and using it as the cutoff for the batch update. Query the rollup and UNION ALL it with rows aggregated from the live table after the cutoff.

    Thanks much. Fortunately, for this particular dataset, that "safe" historical date is fairly easy to figure - I can do it as a set period of time prior to the current date to make it effectively a rolling cutoff, as well.

    As I said to Jeff, your assistance with this has been invaluable, and is much appreciated.

    Welcome - thanks for posting an interesting challenge!

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY (Part I and Part II) - Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden
