SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


How to perform Incremental loads over a set of aggregated tables


How to perform Incremental loads over a set of aggregated tables

Author
Message
Paul Hernández
Paul Hernández
SSC-Addicted
SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)

Group: General Forum Members
Points: 414 Visits: 661
Hi all,

My doubts may be are stupids but I want to discuss with you my current task.

To explain the situation I've created the following diagram:



The system consist of a Data Hub, basically a set of tables with some numerical information and contexts. The Data Hub was initially loaded.
The data flow is as follow:
1. Delta data arrive: apart from the initial load, every day new data is comming.
2. A query with some aggregations is performed.
3. A second query with aggragations is performed.
4. The customer table is populated with aggregated information from the Table B and contextualize with information from the Data Hub (5).

For this, I've created several dataflows. The first dataflow execute the query to fill the Table A, the second to fill the Table B and so on.

My questions are the followings:
- My approach is ok for the first loading, but what happen with the aggregations when new data is comming?
- Do I have to erase the content from Table A and B and run the aggregations again?

For instance, if I want to record the average of purchases that a particular customer makes during a month, I have to keep this value updated. Maybe as I receive the data in a daly-basis I can group this data by day and then the reporting guy could make this avarage stuff in his queries.

Any comment or question will be appreciated

Kind Regards,

Paul Hernández
Paul Hernández
Paul Hernández
SSC-Addicted
SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)

Group: General Forum Members
Points: 414 Visits: 661
Ok, I think my initial explanation was not so gut. I will try to clarify the scenario.

Supose that in the Data Hub I have the following table with information about customer purchases:

Customer Id Article Deparment Amount Date of Purchase
545 Shoes Fashion 30$ 01.07.2012
545 Book Library 20$ 01.07.2012
545 CD Library 20$ 01.07.2012

From this table I load my first table with Aggregated data called Department Aggregations

Customer Id Department Amount Date of Purchase
545 Fashion 30$ 01.07.2012
545 Library 40$ 01.07.2012

I query the table above and fill a second table with aggregated data

Customer Id Total Articles Total Deparmets Total Amount Average Shopping Days
545 3 2$ 70$ 1

Finaly I load the Customer table:

Customer Id Telephone Email City Total expenses
545 654654 cust@exp.com ABC 70$

My question is: When our dear customer, Mr 545, makes a new purchase, what is the less expensive option in order to update all the tables? I think that an incremental approach would be good because the actual Fact_Details tables contains around 165 millions of records and truncate this table and loaded again is too expensives, but I don't know how to updates aggregations.

Any comment will be appreciated

Kind Regards

Paul Hernández
SQLSponge
SQLSponge
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 465
Hello Paul I am attempting to develop a similar solution, how did you resolve this?

Regards,
Paul Hernández
Paul Hernández
SSC-Addicted
SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)SSC-Addicted (414 reputation)

Group: General Forum Members
Points: 414 Visits: 661
Hi SQLSponge,

I recently read an article that proposed one of the best solutions I've ever seen. It is a little bit complex (technically speaking) but I suggest you to read it and take your time to understanding it:

Incremental Data Loading Using CDC

I also suggest you to read the two previous articles of these series, you can find the links in the article above.

If this solution does not persuade you, please send me the design of the tables, specially the granularity and the type of the aggregations (AVG, COUNT, SUM, etc.)

Kinde Regards,

Paul Hernández
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search