Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

How to perform Incremental loads over a set of aggregated tables Expand / Collapse
Author
Message
Posted Friday, August 3, 2012 2:19 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 3:46 AM
Points: 115, Visits: 472
Hi all,

My doubts may be are stupids but I want to discuss with you my current task.

To explain the situation I've created the following diagram:



The system consist of a Data Hub, basically a set of tables with some numerical information and contexts. The Data Hub was initially loaded.
The data flow is as follow:
1. Delta data arrive: apart from the initial load, every day new data is comming.
2. A query with some aggregations is performed.
3. A second query with aggragations is performed.
4. The customer table is populated with aggregated information from the Table B and contextualize with information from the Data Hub (5).

For this, I've created several dataflows. The first dataflow execute the query to fill the Table A, the second to fill the Table B and so on.

My questions are the followings:
- My approach is ok for the first loading, but what happen with the aggregations when new data is comming?
- Do I have to erase the content from Table A and B and run the aggregations again?

For instance, if I want to record the average of purchases that a particular customer makes during a month, I have to keep this value updated. Maybe as I receive the data in a daly-basis I can group this data by day and then the reporting guy could make this avarage stuff in his queries.

Any comment or question will be appreciated

Kind Regards,


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1339674
Posted Friday, August 3, 2012 4:02 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 3:46 AM
Points: 115, Visits: 472
Ok, I think my initial explanation was not so gut. I will try to clarify the scenario.

Supose that in the Data Hub I have the following table with information about customer purchases:

Customer Id Article Deparment Amount Date of Purchase
545 Shoes Fashion 30$ 01.07.2012
545 Book Library 20$ 01.07.2012
545 CD Library 20$ 01.07.2012

From this table I load my first table with Aggregated data called Department Aggregations

Customer Id Department Amount Date of Purchase
545 Fashion 30$ 01.07.2012
545 Library 40$ 01.07.2012

I query the table above and fill a second table with aggregated data

Customer Id Total Articles Total Deparmets Total Amount Average Shopping Days
545 3 2$ 70$ 1

Finaly I load the Customer table:

Customer Id Telephone Email City Total expenses
545 654654 cust@exp.com ABC 70$

My question is: When our dear customer, Mr 545, makes a new purchase, what is the less expensive option in order to update all the tables? I think that an incremental approach would be good because the actual Fact_Details tables contains around 165 millions of records and truncate this table and loaded again is too expensives, but I don't know how to updates aggregations.

Any comment will be appreciated

Kind Regards


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1339711
Posted Wednesday, June 26, 2013 5:25 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, March 27, 2014 2:27 PM
Points: 3, Visits: 450
Hello Paul I am attempting to develop a similar solution, how did you resolve this?

Regards,
Post #1467575
Posted Wednesday, June 26, 2013 5:37 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 3:46 AM
Points: 115, Visits: 472
Hi SQLSponge,

I recently read an article that proposed one of the best solutions I've ever seen. It is a little bit complex (technically speaking) but I suggest you to read it and take your time to understanding it:

Incremental Data Loading Using CDC

I also suggest you to read the two previous articles of these series, you can find the links in the article above.

If this solution does not persuade you, please send me the design of the tables, specially the granularity and the type of the aggregations (AVG, COUNT, SUM, etc.)

Kinde Regards,



Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1467582
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse