My doubts may be stupid, but I want to discuss my current task with you.
To explain the situation, I've created the following diagram:
The system consists of a Data Hub: basically a set of tables with some numerical information and context. The Data Hub was loaded initially.
The data flow is as follows:
1. Delta data arrives: apart from the initial load, new data comes in every day.
2. A query with some aggregations is performed.
3. A second query with aggregations is performed.
4. The customer table is populated with aggregated information from Table B and contextualized with information from the Data Hub (5).
For this, I've created several dataflows. The first dataflow executes the query to fill Table A, the second fills Table B, and so on.
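To make the flow concrete, here is a rough sketch of what I imagine the dataflows doing. All table shapes, column names, and values here are invented for illustration; they are not the real schema:

```python
from collections import defaultdict

# Hypothetical delta rows: (customer_id, purchase_date, amount)
delta = [
    ("c1", "2024-01-01", 10.0),
    ("c1", "2024-01-01", 5.0),
    ("c2", "2024-01-02", 7.5),
]

# Illustrative Data Hub context: customer_id -> region
data_hub = {"c1": "north", "c2": "south"}

def fill_table_a(rows):
    """First aggregation: total amount per (customer, day)."""
    table_a = defaultdict(float)
    for cust, day, amount in rows:
        table_a[(cust, day)] += amount
    return table_a

def fill_table_b(table_a):
    """Second aggregation: total amount per customer."""
    table_b = defaultdict(float)
    for (cust, _day), total in table_a.items():
        table_b[cust] += total
    return table_b

def fill_customer_table(table_b, hub):
    """Populate the customer table: Table B totals plus Data Hub context."""
    return {cust: {"total": total, "region": hub.get(cust)}
            for cust, total in table_b.items()}

table_a = fill_table_a(delta)
table_b = fill_table_b(table_a)
customer = fill_customer_table(table_b, data_hub)
print(customer)
```

Each function stands in for one dataflow; in reality each step is a query over the previous table.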
My questions are the following:
- My approach is fine for the initial load, but what happens to the aggregations when new data comes in?
- Do I have to erase the contents of Tables A and B and run the aggregations again?
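One alternative I'm considering, instead of truncate-and-recompute, is to aggregate only the daily delta and merge it into the existing table. This is only a sketch and it assumes the aggregations are additive (sums, counts); non-additive ones like medians or distinct counts would need another strategy. Names are made up:

```python
def merge_delta(existing, delta_rows):
    """Incrementally fold one day's delta into an existing aggregate table.

    existing: dict mapping customer_id -> running total
    delta_rows: iterable of (customer_id, amount) from the new delta

    This only works because SUM is additive: the merged total equals
    what a full recomputation over all history would produce.
    """
    for cust, amount in delta_rows:
        existing[cust] = existing.get(cust, 0.0) + amount
    return existing

table_a = {"c1": 100.0}  # state after the initial load
table_a = merge_delta(table_a, [("c1", 10.0), ("c2", 5.0)])
print(table_a)  # {'c1': 110.0, 'c2': 5.0}
```

With this pattern Tables A and B are never erased; each daily run only touches the customers present in the delta.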
For instance, if I want to record the average number of purchases a particular customer makes during a month, I have to keep this value updated. Since I receive the data on a daily basis, maybe I can group it by day, and the reporting person could then compute the average in their own queries.
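For that monthly-average example, my idea would be to keep daily sums and counts (both additive), so the reporting side can derive an exact average at any time without me recomputing it on every delta. A sketch with invented names; note that averaging daily averages instead would give the wrong answer:

```python
from collections import defaultdict

# Pre-aggregation kept up to date daily:
# (customer, month) -> [purchase_sum, purchase_count]
daily_agg = defaultdict(lambda: [0.0, 0])

def add_day(customer, month, day_purchases):
    """Fold one day's purchase amounts into the monthly accumulator."""
    acc = daily_agg[(customer, month)]
    acc[0] += sum(day_purchases)
    acc[1] += len(day_purchases)

def monthly_average(customer, month):
    """Exact monthly average, derived from the accumulated sum and count."""
    total, count = daily_agg[(customer, month)]
    return total / count if count else 0.0

add_day("c1", "2024-01", [10.0, 20.0])   # day 1
add_day("c1", "2024-01", [30.0])         # day 2: new delta arrives
print(monthly_average("c1", "2024-01"))  # 20.0
```

The exact average here is (10 + 20 + 30) / 3 = 20, whereas averaging the two daily averages (15 and 30) would give 22.5, which is why I'd store sum and count rather than the average itself.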
Any comments or questions will be appreciated.