SQLServerCentral
How to create staging table to handle incremental load


Jim1234
SSC-Enthusiastic
Group: General Forum Members
Points: 119 Visits: 139
Hi Team,

We are designing a staging layer to handle incremental loads. I want to start with a simple scenario to design the staging.

In the source database there are two tables, e.g. tbl_Department and tbl_Employee. Both tables feed a single table in the destination database, tbl_EmployeRecord.

The query that loads tbl_EmployeRecord is:

SELECT EMPID, EMPNAME, DEPTNAME FROM tbl_Department D INNER JOIN tbl_Employee E ON D.DEPARTMENTID = E.DEPARTMENTID

Now we need to identify the incremental changes in tbl_Department and tbl_Employee, store them in staging, and load only those changes to the destination.

The columns of the tables are:

tbl_Department : DEPARTMENTID, DEPTNAME

tbl_Employee : EMPID, EMPNAME, DEPARTMENTID

tbl_EmployeRecord : EMPID, EMPNAME, DEPTNAME

Kindly suggest how to design the staging for this to handle insert, update, and delete.

Regards
Jim
john.kroneberger
SSC Rookie
Group: General Forum Members
Points: 40 Visits: 298
Use the MERGE statement for the incremental update.
Jim1234
SSC-Enthusiastic
Group: General Forum Members
Points: 119 Visits: 139
Hi,

In my scenario I'm joining two tables and loading into a single table in the target. Will MERGE help? Please suggest; if not, any other workaround is appreciated.

Regards
Jim
Greg Edwards-268690
SSCarpal Tunnel
Group: General Forum Members
Points: 4442 Visits: 8601
Jim1234 (1/2/2014)
tbl_Department - I would assume this should be processed first, since you are probably enforcing referential integrity (no employee can be assigned to a department that doesn't exist).

tbl_Employee - then process this one.

tbl_EmployeeRecord - a table? I think this should just be a view over the first two tables.

Use MERGE as suggested. But with something like this, I would think tracking changes is also important, so more thought about the overall design might be needed - or maybe I'm just trying to read too much into this. Headcount at any point in time, or knowing when and who (or what) made a change, are questions I could see coming up.
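Greg's view suggestion could look like the following minimal sketch (table and column names are taken from the thread; the view name is an assumption, and this only works when the view can reach both source tables, e.g. same database or a linked server):

```sql
-- A view replays the original join on every read, so it never
-- needs incremental maintenance: it is always current.
CREATE VIEW vw_EmployeRecord
AS
SELECT E.EMPID, E.EMPNAME, D.DEPTNAME
FROM tbl_Department AS D
INNER JOIN tbl_Employee AS E
    ON D.DEPARTMENTID = E.DEPARTMENTID;
```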
tompatib
SSC Rookie
Group: General Forum Members
Points: 47 Visits: 16
Hi Jim,

We are using something similar, and as mentioned, MERGE is just perfect for this. In our solution we create a work table and load the data into it. This table can have referential constraints to keep bad data from coming in (since you already have that, you can skip this). In the next phase you can use MERGE to load the data into your final table in one shot. Lastly, you may want to clear the work table with a truncate or similar.

You can use MERGE like this:

MERGE tbl_EmployeRecord AS destination
USING (SELECT EMPID, EMPNAME, DEPTNAME
       FROM tbl_Department D
       INNER JOIN tbl_Employee E ON D.DEPARTMENTID = E.DEPARTMENTID) AS source
ON destination.EMPID = source.EMPID
WHEN NOT MATCHED THEN
    INSERT (EMPID, EMPNAME, DEPTNAME)
    VALUES (source.EMPID, source.EMPNAME, source.DEPTNAME)
WHEN MATCHED THEN
    UPDATE SET EMPNAME = source.EMPNAME,
               DEPTNAME = source.DEPTNAME
;

I have tried this on my sample DB and it worked perfectly; hope this helps :-). Deletes are not handled here, but you can extend it based on your requirements.

Regards,
Tibor
Greg Edwards-268690
SSCarpal Tunnel
Group: General Forum Members
Points: 4442 Visits: 8601
Joining the 2 tables into 1, you lose the data normalization that was done.
Do as you wish, it's your choice.

At some point, depending on what is done, you may lose the keys to the castle.
A key is a KEY, whereas names (even a department's) can change and overlap.
Keep keys in the fact tables, and then pick up the descriptions as needed.
It may not happen with this example, but if you start down this path, it will eventually happen.
Jim1234
SSC-Enthusiastic
Group: General Forum Members
Points: 119 Visits: 139
Thanks Tibor. A few questions come to mind. The volume of data in the source (the actual tables) is very high, so we are planning a staging layer that will hold only the records that were inserted/updated/deleted. From staging we need to load the target by using the join query.

The challenge is how to design the staging table. Should it mirror the source tables exactly, or should staging have only one table shaped like the target? And how do we handle insert/update/delete for the incremental records?

Feel free to let me know if you need more details. Thanks.

Jim
tompatib
SSC Rookie
Group: General Forum Members
Points: 47 Visits: 16
Well, you need to do some data transformation somewhere - say between source and staging, or between staging and destination. To choose between the two you need to look at your process; in my view, the question is how you fill the staging table.
If you fill the staging table with an SSIS job, you probably want to do the ETL operations right in the job, so your staging table would be similar to the destination table. But if you are using a trigger, or an action from the OUTPUT clause of a query, you want that to execute as fast as possible; in that case I would load a staging table that mirrors the source.
Anyway, if you use the MERGE statement with a proper index on the merged tables, it can be blazing fast. And you don't even need to know whether the records were deleted, inserted, or updated, because MERGE reconciles them.
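A sketch of that single-pass reconciliation, using the thread's table names (the WHEN NOT MATCHED BY SOURCE branch assumes the requirement is that rows missing from the source should be deleted from the target; the change filter on the update branch is an optional optimization):

```sql
MERGE tbl_EmployeRecord AS destination
USING (SELECT E.EMPID, E.EMPNAME, D.DEPTNAME
       FROM tbl_Department AS D
       INNER JOIN tbl_Employee AS E
           ON D.DEPARTMENTID = E.DEPARTMENTID) AS source
ON destination.EMPID = source.EMPID
WHEN NOT MATCHED BY TARGET THEN              -- new employee
    INSERT (EMPID, EMPNAME, DEPTNAME)
    VALUES (source.EMPID, source.EMPNAME, source.DEPTNAME)
WHEN MATCHED AND (destination.EMPNAME <> source.EMPNAME
               OR destination.DEPTNAME <> source.DEPTNAME) THEN
    UPDATE SET EMPNAME = source.EMPNAME,     -- changed name or department
               DEPTNAME = source.DEPTNAME
WHEN NOT MATCHED BY SOURCE THEN              -- employee gone from source
    DELETE;
```

An index on EMPID on both sides (ideally the clustered key of tbl_EmployeRecord) is what makes the ON clause fast.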
aaron.reese
SSCrazy
Group: General Forum Members
Points: 2531 Visits: 907
If the new and changed records can be easily identified, I would side with the staging tables having the same structure as the source data. If not, and you effectively have to compare today's snapshot with the existing data, I would side with a denormalised table and then use MERGE/EXCEPT/INTERSECT to identify the new and changed records.
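As a hedged illustration of the EXCEPT approach (stg_EmployeRecord is an invented name for a snapshot table shaped like the target):

```sql
-- Rows in today's snapshot that are new or changed vs. the target.
-- EXCEPT compares entire rows and treats NULLs as equal, which makes
-- it a convenient change detector.
SELECT EMPID, EMPNAME, DEPTNAME FROM stg_EmployeRecord
EXCEPT
SELECT EMPID, EMPNAME, DEPTNAME FROM tbl_EmployeRecord;
```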

This is known as slowly changing dimension (SCD) data. Are you keeping history (Type 2 SCD) or overwriting changed records (Type 1 SCD)?

If you have a lot of this to do, you might want to look at the Data Vault model. Briefly, DV breaks data into three components: Hubs - the entity identifiers; Links - the relationships between Hubs; and Satellites - the interesting stuff about the Hubs. In your example, EmployeeID and DepartmentID would be Hubs, EmployeeID-DepartmentID would be a Link, and EmpName, dob, paygrade etc. would go into a Satellite, as would DeptName and Dept Location.
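A minimal sketch of what those Data Vault structures might look like (all table and column names here are illustrative assumptions, not from the thread):

```sql
-- Hubs: one row per business key, plus load metadata.
CREATE TABLE HubEmployee (
    EmployeeHK   INT IDENTITY PRIMARY KEY,
    EmployeeID   INT NOT NULL UNIQUE,
    LoadDate     DATETIME2 NOT NULL
);
CREATE TABLE HubDepartment (
    DepartmentHK INT IDENTITY PRIMARY KEY,
    DepartmentID INT NOT NULL UNIQUE,
    LoadDate     DATETIME2 NOT NULL
);

-- Link: the Employee-Department relationship.
CREATE TABLE LinkEmployeeDepartment (
    EmployeeHK   INT NOT NULL REFERENCES HubEmployee (EmployeeHK),
    DepartmentHK INT NOT NULL REFERENCES HubDepartment (DepartmentHK),
    LoadDate     DATETIME2 NOT NULL,
    PRIMARY KEY (EmployeeHK, DepartmentHK)
);

-- Satellite: descriptive attributes, historized by LoadDate.
CREATE TABLE SatEmployee (
    EmployeeHK   INT NOT NULL REFERENCES HubEmployee (EmployeeHK),
    EmpName      NVARCHAR(100),
    LoadDate     DATETIME2 NOT NULL,
    PRIMARY KEY (EmployeeHK, LoadDate)
);
```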

Aaron
Jim1234
SSC-Enthusiastic
Group: General Forum Members
Points: 119 Visits: 139
Hi Aaron,

We are getting the incremental load from the source tables, so we are maintaining the same structure as the source tables.

In this scenario, inserts into tbl_Department or tbl_Employee (or both) should not be a problem to load into the single target table tbl_EmployeRecord using the join query.

Update and delete will be the challenge. Kindly suggest whether having only staging tables that mirror the source tables will suffice, or whether additional tables (for example, to identify the dependent columns for the driving table) would help.

Please suggest. Any sample links, documents, or table structures would help a lot!!
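One common pattern (a sketch only; the names and the ChangeOp flag are assumptions, not from the thread) is to mirror each source table in staging and add change-tracking columns, so a single load step can consume inserts, updates, and deletes:

```sql
-- Staging mirror of tbl_Employee with change metadata.
-- ChangeOp records what happened at the source: 'I', 'U', or 'D'.
CREATE TABLE stg_Employee (
    EMPID        INT NOT NULL,
    EMPNAME      NVARCHAR(100) NULL,   -- NULL allowed for delete rows
    DEPARTMENTID INT NULL,
    ChangeOp     CHAR(1) NOT NULL CHECK (ChangeOp IN ('I', 'U', 'D')),
    ChangeDate   DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    PRIMARY KEY (EMPID, ChangeDate)
);

-- stg_Department would follow the same pattern, keyed on DEPARTMENTID.
```

Note that because a department rename touches many employees, a 'U' row in the department staging table implies re-deriving DEPTNAME for every employee of that department when loading tbl_EmployeRecord.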