How to create staging table to handle incremental load
Posted Thursday, January 2, 2014 2:24 AM
Valued Member


Group: General Forum Members
Last Login: Saturday, August 23, 2014 10:18 AM
Points: 53, Visits: 125
Hi Team,

We are designing a staging layer to handle incremental load, and I want to start with a simple scenario for the staging design.

In the source database there are two tables, e.g. tbl_Department and tbl_Employee. Both tables feed a single table in the destination database, tbl_EmployeRecord.

The query that loads tbl_EmployeRecord is:

SELECT EMPID, EMPNAME, DEPTNAME FROM tbl_Department D INNER JOIN tbl_Employee E ON D.DEPARTMENTID = E.DEPARTMENTID

Now we need to identify the incremental changes in tbl_Department and tbl_Employee, store them in staging, and load only those changes to the destination.

The columns of the tables are:

tbl_Department: DEPARTMENTID, DEPTNAME

tbl_Employee: EMPID, EMPNAME, DEPARTMENTID

tbl_EmployeRecord: EMPID, EMPNAME, DEPTNAME

Kindly suggest how to design the staging for this to handle insert, update and delete.

Regards
Jim
Post #1527009
Posted Thursday, January 2, 2014 3:12 PM
Grasshopper


Group: General Forum Members
Last Login: Monday, August 25, 2014 2:35 PM
Points: 20, Visits: 140
Use the MERGE statement for the incremental update.
Post #1527305
Posted Thursday, January 2, 2014 11:36 PM
Valued Member


Group: General Forum Members
Last Login: Saturday, August 23, 2014 10:18 AM
Points: 53, Visits: 125
Hi,

In my scenario, I'm joining two tables and loading into a single target table. Will MERGE help? Please suggest; if not, any other workaround is appreciated.

Regards
Jim
Post #1527353
Posted Friday, January 3, 2014 6:16 AM


Ten Centuries


Group: General Forum Members
Last Login: Today @ 9:11 AM
Points: 1,232, Visits: 6,641
Jim1234 (1/2/2014)

tbl_Department - I would assume this should be processed first, since you are probably enforcing referential integrity (no employee can be assigned to a department that doesn't exist).

tbl_Employee - then process this one.

tbl_EmployeeRecord - a table? I think this should just be a view over the first two tables.

Use MERGE as suggested. But with something like this, I would think tracking changes is also important, so more thought about the overall design might be needed, or maybe I'm reading too much into this. Headcount at any point in time, or knowing when and who (or what) made a change, are questions I could see coming up.
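A view along these lines (a sketch using the table names from the original post) would keep the joined shape available without duplicating the data physically:

```sql
-- Sketch: expose the joined result as a view instead of a physical table.
-- Table and column names are taken from the original post.
CREATE VIEW vw_EmployeRecord
AS
SELECT E.EMPID,
       E.EMPNAME,
       D.DEPTNAME
FROM tbl_Employee AS E
INNER JOIN tbl_Department AS D
    ON D.DEPARTMENTID = E.DEPARTMENTID;
```

With a view, an update to either base table is visible immediately and there is nothing extra to keep in sync.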
Post #1527478
Posted Friday, January 3, 2014 7:13 AM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, June 17, 2014 8:29 AM
Points: 23, Visits: 16
Hi Jim,

We are using something similar, and MERGE is just perfect for this, as mentioned. In our solution we create a work table and load the data into it. This table can have referential constraints to prevent bad data from coming in (since you already have that, you can skip this). In the next phase you use MERGE to load the data into your final table in one shot. Lastly, you may want to clear the work table with a TRUNCATE or similar.

Regarding the MERGE, you can use it like this:

MERGE tbl_EmployeRecord AS destination
USING (
    SELECT E.EMPID, E.EMPNAME, D.DEPTNAME
    FROM tbl_Department AS D
    INNER JOIN tbl_Employee AS E
        ON D.DEPARTMENTID = E.DEPARTMENTID
) AS source
ON destination.EMPID = source.EMPID
WHEN NOT MATCHED THEN
    INSERT (EMPID, EMPNAME, DEPTNAME)
    VALUES (source.EMPID, source.EMPNAME, source.DEPTNAME)
WHEN MATCHED THEN
    UPDATE ...
;

I tried this on my sample DB and it worked perfectly; hope this helps. The UPDATE is not implemented here, but you can extend it based on your requirements.
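A completed sketch of that statement, filling in the UPDATE clause and adding delete handling; the SET list is an assumption based on the column lists given earlier in the thread:

```sql
-- Sketch only: column names assumed from the original post.
-- NB: the <> comparisons miss NULL-to-value changes; adjust if
-- EMPNAME or DEPTNAME are nullable.
MERGE tbl_EmployeRecord AS destination
USING (
    SELECT E.EMPID, E.EMPNAME, D.DEPTNAME
    FROM tbl_Department AS D
    INNER JOIN tbl_Employee AS E
        ON D.DEPARTMENTID = E.DEPARTMENTID
) AS source
ON destination.EMPID = source.EMPID
WHEN NOT MATCHED BY TARGET THEN
    INSERT (EMPID, EMPNAME, DEPTNAME)
    VALUES (source.EMPID, source.EMPNAME, source.DEPTNAME)
WHEN MATCHED AND (destination.EMPNAME <> source.EMPNAME
               OR destination.DEPTNAME <> source.DEPTNAME) THEN
    UPDATE SET EMPNAME = source.EMPNAME,
               DEPTNAME = source.DEPTNAME
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

The WHEN NOT MATCHED BY SOURCE clause covers deletes from the source; drop it if rows should be kept in the target after they disappear from the source.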

Regards,
Tibor
Post #1527521
Posted Friday, January 3, 2014 7:58 AM


Ten Centuries


Group: General Forum Members
Last Login: Today @ 9:11 AM
Points: 1,232, Visits: 6,641
By joining the two tables into one, you lose the normalization that was done. Do as you wish; it's your choice.

At some point, depending on what is done, you may lose the keys to the castle. A key is a KEY, whereas names (even a department name) can change and overlap. Keep the keys in the fact tables, and then pick up the descriptions as needed. It may not happen with this example, but if you start down this path, it will eventually happen.
Post #1527550
Posted Sunday, January 5, 2014 11:57 PM
Valued Member


Group: General Forum Members
Last Login: Saturday, August 23, 2014 10:18 AM
Points: 53, Visits: 125
Thanks Tibor. A few questions come to mind. The volume of data in the source (the actual tables) is very high, so we are planning a staging layer that will hold only the records that were inserted/updated/deleted. From staging we need to load the target using the join query.

The challenge is how to design the staging tables. Should they mirror the source tables exactly, or should staging have only one table shaped like the target? And how do we handle insert/update/delete for the incremental records?

Feel free to let me know if you need more details. Thanks.

Jim
Post #1527975
Posted Monday, January 6, 2014 12:34 AM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, June 17, 2014 8:29 AM
Points: 23, Visits: 16
Well, you need to do the data transformation somewhere: either between source and staging, or between staging and destination. To choose between the two you need to look at your process; in my view the question is how you fill the staging table.

If you fill the staging table with an SSIS job, you probably want to do the ETL operations right in the job, so your staging table would be shaped like the destination table. But if you are using a trigger, or an action from the OUTPUT clause of a query, you want that to execute as fast as possible, in which case I would load a staging table that mirrors the source.

Either way, if you use the MERGE statement with a proper index on the merged tables, it can be blazing fast. And you don't even need to know whether records were deleted, inserted or updated, because MERGE handles all three cases.
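For example, the columns compared in the MERGE's ON clause and the join benefit from indexes along these lines (index names are illustrative, and this is a sketch rather than tuned advice):

```sql
-- Sketch: index the merge key on the target and the join key on the source.
-- Names are illustrative.
CREATE UNIQUE INDEX IX_tbl_EmployeRecord_EMPID
    ON tbl_EmployeRecord (EMPID);

CREATE INDEX IX_tbl_Employee_DEPARTMENTID
    ON tbl_Employee (DEPARTMENTID);
```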
Post #1527983
Posted Monday, January 6, 2014 3:43 AM
Old Hand


Group: General Forum Members
Last Login: Wednesday, July 30, 2014 10:34 AM
Points: 386, Visits: 624
If the new and changed records can be easily identified, I would lean towards staging tables with the same structure as the source data. If not, and you effectively have to compare today's snapshot with the existing data, I would lean towards a denormalised table and then use MERGE/EXCEPT/INTERSECT to identify the new and changed records.
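As a sketch of the snapshot-comparison approach (stg_EmployeRecord is a hypothetical staging table shaped like the target), EXCEPT finds the rows that are new or changed:

```sql
-- Sketch: rows in today's snapshot that are new or differ from the target.
-- stg_EmployeRecord is an illustrative name for the denormalised staging table.
SELECT EMPID, EMPNAME, DEPTNAME
FROM stg_EmployeRecord
EXCEPT
SELECT EMPID, EMPNAME, DEPTNAME
FROM tbl_EmployeRecord;
```

Swapping the two SELECTs around gives the rows that exist in the target but not in the snapshot, i.e. candidates for deletion.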

This is known as slowly changing dimension (SCD) data. Are you keeping history (type 2 SCD) or overwriting changed records (type 1 SCD)?

If you have a lot of this to do, you might want to look at the Data Vault model. Briefly, DV breaks data into three components: Hubs (the entity identifiers), Links (the relationships between Hubs), and Satellites (the interesting stuff about the Hubs). In your example, EmployeeID and DepartmentID would be Hubs, EmployeeID-DepartmentID would be a Link, and EmpName, dob, paygrade etc. would go into a Satellite, as would DeptName and Dept Location.
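A minimal sketch of that Data Vault shape for the employee example (table names and audit columns are illustrative, not prescribed by the model):

```sql
-- Sketch: Hub / Link / Satellite shape for the employee example.
CREATE TABLE Hub_Employee (
    EmployeeID INT       NOT NULL PRIMARY KEY,
    LoadDate   DATETIME2 NOT NULL
);

CREATE TABLE Hub_Department (
    DepartmentID INT       NOT NULL PRIMARY KEY,
    LoadDate     DATETIME2 NOT NULL
);

CREATE TABLE Link_EmployeeDepartment (
    EmployeeID   INT       NOT NULL REFERENCES Hub_Employee (EmployeeID),
    DepartmentID INT       NOT NULL REFERENCES Hub_Department (DepartmentID),
    LoadDate     DATETIME2 NOT NULL,
    PRIMARY KEY (EmployeeID, DepartmentID)
);

CREATE TABLE Sat_Employee (
    EmployeeID INT          NOT NULL REFERENCES Hub_Employee (EmployeeID),
    EmpName    VARCHAR(100) NULL,
    LoadDate   DATETIME2    NOT NULL,
    PRIMARY KEY (EmployeeID, LoadDate)
);
```

Because satellites are keyed on the hub key plus load date, changes arrive as new satellite rows, which gives you history for free.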

Aaron
Post #1528020
Posted Monday, January 6, 2014 7:14 AM
Valued Member


Group: General Forum Members
Last Login: Saturday, August 23, 2014 10:18 AM
Points: 53, Visits: 125
Hi Aaron,

We are getting the incremental load from the source tables, so we are maintaining the same structure as the source tables.

In this scenario, inserts into tbl_Department or tbl_Employee (or both) should not be a problem to load into the single target table tbl_EmployeRecord using the join query.

Update and delete will be the challenge. Kindly suggest whether staging tables that simply mirror the source will suffice, or whether additional tables (for example, to identify the dependent columns for the driving table) would help.

Please suggest. Any sample links, documents or table structures would help a lot!
Post #1528085