Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Merge,Look Up,SCD which one is good for millions of records Expand / Collapse
Author
Message
Posted Monday, March 11, 2013 7:41 AM


SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Sunday, December 08, 2013 10:45 PM
Points: 27, Visits: 265
Hi,

I am new to SSIS.
I have to load 50 millions of records(full load) into a SQL Server table.
After the full load I have to create another package for incremental load based on Hash value.
Please suggest me one of the below mentioned approach to get good performance.

-Merge command
-Look Up
-SCD

Thanks in advance
Post #1429268
Posted Monday, March 11, 2013 3:12 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, April 10, 2014 2:17 PM
Points: 124, Visits: 485
Use the Merge, of those 3.
Post #1429491
Posted Tuesday, March 12, 2013 3:04 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, April 10, 2014 5:48 AM
Points: 107, Visits: 436
Hi Rocky´s

I totally agree with that, the use of the merge command is for me the better option. I designed several ETL packages the last year and I tried the 3 options, and at the end I always used the Merge command.

Here I leave you a code snippet from my personal library. This code makes the following:
- Log the start time and name of the package in a custom log table with an identity column
- Merge into a target table the results of a query (as a source). You can adapt the source query to your needs
- The columns that match the merge condition are updated, if not, they are inserted.
- At the end the finish time is log together with the row count of updated and inserted records.

-- I use a table to log the start and finish time, package name, number of rows inserted and number of rows updated
insert into SSIS_Log
(PackageName,DTStart)
values
(?,GetDate()); -- Map the System variable "PackageName"

declare
@mergeResultsTable table (MergeAction VARCHAR(20));

declare
@insertCount int,
@updateCount int;


MERGE Table_A AS TARGET
USING (
SELECT
Col1
,Col2
.....
,ColN
FROM Table_B
) AS SOURCE
ON (TARGET.Key1 = SOURCE.Key1 and TARGET.Key_2 = SOURCE.Key_2)
-- Update
WHEN MATCHED THEN UPDATE SET
TARGET.Col1= SOURCE.Col1
,TARGET.Col2= SOURCE.Col2
,TARGET.UpdateDT=GetDate() -- I create a UpdateDT column in the target table to store the update datetime
-- etc
--Insert
WHEN NOT MATCHED BY TARGET THEN INSERT
(
Col1
,Col2
,CreateDT -- I create a CreateDT column in the target table to store the insert (or creation) datetime
-- etc
)
VALUES
( SOURCE.Col1
,SOURCE.Col2
,GetDate()
-- etc
)

OUTPUT $action into @mergeResultsTable;

update SSIS_Log
set UpdatedCount = -- Log the count of updated rows
(SELECT COUNT(*)
FROM @mergeResultsTable
WHERE MergeAction = 'UPDATE')
, InsertedCount = -- Log the count of inserted rows
(SELECT COUNT(*)
FROM @mergeResultsTable
WHERE MergeAction = 'INSERT')
,DTFinish = GetDate()
where Idx =
(
select MAX(Idx)
from SSIS_Log
);


If your target table should be treated as a SCD I strongly recommend to you the following article by Adam Aspin:
Slowly changing dimensions using T-SQL MERGE

If you have further question just write me.

Kind Regards


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1429621
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse