
Loading change data from AS400 (or other source systems)
Posted Saturday, December 08, 2012 2:44 PM
shannonjk (12/7/2012)
I am aware of how to query data based on the entry dates and change dates. If it were a simple matter of throwing together some SQL then I wouldn't have needed to post this question. Perhaps I should have stated my original question better, though.

I am concerned with the HOW of this: what data flow items do I use to monitor the absent keys, newer entry dates, and altered change dates? Or is there a good link that explains this methodology?

Since you already know how to query your source data, I don't think it's an overly complex task to update your warehouse -- perhaps a little more than throwing together some T-SQL. You mention that you have 75+ source tables, so it's not a little thing either. But once you get the idea for deletes, updates and additions, you can apply it to all of your other tables.

Break it down into parts:

For your deletes from the warehouse:
You don't have to monitor your deleted business keys, just determine which ones that are present in your DW each night are no longer in your source data. These are the records to remove from your Fact tables in the DW. You can do this using an SSIS Lookup transformation. If you don't find the business key, route that row down the path to a SQL Command to log it and delete it from your fact table.

For deletes, remove records from your fact tables before your dimension tables because of the FK referential integrity on your tables.
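
For illustration, here is a rough set-based sketch of that same delete detection in T-SQL rather than a Lookup transformation. All of the names (StgCustomer as a staged copy of tonight's source business keys, DimCustomer, FactSales, DeleteLog) are hypothetical, so adapt them to your own schema.

-- Hypothetical names throughout; StgCustomer holds the business keys pulled from the source tonight.
IF OBJECT_ID('tempdb..#KeysToDelete') IS NOT NULL DROP TABLE #KeysToDelete;

-- Find the keys that are in the DW but no longer in the source...
SELECT d.BusinessKey
INTO #KeysToDelete
FROM dbo.DimCustomer AS d
LEFT JOIN dbo.StgCustomer AS s
  ON s.BusinessKey = d.BusinessKey
WHERE s.BusinessKey IS NULL;

-- ...log them, then delete from the fact table first (FK integrity), then the dimension.
INSERT INTO dbo.DeleteLog (BusinessKey, DeletedDate)
SELECT BusinessKey, GETDATE() FROM #KeysToDelete;

DELETE f
FROM dbo.FactSales AS f
JOIN #KeysToDelete AS k ON k.BusinessKey = f.CustomerBusinessKey;

DELETE d
FROM dbo.DimCustomer AS d
JOIN #KeysToDelete AS k ON k.BusinessKey = d.BusinessKey;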

For additions:
For the same FK reasons, do your additions to your dimension tables first, then your fact tables. You know what records need to be inserted into the DW because you have the source system entry dates since last night (or the last time the ETL finished). You can basically use your current ETL packages; just change the source transformations to add WHERE EntryDate > @LastETLRunDate.
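
As a concrete sketch of that filter, assuming a hypothetical control table dbo.ETLControl that records when each run finished, the source query might look something like this:

DECLARE @LastETLRunDate DATETIME;

-- Hypothetical control table holding the completion time of each ETL run.
SELECT @LastETLRunDate = MAX(RunCompletedDate)
FROM dbo.ETLControl;

SELECT CustomerKey, CustomerName, EntryDate, ChangeDate
FROM dbo.SourceCustomer            -- hypothetical source table
WHERE EntryDate > @LastETLRunDate;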

For updates:
Do your changes after your additions so that any new records in the dimension tables will be present if you have to update your fact table to point to those new dimension records.

You know what records potentially need to be updated in the DW by the change date in your source system. You need to SELECT these records and compare them against the fields in the DW. You can do this simply in a Conditional Split; any records that have changed get routed down the update path to a SQL Command to update those records. Take a look at: http://bennyaustin.wordpress.com/2010/05/29/alternatives-to-ssis-scd-wizard-component/. This shows you a couple of options other than the dismal SCD transformation in SSIS.
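
If you'd rather do that comparison set-based, a rough T-SQL equivalent (again with hypothetical table and column names) is an UPDATE that only touches rows whose compared columns actually differ:

DECLARE @LastETLRunDate DATETIME = '20121207';  -- would come from your control table

UPDATE d
SET    d.CustomerName = s.CustomerName,
       d.City         = s.City
FROM   dbo.DimCustomer AS d
JOIN   dbo.StgCustomer AS s
  ON   s.BusinessKey = d.BusinessKey
WHERE  s.ChangeDate > @LastETLRunDate
  AND (   ISNULL(d.CustomerName, '') <> ISNULL(s.CustomerName, '')
       OR ISNULL(d.City, '')         <> ISNULL(s.City, '') );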



Regarding Jeff's comments that this can be done in T-SQL; he's 100% spot on. SSIS is just another tool in your belt. Use whatever tool you (and your team) are most comfortable with. I do all of these types of jobs with SSIS. A T-SQL wizard like Jeff can do the same tasks using T-SQL.



If I've been talking at too high a level of generalities/abstractions, please ask further questions.

Take a look at Andy Leonard's Stairway series here on SSC where he talks about adding, deleting and updating records in an incremental process. http://www.sqlservercentral.com/stairway/72494/

HTH,
Rob
Post #1394359
Posted Saturday, December 08, 2012 2:53 PM
shannonjk (12/8/2012)
Also one further quick question; since I do have 75+ tables I am loading on a nightly basis...would I just create multiple jobs to handle this load instead of multiple SSIS packages?

I'm assuming you have an overnight window where your data is not being updated.

I would imagine that you'd have an SSIS package for each DW dimension and fact table; similar tables may be combined. Those 75 tables probably break down into a smaller set of logically independent groupings -- say customers, sales, inventory, and so on. You could create separate SQL Agent jobs that run those SSIS packages in parallel. Of course, there would have to be no dependencies between them; if sales needs the customers to be updated first, then they would go in one package.
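
As a minimal sketch (the job name and package path are hypothetical), each such job could just be a CmdExec step that calls dtexec for its package; create one per independent grouping so they can run side by side:

EXEC msdb.dbo.sp_add_job       @job_name = N'DW Load - Customers';
EXEC msdb.dbo.sp_add_jobserver @job_name = N'DW Load - Customers';
EXEC msdb.dbo.sp_add_jobstep
     @job_name  = N'DW Load - Customers',
     @step_name = N'Run LoadCustomers package',
     @subsystem = N'CmdExec',
     @command   = N'dtexec /FILE "C:\ETL\LoadCustomers.dtsx"';  -- hypothetical package path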

HTH,
Rob
Post #1394360
Posted Saturday, December 08, 2012 3:37 PM


Hi Shannon,

I didn't have the time to build you a complete system, but this is an example of what I'm talking about. Details, some programmer's notes, and a runnable example are in the code that follows. Each job that you start will run asynchronously. You can start as many jobs as you'd like. Just remember that SQL Server does have some limits; if you start 75 import jobs all at once, you might just peg I/O to the ceiling and CPU to the wall.

The "command" you pass to this proc could, in fact, be a stored procedure name and parameters that would do your imports for you using just a single proc and a cofiguration table (or direct commands... which ever you prefer).

IF OBJECT_ID('dbo.CreateExecSelfDeletingTsqlJob','P') IS NOT NULL
    DROP PROCEDURE dbo.CreateExecSelfDeletingTsqlJob;
GO
CREATE PROCEDURE dbo.CreateExecSelfDeletingTsqlJob
/**********************************************************************************************************************
Purpose:
Creates a new job based on the input parameters and executes it.

This example stored procedure has no error checking but will identify if the job started. You can add in the error
checking that you think is necessary.

Programmer's Notes:
1. The job that is created (along with its history) is dropped upon completion whether successful or not.
This proc could be modified to drop the job and retain the history using sp_delete_job instead of having
it automatically drop. That way, you could test to see if the job completed successfully or not.

Usage Example:

DECLARE @IsJobStarted INT,
@JobID UNIQUEIDENTIFIER,
@Command NVARCHAR(MAX)
;
SELECT @Command = N'
IF OBJECT_ID(''tempdb.dbo.MyObjects'',''U'') IS NOT NULL
DROP TABLE tempdb.dbo.MyObjects;
SELECT * --for example only
INTO TempDB.dbo.MyObjects
FROM sys.Objects
;'
EXEC @IsJobStarted = dbo.CreateExecSelfDeletingTsqlJob
@pJobName = N'TestJob',
@pDatabase = N'Master',
@pCommand = @Command,
@pJobID = @JobID OUTPUT
;
--===== Show that the job exists
SELECT * FROM msdb.dbo.sysjobactivity WHERE job_id = @JobID;
PRINT @IsJobStarted;
PRINT @JobID;

--===== Show that the job completed and deleted itself
WAITFOR DELAY '00:00:01'
SELECT * FROM msdb.dbo.sysjobactivity WHERE job_id = @JobID

Revision History:
Rev 00 - 08 Dec 2012 - Jeff Moden
**********************************************************************************************************************/
--===== Define the IO for this procedure
@pJobName SYSNAME,
@pDatabase SYSNAME,
@pCommand NVARCHAR(MAX),
@pJobID UNIQUEIDENTIFIER = NULL OUTPUT,
@pServerInstance SYSNAME = @@SERVERNAME
AS

--===== This creates a self deleting job (see comment in code)
EXEC msdb.dbo.sp_add_job
@job_name = @pJobName,
@enabled = 1,
@notify_level_eventlog = 0,
@notify_level_email = 2,
@notify_level_netsend = 2,
@notify_level_page = 2,
@delete_level = 3, --Makes a self deleting job at job completion
@category_name = N'[Uncategorized (Local)]',
@owner_login_name = N'sa',
@job_id = @pJobID OUTPUT --You'll need this as part of the return
;
--===== This identifies which server instance to run the job on
EXEC msdb.dbo.sp_add_jobserver
@job_name = @pJobName,
@server_name = @pServerInstance
;
--===== This adds the job step. Note that the "command" could be retrieved
-- from a configuration table here and could include parameters. So could
-- the database name and the name of the job step (although 'Step 1' works).
EXEC msdb.dbo.sp_add_jobstep
@job_name = @pJobName,
@step_name = N'Step 1',
@step_id = 1,
@cmdexec_success_code = 0,
@on_success_action = 1,
@on_fail_action = 2,
@retry_attempts = 0,
@retry_interval = 0,
@os_run_priority = 0,
@subsystem = N'TSQL',
@command = @pCommand,
@database_name = @pDatabase,
@database_user_name = N'dbo',
@flags = 0
;
--===== This just adds some stuff to the job like which step to start on.
-- It duplicates a lot of the stuff from above (like the auto-delete)
-- so you need to be sure to use the same settings here.
EXEC msdb.dbo.sp_update_job
@job_name = @pJobName,
@enabled = 1,
@start_step_id = 1,
@notify_level_eventlog = 0,
@notify_level_email = 2,
@notify_level_netsend = 2,
@notify_level_page = 2,
@delete_level = 3, --Makes a self deleting job at job completion
@description = N'',
@category_name = N'[Uncategorized (Local)]',
@owner_login_name = N'sa',
@notify_email_operator_name = N'',
@notify_netsend_operator_name = N'',
@notify_page_operator_name = N''
;
--===== Start the job
DECLARE @pIsJobStarted INT
;
EXEC @pIsJobStarted = msdb.dbo.sp_start_job
@job_name = @pJobName,
@server_name = @pServerInstance,
@step_name = 'Step 1'
;
--===== Return 1 if the job was started, 0 if it was not
RETURN ABS(@pIsJobStarted-1)
;
GO
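
For illustration, a sketch of driving that proc from a configuration table; dbo.ETLTableConfig and its columns are hypothetical, and each enabled row fires one self-deleting job:

DECLARE @TableName SYSNAME,
        @Command   NVARCHAR(MAX),
        @JobID     UNIQUEIDENTIFIER;

DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT TableName, LoadCommand        -- hypothetical configuration table
    FROM dbo.ETLTableConfig
    WHERE Enabled = 1;

OPEN cur;
FETCH NEXT FROM cur INTO @TableName, @Command;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.CreateExecSelfDeletingTsqlJob
         @pJobName  = @TableName,        -- one job per source table
         @pDatabase = N'YourDW',         -- hypothetical target database
         @pCommand  = @Command,
         @pJobID    = @JobID OUTPUT;

    FETCH NEXT FROM cur INTO @TableName, @Command;
END;
CLOSE cur;
DEALLOCATE cur;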




--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1394362
Posted Sunday, December 09, 2012 12:35 AM
Jeff, thank you, that makes sense. I will test this here shortly (or possibly tomorrow as I am getting sleepy!) and run some tests on the OPENQUERY. If I run into any hiccups with that I will let you know.

And yes, in regards to the jobs running concurrently, I have done quite a lot of extensive testing. It is pretty much 9 tables that make up 95% of the data, and the rest are much smaller tables, so I pretty much run the first 9 in serial, then the rest in a parallel chain.

I will need to figure out how to execute the next batch of jobs once the first batch completes, but I guess one step at a time :)

Robert, I will give this a whirl as well and weigh the value of both methods to see which will be best for my company. Since we have all these SSIS packages in place I am leaning toward that, but if Jeff's solution turns out to be world changing, I may still be swayed that way ;)


Link to my blog http://notyelf.com/
Post #1394384
Posted Wednesday, December 12, 2012 12:39 PM
OK, so I am finally getting the testing done on this.

So far I am finding OPENQUERY to be a slower method. Are there particular things I should be utilizing in order to accomplish this? I am simply starting off with the basic insert script of

INSERT INTO destinationtable ([key], column1, column2)
SELECT t1.[key], t1.column1, t1.column2
FROM OPENQUERY(sourceserv, 'select key, column1, column2 from table') AS t1
LEFT JOIN destinationtable AS t2
  ON t1.[key] = t2.[key]
WHERE t2.[key] IS NULL;

It is taking quite a long time to accomplish but perhaps I am not using the best method here?


Link to my blog http://notyelf.com/
Post #1395849
Posted Friday, December 14, 2012 1:02 PM
Well, it is looking like the SSIS method is not going to work either... sigh.

Apparently there is a bug when using a Lookup transform on DB2 data when it is a Decimal data type. Unfortunately, several columns in our source system (DB2) are Decimal, and many of them would need to be compared, making this bug rather prohibitive! However, I would still like to thank you, Robert, as I did read the Stairway to SSIS series, learned quite a bit, and was successful at incremental loads on several tables; there are just some major ones that won't work because of the data conversion issue mentioned above!

So it's back to the OPENQUERY method. Hopefully Jeff can chime back in here regarding some more effective methods :)

I have also revised my insertion and update methods based on the entry and change dates (I had forgotten to do so before) and that has sped things up considerably. However, the deletion process still takes forever, as it has to compare the whole source table against the destination table to find entries that are no longer there, which takes just as long as loading the data.
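
One possible way to speed that up (purely a sketch, reusing the placeholder names from the earlier snippet) is to pull only the source keys across the linked server into a local temp table and do the anti-join locally:

-- Stage just the source keys locally so the delete comparison runs on this side of the link.
IF OBJECT_ID('tempdb..#SourceKeys') IS NOT NULL DROP TABLE #SourceKeys;

SELECT [key]
INTO #SourceKeys
FROM OPENQUERY(sourceserv, 'select key from table');

CREATE CLUSTERED INDEX IX_SourceKeys ON #SourceKeys ([key]);

-- Anything in the destination that is no longer among the staged source keys gets removed.
DELETE t2
FROM destinationtable AS t2
LEFT JOIN #SourceKeys AS s
  ON s.[key] = t2.[key]
WHERE s.[key] IS NULL;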


Link to my blog http://notyelf.com/
Post #1396796
Posted Saturday, December 15, 2012 4:13 PM
shannonjk (12/14/2012)
Well, it is looking like the SSIS method is not going to work either... sigh.

Apparently there is a bug when using a Lookup transform on DB2 data when it is a Decimal data type. Unfortunately, several columns in our source system (DB2) are Decimal, and many of them would need to be compared, making this bug rather prohibitive! However, I would still like to thank you, Robert, as I did read the Stairway to SSIS series, learned quite a bit, and was successful at incremental loads on several tables; there are just some major ones that won't work because of the data conversion issue mentioned above!

So it's back to the OPENQUERY method. Hopefully Jeff can chime back in here regarding some more effective methods :)

I have also revised my insertion and update methods based on the entry and change dates (I had forgotten to do so before) and that has sped things up considerably. However, the deletion process still takes forever, as it has to compare the whole source table against the destination table to find entries that are no longer there, which takes just as long as loading the data.

I'm not familiar with the SSIS - DB2 bug, but why couldn't you cast the decimal to something else, say an integer, and then undo the math on the SSIS side? So if the decimal is 123.45, multiply by 100 and bring it into SSIS as 12345. Divide by 100 in SSIS and then write the 123.45 into your DW. I'm sure it's not as simple as that, but there must be a way to work around this -- especially since you're already bringing that data into your DW.

Just a thought,
Rob
Post #1396916
Posted Monday, December 17, 2012 6:32 AM


What is the actual error that you are getting?

The reason that I ask is because I get a bogus truncation error on about a dozen tables from DB2 to staging.

If you use OPENQUERY, you can use a MERGE statement.
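
For instance -- a rough sketch only, reusing the placeholder names from the earlier snippet; note that it still pulls the whole remote rowset across the linked server:

MERGE destinationtable AS t2
USING ( SELECT [key], column1, column2
        FROM OPENQUERY(sourceserv, 'select key, column1, column2 from table') ) AS t1
   ON t1.[key] = t2.[key]
WHEN MATCHED AND (t1.column1 <> t2.column1 OR t1.column2 <> t2.column2) THEN
    -- add ISNULL handling here if these columns are nullable
    UPDATE SET column1 = t1.column1,
               column2 = t1.column2
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([key], column1, column2)
    VALUES (t1.[key], t1.column1, t1.column2)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;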


For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Post #1397227
Posted Monday, December 17, 2012 3:39 PM
Robert, the source table is in the Lookup transform itself, so I do not know of a way to feed that and change the data types. I have tried using 'Use the results of a SQL query' to cast the data type, but that does not work either. Additionally, doing that over 12 million rows could be a bit slow and defeat the purpose of the original point anyway ;)

Welsh, the error is that the data types do not match, yet when digging into the advanced editors and looking at the input/output columns, all the data types are exactly the same. Using the MERGE query yields the same extraordinarily slow results as the LEFT JOIN (tested and verified).


Link to my blog http://notyelf.com/
Post #1397449
Posted Monday, December 17, 2012 4:06 PM


shannonjk (12/17/2012)
Robert, the source table is in the Lookup transform itself, so I do not know of a way to feed that and change the data types. I have tried using 'Use the results of a SQL query' to cast the data type, but that does not work either. Additionally, doing that over 12 million rows could be a bit slow and defeat the purpose of the original point anyway ;)

Welsh, the error is that the data types do not match, yet when digging into the advanced editors and looking at the input/output columns, all the data types are exactly the same. Using the MERGE query yields the same extraordinarily slow results as the LEFT JOIN (tested and verified).


I never stated that the MERGE would be fast, and it is not, because you are using an OPENQUERY. I avoid it because it is terribly slow and I could not load the data within the window that I have.

Have you considered using the Data Conversion Task?

I have tables that do not have insert date/update date columns. I can't add columns to the tables because the AS400 is hosted by a vendor and it is proprietary.

I'm using a combination of a Lookup Task for inserts and a hash for updates.
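
One common way to do the hash piece (a sketch with hypothetical column names; the hash could equally be computed in an SSIS Script Component) is to store a HASHBYTES value on the warehouse row and compare only that:

-- Hypothetical columns; the stored RowHash avoids comparing every field on every load.
SELECT BusinessKey,
       HASHBYTES('SHA1',
                 ISNULL(CustomerName, '') + '|' +
                 ISNULL(City, '') + '|' +
                 ISNULL(CONVERT(VARCHAR(32), CreditLimit), '')) AS RowHash
FROM dbo.StgCustomer;

-- Rows whose computed hash differs from the RowHash stored in the DW are the ones to update.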

I'm loading hundreds of tables from the DB2 data source alone, and several are greater than 12 million rows.

I'm not using OPENQUERY. I'm redirecting the records that fail to a text file.

You have gotten a lot of good advice from the forum members; hopefully things work out for you.


For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Post #1397463