• Hi folks

    Sorry for dragging up an old thread, but I stumbled upon the related article whilst looking for info on best practices for using try/catch within a transaction for a small ETL process. A very informative discussion, so thanks to all. I like the binary/hyperexponential chop ideas, but will try to explore using bcp where possible; I've found it very useful for non-production stuff in the past.

    Anyway, here's my question and scenario:

    In Mel's example in the original article, he deletes from his staging table after committing the transaction which inserts into the target table and updates the source table, and even adds an explicit comment to stress its importance:

    --Don't delete from temp table until transaction is committed

    DELETE @TempSource

    Does this matter? In similar code I wrote the other day, before reading this, I assumed I wanted the delete from staging to be part of the transaction too, so that the entire process gets committed or rolled back together. Otherwise, if you get a random error during the delete, aren't you left with data you don't want in the staging table? It's not a massive issue, as my process truncates the staging table at the start, but I'm wondering if I'm missing something, or worrying over something inconsequential. Thanks

    My scenario is thus:

    The existing process uses SSIS to import data directly into a production table. The data actually needs to be transformed into the form our stored procs want it in, and at the moment this is done using a view, which performs pretty badly - the columns we need to index on exist only courtesy of expressions in the view. So the options are to index the view, or to change the import process.

    I opted for the latter, and am adding a staging table plus a stored proc which transforms the data from staging into a new target table and copies the raw data into the current production table, which in effect becomes our audit store. Existing stored procs will be changed to point at the new target table rather than the existing production raw data store.

    There are no real constraints on the target table, so I shouldn't have to worry about errors too much, but I still want to use an explicit transaction, something like:

    BEGIN TRAN;

    -- Insert from staging into target table, transforming columns where desired
    INSERT tbl_Target
    SELECT col1
          ,MySchema.fn_transform(col2)
          ,ISNULL(col3, 0) -- ISNULL needs a replacement value; 0 here is just a placeholder
    FROM tbl_Staging;

    -- Copy raw data from staging into store
    INSERT tbl_Store
    SELECT col1, col2, col3
    FROM tbl_Staging;

    -- Delete data from staging
    DELETE FROM tbl_Staging;

    -- NB: @@ERROR is reset after every statement, so this check only
    -- reflects the outcome of the DELETE, not the two INSERTs
    IF (@@ERROR = 0)
        COMMIT TRAN;
    ELSE
    BEGIN
        ROLLBACK TRAN;
        -- RAISERROR to tell SSIS package there's been a problem
    END

    This process runs every hour to import a few thousand rows - thankfully mostly during office hours rather than the middle of the night! :-P

    Now, I've never used T-SQL try/catch before, as I worked mainly with SQL 2000 until this year, but I feel I should use it, especially as it can be used to detect a deadlock and retry (in this case there shouldn't be other stuff querying the staging and store tables, but you never know).
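    Here's roughly what I had in mind for the try/catch version with a deadlock retry, reusing the table and column names from my sketch above. Treat this as a sketch rather than tested code: the retry count and delay are arbitrary, and the ISNULL default is a placeholder.

    ```sql
    DECLARE @Retries int;
    SET @Retries = 0;

    WHILE (@Retries < 3)
    BEGIN
        BEGIN TRY
            BEGIN TRAN;

            -- Transform from staging into the target table
            INSERT tbl_Target
            SELECT col1
                  ,MySchema.fn_transform(col2)
                  ,ISNULL(col3, 0) -- placeholder default
            FROM tbl_Staging;

            -- Copy raw data into the audit store
            INSERT tbl_Store
            SELECT col1, col2, col3
            FROM tbl_Staging;

            DELETE FROM tbl_Staging;

            COMMIT TRAN;
            BREAK; -- success: leave the retry loop
        END TRY
        BEGIN CATCH
            -- Roll back if a transaction is still open (committable or doomed)
            IF (XACT_STATE() <> 0)
                ROLLBACK TRAN;

            IF (ERROR_NUMBER() = 1205) -- chosen as deadlock victim
            BEGIN
                SET @Retries = @Retries + 1;
                WAITFOR DELAY '00:00:05'; -- brief pause before retrying
            END
            ELSE
            BEGIN
                -- Re-raise so the SSIS package sees the failure
                DECLARE @Msg nvarchar(2048);
                SET @Msg = ERROR_MESSAGE();
                RAISERROR(@Msg, 16, 1);
                BREAK;
            END
        END CATCH
    END
    ```

    One nice side effect of TRY/CATCH is that it catches errors from every statement in the block, so the @@ERROR-only-sees-the-last-statement problem in my first sketch goes away.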

    But does it matter whether my delete from staging is inside the transaction?

    Thanks