Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Audit transformation in SSIS Expand / Collapse
Author
Message
Posted Thursday, March 7, 2013 5:28 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, June 3, 2014 9:57 AM
Points: 5, Visits: 62
Hi everyone. This is my first time posting on this forum so let me know if you need more information in order to answer my question.

I'm working on a ETL-project and part of the assignent is to implement auditing. The audit transformation is working flawlessly, however I want to add an audit_id
that will correspond to the respective dimension tables.
Like I mentioned, I am able to retrieve data/information for every column in the audit transformation (contained in a table in my database) with an audit_id column that use
identity (1,1) but I am unable to join them to the dimension tables audit_id column.

I've been trying to solve it with a Lookup-transformation but it's been unsuccessful.

I appreciate any feedback or answer.
Thanks in advance!

// Daniel
Post #1427907
Posted Thursday, March 7, 2013 7:55 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 5:14 AM
Points: 119, Visits: 486
Hi Daniel,

I will need a little bit more details in order to help you.

- Which SQL Server version are you using?
- As far as I've understood, you have implemented your own audit process. How does it works? Which auditing information are you logging?

There is a built-in audit functionality in SQL Server 2008 which gathers the Names and IDs of your packages and tasks, but it is very basic.

Kind Regards


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1427994
Posted Thursday, March 7, 2013 9:20 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, June 3, 2014 9:57 AM
Points: 5, Visits: 62
Thank you for your reply Paul.
I'm using SQL Server 2012 (SP1), Developer Edition.
I use a OLE DB source to extract some data from 3 different tables from a specific database. I then
connect this component with the Audit transformation component. In the audit I selected the following information to audit:
Execution start time, Machine name, User name, Package name and Task name.
After that I create a data flow path between my audit transformation and an OLE DB destination where I load my Audit_table. When I created the audit_table
I created a column called Audit_id with identity (1,1).

This is as far as I get. Like I mentioned in my previous post I am unable to join the Audit_id in my audit_table with my audit_id columns in my various dimension tables.
I have 6 dimension tables and 2 fact tables which currently have audit_id column with the value NULL.

I hope this is all the information you need and if not please let me know.

// Daniel
Post #1428054
Posted Friday, March 8, 2013 12:52 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 1:10 PM
Points: 5,103, Visits: 11,906
daniel-johnsson (3/7/2013)
Thank you for your reply Paul.
I'm using SQL Server 2012 (SP1), Developer Edition.
I use a OLE DB source to extract some data from 3 different tables from a specific database. I then
connect this component with the Audit transformation component. In the audit I selected the following information to audit:
Execution start time, Machine name, User name, Package name and Task name.
After that I create a data flow path between my audit transformation and an OLE DB destination where I load my Audit_table. When I created the audit_table
I created a column called Audit_id with identity (1,1).

This is as far as I get. Like I mentioned in my previous post I am unable to join the Audit_id in my audit_table with my audit_id columns in my various dimension tables.
I have 6 dimension tables and 2 fact tables which currently have audit_id column with the value NULL.

I hope this is all the information you need and if not please let me know.

// Daniel


I don't think I fully understand the structure of what you are doing.

Are you saying that AuditId in your dim tables should be a many to one FK to AuditId in your Audit table?

Is it just a case of retrieving the AuditId for a single insert to your Audit table and then passing that into subsequent data flows?




Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1428433
Posted Friday, March 8, 2013 2:39 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Today @ 5:14 AM
Points: 119, Visits: 486
Hi Daniel,
The problem here is how to look up the Audit_Id in your Audit table using one or more columns from your dimension or fact tables. You have the following information: Execution start time, Machine name, User name, Package name and Task name.

How you can relate this two table, good question. If you store the creation time of the columns (I always create a CretatedOn and UpdatedOn column for Dimension tables and CreatedOn column for Fact tables) you can make the following:

Get the Audit_Id where the CreateDT >Execution start time

But you must do it after an execution and before the next execution in order to obtain one and only one Audit_Id.

This approach does not work if you have different user, machines or packages running at the same time, it is just an idea and it seems to be not precise.

Other approach that definitely works is to drop your separated Audit table and add audit columns to each of your dimension tables and fact tables. It is redundant and space consuming but I think it not so bad at all.


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1428459
Posted Friday, March 8, 2013 5:08 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, June 3, 2014 9:57 AM
Points: 5, Visits: 62
Hi Phil,

Yes that is exactly what I want to accomplish. Basically that I have Audit_ID with identity (1,1) that works as a PK in my Audit table and
my respective dimension tables contain Audit_ID columns that should work as foreign keys to the Audit_ID PK in my Audit table.
And just what you mentioned, I want to retrieve the Audit_ID from my Audit table and pass it on through the data flow to give that value
to my various dimension tables.

Also I should mention that I'm auditing every row that's passing through but the only value that seperates these is the Audit_ID column so I currently end up with
128 rows which is identical (except for the Audit_ID). Can I use a Row Sampling component to filter it down to just 1 row or will this become a problem?

Again, let me know if it's not clear what I currently have and what I want to accomplish. I'd be glad to provide the information if I can. Still rather fresh to both
posting in SSC and working with SSIS.
Post #1428526
Posted Saturday, March 9, 2013 4:28 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, June 3, 2014 9:57 AM
Points: 5, Visits: 62
Hi Paul,

It's true that I could simply drop my audit table and log all my auditing in my various dimension and fact tables but that would not be according
to my assignment I'm afraid. I would like to make a separate table that contains all my auditing.
It might work as a last resort if all else fails however.

I'm not entirely sure how your solution with CreatedOn and UpdatedOn works. Are you saying I should create the columns inside the package?
And would I then have to create this on every dimension table I'm loading, since I am auditing every dimension table?

I wouldn't mind doing that, I just want to make sure I understand your solution.
I appreciate the help!

// Daniel
Post #1428868
Posted Saturday, March 9, 2013 6:47 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 1:10 PM
Points: 5,103, Visits: 11,906
daniel-johnsson (3/8/2013)
Hi Phil,

Yes that is exactly what I want to accomplish. Basically that I have Audit_ID with identity (1,1) that works as a PK in my Audit table and
my respective dimension tables contain Audit_ID columns that should work as foreign keys to the Audit_ID PK in my Audit table.
And just what you mentioned, I want to retrieve the Audit_ID from my Audit table and pass it on through the data flow to give that value
to my various dimension tables.

Also I should mention that I'm auditing every row that's passing through but the only value that seperates these is the Audit_ID column so I currently end up with
128 rows which is identical (except for the Audit_ID). Can I use a Row Sampling component to filter it down to just 1 row or will this become a problem?

Again, let me know if it's not clear what I currently have and what I want to accomplish. I'd be glad to provide the information if I can. Still rather fresh to both
posting in SSC and working with SSIS.


I think that you could achieve this as follows:

1) Create a package-scoped integer variable in SSIS - 'AuditId', perhaps.
2) Insert an ExecuteSQL task before your data flow - this will insert a row into the audit table and return the new AuditId into your AuditId variable.
So the SQL would be something like

insert dbo.Audit (col1, col2)
select ...

select scope_identity()

3) Now you can add a derived column to any subsequent data flow and set its value to whatever is in the variable AuditId.








Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1428881
Posted Saturday, March 9, 2013 10:46 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 7:58 PM
Points: 20,705, Visits: 32,356
Phil Parkin (3/9/2013)
daniel-johnsson (3/8/2013)
Hi Phil,

Yes that is exactly what I want to accomplish. Basically that I have Audit_ID with identity (1,1) that works as a PK in my Audit table and
my respective dimension tables contain Audit_ID columns that should work as foreign keys to the Audit_ID PK in my Audit table.
And just what you mentioned, I want to retrieve the Audit_ID from my Audit table and pass it on through the data flow to give that value
to my various dimension tables.

Also I should mention that I'm auditing every row that's passing through but the only value that seperates these is the Audit_ID column so I currently end up with
128 rows which is identical (except for the Audit_ID). Can I use a Row Sampling component to filter it down to just 1 row or will this become a problem?

Again, let me know if it's not clear what I currently have and what I want to accomplish. I'd be glad to provide the information if I can. Still rather fresh to both
posting in SSC and working with SSIS.


I think that you could achieve this as follows:

1) Create a package-scoped integer variable in SSIS - 'AuditId', perhaps.
2) Insert an ExecuteSQL task before your data flow - this will insert a row into the audit table and return the new AuditId into your AuditId variable.
So the SQL would be something like

insert dbo.Audit (col1, col2)
select ...

select scope_identity()

3) Now you can add a derived column to any subsequent data flow and set its value to whatever is in the variable AuditId.







Instead of :


insert dbo.Audit (col1, col2)
select ...

select scope_identity()


Why not:


insert dbo.Audir(col1,col2)
output INSERTED.AuditID -- or what ever column is the Identity column
select ...


The output value(s) could also be put into a temporary table or table variable if needed first. It depends on how the INSERT is being done, is it a single row or multi-row entry.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1428912
Posted Thursday, March 14, 2013 4:50 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, June 3, 2014 9:57 AM
Points: 5, Visits: 62
Hi Phil and Lynn,

I have been trying to solve my problem with your instuctions but I'm still not able to work it out I'm afraid. Maybe I am missing something obvious, so I will explain a bit more in detail how far I have got.

I created an integer variable that I named AuditID and before I do my first load to my first staging table I insert a row in my Audit_table with the following t-sql:
SET IDENTITY_INSERT dbo.STAGE_AUDIT ON

INSERT INTO dbo.STAGE_AUDIT ([Audit ID], [Package Name], [Execution Start Time], [Machine name], [User name], [Task name])
VALUES (1, 'Staging', '2013-03-13', 'DANIEL-SSIS', 'ECUTBILDNING\Administrator', 'Stage Geography')

SELECT SCOPE_IDENTITY()

This way it inserts the same information as my Audit Transformation would. Maybe this is not how the SQL statement should look like, but I'm honestly unsure what the statement should look like.
Nevertheless, I'm successfully loading my variable with 1 in the Result Set section saying result name = 0, Variable name = User::AuditID. Also I'm using result set as single row.

From this Execute SQL Task my data flow continues into my first load Data Flow task which I can summarize like this:
A source where I retieve some data and insert it into my staging table.
Next to it I use the same source, retrieving the same data but before I connect it to my destination I use an Audit Transformation and also a Row Sampling to only have 1 row as the output since they are all identical except for their AuditID. I then connect it to my destination table, which is my Audit_Table.
Lastly I use a source which is my staging table that I loaded previously. My data path is connected with a Derived Column which looks as following:
I replace my audit_id column with the expression audit_id == @User::AuditID and then I connect this Derived Column with my destination, which is the same table as my source.

Sadly when I run this package my AuditID remains NULL in my staging table. Obviously I'm doing something wrong but that is why I'm here asking for help is it not :)

Thanks again for your time and help, and I'm sorry if it's not very clear with the lack of images and the fact that it's all just text. Let me know if it's unclear or wether you need more information.

// Daniel
Post #1430871
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse