One understanding of auditing is to log which user at which point of time has done what. The following article shows one way, how to answer these questions for ETL-Processes.
SSIS already provides a logging mechanism (Menu SSIS > Logging). If configured reasonable, this logging is very useful. This SSIS-Logging should be understood as a technical logging. Depending on how logging is configured, the logging table (or file) will get huge and confusing. But if logging is proper configured it gives a very useful overview of the ETL's technical quality. Next to this technical logging a business logging is eligible. Technical and business logging should coexist.
The presented method helps to answer the following questions:
- Which package ran when and how long?
- Who started this package and on which server?
- What kind of errors and how many occurred?
- What was the development version of the package?
- What data were written/changed by what ETL-Process?
Principle of Auditing
Every SSIS-Package inherently provides many interesting information that only have to be saved. Further useful information can be gathered within the package, depending on the specific requirements of the project. But which information is necessary to answer the above questions and how will we gain it?
Which package ran when and how long?
This fundamental question is simply answered by logging every ETL-Process. Here, an ETL-Process means the runtime of one package. The start time and the end time of this runtime is recorded. If there is a master package, that calls other packages, there will be audit records for all the packages.
Who started this package and on which server?
To answer the next question, the logged in user is added to the audit. Certainly this will usually be a technical user under which the server is running. Uncommonly there are also scenarios in which end-users trigger the ETL-Processes.
What kind of errors and how many occurred?
Technical errors can be determined from standard SSIS Logging. Building a custom view, the technical logging can be joined with the audit.
Business errors should be determined within the package and logged to custom tables, perhaps. It depends on the project what details make sense there.
What was the development version of the package?
Sooner or later this question becomes crucial in every bigger project. Commonly projects will be deployed to production while parts still have to be developed. In this first hot phase frequent hot fixes are common. Who is losing overview there, is lost. Fortunately SSIS asserts every package a unique version number for every build. This number should be logged in the audit.
What data were written/changed by what ETL-Process?
With the audit every process will get a surrogate unique ID. This ID should be written to the target tables as metadata. Doing this, every record can be mapped to the process which wrote it or updated it.
The Audit Table
The approach of the auditing is very simple. Every package opens an audit record as its first task with the information that is known from beginning. At the end of the package the audit record is closed and updated with more information.
Of course, the central role is played by the audit table. In the following an audit table is exhibited, that contains informative columns. Beside a brief annotation the information source is shown:
Filling the Audit
Start of Package
At the beginning of every package the procedure SP_Audit_Open is called via SQLTask. The procedure opens one audit with all already known information. Then it returns the ID of the audit-record. This ID has to be asserted to a variable in the package.
ALTER PROCEDURE [dbo].[sp_audit_open]
@packagename varchar(255) = 'N/A'
, @userid varchar(255) = 'N/A'
, @audit_id int output
, @package_id varchar(255) = 'N/A'
, @machinename varchar(255) = 'N/A'
, @execution_guid varchar(255) = 'N/A'
, @father_audit_id int = -1
, @versionBuild int = -1
INSERT INTO output.var_adt_audit(paket, startzeit, userid, package_id, machinename, execution_guid, fk_father_audit, versionBuild)
values (@packagename, getutcdate(), @userid, @package_id, @machinename, @execution_guid, @father_audit_id, @versionBuild);
SELECT @audit_id = ident_current('output.Adt_Audit');
IF @audit_id IS NULL
RAISERROR ('Fehler beim Starten des Audits!',16, 1);
Mainpart of package
The Audit_ID can be carried along to the target tables as metadata. This way the lineage of data can be retraced every time. It is possible to query which process has written/updated which records - or which records were written by which process.
The package is responsible to determine the count of inserted, updated or deleted records and further designated information.
End of package
At the end of every package the procedure SP_Audit_Close is called. The package meanwhile has gathered all designated information and completes the audit-record.
ALTER PROCEDURE [dbo].[sp_audit_close]
, @count_ins int = 0
, @count_upd int = 0
, @count_del int = 0
, @tech_status varchar(50) = 'SUCCESS'
, @fach_status varchar(50) = 'SUCCESS'
IF @audit_id IS NULL or @audit_id < 0
RAISERROR ('Der Prozedur muss eine audit_id übergeben werden!',16, 1);
set endzeit = getutcdate()
, rows_insert = @count_ins
, rows_update = @count_upd
, rows_delete = @count_del
, technischer_status = @tech_status
, fachlicher_status = @fach_status
where audit_id = @audit_id;
When a technical error occurs, the audit record will be closed by an event handler on package layer:
Problematically is the violent abortion of a package. In this case SSIS has no chance to close the audit record. So the admin (or maybe the workflow) has to call the procedure SP_Kill_Process which marks the audit record as aborted.
ALTER procedure [admin].[sp_kill_process](@audit_id int)
set endtime = getutcdate()
, killed = 'X'
where audit_id = @audit_id
Querying the Audit
- The Audit can be joined with SSIS's technical logging using the Execution_GUID. So a view can be build that shows all audits together with the count of technical warnings/errors.
- The columns Rows_Insert / Rows_Update / Rows_Delete provide information about how many records was accessed by one process. Precondition in the shown model is surely that one process writes only to one table.
- The audit_id can be written to the target tables. Then, afterwards the lineage of data can be determined. Otherwise this can be an intriguing task with temporal databases.
- In the range of debugging it can prove to be useful to know, which version the process has had, that was used. For this, the column versionBuild can be queried.
The audit table in the exhibition is just a pattern that can be used probably by every project. Depending on specific requirements of a special project the auditing should be extended.
For example, in my current project I am also logging business errors and the sources. I join this information with the audit. So, I can also answer questions about the source of specific data or which data from which source had to be dismissed for business reasons.