Audit Trails and Logging Part II

  • Comments posted to this topic are about the item Audit Trails and Logging Part II

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Good article. By coincidence, I recently wrote a script that generates three triggers for every table in a database, tracking the change history of table contents in one single _ChangeHistory table using the xml datatype.

    This script should enable you to implement data change auditing with little effort. Please read the preconditions and limitations.

    -- ==========================================================================================
    -- Author:      Hans Michiels
    -- Create date: 4 June 2008
    -- Description: Generate triggers to detect inserts, deletes and updates in a database.
    --
    -- PRECONDITIONS:
    -- * Table names do not contain spaces.
    -- * A table _ChangeHistory exists in your database; if not, this script will create it.
    -- * All tables have a single column that is the primary key or has a unique index; if not,
    --   a column called 'id' will be added to the table by this script.
    -- * After schema changes in your database (e.g. new columns or tables) you should rerun
    --   this script to make sure that the triggers contain these new columns and new triggers
    --   are made for new tables.
    --
    -- LIMITATIONS:
    -- * Impact on performance has not been tested; be careful when using this in a production
    --   environment!
    -- * Changes to columns of type text, ntext and image are not stored in table [_ChangeHistory].
    -- * It seems like only the first 8000 characters of changes to columns of type varchar(max),
    --   nvarchar(max) and varbinary(max) are stored in table [_ChangeHistory], but this needs
    --   more testing.
    -- ==========================================================================================
    DECLARE @Cursor1     CURSOR
    DECLARE @Cursor2     CURSOR
    DECLARE @Tablename   nvarchar(255)
    DECLARE @SQL         nvarchar(max)
    DECLARE @SourceDb    nvarchar(50)
    DECLARE @KeyColumn   varchar(50)
    DECLARE @ColumnName  varchar(50)
    DECLARE @ColumnNames varchar(max)
    DECLARE @SQLHead     nvarchar(255)
    DECLARE @Newline     varchar(2)
    DECLARE @ColIndent   varchar(8)

    SET ANSI_NULLS ON;
    SET QUOTED_IDENTIFIER ON;
    SET ANSI_PADDING ON;

    USE YourDatabaseNameHere

    IF NOT EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[_ChangeHistory]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
    CREATE TABLE [dbo].[_ChangeHistory](
        [ID] [int] IDENTITY(1,1) NOT NULL,
        [TableName] [varchar](50) NOT NULL,
        [Action] [char](6) NOT NULL,
        [DateCreated] [datetime] NOT NULL CONSTRAINT [DF__ChangeHistory_DateCreated] DEFAULT (getdate()),
        [KeyValue] [varchar](255) NULL,
        [DeletedRecord] [xml] NULL,
        [InsertedRecord] [xml] NULL,
        [Username] [varchar](50) NOT NULL CONSTRAINT [DF__ChangeHistory_Username] DEFAULT (user_name())
    ) ON [PRIMARY];

    SET @Newline = CHAR(13) + CHAR(10)
    SET @ColIndent = SPACE(8)
    SET @SQLHead = '-- ============================================='
        + @Newline + '-- Author: Hans Michiels'
        + @Newline + '-- Create date: ' + CONVERT(nvarchar(25), getdate(), 103)
        + @Newline + '-- Description: This trigger is generated with a'
        + @Newline + '-- script. Do not modify it by hand.'
        + @Newline + '-- ============================================='

    SET NOCOUNT ON

    SET @Cursor1 = CURSOR STATIC FOR
        SELECT CONVERT(varchar(255), so.name) AS tablename
        FROM dbo.sysobjects AS so
        WHERE so.type = 'U' AND so.name <> '_ChangeHistory' -- and so.name = '_TestTypes'
        ORDER BY so.name

    OPEN @Cursor1
    FETCH NEXT FROM @Cursor1 INTO @Tablename
    WHILE @@FETCH_STATUS = 0
    BEGIN
        --\
        ---> Get the primary key field name
        --/
        SELECT @KeyColumn = NULL;
        SELECT @KeyColumn = c.name
        FROM sys.objects AS o
        INNER JOIN sys.indexes AS i ON o.object_id = i.object_id
        INNER JOIN sys.index_columns AS ic ON i.index_id = ic.index_id AND i.object_id = ic.object_id
        INNER JOIN sys.columns AS c ON ic.column_id = c.column_id AND ic.object_id = c.object_id
        WHERE (o.type = 'U')
          AND (o.name = @Tablename)
          AND ((i.is_primary_key = 1) OR (i.is_unique = 1 AND i.ignore_dup_key = 0 AND i.is_unique_constraint = 1))
        ORDER BY i.is_primary_key DESC, i.type

        IF @KeyColumn IS NULL
        BEGIN
            -- If the table has no primary key, a column named 'id' is added as primary key.
            -- It is defined as IDENTITY (autonumber).
            SELECT @SQL = 'ALTER TABLE [dbo].[' + @Tablename + '] ADD id INT IDENTITY CONSTRAINT PK_' + @Tablename + '_ID PRIMARY KEY'
            EXEC sp_executesql @SQL
            SELECT @KeyColumn = 'id'
        END

        --\
        ---> Cursor2 for building a string with all field names.
        --/
        SET @Cursor2 = CURSOR STATIC FOR
            SELECT c.name AS columnname
            FROM sys.objects AS o
            INNER JOIN sys.columns AS c ON o.object_id = c.object_id
            WHERE o.name = @Tablename
              AND o.type = 'U'
              AND (c.system_type_id <> 34) -- image } These types cannot be used as source
              AND (c.system_type_id <> 35) -- text  } columns in a SQL statement that uses
              AND (c.system_type_id <> 99) -- ntext } FOR XML AUTO
            ORDER BY c.column_id

        OPEN @Cursor2
        SET @ColumnNames = ''
        FETCH NEXT FROM @Cursor2 INTO @ColumnName
        WHILE @@FETCH_STATUS = 0
        BEGIN
            SELECT @ColumnNames = @ColumnNames + ', [' + @ColumnName + ']' + @Newline + @ColIndent
            FETCH NEXT FROM @Cursor2 INTO @ColumnName
        END
        DEALLOCATE @Cursor2

        -- Remove the leading comma and space (LEN instead of a fixed 9999,
        -- so very wide tables are not truncated):
        SELECT @ColumnNames = SUBSTRING(@ColumnNames, 3, LEN(@ColumnNames))

        --\
        ---> UPDATE trigger - DROP current
        --/
        SELECT @SQL = 'IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N''[dbo].[trgChgHis_' + @Tablename + 'Update]'') AND OBJECTPROPERTY(id, N''IsTrigger'') = 1) DROP TRIGGER dbo.trgChgHis_' + @Tablename + 'Update'
        EXEC sp_executesql @SQL

        --\
        ---> UPDATE trigger - CREATE new
        --/
        SELECT @SQL = @SQLHead
            + @Newline + 'CREATE TRIGGER dbo.trgChgHis_' + @Tablename + 'Update ON dbo.' + @Tablename + ' AFTER UPDATE AS '
            + @Newline + 'BEGIN '
            + @Newline + ' SET NOCOUNT ON;'
            + @Newline + ' INSERT INTO [dbo].[_ChangeHistory]([TableName],[Action],[KeyValue],[DeletedRecord],[InsertedRecord])'
            + @Newline + ' SELECT '
            + @Newline + ' ''' + @Tablename + ''' AS [TableName]'
            + @Newline + ' , ''UPDATE'' AS [Action]'
            + @Newline + ' , CONVERT(varchar(255), [' + @KeyColumn + ']) AS [KeyValue]'
            + @Newline + ' , (Select '
            + @Newline + ' ' + @ColumnNames
            + 'From deleted as ' + @Tablename + 'Record'
            + @Newline + ' Where [' + @Tablename + 'Record].[' + @KeyColumn + '] = [maindel].[' + @KeyColumn + ']'
            + @Newline + ' FOR XML AUTO, TYPE, ROOT(''Deleted'') ) AS [DeletedRecord]'
            + @Newline + ' , (Select '
            + @Newline + ' ' + @ColumnNames
            + 'From inserted as ' + @Tablename + 'Record'
            + @Newline + ' Where [' + @Tablename + 'Record].[' + @KeyColumn + '] = [maindel].[' + @KeyColumn + ']'
            + @Newline + ' FOR XML AUTO, TYPE, ROOT(''Inserted'') ) AS [InsertedRecord]'
            + @Newline + ' FROM deleted AS maindel;'
            + @Newline + 'END'
        EXEC sp_executesql @SQL

        --\
        ---> INSERT trigger - DROP current
        --/
        SELECT @SQL = 'IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N''[dbo].[trgChgHis_' + @Tablename + 'Insert]'') AND OBJECTPROPERTY(id, N''IsTrigger'') = 1) DROP TRIGGER dbo.trgChgHis_' + @Tablename + 'Insert'
        EXEC sp_executesql @SQL

        --\
        ---> INSERT trigger - CREATE new
        --/
        SELECT @SQL = @SQLHead
            + @Newline + 'CREATE TRIGGER dbo.trgChgHis_' + @Tablename + 'Insert ON dbo.' + @Tablename + ' AFTER INSERT AS '
            + @Newline + 'BEGIN '
            + @Newline + ' SET NOCOUNT ON;'
            + @Newline + ' INSERT INTO [dbo].[_ChangeHistory]([TableName],[Action],[KeyValue],[DeletedRecord],[InsertedRecord])'
            + @Newline + ' SELECT '
            + @Newline + ' ''' + @Tablename + ''' AS [TableName]'
            + @Newline + ' , ''INSERT'' AS [Action]'
            + @Newline + ' , CONVERT(varchar(255), [' + @KeyColumn + ']) AS [KeyValue]'
            + @Newline + ' , NULL AS [DeletedRecord]'
            + @Newline + ' , (Select '
            + @Newline + ' ' + @ColumnNames
            + 'From inserted as ' + @Tablename + 'Record'
            + @Newline + ' Where [' + @Tablename + 'Record].[' + @KeyColumn + '] = [mainins].[' + @KeyColumn + ']'
            + @Newline + ' FOR XML AUTO, TYPE, ROOT(''Inserted'') ) AS [InsertedRecord]'
            + @Newline + ' FROM inserted AS mainins;'
            + @Newline + 'END'
        EXEC sp_executesql @SQL

        --\
        ---> DELETE trigger - DROP current
        --/
        SELECT @SQL = 'IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N''[dbo].[trgChgHis_' + @Tablename + 'Delete]'') AND OBJECTPROPERTY(id, N''IsTrigger'') = 1) DROP TRIGGER dbo.trgChgHis_' + @Tablename + 'Delete'
        EXEC sp_executesql @SQL

        --\
        ---> DELETE trigger - CREATE new
        --/
        SELECT @SQL = @SQLHead
            + @Newline + 'CREATE TRIGGER dbo.trgChgHis_' + @Tablename + 'Delete ON dbo.' + @Tablename + ' AFTER DELETE AS '
            + @Newline + 'BEGIN '
            + @Newline + ' SET NOCOUNT ON;'
            + @Newline + ' INSERT INTO [dbo].[_ChangeHistory]([TableName],[Action],[KeyValue],[DeletedRecord],[InsertedRecord])'
            + @Newline + ' SELECT '
            + @Newline + ' ''' + @Tablename + ''' AS [TableName]'
            + @Newline + ' , ''DELETE'' AS [Action]'
            + @Newline + ' , CONVERT(varchar(255), [' + @KeyColumn + ']) AS [KeyValue]'
            + @Newline + ' , (Select '
            + @Newline + ' ' + @ColumnNames
            + 'From deleted as ' + @Tablename + 'Record'
            + @Newline + ' Where [' + @Tablename + 'Record].[' + @KeyColumn + '] = [maindel].[' + @KeyColumn + ']'
            + @Newline + ' FOR XML AUTO, TYPE, ROOT(''Deleted'') ) AS [DeletedRecord]'
            + @Newline + ' , NULL AS [InsertedRecord]'
            + @Newline + ' FROM deleted AS maindel;'
            + @Newline + 'END'
        EXEC sp_executesql @SQL

        FETCH NEXT FROM @Cursor1 INTO @Tablename
    END
    DEALLOCATE @Cursor1
    GO
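
    Once the triggers are in place, pulling the change history of a single row back out of [_ChangeHistory] is a simple query. A minimal sketch ('Customer' and the key value '42' are placeholders for your own table and key):

    -- Sketch: view all logged changes for one row of one table.
    SELECT [Action], [DateCreated], [Username], [DeletedRecord], [InsertedRecord]
    FROM [dbo].[_ChangeHistory]
    WHERE [TableName] = 'Customer'
      AND [KeyValue] = '42'
    ORDER BY [DateCreated];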

  • Another way to keep the log database "small":

    When I log workflow activity I normally decide how long the log data will be stored. Say I have a limit of 14 days: any log record older than 14 days will be deleted.

    Since I use my workflow activity logs for debugging interfaces, I don't need to keep the logged data for longer than that.
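
    A minimal sketch of such a retention purge, using the [_ChangeHistory] table from the script above (substitute your own log table and date column; the batch size is an arbitrary assumption):

    -- Purge audit rows older than 14 days, in batches to keep the
    -- transaction log manageable on large audit tables.
    WHILE 1 = 1
    BEGIN
        DELETE TOP (10000) FROM [dbo].[_ChangeHistory]
        WHERE [DateCreated] < DATEADD(day, -14, GETDATE());
        IF @@ROWCOUNT = 0 BREAK;
    END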

  • Nice one... I didn't expect Part II to come so quickly. How do you people write so quickly... :)

  • Nice article. I have done this a lot. Another option, which I don't know if you have considered, is using transactional replication. It is nice because it splits the logging from the transaction, yet it can still be nearly real-time in a lot of systems.

    Setting it up is pretty easy: just configure replication taking all of the defaults, and then modify the stored procedures it creates at the subscriber. There are a few snags to deal with, but it works pretty well overall.
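
    As a sketch of what that modification can look like: transactional replication generates insert, update, and delete procedures at the subscriber, named along the lines of sp_MSins_dboCustomer. The table, columns, and parameter list below are invented for illustration, since the real generated procs match your article's columns:

    -- Illustrative only: a replication-generated insert proc at the
    -- subscriber, extended to also write an audit row.
    ALTER PROCEDURE [dbo].[sp_MSins_dboCustomer]
        @c1 int,         -- e.g. CustomerID
        @c2 varchar(50)  -- e.g. CustomerName
    AS
    BEGIN
        INSERT INTO [dbo].[Customer] ([CustomerID], [CustomerName])
        VALUES (@c1, @c2);

        -- Added audit logic: record the replicated change.
        INSERT INTO [dbo].[_ChangeHistory] ([TableName], [Action], [KeyValue])
        VALUES ('Customer', 'INSERT', CONVERT(varchar(255), @c1));
    END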

  • Anirban Paul (6/10/2008)


    Nice one... I didn't expect Part II to come so quickly. How do you people write so quickly... :)

    I actually wrote the whole thing as one long article about a month ago. Steve had me split it in two. I can write pretty rapidly, but not quite like what you're seeing. 🙂

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Nice series. I think the key is knowing what to log and why. I have been involved with logging where everything was logged because we could, and I don't know that anyone ever looked at the logged data. I have also been involved with situations where changes were logged and used on a regular basis to show the "history" of a product.

    This is one area where I think DB2, at least on the AS400 (iSeries), has SQL Server beat, because you can query the journals (think transaction log) directly. It does store a limited amount of data, but at one place I worked we had a process that queried the journals and loaded a history of the product into SQL Server with virtually no impact on the DB2 database.

  • I would like to get a feel from the author and others about a different type of logging. One requirement I have seen is that because of privacy issues, data loss protection issues, and sometimes legal requirements, systems must log who has "seen" the data. Rather than logging INSERTS, UPDATES, and DELETES, this is targeted at logging SELECTS. For example, let's say you have a web site that allows end users to search for patients, with a few different search criteria to search by, and a user searches for a last name of Smith. The requirement says we must log who ran the search, when it was done, what the criteria were, and the results returned to the application. In this case, assume 10 fields were returned and 500 patients matched the last name of Smith. Therefore our log tables would need to record that user John Doe, on 06/10/2008 at 8:45 am, searched for LastName = Smith, that 500 records were returned by the search, and what data was returned (10 x 500).

    I think this type of requirement is very difficult to implement with custom code, not to mention the storage required to do it. But in some cases it is law. Thoughts on the subject?

  • Lee,

    I'm glad I'm not in that position. This impacts how you allow users to query the data as well. I would think the easiest way to implement this would be to use stored procedures for data access (which I would do anyway) and log within each stored procedure. Of course, all the latest and greatest programming tools seem to use ORMs, like LINQ to SQL or SubSonic, and these tools all generate the selects, so how do you log that? Now you have to add the audit code to the DAL. When doing that, do I run the query twice, once to return it to the client and a second time to insert into the audit table, or do I do a batch insert in the DAL using the dataset, list, or collection? I don't even know if these tools will do a set-based insert.

  • These articles were a pretty good basic overview of two of the options available. I think there is one option that was left out that offers much more flexibility than reading the transaction log, with much lower overhead and a lower chance of introducing errors than triggers. I'm talking about server-side traces.

    Reading the transaction log won't show you any information about who selected from what table, etc. This is critical for many databases that store sensitive data. Using stored procedures and triggers is cumbersome, error prone, and imposes potentially significant overhead on your transactions. But a server-side trace is efficient (as long as you don't go crazy), can be targeted to log just the operations against objects that you define, and is more difficult to circumvent than triggers. It also lends itself nicely to querying and analysis.

    Lee, I'm not aware of a solution that will actually record the result sets returned to a client; that seems like a ridiculous requirement. I've worked in credit card processing, banking, and insurance, and I've never seen any audit requirement that is even close to that... I'm not saying that it couldn't exist, but man, that seems way excessive. Doing that, it would be easy to imagine the logging database quickly dwarfing the source.

    Now, tracking which queries were sent, which tables/columns were queried, and who sent them makes sense and is easily accomplished with a trace.
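
    A minimal sketch of setting one up (the event and column IDs are the documented sp_trace_setevent values; the file path, database name, and filter are assumptions you would replace):

    -- Server-side trace sketch: capture completed batches with the issuing
    -- login, filtered to one database. Event 12 = SQL:BatchCompleted;
    -- columns 1 = TextData, 11 = LoginName, 14 = StartTime, 35 = DatabaseName.
    DECLARE @TraceID int, @maxfilesize bigint, @on bit
    SET @maxfilesize = 50 -- MB per rollover file
    SET @on = 1

    EXEC sp_trace_create @TraceID OUTPUT, 2 /* TRACE_FILE_ROLLOVER */,
         N'C:\Traces\AuditTrace', @maxfilesize, NULL

    EXEC sp_trace_setevent @TraceID, 12, 1, @on   -- TextData
    EXEC sp_trace_setevent @TraceID, 12, 11, @on  -- LoginName
    EXEC sp_trace_setevent @TraceID, 12, 14, @on  -- StartTime
    EXEC sp_trace_setevent @TraceID, 12, 35, @on  -- DatabaseName

    -- Keep only events from the audited database (comparison 0 = equal).
    EXEC sp_trace_setfilter @TraceID, 35, 0, 0, N'YourDatabaseNameHere'

    EXEC sp_trace_setstatus @TraceID, 1 -- start the trace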

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • DCPeterson (6/10/2008)


    Now, tracking which queries were sent, which tables/columns were queried, and who sent them makes sense and is easily accomplished with a trace.

    This is why I did not mention server side traces. Even running constant server side traces and storing in a table for querying will lead to a very large database quite quickly.

  • Jack Corbett (6/10/2008)


    DCPeterson (6/10/2008)


    Now, tracking which queries were sent, which tables/columns were queried, and who sent them makes sense and is easily accomplished with a trace.

    This is why I did not mention server side traces. Even running constant server side traces and storing in a table for querying will lead to a very large database quite quickly.

    True, but an audit on a busy system is likely to become large regardless of the method used to create the audit. I have yet to see a system where the audit requirement stated that every time any user accesses any data, we have to track it. There is almost always some data that is sensitive and some that is not. Usually, you want to audit any access to the sensitive data and then audit changes to certain data. Traces can be targeted (filtered) to capture data for only those events/objects/users that you are interested in.

    The most common problem I've seen when it comes to audit logs is that the requirements aren't well defined so you get something like this: "well, we don't know what we might want to see, so just capture everything..." Sure, I can do that, but then you will be buried under a mountain of useless audit data.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Lee Forst (6/10/2008)


    I would like to get a feel from the author and others about a different type of logging. One requirement I have seen is that because of privacy issues, data loss protection issues, and sometimes legal requirements, systems must log who has "seen" the data. Rather than logging INSERTS, UPDATES, and DELETES, this is targeted at logging SELECTS. For example, let's say you have a web site that allows end users to search for patients, with a few different search criteria to search by, and a user searches for a last name of Smith. The requirement says we must log who ran the search, when it was done, what the criteria were, and the results returned to the application. In this case, assume 10 fields were returned and 500 patients matched the last name of Smith. Therefore our log tables would need to record that user John Doe, on 06/10/2008 at 8:45 am, searched for LastName = Smith, that 500 records were returned by the search, and what data was returned (10 x 500).

    I think this type of requirement is very difficult to implement with custom code, not to mention the storage required to do it. But in some cases it is law. Thoughts on the subject?

    It is possible. There are a couple of ways to do it. One would be to set up something in the data connection layer that tracks that kind of thing. Another would be to add logging to your select procs, and then make sure all data access goes through those procs.

    A select proc could use either system_user or an input parameter to log who selected what, could certainly insert the input params and run time into a log, and could, probably using FOR XML, log the results. It would take some work to build such a log, but not a lot, with FOR XML and an XML data type for the results log column. It would be much more complex to build if you logged each column as its correct data type, or something like that.

    The main thing, though, is that this would not log anything if a DBA bypassed the procs and ran a direct select against the database, through the command line or Management Studio or some such. You would have to make sure nobody had that kind of access to the production database.

    I haven't had to set up anything like that kind of security, but I think it could be done.
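
    A minimal sketch of such a select proc (the SelectAudit table, the Patient table, and its columns are all invented for illustration):

    -- Hypothetical audit table and search proc, for illustration only.
    CREATE TABLE dbo.SelectAudit (
        AuditID      int IDENTITY(1,1) NOT NULL,
        SearchedBy   varchar(128) NOT NULL DEFAULT (SYSTEM_USER),
        SearchedAt   datetime NOT NULL DEFAULT (GETDATE()),
        Criteria     varchar(255) NOT NULL,
        RowsReturned int NOT NULL,
        Results      xml NULL
    );
    GO
    CREATE PROCEDURE dbo.PatientSearch
        @LastName varchar(50)
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Run the search once into a table variable so the same rows
        -- can be both logged and returned to the caller.
        DECLARE @Found TABLE (PatientID int, LastName varchar(50), FirstName varchar(50));
        INSERT INTO @Found
        SELECT PatientID, LastName, FirstName
        FROM dbo.Patient
        WHERE LastName = @LastName;

        -- Log who searched, when, for what, and the results as XML.
        INSERT INTO dbo.SelectAudit (Criteria, RowsReturned, Results)
        SELECT 'LastName = ' + @LastName,
               (SELECT COUNT(*) FROM @Found),
               (SELECT * FROM @Found AS Patient FOR XML AUTO, TYPE, ROOT('Results'));

        SELECT PatientID, LastName, FirstName FROM @Found;
    END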

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • DCPeterson, on the subject of a "server-side trace": I'm not clear on what you mean by that. Are you talking about something outside the database, or some feature of SQL 2005/2000 that I'm unaware of? (I've read that SQL 2008 will offer more built-in auditing features, but haven't had a chance to play with those yet.) Or do you just mean running Profiler constantly?

    Just running Profiler constantly will give you a log, but the files will quickly fill up a lot of disk space and can't be queried the way log tables can (at least, not easily). If you have Profiler inserting into tables, the performance hit is just about the same as using triggers, as is the space used.

    Could be done. Like any of the other methods, it has advantages and disadvantages. I can't think of an easy way to query such data to readily get the complete history of a particular row in a particular table, to run undo commands, or to track workflow data for management reports. On the other hand, you could get performance data right in your audit log, which could be useful for other things.
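
    (For reference, a server-side trace file can be pulled into a queryable rowset with fn_trace_gettable, though the raw TextData is still clumsier to work with than a structured log table. A sketch, with an assumed file path:)

    -- Load a trace file (and its rollover files) for ad hoc querying.
    SELECT LoginName, StartTime, DatabaseName, TextData
    FROM sys.fn_trace_gettable(N'C:\Traces\AuditTrace.trc', default)
    WHERE TextData IS NOT NULL;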

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • DCPeterson (6/10/2008)


    The most common problem I've seen when it comes to audit logs is that the requirements aren't well defined so you get something like this: "well, we don't know what we might want to see, so just capture everything..." Sure, I can do that, but then you will be buried under a mountain of useless audit data.

    This is the problem. Most audit trails that get created audit everything and then expect you to sort it out.

    Now I am not advocating this as the solution, but theoretically you could run your server-side trace to log the desired (well-filtered) selects to a table and then, using a trigger, Service Broker, or another custom service, re-run the query to get the results, using an account that is either ignored by the trace or whose logged entries are ignored by the application. This takes care of missing ad hoc queries. Of course, now you have doubled the read load on your database.

    Needless to say, this is an interesting problem to consider, especially since I don't know how you could deny rights to the DBA. Then you would also have to protect the logged data, and would need to log access to the logged data? Seems like an endless loop. :w00t:
