SQLServerCentral Article

Audit Trails and Logging Part II

,

The prior article in this series went over auditing in general and passive logging in detail. The other option is active logging, which we will cover here.

Active Logging

Inline (proc logging and trigger-based logging have one real advantage over passive logging. They log the data into query-able tables, which can be a major advantage, if used correctly.

This can also be their biggest disadvantage, because those log tables will continue to grow so long as you keep the log data. Managing them can become quite a chore. Various archiving or dumping solutions can be used to mitigate this, but they have to done with care.

They also have the disadvantage of slowing down any logged transactions. Whether it’s done by a trigger or by the same proc that updates the data in the first place, it’s more for the server to do, and thus will require a bit longer to complete. There are techniques that can be used to reduce that effect, but not to get rid of it without also getting rid of the auditing. Some of these techniques are outlined below in Tips and Hints.

Procs

Having the proc that does the insert/update/delete also log the action has some obvious advantages. Any developer working on the proc will know that it does logging, and can work with that. All the code for the transaction is in one place, and can be tested and debugged all at once. If an error is thrown, because of a change somewhere else in the database (besides inside the proc itself), it is easy to figure out what’s causing it. With a trigger, if a proc generates an error, you have to determine if the error is in the proc, or in the trigger, before you can fix it.

This also means that the logging code can be controlled a bit more. There might be some types of updates that you don’t want to bother with logging. If, for example, the audit trail is mainly for blamestorming, there’s no real reason to log automatic updates. If it’s for tracking workflow activity, maybe there are certain statuses that need to be tracked, and other ones don’t need to be. And so on.

The disadvantage to this, compared to triggers, is that you have to write the code into each proc. If the table changes at all, the logging code might have to be updated in a dozen or more places, depending on how you build your stored procedures.

If you build one logging procedure, and every other proc calls that one, you can bypass some of these disadvantages, while keeping some of the advantages.

Triggers

The advantage to having a trigger do the logging is that it’s automatic. No matter who updates the table, no matter if it’s done by a proc, or some application directly connecting to the table, or if it’s a DBA using Management Studio, so long as the trigger is active, the logging will happen. Build it once, not a dozen times, and don’t bypass it.

You’ll have less control over the actions in a trigger, because it won’t have access to input parameters in the proc, but still quite a bit of control can be achieved based on the data in the inserted and deleted tables.

Which brings us to the other advantage of triggers. They have direct access to the inserted and deleted tables. In SQL 2005, procs can use the Output command to achieve this same functionality, but not in SQL 2000.

Hints and Tips

Regardless of whether the active logging is done by procs or triggers, or some combination thereof, there are some important things to keep in mind.

First, the tables you are inserting the log data into. I highly recommend keeping them in a separate database, on the same server as the database you are auditing. These tables will grow and grow, and keeping them in the main database means that database will grow and grow. This means larger backups, with take longer to create and longer to restore. It also means more data in your primary database’s transaction log, which means the log will grow more too.

The separate log database can be set on Simple Recovery mode, which means its log file won’t grow as much. A separate log database also means you can log multiple primary databases all in the same place. (That can also be overdone, causing too many log actions in the same files on the same drive, which can ruin performance.)

Another important thing to do with the log tables is make sure they have an ascending clustered index, so that data being inserted can be as fast as possible. Depending on volume of inserts, it might be a good idea to have multiple tables in multiple files, possibly even on different hard drives.

Don’t have any other indexes on the log tables. If you need more indexes for reporting purposes, transfer the data at set intervals from the live logs to indexed tables. This again is for purposes of speed of insertion.

Check constraints, foreign keys, etc., should all be disabled on the log tables, also for speed.

Only actively log what you need to. Use active logging on tables that will require reports, or user-accessible undo buttons or reviews. Use passive logging on everything else.

Simply by using active logging, you’re going to slow down your main database. It’s important to do everything you can to minimize the impact of that.

Details and Options

There are several options for what to log, as well as how to log it.

One option is to create a parallel table, with the same columns as the main table, and insert every update/insert/delete into there. This usually means adding a LogDate column, and often a LogBy column.

So, if the main table looks like:

Create table dbo.Orders (
OrderID int identity primary key,
AccountID int not null references dbo.Accounts(AccountID),
StartDate datetime not null default(getdate()),
ItemID int not null references dbo.Items(ItemID),
Qty int not null default(1),
Constraint CK_Qty check(Qty > 0),
StatusID int not null references dbo.OrdersStatuses(StatusID),
Price money not null,
Constraint CK_Price check(Price > 0))

The log table might look like:

Create table dbo.Orders (
LogDate datetime not null default(getdate()),
LogBy varchar(100) not null default(system_user),
Action char(1),
OrderID int,
AccountID int,
StartDate datetime,
ItemID int,
Qty int,
StatusID int,
Price money)
Go
Create clustered index CID_OrdersLog on dbo.Orders(LogDate)
And the trigger might look like:
Create trigger Orders_Log on dbo.Orders
After insert, update, delete
As
Insert into LogDatabase.dbo.Orders (Action, OrderID, AccountID, StartDate, ItemID, Qty, StatusID, Money)
Select
Case
             When inserted.OrderID is null then ‘D’
             When deleted.OrderID is null then ‘I’
             Else ‘U’
End,
isnull(inserted.OrderID, deleted.OrderID),
isnull(inserted.AccountID, deleted.AccountID),
isnull(inserted.StartDate, deleted.StartDate),
isnull(inserted.ItemID, deleted.ItemID),
isnull(inserted.Qty, deleted.Qty),
isnull(inserted.StatusID, deleted.StatusID),
isnull(inserted.Money, deleted.Money)
from inserted
full outer join deleted
             on inserted.OrderID = deleted.OrderID

That will give you a full copy of the data each time some action is taken on it.

This has the advantage of being simple to set up in the first place. It gets the data being logged right away. It is also very easy to query.

But what happens if you add a new column to the Orders table? Or figure out that an order can have more than one item and normalize that into a sub-table with OrderID, ItemID and Qty in it? Or decide to stop storing the price in the Orders table and instead have that be a reference to another table that is joined to based on ItemID and OrderDate and Qty?

In that case, you have to modify the log table and the log trigger for each of these modifications to the base table.

Also, what happens in this case if you update only one column? Yep, it logs the whole row. And what about cases where an update is run, but nothing is actually changed? “Update dbo.Orders set Qty = 1” will register as an update command, and fire the trigger, and log the row, even if Qty is already 1.

So, easy to set up, a bit of a pain to maintain if the database evolves, and it burns up hard drive space for unnecessary logging.

Another option would be to create the log table as:

Create table dbo.Orders (
LogDate datetime not null default(getdate()),
LogBy varchar(100) not null default(system_user),
ColName varchar(128) not null,
Val varchar(max))

(Same clustered index as before.)

The trigger then becomes:

Create trigger dbo.Orders_Log on dbo.Orders
After insert, update, delete
As
If update(AccountID)
             Insert into LogDatabase.dbo.Orders (ColName, Val)
             Select ‘AccountID’, isnull(inserted.AccountID, deleted.AccountID)
             From inserted
             Full outer join deleted
                         On inserted.OrderID = deleted.OrderID
             Where inserted.AccountID != deleted.AccountID
             Or inserted.AccountID is not null and deleted.AccountID is null
             Or inserted.AccountID is null and deleted.AccountID is not null
If update(ItemID)
             Insert into LogDatabase.dbo.Orders (ColName, Val)
             Select ‘ItemID’, isnull(inserted.ItemID, deleted.ItemID)
             From inserted
             Full outer join deleted
                         On inserted.OrderID = deleted.OrderID
             Where inserted.ItemID != deleted.ItemID
             Or inserted.ItemID is not null and deleted.ItemID is null
             Or inserted.ItemID is null and deleted.ItemID is not null
If update(Qty)
             Insert into LogDatabase.dbo.Orders (ColName, Val)
             Select ‘Qty, isnull(inserted.Qty, deleted.Qty)
             From inserted
             Full outer join deleted
                         On inserted.OrderID = deleted.OrderID
             Where inserted.Qty != deleted.Qty
             Or inserted.Qty is not null and deleted.Qty is null
             Or inserted.Qty is null and deleted.Qty is not null
If update(StatusID)
             Insert into LogDatabase.dbo.Orders (ColName, Val)
             Select ‘StatusID’, isnull(inserted.StatusID, deleted.StatusID)
             From inserted
             Full outer join deleted
                         On inserted.OrderID = deleted.OrderID
             Where inserted.StatusID != deleted.StatusID
             Or inserted.StatusID is not null and deleted.StatusID is null
             Or inserted.StatusID is null and deleted.StatusID is not null
If update(Price)
             Insert into LogDatabase.dbo.Orders (ColName, Val)
             Select ‘Price’, isnull(inserted.Price, deleted.Price)
             From inserted
             Full outer join deleted
                         On inserted.OrderID = deleted.OrderID
             Where inserted.Price != deleted.Price
             Or inserted.Price is not null and deleted.Price is null
             Or inserted.Price is null and deleted.Price is not null

As you can see, the trigger is now quite a few more lines of code, but it has the advantage of only recording the columns that actually change. For inserts and deletes, that’s all of them, but for updates, it only gets the ones that are actually changed. This means smaller log tables.

Simply because it’s more code, it will take a bit longer to run.

This can be at least partially handled by changing it to get rid of the If statements and using a Union operation, as follows:

Create trigger dbo.Orders_Log on dbo.Orders
After insert, update, delete
As
             Insert into LogDatabase.dbo.Orders (ColName, Val)
             Select ‘AccountID’, isnull(inserted.AccountID, deleted.AccountID)
             From inserted
             Full outer join deleted
                         On inserted.OrderID = deleted.OrderID
             Where inserted.AccountID != deleted.AccountID
             Or inserted.AccountID is not null and deleted.AccountID is null
             Or inserted.AccountID is null and deleted.AccountID is not null
Union All
             Select ‘ItemID’, isnull(inserted.ItemID, deleted.ItemID)
             From inserted
             Full outer join deleted
                         On inserted.OrderID = deleted.OrderID
             Where inserted.ItemID != deleted.ItemID
             Or inserted.ItemID is not null and deleted.ItemID is null
             Or inserted.ItemID is null and deleted.ItemID is not null
Union All
             Select ‘Qty, isnull(inserted.Qty, deleted.Qty)
             From inserted
             Full outer join deleted
                         On inserted.OrderID = deleted.OrderID
             Where inserted.Qty != deleted.Qty
             Or inserted.Qty is not null and deleted.Qty is null
             Or inserted.Qty is null and deleted.Qty is not null
Union All
             Select ‘StatusID’, isnull(inserted.StatusID, deleted.StatusID)
             From inserted
             Full outer join deleted
                         On inserted.OrderID = deleted.OrderID
             Where inserted.StatusID != deleted.StatusID
             Or inserted.StatusID is not null and deleted.StatusID is null
             Or inserted.StatusID is null and deleted.StatusID is not null
Union All
             Select ‘Price’, isnull(inserted.Price, deleted.Price)
             From inserted
             Full outer join deleted
                         On inserted.OrderID = deleted.OrderID
             Where inserted.Price != deleted.Price
             Or inserted.Price is not null and deleted.Price is null
             Or inserted.Price is null and deleted.Price is not null

Maintenance is a bit simpler, because the definition of the log table doesn’t change if the main table changes. Only the trigger needs to be updated in that case.

This type of log is a bit harder to query, since it’s not one row per update. This can be simplified by adding a TransactionID column to the log table, which should be data type UniqueIdentifier. Then add:

Declare @TID UniqueIdentifier
Select @TID = newid()

To the top of the trigger, and add that as a column in each of the insert statements. This makes it easier to determine which rows of the log table go together.

Another option is to use a single XML column in the log table for the data from the main table.

Create dbo.Orders (
LogDate datetime not null default(getdate()),
LogBy varchar(100) not null default(system_user),
LogXML XML)
Create trigger Orders_Log on dbo.Orders
After insert, update, delete
As
Insert into LogDatabase.dbo.Orders (LogXML)
Select *,
             (select *
             From deleted
             For XML RAW, TYPE, ROOT(‘Deleted’)
From inserted
For XML RAW, TYPE, ROOT(‘Inserted’)

Like the first example, this stores all columns and fills up the log tables quickly, but is relatively easy to query. Unlike either of the other examples, changes to the main table require no changes to either the log table nor the trigger.

There are other options, like picking only the columns you care about and including only those in the log table, or in the XML query, or adding more to the Where clause of any of the insert statements, so that only specific actions are logged or so that specific actions are not logged. Error handling can be added to any of these.

Another option is only including Update and Delete statements in the logging. In this case, it’s best to include the value from the deleted table in the log. This has the advantage that not every row in the database will end up in the log, which reduces the potential size of the log. But it’s important in this case to include before and after for updates and deletes. That will add some size to the log, because of the before data, but might also reduce it if many rows are added and then never changed. You’ll have to test to see which way works best for your needs.

The stored procedure versions of these should be relatively easy to extrapolate from the triggers, just use the Output statement, or the parameters, instead of directly querying the inserted and deleted tables. One example of this would be:

Create proc dbo.OrdersUpdatePrice
(@OrderID_in int,
@NewPrice_in money)
As
Update dbo.Orders
Set price = @NewPrice_in
Where ordered = @OrderID_in
Insert into LogDatabase.dbo.Orders (OrderID, Col, Val)
Select @OrderID_in, ‘Price’, cast(@NewPrice_in as varchar(100));

Another thing to check out is commercially available products that will create your log triggers for you. One is ApexSQL Audit (http://www.apexsql.com/sql_tools_audit.asp). Again, this will cost money, but it will save you the time of setting the thing up yourself.

Active Logging Summary

Active logging has some nice advantages in terms of queriability and control over what gets logged (signal vs noise). Reading log activity from a table can be much faster than parsing a log file. Also, it gives you the ability to build reports and code on top of it that can be available to users without having to go to the DBA every time.

It does, however, slow your database down, and use up potentially huge amounts of disk space. It also means you have to administer and maintain the database and tables that you do your logging in.

Overall Summary

In summary, there are lots of options for logging data in SQL Server. The best one depends on what you need to do with the audit logs.

Unless you really need them, I recommend against using log tables and triggers. Use the SQL Server log files and a log parser.

If you carefully consider all your needs, and decide that a log table is the best way to do it, make sure to take into account the options you have about how and what to log.

Rate

4.21 (19)

You rated this post out of 5. Change rating

Share

Share

Rate

4.21 (19)

You rated this post out of 5. Change rating