Logging with XML

  • When I've set up audit-trail-style logging before, I've used triggers on update/insert/delete to copy the data from Inserted and Deleted to a log table. The log table would have the same structure as the table being logged (same columns, same data types), with constraints, indexes, FKs, etc., removed, and with some columns specific to the log, like "LogDate" and "LogBy".

    This, of course, means that changes to table structure, such as adding a column or increasing the width of a column, have to be reflected in the log copy of the table as well as the main table, and added/removed columns have to be accounted for in the triggers.

    What I'm thinking of trying, instead, is something like this:

    create table dbo.Hierarchy (
        HierarchyID int identity primary key,
        ParentID int null references dbo.Hierarchy(HierarchyID)
        -- ... other columns as needed
        );

    create table LogDatabase.dbo.HierarchyLog (
        LogData xml,
        LogDate datetime not null default (getdate()),
        LogBy varchar(100) not null default (system_user)
        );

    create trigger dbo.Hierarchy_Log on dbo.Hierarchy
    after insert, update, delete
    as
    declare @Log xml

    select @Log =
        (select *,
            (select *
             from Deleted
             where Deleted.HierarchyID = Inserted.HierarchyID
             for xml raw ('From'), elements xsinil, type)
         from Inserted
         for xml raw ('To'), elements xsinil, type)

    insert into LogDatabase.dbo.HierarchyLog (LogData)
    select @Log

    I've tested this, and I end up with very readable, very usable data in the log. Each separate update/insert/delete statement ends up as a single entry (which is a big advantage over the parallel-table version), and it's easy enough to extract the exact changes and such from it.

    Here's a sample of a log entry:

    <To xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <ID>6367</ID>
        <ParentID>6363</ParentID>
        <From xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <ID>6367</ID>
            <ParentID xsi:nil="true" />
        </From>
    </To>
    <To xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <ID>6366</ID>
        <ParentID>6363</ParentID>
        <From xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <ID>6366</ID>
            <ParentID xsi:nil="true" />
        </From>
    </To>
    <To xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <ID>6365</ID>
        <ParentID>6363</ParentID>
        <From xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <ID>6365</ID>
            <ParentID xsi:nil="true" />
        </From>
    </To>
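    To give an idea of what I mean by extracting the changes, a query along these lines does it. This is just a minimal sketch (untested here, and it assumes the element names shown in the sample above):

    select
        t.c.value('(ID/text())[1]', 'int')            as ID,
        t.c.value('(From/ParentID/text())[1]', 'int') as OldParentID,
        t.c.value('(ParentID/text())[1]', 'int')      as NewParentID,
        l.LogDate,
        l.LogBy
    from LogDatabase.dbo.HierarchyLog as l
    cross apply l.LogData.nodes('/To') as t(c)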

    It has the advantage that schema changes in the table being logged don't have to be made in multiple places (it's easy to forget to make them). It also has the advantage that it can be easily modified for any table.

    (Before anyone asks, yes, the log table is a heap with no primary key and no clustered index. No indexing at all. This is to make inserts as fast as possible. Pulling data out of these tables has been, in my experience, relatively rare, but inserts are constant. I have, in the past, used LogArchive tables to move the older log data out of the main Log tables, and then index the LogArchive where appropriate for various management reports.)

    My question is: Has anyone used this kind of logging solution, and, if so, what are the drawbacks to it? Also, the trigger, as set up, has an inline sub-query, which might hurt performance on bigger updates. I'm not as familiar with building XML as I probably should be, and would appreciate any help on making that part of the trigger more efficient.
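    One caveat on the trigger as written: on a pure DELETE, Inserted is empty, so the outer SELECT yields NULL and the log gets a row with no XML in it. A hedged, untested sketch of one way to cover that case, replacing the unconditional insert at the end of the trigger:

    -- pure delete: nothing in Inserted, so log the Deleted rows on their own
    if @Log is null
        select @Log =
            (select *
             from Deleted
             for xml raw ('From'), elements xsinil, type)

    if @Log is not null
        insert into LogDatabase.dbo.HierarchyLog (LogData)
        select @Log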

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Just a couple of suggestions... some in the form of questions...

    First, what is the purpose of an audit log? Think about it carefully... what is the ONLY purpose for an audit log?...

    ... to record changes to data in a table. That's all. So, if you add new data to a table using an insert, technically, you haven't changed anything... all of the new data is available in the original table and storing it all in the audit log would be a duplication of data.

    So, the first thing I'd do is to remove INSERTs from the audit process. Only record modifications (DELETE/UPDATE).

    Second question... do you really need to store a whole record in the audit log if you only change 1 column in, say, a 30-column table? No... again, record only the changes to the original data. Store only which table changed, which column changed, the PK for the row that changed, the modified value (perhaps the original value as well, to make reassembly easier), the date, and the person/system who made the change.

    I know... sounds like a lot of data... the real advantage is that you only need one audit log table, all tables share the same audit log table, and you can add/delete columns in the audited tables until the cows come home and there's no need to change anything about the audit except maybe what's in the audit trigger for that one table.
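    As a rough illustration of that shape (the names here are mine, not anything from this thread), the single shared audit table might look something like:

    create table LogDatabase.dbo.AuditLog (
        TableName  sysname      not null,  -- which table changed
        ColumnName sysname      not null,  -- which column changed
        PKValue    varchar(100) not null,  -- primary key of the changed row
        OldValue   varchar(max) null,      -- original value, to ease reassembly
        NewValue   varchar(max) null,      -- modified value
        LogDate    datetime     not null default (getdate()),
        LogBy      varchar(100) not null default (system_user)
        );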

    Of course, storing whole rows makes it easier to trace an audit on any given row, but like you said... how often are you going to need to do that?

    So far as the XML stuff goes, I rarely use it, so I'm not really qualified to judge anything you've done with it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff.

    On the use of Insert in the trigger, that gives me a date the data was first created, and by whom. I don't like having "CreatedDate" and "CreatedBy" columns in the primary table, because they, in my way of thinking, don't actually describe the entity being stored in the table. Having "Insert" in the logging proc makes the audit log the one-stop-shop for a complete history of the entity: Who created it, when; who modified it, what was changed, when; who deleted it, when? All in one place. That means a single query of the log table gives me a complete history of the entity.

    When managers need reports on who worked on which orders (for example), when certain changes were made on the orders, and what changes were made, I like to pull that from one table, instead of having to pull the original creation from one place, and the changes from another.

    I see your point of view on that one, but I'm probably too stuck in my ways to change to some other method of recording record creation.

    On the idea of only logging what was changed: Yeah, that might be better. I'll check into that. It might make certain reports more complex to query, but it will certainly save on disk space.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I say just do a full backup after every UPDATE or DELETE. :w00t:

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Jason Selburg (3/20/2008)


    I say just do a full backup after every UPDATE or DELETE. :w00t:

    WOW!!!! I should have thought of that!!! That would be so much better!!!! :hehe:

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (3/20/2008)


    Thanks Jeff.

    On the use of Insert in the trigger, that gives me a date the data was first created, and by whom. I don't like having "CreatedDate" and "CreatedBy" columns in the primary table, because they, in my way of thinking, don't actually describe the entity being stored in the table. Having "Insert" in the logging proc makes the audit log the one-stop-shop for a complete history of the entity: Who created it, when; who modified it, what was changed, when; who deleted it, when? All in one place. That means a single query of the log table gives me a complete history of the entity.

    When managers need reports on who worked on which orders (for example), when certain changes were made on the orders, and what changes were made, I like to pull that from one table, instead of having to pull the original creation from one place, and the changes from another.

    I see your point of view on that one, but I'm probably too stuck in my ways to change to some other method of recording record creation.

    On the idea of only logging what was changed: Yeah, that might be better. I'll check into that. It might make certain reports more complex to query, but it will certainly save on disk space.

    Heh... thought you said that reading the log was a rare thing... and if the procs are written to work correctly, who cares that the data is split between the audit log and the source table?

    Don't forget... if you want to do full row auditing, then your database will be at least 2 times the original size you planned for. 500 gig db suddenly becomes a 1 or 2 terabyte db. Backups will take longer to make and restore, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff:

    I've been thinking about the point of including Insert in the logging trigger.

    What would you do in a circumstance where you find out that some employee has been altering records (for whatever reason), which he was authorized to alter while employed, but the employee gets fired and management needs you to undo the changes he's made for the last X days/weeks/whatever?

    If you have logged that he updated the sales commission on order # 53-12345 to $200 for himself, how do you know what to revert it to if you don't have a trail that goes back to the creation of that row of data? Sure, if there was another update before that one, you'd be able to go back one step, but what if his change is the first change in that order? Do you open up a transaction log parser, pull the relevant backup file off of tape, and start looking for the insert data in there?

    That's just one situation where I'd want to have a record of "what was the data when it was first created", readily available.
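    With the audit log described earlier, that first-created state is one query away. A hedged sketch (the key value 6367 is purely illustrative):

    -- earliest logged state of row 6367, straight from the audit log
    select top (1)
        t.c.value('(ParentID/text())[1]', 'int') as OriginalParentID,
        l.LogDate,
        l.LogBy
    from LogDatabase.dbo.HierarchyLog as l
    cross apply l.LogData.nodes('/To[ID=6367]') as t(c)
    order by l.LogDate asc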

    Sure, a log trigger won't defeat someone who has DBA access to the database, because such a person can get into the log database and update it (and if they can't, they can grant themselves the permissions to do so), and you have to go to the transaction log for that kind of deliberate sabotage by a DBA/SysAdmin. But to handle the "oops" updates, the "we had to fire him" updates, and the "we just changed our minds about the policy, and we're making it retroactive" updates, does pulling up transaction logs and going through those work for you?

    I'm mainly asking because in my experiences with various log parsers (Lumigent, Apex and Red Gate), they are generally slow and more than a little convoluted to use, when compared to having an audit table that can be queried with simple T-SQL (and maybe XML query). Has your experience with those been more positive?

    Or do you have another (better) solution for those situations where rollbacks are needed on data?

    This is, of course, an aside from the reporting needs I have audit tables for.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Perhaps I should be more clear on how often this data is used.

    Today's updates are rarely needed for anything. The occasional, "I updated the wrong account, and now I can't remember which one I changed", kind of thing. Or the occasional manager who wants to know, "who changed all the Robertson Account orders to have a ship date for last month?" It's rare.

    Added to that, I dump the audit trail into an archive table on a separate server on a nightly basis. That means the data for today is not only rarely needed, but when it is, it's in a table that is, relatively speaking, small. Queries on it are slow, because it has no primary key, no clustered index, and no other indexes of any sort, but it makes up for that by only having a few thousand rows per audit table per day (it's a small business), less in some of the less active tables or on a slow day.

    I don't keep today's log in the same database as the main tables. No need to have them clutter up the backups, make the database expand like crazy, etc. Since I clean out that audit database nightly, it doesn't grow above a certain manageable size.
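    For reference, the nightly move is nothing fancy; a minimal sketch (the linked-server and archive table names here are illustrative, not my actual setup):

    -- copy everything before today to the archive server, then clear it out
    begin transaction

    insert into ArchiveServer.LogArchive.dbo.HierarchyLogArchive (LogData, LogDate, LogBy)
    select LogData, LogDate, LogBy
    from LogDatabase.dbo.HierarchyLog
    where LogDate < dateadd(day, datediff(day, 0, getdate()), 0)  -- midnight today

    delete from LogDatabase.dbo.HierarchyLog
    where LogDate < dateadd(day, datediff(day, 0, getdate()), 0)

    commit transaction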

    The archive audit tables, on a separate server, grow relatively rapidly every day, of course. But that server is set up with a ton of disk space, and isn't likely to run out any time soon (and disks are getting larger and cheaper faster than the data is currently growing).

    That data, the archives, is used for a number of reports, and the reports from it are an ever-expanding list.

    Things like order entry-to-delivery time reports (which can track down to a very detailed level by knowing what was changed when and by whom), are very common on those tables. Those reports don't include the current day's data (which is understood in the business, and accepted).

    So, when I say "that data isn't accessed very often", I'm very specifically referring to the current day's audit tables, with no indexes, etc.

    (I apologize for the length of these posts. I tend horribly towards over-explanation and serious pedantry.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • "Don't forget... if you want to do full row auditing, then your database will be at least 2 times the original size you planned for. 500 gig db suddenly becomes a 1 or 2 terabyte db. Backups will take longer to make and restore, as well. " - Jeff

    Actually, I'm used to them ending up a LOT more than twice the size of the active tables. In one case, a 1.5 Gig database has a 2,000 Gig audit archive. The point about backups and restores is one of the reasons I do not keep those tables in the same database. And since the audit tables can, if necessary, be rebuilt from the transaction log of the main database, I don't even use a very paranoid backup scheme on the audit database.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • To solve the problem of both auditing the initial change AND making life a bit easier on the type of "fired" recovery that you're talking about, without the additional overhead of logging the entire initial insert, you would add a PreviousValue column to the audit table... it's a bit denormalized, but it's worth it.
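    A hedged sketch of what that might look like for one audited column (it borrows the AuditLog shape sketched earlier; the INTERSECT trick is just a null-safe "did it change" test, and the pattern repeats per column):

    create trigger dbo.Hierarchy_Audit on dbo.Hierarchy
    after update, delete
    as
    insert into LogDatabase.dbo.AuditLog
        (TableName, ColumnName, PKValue, OldValue, NewValue)
    select
        'Hierarchy',
        'ParentID',
        convert(varchar(100), d.HierarchyID),
        convert(varchar(max), d.ParentID),  -- PreviousValue
        convert(varchar(max), i.ParentID)   -- new value (null on a delete)
    from Deleted as d
    left join Inserted as i
        on i.HierarchyID = d.HierarchyID
    where not exists (select d.ParentID intersect select i.ParentID)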

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • That makes sense.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I recall seeing an awfully nice auditing solution from Pop Rivett. Hang on, I'll go look...

    Yep, still there.

    See

    http://www.simple-talk.com/sql/database-administration/pop-rivetts-sql-server-faq-no.5-pop-on-the-audit-trail/

    for the article I was thinking about. It doesn't use XML, but it does only store deltas and, if I recall correctly, it stores the before and after values in the same row. I have never used this in production, but I did try it on a test db and filed it away for possible future use. To fit Jeff's requirements you would have to create the trigger for updates and deletes only, but I did think it was quite elegant. One caveat, from the article:

    --note that for this system to work there must be a primary key

    --to the table [to audit] but then a table without a primary key

    --isn't really a table is it? 🙂

    HTH

    Dave J


    http://glossopian.co.uk/
    "I don't know what I don't know."
