• Jeff Moden (10/21/2012)


    Gosh, thanks for the extensive writeup. Unfortunately, the only way I've seen to get the table name through a CLR trigger is a nasty hack that uses a tablenameID naming convention on the primary key. That, of course, isn't very reliable because someone could change the name of the PK at any time and, unless you invoke a DDL trigger of some sort to enforce such a naming convention on new tables, there's also no guarantee that new tables will follow the correct convention.

    I dislike the naming convention approach just as much as you do. Besides, the databases that need to be audited are existing db's, so I will hardly be able to enforce such a convention. So this is a no-go for us.

    Equally unfortunate, your writeup still makes me think that you're in for even more headaches and performance issues, especially since you're considering auditing only the columns that actually changed. If you want to actually compare values between the INSERTED and DELETED tables, then a CLR trigger is still going to be slow on larger inserts/updates because your CLR will have to load the INSERTED and DELETED tables into memory first. Even if you only use the equivalent of "IF UPDATE(columnname)" in your CLR (whatever that equivalent may be... I'm not a C# programmer), you still might not be OK because you have to read the trigger tables into memory before you can interrogate them in a CLR. Of course, that can all be avoided using the hard-coded triggers we spoke of earlier.

    I think you're missing a bit of information here: we can pass the modifier CommandBehavior.SequentialAccess to SqlCommand.ExecuteReader(), which means the returned reader only keeps a column value in memory for as long as it is the item currently being worked on. So we can read the data without having to load all results into memory. Instead the data is "streamed" through the CLR code: at any time only the value we're currently processing is in memory. There are a few restrictions on what you can and can't do with the data taking this approach, but using the SequentialAccess modifier we advance a "window" over the data we're processing instead of reading it all first and then processing it in one go. The SequentialAccess mode also ensures you CAN read large objects from the same stream. I do have to admit that I still need to verify whether both points also hold for a SqlCommand querying the Inserted and Deleted tables from a SqlTriggerContext, but I have little reason to believe they won't.
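
    To give an idea of what I mean, here's a minimal sketch of the reader loop I have in mind for the CLR trigger. It is untested against the INSERTED pseudo-table, and the class, method and XML handling are just placeholders:

        // Minimal sketch, assuming the context connection can see the INSERTED
        // pseudo-table from inside a CLR trigger (still to be verified).
        using System.Data;
        using System.Data.SqlClient;

        public partial class AuditTriggers
        {
            // Placeholder trigger body; the SqlTrigger attribute and deployment are left out.
            public static void AuditChanges()
            {
                using (SqlConnection conn = new SqlConnection("context connection=true"))
                {
                    conn.Open();
                    using (SqlCommand cmd = new SqlCommand("SELECT * FROM INSERTED", conn))
                    // SequentialAccess: columns must be read in ordinal order, and only the
                    // value currently being read is buffered, not the whole result set.
                    using (SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
                    {
                        while (rdr.Read())
                        {
                            for (int i = 0; i < rdr.FieldCount; i++)
                            {
                                // One value at a time; large values could instead be read
                                // in chunks with GetChars()/GetBytes().
                                object value = rdr.GetValue(i); // DBNull.Value for NULLs
                                // ... append <name>/<value> to the audit XML here ...
                            }
                        }
                    }
                }
            }
        }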

    You spoke of a sproc on each table (if I read that correctly) to initially generate and update the hard-coded triggers. I have been able to create a generic proc that will do that for you just by passing it the name of the table you want to create the trigger for. It uses the same method that you're talking about moving to... auditing only the changes and it does so in a "common" audit table. I've not done it myself because I don't audit but a handful of 8 critical tables but I believe you could write a DDL trigger that would detect any schema changes to the tables and auto-magically rebuild the audit triggers on whatever table changed.

    Sorry, I didn't explain that properly. We also have one generic procedure that generates the trigger(s) for any table. We're already successfully auditing over 50 tables with that, so we know how to do this. In fact we've even got multiple subscribers for the same audited data; each subscriber may be interested in a different subset of tables, or in different columns from the same tables.

    Actually, some of that's not quite right. I audit the changes and send them to a common audit "Staging" table that only has one index, to prevent any page splits (all new audit rows go to the logical end of the table). That index, of course, is based on an IDENTITY column having nothing to do with the data on each row.

    We initially used this approach too, but we found that having a single table that all triggers write to causes a serious hot spot, resulting among other things in delays and even deadlocks between processes that, without the auditing in place, would never have conflicted at all. It is even more complicated because our databases are replicated using merge replication. Merge replication, as you know, also uses triggers on the replicated tables, and those triggers write into a shared set of tables. Combined with the fact that the order in which triggers are executed can't be influenced, this means we can neither predict nor influence whether our trigger or replication's trigger fires first, resulting in unpredictable locking patterns. We found we could get around this by sending Service Broker messages instead of writing to a table.
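
    To illustrate, this is roughly what that send would look like from the CLR version of the trigger: a fragment that would slot into the trigger body sketched above. The Service Broker service, contract and message type names are made up for the example, and conversation reuse and error handling are left out:

        // Rough sketch: send the generated audit XML as a Service Broker message
        // instead of inserting into a shared audit table. All Service Broker object
        // names below are invented for the example.
        string sendBatch = @"
            DECLARE @h uniqueidentifier;
            BEGIN DIALOG CONVERSATION @h
                FROM SERVICE [AuditInitiatorService]
                TO SERVICE 'AuditTargetService'
                ON CONTRACT [AuditContract]
                WITH ENCRYPTION = OFF;
            SEND ON CONVERSATION @h
                MESSAGE TYPE [AuditChangeMessage] (@payload);";

        using (SqlConnection conn = new SqlConnection("context connection=true"))
        {
            conn.Open();
            using (SqlCommand cmd = new SqlCommand(sendBatch, conn))
            {
                // auditXml: the XML document built earlier in the trigger (assumed here).
                cmd.Parameters.Add("@payload", SqlDbType.Xml).Value = auditXml;
                cmd.ExecuteNonQuery();
            }
        }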

    I also have a scheduled job that runs once a minute to transfer the audit data from the staging table to the final audit table. Since each row has the table and column name recorded in it as well as the type of action (Insert/Update/Delete) that caused the row to be audited, I would imagine that you could take such a thing further by using the info in the staging table to create your XML rather than trying to do it "in process" during a CLR trigger.

    Of course, that process also clears rows from the staging table once they've been moved. You could do the same thing once the rows have been converted to XML.

    Your scheduled job introduces an additional delay that averages at least 30 seconds (and can approach a full minute) before each change arrives at the destination; we need the changes to arrive at the destination practically in real time, so I would like to avoid batch processes like these as much as possible.

    The bottom line is that I think you might actually make performance matters worse by trying to do any of this auditing with a CLR, not because CLR is bad but because you end up needing to load the entirety of both trigger tables in order to do anything real with them. A hard-coded trigger using "IF UPDATE(columnname)" will run much faster, especially for "record changes only" types of triggers on wide tables.

    Our current T-SQL audit triggers generate the XML document in one single statement; it's a CROSS APPLY generating the elements for each row, using a "UNION ALL SELECT <name>, <value> WHERE <conditions>" construct to decide which columns to include in the XML. One of the conditions is indeed the UPDATE(columnname) check. The SqlTriggerContext object available to the CLR has an equivalent array of boolean values that can serve the exact same purpose. As I explained, the SequentialAccess method doesn't require all data to be read at once and thus shouldn't give the memory pressure that you're (rightfully) afraid of.
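
    For completeness, this is the check I mean, as it would look inside the CLR trigger body (a sketch; mapping column ordinals to names is left out):

        // Sketch: the CLR counterpart of T-SQL's UPDATE(columnname) test.
        // Requires: using Microsoft.SqlServer.Server;
        // SqlTriggerContext keeps one boolean per column ordinal of the target table.
        SqlTriggerContext ctx = SqlContext.TriggerContext;
        for (int i = 0; i < ctx.ColumnCount; i++)
        {
            if (!ctx.IsUpdatedColumn(i))
                continue;               // column not assigned to by the INSERT/UPDATE
            // Only columns that reach this point need to be compared between the
            // Inserted and Deleted rows and included in the audit XML. The mapping
            // from ordinal i to a column name comes from the reader's metadata.
        }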

    One other thing that you need to worry about, whether you use a CLR or a hard-coded trigger: BLOB and certain other datatypes aren't auditable through a trigger. You can tell what action was taken but you can't actually directly audit the changes that occurred. You would need to audit such columns by making a trip to the real table and picking up the new value to store. There's no way that I'm aware of to audit the "before" condition of the data.

    As I explained above, I haven't checked this yet, but the CLR shouldn't have more problems with this than a T-SQL implementation. I do know we've already included nvarchar(max) in our T-SQL triggers and this works as expected. The only thing is that we had to do some pretty bad tricks to ensure all comparisons were performed case- and accent-sensitively. In C# all string comparisons are case and accent sensitive by default, so I think we may even have an advantage there for the CLR approach.
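
    A trivial example of what I mean (the values are made up):

        // In C# the default string comparison is ordinal, i.e. case- and
        // accent-sensitive, so no COLLATE tricks are needed to detect a change.
        string oldValue = "Résumé";
        string newValue = "resume";

        bool changed = !string.Equals(oldValue, newValue, StringComparison.Ordinal);
        // changed == true: 'R' vs 'r' and 'é' vs 'e' are different code points.
        // In T-SQL the same test depends on the column collation unless an explicit
        // COLLATE ..._CS_AS (or binary) clause is added.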

    Last but not least, you spoke of using Service Broker to move some of this data. Wouldn't it be easier to set up some form of replication of the audit data and change the data to XML there? I could certainly be incorrect but it seems like you might be using Service Broker as a "poor man's" replication.

    The data in our databases is already replicated, and MS replication doesn't allow the data to be re-replicated (other than choosing transactional replication for the one and merge replication for the other, a setup we have also already applied and which introduces a whole new world of issues :( ). On top of that, replication is very restrictive about which SQL Server versions you can replicate into/from. Our OLTP systems are still on SQL Server 2005 and there is little reason to upgrade to the much more expensive licensing scheme of 2008 or 2012. Yet many of the systems it communicates with are on other versions: 2008, and even some 2012 systems are under construction (one of them is the previous project). Luckily we eliminated the last of our 6.5 systems a little more than a year ago, but we will always have the requirement to link between versions. Service Broker does not restrict us like replication does, so it is perfect for our purpose.

    So, I hear your warnings, but I still think it's worth at least the experiment. The only issue I have is that I do not want to have to write/generate and then compile a separate CLR routine for each table, because that would eliminate a lot of the advantages the CLR gives us over the T-SQL approach. On the other hand, without a way to determine the table the CLR audit trigger was fired for, the entire setup has no chance of succeeding.





    If you don't have time to do it right, when will you have time to do it over?