Logging Tables

  • Comments posted to this topic are about the item Logging Tables

  • I have a Logging table which I use for all Page (website) and Stored Procedure calls, and as a general log for unexpected errors.

    I also have a SessionLog table. All new website sessions call an SProc to get a Session ID, and that Session ID is passed to every SProc that is then called (and any child SProc ...). All Batch tasks / SQL Agent Jobs etc. can also obtain a SessionID for the same purpose.

    I have a User Logon Log which records when a user logs on, and their Session ID. Thus within a session we start with no logged on user, that user may then logon, logoff, or log on as a different user (within that session).

    The Session and User Logon log tables are pretty trivial, so I've left them out.

    Here are details of the Page / SProc / Error logging. I have massively edited these scripts to remove all the SET NOCOUNT ON type stuff, all error handling, and anything proprietary so apologies if, in the process, I have chopped out something important or a Find&Replace has gone haywire.

    My SProcs do the following:

    On the way in it logs the parameters, and on the way out that log record is updated with the current time and any Result Code (and optionally any error message or "additional values of interest").

    The log record is created with a pseudo error no. so any rows in the log table with that value are either "still running" or "terminated without completing"

    Every SProc we write starts with the Session ID parameter. We reserve the number "123" for unknown, but in practice we have almost zero SProcs which are not run with a known Session ID.

    CREATE PROCEDURE dbo.MySProc
        @SessionID int
        , @Param1 varchar(100)
        , @Param2 varchar(100)
        , @intParam3 int
    AS
    DECLARE @MyLogID int,
            @strParameters varchar(8000)

    SELECT @strParameters = LEFT(
            '@Param1=' + COALESCE('''' + @Param1 + '''', 'NULL')
            + ',@Param2=' + COALESCE('''' + @Param2 + '''', 'NULL')
            + ',@Param3=' + COALESCE(CONVERT(varchar(20), @intParam3), 'NULL')
            , 8000)

    EXEC @MyLogID = dbo.SPLogInsert @SessionID, 'MySProc', @strParameters

    -- ... body of the SProc (edited out here), which sets @intRetVal, @strErrMsg, @intRowCount, @intErrNo ...

    EXEC dbo.SPLogUpdate @MyLogID, @intRetVal, @strErrMsg, @intRowCount, @intErrNo

    RETURN @intRetVal

    We originally thought that having the horsepower to log every single SProc execution was a wild fantasy. Since then we have built eCommerce websites with thousands of concurrent users, experimented with removing / disabling the logging and, surprisingly to us at the time, found that it made very little difference.

    We do attempt not to change the row length on the Update (unless there is an error, which then needs to log an error message - but the general idea is that SProcs should not exit in an error state!).

    The @strParameters could be constructed in any format. We used to just separate parameters (having converted them all to varchar) with a "~" to a) keep the string short and b) reduce the amount of concatenation, but a fellow forum user said that he included the parameter name to make it easy to Cut & Paste to trial-run the SProc. It had always annoyed me that the logged ABC~123~XYC~789 list could be from a parameter sequence of an earlier version of the SProc, so it was always "tedious" to convert to suitable EXEC syntax. A while ago I changed style to include the "@ParamName=", as you see in this example, which is WAY WAY better when doing a Test Exec and is immune to the sequence of parameters changing etc. Some bytes are wasted in the process 😎
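    For illustration (values invented for this example, not from the original post), the two styles of logged Params string look like this; the newer one pastes more or less verbatim into a test execution:

    -- Old style: position-dependent, tedious to replay
    ABC~XYZ~789
    -- New style, as built in the example above
    @Param1='ABC',@Param2='XYZ',@Param3=789
    -- Trial-run paste (note the actual third parameter name is @intParam3)
    EXEC dbo.MySProc @SessionID=1, @Param1='ABC', @Param2='XYZ', @intParam3=789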

    Here is the Log Create, as used from an SProc

    CREATE PROCEDURE dbo.SPLogInsert
        @SessionID int                  -- Session ID (in SessionLog table)
        , @Name varchar(255)            -- e.g. "MySProc"
        , @params varchar(8000)
        -- Optional additional data
        , @Special1 varchar(8000) = NULL
        , @Special2 varchar(8000) = NULL
        , @Special3 varchar(8000) = NULL
        , @Special4 varchar(8000) = NULL
        , @Special5 varchar(8000) = NULL
        , @blnTransNoLog bit = 1        -- 0=Always log, 1=Only log if NOT in a transaction
        , @blnBusyNoLog bit = 1         -- 0=Always log, 1=Only log if NOT Busy
                                        -- Re-create SProc and set to 1 during Busy Periods
        , @blnIsBusy bit = 0            -- Do NOT pass this parameter, it is for DBA Control only.
                                        -- You can pass as 0 to FORCE logging - even when in Server Busy state
    AS
    DECLARE @LogID int,
            @dtNow datetime

    SELECT @dtNow = GetDate()

    IF @SessionID = 0 OR @SessionID IS NULL SET @SessionID = 1     -- "not known"

    IF (@blnTransNoLog = 0 OR @@TRANCOUNT <= 2) AND (@blnBusyNoLog = 0 OR @blnIsBusy = 0)
    BEGIN
        INSERT INTO dbo.MyLogTable
        (
            EditNo, CreateDate, UpdateDate,
            --LogID,
            SessionID, Name, Params,
            Type,
            ErrNo, ErrMsg, [RowCount], ErrNo2,
            Special1, Special2, Special3, Special4, Special5
        )
        SELECT
            EditNo     = 1,
            CreateDate = @dtNow,
            UpdateDate = @dtNow,
            --LogID    = 99999999999,   -- [Identity] [PK]
            SessionID  = @SessionID,
            Name       = @Name,         -- Actually this is an ID from a lookup table
            Params     = @params,
            Type       = 2,             -- Type 1=Page, 2=SP, 3=Error
            ErrNo      = 999,
            ErrMsg     = NULL,
            [RowCount] = -999,          -- Pre-populate (to reduce chance of row-expansion on Update)
            ErrNo2     = -999,          -- Pre-populate
            Special1   = @Special1,
            Special2   = @Special2,
            Special3   = @Special3,
            Special4   = @Special4,
            Special5   = @Special5

        SELECT @LogID = scope_identity()
    END
    ELSE
    BEGIN
        SELECT @LogID = -1              -- Indicate that logging is suppressed
    END

    RETURN @LogID

    We did build in a flag to indicate "Only log me if server not very busy" and we set that flag in SProcs that are called very frequently. In practice the only times we have used that is where we have had a blocking issue caused by something unrelated - for example we had a frequently used Function which was blocking because it was being recompiled every time it was called, because Stats were stale and Stats Rebuild was not set to Async. At the time we thought the problem was with our logging, so we used the flags etc. to reduce logging volume, but as I said earlier we have actually found the impact is surprisingly modest.

    Then we come to the Log Update. This basically just freshens the current Date/Time and stores the Error No (0=No error) outcome of the task/SProc. There are various other columns - for example I have a column for @@ROWCOUNT; many of our SProcs are multiple-step so Row Count is not really applicable, but an example where I find it useful is a web SProc that does a Product Search. The Parameters include the search term(s) the user was looking for, and the Row Count will show how many hits there were (for THAT user at THAT moment in time), which has been useful. Also, knowing whether an important SELECT/UPDATE etc. within the SProc did 0 rows, 1 row or many rows I find useful. Anyway, it's optional, and not all Log Updates include that column.

    There are also "special" columns where I want to log "anything else"

    CREATE PROCEDURE dbo.SPLogUpdate
        @LogID int                      -- ID from the Create
        , @ErrNo int
        , @ErrMsg varchar(8000)
        -- Remaining fields are only stored if NOT NULL
        , @RowCount int = NULL
        , @ErrNo2 int = NULL
        , @Special1 varchar(8000) = NULL
        , @Special2 varchar(8000) = NULL
        , @Special3 varchar(8000) = NULL
        , @Special4 varchar(8000) = NULL
        , @Special5 varchar(8000) = NULL
        , @params varchar(8000) = NULL
    AS
    DECLARE @dtNow datetime,
            @intRetVal int

    SELECT @dtNow = GetDate(),
           @intRetVal = 0

    SELECT @ErrMsg = NullIf(@ErrMsg, ''),
           @Special1 = NullIf(@Special1, ''),
           @Special2 = NullIf(@Special2, ''),
           @Special3 = NullIf(@Special3, ''),
           @Special4 = NullIf(@Special4, ''),
           @Special5 = NullIf(@Special5, ''),
           @params = NullIf(@params, '')

    IF @LogID <> -1
    BEGIN
        UPDATE L
        SET
            EditNo = COALESCE(EditNo, 0) + 1,
            UpdateDate = @dtNow,                        -- Completion time
            ErrNo = @ErrNo,
            ErrMsg = @ErrMsg,
            Special1 = COALESCE(@Special1, Special1),
            Special2 = COALESCE(@Special2, Special2),
            Special3 = COALESCE(@Special3, Special3),
            Special4 = COALESCE(@Special4, Special4),
            Special5 = COALESCE(@Special5, Special5),
            [RowCount] = COALESCE(@RowCount, [RowCount]),
            ErrNo2 = COALESCE(@ErrNo2, ErrNo2),
            Params = COALESCE(@params, Params)
        FROM dbo.MyLogTable AS L
        WHERE LogID = @LogID

        SELECT @intRetVal = CASE WHEN @@ROWCOUNT = 1 THEN 0 ELSE -2 END
    END
    ELSE
    BEGIN
        SELECT @intRetVal = -1          -- No update was made
    END

    RETURN @intRetVal                   -- Return error number, 0=No error

    Next is the general error log SProc. I use this in the code where either something should not have happened (this is different to an SProc exiting with an error return code - that is recorded by just using the Update Log SProc), or to log some Debugging Data that I need in, say, a production environment to help me diagnose and fix a problem.

    An example usage would be where some parameter / SProc / etc. has been deprecated. We then log any usage of that and can report on how often it is happening and, from the overall log, given the Session ID, we can work out whereabouts in the code-flow it was called, and then track down the culprit 🙂 (also to report on how the number of calls reduces over time - hopefully! - as code is refactored).

    CREATE PROCEDURE dbo.SPLogError
        @SessionID int                  -- Session ID
        , @Name varchar(255)
        , @intErrNo int
        , @strErrMsg varchar(8000)
        -- Optional parameters
        , @strSpecial1 varchar(8000) = NULL     -- Other data
        , @strSpecial2 varchar(8000) = NULL
        , @strSpecial3 varchar(8000) = NULL
        , @strSpecial4 varchar(8000) = NULL
        , @strSpecial5 varchar(8000) = NULL
    AS
    DECLARE @LogID int,
            @dtNow datetime

    SELECT @dtNow = GetDate()

    INSERT INTO dbo.MyLogTable
    (
        EditNo, CreateDate, UpdateDate,
        --LogID,
        SessionID, Name, Params,
        Type,
        ErrNo, ErrMsg, [RowCount], ErrNo2,
        Special1, Special2, Special3, Special4, Special5
    )
    SELECT
        EditNo     = 1,
        CreateDate = @dtNow,
        UpdateDate = @dtNow,
        --LogID    = 99999999999,       -- [Identity] [PK]
        SessionID  = @SessionID,
        Name       = @Name,             -- Actually the Name's ID
        Params     = NULL,
        Type       = 3,                 -- Type 1=Page, 2=SP, 3=Error
        ErrNo      = @intErrNo,
        ErrMsg     = @strErrMsg,
        [RowCount] = NULL,
        ErrNo2     = NULL,
        Special1   = @strSpecial1,
        Special2   = @strSpecial2,
        Special3   = @strSpecial3,
        Special4   = @strSpecial4,
        Special5   = @strSpecial5

    SELECT @LogID = scope_identity()

    RETURN @LogID
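    A call from application code might look something like this - the error number, message and Special value here are invented for illustration, not taken from the original post:

    EXEC dbo.SPLogError
          @SessionID
        , 'MySProc'
        , 50001                                         -- hypothetical "deprecated usage" error number
        , 'Deprecated parameter @OldParam was supplied'
        , @strSpecial1 = 'Called from: Basket page'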

    Here's the DDL: Not very exciting ...

    CREATE TABLE dbo.MyLogTable
    (
        EditNo int NOT NULL DEFAULT (1),        -- default constraint names removed in editing
        CreateDate datetime NOT NULL DEFAULT (getdate()),
        UpdateDate datetime NOT NULL DEFAULT (getdate()),
        LogID int IDENTITY(10000,1) NOT NULL,
        SessionID int NOT NULL,
        Name varchar(255) NOT NULL,             -- Actually we use an int ID for this
        Params varchar(8000) NULL,
        Type tinyint NOT NULL,
        ErrNo int NOT NULL DEFAULT (0),
        ErrMsg varchar(8000) NULL,
        [RowCount] int NULL,
        ErrNo2 int NULL,
        Special1 varchar(8000) NULL,
        Special2 varchar(8000) NULL,
        Special3 varchar(8000) NULL,
        Special4 varchar(8000) NULL,
        Special5 varchar(8000) NULL,
        CONSTRAINT PK_MyLogTable PRIMARY KEY CLUSTERED
        (
            LogID ASC
        )
    ) ON [PRIMARY]
    GO

    CREATE NONCLUSTERED INDEX IX_MyLogTable_SessionID ON dbo.MyLogTable
    (
        SessionID ASC
    )
    GO

    CREATE NONCLUSTERED INDEX IX_MyLogTable_CreateDate ON dbo.MyLogTable
    (
        CreateDate ASC,
        ErrNo ASC
    )

    Deleting the log entries is a bit of a challenge. The log table is BY FAR the biggest table in our applications - Client Data pales into insignificance by comparison! We can usually retain 7 days, but in very busy periods we reduce that to 3 days so as not to strain disk storage and backup duration.
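    Not from the original post, but a minimal sketch of the sort of batched purge that keeps each delete (and its TLog impact) small; the 7-day retention matches the text above, the batch size of 10,000 is an arbitrary assumption:

    DECLARE @RowCount int
    SELECT @RowCount = 1

    WHILE @RowCount > 0
    BEGIN
        -- Delete stale rows in modest chunks so locks and log growth stay manageable
        DELETE TOP (10000)
        FROM dbo.MyLogTable
        WHERE CreateDate < DATEADD(day, -7, GETDATE())

        SELECT @RowCount = @@ROWCOUNT
    END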

    This has been in existence for a long time ... hopefully I won't, now, find that I should have been doing it better / more efficiently over all these years 🙂 but I would welcome any comments and feedback

  • Logging might get complicated during the execution of a multi-statement stored procedure when the various steps are within a transaction.

    Has anyone considered the possibility of writing to a file log with the basic "columns" like date and time, stored proc name, parameters, error/status code? The logging would then take place regardless of the use of a transaction.

  • j-1064772 (9/25/2015)


    Logging might get complicated during the execution of a multi-statement stored procedure when the various steps are within a transaction.

    Has anyone considered the possibility of writing to a file log with the basic "columns" like date and time, stored proc name, parameters, error/status code? The logging would then take place regardless of the use of a transaction.

    I make extensive use of TRY...CATCH constructs wrapped around explicit transactions and call an error logging sp in the catch block.
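    For instance, a minimal sketch of that pattern, reusing the SPLogError procedure posted earlier in this thread (the body of the transaction, the error-number convention and @SessionID being in scope are all assumptions on my part):

    BEGIN TRY
        BEGIN TRANSACTION

        -- ... the actual work ...

        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION

        DECLARE @intErrNo int, @strErrMsg varchar(8000)
        SELECT @intErrNo = ERROR_NUMBER(), @strErrMsg = ERROR_MESSAGE()

        -- @SessionID assumed to be in scope, as in the SProc template earlier in the thread
        EXEC dbo.SPLogError @SessionID, 'MySProc', @intErrNo, @strErrMsg
    END CATCH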

    ____________
    Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.

  • I have implemented some very simple logging to just have a record of changes and internet activity.

    (I have "inherited" a habit of prefixing field names with an abbreviation of the table name. This is helpful in joins etc. Thanks Kurt! )

    For a record of inserts and changes to data I use this table:

    CREATE TABLE dbo.UpdateLog(

    UplId int IDENTITY(1,1) NOT NULL,

    UplStaffId int NULL,

    UplWindowsId varchar(255) NULL,

    UplDateTime datetime NULL,

    UplTableName varchar(50) NULL,

    UplRowIdField varchar(50) NULL,

    UplRowIdValue varchar(50) NULL,

    UplRowAction varchar(50) NULL,

    UplIdentFields varchar(50) NULL,

    UplAllFields varchar(max) NULL

    ) ON [PRIMARY]

    This is basically just to identify who did what to what row in what table in case we need to track down changes, motivated by HIPAA.

    To keep a record of errors and changes submitted through web pages from our remote locations I created this internet log table:

    CREATE TABLE dbo.InternetLog(

    IlogId int IDENTITY(1,1) NOT NULL,

    ILogRelatedId int NULL,

    ILogDateTime datetime NULL,

    ILogEmail varchar(255) NULL,

    IlogStatus int NULL,

    ILogType int NULL,

    ILogText varchar(max) NULL,

    CONSTRAINT PK_InternetLog PRIMARY KEY CLUSTERED

    (

    IlogId ASC

    )

    ) ON [PRIMARY]

    ILogEmail is used to identify who made the change, and different statuses are defined for different operations. Related ID is different based on different statuses...

    For our online employment application I decided to dump everything they submitted to a log first thing, so if there are any problems processing the data I can at least reproduce what they tried to submit. This has been useful on a few occasions...

    CREATE TABLE dbo.ApplicantActivityLog(

    ApLogId int IDENTITY(1,1) NOT NULL,

    ApLogApplicationId int NULL,

    ApLogDateTime datetime NULL,

    ApLogURL varchar(255) NULL,

    ApLogApplicationDateTime datetime NULL,

    ApLogPostDateTime datetime NULL,

    ApLogText varchar(max) NULL,

    CONSTRAINT PK_ApplicantActivityLog PRIMARY KEY CLUSTERED

    (

    ApLogId ASC

    )

    ) ON [PRIMARY]

    The ApLogText field is just all the web fields from the page concatenated with a "|" delimiter.

    I would be interested in any feedback.

  • I make extensive use of TRY...CATCH constructs wrapped around explicit transactions and call an error logging sp in the catch block.

    Yes, this is what I meant by "complicated". Some programmers here want to avoid the bullet-proof approach (extensive error checking and handling) as they prefer the stored proc to fail on bad data in order to address the problem at the source - the bad data - instead of using assumptions on how to proceed further with questionable data.

  • For me it is very much dependent on what gets logged and how frequently log events happen.

    1) Which details are relevant and are known at logging time and which can be derived or filled in sometime later during an enrichment process (IP to location lookups for example)?

    2) How fast will the table grow, will it also be used for real-time searches, and how frequently?

    Generally speaking, logging should take as little time and space as possible. It should also be guarded from the sort of complexity that can cause issues under I/O pressure. Additionally, there should not be one big log table. It is better to have a manageable active table or partition and have older, unchanging data reside in its own table or partition.

    The active table should only have the clustered index and have auto update statistics disabled. When data in some columns can be encoded by using lookup tables, I will do so. Some straightforward reads will not slow SQL Server down, but bigger writes and additional indexes certainly will! I put all the logging code in one stored procedure for ease of maintenance.

    A feature that I recently found missing in SQL Server with respect to logging tables is unique indexes that correlate to the clustered index and as such would require zero additional space, like in:

    ID int Identity

    A1 int

    A2 int

    A3 int

    D1 ...

    D2 ...

    Where A1..A3 form a natural composite key, like datetime parts as an example. This key is, like the identity column, order sensitive when it comes to logging. Having a feature that exploits this correlation can also make for more compact additional indexes whenever searching is required (like in history tables).

    In the above example and the current situation: if I put the clustered PK on "ID" the consequence will be that any search on the natural key results in a table scan or requires a separate index on the natural key as well...often doubling the storage and I/O requirements.

    Putting the PK on "A1..A3" gives the same problem for searching on "ID" and/or makes other additional indexes even more wasteful as the natural key is three times as wide as the identity key.

    My ideal solution would be to designate one clustered index ordered by "ID" (the smallest unique sub-key) and "A1..A3" as correlated (possibly with an auto resolve option for "ID" in edge cases).

    Currently I see no other option than to make the natural key the clustered index, as this allows for searches while skipping additional indexes even where they really would help searches. As an alternative I make sure the first columns of the natural key specify a date and time. This helps as I can constrain searches to specific dates and eliminate full table scans that way.

    This is the definition for the main log table (that contains recent, possibly still non-enriched data):

    create table dbo.RawAuthLog

    (

    Day2k smallint not null

    , DayOffset smallint not null

    , DayOffsetOrder int not null

    , IdAccount int not null

    , IdAuthEvent tinyint not null

    , IdAuthMethod tinyint not null

    , IdAuthResult tinyint not null

    , IdISP int null

    , IP varbinary(16) null

    , IP4 as ( cast ( case when datalength( IP ) = 4 then IP else null end as int ) )

    , IP6 as ( case when datalength( IP ) = 16 then IP else null end )

    , constraint pk_dbo__RawAuthLog primary key clustered( Day2k, DayOffset, DayOffsetOrder )

    , constraint fk_dbo__RawAuthlog_IdAccount___dbo_Accounts foreign key( IdAccount ) references dbo.Accounts ( Id ) on delete cascade

    , constraint fk_dbo__RawAuthLog_IdAuthEvent___dbo_AuthEvent foreign key( IdAuthEvent ) references dbo.AuthEvent ( IdAuthEvent )

    , constraint fk_dbo__RawAuthLog_IdAuthMethod___dbo_AuthMethod foreign key( IdAuthMethod ) references dbo.AuthMethod( IdAuthMethod )

    , constraint fk_dbo__RawAuthLog_IdAuthResult___dbo_AuthResult foreign key( IdAuthResult ) references dbo.AuthResult( IdAuthResult )

    , constraint fk_dbo__RawAuthLog_IdISP___dbo_ISP foreign key( IdISP ) references dbo.ISP ( IdISP )

    , constraint ck_dbo__RawAuthLog_Day2k check ( Day2k >= 0 and Day2k < 29012 ) -- Works till {ts '2079-06-06 23:59:00' }

    , constraint ck_dbo__RawAuthLog_DayOffset check ( DayOffset >= 0 and DayOffset < 1440 )

    , constraint ck_dbo__RawAuthLog_DayOffsetOrder check ( DayOffsetOrder >= 0 and DayOffsetOrder < 60000000 )

    , constraint ck_dbo__RawAuthLog_IP check ( IP is null or datalength( IP ) in ( 4, 16 ) ) -- 0 / 6 / 18 bytes storage required

    )

    ;

    go

    Please note that "DayOffsetOrder" is microseconds within the minute. If a collision occurs at this level I add one microsecond to resolve it (a few times if necessary). This is done in a stored procedure that handles the logging and also maintains the lookup tables.
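    For illustration, this is my reading of how the three key parts could be derived from the current time (not the author's actual code; the 2000-01-01 epoch is inferred from the "Works till 2079" comment in the check constraint):

    DECLARE @now datetime2(6) = SYSDATETIME()
    DECLARE @MidnightToday datetime2(6) = CAST(CAST(@now AS date) AS datetime2(6))

    DECLARE @Day2k smallint = DATEDIFF(day, '20000101', @now)               -- days since 2000-01-01
    DECLARE @DayOffset smallint = DATEDIFF(minute, @MidnightToday, @now)    -- minute within the day (0..1439)
    DECLARE @DayOffsetOrder int = DATEDIFF(microsecond,
            DATEADD(minute, @DayOffset, @MidnightToday), @now)              -- microsecond within the minute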

    The interface for this procedure is:

    create procedure dbo.AuthLog

    (

    @Day2k smallint = null output

    , @DayOffset smallint = null output

    , @DayOffsetOrder int = null output

    , @AuthEvent varchar(64) = null

    , @AuthMethod varchar(20) = null

    , @AuthResult varchar(30) = null

    , @IdAuthEvent tinyint = null output

    , @IdAuthMethod tinyint = null output

    , @IdAuthResult tinyint = null output

    , @IP4 varchar(15) = null

    , @IdAccount int = null

    )

    as

    Also note that the procedure handles translating logical names to the various IDs in the lookup tables, to keep records compact and calling code readable. Another nice twist is the use of varbinary(16) as the last nullable column in the table.

    If no IP number is known, no space is used (except for a bit in the null bitmap). If it is your typical IPv4 number like x.x.x.x then I store it as 4 bytes, and in combination with the length it will consume 6 bytes. For IPv6 the full 16 bytes are used and with the length that totals 18 bytes of storage.

    Computed columns handle the translation to more typical formats and a view (not shown here) makes it even readable in x.x.x.x format. The storage however remains compact, flexible and future proof.
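    As an indication of what such a view might look like (a sketch of my own, not the author's actual view), the IPv4 case can be rendered dotted-decimal from the 4-byte value:

    create view dbo.RawAuthLogReadable
    as
    select l.*
         , case when datalength( l.IP ) = 4
                then cast( cast( substring( l.IP, 1, 1 ) as int ) as varchar(3) ) + '.'
                   + cast( cast( substring( l.IP, 2, 1 ) as int ) as varchar(3) ) + '.'
                   + cast( cast( substring( l.IP, 3, 1 ) as int ) as varchar(3) ) + '.'
                   + cast( cast( substring( l.IP, 4, 1 ) as int ) as varchar(3) )
           end as IP4Dotted
    from dbo.RawAuthLog as l
    go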

    The history table (not shown here) condenses data even more by exploiting redundancies over a day without compromising readability, practical meaning or search-ability of the data.

  • In general for logging you want to store Who, What and When. The exact details of what needs to be included will depend on why you are logging.

    Who may be any or all of login to SQL, login to application which connects to SQL using a service account, SQL job name, SSIS/DTS package name, Application name - it may also need to refer to Where i.e. Web URL, PC Name/IP Address, Server name, Environment.

    When is the easy bit, but you may want start and end times for some operations.

    What is heavily dependent upon the purpose of the logging but should store details of what was done.

    The other important thing about logs is that they are intended to be primarily about appending new records. Querying the log should be the exception, so any indexing for those queries can be left until it is required so it doesn't impact the performance of the writes. I would be perfectly happy with log tables being heaps, and you may wish for any queries to be made against a read-only copy.

  • I do simple logging and it's currently just for Return on Investment calculations. I just log when someone runs something with the date and time so I can say "see look they really do use this a lot" and I think it's saving the company $X amount of dollars this year. That's why it was "good" that I spent time on it.

    As for logging errors I currently don't do that. I trap the errors but then display good info. Our users are taught to come get one of us so we can resolve it right then and there. Since we have less than 200 active users it works out well. We do put a ton of effort into requiring good data entry so we really don't get many errors (maybe one every 2 months).

  • This post has great timing. Just yesterday I sent my manager an analysis of a log table that was created a year ago that is massive. I'm hoping I can get permission to blog about it. In the meantime, I will give some things not to do.

    Some context. This one table took 78% of the rows inserted into the whole database and 98% of the storage.

    The type of log was a data change log. So a record is inserted for every I/U/D to tables.

    Here is what not to do.

    1. Don't use the table for every kind of change. One change being captured was a bit flag being turned off and on all day long for different users. That should have been in a separate type of log file. That particular sample would have gone from 3000 bytes per row down to 17 bytes per row.

    2. Don't use nvarchar when varchar will do. We have no unicode in our system, so nvarchar is not needed.

    3. Don't use nvarchar(max) or varchar(max) if it's not needed. We only had 3 out of 58 tables that actually needed it. I would put them in a different log table.

    4. We are using XML to store the old data and the new data in two columns. For our new logging table, I'm going to have them change to JSON. (SQL Server 2016 will have some JSON support.) I converted a 228 character sample XML document to JSON. I saved 20% in space. This will vary depending on XML element name usage.

    5. Don't save the old AND new data. Save the new data and a separate tinyint column that specifies the operation. You can use LAG to find the previous value if you need to compare (see the sketch after this list).

    6. Don't save for eternity. Find out the retention policy and start archiving unneeded data as it becomes stale.

    7. Don't keep the audit log in your transaction database. Make a separate database for your logs. This will keep the backup and restore times shorter. It will also keep your transaction database backups smaller. See my note at the beginning as to the amount of storage our audit log is taking. Yes, you'll still need to backup/restore the logging database, but the transaction database is so much more important.
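    For point 5, a minimal sketch of the LAG approach - the table and column names here are hypothetical, since the real change-log table isn't shown:

    SELECT ChangeLogId,
           TableKeyValue,
           OperationType,                                   -- e.g. 1=Insert, 2=Update, 3=Delete
           NewValue,
           LAG(NewValue) OVER (PARTITION BY TableKeyValue
                               ORDER BY ChangeLogId) AS PreviousValue
    FROM dbo.DataChangeLog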

    Mickey Stuewe
    Sr Database Developer
    My blog
    Follow me on twitter: @SQLMickey
    Connect with me on LinkedIn
    --------------------------------------------------------------------------
    I laugh loudly and I laugh often. Just ask anyone who knows me.

  • Script Run / Create SProc, Trigger, View etc. Logging

    I log every "run" of scripts that Create an SProc, Trigger, View etc. The call to the Script Logging SProc, at the top of each script file, looks like this:

    --

    PRINT 'Create procedure MySProc'

    GO

    EXEC dbo.SPLogScriptRun 'MySProc' ,'150925'

    GO

    ALTER PROCEDURE dbo.MySProc

    ...

    Basically I just used today's date '150925' as the "version number", although I suffix "del" if the script is, now, dropping the SProc as being redundant, or "xxx" if it is a client-specific variation etc.

    SPLogScriptRun warns if things like ansi_warnings are not set as expected - to prevent the SProc being created with the wrong base settings. I do not know which ones are important so I incorporated a check for all the ones I could find 😉

    The name of the Server is also stored, so if the database gets moved from server to server we know on which server that version of that object was originally created.

    The comma before the version number is carefully positioned to make it easy to highlight up to and including the first parameter only, so that executing just that much (e.g. EXEC dbo.SPLogScriptRun 'MySProc') will display a history of run-dates and version numbers. When we deploy from DEV to TEST and TEST to Production we compare the Script Run Logging tables to check that the version numbers for all objects are the same, and that the downstream database object was created on a later date/time than the upstream one.

    The report complains if the Modify Date on the object (in sys.objects) is earlier than the log entry - i.e. the Log entry was created but the CREATE / ALTER failed.

    There is a reporting option to show the Run History of the SProc (i.e. matching on the @Name parameter). This uses the Procedure Call Log (described in an earlier post) and shows the run dates/times for the SProc

    The DDL is:

    CREATE TABLE dbo.ScriptRunLog

    (

    [CreateDt] [datetime] NOT NULL,

    [CreateUser] [varchar](4) NULL,

    [ScriptLogID] [int] IDENTITY(10000,1) NOT NULL,

    [Name] [varchar](100) NOT NULL,

    [Version] [varchar](10) NOT NULL,

    [Server] [varchar](30) NOT NULL,

    CONSTRAINT [PK_ScriptRunLog] PRIMARY KEY CLUSTERED

    (

    [ScriptLogID] ASC

    )

    ) ON [PRIMARY]

  • Log / Audit Data Changes

    We log changes to data in an "Audit" table. It has exactly the same columns as the main table, but with two additional columns for the Audit Type, (U)pdate or (D)elete, and the Audit date.

    To save some space we don't store the current record (it's in the main table, natch!), we only store the "old" version when a record is updated or deleted.

    We store the "old" data using an Update / Delete trigger.

    Our tables have names like XX_YYY_ZZZ_TableName and for the Audit table we add an "a" after the start - so XXa_YYY_ZZZ_TableName. We prefix all the column names in the audit table with "A_" to differentiate them from the main table.

    CREATE TABLE dbo.XXa_YYY_ZZZ_TableName

    (

    A_yyy_zzz_AuditType char(1) NOT NULL,

    A_yyy_zzz_AuditDt datetime NOT NULL,

    A_yyy_zzz_ID int NOT NULL,

    A_yyy_zzz_...

    ) ON [PRIMARY]

    GO

    CREATE CLUSTERED INDEX IXPK_XXa_YYY_ZZZ_TableName ON dbo.XXa_YYY_ZZZ_TableName

    (

    A_yyy_zzz_ID,

    A_yyy_zzz_AuditDt,

    A_yyy_zzz_zEditNo

    )

    GO

    and the trigger on the underlying table:

    CREATE TRIGGER dbo.XX_TR_YYY_ZZZ
    ON dbo.XX_YYY_ZZZ_TableName
    AFTER /* INSERT, */ UPDATE, DELETE
    AS
    DECLARE @dtNow datetime,
            @strUser varchar(4)

    SELECT @dtNow = GetDate(),
           @strUser = left(user_name(), 4)

    ... any other trigger actions on the table ...

    -- AUDIT: Store any previous version (including where record is being deleted)
    INSERT dbo.XXa_YYY_ZZZ_TableName
    SELECT CASE WHEN I.yyy_zzz_ID IS NULL THEN 'D' ELSE 'U' END AS [AuditType],
           @dtNow AS [AuditDt],
           D.*         -- Deliberate "*" - will break if columns added to parent table
    FROM deleted AS D
         LEFT OUTER JOIN inserted AS I
              ON I.yyy_zzz_ID = D.yyy_zzz_ID

    We have a purge routine to delete "stale" data. Retention time varies from table to table, but let's say it is 3 months for this table. We delete all rows older than 3 months, except that we keep the most recent row of those older than 3 months. Thus a user can see all changes in the last 3 months plus one previous record, so they are able to say "Prior to the last three months it was not changed for X days/months/years".
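    A hedged sketch of that purge logic (simplified to the column names in the example above; the real routine is not shown here) - delete audit rows older than 3 months but keep the newest such row per ID:

    ;WITH Stale AS
    (
        SELECT ROW_NUMBER() OVER (PARTITION BY A_yyy_zzz_ID
                                  ORDER BY A_yyy_zzz_AuditDt DESC) AS RowNo
        FROM dbo.XXa_YYY_ZZZ_TableName
        WHERE A_yyy_zzz_AuditDt < DATEADD(month, -3, GETDATE())
    )
    DELETE FROM Stale
    WHERE RowNo > 1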

  • peter-757102 (9/25/2015)


    Additionally there should not be one big log table. It is better to have a manageable active table or partition and have older unchanging data reside in their own.

    We have "one big log table". The delete operation, each day, deleting old data (old is usually "more than 7 days old") is quite significant. I hadn't thought of partitioning it, maybe I could create a new partition each day and then just throw away the 7-day-old partition to remove stale data?

    I'm not sure that segregating my Error Log into multiple, discrete, tables would make a difference though - I'd still have the same total size of log data??

    One advantage of the one big log table is that everything is in chronological order, so I can list for a given SessionID and see everything that user was doing - Web page views, SProcs called, Errors logged ... all in chronological order.

    The active table should only have the clustered index

    I have tried to do this, and failed 🙁 When we report on the log (which is not often) we need to report either by SessionID or by Date. NOT having indexes for those makes the querying very slow ... so we do have those non-clustered indexes, even though I know it is slowing the INSERT process somewhat and costing us bytes, and reporting is relatively rare. This is one of the rare cases where we use NOLOCK in the reporting.

    Perhaps there is another way to facilitate reporting (in real time - i.e. when we need to go looking for the cause of a problem) without having the overhead of the NC indexes?

    When data in some columns can be encoded by using lookup tables, I will do so. Some straightforward reads will not slow SQL Server down

    Our approach is:

    SELECT @intLoop = 1
    WHILE @intLoop <= 2
    BEGIN
        INSERT INTO LogTable
        SELECT @LogValue1, @LogValue2, NameLookupID, ...
        FROM NameLookupTable
        WHERE Name = @Name

        SELECT @intRowCount = @@ROWCOUNT

        IF @intRowCount = 1
        BEGIN
            SELECT @intLoop = 999       -- Loop completed, exit now.
        END
        ELSE
        BEGIN
            INSERT INTO NameLookupTable(Name)
            SELECT @Name

            SELECT @intLoop = @intLoop + 1
        END
    END

    IF @intLoop = 3 PerformErrorHandling

    on the grounds that the Name Lookup should succeed 99% of the time and speed of insert is key. So I don't do a separate Name Lookup to get the ID, then an optional ADD (if not found), and THEN the INSERT; instead I attempt the INSERT with the Name Lookup in the same statement, and only if that fails do I do the ADD and then repeat the INSERT-with-lookup.

    I don't know if that is the best way though?

  • @Kristen-173977

    When building a log table there is nearly always a chronological order that can be exploited. Create the clustered index such that date and time (or a sequence number) are the first columns of your index, and make sure they are of sufficient resolution to be unique.

    If we need to filter on a session ID we can add a date filter to the where clause. There is usually no good reason to scan the whole table or seek in a specific SessionID index, which results in subsequent bookmark look-ups. This is because, by their very nature, sessions span a certain date/time range, and with those columns being the first in the clustered index there is already a proper index to exploit. Create a separate session table if you have to quickly determine the relevant date range of interest - essentially making your own "index", with the freedom to add session-specific metadata without burdening the log table.
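    To give a rough idea of the pattern (a sketch using hypothetical names - the SessionLog table mentioned earlier in the thread was not shown, so the start/end column names are guesses, and @SessionID is assumed to be in scope):

    DECLARE @From datetime, @To datetime

    SELECT @From = StartDt, @To = COALESCE(EndDt, GETDATE())
    FROM dbo.SessionLog
    WHERE SessionID = @SessionID

    SELECT *
    FROM dbo.MyLogTable
    WHERE CreateDate BETWEEN @From AND @To      -- lets a date-leading clustered index do the work
      AND SessionID = @SessionID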

    It certainly helps to make the session ID an ever increasing integer as well and never have the same ID reused, as opposed to say generating a new UUID per session, which could result in duplicate UUIDs.

    When filtering on accounts, it also makes sense to filter over a certain period for the same reason.

    Actually do it for all queries and use periods like "today", "yesterday", "this week", "last week", etc. (as usual in modern reporting).

    An unfocused query to retrieve what an account has done over all time is hardly useful in the context of reporting, especially not for logs.

    As for partitioning, how often data should be moved totally depends on how fast the data volume builds up and how soon the data can be enriched to reach a final state that will never require modification any more. I would at least keep the current day and maybe the day before in the "active" table. The rest is a candidate for moving to another table. I myself use a view that combines both tables with a UNION ALL, as if they were one single table, and query against that for small reports such as recent activity and failed login attempts.

    This method keeps the table that is frequently hit by inserts quite small. This also has a benefit for reporting, where a simple seek for data will never result in many reads or prolonged locking. Better not to use "with (nolock)", as that can and will give incorrect results when you do not expect it.

    I mentioned disabling auto updates of the statistics, and actually also meant that auto create of new statistics should not happen either. This is important as those operations take time at unexpected moments and actually generate outdated/misleading information in the case of log tables. Better to do it manually after having moved a day of data to the final log table (which can become large, partitioned or otherwise). For good measure, start with a freshly created table with these options set correctly before running any query against it.

  • peter-757102 (9/25/2015)


    When building a log table there is nearly always a chronological order that can be exploited.

    Yup, agree with that, never spotted how I can exploit it though

    Create the clustered index such that date and time (or sequence number) are the first columns of your index and that they are in sufficient resolution to be unique.

    On a busy day we get about 3M log rows. Let's assume the insert rate is double at busy times of the day, so something like 70 per second. I couldn't guarantee that a millisecond time would be unique. I could re-attempt the insert with the time incremented by one millisecond (well, the granularity for DATETIME, or use DATETIME2 - but that's all more clustered index key width) until I got an insert, but that would loop (don't know how often, maybe "rarely") and not be truly chronological, whereas IDENTITY (Clustered Index) is giving me strictly chronological and a nice narrow key. Bit of a PITA when it nears the INT limit and we have to reset it, but in practice only "on that day", and we reset it at midnight so in practice we never really notice (the DELETE purge has to have logic that deletes not solely on Max/Min ID range for the day though!)

    If we need to filter on a session ID we can add a date filter to the where clause. There is usually no good reason to scan the whole table or seek in a specific SessionID index which results in subsequent bookmark look-ups. This is because by very nature sessions span a certain date/time range and with those columns being the first in the clustered index, there is already a proper index to exploit.

    A session for us can often be several hours, up to 24 hours. Many users keep their browser open and dip in-and-out of the site. We deliberately don't have a 20-minute timeout on sessions as it annoys users to have to log back in and lose the context of their session (although we will automatically reload their last basket for a new session, provided they haven't cleaned out the MachineID cookie ...). The Session Table will tell me the start / end time of their session (and I could easily add the Start/End LogID [i.e. clustered index on Log Table] which might be an idea), but that may still only be restricting me to several hours of the table which only holds 3 - 7 days data, so still a significant %age of rows to then scan.

    Another option (only just occurring to me 🙂 ) might be to warehouse a session when it is queried. First time it is requested copy the Log Rows to a Cache Table. Subsequent times then top-up that Cache Table with any Log Records with LogID newer than most-recent-seen-last-time. Once I start investigating a session, e.g. which logged an error, I will probably wade backwards and forward through it for "quite a while" and would appreciate decent performance during that job. I rarely look at more than a handful of sessions, and the Cache Table can delete anything 24 hours old (even truncate at midnight would do, if I was busy then it would just re-cache the session I was viewing with a slight delay)

    I wonder how much effort it is to create a Date or SessionID index? Perhaps I should try having that normally dropped, and then create it before we want to do any diagnostics. We go for weeks at a time without needing to investigate anything ... Again, I could have the index drop-at-midnight and create-if-missing when I do a Session Query.
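    A create-if-missing check would only be a few lines (sketched here against the table and index names from my earlier post):

    IF NOT EXISTS ( SELECT * FROM sys.indexes
                    WHERE object_id = OBJECT_ID('dbo.MyLogTable')
                      AND name = 'IX_MyLogTable_SessionID' )
    BEGIN
        CREATE NONCLUSTERED INDEX IX_MyLogTable_SessionID
            ON dbo.MyLogTable (SessionID ASC)
    END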

    It certainly helps to make the session ID an ever increasing integer as well and never have the same ID reused, as opposed to say generating a new UUID per session, which could result in duplicate UUIDs.

    Yup, we have an INT IDENTITY. (as said the only hassle is periodic reset to zero when it nears INT limit, but that only happens once every couple of years)

    An unfocused query to retrieve what an account has done over all time is hardly useful in the context of reporting, especially not for logs.

    Just "for discussion" our Session Report (showing all Page views, SProc calls and errors for that session) also shows previous sessions for that MachineID and UserID logins (e.g. if multiple different logins used on that machine) and we might well investigate those, including if that UserID was used on a different MachineID (e.g. Home + Work PCs) and had the same problem. We might have a browser-specific issue which is causing a problem, for example.

    This method keeps the table frequently hit by inserts, quite small

    Is there a tangible benefit to that? I was assuming that inserting into a big table is the same as inserting into a small table?

    For me the Purge of stale data is the bigger issue. partitioning might make that easier for me.

    Better not use "with (nolock)", as that can and will give incorrect results when you do not expect it.

    In this specific case I would disagree. We never use NOLOCK in production code, but we do use it in DBA reports. It is critical that our DBA reports have zero impact on end users in what is, pretty much, a write-only logging table, so I think it is essential in our SessionID reports. We know that the report is unreliable - unlike using NOLOCK on an end user's report where they will assume the data they are seeing is accurate.

    I mentioned disabling auto updates of the statistics and actually also meant that auto create of new statistics should not happen either

    I noted that in your original and stroked my chin 🙂 Something I will need to look into. We have ASYNC Stats Create, so any query we make that triggered a stats update would not have the benefit for that query anyway (although maybe the next query we made would benefit, but there is every chance we will only make the one query, checking for a single, specific, log error), so a Stats Update would be a bit of a waste of time. I can see there are other ways around this - perhaps even forcing the query plan based on whatever the optimiser usually chooses with properly optimised indexes / stats and then, as you suggest, disabling the stats update.

    Better is to do it manually when after having moved a day of data to the final log table

    What is the most optimal way to "move" the data? If I have to INSERT INTO ... DELETE ... I can't see us wanting that amount of CPU / Disk I/O usage, nor the TLog space 🙂

    For good measure, start with a freshly created table with these options set correctly before running any query against it.

    Not sure that will help me. 90% (at a guess) of our log queries will be for "today" data, and a high proportion of those for "last hour or two". e.g. Marketing emails n00,000 people with some special offer they have dreamed up, that they never told us about, have never done in that way before, and the first we'll know about it is something alerting us that the site is stressing in some way or another!
