Query to Get Differences in Table Sizes

  • Dear Group:

    We get a daily push from other systems into our SQL Server for us to do reporting against.  Over the last couple weeks, we have noticed that these pushes are not consistent, in that sometimes a few tables are empty, and other times, the record counts are significantly off from what is normally expected.  The bad thing is, we don't notice until our processes stop working.

    We wrote a small query that populates a table with the Schema, Tablename, RecordCount and Date and gets executed daily so we can see the daily counts, but I am trying to figure out how to write a query and return the results of tables where the daily count is off, say 20% from the average, but what keeps catching me is that if I use this table and a table with a normal daily load of 42,000,000 has zero for a day, it will greatly throw off any averages, or if this same table, one day only loads 2,500,000 records.

    So I am looking for some help on how we can know ahead of time from a query if certain tables might be an issue.

    Hopefully this makes sense, but if not, please let me know.

  • Seems like the obvious answer is to filter out the days where a table has no records. You could just use LAG() to get the previous recorded day's record count. This is what I did... (mind you it's been a while)

    CREATE PROCEDURE [dbo].[GetTableGrowth]
    @Threshold DECIMAL(3,2) = 0
    AS

    SET NOCOUNT ON;

    WITH TableGrowth(SchemaName, TableName, LogDateTime, TotalRowCount, PrevRowCount, RowCountDelta)
    AS
    (SELECT
    schemaName
    ,TableName
    ,LogDateTime
    ,TotalRowCount
    ,PrevRowCount = COALESCE(
    LAG(TotalRowCount,1)
    OVER (PARTITION BY schemaName, TableName
    ORDER BY LogDateTime)
    ,0)
    ,RowCountDelta = TotalRowCount - COALESCE(
    LAG(TotalRowCount,1)
    OVER (PARTITION BY schemaName, TableName
    ORDER BY LogDateTime)
    ,0)
    FROM
    dbo.TableGrowthLog
    WHERE
    TableName NOT IN ('Tally','sysdiagrams')
    )
    SELECT *
    FROM (
    SELECT tg.SchemaName
    , tg.TableName
    , tg.LogDateTime
    , tg.TotalRowCount
    , tg.PrevRowCount
    , tg.RowCountDelta
    , RawPctGrowth = IIF(tg.PrevRowCount = 0, 0, RowCountDelta/(1.0 * tg.PrevRowCount))
    , PctGrowth = FORMAT(IIF(tg.PrevRowCount = 0, 0, RowCountDelta/(1.0 * tg.PrevRowCount)),'P2')
    FROM TableGrowth tg
    ) tg2
    WHERE ABS(tg2.RawPctGrowth) > @Threshold;
    GO

    Not perfect, but it's a start. Then you just filter out the instances where a table has a recordcount of zero. Then you can use LAG() to calculate growth.

  • I'm thinking that you don't actually want an Average... I'm thinking that you want a Linear Regression and, perhaps, the Mode.  You should also relegate a "0" input to a non-considered throw-out value.  That latter thing is pretty easy to do... just change your zeros to NULL and SQL Server will auto-magically ignore them for things like Averages, Sums, etc.

    So far as the big deltas go, I'm thinking that Linear Regression is the way to go.  And, yeah, I don't believe that it's that difficult to do all in T-SQL but I always get overruled in the places I work so thought I'd ask.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply