Stats histogram comparison with data in table (Windows function)

  • Hi,

    I'm comparing the stats histogram with the actual data distribution from a table (for this example Person.LastName on AdventureWorks) like so:

    ;with e as (
    select lastname, count(*) norows, ROW_NUMBER() OVER (order by (select 1)) rn
    from person.person
    group by lastname
    having count(*) >=1
    )
    select e.lastname
    , e.norows
    , e.rn
    , h.step_number step
    , h.range_rows, h.equal_rows, h.distinct_range_rows, h.average_range_rows
    from e
    left join sys.dm_db_stats_histogram (274100017, 2) h
    on e.LastName = h.range_high_key
    ORDER BY rn

    which will give me something like so:

    histogram

    If the stats are up-to-date the range_rows will equal the sum of rows between each step.

    What I'd like to do is tally the number of rows between each step in order to check they agree.

    To use a windows function I'll need to update the step where it is NULL in order to partition by it.

    Is there a nice way to do this.  I'm thinking I'll have to put this resultset into a temp table first and update the step.

    This will then allow me to use windows function partitioning on step.

     

     

    By the way I'm finding the the histogram like so:

    USE AdventureWorks2019

    -- find the stats histogram for the on Person.Person.LastName
    select s.name, c.Name
    ,'select * from sys.dm_db_stats_histogram (' + cast(s.object_id as varchar(30)) + ', '+cast(s.stats_id as varchar(10)) +'); ' ShowHistogram
    from sys.stats s
    left join sys.tables t
    on s.object_id = t.object_id
    left join sys.stats_columns sc
    on s.object_id = sc.object_id
    and s.stats_id = sc.stats_id
    join sys.columns c
    on sc.object_id = c.object_id
    and sc.column_id = c.column_id
    where t.name = 'person'
    AND c.name = 'LastName'
  • So going down the temp table route:

    DROP TABLE IF EXISTS #tmptmp;

    WITH e
    AS (SELECT LastName,
    COUNT(*) norows,
    ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rn
    FROM Person.Person
    GROUP BY LastName
    HAVING COUNT(*) >= 1)
    SELECT e.LastName,
    e.norows,
    e.rn,
    h.step_number step,
    h.range_rows,
    h.equal_rows,
    h.distinct_range_rows,
    h.average_range_rows
    INTO #tmptmp
    FROM e
    LEFT JOIN sys.dm_db_stats_histogram(274100017, 2) h
    ON e.LastName = h.range_high_key
    ORDER BY rn DESC;

    gives me this:

    histopre

    and the only way I can think to update the step column is like so (there must be a better way):

    WHILE 1 = 1
    BEGIN
    ;WITH stps (rn, step)
    AS (SELECT rn,
    LAG(step) OVER (ORDER BY rn DESC) step
    FROM #tmptmp)
    UPDATE t
    SET step = stps.step
    FROM #tmptmp t
    JOIN stps
    ON t.rn = stps.rn
    WHERE t.step IS NULL
    AND stps.step IS NOT NULL;

    IF @@Rowcount <= 0
    BREAK;
    END

    which gives me the step column :

    histopost

    which now allows me to use windows functions like so:

    SELECT LastName,
    norows,
    rn,
    step,
    SUM(norows) OVER (PARTITION BY step) - equal_rows CalculatedRangeRows,
    ISNULL((SUM(norows) OVER (PARTITION BY step) - equal_rows) - range_rows,0) discrepancy,
    range_rows,
    equal_rows,
    distinct_range_rows,
    average_range_rows
    FROM #tmptmp
    ORDER BY rn DESC;

    There has to be a better way of doing the update, seriously an update in a loop?.

    Is there anyone that can do a better job of this?

    Recursive CTE perhaps....

     

  • I am at this point searching for some obvious plan.

    MyMileStoneCard

    • This reply was modified 1 year, 9 months ago by  Wilson254.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply