Update Gaps

  • CELKO (12/7/2012)


    >> Here's the SQL to create the table and populate it with data. <<

    Do you really want to do FLOAT? How will you handle the rounding errors and slowness of floating point math without special hardware? The data element names are vague; pressure of what? NULL is the right choice for a missing reading because zero is a real value for a reading on a busted meter, an empty pipe or whatever. I will guess that two decimal places are good enough, rather than the 1.79E+308 you have now. You also need to start using the ANSI/ISO Standard INSERT INTO syntax.

    This is an inherited database, and I am not allowed to change the schema except to add new objects. There are many things I would LOVE to change. The zero values are a result of a bulk load, and not real-time pressure values. The column names are not the actual schema column names, and I don't think it matters for what I needed help with. The pressure column does allow for NULL values; however, there is a default value of 0. I assume when they loaded the data, they must not have explicitly populated that column.

    Regarding your ANSI/ISO INSERT INTO comment, unless I'm mistaken, your suggestion only works on 2008 or higher. The DATE data type is also not a valid 2005 data type. Most people here are probably on 2012 at least for testing purposes, but I wanted to make it as generic as possible.

    Thanks for your suggestions.
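    For readers following the INSERT syntax side discussion: the ANSI/ISO row value constructor works only on SQL Server 2008 and later, while a UNION ALL SELECT gives the same multi-row effect on 2005. A sketch against the thread's sample table:

    ```sql
    -- SQL Server 2008+: one INSERT with a table value constructor
    INSERT INTO dbo.UpdateGaps (MachineID, RecordedDate, Pressure)
    VALUES (60, '2012-10-02', 0),
           (60, '2012-10-03', 300);

    -- SQL Server 2005 and earlier: emulate a multi-row INSERT with UNION ALL
    INSERT INTO dbo.UpdateGaps (MachineID, RecordedDate, Pressure)
    SELECT 60, '2012-10-02', 0
    UNION ALL
    SELECT 60, '2012-10-03', 300;
    ```

    Either form loads the same rows; the 2008+ form is simply the standard syntax CELKO refers to.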

  • Sam S Kolli (12/6/2012)


    Let me say beforehand that Jeff Moden's solution is better than this for the specific task you asked for (on my laptop, my solution is 85% of the cost, whereas Jeff's solution is 15% of the cost).

    Oh... we have to be careful here, Sam. My query is based on a "Triangular Join" and it relies on just the right index or it can be hundreds and, sometimes, thousands of times slower and more resource intensive than a While Loop. It turns out that this solution worked perfectly because of the indexing and data available in the columns but such is not always the case.

    Also, a word of caution. The costs you find in execution plans, especially the % of batch cost are fickle lovers and they can lie like you wouldn't believe. While such costs are helpful in finding potential problems, they should never be used to make the "final decision".

    For example, here's some very simple code to build 30 years worth of dates using two different methods. The details, of course, are in the comments in the code.

    /****************************************************************************************
     Purpose:
     This code demonstrates that the estimated and actual execution plans in SQL Server can
     be 100% INCORRECT and that the execution plan should only be relied on to provide hints
     as to what may be wrong with a query rather than an absolute indication. This code runs
     in SQL Server 2005 only.

     The code creates 30 years worth of dates starting with 2000-01-01 using two different
     methods. The first method uses a recursive CTE and the second method uses a "Tally"
     table. The output of each method is directed to a "throw-away" variable to take
     display delays out of the picture.

     Please check both the actual and estimated execution plans and compare the % of batch.

     Please see the following article on how to build a Tally table and how they can be used
     to replace certain While Loops.
     http://www.sqlservercentral.com/articles/T-SQL/62867/
    ****************************************************************************************/
    SET NOCOUNT ON;
    --=======================================================================================
    --      Recursive method shown by (Name with-held)
    --=======================================================================================
    PRINT '========== Recursive method ==========';
    --===== Turn on some performance counters ===============================================
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    DECLARE @Bitbucket DATETIME; --Holds display output so display times aren't measured.
    --===== Execute the code being tested ===================================================
    DECLARE @DateVal DATETIME;
    SET @DateVal = '2000-01-01';
    WITH mycte AS
    (
     SELECT @DateVal AS DateVal
     UNION ALL
     SELECT DateVal + 1
       FROM mycte
      WHERE DateVal + 1 < DATEADD(yy, 30, @DateVal)
    )
    SELECT @Bitbucket = d.DateVal
      FROM mycte d
    OPTION (MAXRECURSION 0);
    --===== Turn off the performance counters and print a separator =========================
    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    PRINT REPLICATE('=',90);
    GO
    --=======================================================================================
    --      Tally table method by Jeff Moden
    --=======================================================================================
    PRINT '========== Tally table method ==========';
    --===== Turn on some performance counters ===============================================
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    DECLARE @Bitbucket DATETIME; --Holds display output so display times aren't measured.
    --===== Execute the code being tested ===================================================
    DECLARE @StartDate AS DATETIME;
    SET @StartDate = '2000-01-01';
    SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,30,@StartDate)))
           @Bitbucket = @StartDate-1+t.N
      FROM dbo.Tally t
     ORDER BY N;
    --===== Turn off the performance counters and print a separator =========================
    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    PRINT REPLICATE('=',90);
    GO

    Here are the execution plans for the code. Notice that the % of batch is 0% for the Recursive CTE and it's 100% for the Tally Table method. One would look at this and insist that Recursive CTEs are much better than the Tally Table method for doing such a thing.

    But, if we look at the performance of the code, we can see that the % of Batch is 100% incorrect. The Tally Table code is actually MUCH faster than the Recursive CTE.

    ========== Recursive method ==========
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 0 ms.
    Table 'Worktable'. Scan count 2, logical reads 65749, physical reads 0,
    read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 391 ms, elapsed time = 449 ms.
    ==========================================================================================
    ========== Tally table method ==========
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 0 ms.
    Table 'Tally'. Scan count 1, logical reads 20, physical reads 0,
    read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 31 ms, elapsed time = 27 ms.
    ==========================================================================================

    Try your code and measure it for performance. Also, be careful about using SET STATISTICS. It can also lie (not in this case, though). See the following article about how SET STATISTICS can lie about performance.

    http://www.sqlservercentral.com/articles/T-SQL/91724/
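    As a side note not from the original post: if a permanent dbo.Tally table isn't available, a cascading-CTE "inline tally" is a common zero-read substitute (this is the idea behind fnTally; the sketch below assumes SQL Server 2005+ and made-up CTE names):

    ```sql
    -- Inline tally built from cross-joined CTEs: no table reads at all
    WITH E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                   UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                   UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 10 rows
         E3(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c),       -- 1,000 rows
         E6(N) AS (SELECT 1 FROM E3 a CROSS JOIN E3 b),                       -- 1,000,000 rows
      Tally(N) AS (SELECT TOP (DATEDIFF(dd,'2000-01-01',DATEADD(yy,30,'2000-01-01')))
                          ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                     FROM E6)
    SELECT DATEADD(dd, N-1, '2000-01-01') AS DateVal
      FROM Tally;
    ```

    Because the row source is built entirely from constants, STATISTICS IO reports zero logical reads for the number generation itself.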

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Modified my code to update the table:

    CREATE TABLE dbo.UpdateGaps(
        MachineID    int      NOT NULL,
        RecordedDate datetime NOT NULL,
        Pressure     float    NULL,
        CONSTRAINT PK_UpdateGaps PRIMARY KEY CLUSTERED (MachineID, RecordedDate)
    );
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-02', 0);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-03', 300);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-04', 300);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-05', 0);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-06', 0);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-07', 0);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-08', 330);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-09', 0);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-10', 300);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-02', 380);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-03', 0);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-04', 0);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-05', 350);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-06', 350);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-07', 0);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-08', 0);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-09', 400);
    INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-10', 0);
    GO
    SELECT * FROM dbo.UpdateGaps;
    GO
    WITH Base0Data AS (
        SELECT
            MachineID,
            RecordedDate,
            Pressure,
            rn = ROW_NUMBER() OVER (PARTITION BY MachineID, CASE WHEN Pressure = 0 THEN 0 ELSE 1 END
                                    ORDER BY RecordedDate)
        FROM dbo.UpdateGaps
    )
    ,BaseData AS (
        SELECT
            MachineID,
            RecordedDate,
            Pressure,
            GrpDate = DATEADD(dd, -rn, RecordedDate),
            rn,
            rn1 = ROW_NUMBER() OVER (PARTITION BY MachineID, DATEADD(dd, -rn, RecordedDate)
                                     ORDER BY DATEADD(dd, -rn, RecordedDate))
        FROM Base0Data
    )
    --select * from BaseData where Pressure = 0
    --select
    --    bd1.MachineID,
    --    bd1.RecordedDate,
    --    Pressure = case when bd1.Pressure = 0 then coalesce(bd2.Pressure, bd1.Pressure) else bd1.Pressure end
    UPDATE up SET
        Pressure = CASE WHEN bd1.Pressure = 0 THEN COALESCE(bd2.Pressure, bd1.Pressure) ELSE bd1.Pressure END
    FROM dbo.UpdateGaps up
    INNER JOIN BaseData bd1
        ON up.MachineID = bd1.MachineID AND up.RecordedDate = bd1.RecordedDate
    LEFT OUTER JOIN BaseData bd2
        ON bd1.MachineID = bd2.MachineID
       AND DATEADD(dd, -bd1.rn1, bd1.RecordedDate) = bd2.RecordedDate;
    GO
    SELECT * FROM dbo.UpdateGaps;
    GO
    DROP TABLE dbo.UpdateGaps;
    GO
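    A note on the technique above (my explanation, not part of the original post): the DATEADD(dd, -rn, RecordedDate) expression is the classic "gaps and islands" trick. Within a run of consecutive dates in the same zero/non-zero partition, the date minus the row number is a constant, so it serves as a group identifier for each island. A minimal illustration with made-up dates (the VALUES row constructor needs SQL Server 2008+):

    ```sql
    -- Date minus row number is constant within each run of consecutive dates
    WITH Runs AS (
        SELECT d.RecordedDate,
               rn = ROW_NUMBER() OVER (ORDER BY d.RecordedDate)
          FROM (VALUES ('2012-10-05'), ('2012-10-06'), ('2012-10-07'),
                       ('2012-10-09'), ('2012-10-10')) AS d(RecordedDate)
    )
    SELECT RecordedDate, rn,
           GrpDate = DATEADD(dd, -rn, CAST(RecordedDate AS datetime))
      FROM Runs;
    -- The first three rows share one GrpDate and the last two share another,
    -- so each island of consecutive dates can be grouped or joined by GrpDate.
    ```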

  • CELKO (12/9/2012)


    Regarding your ANSI/ISO INSERT INTO comment, unless I'm mistaken, your suggestion only works on 2008 or higher. The DATE data type is also not a valid 2005 data type. Most people here are probably on 2012 at least for testing purposes, but I wanted to make it as generic as possible.

    Yep, but the DML is valid back to SQL-92 Standards.

    Not your fault, but that DML is still just as useless in all versions up to but not including SQL Server 2012 as it was back then. This is also one of the many reasons why true portability is a myth, and I actually hope it always is. If all makes of SQL engines had identical capabilities, there would be no competition to improve. Many former proprietary enhancements have made their way into the ISO/ANSI standards, and that's a good thing to continue.

    --Jeff Moden

  • CELKO (12/10/2012)


    .. why true portability is a myth and I actually hope it always is. If all makes of SQL engines had identical capabilities, there would be no competition to improve. Many former proprietary enhancements have made their way into ISO/ANSI standards and that's a good thing to continue.

    Again, I have earned a living writing portable SQL for decades. It has been a good-paying myth for me 😀

    Most proprietary features do not make it. The beauty of SQL is that it is abstract. The same SQL will run on a hand-held device, a mainframe, hashed tables, columnar data stores, ISAM and any other architecture. The improvements come not in the SQL language, but in the optimizer and storage engines. The idea of that abstraction was that the code could stay the same while the hardware changes.

    Heh... throw in just one variable and portability goes to hell in a handbasket. 😉

    --Jeff Moden
