Friday Fun - Excel Formula to SQL Functions

  • David Jackson - Thursday, June 14, 2018 4:07 AM

    Here's what I ended up with

    /****** Object: UserDefinedFunction [dbo].[UTC_LocalTime]  Script Date: 14/06/2018 10:27:10 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO

    -- =============================================
    -- Author:        Dave Jackson
    -- Create date: 14 June 2018
    -- Description:    Parses a DATETIME value and adds 1 hour if it is BST
    -- =============================================
    ALTER FUNCTION [dbo].[UTC_LocalTime]
    (    
         @Date DATETIME
    )
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT @Date AS RecordedTime, CASE
        WHEN DATETIMEFROMPARTS(
           YEAR(@Date),
           MONTH(@Date),
           DATEPART(DAY, @Date),
           DATEPART(HOUR, @Date),
           DATEPART(MINUTE, @Date),
           DATEPART(SECOND, @Date),
           DATEPART(ms, @Date)) BETWEEN DATETIMEFROMPARTS(
                      YEAR(@Date),
                      3,
                      DATEPART(
                       DAY,
                       DATEADD(
                        DAY,
                        DATEDIFF(
                          DAY,
                          '19000107',
                          DATEADD(MONTH, DATEDIFF(MONTH, 0, @Date), 30)) / 7 * 7,
                        '19000107')),
                      1,
                      0,
                      0,
                      0) AND DATETIMEFROMPARTS(
                         YEAR(@Date),
                         10,
                         DATEPART(
                           DAY,
                           DATEADD(
                            DAY,
                            DATEDIFF(
                             DAY,
                             '19000107',
                             DATEADD(
                               MONTH,
                               DATEDIFF(MONTH, 0, @Date),
                               30)) / 7 * 7,
                            '19000107')),
                         0,
                         59,
                         59,
                         997) THEN DATEADD(HOUR, 1, @Date)
        ELSE @Date END AS GMTBST_Time
    )

    And here is a test harness


    --Standard TestEnvironment of 1,000,000 rows of random-ish data
    IF object_id('tempdb..#testEnvironment') IS NOT NULL
    BEGIN
    DROP TABLE #testEnvironment
    END

    --1,000,000 Random rows of data
    SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,
    RAND(CHECKSUM(NEWID())) * 30000 + CAST('1945' AS DATETIME) AS randomDate,
    ABS(CHECKSUM(NEWID())) AS randomBigInt,
    (ABS(CHECKSUM(NEWID())) % 100) + 1 AS randomSmallInt,
    RAND(CHECKSUM(NEWID())) * 100 AS randomSmallDec,
    RAND(CHECKSUM(NEWID())) AS randomTinyDec,
    RAND(CHECKSUM(NEWID())) * 100000 AS randomBigDec,
    CONVERT(VARCHAR(6),CONVERT(MONEY,RAND(CHECKSUM(NEWID())) * 100),0) AS randomMoney
    INTO #testEnvironment
    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    PRINT '========== BASELINE =========='
    SET STATISTICS TIME ON
    SELECT RandomDate
    FROM #testEnvironment

    SELECT utc.RecordedTime, utc.GMTBST_Time
    FROM #testEnvironment
    CROSS APPLY dbo.UTC_LocalTime(RandomDate) utc

    SET STATISTICS TIME OFF
    PRINT REPLICATE('=',80)

    It replaces a function call that takes 18 seconds on this data set.  This runs in 4! 😀

    Thanks for your help

    Dave

    Glad I could help out.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I think something like below is much easier to follow.  Also, you want to avoid BETWEEN when dealing with dates / times.

    I can't tell for sure if the final time value is the last Sun of Oct at 11:59:59.997 or the last Sat of Oct (or something else?).  If you need to, add a day to "Last_Sun_In_Oct" so the < still gives you the range you need.


    ALTER FUNCTION
    ...
    RETURN
    (
      WITH cte_base_dates AS (
         SELECT DATEADD(YEAR, YEAR(GETDATE()) - 1900, '19000407') AS Apr_07,
             DATEADD(YEAR, YEAR(GETDATE()) - 1900, '19001107') AS Nov_07,
             6 AS Sunday
      ),
      cte_key_dates AS (
         SELECT DATEADD(DAY, -DATEDIFF(DAY, Sunday, Apr_07) % 7 - 7, Apr_07) AS Last_Sun_In_Mar,
             DATEADD(DAY, -DATEDIFF(DAY, Sunday, Nov_07) % 7 - 7, Nov_07) AS Last_Sun_In_Oct
         FROM cte_base_dates
      )
        SELECT DATEADD(HOUR, CASE WHEN @Date >= Last_Sun_In_Mar AND @Date < Last_Sun_In_Oct
            THEN 1 ELSE 0 END, @Date) AS GMTBST_Time
        FROM cte_key_dates

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I think something like below is much easier to follow. 
    I completely agree

    Also, you want to avoid BETWEEN when dealing with dates / times.

    I have heard this often but I have never seen any proof as to why.  I am not disagreeing, I 'm happy to learn. DATETIMEs are stored as 2 4 byte INTs, so why is BETWEEN bad?

    I can't tell for sure if the final time value is the last Sun of Oct at 11:59:59.997 or the last Sat of Oct (or something else?).  If you need to, add a day to "Last_Sun_In_Oct" so the < still gives you the range you need.

    The Period changes at 1:00 am GMT in both months, easily sorted out in the code I nicked  paid homage to below.
    This is the *new* function

    ALTER FUNCTION [dbo].[UTC_LocalTime2]

      @Date DATETIME
    )
    RETURNS TABLE
    AS
    RETURN
    WITH cte_base_dates AS (
         SELECT DATEADD(YEAR, YEAR(@date) - 1900, '19000407') AS Apr_07,
             DATEADD(YEAR, YEAR(@date) - 1900, '19001107') AS Nov_07,
             6 AS Sunday
    ),
    cte_key_dates AS (
         SELECT DATEADD(hh, 1, DATEADD(DAY, -DATEDIFF(DAY, Sunday, Apr_07) % 7 - 7, Apr_07)) AS Last_Sun_In_Mar,
             DATEADD(hh, 1, DATEADD(DAY, -DATEDIFF(DAY, Sunday, Nov_07) % 7 - 7, Nov_07)) AS Last_Sun_In_Oct
         FROM cte_base_dates
    )
       
    SELECT @Date RecordedTime, DATEADD(HOUR, CASE WHEN @Date >= Last_Sun_In_Mar AND @Date < Last_Sun_In_Oct
            THEN 1 ELSE 0 END, @Date) AS GMTBST_Time
        FROM cte_key_dates
    GO

    This is the test harness


    --Standard TestEnvironment of 1,000,000 rows of random-ish data
    IF OBJECT_ID('tempdb..#testEnvironment') IS NOT NULL
    BEGIN
        DROP TABLE #testEnvironment;
    END;

    --1,000,000 Random rows of data
    SELECT TOP 1000000 IDENTITY(INT, 1, 1) AS ID,
                       RAND(CHECKSUM(NEWID())) * 30000 + CAST('1945' AS DATETIME) AS randomDate,
                       ABS(CHECKSUM(NEWID())) AS randomBigInt,
                       (ABS(CHECKSUM(NEWID())) % 100) + 1 AS randomSmallInt,
                       RAND(CHECKSUM(NEWID())) * 100 AS randomSmallDec,
                       RAND(CHECKSUM(NEWID())) AS randomTinyDec,
                       RAND(CHECKSUM(NEWID())) * 100000 AS randomBigDec,
                       CONVERT(VARCHAR(6), CONVERT(MONEY, RAND(CHECKSUM(NEWID())) * 100), 0) AS randomMoney
      INTO #testEnvironment
      FROM master.dbo.syscolumns sc1,
           master.dbo.syscolumns sc2,
           master.dbo.syscolumns sc3;
    PRINT '========== BASELINE ==========';
    SET STATISTICS TIME ON;
    SELECT randomDate
      FROM #testEnvironment;
    SELECT        utc.RecordedTime,
                  utc.GMTBST_Time
      FROM        #testEnvironment t
      CROSS APPLY dbo.UTC_LocalTime(t.randomDate) utc;
    SELECT        utc.RecordedTime,
                  utc.GMTBST_Time
      FROM        #testEnvironment t
      CROSS APPLY dbo.UTC_LocalTime2(t.randomDate) utc;

    SET STATISTICS TIME OFF;
    PRINT REPLICATE('=', 80);

    And here are the results:


    ========== BASELINE ==========
    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 0 ms.

    (1000000 rows affected)
     SQL Server Execution Times:
       CPU time = 109 ms,  elapsed time = 168 ms.
    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 1 ms.
    (1000000 rows affected)
     SQL Server Execution Times:
       CPU time = 1391 ms,  elapsed time = 5261 ms.
    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 1 ms.
    (1000000 rows affected)
     SQL Server Execution Times:
       CPU time = 703 ms,  elapsed time = 2376 ms.
    ================================================================================

    Which is awesome!  Thanks Very Much!

    EDIT:  I have no idea why the code blocks are not rendering properly.  Sorry!

    Dave
    😀


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • Also, you want to avoid BETWEEN when dealing with dates / times.
    I have heard this often but I have never seen any proof as to why. I am not disagreeing, I 'm happy to learn. DATETIMEs are stored as 2 4 byte INTs, so why is BETWEEN bad?

    Because of how specific you must be on the ending value. Note how the ending datetime in the BETWEEN function ends in ".997" seconds. That's correct for a datetime value. (Btw, easy to forget that and code .999 instead -- wrong!, that rounds to the next minute and .000).

    Even with .997, suppose it switches to a datetime2?  Now the ending value of .997 secs will skip any rows with values from .99700001 to .9999999 secs.  As Scooby Doo would say, ruh roh!  Similar issue for other smalldattime, time only, etc..

    But by specifying < the next time break, it works whether it's .997 or .9999999 or .000 or .999 or whatever.  Any amount less than the next break will automatically be matched.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Hello

    Returning to this, I and another co-worker have collapsed the 3 CTEs above into one statement.  Here's an example

    DECLARE @Date DATE = '2019-04-01 00:00:00';

    SELECT @Date, DATEADD(
    HOUR,
    CASE
    WHEN @Date >= DATEADD(
    hh,
    1,
    DATEADD(
    DAY,
    -DATEDIFF(
    DAY,
    6,
    DATEADD(YEAR, YEAR(@Date) - 1900, '19000407')
    ) % 7 - 7,
    DATEADD(YEAR, YEAR(@Date) - 1900, '19000407')
    )
    )
    AND @Date < DATEADD(
    hh,
    1,
    DATEADD(
    DAY,
    -DATEDIFF(
    DAY,
    6,
    DATEADD(YEAR, YEAR(@Date) - 1900, '19001107')
    ) % 7 - 7,
    DATEADD(YEAR, YEAR(@Date) - 1900, '19001107')
    )
    ) THEN
    1
    ELSE
    0
    END,
    @Date
    );

    We would like to use this code as a computed column, persisting the values so we can index it.  The @Date variable would be replaced with CreatedDate, which is a DATETIME column in said table.  However we get the good old error:

    Computed column 'GMTBST_Time' in table 'tblTestTable' cannot be persisted because the column is non-deterministic.

    Can anyone see a way to make this happen?

    tia

    Dave J

    • This reply was modified 4 years, 10 months ago by  David Jackson.
    • This reply was modified 4 years, 10 months ago by  David Jackson.
    • This reply was modified 4 years, 10 months ago by  David Jackson.


    http://glossopian.co.uk/
    "I don't know what I don't know."

  • David Jackson wrote:

    Also, you want to avoid BETWEEN when dealing with dates / times.

    <strong style="background-color: transparent; background-image: none; font-family: arial,tahoma; font-size: 13.33px; font-style: normal; font-variant: normal; font-weight: bold; letter-spacing: normal; text-align: left; text-decoration: none; text-indent: 0px; text-transform: none; width: auto; border-width: 0px; border-style: none; margin: 0px;">I have heard this often but I have never seen any proof as to why.  I am not disagreeing, I 'm happy to learn. DATETIMEs are stored as 2 4 byte INTs, so why is BETWEEN bad?

    For INTs, it's not bad because  there's not "hangers" like there are in datetime.  See Scott's post on that.

    The reason why I say it's bad even for INTs is simple.  You now have to remember two different methods but... that's not the worst of it.  If you're in the middle of troubleshooting during a fire-storm of a problem, it's helpful to have the consistency so that you don't have to lookup the datatype(s) your working with.  If you use BETWEEN, you do have to find out.  Also, non-BETWEEN method is fairly well bullet-proof if someone decides they want to change a temporal datatype for one reason or another.  I've had it (many times) where someone finally got the idea that it actually is a bad idea to have DATE in one column and TIME in another.  Usually, the keep the name of the DATE column when they do such a thing to help keep from breaking code.  BETWEEN wouldn't survive such a change correctly.  I've also seen people go hair-twiddling, thumb-sucking nuts and convert DATETIME columns to DATETIME2(7) columns and then wonder why their bloody BETWEEN was missing rows afterwards.

    On a personal note, a lot of code is difficult enough to wade through and having a bazillion '23:59:59.997"'s all over hell's half acre doesn't make for a fun read.  Of course, you also have the people that don't actually know much and use '23:59:59.999' and never pick up on the fact the value actually rounds up and will put you into the next day, which also gives you correct answers.

    And you should see the problems when they use BETWEEN on SMALLDATETIME.  Lot's more rounding there.

    So, the bottom line (for me anyway) is, learn to do it the way that works for virtually EVERYTHING and then always do it that way.  I say "virtually" because someone always knows of and edge case.  That someone wouldn't be me, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher wrote:

    Also, you want to avoid BETWEEN when dealing with dates / times.

    I have heard this often but I have never seen any proof as to why. I am not disagreeing, I 'm happy to learn. DATETIMEs are stored as 2 4 byte INTs, so why is BETWEEN bad?

    Because of how specific you must be on the ending value. Note how the ending datetime in the BETWEEN function ends in ".997" seconds. That's correct <i>for a datetime value</i>. (Btw, easy to forget that and code .999 instead -- wrong!, that rounds to the next minute and .000).

    Even with .997, suppose it switches to a datetime2?  Now the ending value of .997 secs will skip any rows with values from .99700001 to .9999999 secs.  As Scooby Doo would say, ruh roh!  Similar issue for other smalldattime, time only, etc..

    But by specifying < the next time break, it works whether it's .997 or .9999999 or .000 or .999 or whatever.  <i>Any</i> amount less than the next break will automatically be matched.

    +1,000,000!

    >= Inclusive and < Exclusive is what I use even for INTs mostly because it's the habit I've gotten into and it's saved by hinny more than once.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply