Painful datetime conversion

  • Just to add to my original suggestion: since you're using this for an ETL process, here is what I think would work.

    - Create a date table for, say, the next 10 years.

    - Take the data from that table and make it part of a Cache Transform in the SSIS package.

    - Have a Lookup Transform join against the cached dataset. This way you get a table-like structure against which you can perform joins and filters (within the SSIS package) without violating any warranties.

    I have tested this approach a number of times and it works great for tens of millions of rows, as long as there are no blocking transforms elsewhere in the package.

    Jayanth Kurup
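    The date-table-plus-cache idea above can be sketched outside SSIS. Below is a rough Python illustration (every name in it is invented for the example): precompute a ten-year date table keyed by the integer YYYYMMDD form, hold it in memory, and resolve each incoming key with a lookup instead of a per-row conversion, which is roughly what the Cache and Lookup transforms do against a cached reference set.

    ```python
    from datetime import date, timedelta

    def build_date_cache(start, years=10):
        """Build a 'date table' covering `years` years from `start`,
        keyed by the integer YYYYMMDD form of each day."""
        cache = {}
        d, end = start, date(start.year + years, start.month, start.day)
        while d < end:
            cache[d.year * 10000 + d.month * 100 + d.day] = d
            d += timedelta(days=1)
        return cache

    # Resolve rows against the cache instead of converting per row;
    # keys that are not valid dates simply miss the lookup.
    cache = build_date_cache(date(2015, 1, 1))
    print(cache[20150918])    # -> 2015-09-18
    print(20150230 in cache)  # -> False (not a real date)
    ```

    One nice side effect of the lookup approach, as in the SSIS version, is that impossible keys fail the join rather than raising a conversion error mid-stream.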

  • Eirikur Eiriksson (9/18/2015)


    Kristen-173977 (9/18/2015)


    The front runners were neck-and-neck in Luis's script?

    http://www.sqlservercentral.com/Forums/FindPost1720510.aspx

    I've found the method I use for timing more accurate than the time statistics

    😎

    And it would take everything into account: no timing problems from scalar functions, no impact from the timing itself being switched on... just total execution time. I think I'm going to have to play with that on Monday. Thanks.

  • Eirikur Eiriksson (9/18/2015)


    I've found the method I use for timing more accurate than the time statistics

    Ah ... 🙂 Not to diss Luis or anything, but when I ran the STATISTICS example my repeat timings were all over the shop. I thought it might have been varying server load, but I was a bit surprised, at the time, that it could be that, as there was hardly anyone else using it.

    I'll run your script here and see if the Min/Max/etc vary by much.

  • Kristen-173977 (9/18/2015)


    Eirikur Eiriksson (9/18/2015)


    I've found the method I use for timing more accurate than the time statistics

    Ah ... 🙂 Not to diss Luis or anything, but when I ran the STATISTICS example my repeat timings were all over the shop. I thought it might have been varying server load, but I was a bit surprised, at the time, that it could be that, as there was hardly anyone else using it.

    I'll run your script here and see if the Min/Max/etc vary by much.

    There's no problem. Sometimes, different servers/computers might react differently to similar data.

    I found similar results to Eirikur using his script, which might reflect a problem with measuring system functions (I learned the problem existed with UDFs from this article).

    I'd be interested in seeing your results.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Sean Lange (9/17/2015)


    I can't change the table at all. It will violate our support contract. Not to mention it would be possible to get trashed with an update at some point. We are not allowed to do anything other than select statements or we are on our own.

    Are you allowed to add objects? I'm thinking an indexed view that exposes and persists this column.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (9/18/2015)


    Sean Lange (9/17/2015)


    I can't change the table at all. It will violate our support contract. Not to mention it would be possible to get trashed with an update at some point. We are not allowed to do anything other than select statements or we are on our own.

    Are you allowed to add objects? I'm thinking an indexed view that exposes and persists this column.

    Quick thought, if no object can be created in the database then an option is to create another database on the same server and create the objects there. There is really no penalty querying objects across databases on the same server.

    😎

  • Can you have an Indexed View in another database?

    Sorry, being lazy I haven't checked, but my recollection is "no"

    If that is the case, is there some other, cool :-), way to get data in a.n.other database to reflect changes in the main database? (A trigger on the main table not being allowed due to licence terms.)

  • Eirikur Eiriksson (9/17/2015)


    Using my normal test harness, Kristen's DATETIMEFROMPARTS solution is a clear winner, more than twice as fast as the next one. Note that the slowest solutions and those generating conversion errors are not included in the test.

    What's the configuration of your box, Eirikur? I ask because when I run this on the Dev box at work, Kristen's times come up in the 900ms range and the rest also suffer a similar nearly 3X increase. Something's obviously wrong on that box and I'm trying to hammer out what it is.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/19/2015)


    Eirikur Eiriksson (9/17/2015)


    Using my normal test harness, Kristen's DATETIMEFROMPARTS solution is a clear winner, more than twice as fast as the next one. Note that the slowest solutions and those generating conversion errors are not included in the test.

    What's the configuration of your box, Eirikur? I ask because when I run this on the Dev box at work, Kristen's times come up in the 900ms range and the rest also suffer a similar nearly 3X increase. Something's obviously wrong on that box and I'm trying to hammer out what it is.

    That sounds both slow and odd, especially when taking into consideration that my test boxes are mostly second-generation i5 CPUs, some dual and some quad core with nominal memory of 8-12 GB. That last test was run on a Dell E6220 with an i5-2520M and 8 GB of RAM. On the other hand, all of those boxes have SSD storage for the databases (or in the case of the laptops, the system drive) and that is where I spare no expense.

    😎

    Do you see high IO or CPU wait times when running these tests? There is a nifty little tool called SQL Performance Monitor (http://sqltouch.blogspot.ca/2014/03/free-real-time-sql-server-performance.html) which I often use when troubleshooting one-off problems like this.

  • WayneS (9/18/2015)


    Sean Lange (9/17/2015)


    I can't change the table at all. It will violate our support contract. Not to mention it would be possible to get trashed with an update at some point. We are not allowed to do anything other than select statements or we are on our own.

    Are you allowed to add objects? I'm thinking an indexed view that exposes and persists this column.

    I would have to check and see if they allow that. I am guessing I probably could but they will say that it may break in the future if they change the table. Not a big deal really.

    At this point the conversion in the query is working acceptably. This is an ETL process, so it isn't like somebody is actually going to notice if it takes 3 seconds instead of 1. And it only runs once an hour, so it isn't going to cause a large amount of waits. I have leveraged a CTE to filter out the large majority of rows initially, using simple math for the date as suggested by Eric Russell. Then I do the inline conversion against the CTE.

    Here is the gist of what I am currently using.

    with InitialPass as
    (
        select *
        from #Something
        where VHRGDT > cast(replace(convert(varchar(10), @LastExtractDate, 102), '.', '') as int)
    )
    select *
    from InitialPass
    where CAST(CAST(VHRGDT AS char(9)) + STUFF(STUFF(RIGHT(VHRGTM + 1000000, 6), 5, 0, ':'), 3, 0, ':') AS datetime) > @LastExtractDate

    This allows the index on the VHRGDT column to filter out the data up until the current day. Then I can add the costly datetime conversion to eliminate rows based on the time. Seems to be pretty efficient and simple. 😀

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
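Sean's two-stage filter above can be mirrored outside T-SQL as a quick sanity check. The Python sketch below (column names follow the post; the sample rows are made up for illustration) does the same thing: a cheap integer comparison on the YYYYMMDD column prunes most rows, and the exact datetime is built only for the survivors. Note the sketch uses >= on the date so that rows from the cutoff day itself still reach the time check.

```python
from datetime import datetime

def row_datetime(vhrgdt, vhrgtm):
    """Combine an integer YYYYMMDD date and an integer HHMMSS time
    into a datetime, like the CAST/STUFF expression in the query."""
    return datetime(vhrgdt // 10000, vhrgdt // 100 % 100, vhrgdt % 100,
                    vhrgtm // 10000, vhrgtm // 100 % 100, vhrgtm % 100)

def filter_rows(rows, last_extract):
    # Stage 1: cheap integer comparison on the date column alone
    # (>= keeps rows from the cutoff day for the time check below).
    cutoff = (last_extract.year * 10000 + last_extract.month * 100
              + last_extract.day)
    candidates = [r for r in rows if r[0] >= cutoff]
    # Stage 2: the costly conversion, applied only to the survivors.
    return [r for r in candidates if row_datetime(*r) > last_extract]

rows = [(20150917, 235959), (20150918, 90000),
        (20150918, 110000), (20150919, 10)]
print(filter_rows(rows, datetime(2015, 9, 18, 10, 0, 0)))
# -> [(20150918, 110000), (20150919, 10)]
```

The shape matches the CTE version: the index-friendly predicate corresponds to stage 1, and the expensive conversion only ever touches the small candidate set.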

  • Eirikur Eiriksson (9/19/2015)


    Jeff Moden (9/19/2015)


    Eirikur Eiriksson (9/17/2015)


    Using my normal test harness, Kristen's DATETIMEFROMPARTS solution is a clear winner, more than twice as fast as the next one. Note that the slowest solutions and those generating conversion errors are not included in the test.

    What's the configuration of your box, Eirikur? I ask because when I run this on the Dev box at work, Kristen's times come up in the 900ms range and the rest also suffer a similar nearly 3X increase. Something's obviously wrong on that box and I'm trying to hammer out what it is.

    That sounds both slow and odd, especially when taking into consideration that my test boxes are mostly second-generation i5 CPUs, some dual and some quad core with nominal memory of 8-12 GB. That last test was run on a Dell E6220 with an i5-2520M and 8 GB of RAM. On the other hand, all of those boxes have SSD storage for the databases (or in the case of the laptops, the system drive) and that is where I spare no expense.

    😎

    Do you see high IO or CPU wait times when running these tests? There is a nifty little tool called SQL Performance Monitor (http://sqltouch.blogspot.ca/2014/03/free-real-time-sql-server-performance.html) which I often use when troubleshooting one-off problems like this.

    I'm not seeing anything in the free tool that I'm not already collecting data for. 🙂 Also, the Dev box that I'm talking about has virtually no usage yet (it's almost brand new and we've not migrated to it yet). I can tell you that it has no SSDs, though. It's a VM, to boot.

    On the boxes you spoke of, are they Windows servers or laptops or desktops or ???

    --Jeff Moden



  • Jeff Moden (9/21/2015)


    I'm not seeing anything in the free tool that I'm not already collecting data for. 🙂

    Doesn't come as a surprise; thought it was worth mentioning though.

    😎

    Also, the Dev box that I'm talking about has virtually no usage yet (it's almost brand new and we've not migrated to it yet). I can tell you that it has no SSDs, though. It's a VM, to boot.

    Do you have other VMs on that host, and how do those perform?

    The most plausible explanation would be the read performance, but of course I'm just guessing here.

    On the boxes you spoke of, are they Windows servers or laptops or desktops or ???

    The ones used for the tests are laptops and desktops running desktop OSs (Win7), each having multiple instances of SQL Server, and they are relatively busy boxes.

    BTW, I just ran the test on a tiny (smallest) Amazon RDS instance:

    T_TEXT          DURATION
    -------------   -----------
    DRY RUN              140395
    Kristen              374388
    Luis                 842373
    EE                   842373
    tripleAxe-1         1263560

  • Sean Lange (9/17/2015)


    ... I would rather peel M&Ms in the desert than work with this system.

    Sounds like the ERP system used by a previous employer. I told my friends that the name of That Vendor would never appear on my resume or my LinkedIn profile, and it hasn't. I started a new job and found out that their accounting system had been bought out by That Vendor.

    *sigh* Sometimes you just can't get away from certain people/things. I only hope that the division managing our system is different from their ERP division.

    I found out that at that previous employer, That Vendor took over managing several other systems, which included disabling my DBCCs, log monitoring, etc. That Vendor didn't notice an error creep in and lost a couple of months of data.

    -----
    Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson

  • Jeff Moden (9/21/2015)


    Eirikur Eiriksson (9/19/2015)


    Jeff Moden (9/19/2015)


    Eirikur Eiriksson (9/17/2015)


    Using my normal test harness, Kristen's DATETIMEFROMPARTS solution is a clear winner, more than twice as fast as the next one. Note that the slowest solutions and those generating conversion errors are not included in the test.

    What's the configuration of your box, Eirikur? I ask because when I run this on the Dev box at work, Kristen's times come up in the 900ms range and the rest also suffer a similar nearly 3X increase. Something's obviously wrong on that box and I'm trying to hammer out what it is.

    That sounds both slow and odd, especially when taking into consideration that my test boxes are mostly second-generation i5 CPUs, some dual and some quad core with nominal memory of 8-12 GB. That last test was run on a Dell E6220 with an i5-2520M and 8 GB of RAM. On the other hand, all of those boxes have SSD storage for the databases (or in the case of the laptops, the system drive) and that is where I spare no expense.

    😎

    Do you see high IO or CPU wait times when running these tests? There is a nifty little tool called SQL Performance Monitor (http://sqltouch.blogspot.ca/2014/03/free-real-time-sql-server-performance.html) which I often use when troubleshooting one-off problems like this.

    I'm not seeing anything in the free tool that I'm not already collecting data for. 🙂 Also, the Dev box that I'm talking about has virtually no usage yet (it's almost brand new and we've not migrated to it yet). I can tell you that it has no SSDs, though. It's a VM, to boot.

    On the boxes you spoke of, are they Windows servers or laptops or desktops or ???

    Jeff, did you figure out why your dev box was/is 3 times slower than the smallest Amazon RDS instance?

    😎

  • Kristen-173977 (9/16/2015)


    Lynn Pettis (9/16/2015)


    Or this if you want to use a MS provided scalar function:

    ... but somewhere on my travels I saw this, which satisfies my perception that an arithmetic algorithm will perform better than a string one; the formula is a bit smoke and mirrors to me, though:

    CONVERT(datetime, RTRIM(VHRGDT))
        + (VHRGTM * 9 + VHRGTM % 10000 * 6 + VHRGTM % 100 * 10) / 216e4

    Heh, that is a funny bit of math there. In case it still seems like smoke and mirrors, I'll try to explain it as clearly as possible.

    If you want to turn an integer representing HHMMSS into something that can be added to a date (let's say seconds for now) using just math, you have a couple choices.

    First, you could rely on modulo and integer division to split up HH, MM, and SS, multiplying them each by the appropriate factor (3600, 60, 1), and sum them.

    If you don't want to rely on integer division to cut off digits to the right, then you have to do something like this smoke and mirrors treatment.

    Using just modulo, you'll be able to get the integer representations of HHMMSS, MMSS, and SS. Now you'll again want to multiply each of them by an appropriate factor and sum them, but now the integer representation of SS will get multiplied by three numbers before summing, and MM will get multiplied by two numbers before summing.

    That means choosing the appropriate factors is a little trickier. Let's represent the problem like this:

    (hoursFactor)*HHMMSS + (minutesFactor)*MMSS + (secondsFactor)*SS

    Now, since those are just integers we're multiplying, that means minutes will be counted 100 times what seconds are. To bring this down to the appropriate 60, we'll need to make sure that the factors by which the minutes will be multiplied sum to 60% of the sum of the factors by which seconds will be multiplied. To put it another way, (hoursFactor)+(minutesFactor) must be 60% of (hoursFactor)+(minutesFactor)+(secondsFactor).

    Similarly, in integer form, hours will count as 10,000 seconds, when they're really only 3600, so (hoursFactor) must be 36% of (hoursFactor)+(minutesFactor)+(secondsFactor).

    In yet other words, we know that:

    (hoursFactor)/[(hoursFactor)+(minutesFactor)+(secondsFactor)]=0.36

    [(hoursFactor)+(minutesFactor)]/[(hoursFactor)+(minutesFactor)+(secondsFactor)]=0.6

    From there we just have to assume a value for one of the factors, and solve the resulting system of two equations for one of the other factors and we will have three appropriate numbers.

    If we end up with 9, 6, and 10 as in that snippet, that means we end up multiplying seconds by 25, giving the number of twenty-fifths of a second that the time represents. If we want to directly add that number to a datetime value, then we need to convert it to a fraction of a day, so we divide it by the number of twenty-fifths of a second in a day, which is 216e4.

    Easy as pie yet clear as mud, and as Sean pointed out, not exactly something you want to be documenting in your SQL Server code 🙂

    Cheers!
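For anyone who wants to convince themselves of the 9/6/10 factors without the algebra, the identity can be checked by brute force over every valid time of day. This little Python check (function names invented here) compares the smoke-and-mirrors formula against the straightforward HH/MM/SS split:

```python
def day_fraction(t):
    """The 'smoke and mirrors' formula: t is an integer HHMMSS;
    t*9 + (t % 10000)*6 + (t % 100)*10 is the time in 25ths of a
    second, and 216e4 = 25 * 86400 is the 25ths of a second per day."""
    return (t * 9 + t % 10000 * 6 + t % 100 * 10) / 216e4

def day_fraction_naive(t):
    """The obvious version: split HH, MM, SS and weight by 3600/60/1."""
    hh, mm, ss = t // 10000, t // 100 % 100, t % 100
    return (hh * 3600 + mm * 60 + ss) / 86400

# The two agree (to floating-point noise) for every valid time of day.
for hh in range(24):
    for mm in range(60):
        for ss in range(60):
            t = hh * 10000 + mm * 100 + ss
            assert abs(day_fraction(t) - day_fraction_naive(t)) < 1e-12

print(day_fraction(120000))   # -> 0.5 (noon is half a day)
```

Expanding the integer form confirms it exactly: for t = hh*10000 + mm*100 + ss, the numerator works out to 90000*hh + 1500*mm + 25*ss, which is precisely 25 times the time in seconds.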

Viewing 15 posts - 46 through 60 (of 62 total)
