Hidden Formatting Troubles with STR() (SQL Spackle)

  • jimgawn (1/3/2014)


    Thank you for your gracious response!

    With respect to formatted Date tables, I think a lot of data warehousing folks would beg to differ. People like Ralph Kimball advocate using a date table to (a) simplify query-writing and (b) to reduce computational overhead when running queries and generating reports.

    That would certainly be one of the exceptions to the rule. Thanks again for the feedback and welcome aboard!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you Jeff for a good "filler", there are many cracks "float"ing around when it comes to math and numbers on the platform. It is also far to easy to miss the fact that float and real are Approximate-number data types, in fact not reliable after the 15th digit as this little factorial sample here shows;

    /* Calculate Factorial */

    /* SQL Server => 2012 */

    DECLARE @TOPBIGINT= 22;

    ;WITH NUMBERS AS

    (

    SELECT

    CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS FLOAT(53)) AS N

    FROM

    (VALUES (1), (1), (1), (1), (1), (1), (1), (1)) AS A(N)

    ,(VALUES (1), (1), (1), (1), (1), (1), (1), (1)) AS B(N)

    ORDER BY 1 OFFSET 0 ROWS FETCH FIRST (@TOP) ROWS ONLY

    )

    ,N_EXP AS

    (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY N2.N ORDER BY (SELECT NULL)) AS N_RID

    ,N2.N AS N

    ,EXP(SUM(LOG(N1.N)) OVER (PARTITION BY N2.N)) AS N_EXP

    ,CAST(EXP(CAST((SUM(CAST(LOG(N1.N) AS DECIMAL(38,16)))

    OVER (PARTITION BY N2.N)) AS DECIMAL(38,16)))

    AS DECIMAL(38,16)) AS N_DCEXP

    FROM NUMBERS N1

    CROSS JOIN NUMBERS N2

    WHERE N1.N <= N2.N

    )

    SELECT

    NX.N AS N

    ,NX.N_EXP AS NX_EXP

    ,CAST(NX.N_EXP AS DECIMAL(38,16)) AS NX_EXPD

    ,NX.N_DCEXP AS NX_DDEXP

    ,STR(NX.N_EXP,38,16) AS NX_STR

    FROM N_EXP NX

    WHERE NX.N_RID = 1;

  • Eirikur Eiriksson (1/4/2014)


    Thank you Jeff for a good "filler", there are many cracks "float"ing around when it comes to math and numbers on the platform. It is also far to easy to miss the fact that float and real are Approximate-number data types, in fact not reliable after the 15th digit as this little factorial sample here shows;

    /* Calculate Factorial */

    /* SQL Server => 2012 */

    DECLARE @TOPBIGINT= 22;

    ;WITH NUMBERS AS

    (

    SELECT

    CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS FLOAT(53)) AS N

    FROM

    (VALUES (1), (1), (1), (1), (1), (1), (1), (1)) AS A(N)

    ,(VALUES (1), (1), (1), (1), (1), (1), (1), (1)) AS B(N)

    ORDER BY 1 OFFSET 0 ROWS FETCH FIRST (@TOP) ROWS ONLY

    )

    ,N_EXP AS

    (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY N2.N ORDER BY (SELECT NULL)) AS N_RID

    ,N2.N AS N

    ,EXP(SUM(LOG(N1.N)) OVER (PARTITION BY N2.N)) AS N_EXP

    ,CAST(EXP(CAST((SUM(CAST(LOG(N1.N) AS DECIMAL(38,16)))

    OVER (PARTITION BY N2.N)) AS DECIMAL(38,16)))

    AS DECIMAL(38,16)) AS N_DCEXP

    FROM NUMBERS N1

    CROSS JOIN NUMBERS N2

    WHERE N1.N <= N2.N

    )

    SELECT

    NX.N AS N

    ,NX.N_EXP AS NX_EXP

    ,CAST(NX.N_EXP AS DECIMAL(38,16)) AS NX_EXPD

    ,NX.N_DCEXP AS NX_DDEXP

    ,STR(NX.N_EXP,38,16) AS NX_STR

    FROM N_EXP NX

    WHERE NX.N_RID = 1;

    I have a question about your code above, why the semicolon (;) before the WITH when the preceding statement is terminated by a semicolon?

  • Hi Lynn,its an old habit I guess rather than belt and suspender kind of thing.

  • Okay, I was just curious as it is a pet peeve of mine. The semicolon is a statement terminator not a statement begininator.

  • Eirikur Eiriksson (1/4/2014)


    Thank you Jeff for a good "filler", there are many cracks "float"ing around when it comes to math and numbers on the platform. It is also far to easy to miss the fact that float and real are Approximate-number data types, in fact not reliable after the 15th digit as this little factorial sample here shows;

    Thanks for the feedback, Eirikur,

    Just to be clear and to prevent other bombastic posts from appearing on this thread, both FLOAT and REAL are entirely reliable with what they were designed to do. The times when they appear to be "unreliable" is when you have full-precision decimal expectations rather than the expectations of what FLOAT and REAL were designed for. The other thing that folks need to understand is that the floating point math behind the scenes of FLOAT and REAL is not base 10... its base 2 or binary. That's why certain numbers can't be duplicated within the precision of FLOAT and REAL.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lynn Pettis (1/4/2014)


    Okay, I was just curious as it is a pet peeve of mine. The semicolon is a statement terminator not a statement begininator.

    BWAAA-HAAAA!!!! The use of semicolons at all used to be a pet peeve of mine and still kinda is. I love it when some programming language comes back at me and basically states "Error... you're missing a semicolon at line #nnn". Great... if the language is that bloody smart, why does it need a semicolon to begin with. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Think you nailed it here, just what I was trying to say :Whistling:

  • Lynn Pettis (1/4/2014)


    Okay, I was just curious as it is a pet peeve of mine. The semicolon is a statement terminator not a statement begininator.

    I confess to being one of Lynn's converts. 😉


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • One slip Jeff, and I am surprised both that you made it and that no one else mentioned it yet (in my quick review of the thread): you used an Nvarchar(10) to store the output. That introduces a CONVERT_IMPLICIT into the Compute Scalar operation that is unnecessary and which skews the CPU and duration of the runs.

    Second, I tried this, expecting the CONVERT_IMPLICIT to float removal to make the STR() run much more efficient. I was mistaken and quite surprised that it did not make it much faster!

    CREATE TABLE #t2 (a float)

    GO

    INSERT #t2

    SELECT CAST(somenumber AS float)

    FROM #MyHead

    DECLARE @Bitbucket VARCHAR(10)

    SELECT @Bitbucket = STR(a,10) FROM #t2

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Jeff Moden (1/3/2014)


    ...

    Ok... that tears it. :blush: I have to get off my dead hinny and install 2K12. Thanks to folks like you, Wayne Sheffield, and a couple of others, I not falling behind in knowledge of the version but I really need to start playing with it because of things like the test you did above.

    I knew some good would come of that!

    Jeff Moden (1/3/2014)


    I do have a suggestion though. As I pointed out in the article at http://www.sqlservercentral.com/articles/T-SQL/91724/, SET STATISTICS can have some really undesirable impact on what gets reported for performance. Would you try the same test using SQL Profiler and see if you get similar results?

    Thanks for that and thanks for the feedback.

    Indeed, I am happy to oblige. I didn't expect much difference given the magnitude of the original results I obtained.

    SQL Profiler results (1 run):

    CPU Reads Writes Duration

    Cast, Concatenate, and Size 343 1612 0 369

    The STR() Method is SLOWER 858 1612 0 888

    The FORMAT() Method is WAY, WAY SLOWER 35319 1612 0 44126

    To be fair to Microsoft and FORMAT, the function does a lot more than just what is being done here. It does follow the pattern that I espouse, namely that a generalized tool will usually perform more slowly than one that is specifically tailored to the task at hand. That doesn't mean you shouldn't use them, just that you need to make sure in the end after your query is working that you're satisfied with the performance and that maybe it would take great pains to make it only a little faster (clearly that is not the case here).

    I am curious about one thing. I read the article you linked to when it came out and it seemed like the specific case of biased time results for SET STATISTICS was limited to testing of UDFs. At least that is what I (for some reason) assumed when I read it.

    I believe you've suggested before that it may not only apply to that limiting case. Do you have some specific evidence of that Jeff? It is not like you to make generalizations unless you've got some evidence to back it up.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • First, thanks for the test. On the subject of defending MS for the slowness of FORMAT based on the fact that it's a generic function, I'm sure that it was designed to operate at the machine language level and the fact that it's 119 times slower than some T-SQL "hack" code is absolutely deplorable. They should turn in their "man" cards as punishment. 😉

    On the SET STATISTICS thing, I haven't looked for evidence that it affects anything other than Scalar and Multi-Statement UDFs (not to be confused with iTVFs) but it has made me very, very cautious in its use. It would have been a joy if that was the reason why FORMAT ran so slow instead of it falling into the category of being seriously performance challenged code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry if I'm just being dense by asking this, I've never used STR() either so this is new to me. The article says many people use STR() for its ability to right-justify figures, and that besides the rounding issues, which I understand and am now aware of, there is a performance penalty when using it. You then run a timed test to compare it to a more complex, but quicker approach. However when I run a modified version of the code (to strip out the timing code - I'll take your word for it it's quicker) and the output to a variable, the faster approach doesn't right justify the data like STR() does.

    Here is what I'm running (limited to 10 rows here)

    --===== Conditionally drop and rebuild a test table in a nice safe place

    -- that everyone has. This only takes several seconds.

    IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL

    DROP TABLE #MyHead

    ;

    SELECT TOP 10

    SomeNumber = IDENTITY(INT,1,1)

    INTO #MyHead

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    SELECT RIGHT('0000000000' + CAST(SomeNumber AS VARCHAR(10)),10)

    FROM #MyHead

    ;

    SELECT STR(SomeNumber,10)

    FROM #MyHead

    and here is the output

    (10 row(s) affected)

    ----------

    0000000001

    0000000002

    0000000003

    0000000004

    0000000005

    0000000006

    0000000007

    0000000008

    0000000009

    0000000010

    (10 row(s) affected)

    ----------

    1

    2

    3

    4

    5

    6

    7

    8

    9

    10

    (10 row(s) affected)

    Is there something I'm not understanding, or am doing incorrectly? I would expect that any alternative would output the same results as STR(). Any insight would be appreciated.

    Thanks

    Paul

  • Sure. There are actually three implied reasons. The first is that zeros are simply easier to see and verify the presence of in an article than a blob of spaces are. The second is to show a bit of flexibility of using different characters, the more common of which is "0". The third is that "a lot of people" have to produce output for fixed length files and the numeric fields need to be left-padded with zeros, something that the already slower STR() doesn't do without extra code. Those weren't specifically mentioned in the article because I didn't want to distract folks from the main purposes of the article. As with any article, there is the potential for casual learning. Hopefully, even beginners will immediately understand that you should simply replace the zeros with spaces to make the output like that of the STR() function.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff, makes sense. If someone asked me to right justify some text I think the quicker solution would be closer to what I would have attempted anyway as i'm more used to using those sorts of string manipulations, but at least now I know for sure it's the better implementation

    Thanks

    Paul

Viewing 15 posts - 61 through 75 (of 80 total)

You must be logged in to reply to this topic. Login to reply