Creating Strings with Replicate

  • Comments posted to this topic are about the item Creating Strings with Replicate

  • Really interesting question, thanks David

    ____________________________________________
    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    β€œlibera tute vulgaris ex”

  • Got this right as I was hit by it a while back. Still feels like a bug to me, but Microsoft say not so it can't be πŸ˜‰
     You have to code it like this

    DECLARE @string NVARCHAR(max) = (
    SELECT REPLICATE(cast(N'X' as nvarchar(max)), 9000)
    )

  • Toreador - Friday, September 21, 2018 1:39 AM

    Got this right as I was hit by it a while back. Still feels like a bug to me, but Microsoft say not so it can't be πŸ˜‰
     You have to code it like this

    DECLARE @string NVARCHAR(max) = (
    SELECT REPLICATE(cast(N'X' as nvarchar(max)), 9000)
    )

    Annoying yes. But not a bug. The string literal / constant (" N'X' " in this case) is interpreted as being NVARCHAR(4000) when the query is parsed. Just like how " 1 " (by itself, not within single-quotes) is interpreted as being INT and not BIGINT or TINYINT or DECIMAL/NUMERIC.

    Take care,
    Solomon..

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky - Friday, September 21, 2018 7:16 AM

    Toreador - Friday, September 21, 2018 1:39 AM

    Got this right as I was hit by it a while back. Still feels like a bug to me, but Microsoft say not so it can't be πŸ˜‰
     You have to code it like this

    DECLARE @string NVARCHAR(max) = (
    SELECT REPLICATE(cast(N'X' as nvarchar(max)), 9000)
    )

    Annoying yes. But not a bug. The string literal / constant (" N'X' " in this case) is interpreted as being NVARCHAR(4000) when the query is parsed. Just like how " 1 " (by itself, not within single-quotes) is interpreted as being INT and not BIGINT or TINYINT or DECIMAL/NUMERIC.

    Take care,
    Solomon..

    So than if (" N'X' " in this case) is interpreted as being NVARCHAR(4000) .  How does the length get to 8000?
    And, well perhaps that explanation will explain the results of these;
    DECLARE @Nstring NVARCHAR(max) = (SELECT REPLICATE(N'X', 19000))
    SELECT DATALENGTH(@Nstring)
    GO
    DECLARE @Nstring NVARCHAR(max) = (SELECT REPLICATE('X', 19000))
    SELECT DATALENGTH(@Nstring)
    GO
    DECLARE @string VARCHAR(max) = (SELECT REPLICATE(N'X', 19000))
    SELECT DATALENGTH(@string)
    GO
    DECLARE @string VARCHAR(max) = (SELECT REPLICATE('X', 19000))
    SELECT DATALENGTH(@string)

  • Budd - Friday, September 21, 2018 7:55 AM

    Solomon Rutzky - Friday, September 21, 2018 7:16 AM

    Toreador - Friday, September 21, 2018 1:39 AM

    Got this right as I was hit by it a while back. Still feels like a bug to me, but Microsoft say not so it can't be πŸ˜‰
     You have to code it like this

    DECLARE @string NVARCHAR(max) = (
    SELECT REPLICATE(cast(N'X' as nvarchar(max)), 9000)
    )

    Annoying yes. But not a bug. The string literal / constant (" N'X' " in this case) is interpreted as being NVARCHAR(4000) when the query is parsed. Just like how " 1 " (by itself, not within single-quotes) is interpreted as being INT and not BIGINT or TINYINT or DECIMAL/NUMERIC.

    Take care,
    Solomon..

    So than if (" N'X' " in this case) is interpreted as being NVARCHAR(4000) .  How does the length get to 8000?
    And, well perhaps that explanation will explain the results of these;
    DECLARE @Nstring NVARCHAR(max) = (SELECT REPLICATE(N'X', 19000))
    SELECT DATALENGTH(@Nstring)
    GO
    DECLARE @Nstring NVARCHAR(max) = (SELECT REPLICATE('X', 19000))
    SELECT DATALENGTH(@Nstring)
    GO
    DECLARE @string VARCHAR(max) = (SELECT REPLICATE(N'X', 19000))
    SELECT DATALENGTH(@string)
    GO
    DECLARE @string VARCHAR(max) = (SELECT REPLICATE('X', 19000))
    SELECT DATALENGTH(@string)

    This is due to the function being DATALENGTH and not LEN. LEN measures "characters", while DATALENGTH measures actual bytes used to store the value. NVARCHAR is Unicode (via the UTF-16 Little Endian encoding) which is 2 or 4 bytes per "character" (with the most commonly used characters being the 2-byte variety). Hence 4000 NVARCHAR characters (assuming that they are all 2-byte characters), takes up 8000 bytes. VARCHAR (without the upper-case "N" prefix on the string literal) is 1 or 2 bytes per character (the vast majority of the time being just 1 byte), hence 4000 VARCHAR characters (assuming that they are all 1-byte characters), takes up just 4000 bytes.

    The reason that you see the output you are seeing for those 4 queries is due to "Datatype Precedence". The datatypes in a particular operation need to match, so lower-precedence types will implicitly up-convert to the highest precedence type being used. NVARCHAR is higher than VARCHAR, hence VARCHAR values will implicitly convert to NVARCHAR to complete the operation.

    The second query (where the upper-case "N" has been removed from the string literal, thus making it a VARCHAR(8000) ), returns 16,000 due to the REPLICATE operation returning 8000 VARCHAR characters which then get implicitly converted into NVARCHAR, which uses 2 bytes per each of those same characters.

    The third query (using the upper-case "N", but the variable is now VARCHAR instead of NVARCHAR), returns 4000 due to the REPLICATE, still being of type NVARCHAR(4000), returns 4000 characters, which initially take up 8000 bytes, but in the end are converted into VARCHAR (due to that being the datatype of the variable) which only requires 1 byte per each of those characters.

    The fourth query (no upper-case "N" and the VARCHAR variable) returns 8000 due to the REPLICATE operation returning 8000 VARCHAR characters (remember 'X' is interpreted as being VARCHAR(8000) ) which are then stored in a VARCHAR variable, which does not change anything, and these 8000 characters still require one byte per each character.

    Does that all make sense?

    Take care, Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky - Friday, September 21, 2018 8:05 AM

    Budd - Friday, September 21, 2018 7:55 AM

    Solomon Rutzky - Friday, September 21, 2018 7:16 AM

    Toreador - Friday, September 21, 2018 1:39 AM

    Got this right as I was hit by it a while back. Still feels like a bug to me, but Microsoft say not so it can't be πŸ˜‰
     You have to code it like this

    DECLARE @string NVARCHAR(max) = (
    SELECT REPLICATE(cast(N'X' as nvarchar(max)), 9000)
    )

    Annoying yes. But not a bug. The string literal / constant (" N'X' " in this case) is interpreted as being NVARCHAR(4000) when the query is parsed. Just like how " 1 " (by itself, not within single-quotes) is interpreted as being INT and not BIGINT or TINYINT or DECIMAL/NUMERIC.

    Take care,
    Solomon..

    So than if (" N'X' " in this case) is interpreted as being NVARCHAR(4000) .  How does the length get to 8000?
    And, well perhaps that explanation will explain the results of these;
    DECLARE @Nstring NVARCHAR(max) = (SELECT REPLICATE(N'X', 19000))
    SELECT DATALENGTH(@Nstring)
    GO
    DECLARE @Nstring NVARCHAR(max) = (SELECT REPLICATE('X', 19000))
    SELECT DATALENGTH(@Nstring)
    GO
    DECLARE @string VARCHAR(max) = (SELECT REPLICATE(N'X', 19000))
    SELECT DATALENGTH(@string)
    GO
    DECLARE @string VARCHAR(max) = (SELECT REPLICATE('X', 19000))
    SELECT DATALENGTH(@string)

    This is due to the function being DATALENGTH and not LEN. LEN measures "characters", while DATALENGTH measures actual bytes used to store the value. NVARCHAR is Unicode (via the UTF-16 Little Endian encoding) which is 2 or 4 bytes per "character" (with the most commonly used characters being the 2-byte variety). Hence 4000 NVARCHAR characters (assuming that they are all 2-byte characters), takes up 8000 bytes. VARCHAR (without the upper-case "N" prefix on the string literal) is 1 or 2 bytes per character (the vast majority of the time being just 1 byte), hence 4000 VARCHAR characters (assuming that they are all 1-byte characters), takes up just 4000 bytes.

    HOLD ON WHILE I ADD EXPLANATION OF THE OUTPUT OF THOSE 4 QUERIES...

    Take care, Solomon...

    AH - This is what I failed to recall correctly... >  LEN measures "characters", while DATALENGTH measures actual bytes .
    Thanks

  • See, this is proof T/SQL is insane and full of gotchas. There is no reason replicate() shouldn't convert its arguments to nvarchar(max) since it can return nvarchar(max). 

    MS REALLY needs to smooth out these little nits. How many people could have debugged this on their own? Worse, how many would have been unaware of this behavior and coded it improperly and not even caught the issue until some weird day years after the fact? And then spent a week on what really IS a bug, even if MS tries to shed the blame with a technical truth.

    Worst of all, T/SQL is FULL of these little gotchas.

    Bad show, MS. Very bad show.

  • Budd - Friday, September 21, 2018 8:12 AM

    AH - This is what I failed to recall correctly... >  LEN measures "characters", while DATALENGTH measures actual bytes .
    Thanks

    That's a large part of it. I have now updated my answer with details on why each of those other 3 queries returned those particular values.

    Take care, Solomon..

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • roger.plowman - Friday, September 21, 2018 8:14 AM

    See, this is proof T/SQL is insane and full of gotchas. There is no reason replicate() shouldn't convert its arguments to nvarchar(max) since it can return nvarchar(max). 

    MS REALLY needs to smooth out these little nits. How many people could have debugged this on their own? Worse, how many would have been unaware of this behavior and coded it improperly and not even caught the issue until some weird day years after the fact? And then spent a week on what really IS a bug, even if MS tries to shed the blame with a technical truth.

    Worst of all, T/SQL is FULL of these little gotchas.

    Bad show, MS. Very bad show.

    Again, while this is definitely annoying, and not an area that most folks are well versed in, I think it is a stretch to call it a bug. The REPLICATE function was provided a value of a specific type, in this case NVARCHAR(4000). To return NVARCHAR(MAX) when the replicated value is over 4000 bytes, but otherwise return NVARCHAR(4000), is not really possible, at least not without the return type being SQL_VARIANT. A function's signature is not dynamic. Yes, overloading is used to determine which version of the function to process internally, but once that function/method has been called, I don't see how it could then determine, at run-time, what the return type will be.

    Likewise, mathematical operations do not change the return type depending on what the result will be. If you divide two INT values such as "SELECT 3 / 2;", you get an INT value back (i.e. "1") instead of a DECIMAL or FLOAT of 1.500000 (such as you would get if you did "SELECT 3 / 2.0;").

    Take care, Solomon..

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • roger.plowman - Friday, September 21, 2018 8:14 AM

    See, this is proof T/SQL is insane and full of gotchas. There is no reason replicate() shouldn't convert its arguments to nvarchar(max) since it can return nvarchar(max). 

    MS REALLY needs to smooth out these little nits. How many people could have debugged this on their own? Worse, how many would have been unaware of this behavior and coded it improperly and not even caught the issue until some weird day years after the fact? And then spent a week on what really IS a bug, even if MS tries to shed the blame with a technical truth.

    Worst of all, T/SQL is FULL of these little gotchas.

    Bad show, MS. Very bad show.

    My problem with it all is that it's a SILENT truncation that produces no error but, I absolutely agree... MS didn't do well on these types of things.  Either do what is implied by the otherwise correctly written code or raise an error that something explicit needs to be done.  Personally, I'd prefer the error because the implied result may not actually be the desired result.

    Great question, BTW.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, September 21, 2018 8:30 AM

    Either do what is implied by the otherwise correctly written code or raise an error that something explicit needs to be done.  Personally, I'd prefer the error because the implied result may not actually be the desired result.

    Or perhaps a warning?
    I think erroring would only be acceptable if the message is just "binary or string data would be truncated", with no indication that it was coming from the REPLICATE function πŸ˜‰ .

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky - Friday, September 21, 2018 7:16 AM

    Toreador - Friday, September 21, 2018 1:39 AM

    Got this right as I was hit by it a while back. Still feels like a bug to me, but Microsoft say not so it can't be πŸ˜‰
     You have to code it like this

    DECLARE @string NVARCHAR(max) = (
    SELECT REPLICATE(cast(N'X' as nvarchar(max)), 9000)
    )

    Annoying yes. But not a bug. The string literal / constant (" N'X' " in this case) is interpreted as being NVARCHAR(4000) when the query is parsed. Just like how " 1 " (by itself, not within single-quotes) is interpreted as being INT and not BIGINT or TINYINT or DECIMAL/NUMERIC.

    Take care,
    Solomon..

    This is only because Microsoft doesn't consider poor design to be a bug. Silent truncation, which is what this is, is poor design.

  • I'd argue this IS a bug.

    If you pass a normal nvarchar LITERAL you get truncation. If you pass an nvarchar(max) you don't.

    Thus there ARE two overloaded functions or passing an nvarchar(max) would still produce the incorrect answer.

    However the real problem is that Unicode literals aren't nvarchar(max)!

  • roger.plowman - Friday, September 21, 2018 9:57 AM

    However the real problem is that Unicode literals aren't nvarchar(max)!

    Well, a Unicode literal of just N'a' is not NVARCHAR(MAX). But, any Unicode literal consisting of at least 8002 bytes is NVARCHAR(MAX) πŸ˜‰ .

    Run this in SSMS:
    SELECT REPLICATE(N'a', 4000);

    Copy and paste the resulting string into the following query:
    SELECT DATALENGTH(REPLICATE(N'paste_here', 2));

    Execute that and the result is 8000, because the input is interpreted as being NVARCHAR(4000). This is consistent with what we have been seeing and discussing so far.

    Now, add a "b" (or any single character) to the end of that Unicode literal inside the REPLICATE function (thus it will actually be 4001 characters long). Execute that and the result will be 16,004 because the literal was interpreted as being NVARCHAR(MAX).

    Take care, Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply