Creating Strings with Replicate

  • Shayn Thomas

    SSCertifiable

    Points: 5603

    Nice question and discussion

    Cheers

    ---------------------------------------------------------------------------------------
    The more you know, the more you know that you dont know

  • DamianC

    SSCertifiable

    Points: 7828

    interesting question

    - Damian

  • roger.plowman

    SSChampion

    Points: 10243

    Solomon Rutzky - Friday, September 21, 2018 10:14 AM

    roger.plowman - Friday, September 21, 2018 9:57 AM

    However the real problem is that Unicode literals aren't nvarchar(max)!

    Well, a Unicode literal of just N'a' is not NVARCHAR(MAX). But, any Unicode literal consisting of at least 8002 bytes is NVARCHAR(MAX) 😉 .

    Run this in SSMS:
    SELECT REPLICATE(N'a', 4000);

    Copy and paste the resulting string into the following query:
    SELECT DATALENGTH(REPLICATE(N'paste_here', 2));

    Execute that and the result is 8000, because the input is interpreted as being NVARCHAR(4000). This is consistent with what we have been seeing and discussing so far.

    Now, add a "b" (or any single character) to the end of that Unicode literal inside the REPLICATE function (thus it will actually be 4001 characters long). Execute that and the result will be 16,004 because the literal was interpreted as being NVARCHAR(MAX).

    Take care, Solomon...

    You do realize you've just proven my point, right?

  • Solomon Rutzky

    SSCoach

    Points: 16251

    roger.plowman - Wednesday, September 26, 2018 6:57 AM

    Solomon Rutzky - Friday, September 21, 2018 10:14 AM

    roger.plowman - Friday, September 21, 2018 9:57 AM

    However the real problem is that Unicode literals aren't nvarchar(max)!

    Well, a Unicode literal of just N'a' is not NVARCHAR(MAX). But, any Unicode literal consisting of at least 8002 bytes is NVARCHAR(MAX) 😉 .

    Run this in SSMS:
    SELECT REPLICATE(N'a', 4000);

    Copy and paste the resulting string into the following query:
    SELECT DATALENGTH(REPLICATE(N'paste_here', 2));

    Execute that and the result is 8000, because the input is interpreted as being NVARCHAR(4000). This is consistent with what we have been seeing and discussing so far.

    Now, add a "b" (or any single character) to the end of that Unicode literal inside the REPLICATE function (thus it will actually be 4001 characters long). Execute that and the result will be 16,004 because the literal was interpreted as being NVARCHAR(MAX).

    Take care, Solomon...

    You do realize you've just proven my point, right?

    Clearly not 😉 .  Perhaps you can elaborate? What I have proven is:

    1. string literals that are prefixed with an upper-case "N" and consist of 0 - 4000 characters (technically 0 - 8000 bytes) are interpreted as being NVARCHAR(4000).
    2. string literals that are prefixed with an upper-case "N" and consist of 4001 or more characters (technically 8002 or more bytes) are interpreted as being NVARCHAR(MAX).
    3. Passing in NVARCHAR(4000) to REPLICATE returns NVARCHAR(4000)
    4. Passing in NVARCHAR(MAX) to REPLICATE returns NVARCHAR(MAX)
    Hence:

    1. There are at least two overloads for REPLICATE -- one for NVARCHAR(4000) and one for NVARCHAR(MAX) -- (and at least 2 more that would handle VARCHAR(8000) and VARCHAR(MAX), plus they might also have overloads to handle NCHAR and CHAR); but we never disagreed about this.
    2. Your statement that " Unicode literals aren't nvarchar(max) " is incorrect (because they certainly can be).
    3. This is not a bug. It is an area that might could be improved via a warning or error if truncation would occur.
    Take care, Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • roger.plowman

    SSChampion

    Points: 10243

    Solomon Rutzky - Wednesday, September 26, 2018 7:36 AM

    roger.plowman - Wednesday, September 26, 2018 6:57 AM

    Solomon Rutzky - Friday, September 21, 2018 10:14 AM

    roger.plowman - Friday, September 21, 2018 9:57 AM

    However the real problem is that Unicode literals aren't nvarchar(max)!

    Well, a Unicode literal of just N'a' is not NVARCHAR(MAX). But, any Unicode literal consisting of at least 8002 bytes is NVARCHAR(MAX) 😉 .

    Run this in SSMS:
    SELECT REPLICATE(N'a', 4000);

    Copy and paste the resulting string into the following query:
    SELECT DATALENGTH(REPLICATE(N'paste_here', 2));

    Execute that and the result is 8000, because the input is interpreted as being NVARCHAR(4000). This is consistent with what we have been seeing and discussing so far.

    Now, add a "b" (or any single character) to the end of that Unicode literal inside the REPLICATE function (thus it will actually be 4001 characters long). Execute that and the result will be 16,004 because the literal was interpreted as being NVARCHAR(MAX).

    Take care, Solomon...

    You do realize you've just proven my point, right?

    Clearly not 😉 .  Perhaps you can elaborate? What I have proven is:

    1. string literals that are prefixed with an upper-case "N" and consist of 0 - 4000 characters (technically 0 - 8000 bytes) are interpreted as being NVARCHAR(4000).
    2. string literals that are prefixed with an upper-case "N" and consist of 4001 or more characters (technically 8002 or more bytes) are interpreted as being NVARCHAR(MAX).
    3. Passing in NVARCHAR(4000) to REPLICATE returns NVARCHAR(4000)
    4. Passing in NVARCHAR(MAX) to REPLICATE returns NVARCHAR(MAX)
    Hence:

    1. There are at least two overloads for REPLICATE -- one for NVARCHAR(4000) and one for NVARCHAR(MAX) -- (and at least 2 more that would handle VARCHAR(8000) and VARCHAR(MAX), plus they might also have overloads to handle NCHAR and CHAR); but we never disagreed about this.
    2. Your statement that " Unicode literals aren't nvarchar(max) " is incorrect (because they certainly can be).
    3. This is not a bug. It is an area that might could be improved via a warning or error if truncation would occur.
    Take care, Solomon...

    The point being:

    1) There are two datatypes called 'NVARCHAR', one being limited and one being (more or less) unlimited.

    2) The replicate function can use either, but this is invisible to the developer. Also, it's pretty damn obscure, with no warning.

    3) Replicate's argument is NVARCHAR(4000), not NVARCHAR(MAX).

    4) This is unexpected behavior for most developers.

    5) The simplest fix would be for MS to convert all arguments for nvarchar(4000) to nvarchar(max). This would prevent a damn near impossible to find (and potentially extremely rare) bug from occurring.

    T/SQL is full of things like this and there's simply no excuse for it to have dragged on as long as it has. Fix the damn language already! It's already a fossil, no need to make it infuriatingly quirky as well! 🙂

  • Solomon Rutzky

    SSCoach

    Points: 16251

    roger.plowman - Wednesday, September 26, 2018 7:48 AM

    Solomon Rutzky - Wednesday, September 26, 2018 7:36 AM

    roger.plowman - Wednesday, September 26, 2018 6:57 AM

    You do realize you've just proven my point, right?

    Clearly not 😉 .  Perhaps you can elaborate? What I have proven is:

    1. string literals that are prefixed with an upper-case "N" and consist of 0 - 4000 characters (technically 0 - 8000 bytes) are interpreted as being NVARCHAR(4000).
    2. string literals that are prefixed with an upper-case "N" and consist of 4001 or more characters (technically 8002 or more bytes) are interpreted as being NVARCHAR(MAX).
    3. Passing in NVARCHAR(4000) to REPLICATE returns NVARCHAR(4000)
    4. Passing in NVARCHAR(MAX) to REPLICATE returns NVARCHAR(MAX)
    Hence:

    1. There are at least two overloads for REPLICATE -- one for NVARCHAR(4000) and one for NVARCHAR(MAX) -- (and at least 2 more that would handle VARCHAR(8000) and VARCHAR(MAX), plus they might also have overloads to handle NCHAR and CHAR); but we never disagreed about this.
    2. Your statement that " Unicode literals aren't nvarchar(max) " is incorrect (because they certainly can be).
    3. This is not a bug. It is an area that might could be improved via a warning or error if truncation would occur.
    Take care, Solomon...

    The point being:

    1) There are two datatypes called 'NVARCHAR', one being limited and one being (more or less) unlimited.

    Correct.

    2) The replicate function can use either, but this is invisible to the developer. Also, it's pretty damn obscure, with no warning.

    Correct that the REPLICATE function can accept either (along with some others), incorrect that it is "invisible" to developers. Lack of understanding is not the same as lack of visibility (please see #4 below).

    3) Replicate's argument is NVARCHAR(4000), not NVARCHAR(MAX).

    Not sure of your point here. It can also accept VARCHAR as well, even VARBINARY as shown here:

    SELECT REPLICATE(0x5540, 2);

    That returns "U@U@" which is clearly VARCHAR, not NVARCHAR, else it would have produced only 2 characters. The REPLICATE documentation even states that the first argument "Is an expression of a character string or binary data type. string_expression can be either character or binary data."

    4) This is unexpected behavior for most developers.

    Agreed. But again, this is due to common lack of understanding, hence why this was a good question. Also, again looking at the documentation, it states (regarding the first argument):
    "!NOTE: If string_expression is not of type varchar(max) or nvarchar(max), REPLICATE truncates the return value at 8,000 bytes. To return values greater than 8,000 bytes, string_expression must be explicitly cast to the appropriate large-value data type."

    Then, under Return Types, the documentation states:
    "Returns the same type as string_expression."

    5) The simplest fix would be for MS to convert all arguments for nvarchar(4000) to nvarchar(max). This would prevent a damn near impossible to find (and potentially extremely rare) bug from occurring.

    Simple yes, but also wrong. There is a definite performance hit for using the MAX types over their non-MAX counterparts. Just try the following test:

    DECLARE @NVC_4k NVARCHAR(4000) = N'aa';
    SET STATISTICS TIME ON;
    SELECT TOP (100000) REPLICATE(@NVC_4k, 1000)
    FROM [master].[sys].[columns] col
    CROSS JOIN [master].[sys].[all_columns] obj
    SET STATISTICS TIME OFF;
    -- CPU time = 922 ms, elapsed time = 7436 ms.

    DECLARE @NVC_MAX NVARCHAR(MAX) = N'aa';
    SET STATISTICS TIME ON;
    SELECT TOP (100000) REPLICATE(@NVC_MAX, 1000)
    FROM [master].[sys].[columns] col
    CROSS JOIN [master].[sys].[all_columns] obj
    SET STATISTICS TIME OFF;
    -- CPU time = 844 ms, elapsed time = 9346 ms.

    The comments below each query represent the best "elapsed time" across 10 executions (executed individually, not together as that sometimes skews the results). The NVARCHAR(MAX) variable, even though the value (and result of the REPLICATE)  would easily fit within NVARCHAR(4000), caused the query to take nearly 2 full seconds longer.

    Take care, Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

Viewing 6 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply