Them Con+cat+enatin’ Blues


We ran into a funny problem not long ago, revolving around a stored procedure that was building up a dynamic SQL string. The code had worked without a hitch in our development and test environments but blew up when it was moved to production due to the dynamic SQL being mysteriously truncated.

After considerable head-scratching and rummaging about, I found that Red Gate SQL Prompt was to blame, specifically the Format SQL function. It seems that one of the things it does is add the Unicode identifier “N” to any string literal being assigned to an NVARCHAR variable, and that innocuous-looking character was wreaking havoc with our concatenation.

How, you ask? Answering that question involved digging under the hood of concatenation, and learning all sorts of interesting things, which I’m now going to share with you, because I can’t have been the only one to have been flummoxed by this behavior.

The place to start is with an understanding of the concatenation operator and the concatenation function. The “+” operator – the overloaded plus sign – will be familiar to all. The CONCAT function first appeared in SQL Server 2012. Although the syntax differs, both work in very much the same way and deliver the same results (with one notable exception: CONCAT treats NULL arguments as empty strings, while “+” returns NULL if any operand is NULL).
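To make the comparison concrete, here’s a minimal sketch (the column aliases are mine) showing the operator and the function side by side:

```sql
-- The overloaded "+" operator and the CONCAT function (SQL Server 2012+)
-- produce the same result for these non-NULL inputs
SELECT 'foo' + 'bar'        AS PlusResult,    -- 'foobar'
       CONCAT('foo', 'bar') AS ConcatResult;  -- 'foobar'
```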

Both the function and the operator return a result that is the data type with the highest precedence. Let me repeat that because it’s important: the result is the data type with the highest precedence. What’s that mean? We’ll see in a bit, but meantime, there’s a little caveat hidden in the documentation. It’s this, taken from the MS doc on the concatenation operator:

“If the result of the concatenation of strings exceeds the limit of 8,000 bytes, the result is truncated. However, if at least one of the strings concatenated is a large value type, truncation does not occur.”

The CONCAT function works in much the same way:

“If none of the input arguments has a supported large object (LOB) type, then the return type truncates to 8000 characters in length, regardless of the return type.”

In practical terms, this means that the size of your chunks matters. If you’ve declared a variable of NVARCHAR(MAX) to pass along to sp_executesql, you must be cognizant of the size of the individual strings that are being put together – or else the results may not be what you’d expect.

I ran a series of tests concatenating strings of varying lengths into a variable of type NVARCHAR(MAX) and using LEN and DATALENGTH to catalog the resulting concatenated string:

12K characters assigned in four 3K chunks: LEN = 8000, length in bytes = 16000
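The test itself is easy to reproduce. Here’s a minimal sketch, using REPLICATE to stand in for the long string literals of my original tests (the variable names are mine):

```sql
DECLARE @result NVARCHAR(MAX);
-- four 3,000-character VARCHAR chunks; none is a large value type,
-- so the concatenation is evaluated as plain VARCHAR
SET @result = REPLICATE('a', 3000) + REPLICATE('b', 3000)
            + REPLICATE('c', 3000) + REPLICATE('d', 3000);
SELECT LEN(@result) AS Characters, DATALENGTH(@result) AS Bytes;
-- LEN reports 8000, not 12000: the result was silently capped at 8,000 bytes
-- before it ever reached the NVARCHAR(MAX) variable
```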

This one surprised me initially. Then I read the entry on the operator and things became clearer. I combined four 3K string literals, for a total of 12K. Since none of the literals exceeded 8,000 bytes (which would have made it a large value type), and since none was cast as NVARCHAR, SQL Server evaluated the concatenation as VARCHAR, with the implied limit of 8K characters. Did SQL Server complain about – or even mention – the truncation? Nary a peep. How we love thee, o silent truncation! (The length in bytes reflects the nature of the NVARCHAR variable I assigned the result to.)

Next, I tried combining two 6K chunks, with identical results:

12K characters assigned in two 6K chunks: LEN = 8000, length in bytes = 16000

The last test was with two chunks of unequal size, one 9K and one 3K:

12K characters assigned in one 3K and one 9K chunk: LEN = 12004, length in bytes = 24008
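A sketch of that last test, with one caveat: REPLICATE itself caps its output at 8,000 bytes unless its input is a MAX type, so the 9K piece has to be seeded with VARCHAR(MAX). (The LEN here comes out a round 12000; my original literals were a few characters longer.)

```sql
DECLARE @result NVARCHAR(MAX);
-- the 9K chunk is VARCHAR(MAX) -- a large value type -- so the
-- concatenation is not capped at 8,000 bytes
SET @result = REPLICATE(CONVERT(VARCHAR(MAX), 'a'), 9000)
            + REPLICATE('b', 3000);
SELECT LEN(@result) AS Characters, DATALENGTH(@result) AS Bytes;
-- 12000 characters, 24000 bytes: the full string survives
```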

As you’d expect, since the 9K string is a “large value type” (or “large object type” if you prefer), the result was returned as NVARCHAR(MAX), and all was well. Tests with CONCAT returned identical results.

So, what’s all this got to do with the Unicode identifier N? And how did it cause our dynamic SQL to blow a gasket? Glad you asked. Remember that line I emphasized above? “The result is the data type with the highest precedence.”

Here’s what MS Docs has to say about N:

“When prefixing a string constant with the letter N, the implicit conversion will result in a Unicode string if the constant to convert does not exceed the max length for a Unicode string data type (4,000). Otherwise, the implicit conversion will result in a Unicode large-value (max).”

Our dynamic SQL consisted of several chunks, none of which broke the 8K barrier. The overall length was right around 4,100 characters. As long as SQL Server was able to treat the chunks as VARCHAR, we were fine, because our overall length was well within the 8K max. But as soon as the N identifier was added, SQL Server returned NVARCHAR – the data type with the highest precedence – which silently truncated the result at 4K characters.
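Here’s a sketch of the failure mode, with chunk sizes roughly like ours (the exact numbers are illustrative):

```sql
DECLARE @sql NVARCHAR(MAX);
-- with the N prefix, every chunk is NVARCHAR, so the concatenation is
-- evaluated as NVARCHAR and capped at 4,000 characters before assignment
SET @sql = REPLICATE(N'a', 2100) + REPLICATE(N'b', 2000);
SELECT LEN(@sql) AS Characters;
-- 4000, not 4100 -- and not a word of warning
```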

As is so often the case with SQL Server, things aren’t always as straightforward as they appear, and gremlins await the unwary.

Some takeaways:

  • Avoid the N identifier when concatenating (except as I’m going to mention in the next bullet point), and avoid explicit casts to NVARCHAR. That will ensure that you have a minimum of 8K characters to work with.
  • If your final string needs to be longer than 8K, make sure that at least one of the constituent strings can be identified as a “large object type”. Ironically, the N identifier can actually help in this case: a 6K string left as VARCHAR will not qualify as a LOB, but converted to Unicode, it will, as shown here:
12K characters assigned in two 6K chunks: LEN = 8000, length in bytes = 16000
12K characters assigned in two 6K chunks with N: LEN = 12001, length in bytes = 24002
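A sketch of the same effect, using CONVERT in place of an N-prefixed literal (a 6,000-character literal being impractical to print here):

```sql
DECLARE @s NVARCHAR(MAX);
-- converting one 6K piece to NVARCHAR(MAX) makes it a large object type,
-- so the concatenation is no longer capped at 8,000 bytes
SET @s = CONVERT(NVARCHAR(MAX), REPLICATE('x', 6000)) + REPLICATE('x', 6000);
SELECT LEN(@s) AS Characters, DATALENGTH(@s) AS Bytes;
-- 12000 characters / 24000 bytes
```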

And courtesy of Solomon Rutzky, some better takeaways:

  • always use the upper-case “N”

  • always store into a variable of type `NVARCHAR(MAX)`

  • to avoid truncation of string literals, simply ensure that one piece is converted to `NVARCHAR(MAX)`. For example: `SET @NVCmaxVariable = CONVERT(NVARCHAR(MAX), N'anything') + N'something else' + N'another';`

References and further reading:

nchar and nvarchar (Transact-SQL)

char and varchar (Transact-SQL)

+ (String Concatenation) (Transact-SQL)

CONCAT (Transact-SQL)