Hi all. I have replied to Thom's forum post regarding storing data in
NVARCHAR . I mentioned in my reply that Unicode characters can be stored here, but that they need to be HTML-encoded since that sends only standard ASCII characters to the server (and if the article / post / question / etc is ever edited, all such characters need to be re-encoded else you will fall back into the original problem).
Looking at the db<>fiddle, the character in question is: U+25D8
That can be stored here via: ◘
Which will stored as that encoded value, and sent to the browser like that, but displayed as the desired character: ◘
However, for the most stable / transportable representation (one that won't get altered either by a different encoding or being processed multiple times), you should have such characters generated at the server by using the
NCHAR() function (as Jon noted above). This goes for ALL non-standard ASCII characters (meaning: any character with a value above 127, or
0x7F in hex). For example:
NCHAR(0x25D8) . Please note that for Supplementary Characters, it's best to specify them in terms of surrogate pairs instead of their U+10000 through U+10FFFF scalar value (i.e. use
NCHAR(0xD800) + NCHAR(0xDC00) instead of
NCHAR(0x10000) for that first value as the latter only works in DB's that have a default collation with
_SC in the name).
So, to recreate and expand on the original question, let's see if the following works:
CREATE TABLE dbo.Test (S VARCHAR(10) COLLATE Latin1_General_100_CI_AI_SC_UTF8);
DELETE FROM dbo.Test;
INSERT INTO dbo.Test (S)
VALUES (N'?'), -- 1 byte (standard ASCII)
(NCHAR(0x0102)), -- 2 bytes (Ă)
(NCHAR(0x25D8)), -- 3 bytes (◘)
(NCHAR(0xD83D) + NCHAR(0xDC30)); -- 4 bytes (🐰; all Supplementary characters)
SELECT S, LEN(REPLICATE(S,10000)) AS ReplicatedLength
For more details on UTF-8, please see my post:
Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?