Can post please be stored in nvarchar?

  • It actually surprises me this should even be a request. Post data, however, appears to be stored as a varchar, not an nvarchar which is causing malformed post, including today's QOTD. Obviously this data has not been lost, but this should be fixed, as it can't be assumed that post will only contain characters from the code page.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom, the issue you are talking about appears to be what I reported 19 months ago:

    https://www.sqlservercentral.com/forums/topic/major-issue-to-fix-asap-data-loss-for-characters-not-included-in-code-page-1252-2

    My guess is that the data is using NVARCHAR as the datatype (based on looking at the datamodel provided by the WordPress via .NET project that RedGate used for this site), but that there's either a VARCHAR input parameter / variable along the way, or that the input data is being wrapped in single-quotes without the "N" prefix. Either way, very frustrating indeed.

    You can still get Unicode characters to show up in posts, questions, etc, but you need to HTML encode them first so that the data sent into the server is all standard ASCII characters. However, if a post / question / article / etc containing HTML encoded entities is ever edited, such entities will be rendered in the editor as the actual Unicode character, and thus saving the update will then convert the character into one or two "?" depending on the character being BMP or Supplementary. For example:

    Passing in: Ă

    should store correctly as that encoded string and appear as follows when displayed later: Ă

    For more details, please see my investigation here:

    https://www.sqlservercentral.com/forums/topic/new-blog-import-process-mangles-blog-content-2#post-2255689

     

    Take care,

    Solomon..

     

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Thanks Solomon. It's bitterly disappointing that this was raised back in April of last year and not fixed yet; I didn't even think to look for a report and I (foolishly) assumed it would likely something that was caused in a recent update.

    Looks like it's upstream though, as I tested on a Project Nami install I have and experienced the same problem. I'll see if i can find the problem myself in my own site and pass the feedback on.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A wrote:

    Looks like it's upstream though, as I tested on a Project Nami install I have and experienced the same problem. I'll see if i can find the problem myself in my own site and pass the feedback on.

    Sorry, I didn't mean to imply that the source of the problem is in RedGate's implementation of the code, just that I believe the problem is in code, not in the datatype used to store the data.

    I haven't thought about this in 19 months, but I did some investigation at the time (going through the Project Nami GitHub repo) and believe that I had found the problem, but it was buried pretty deep and I was about to post an issue or even PR but they had updated the master branch with a partial update of the code (or something like that) and it was unclear if what I had found would be changing so I didn't want to do a bunch of work just to have it invalidated by changes about to merge in. I believe that I took some notes at the time and will see if I can find them. Part of the issue is that I don't have an install for this and so can't really test it (nor any time to set up such an environment). If I can find what I saw at the time and pass along some suggested changes, could you help me test it? And if successful, I can post the PR?

    Please let me know. Take care, Solomon..

     

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky wrote:

    If I can find what I saw at the time and pass along some suggested changes, could you help me test it? And if successful, I can post the PR? 

    Sure, sounds like a good idea. 🙂

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • I haven't found the specific cause yet, but I am getting closer. I did just find, and submit a fix / PR for, an occasionally serious bug:

    Improve RegEx for "IF in SELECT statement" in translations.php

     

    Will keep looking for the cause of this one, though I did come across an open issue that appears to suggest that a config setting might also get this working:

    Special characters not showing correct #303

    Take care,

    Solomon...

     

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Test post:

    trying to see if the following character is converted to a best-fit mapping: 〛

    ]

    If it comes back as a regular ] then it is being converted via best-fit, which points more directly at the problem being in SQL Server than in the PHP code.

    And the following should come back as ?? :  𒍅

    ??

     

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Hey Thom. I think I found a potential fix for this issue. I actually have two solutions, but one requires setting up the Project Nami DB to use a _UTF8 collation during the install, so probably not an option for your existing system. That also then requires SQL Server 2019 (or newer, whenever that happens). That option is the easier of the two, but as I said, kinda needs to happen from the beginning (technically you can change the collation of the DB and all of the objects in one shot, but if you had any characters with values in the 128 - 255 range, then they would get corrupted).

    The second option is to add a single upper-case N to a single line of the PHP code, in the prepare() function that all of the SQL statements (I believe) get run through. I have this written up in the following Issue for that project:

    Unicode (NVARCHAR) data converted to 8-bit code page (VARCHAR) as it is stored #417

    Currently, the line to edit is #1351 of wp-db.php, but that same line of code might be a different line # if you have a different version than what's checked into the master branch. The line in question is currently:

    $query = preg_replace( '/(?<!%)%s/', "'%s'", $query ); // Quote the strings, avoiding escaped strings like %%s.

    and it should probably be:

    $query = preg_replace( '/(?<!%)%s/', "N'%s'", $query ); // Quote the strings, avoiding escaped strings like %%s.

    I have no way to test this, though I have tested the theory with PHP in general and it does fix the problem. As I mention in that GitHub issue, I'm a little concerned about non-string values that still get quoted, such as dates, times, and datetimes, and UNIQUEIDENTIFIERs / GUIDs. Those types of values will work in the current scheme of using single-quotes without an upper-case N but won't work if the N is there.

    Please note that you need to have your connection using the 'CharacterSet'=> 'UTF-8' option in the array that gets passed into sqlsrv_connect(). Most likely you already have this set (if you save this character, 👾 , and it comes back as ?? instead of 👾 then you have it set).

     

    Take care,

    Solomon...

     

     

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Hi Solomon,

    Sorry not replied to this, December wasn't a great month for me, so didn't have my head in the game. I'll have a look at this later this week and report back. Thanks for your effort on this!

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Yer welcome. And, no worries. 2020 wasn't a great month for me (and it was terribly long) ;-). Just let me know what you observe. If it works I will submit a PR for it.

    Take care,

    Solomon...

     

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Hi Solomon,

    Yep, adding the notation character to that one line appears to do this job. I tested Comments, Posts, and Pages, and all 3 now correctly stored the characters.Well done the work here!

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Great! Glad to hear that the testing so far is looking positive. Do you mind a few more tests, just to help make sure that other scenarios work, or at least aren't broken? One area that requires single quotes but cannot be prefixed with an upper-case "N" is dates. There aren't too many places that would have a date, but perhaps change the post date from "immediate" to either scheduled in the future or post-dated? And creating a post / page is the main thing, but what about round-trip issues when updating? So here is the list of tests that I think will mostly confirm this as a real fix:

    1. Use supplementary character(s) in titles for posts / pages.
    2. Use supplementary character(s) in post description.
    3. Use supplementary character(s) in post slug.
    4. Change post date to a future date.
    5. Change date (for same post that was just set to a future date) to a date in the past.
    6. After creating a post with one or more supplementary characters, update the post with minor changes and save. Does it view correctly? And, can it be edited again without losing the supplementary characters?

    Thanks again for taking the time to test this. If all of these scenarios pass, then I will submit the PR.

    Take care,

    Solomon...

     

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • All looks good; you can see snippet of the silliness below. 1, 2 and 6 I'd already tested. I didn't, however, test changing the scheduled date to a historical one as that would publish the post and I didn't want to do that. I did use a future date, and then changed it to another in the future and it was fine.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Hi Thom. Thanks again for doing this testing. I had forgotten about tags and categories, but I do see that you were able to add "Alien Monster" to them. Cool. And yes, that future date change should be fine.

    Just out of curiosity, since I only mentioned updating the post content before, can you please confirm that you were able to successfully update the following without losing the supplementary character:

    1. Title
    2. Description
    3. Tag name
    4. Category name

     

    Thanks. And in the mean time, I will go ahead and submit that PR given that it's highly unlikely these remaining tests will fail given all that have succeeded so far.

     

    Take care,

    Solomon..

     

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Hi Solomon,

    Yep, I amended everything I could noticed no loss of characters, so it appears that everything does go through that one line that you highlighted for change.

    If there is something missed, hopefully whomever reviews your PR addresses it in the feedback or even adds to it.

    Thanks again!

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply