• the sqlist - Wednesday, June 28, 2017 9:59 AM

    @Luis Cazares
    Your code is faulty. That STUFF is not needed and it causes at least the first letter omission from the text. This is how it should be:

    WITH cteBook AS(
    SELECT
        *,
      (
            SELECT SUBSTRING( TEXT, n, 1)
            FROM Tally
            WHERE SUBSTRING( TEXT, n, 1) LIKE '[0-9a-zA-Z ,.:;'']' COLLATE Latin1_General_Bin
            AND n <= LEN(TEXT)
            FOR XML PATH(''), TYPE
        ).value('./text()[1]', 'varchar(max)') CleanText
    FROM #BOOK
    )
    UPDATE cteBook
    SET TEXT = CleanText;

    The WHERE clause in the CleanText subquery already filters out the characters that don't match the regular expression. However this only eliminates "bad" characters but does not replace them with something equivalent in the case of encoding mismatch. For instance, how do you solve a text like this:

    Sir George Étienne Cartier Parkway

    If you eliminate Ãƒâ€° the result will be an incomplete street name.

    You're right, I forgot to remove that when I included the template. The code becomes even simpler. If a replacement is needed, we would just need to use a CASE expression and remove the WHERE clause. I coded it to only remove characters because that's what the code from the article is doing. I'm usually against changing data like this because it's easy to get into trouble and change data that shouldn't be changed.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2