Need to replace HTML encoded special characters and language charcarters in tables

  • Historically entries have been stored in a DB with web styling encoding and are similar to these (i know bad practice):

    select 'Α&ampNu;ΤΩΝΙ&ampOmicron;&ampUpsilon;'

    UNION ALL select 'ΑΡΒΑΝΙΤΗΣ'

    UNION ALL select 'Αρβανιτάκη'

    UNION ALL select 'Ασημένι&ampomicron;&ampupsilon;'

    Are there any functions in SQL that can decode this or will i need to call a web service from SSIS to get to HTML.decode?

    They should be stored as follows:

    ΑΝΤΩΝΙΟΥ

    ΑΡΒΑΝΙΤΗΣ

    Αρβανιτάκη

    Ασημένιου

    Thanks

    B

  • This works for me:

    DECLARE @XmlEntities as NVarchar(MAX) = N'<?xml version="1.0" encoding="utf-16"?>

    <!DOCTYPE schema [

    <!ENTITY Alpha "?">

    <!ENTITY ampNu "?">

    <!ENTITY Nu "?">

    <!ENTITY Tau "T">

    <!ENTITY Omega "O">

    <!ENTITY Omicron "?">

    <!ENTITY Iota "?">

    <!ENTITY Upsilon "?">

    <!ENTITY Rho "?">

    <!ENTITY Beta "?">

    <!ENTITY Eta "?">

    <!ENTITY Sigma "S">

    <!ENTITY alpha "a">

    <!ENTITY nu "?">

    <!ENTITY tau "t">

    <!ENTITY omega "O">

    <!ENTITY omicron "?">

    <!ENTITY iota "?">

    <!ENTITY upsilon "?">

    <!ENTITY rho "?">

    <!ENTITY beta "ß">

    <!ENTITY eta "?">

    <!ENTITY sigma "s">

    <!ENTITY Kappa "K">

    <!ENTITY kappa "?">

    <!ENTITY mu "µ">

    ]>

    ';

    WITH cte as

    (

    select '&ampAlpha;&ampampNu;&ampTau;&ampOmega;&ampNu;&ampIota;&ampOmicron;&ampUpsilon;' As htxt

    UNION ALL select '&ampAlpha;&ampRho;&ampBeta;&ampAlpha;&ampNu;&ampIota;&ampTau;&ampEta;&ampSigma;'

    UNION ALL select '&ampAlpha;&amprho;&ampbeta;&ampalpha;&ampnu;&ampiota;&amptau;ά&ampkappa;&ampeta;'

    UNION ALL select '&ampAlpha;&ampsigma;&ampeta;&ampmu;έ&ampnu;&ampiota;&ampomicron;&ampupsilon;'

    )

    , cteWrapped As

    (

    SELECT CONVERT(XML, @XmlEntities+N'<txt>'+htxt+N'</txt>', 2) as xWrapped

    FROM cte

    )

    SELECT T.c.value('.','nvarchar(80)')

    FROM cteWrapped As w

    CROSS APPLY w.xWrapped.nodes('/txt') T(c)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks Barry, thats pretty awesome. Is there anywhere I can a full list of characters its not just Greek specific.

    D

  • bugg (11/17/2013)


    Thanks Barry, thats pretty awesome. Is there anywhere I can a full list of characters its not just Greek specific.

    D

    Sure, here: http://www.fileformat.info/info/charset/UTF-16/list.htm, but you're looking at encoding all of Unicode/UTF-16. I think you'd be better off taking another approach (i.e., using SQLCLR, or fixing it before it gets into SQL).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Hi Barry

    Thanks for that list, this is going to be a one off job to migrate some data, so still toying with using SQLCLR to do this.

    Cheers

    D

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply