• Oleg Netchaev (4/29/2010)


    skjoldtc (4/29/2010)


    65533 is a valid Unicode number, but it represents a special replacement character. It is the highest value in the character set. dbowlin makes a good point about which values to check for. You would have to know the source and target databases to know what's reasonable. I doubt that 65533 is reasonable in 99.9% of cases, but there may be rare instances.

    Good question, though.

    Do you know why 65533 is the highest value? Theoretically, the highest should be 65535. This is consistent with the nchar implementation, for example:

    select nchar(65533); -- returns a value
    select nchar(65535); -- returns a value
    select nchar(65536); -- returns NULL: 65536 is not valid, as it cannot fit into 2 bytes
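    Building on the nchar checks, here is a small sketch that confirms 65533 is U+FFFD and shows one way to find rows containing it. The table variable @t and its rows are hypothetical sample data, and using the binary collation Latin1_General_BIN2 is an assumption made so the match depends on the code point rather than on collation weights.

    ```sql
    -- Sketch only: @t and its rows are made-up sample data.
    select unicode(nchar(65533));                 -- 65533 (U+FFFD)
    select convert(varbinary(2), nchar(65533));   -- 0xFDFF (stored as UTF-16LE)

    declare @t table (id int, txt nvarchar(50));
    insert into @t values (1, N'abc'), (2, N'a' + nchar(65533) + N'c');

    -- Binary collation (assumed choice): compare raw code points,
    -- not collation weights.
    select id
    from @t
    where charindex(nchar(65533), txt collate Latin1_General_BIN2) > 0;  -- returns id 2
    ```

    The multi-row VALUES insert needs SQL Server 2008 or later.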

    Good question, I learned something new today.

    Oleg

    No. I don't know why that is.

    I now recall that the AS400 has a maximum CCSID (coded character set ID) of 65533. That makes sense, since the original data came from a mainframe. IBM mainframes and the AS400 both use EBCDIC. There are some unprintable and undisplayable characters, and 65533 is used as a replacement on those systems. Basically, IIRC, it ends up being a printable/displayable character, defined system-wide, that is substituted. So you could define that a ~ (or any other printable/displayable character) prints or displays instead of an error being thrown.

    At my age, that's a lot to recall, so, I may be mistaken. 😉