|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Wednesday, May 01, 2013 4:52 PM
Points: 1,379,
Visits: 2,626
|
|
skjoldtc (4/29/2010) 65533 is a valid unicode number but represents a special replacement character. It is the highest value in the character set. dbowlin makes a good point about what values to check for. You would ahve to know the source and the target databases to know what's reasonable. I doubt that 65533 is reasonable under 99.9% of the cases but there may be rare instances.
Good question, though.
I believe 65533 is the highest unicode value. I don't think you identified what the junk data actually was but that it is outside of what can be interpreted. Out of curiosity did you look as the offending line, specifically around this area to see what the actuall hex was?
--------------------------------------------------------------------- Use Full Links: KB Article from Microsoft on how to ask a question on a Forum
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Today @ 10:25 AM
Points: 18,752,
Visits: 12,337
|
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 10:53 AM
Points: 1,662,
Visits: 1,709
|
|
skjoldtc (4/29/2010) 65533 is a valid unicode number but represents a special replacement character. It is the highest value in the character set. dbowlin makes a good point about what values to check for. You would ahve to know the source and the target databases to know what's reasonable. I doubt that 65533 is reasonable under 99.9% of the cases but there may be rare instances.
Good question, though. Do you know why 65533 is the highest value? Theoretically, the highest should be 65535. This is consistent with nchar implementation, i.e.
select nchar(65533); -- returns value select nchar(65535); -- returns value select nchar(65536); -- returns null because 65536 is obviously not valid -- as it cannot really fit into 2 bytes Good question, I learned something new today.
Oleg
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Today @ 6:52 AM
Points: 2,018,
Visits: 2,852
|
|
Oleg Netchaev (4/29/2010)
skjoldtc (4/29/2010) 65533 is a valid unicode number but represents a special replacement character. It is the highest value in the character set. dbowlin makes a good point about what values to check for. You would ahve to know the source and the target databases to know what's reasonable. I doubt that 65533 is reasonable under 99.9% of the cases but there may be rare instances.
Good question, though.Do you know why 65533 is the highest value? Theoretically, the highest should be 65535. This is consistent with nchar implementation, i.e. select nchar(65533); -- returns value select nchar(65535); -- returns value select nchar(65536); -- returns null because 65536 is obviously not valid -- as it cannot really fit into 2 bytes Good question, I learned something new today. Oleg
No. I don't know why that is.
I now recall that the AS400 has a max CCSID (coded character set ID) of 65533. That makes sense since the original data came from a mainframe. IBM mainframes and AS400 both use EBCDIC. There are some unprintable and undisplayable characters and 65533 is used as a replacement on those systems. Basically, IIRC, it ends being a printable/displayble character defined system-wide that is substituted. So, you could define that a ~ (or any other printable/displayable character) prints or displays instead of throwing an error.
At my age, that's a lot to recall, so, I may be mistaken.
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 10:53 AM
Points: 1,662,
Visits: 1,709
|
|
skjoldtc (4/29/2010)
No. I don't know why that is. I now recall that the AS400 has a max CCSID (coded character set ID) of 65533. That makes sense since the original data came from a mainframe. IBM mainframes and AS400 both use EBCDIC. There are some unprintable and undisplayable characters and 65533 is used as a replacement on those systems. Basically, IIRC, it ends being a printable/displayble character defined system-wide that is substituted. So, you could define that a ~ (or any other printable/displayable character) prints or displays instead of throwing an error. At my age, that's a lot to recall, so, I may be mistaken.  Thank you very much for information, now everything makes perfect sense. I just found on the List of Unicode Characters page that the maximum available printable character code is not FFFF (65535) like I assumed, but is indeed FFFD (65533), and it is officially called "Replacement Character". 65534 and 65535 do not represent anything, they are so-called noncharacters much like anything in the range from FDD0 to FDEF (64976 to 65007).
Oleg
|
|
|
|
|
SSC-Dedicated
           
Group: Administrators
Last Login: Today @ 9:24 AM
Points: 31,433,
Visits: 13,748
|
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Yesterday @ 1:37 PM
Points: 579,
Visits: 869
|
|
As usual the discussion adds to the knowledge I gain from the QotD.
Thanks for the knowledge.
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Saturday, May 18, 2013 5:49 AM
Points: 202,
Visits: 1,043
|
|
Hey,
There is missing any concrete example "Not Concrete Programming " to get rid of problem "Data could not be imported because text was truncated or characters do not exist in the destination codepage"
If i identified it is the code page conflict... how it can be done with in the code, rather than to manually change the destination/code pages.
Any Concrete or Flexible bee over it?
Thanks
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Wednesday, May 08, 2013 3:44 AM
Points: 2,173,
Visits: 1,350
|
|
| The CODEPOINT() is a rarely used function in the SSIS expression language, but can be an effective tool in your ETL arsenal in some cases.
|
|
|
|
|
Hall of Fame
       
Group: General Forum Members
Last Login: Yesterday @ 8:31 AM
Points: 3,129,
Visits: 4,312
|
|
Abrar Ahmad_ (4/29/2010) Hey,
There is missing any concrete example "Not Concrete Programming " to get rid of problem "Data could not be imported because text was truncated or characters do not exist in the destination codepage"
If i identified it is the code page conflict... how it can be done with in the code, rather than to manually change the destination/code pages.
Any Concrete or Flexible bee over it?
Thanks
In SSIS, an example of how to identify and correct would be:
CODEPOINT(ColumnName) < MinCodePageValue || CODEPOINT(ColumnName) >MaxCodePageValue ? (DT_WSTR,26)NULL(DT_WSTR,26): (DT_WSTR,26)ColumnName
a T-SQL equivalent would be:
SELECT CASE WHEN ASCII(ColumnName) < MinCodePageValue OR ASCII(ColumnName) > MaxCodePageValue THEN NULL ELSE ColumnName END AS ColumnName . . .
____________________________________________ Space, the final frontier? not any more... All limits henceforth are self-imposed. “libera tute vulgaris ex”
|
|
|
|