SQLServerCentral is supported by Redgate
 
Bad data import


Trey Staker
Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1272 Visits: 2788
skjoldtc (4/29/2010)
65533 is a valid Unicode number but represents a special replacement character. It is the highest value in the character set. dbowlin makes a good point about what values to check for. You would have to know the source and the target databases to know what's reasonable. I doubt that 65533 is reasonable in 99.9% of cases, but there may be rare instances.

Good question, though.


I believe 65533 is the highest Unicode value. I don't think you identified what the junk data actually was, only that it was outside what could be interpreted. Out of curiosity, did you look at the offending line, specifically around this area, to see what the actual hex was?
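To see the actual hex of the junk, you can dump the code point of each suspicious character in the offending line. Below is a minimal Python sketch. Python is used only for illustration, since the thread's own code is T-SQL and SSIS. The function name `report_suspect_chars` and the `max_code_point` threshold are hypothetical. The default of 0x7F assumes a plain-ASCII destination.

```python
from typing import List, Tuple


def report_suspect_chars(line: str, max_code_point: int = 0x7F) -> List[Tuple[int, str, str]]:
    """Return (position, code point in hex, character) for each character that is
    U+FFFD or lies above max_code_point (a hypothetical, caller-chosen threshold)."""
    suspects = []
    for position, char in enumerate(line):
        code_point = ord(char)
        # U+FFFD is flagged explicitly so it is still caught with a high threshold
        if code_point == 0xFFFD or code_point > max_code_point:
            suspects.append((position, f"U+{code_point:04X}", char))
    return suspects
```

For example, `report_suspect_chars("ACME Corp\ufffd 42")` returns `[(9, 'U+FFFD', '\ufffd')]`. This shows both where the replacement character sits and its hex value.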

---------------------------------------------------------------------
Use Full Links:
KB Article from Microsoft on how to ask a question on a Forum
SQLRNNR
SSC-Dedicated (32K reputation)

Group: General Forum Members
Points: 32266 Visits: 18552
Great question. I learned something from this one.

Nice job.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server, MVP


SQL RNNR

Posting Performance Based Questions - Gail Shaw

Oleg Netchaev
SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1777 Visits: 1813
skjoldtc (4/29/2010)
65533 is a valid Unicode number but represents a special replacement character. It is the highest value in the character set.

Do you know why 65533 is the highest value? Theoretically, the highest should be 65535. This is consistent with the NCHAR implementation:

SELECT NCHAR(65533); -- returns a value: U+FFFD, the replacement character
SELECT NCHAR(65535); -- returns a value: U+FFFF
SELECT NCHAR(65536); -- returns NULL, because 65536 cannot fit into 2 bytes
                     -- (NCHAR accepts 0-65535 unless the database collation has the _SC flag)
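The 2-byte limit can be checked by counting how many 16-bit UTF-16 code units a code point needs. NCHAR data is stored as UTF-16/UCS-2 units, so any code point that needs more than one unit no longer fits into 2 bytes. This Python sketch is an illustration only, not SQL Server code. The helper name `utf16_code_units` is hypothetical.

```python
def utf16_code_units(code_point: int) -> int:
    """Count the 16-bit code units needed to encode one code point in UTF-16.

    Code points 0..0xFFFF need one unit (2 bytes); 0x10000 and above need a
    surrogate pair (4 bytes). Surrogate code points themselves (0xD800..0xDFFF)
    cannot be encoded on their own and raise UnicodeEncodeError.
    """
    encoded = chr(code_point).encode("utf-16-le")  # little-endian, no byte order mark
    return len(encoded) // 2


for value in (65533, 65535, 65536):
    print(value, hex(value), utf16_code_units(value))
```

The loop reports 1, 1 and 2 code units respectively. This matches NCHAR returning a value for 65533 and 65535 but NULL for 65536.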



Good question. I learned something new today.

Oleg
OCTom
Hall of Fame (3.1K reputation)

Group: General Forum Members
Points: 3121 Visits: 4152
Oleg Netchaev (4/29/2010)
Do you know why 65533 is the highest value? Theoretically, the highest should be 65535.


No. I don't know why that is.

I now recall that the AS400 has a max CCSID (coded character set ID) of 65533. That makes sense, since the original data came from a mainframe. IBM mainframes and the AS400 both use EBCDIC. Some characters are unprintable or undisplayable, and 65533 is used as a replacement on those systems. Basically, IIRC, it ends up being a system-wide printable/displayable character that is substituted. So you could define that a ~ (or any other printable/displayable character) is printed or displayed instead of an error being thrown.

At my age, that's a lot to recall, so I may be mistaken. ;-)
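The substitution idea is not unique to the AS400. Many conversion layers replace input they cannot map with a placeholder, and in Unicode that placeholder is U+FFFD (65533). The example below uses Python's codecs, not the AS400 or EBCDIC mechanism described above. Decoding with `errors="replace"` shows how 65533 can end up in imported data. The function name is hypothetical.

```python
def decode_with_substitution(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes, swapping in U+FFFD (65533) for any byte sequence the
    codec cannot map, instead of raising UnicodeDecodeError."""
    return raw.decode(encoding, errors="replace")


# 0xE9 starts a multi-byte UTF-8 sequence that never finishes, so it is replaced
print([ord(ch) for ch in decode_with_substitution(b"caf\xe9")])
# 0x81 is undefined in Windows-1252 (cp1252), so it is replaced as well
print([ord(ch) for ch in decode_with_substitution(b"\x81", "cp1252")])
```

The first call yields `[99, 97, 102, 65533]` and the second yields `[65533]`. In both cases the undecodable byte silently becomes the replacement character.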
Oleg Netchaev
SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1777 Visits: 1813
skjoldtc (4/29/2010)
I now recall that the AS400 has a max CCSID (coded character set ID) of 65533.

Thank you very much for the information; now everything makes perfect sense. I just found on the List of Unicode Characters page that the maximum available printable character code in the 16-bit range (the Basic Multilingual Plane) is not FFFF (65535), as I assumed, but FFFD (65533), and it is officially called "Replacement Character". 65534 and 65535 (FFFE and FFFF) do not represent anything; they are so-called noncharacters, much like everything in the range from FDD0 to FDEF (64976 to 65007).
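The noncharacter ranges listed above can be expressed as a small check. This Python sketch is illustrative, and the function name is hypothetical. Beyond the 16-bit range discussed here, it also covers the last two code points of every Unicode plane, which are noncharacters as well, for 66 in total.

```python
def is_noncharacter(code_point: int) -> bool:
    """Return True if code_point is a Unicode noncharacter.

    Noncharacters are U+FDD0..U+FDEF plus the last two code points of each of
    the 17 planes (U+FFFE/U+FFFF, U+1FFFE/U+1FFFF, ..., U+10FFFE/U+10FFFF).
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xFDD0 <= code_point <= 0xFDEF:
        return True
    # The low 16 bits are 0xFFFE or 0xFFFF exactly when bits 1-15 are all set
    return (code_point & 0xFFFE) == 0xFFFE
```

With this check, 65533 (FFFD) is a real character, while 65534 and 65535 are not.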

Oleg
Steve Jones
SSC Guru (62K reputation)

Group: Administrators
Points: 62390 Visits: 19102
Very interesting question. Once again the debate has proven extremely valuable.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
jlennartz
SSC Eights! (830 reputation)

Group: General Forum Members
Points: 830 Visits: 1197
As usual the discussion adds to the knowledge I gain from the QotD.

Thanks for the knowledge. :-)
Abrar Ahmad_
SSC-Addicted (444 reputation)

Group: General Forum Members
Points: 444 Visits: 1294
Hey,

What is missing is a concrete example (not "concrete programming", hehe) of how to get rid of the error "Data could not be imported because text was truncated or characters do not exist in the destination codepage".

If I have identified that it is a code page conflict, how can it be handled within the code, rather than by manually changing the destination code pages?

Any concrete or flexible ideas on this?

Thanks
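One programmatic version of the check that the error message describes is to ask whether every character of a value can be encoded in the destination code page. Below is a minimal Python sketch of that idea. The function name is hypothetical, and cp1252 (Windows-1252) is only an assumed destination code page. Returning None mirrors the replace-with-NULL approach; it is not a built-in SSIS behavior.

```python
from typing import Optional


def to_code_page_or_none(value: str, code_page: str = "cp1252") -> Optional[str]:
    """Return value unchanged if every character exists in code_page, else None.

    code_page is an assumed destination code page; use the one your target
    column actually has.
    """
    try:
        value.encode(code_page)  # raises if any character has no mapping
    except UnicodeEncodeError:
        return None
    return value
```

For instance, "café" passes for cp1252, while a value containing U+FFFD does not. The euro sign passes for cp1252 but fails for latin-1, which is why the destination code page has to be known.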

Arjun SreeVastsva
SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2672 Visits: 1658
CODEPOINT() is a rarely used function in the SSIS expression language, but it can be an effective tool in your ETL arsenal in some cases.
Stewart "Arturius" Campbell
SSCrazy Eights (8.9K reputation)

Group: General Forum Members
Points: 8859 Visits: 7281
Abrar Ahmad_ (4/29/2010)
If I have identified that it is a code page conflict, how can it be handled within the code, rather than by manually changing the destination code pages?


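In SSIS, an example of how to identify and correct this would be the following, where MinCodePageValue and MaxCodePageValue stand for the lowest and highest code points accepted by the destination code page:

CODEPOINT(ColumnName) < MinCodePageValue || CODEPOINT(ColumnName) > MaxCodePageValue
    ? NULL(DT_WSTR,26)
    : (DT_WSTR,26)ColumnName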



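A T-SQL equivalent would be:

SELECT CASE
           -- UNICODE() (not ASCII()) returns the code point of the first character,
           -- matching CODEPOINT() in SSIS; the bounds are variables holding the valid range
           WHEN UNICODE(ColumnName) < @MinCodePageValue
             OR UNICODE(ColumnName) > @MaxCodePageValue THEN NULL
           ELSE ColumnName
       END AS ColumnName
.
.
.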
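CODEPOINT() in SSIS, like ASCII() and UNICODE() in T-SQL, only examines the leftmost character. A bad character later in the value would therefore pass the test above. Below is a minimal Python sketch that applies the same min/max range test to every character instead. The function and parameter names are hypothetical; the bounds play the role of MinCodePageValue and MaxCodePageValue.

```python
from typing import Optional


def null_if_out_of_range(value: Optional[str], min_code_point: int, max_code_point: int) -> Optional[str]:
    """Return None if any character's code point falls outside
    [min_code_point, max_code_point]; otherwise return value unchanged."""
    if value is None:  # NULL in, NULL out
        return None
    if all(min_code_point <= ord(ch) <= max_code_point for ch in value):
        return value
    return None
```

With a printable-ASCII range of 0x20 to 0x7E, "a\ufffdc" returns None. A first-character-only check would let that value through, because "a" is in range.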



____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
“libera tute vulgaris ex”