﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Stewart Campbell  / Bad data import / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Thu, 23 May 2013 12:47:15 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>Very interesting.Thanks!</description><pubDate>Fri, 30 Apr 2010 11:02:55 GMT</pubDate><dc:creator>Tom Garth</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[quote][b]stewartc-708166 (4/30/2010)[/b][hr][quote]In SSIS, an example of how to identify and correct would be:[code="vb"]CODEPOINT([i]ColumnName[/i]) &amp;lt; [i]MinCodePageValue[/i] || CODEPOINT([i]ColumnName[/i]) &amp;gt;[i]MaxCodePageValue[/i] ? (DT_WSTR,26)NULL(DT_WSTR,26): (DT_WSTR,26)ColumnName[/code]a T-SQL equivalent would be:[code="sql"]SELECT CASE     WHEN ASCII(ColumnName) &amp;lt; [i]MinCodePageValue[/i] OR ASCII(ColumnName) &amp;gt; [i]MaxCodePageValue[/i] THEN NULL     ELSE ColumnNameEND AS ColumnName...[/code][/quote]Rather than use NULL here, mightn't it be better to find out what replacement character the mainframe app would have used if required to display this data and use that character?very good question and discussion.  Lots for me to learn from it.</description><pubDate>Fri, 30 Apr 2010 03:39:00 GMT</pubDate><dc:creator>L' Eomot Inversé</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[quote][b]Abrar Ahmad_ (4/29/2010)[/b][hr][font="Verdana"]Hey, There is missing any concrete example "Not Concrete Programming :hehe: " to get rid of problem "Data could not be imported because text was truncated or characters do not exist in the destination codepage"If i identified it is the code page conflict... how it can be done with in the code, rather than to manually change the destination/code pages.Any Concrete or Flexible bee over it?Thanks[/font][/quote]In SSIS, an example of how to identify and correct would be:[code="vb"]CODEPOINT([i]ColumnName[/i]) &amp;lt; [i]MinCodePageValue[/i] || CODEPOINT([i]ColumnName[/i]) &amp;gt;[i]MaxCodePageValue[/i] ? (DT_WSTR,26)NULL(DT_WSTR,26): (DT_WSTR,26)ColumnName[/code]a T-SQL equivalent would be:[code="sql"]SELECT CASE     WHEN ASCII(ColumnName) &amp;lt; [i]MinCodePageValue[/i] OR ASCII(ColumnName) &amp;gt; [i]MaxCodePageValue[/i] THEN NULL     ELSE ColumnNameEND AS ColumnName...[/code]</description><pubDate>Fri, 30 Apr 2010 03:17:54 GMT</pubDate><dc:creator>Stewart "Arturius" Campbell</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>The CODEPOINT() is a rarely used function in the SSIS expression language, but can be an effective tool in your ETL arsenal in some cases.</description><pubDate>Fri, 30 Apr 2010 02:51:36 GMT</pubDate><dc:creator>Sree Arjun Div</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[font="Verdana"]Hey, There is missing any concrete example "Not Concrete Programming :hehe: " to get rid of problem "Data could not be imported because text was truncated or characters do not exist in the destination codepage"If i identified it is the code page conflict... how it can be done with in the code, rather than to manually change the destination/code pages.Any Concrete or Flexible bee over it?Thanks[/font]</description><pubDate>Thu, 29 Apr 2010 23:31:04 GMT</pubDate><dc:creator>Abrar Ahmad_</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>As usual the discussion adds to the knowledge I gain from the QotD.Thanks for the knowledge.    :-)</description><pubDate>Thu, 29 Apr 2010 15:17:03 GMT</pubDate><dc:creator>jlennartz</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>Very interesting question. Once again the debate has proven extremely valuable.</description><pubDate>Thu, 29 Apr 2010 13:42:47 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[quote][b]skjoldtc (4/29/2010)[/b][hr]No. I don't know why that is. I now recall that the AS400 has a max CCSID (coded character set ID) of 65533. That makes sense since the original data came from a mainframe. IBM mainframes and AS400 both use EBCDIC. There are some unprintable and undisplayable characters and 65533 is used as a replacement on those systems. Basically, IIRC, it ends being a printable/displayble character defined system-wide that is substituted. So, you could define that a ~ (or any other printable/displayable character) prints or displays instead of throwing an error. At my age, that's a lot to recall, so, I may be mistaken. ;-)[/quote]Thank you very much for information, now everything makes perfect sense. I just found on the [url=http://en.wikipedia.org/wiki/List_of_Unicode_characters]List of Unicode Characters[/url] page that the maximum available printable character code is not FFFF (65535) like I assumed, but is indeed FFFD (65533), and it is officially called "Replacement Character". 65534 and 65535 do not represent anything, they are so-called noncharacters much like anything in the range from FDD0 to FDEF (64976 to 65007).Oleg</description><pubDate>Thu, 29 Apr 2010 12:57:28 GMT</pubDate><dc:creator>Oleg Netchaev</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[quote][b]Oleg Netchaev (4/29/2010)[/b][hr][quote][b]skjoldtc (4/29/2010)[/b][hr]65533 is a valid unicode number but represents a special replacement character. It is the highest value in the character set.  dbowlin makes a good point about what values to check for. You would ahve to know the source and the target databases to know what's reasonable. I doubt that 65533 is reasonable under 99.9% of the cases but there may be rare instances. Good question, though.[/quote]Do you know why 65533 is the highest value? Theoretically, the highest should be 65535. This is consistent with nchar implementation, i.e.[code="sql"]select nchar(65533); -- returns valueselect nchar(65535); -- returns valueselect nchar(65536); -- returns null because 65536 is obviously not valid-- as it cannot really fit into 2 bytes[/code]Good question, I learned something new today.Oleg[/quote]No. I don't know why that is. I now recall that the AS400 has a max CCSID (coded character set ID) of 65533. That makes sense since the original data came from a mainframe. IBM mainframes and AS400 both use EBCDIC. There are some unprintable and undisplayable characters and 65533 is used as a replacement on those systems. Basically, IIRC, it ends being a printable/displayble character defined system-wide that is substituted. So, you could define that a ~ (or any other printable/displayable character) prints or displays instead of throwing an error. At my age, that's a lot to recall, so, I may be mistaken. ;-)</description><pubDate>Thu, 29 Apr 2010 12:12:23 GMT</pubDate><dc:creator>OCTom</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[quote][b]skjoldtc (4/29/2010)[/b][hr]65533 is a valid unicode number but represents a special replacement character. It is the highest value in the character set.  dbowlin makes a good point about what values to check for. You would ahve to know the source and the target databases to know what's reasonable. I doubt that 65533 is reasonable under 99.9% of the cases but there may be rare instances. Good question, though.[/quote]Do you know why 65533 is the highest value? Theoretically, the highest should be 65535. This is consistent with nchar implementation, i.e.[code="sql"]select nchar(65533); -- returns valueselect nchar(65535); -- returns valueselect nchar(65536); -- returns null because 65536 is obviously not valid-- as it cannot really fit into 2 bytes[/code]Good question, I learned something new today.Oleg</description><pubDate>Thu, 29 Apr 2010 11:13:40 GMT</pubDate><dc:creator>Oleg Netchaev</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>Great question.  I learned something from this one.Nice job.</description><pubDate>Thu, 29 Apr 2010 09:47:02 GMT</pubDate><dc:creator>SQLRNNR</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[quote][b]skjoldtc (4/29/2010)[/b][hr]65533 is a valid unicode number but represents a special replacement character. It is the highest value in the character set.  dbowlin makes a good point about what values to check for. You would ahve to know the source and the target databases to know what's reasonable. I doubt that 65533 is reasonable under 99.9% of the cases but there may be rare instances. Good question, though.[/quote]I believe 65533 is the highest unicode value.  I don't think you identified what the junk data actually was but that  it is outside of what can be interpreted.  Out of curiosity did you look as the offending line, specifically around this area to see what the actuall hex was?</description><pubDate>Thu, 29 Apr 2010 09:19:44 GMT</pubDate><dc:creator>Trey Staker</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>Yeah, sounds like a HIGH-VALUE declaration in the COBOL (or whatever) code.</description><pubDate>Thu, 29 Apr 2010 08:37:01 GMT</pubDate><dc:creator>Phil Brammer</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>65533 is a valid unicode number but represents a special replacement character. It is the highest value in the character set.  dbowlin makes a good point about what values to check for. You would ahve to know the source and the target databases to know what's reasonable. I doubt that 65533 is reasonable under 99.9% of the cases but there may be rare instances. Good question, though.</description><pubDate>Thu, 29 Apr 2010 08:11:27 GMT</pubDate><dc:creator>OCTom</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>Nice question.</description><pubDate>Thu, 29 Apr 2010 08:03:08 GMT</pubDate><dc:creator>Steve Cullen</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[quote][b]stewartc-708166 (4/29/2010)[/b][hr][quote][b]Hugo Kornelis (4/29/2010)[/b][hr]I got it right, because of elimination, but I don't understand the entire scenario.Since the destination column is [b]N[/b]VARCHAR(26), how can a value be outside the codepage range? I though codepages were relevant for non-Unicode data only?[/quote]in the situation that gave rise to this question, CODEPOINT identified the value of the character as 65533.As far as I am aware, there is no charater (in any character set, albeit unicode or non-unicode) with that value.[/quote]This is very interesting.  I am wondering where to practically draw the line with regard to how high a number resulting from a CODEPOINT test to draw the line between alpha/numeric characters from any language and other symbols/characters.  Looking at this reference:[url]http://www.ssec.wisc.edu/~tomw/java/unicode.html[/url] I see there is a Unicode character(?) for the value 65533.  Additional searching shows that Unicode goes up to 10FFFF or 1114111.  I suspect the place to draw the line is either 65518 or 65276.  I would certainly appreciate a link reference or a bit of feedback with regard to how to slice Unicode for the data flow in this question.Thanks.</description><pubDate>Thu, 29 Apr 2010 07:34:56 GMT</pubDate><dc:creator>Daniel Bowlin</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[quote][b]Hugo Kornelis (4/29/2010)[/b][hr][quote][b]stewartc-708166 (4/29/2010)[/b][hr][quote][b]Hugo Kornelis (4/29/2010)[/b][hr]I got it right, because of elimination, but I don't understand the entire scenario.Since the destination column is [b]N[/b]VARCHAR(26), how can a value be outside the codepage range? I though codepages were relevant for non-Unicode data only?[/quote]in the situation that gave rise to this question, CODEPOINT identified the value of the character as 65533.As far as I am aware, there is no charater (in any character set, albeit unicode or non-unicode) with that value.[/quote]So that was a bug in the mainframe program that exportet the file, then?[/quote]that is correct.identifying the record and referring it to the custodian facilitated the extract to be rectified.</description><pubDate>Thu, 29 Apr 2010 04:53:34 GMT</pubDate><dc:creator>Stewart "Arturius" Campbell</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[quote][b]stewartc-708166 (4/29/2010)[/b][hr][quote][b]Hugo Kornelis (4/29/2010)[/b][hr]I got it right, because of elimination, but I don't understand the entire scenario.Since the destination column is [b]N[/b]VARCHAR(26), how can a value be outside the codepage range? I though codepages were relevant for non-Unicode data only?[/quote]in the situation that gave rise to this question, CODEPOINT identified the value of the character as 65533.As far as I am aware, there is no charater (in any character set, albeit unicode or non-unicode) with that value.[/quote]So that was a bug in the mainframe program that exportet the file, then?</description><pubDate>Thu, 29 Apr 2010 04:46:58 GMT</pubDate><dc:creator>Hugo Kornelis</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>[quote][b]Hugo Kornelis (4/29/2010)[/b][hr]I got it right, because of elimination, but I don't understand the entire scenario.Since the destination column is [b]N[/b]VARCHAR(26), how can a value be outside the codepage range? I though codepages were relevant for non-Unicode data only?[/quote]in the situation that gave rise to this question, CODEPOINT identified the value of the character as 65533.As far as I am aware, there is no charater (in any character set, albeit unicode or non-unicode) with that value.</description><pubDate>Thu, 29 Apr 2010 04:42:03 GMT</pubDate><dc:creator>Stewart "Arturius" Campbell</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>I got it right, because of elimination, but I don't understand the entire scenario.Since the destination column is [b]N[/b]VARCHAR(26), how can a value be outside the codepage range? I though codepages were relevant for non-Unicode data only?</description><pubDate>Thu, 29 Apr 2010 00:57:59 GMT</pubDate><dc:creator>Hugo Kornelis</dc:creator></item><item><title>RE: Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>Very nice question. I work a lot with SSIS, but I hadn't heard of this function yet.</description><pubDate>Thu, 29 Apr 2010 00:49:05 GMT</pubDate><dc:creator>Koen Verbeeck</dc:creator></item><item><title>Bad data import</title><link>http://www.sqlservercentral.com/Forums/Topic912500-1684-1.aspx</link><description>Comments posted to this topic are about the item [B]&lt;A HREF="/questions/Integration+Services+(SSIS)/69850/"&gt;Bad data import&lt;/A&gt;[/B]</description><pubDate>Wed, 28 Apr 2010 21:26:35 GMT</pubDate><dc:creator>Stewart "Arturius" Campbell</dc:creator></item></channel></rss>