July 12, 2011 at 10:36 am
In SSIS 2008 importing from AS400 mainframe and got truncation errors. Could not redirect row. Could not handle with replace unwanted characters control from Task Factory . So tried script and although it would strip out the "standard" non-printable characters 0-31 there were other Unicode characters that made it puke at the end of some records. (http://www.ssec.wisc.edu/~tomw/java/unicode.html#x0000)
So after trying alot of things finally got the following script to work in VB.NET. (Rant: Pardon me all you C# Mr. Spok types I am just a SQL DBA and have no desire to ever learn C#, I have enough to worry about with partitioning, replication, clustering etc..! But I did some VB6 and VBScript back in the day so I had to go with what I knew 🙂
At any rate this worked.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim LoopCounter As Integer 'Loop Through valid Character Checks for each Row
Dim StringCounter As Integer 'Loop Through positions in field
Dim LastGoodPosition As Integer
LoopCounter = 0
'Unicode values Between 32 and 591 acceptable Latin characters
'See http://www.ssec.wisc.edu/~tomw/java/unicode.html#x0000
'Remove 0-31 as non printable characters
'If over 591 then remove trailing characters
While LoopCounter <= 591
'Remove non printable characters 0-31
If Row.NAME78.ToString.Contains(ChrW(LoopCounter)) Then
'MsgBox(Row.PRODNU.ToString & " " & Row.NAME78.ToCharArray) 'Debug record key and string value
'MsgBox(AscW(ChrW(LoopCounter))) 'Debug Unicode value
Replace(Row.NAME78.ToString, AscW(ChrW(LoopCounter)), "")
End If
LoopCounter = LoopCounter + 1
'32 through 591 ok skip these
If LoopCounter = 32 Then
LoopCounter = 592
End If
'Remove characters > than 591
If LoopCounter > 591 Then
'MsgBox(Row.PRODNU.ToString & " " & Row.NAME78.ToCharArray) 'Debug record key and string value
'MsgBox("Unicode Character > 591") 'Over value we want to accept
StringCounter = Len(Row.NAME78.ToString) 'Length of NAME78 field
'Start from end of string and work backwards to get last good character
While StringCounter >= 1
'Keep position of last valid value
If AscW(Mid(Row.NAME78.ToString, StringCounter, 1)) > 591 Then
LastGoodPosition = StringCounter - 1
End If
StringCounter = StringCounter - 1
End While
'Keep record up to bad character
Row.NAME78 = Left(Row.NAME78.ToString, LastGoodPosition)
End If
End While
End Sub
July 12, 2011 at 11:44 am
OK corrected last part of script. Still somewhat of a hack but cannot seem to trap character coming in some value 65533 max Unicode value so just try to keep entire record if I can if not then arbitrarily keep first 45 as when I keep 50 it bombs.
'Remove characters over 591
If LoopCounter > 591 Then
'MsgBox(Row.PRODNU.ToString & " " & Row.NAME78.ToCharArray) 'Debug record key and string value
'MsgBox("Unicode Character > 591") 'Over value we want to accept
StringCounter = Len(Row.NAME78.ToString) 'Length of NAME78 field
LastGoodPosition = StringCounter 'Initialize to full string
While StringCounter >= 1
If Len(Mid(Row.NAME78.ToString, StringCounter, 1)) > 0 Then
LastGoodPosition = StringCounter
End If
StringCounter = StringCounter - 1
End While
'Keep record up to bad character
If StringCounter >= 1 Then
Row.NAME78 = Left(Row.NAME78.ToString, LastGoodPosition)
Else
'Bad character keep what you can
Row.NAME78 = Left(Row.NAME78.ToString, 45)
End If
End If
July 13, 2011 at 2:26 pm
My apologies please disregard all prior posts. Corrected and yes tested code below to remove non printable and other unwanted characters.
(Rant: People who post code not thoroughly tested = (schmuck, fool, moron, idiot, Senior Dumas, Me!!)
1000 apologies for prior posts.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Dim LoopCounter As Integer 'Loop Through valid Character Checks for each Row
LoopCounter = 0 'Unicode starts with null character which is 0
'Unicode values Between 32 and 126 acceptable Latin characters
'Unicode values Between 192 and 563 accepatble Latin characters
'See http://www.ssec.wisc.edu/~tomw/java/unicode.html#x0000
'Remove 0-31 as non printable characters and 127-191
While LoopCounter <= 563
'Remove non printable characters 0-31 and 127-191
If Row.NAME78.ToString.Contains(ChrW(LoopCounter)) Then
Row.NAME78 = Replace(Row.NAME78.ToString, ChrW(LoopCounter).ToString, String.Empty)
End If
LoopCounter = LoopCounter + 1
'32 to 126 keep
If LoopCounter = 32 Then
LoopCounter = 127
End If
'192 to 563 keep
If LoopCounter = 192 Then
LoopCounter = 564
End If
End While
End Sub
July 13, 2011 at 2:27 pm
Oh and FYI the other characters besides 0-31 were 147 and 148 that caused the truncation errors. Those were the ones I was not expecting.
July 14, 2011 at 2:28 am
jsquared24 (7/13/2011)
Oh and FYI the other characters besides 0-31 were 147 and 148 that caused the truncation errors. Those were the ones I was not expecting.
Well, you solved your problem entirely yourself in this thread 😀
Anyway, thanks for posting the solution, it will come in handy in the future.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
Viewing 5 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply