June 5, 2026 at 7:12 pm
hi, we couldnt get our upstream data source developers to supply what is sometimes chinese and i think sometimes greek symbol (and maybe other non printables) laden city, state/province columns in english instead. so for now, we want to fill the columns with blanks if the sql condition i show way below is true. we dont have the band width right now to make our warehouse city, stat and country codes nvarchar.
in the next post i'll show an image that makes me think this file is capable of containing non ascii foreign symbols.
the data comes in a utf-8 file which i dont think we control. the code page on the ssis connector is 1252 latin_1. i changed the data type on the staging city , state, country columns to nvarchar. otherwise i think ssis / sql land ascii in staging. in the advanced ssis editor i tried changing the 3 external and output columns to dt_wstr instead of dt_str but they keep changing back. and of course ssis is complaining that my target nvarchars arent compatible with my source dt_str's. when i create 3 derived columns for these 3 which default to unicode, i get an error in execution about no equiv in tyhe target code page.
the best i can tell, leaving this pkg as is (before the nvarchar changes) lands these foreign symbols in varchar as jibberish. i dont think my sql statement can detect some of the jibberish i see landing. interestingly to me, i did see in another app that oomlaats do land in pkgs whose staging is varchar. what i may start doing is eyeballing the jibberish and all the english symbols and tweaking that sql statement to blank out any geo field that doesnt contain the most common english symbols.
Can someone offer some advice?
case when ((coalesce(cstage.[CustomerCity],'')<>'') and (cstage.[CustomerCity] not LIKE '%[^ -~]%')) then ''
else cstage.[CustomerCity] end
June 5, 2026 at 7:19 pm

June 9, 2026 at 2:30 pm
hard lesson, i guess that sql statement doesn't do what i was hoping, ie hold true in the when clause if even one character is not in the list. i think what it is doing is making sure at least one character in customercity satisfies the criterion. I'm using the code you see below right now but it doesn't work. An example of what doesnt work is a city named Sant’Antimo. Probably because of the A.
i think what i'll need to do is shred each city into individual characters and then check that no "row" of an individual character from "this" customers city column meets the "when" criterion.
select customercity--,customerstate,
case when ((coalesce(cstage.[CustomerCity],'')<>'') and (cstage.[CustomerCity] not LIKE '%[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ,-.abcdefghijklmnopqrstuvwxyz]%')) then ''
else cstage.[CustomerCity] end
from customer cstage
June 9, 2026 at 2:46 pm
this is looking more promising for my needs, maybe not yours. i'll have to figure out how to allow single quote marks, probably by using the replace function. There is a space in that list if its not obvious. i may add parens as allowable too. Im guessing i didnt need to list the lower cases but may leave them in there for clarity.
select customercity,--customerstate,
case when ((coalesce(cstage.[CustomerCity],'')<>'') and (cstage.[CustomerCity] LIKE '%[^ /0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ,-.abcdefghijklmnopqrstuvwxyz]%')) then ''
else cstage.[CustomerCity] end fixed
from customer cstage
Viewing 4 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply