hi, i spent some time today in an existing pkg replumbing 5 flat file connectors to new tsv files. eventually that will be automated.
2 of my connectors show the bizarre character shown below prior to my first hdg name. on one i'm not sure what i did but i chose the file in the connector again and the bizarre character went away. on the other no.
i set up a separate pkg pointing 2 connectors to these files and dont see in columns or advanced that funny character.
i opened each (they are tsvs) in excel and answered those questions about what excel sees as delims etc and dont see the funny character.
does anybody know whats going on?
December 5, 2024 at 12:49 am
that looks like a "UTF-8 with BOM" file being read as non unicode.
ensure you use the correct definition when defining the connector - and fix what generates the file not to generate a UTF file and recreate the connector
December 16, 2024 at 5:21 pm
sorry for the delay. Frederico are you saying that if i found myself changing the conn mgrs from code pg 65001 (utf-8) to 1252 Ansi Latin, that i was telling ssis "even though you found utf encoding with bom, read the tsv files as non unicode?".
I didnt even know that was possible.
is reading utf with 65001 so bad? except maybe for finding myself using derived columns between source and destination as a workaround which i hate?
I'm going to open a file or 2 with notepad to see if these are coming in as utf. I know determining bom requies special software. If i can talk my upstream peer to use other than utf encoding, what would you suggest? preferably something easy as i dont know what kind of tools she has available up there. they arent on our network yet so we are stuck in a circle of hell with bad data types etc on tsv files.
I see via notepad she sent 4 utf files and one ansi. im looking at the list of ansi encodings in conn mgrs and notice that 1252 looks the most familiar. i believe we can go in ay direction, hoping for advice from you and the community first.
December 16, 2024 at 8:47 pm
it seems to me utf is strategic if (like us) you expect some extended character set data from europe etc.
if thats a decent idea, does anybody know what the general approach would be to removing the bom? If there even is a general approach?
think you failed to understand what I intended.
when you create a flat file connection manager you should define it's encoding before opening the file so headers get populated correctly. doing this will cause SSIS to put the correct header without those symbols.
if instead you define a file as non UTF, SSIS will interpret those 4 chars as being part of the first column name.
regarding the source of files - if some are UTF then you most likely will need to have your file defined as UTF as well - SSIS should still read the non UTF files fine, but could potentially have issues in conversion of some characters if you do have them on the incoming files - this is something you will need to discuss with sender of files and see if there is that possibility - if so then on source they need to ensure the correct file type is always used.
And nothing to do with EU characters - its possible to have most of the EU chars being used on a non UTF file and they will be correctly interpreted.
for same reason, having a file as UTF does not always require that the destination table is a NCHAR/NVARCHAR - but some people assume it needs to be. All comes down to what data is being sent, and unless absolutely required, N datatypes should be avoided
December 17, 2024 at 2:49 pm
thx frederico. I'll focus on utf's. and define my connectors as utf's. and tell her all files must be utf. i think all but one of her files is currently utf anyway.
with that in mind i think you are saying not to worry about the bom. we get that it isnt necessary but it sounds like they wont hurt or confuse us if the conn definition is correct. and maybe its not worth convincing her to get them off her files.
December 17, 2024 at 3:10 pm
BOM not worth it - files just should be consistent. throughout.
likely some files are manually created, and others through some utility - this is normally the case when I get those differences in files.
and obviously within SSIS ensure you define the files correctly next time - and on those where it is wrong, remove them from the header - and look to see if codepage is defined correctly while at it.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply