October 1, 2025 at 1:34 pm
Hi, we run SQL Server 2019 Standard. Our warehouse's SSIS-based ETL is clunky, but so far it does the job.
For one of our recent acquisitions, and really for the first time, we have to temporarily deal with tab-delimited dimension and fact flat files in our feeds. All our other feeds come from tables in various DBMS technologies and connectors.
In this acquisition's facility dim feed we've encountered a facility name written in the Greek alphabet. I'll list what I believe to be the "facts" that perhaps the community can use to help with the dilemma this creates for me...
6. When I look at the value in Notepad and in stage, I see what you would expect: a combo of lozenges, I's and other letters with a symbol (umlaut?) above them, lira symbols, etc.
One of the things I am wondering is whether, if I can somehow capture this value in a Unicode format, there is a translation function that could be called from SSIS/SQL to translate the value (maybe transliterate it) into English. This is an important field in our dashboards. When I think about the executives who would look at our dashboards, I don't think any of them would understand Greek or want to be in a position of distinguishing this acquisition's Greek names from those of future acquisitions.
Another thought I'm having right now is using their "province" name for that facility in facility name as well. It uses English letters.
October 2, 2025 at 7:36 am
For a variety of reasons, I don't think we can afford the risk right now of changing the latter 2 to nvarchar.
Depending on the size of your varchar column, you could also look at using one of the UTF-8 collations. I think Greek characters will be two bytes per character.
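To make the byte arithmetic concrete, here is a small Python sketch (Python is used only for illustration and is not part of the SSIS package). It assumes the n in varchar(n) is a byte limit rather than a character count. The sample name and the helper fits_in_varchar are hypothetical.

```python
# Hypothetical sample: the Greek word for Athens in capitals (5 letters),
# written with escapes so the code points are unambiguous.
name = "\u0391\u0398\u0397\u039D\u0391"

def utf8_length(value: str) -> int:
    """Number of bytes the value occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))

def fits_in_varchar(value: str, n: int) -> bool:
    """True if value fits in a varchar(n) column under a UTF-8 collation,
    assuming n is a byte limit (an assumption of this sketch)."""
    return utf8_length(value) <= n

print(len(name), utf8_length(name))  # 5 characters, 10 bytes
print(fits_in_varchar(name, 8))      # False
print(fits_in_varchar(name, 10))     # True
```

Under that assumption, a column sized for 50 English characters would hold only about 25 basic Greek letters, so existing column widths are worth checking before switching collations.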
October 2, 2025 at 12:49 pm
Thx Ken. By risk I really meant the usual SSIS issues with Unicode to non-Unicode conversions and vice versa. We currently have millions of records going through our ETL, I think from approximately 15 ERPs now, and growing.
Also, I'm not sure if SSAS would hiccup if we suddenly introduced an nvarchar data type where there was once varchar.
Either way, I think you are agreeing that 1) (the right) UTF probably doesn't preclude Unicode, and 2) Unicode would have to be used in our target fields if we had the appetite to record Greek letters.
I had a chance to think about this some more since I posted. It seems to me that a column like facility name, which is so prominent in our dashboards, should never be shown in Greek letters anyway. If I were an executive, I would rather recognize facilities like this by their transliteration into an alphabet I recognize.
AI did provide some interesting and seemingly accurate info on transliterating Greek to the English alphabet. But we'd have to build this function ourselves, as SQL Server has no built-in capability, or we'd have to use one of the many libraries out there embedded in a script in SSIS.
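As a rough idea of what a home-built transliteration function involves, here is a minimal Python sketch. The letter mapping is a simplified, hypothetical choice and not a full implementation of any romanization standard: it ignores digraphs (for example, omicron plus upsilon is not turned into "ou") and it also strips accents from any non-Greek letters. An SSIS script task would need the same logic in its own language.

```python
import unicodedata

# Greek lowercase alpha..omega occupy U+03B1..U+03C9 (final sigma included).
# The Latin equivalents below are a simplified, hypothetical mapping.
_LATIN = "a v g d e z i th i k l m n x o p r s s t y f ch ps o".split()
GREEK_TO_LATIN = dict(zip((chr(cp) for cp in range(0x03B1, 0x03CA)), _LATIN))

def transliterate(text: str) -> str:
    """Replace Greek letters with Latin ones; leave other characters alone."""
    # Decompose so an accented letter becomes base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent mark itself
        latin = GREEK_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                  # not a Greek letter: keep as-is
        elif ch.isupper():
            out.append(latin.capitalize())  # e.g. capital theta -> "Th"
        else:
            out.append(latin)
    return "".join(out)

# Hypothetical facility name: the Greek word for Athens plus an English suffix.
print(transliterate("\u0391\u03B8\u03AE\u03BD\u03B1 Clinic 2"))  # Athina Clinic 2
```

A per-character table like this is easy to port and audit, but names written entirely in capitals come out in mixed case (capital theta becomes "Th"), so a production version would need extra rules.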

October 2, 2025 at 6:28 pm
if suddenly we introduced an nvarchar data type where there was once varchar.
A UTF-8 collation will work with varchar. Extended characters will use 2, 3, or 4 bytes, so if you have a lot of 3-byte characters they can take up more space than nvarchar, which uses 2 bytes per character for most characters (4 for supplementary characters).
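The size comparison can be checked with a short Python sketch. It assumes that varchar under a UTF-8 collation stores UTF-8 bytes and that nvarchar stores UTF-16. The sample characters are arbitrary picks, one per byte-length class.

```python
# One sample character per UTF-8 length class (arbitrary picks).
samples = {
    "ASCII letter": "A",
    "Greek letter": "\u03A9",   # capital omega
    "Euro sign": "\u20AC",
    "Emoji": "\U0001F600",      # supplementary character
}

def byte_sizes(ch: str) -> tuple:
    """Return (UTF-8 bytes, UTF-16 bytes) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for label, ch in samples.items():
    utf8, utf16 = byte_sizes(ch)
    print(f"{label}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
# ASCII letter: UTF-8 = 1 bytes, UTF-16 = 2 bytes
# Greek letter: UTF-8 = 2 bytes, UTF-16 = 2 bytes
# Euro sign: UTF-8 = 3 bytes, UTF-16 = 2 bytes
# Emoji: UTF-8 = 4 bytes, UTF-16 = 4 bytes
```

For mostly English data with an occasional Greek name, UTF-8 is therefore the smaller of the two, and for Greek letters themselves the two encodings are the same size.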
October 7, 2025 at 5:04 pm
Thx, I'm not totally getting it. All of our targets are varchar. So even if this flavor of UTF allows extended character sets in a column meant for a varchar landing, and even if each char takes more than the usual 2 positions that would be used in nvarchar, how would our target varchar columns be manipulated to show the extended chars on our dashboards? Simply a cast of varchar to nvarchar?
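One way to look at this question is to compare what survives in each encoding. The Python sketch below assumes the existing varchar columns use a Western European code page (1252), which the thread does not state. The sample name is hypothetical.

```python
name = "\u0391\u03B8\u03AE\u03BD\u03B1"  # hypothetical Greek facility name

# Code page 1252 has no Greek letters, so each one is replaced on the way in.
as_cp1252 = name.encode("cp1252", errors="replace")
print(as_cp1252)                    # b'?????'
print(as_cp1252.decode("cp1252"))   # ?????

# UTF-8 keeps every character, and decoding returns the original text.
as_utf8 = name.encode("utf-8")
print(as_utf8.decode("utf-8") == name)  # True
```

If that assumption holds, the Greek letters are lost when the value is written, so a later cast to nvarchar has nothing to recover. With UTF-8 storage the characters are kept intact, and whether they display correctly then depends on the client reading the column.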