caught off guard by some greek lettering in a warehouse feed

stan

Ten Centuries

Points: 1267
October 1, 2025 at 1:34 pm

#4663169
Hi, we run SQL Server 2019 Standard. Our warehouse's SSIS-based ETL is clunky, but so far it does the job.
For one of our recent acquisitions, and really for the first time, we temporarily have to deal with tab-delimited dimension and fact flat files in our feeds. All our other feeds come from tables in various DBMS technologies and connectors.
In this acquisition's facility dim feed we've encountered a facility name written in the Greek alphabet. I'll list what I believe to be the "facts" that the community can perhaps use to help with the dilemma this creates for me...
1. Both the target stage and wh column for facility name are varchar, not nvarchar.
2. For a variety of reasons, I don't think we can afford the risk right now of changing the latter two to nvarchar.
3. When I open their dim facility file in Notepad, I see UTF-8 in the bottom right, which I asked them not to use. As far as I know, though, that has less to do with encoding and more to do with an extra byte on the front of the file that isn't considered of much use anymore in the industry anyway.
4. By default, when I added this and some of their other files to SSIS, SSIS chose code page 1252 (ANSI).
5. We can probably talk them into using an English-alphabet alternative for now when setting the name of this 1 of 12 facilities.
6. When I look at the value in Notepad and in stage, I see what you would expect: a combo of lozenges, I's and other letters with a symbol (umlaut?) above them, lira symbols, etc.
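For what it's worth, the symptoms in points 3, 4 and 6 look consistent with UTF-8 bytes being decoded as code page 1252. Note that UTF-8 is itself an encoding, and the optional marker at the front of the file (the byte order mark) is three bytes, EF BB BF. Below is a minimal Python sketch of that effect, assuming the file really is UTF-8. The sample name "Αθήνα" and the helper names are made up for illustration and are not taken from the actual feed.

```python
import codecs


def read_as_1252(text: str) -> str:
    """Simulate a reader that decodes UTF-8 bytes as Windows-1252."""
    # errors="replace" because a few bytes (such as 0x81) are undefined
    # in Python's cp1252 codec and would otherwise raise an error.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def repair(garbled: str) -> str:
    """Undo the mis-decoding. Only works if no byte was lost or replaced."""
    return garbled.encode("cp1252").decode("utf-8")


def has_utf8_bom(raw: bytes) -> bool:
    """True if the raw file content starts with the 3-byte UTF-8 BOM."""
    return raw.startswith(codecs.BOM_UTF8)


sample = "Αθήνα"  # hypothetical facility name
garbled = read_as_1252(sample)
print(garbled)                    # two Latin-looking characters per Greek letter
print(repair(garbled) == sample)  # the original text is still recoverable here
print(has_utf8_bom(b"\xef\xbb\xbfFacility"), has_utf8_bom(b"Facility"))
```

In UTF-8 each basic Greek letter is two bytes, and the first byte shows up in code page 1252 as Î or Ï, which would match the "I's with a symbol above them". If that is what is happening, the file is intact and only the code page chosen when reading it is off.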
One of the things I am wondering is whether, if I can somehow capture this value in a Unicode format, there is a translation function that could be called from SSIS/SQL to translate (maybe transliterate) the value into English. This is an important field in our dashboards. When I think about the executives who would look at our dashboards, I don't think any of them would understand Greek or want to be in a position of distinguishing this acquisition's Greek names from those of future acquisitions.
Another thought I'm having right now is to use their "province" name for that facility as the facility name too. It uses English letters.
- This topic was modified 1 months, 2 weeks ago by stan.


Ken McKelvey SSCoach Points: 18976 · Answer 1

stan wrote:

For a variety of reasons, I don't think we can afford the risk right now of changing the latter two to nvarchar.

Depending on the size of your varchar column, you could also look at using one of the UTF-8 collations. I think Greek characters will be two bytes per character.

This reply was modified 1 months, 2 weeks ago by Ken McKelvey.

stan Ten Centuries Points: 1267 · Answer 2

Thx Ken. By risk I really meant the usual SSIS challenges with Unicode to non-Unicode conversions and vice versa. We currently have millions of records going through our ETL, I think from approximately 15 ERPs now, and growing.

Also, I'm not sure whether SSAS would hiccup if we suddenly introduced an nvarchar data type where there was once varchar.

Either way, I think you are agreeing that 1) (the right) UTF probably doesn't preclude Unicode, and 2) Unicode would have to be used in our target fields if we had the appetite to record Greek letters.

I had a chance to think about this some more since I posted. It seems to me that a column like facility name, which is so prominent in our dashboards etc., should never be shown in Greek letters anyway. If I were an executive, I would rather recognize facilities like this by their transliteration into an alphabet I recognize.

AI did provide some interesting and seemingly accurate info on transliterating Greek to the English alphabet. But we'd have to build this function ourselves, as SQL has no built-in capability, or we'd have to use one of the many libraries out there, embedded in a script in SSIS.

greektoenglish
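A home-built transliteration function of the kind described above could look roughly like the Python sketch below. It is an illustration only: the letter table is a simplified, hypothetical mapping rather than an official standard, it ignores digraph rules (for example "μπ" or "ου"), and the function name `transliterate` is made up. It also assumes the value has already been read correctly as Unicode.

```python
import unicodedata

# Simplified, illustrative letter table (an assumption, not a standard).
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}


def transliterate(text: str) -> str:
    """Replace Greek letters with Latin ones; other characters pass through."""
    # NFD splits accented letters into base letter + combining mark,
    # so dropping the marks removes accents (from Latin letters as well).
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # skip the accent mark itself
        latin = GREEK_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                  # not a Greek letter: keep as-is
        elif ch.isupper():
            out.append(latin.capitalize())  # "Θ" becomes "Th"
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Αθήνα"))
print(transliterate("Θεσσαλονίκη 2"))
```

The same table-driven idea could be ported to an SSIS script component or a SQL function, but a per-letter table will not always match the spelling people conventionally use for a place name, so a lookup of agreed English names for the few affected facilities may be simpler.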

This reply was modified 1 months, 2 weeks ago by stan.

Ken McKelvey SSCoach Points: 18976 · Answer 3

stan wrote:

if suddenly we introduced an nvarchar data type where there was once varchar.

A UTF-8 collation will work with varchar. Extended characters will use 2, 3, or 4 bytes, so if you have a lot of them they can take up more space than nvarchar, which uses 2 bytes for most characters (4 for supplementary characters).
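To make the size trade-off concrete, here is a small Python sketch that counts bytes for the same text in UTF-8 (what a varchar column with a UTF-8 collation stores) and in UTF-16 (what nvarchar stores). The sample strings and the helper name `storage_bytes` are made up for illustration.

```python
def storage_bytes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text."""
    utf8_len = len(text.encode("utf-8"))
    utf16_len = len(text.encode("utf-16-le"))  # little-endian, no BOM
    return utf8_len, utf16_len


# English, Greek and Japanese samples (hypothetical values).
for name in ("Athens", "Αθήνα", "東京"):
    print(name, storage_bytes(name))
```

With these samples, plain English text is smaller in UTF-8, basic Greek letters come out the same size in both, and the Japanese characters take 3 bytes each in UTF-8 against 2 in UTF-16. Since the length in varchar(n) is a byte count, a column sized for English names would hold only about half as many Greek letters under a UTF-8 collation.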

stan Ten Centuries Points: 1267 · Answer 4

Thx, not totally getting it. All of our targets are varchar. So even if this flavor of UTF allows extended character sets in a column meant for a varchar landing, and even if each char takes more than the usual 2 positions that would be used in nvarchar, how would our target varchar columns be manipulated to show the extended chars on our dashboards? Simply a cast of varchar to nvarchar?

This reply was modified 1 months, 1 weeks ago by stan.