
Full-Text Search – Thesaurus Languages
Posted Tuesday, March 19, 2013 4:44 PM


demonfox (3/19/2013)
These are the references I could find:

http://www.loc.gov/standards/iso639-2/php/code_list.php

Here is a discussion thread that includes further references. I think this might be the standard followed by MS in SQL Server, but then again, that is a guess.
http://social.msdn.microsoft.com/Forums/en-US/wpf/thread/efa9b596-3bc4-4be7-aeeb-4d97ad31f1dd

http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.threeletterisolanguagename.aspx


Thanks for your digging, Demonfox! Much appreciated.

The first reference is to a standards body. I would not automatically assume that Microsoft adheres to any standard they didn't invent themselves. ;) And indeed, I checked for the ENU code that is relevant to this discussion, and it's not included in the list.

The second link is a discussion on the non-standard nature of the three-letter codes used by MS, and the third reference lists a C# program one could use to output the list from Windows. The cropped sample output shows that, at least for American English, SQL Server does not use the code listed as "ISO", but does use the code listed as "WIN".
I'm not sure if that means that I could run that program and use the entire list for my thesaurus files, as I still have not seen a reference telling me that the three-letter code used by full-text search is always equal to that "WIN" code. Or that all languages in that output are supported by full-text search. Or that the list includes all supported languages. And even if all of that were the case, I still maintain what I previously replied to Tom: this information should be included in Books Online, in a place that is easy to find, and in the form of a table listing all supported languages and the corresponding three-letter code. Not in the form of a program I'd have to copy, paste, compile and run first. In my opinion, Microsoft really dropped the ball here.
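
To make the ISO-versus-WIN distinction above concrete, here is a minimal Python sketch. It assumes the thesaurus file name is "ts" plus the lower-cased three-letter Windows code plus ".xml" (consistent with the tseng.xml file mentioned in this thread, but not confirmed by any documentation cited here). The two-row table is hand-typed for illustration only, and `SAMPLE_CODES` and `thesaurus_file_name` are hypothetical names, not part of any SQL Server or Windows API.

```python
# Illustrative sketch only: a tiny hand-typed sample, NOT an authoritative
# list of the languages supported by full-text search.
# LCID -> (display name, ISO 639-2 code, three-letter Windows code)
SAMPLE_CODES = {
    1033: ("English (United States)", "eng", "ENU"),
    2057: ("English (United Kingdom)", "eng", "ENG"),
}


def thesaurus_file_name(lcid: int) -> str:
    """Build the assumed thesaurus file name for an LCID in the sample table.

    Assumption: file name = "ts" + lower-cased Windows code + ".xml".
    Raises KeyError for LCIDs that are not in the sample table.
    """
    _name, _iso, win = SAMPLE_CODES[lcid]
    return f"ts{win.lower()}.xml"


if __name__ == "__main__":
    for lcid in sorted(SAMPLE_CODES):
        name, iso, win = SAMPLE_CODES[lcid]
        print(f"{lcid}: {name}: ISO={iso} WIN={win} file={thesaurus_file_name(lcid)}")
```

Both rows share the same ISO code, so only the WIN column tells the American and British files apart. Whether that naming rule holds for every full-text language is exactly the open question raised in this post.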



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #1432939
Posted Tuesday, March 19, 2013 8:43 PM


Hugo Kornelis (3/19/2013)
L' Eomot Inversé (3/19/2013)
Good question, but the definite cultural bias is perhaps unfortunate. I suppose it's fair enough, as the default installation will use LCID 1033, not 2057. But there may be some Brits around for whom tseng.xml is the right file and they wouldn't stand much chance of spotting the right answer, would they?

Even for Brits, the tseng.xml file is NOT the right choice when "working with an American English SQL Server instance" (quote from question text; emphasis added by me). I guess you overlooked that part of the question?

Yes, I should remember to read the question properly before commenting! I'm getting too careless these days.


Tom
Post #1432976
Posted Tuesday, March 19, 2013 9:52 PM


Toreador (3/19/2013)
demonfox (3/19/2013)
only English , when it comes to britain ...


I'm looking forward to Tom's reply to that one

Well, just to keep Toreador happy I'll reply, although completely off topic here.

There are at least four versions of spoken English in Britain: Scottish English, Welsh English, and two English Englishes: that awful baabaa they speak in SE England, and the English of the rest of England. If you want to count phrases like "all mang the cudders akay" (travellers' language, maybe Rom) or "bickering brattle" (Scots, Lallans/Doric) as English (I don't, especially since "bickering" in that phrase means something completely different from what the English word with the same spelling means, but some do) then there are at least another two versions; and if you want to count minor dialectal variations like Geordie English and Brumagen English, i.e. versions with lots of pronunciation variation but only trivial grammar variation, as well as variants with seriously different grammar and vocabulary (I don't, it would be pointless, as silly as counting Boston English as different from Cambridge English would be in the USA) there are hundreds.
But even though there are at least four versions, those four versions have a lot in common, especially in written form: while one version uses "I am after going" and another uses "I am gone" and yet another uses "I have gone", everyone understands all those variants, so in that sense there is a single British English that is a union of those versions. Unless of course you count things like the two non-English examples I gave above as English; if you did that you would have to accept that there are three or more mutually incomprehensible English languages in Britain.

I suspect someone from SE England would take exception to the lower case "b" in demonfox's "britain". I'm perfectly happy with lower case for the first letters of country names and language names. I usually use upper case for them when writing English because so many English speakers take exception to lower case, and always when writing German because all nouns get initial capitals in German, but I usually stick to lower case for them except at the beginning of a sentence when writing in other languages, especially in languages like Spanish, Scots Gaelic, and Irish where capitalising language names is formally incorrect. I even use lower case in english when the capital slips my mind or I'm bent on teasing na sasunnaich.


Tom
Post #1432979
Posted Tuesday, March 19, 2013 10:21 PM
L' Eomot Inversé (3/19/2013)


Now, that's quite a wholesome picture of English in Britain, and maybe there is more to it; it makes me curious to dig into it.

And as for "britain" and the first-letter caps, it is laziness to press SHIFT.

Edit: Nowadays I am typing something other than what I think I am typing; I missed a word completely: "English". :)


~ demonfox
___________________________________________________________________
Wondering what I would do next , when I am done with this one
Post #1432983
Posted Tuesday, March 19, 2013 10:26 PM
Hugo Kornelis (3/19/2013)


Yes, that's true. Since I couldn't find any such reference either, I will have to agree with you.

Also, did you check the link provided by Steve in the explanation?
http://msdn.microsoft.com/en-us/library/39cwe7zf(v=vs.110).aspx
http://msdn.microsoft.com/en-us/library/39cwe7zf(v=vs.100).aspx

If you switch between versions, you can see the mention of three-letter language codes. I am not sure why it was not carried over into the 2012 documentation, but it does give a hint about ENU and ENG.


~ demonfox
Post #1432985
Posted Tuesday, March 19, 2013 11:05 PM


demonfox (3/19/2013)
I am not sure if this is related in any way to ISO 639-2 codes.

These are the references I could find:

http://www.loc.gov/standards/iso639-2/php/code_list.php
It seems to use ISO-639-2 some of the time, but not always: for example bgr, chs, cht, and enu are not in ISO-639-2 but are 3 letter language codes used by MS.

Here is a discussion thread that includes further references. I think this might be the standard followed by MS in SQL Server, but then again, that is a guess.
http://social.msdn.microsoft.com/Forums/en-US/wpf/thread/efa9b596-3bc4-4be7-aeeb-4d97ad31f1dd

http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.threeletterisolanguagename.aspx

Those look useful - presumably the "ThreeLetterISOLanguageName" property in the CultureInfo object is not actually what it is called but what MS uses (which is sometimes but not always a three letter ISO language code).

Isn't it wonderful that you have to grub about either in the registry or in .NET objects to discover information that ought to be properly documented? And that for all we know grubbing about in the two places may deliver different answers? And that even the number of languages supported by SQL Server (documented clearly as 33 in BoL) is perhaps 40 or 41 or 44 or 48, depending on which web page one looks at and whether one believes the directory entries installed with SQL Server instead of BoL or some other MSDN web page?
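
The "some of the time, but not always" observation earlier in this post can be checked mechanically. The sketch below is Python; `ISO_639_2_SAMPLE` is a small hand-copied subset of the Library of Congress list, not the full standard, and `deu` is included as an assumed example of a Windows code that does coincide with an ISO 639-2 code (it is not mentioned in this thread). All names are hypothetical.

```python
# A small, hand-copied subset of ISO 639-2 codes (bibliographic and
# terminologic forms). This is NOT the complete code list.
ISO_639_2_SAMPLE = {"bul", "chi", "zho", "eng", "ger", "deu"}

# Three-letter codes reported in this thread as used by Microsoft,
# plus "deu" as an assumed example of a code that matches ISO 639-2.
MS_CODES = ["bgr", "chs", "cht", "enu", "deu"]


def split_by_iso_membership(codes, iso_codes):
    """Return (codes found in iso_codes, codes not found), keeping input order."""
    found = [c for c in codes if c.lower() in iso_codes]
    missing = [c for c in codes if c.lower() not in iso_codes]
    return found, missing


if __name__ == "__main__":
    found, missing = split_by_iso_membership(MS_CODES, ISO_639_2_SAMPLE)
    print("in sample ISO list:", found)
    print("not in sample ISO list:", missing)
```

Because the set here is only a subset, a "missing" result is conclusive only once the full code list from the loc.gov page has been loaded in its place.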


Tom
Post #1432996
Posted Tuesday, March 19, 2013 11:29 PM

presumably the "ThreeLetterISOLanguageName" property in the CultureInfo object is not actually what it is called but what MS uses (which is sometimes but not always a three letter ISO language code).

Isn't it wonderful that you have to grub about either in the registry or in .NET objects to discover information that ought to be properly documented?





If we combine the link with this one
http://msdn.microsoft.com/en-us/library/39cwe7zf(v=vs.100).aspx

we might get it all

But that might be a writ to grit.

So +1 for the proper documentation question mark.


~ demonfox
Post #1433000
Posted Wednesday, March 20, 2013 1:45 AM


Nice question, thanks.



Post #1433034
Posted Wednesday, March 20, 2013 6:11 AM
Nice question - based on painful experience Steve?
Post #1433158
Posted Wednesday, March 20, 2013 7:04 AM


david.wright-948385 (3/20/2013)
Nice question - based on painful experience Steve?


Yes. I was working with this for a talk and kept editing what I thought was the English file. Eventually I researched and realized I was editing the wrong file.







Follow me on Twitter: @way0utwest

Post #1433187