Full-Text Search – Thesaurus Languages


Hugo Kornelis
demonfox (3/19/2013)
These are the references I could find ..

http://www.loc.gov/standards/iso639-2/php/code_list.php

Here is a discussion reference that includes further references .. I think this might be the standard followed by MS in SQL Server .. but then again, that's a guess ;-)
http://social.msdn.microsoft.com/Forums/en-US/wpf/thread/efa9b596-3bc4-4be7-aeeb-4d97ad31f1dd

http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.threeletterisolanguagename.aspx


Thanks for your digging, Demonfox! Much appreciated.

The first reference is to a standards body. I would not automatically assume that Microsoft adheres to any standard they didn't invent themselves. ;-) And indeed - I checked the ENU code that is relevant to this discussion, and it's not included in the list.

The second link is a discussion on the non-standard nature of the three-letter codes used by MS, and the third reference lists a C# program one could use to output the list from Windows. The cropped output shown there indicates that, at least for American English, SQL Server does not use the code listed as "ISO", but does use the code listed as "WIN".
I'm not sure if that means that I could run that program and use the entire list for my thesaurus files, as I still have not seen a reference telling me that the three-letter code used by full-text search is always equal to that "WIN" code. Or that all languages in that output are supported by full-text search. Or that the list includes all supported languages. And even if all of that were the case, I still maintain what I previously replied to Tom - this information should be included in Books Online, in a place that is easy to find, and in the form of a table listing all supported languages and the corresponding three-letter codes. Not in the form of a program I'd have to copy, paste, compile and run first. In my opinion, Microsoft really dropped the ball here.


Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
TomThomson
Hugo Kornelis (3/19/2013)
L' Eomot Inversé (3/19/2013)
Good question, but the definite cultural bias is perhaps unfortunate. I suppose it's fair enough, as the default installation will use LCID 1033, not 2057. But there may be some Brits around for whom tseng.xml is the right file and they wouldn't stand much chance of spotting the right answer, would they?

Even for Brits, the tseng.xml file is NOT the right choice when "working with an American English SQL Server instance" (quote from question text; emphasis added by me). I guess you overlooked that part of the question?

Yes, I should remember to read the question properly before commenting! I'm getting too careless these days.
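
For anyone skimming, the mapping at issue in the quoted exchange can be sketched in a few lines of Python. This is only an illustration, not something posted in the thread: the helper name `thesaurus_file_name` and the `KNOWN_CODES` table are made up for the example, the table holds only the two codes confirmed in this discussion (ENU and ENG), and the `ts<code>.xml` naming pattern is assumed from the file names mentioned here.

```python
# Hypothetical lookup table: only the two LCIDs and codes confirmed in this
# thread. It is NOT a complete or official list of full-text languages.
KNOWN_CODES = {
    1033: "enu",  # American English (the default installation LCID)
    2057: "eng",  # British English
}


def thesaurus_file_name(lcid: int) -> str:
    """Return the thesaurus file name for an LCID.

    Assumes the ts<code>.xml naming pattern; raises LookupError for any
    LCID whose three-letter code has not been confirmed.
    """
    try:
        code = KNOWN_CODES[lcid]
    except KeyError:
        raise LookupError(f"No confirmed three-letter code for LCID {lcid}") from None
    return f"ts{code}.xml"


if __name__ == "__main__":
    for lcid in (1033, 2057):
        print(lcid, thesaurus_file_name(lcid))
```

The lookup deliberately fails for any other LCID rather than guessing a code, since the thread has not found a documented list of the three-letter codes.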

Tom

TomThomson
Toreador (3/19/2013)
demonfox (3/19/2013)
only English , when it comes to britain ...


I'm looking forward to Tom's reply to that one ;-)

Well, just to keep Toreador happy I'll reply, although completely off topic here.

There are at least four versions of spoken English in Britain: Scottish English, Welsh English, and two English Englishes: that awful baabaa they speak in SE England, and the English of the rest of England. If you want to count phrases like "all mang the cudders akay" (travellers' language - maybe Rom) or "bickering brattle" (Scots - lallans/doric) as English - I don't, especially since "bickering" in that phrase means something completely different from what the English word with the same spelling means, but some do - then there are at least another two versions; and if you want to count minor dialectal variations like Geordie English and Brumagen English, i.e. versions with lots of pronunciation variation but only trivial grammar variation, as well as variants with seriously different grammar and vocabulary (I don't, it would be pointless - as silly as counting Boston English as different from Cambridge English in the USA would be), there are hundreds.
But even though there are at least four versions, those four versions have a lot in common, especially in written form: while one version uses "I am after going" and another uses "I am gone" and yet another uses "I have gone", everyone understands all those variants, so in that sense there is a single British English that is a union of those versions. Unless of course you count things like the two non-English examples I gave above as English - if you did that you would have to accept that there are three or more mutually incomprehensible English languages in Britain.

I suspect someone from SE England would take exception to the lower case "b" in demonfox's "britain". I'm perfectly happy with lower case for the first letters of country names and language names. I usually use upper case for them when writing English because so many English speakers take exception to lower case, and always when writing German because all nouns get initial capitals in German, but I usually stick to lower case for them except at the beginning of a sentence when writing in other languages, especially in languages like Spanish, Scots Gaelic, and Irish where capitalising language names is formally incorrect. I even use lower case in english when the capital slips my mind or I'm bent on teasing na sasunnaich.

Tom

demonfox
Now, that's something :-D Something like a wholesome picture of English in Britain :-) Maybe there is more; it makes me curious to dig into it ..

And as for "britain" and the first-letter caps, it is laziness to press SHIFT .. ;-)

Edit :-( Nowadays I am typing something other than what I think I am typing .. Missing a word completely .. ) English

~ demonfox
___________________________________________________________________
Wondering what I would do next, when I am done with this one
demonfox
Hugo Kornelis (3/19/2013)
Thanks for your digging, Demonfox! Much appreciated.

The first reference is to a standards body. I would not automatically assume that Microsoft adheres to any standard they didn't invent themselves. ;-) And indeed - I checked the ENU code that is relevant to this discussion, and it's not included in the list.

The second link is a discussion on the non-standard nature of the three-letter codes used by MS, and the third reference lists a C# program one could use to output the list from Windows. The cropped output shown there indicates that, at least for American English, SQL Server does not use the code listed as "ISO", but does use the code listed as "WIN".
I'm not sure if that means that I could run that program and use the entire list for my thesaurus files, as I still have not seen a reference telling me that the three-letter code used by full-text search is always equal to that "WIN" code. Or that all languages in that output are supported by full-text search. Or that the list includes all supported languages. And even if all of that were the case, I still maintain what I previously replied to Tom - this information should be included in Books Online, in a place that is easy to find, and in the form of a table listing all supported languages and the corresponding three-letter codes. Not in the form of a program I'd have to copy, paste, compile and run first. In my opinion, Microsoft really dropped the ball here.


Yes, that's true. Well, since I couldn't find any reference, I will have to agree with you.

Moreover, did you check the link provided by Steve in the explanation?
http://msdn.microsoft.com/en-us/library/39cwe7zf(v=vs.110).aspx
http://msdn.microsoft.com/en-us/library/39cwe7zf(v=vs.100).aspx

If you switch between versions, you can see the mention of three-letter language codes. I am not sure why it's not carried over into the 2012 documentation, but it does give a hint about ENU and ENG.

~ demonfox
TomThomson
demonfox (3/19/2013)
I am not sure if this is anywhere related to iso639-2 codes ..

These are the references I could find ..

http://www.loc.gov/standards/iso639-2/php/code_list.php
It seems to use ISO-639-2 some of the time, but not always: for example bgr, chs, cht, and enu are not in ISO-639-2 but are 3 letter language codes used by MS.

Here is a discussion reference that includes further references .. I think this might be the standard followed by MS in SQL Server .. but then again, that's a guess ;-)
http://social.msdn.microsoft.com/Forums/en-US/wpf/thread/efa9b596-3bc4-4be7-aeeb-4d97ad31f1dd

http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.threeletterisolanguagename.aspx

Those look useful - presumably the "ThreeLetterISOLanguageName" property in the CultureInfo object is not actually what it is called but what MS uses (which is sometimes but not always a three letter ISO language code).

Isn't it wonderful that you have to grub about either in the registry or in .NET objects to discover information that ought to be properly documented? And that for all we know grubbing about in the two places may deliver different answers? And that even the number of languages supported by SQL Server (documented clearly as 33 in BoL) is perhaps 40 or 41 or 44 or 48, depending on which web page one looks at and whether one believes the directory entries installed with SQL Server instead of BoL or some other MSDN web page?

Tom

demonfox

presumably the "ThreeLetterISOLanguageName" property in the CultureInfo object is not actually what it is called but what MS uses (which is sometimes but not always a three letter ISO language code).

Isn't it wonderful that you have to grub about either in the registry or in .NET objects to discover information that ought to be properly documented?



:-P

If we combine the link with this one
http://msdn.microsoft.com/en-us/library/39cwe7zf(v=vs.100).aspx

we might get it all

but , that might be a writ to grit.

so +1 for the proper documentation.

~ demonfox
Koen Verbeeck
Nice question, thanks.



david.wright-948385
Nice question - based on painful experience, Steve?
Steve Jones
Yes. I was working with this for a talk and kept editing what I thought was the English file. Eventually I researched and realized I was editing the wrong file.
