A Capital Error

Phil Factor

SSC-Insane

Points: 20244
February 9, 2019 at 1:05 am

#379293

Comments posted to this topic are about the item A Capital Error
Best wishes,
Phil Factor

Toby Ovod-Everett SSC Enthusiast Points: 156 · Answer 1

The problem with case-insensitivity is that it's not so simple, because the concept turns out to be cultural. In English, one can safely round-trip a lower-case string to upper-case and then back to lower-case. That is not true in all languages (in German, ß -> SS -> ss). The Turkish-I problem shows that upper-casing a lower-case string can give different outputs depending upon the locale ("windows" upper-cases to "WİNDOWS" in Turkish, and "WINDOWS" lower-cases to "wındows" in Turkish). As a result, there are a huge number of potentially thorny case-insensitivity bugs lying in wait for naive (or should that be naïve) implementations. I agree that SQL Server does a pretty good job with collations, but collations also have serious performance implications. I'd be curious to see the performance variation in index manipulations depending upon collations.
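To make the round-trip point concrete, here is a minimal Python sketch. It assumes Python 3, whose string methods apply the locale-independent Unicode case mappings, so it cannot reproduce the Turkish-locale behaviour itself; it only shows that even the default rules do not round-trip. The helper name same_ignoring_case is made up for illustration.

```python
# Case conversions that do not return the original string.
# Python uses locale-independent Unicode mappings, so this shows the
# default rules only, not Turkish-specific tailoring.

sharp_s = "stra\u00dfe"             # "straße"
upper = sharp_s.upper()             # ß expands to two letters: "STRASSE"
round_trip = upper.lower()          # "strasse", the ß is gone for good

dotless = "\u0131"                  # ı, LATIN SMALL LETTER DOTLESS I
dotless_round_trip = dotless.upper().lower()   # "I" and then plain "i"

dotted_cap = "\u0130"               # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
dotted_lower = dotted_cap.lower()   # "i" followed by a combining dot (U+0307)


def same_ignoring_case(a: str, b: str) -> bool:
    """Hypothetical helper: compare with casefold() rather than lower()."""
    return a.casefold() == b.casefold()


print(upper, round_trip, dotless_round_trip, len(dotted_lower))
print(same_ignoring_case("stra\u00dfe", "STRASSE"))
```

Comparing with casefold() rather than lower() is what lets "straße" and "STRASSE" match here; neither handles locale-specific rules such as Turkish.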

I personally think we would be better served if computer languages, file systems, URLs, etc. were all case-sensitive. I think it's fine to support case-insensitive (and even accent-insensitive) collations in database columns, but I think it was a poor choice that SQL Server allows the collation to infect object resolution. See https://news.ycombinator.com/item?id=8876722 for a good thread on this.

Imagine your file system being rendered inconsistent (or a file-system check having to rename or relocate files) because someone updates the collation rules, so that two files that used to have distinct names now collide. Or having application software that is collation-dependent at the file-system level!
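As a rough illustration of that collision scenario, the sketch below groups names by a comparison key and reports the groups with more than one member. The function find_collisions and the sample file names are invented for the example; swapping str.lower for str.casefold stands in for "someone updates the collation rules".

```python
from collections import defaultdict


def find_collisions(names, key=str.casefold):
    """Group names by a comparison key; return only the groups that collide."""
    groups = defaultdict(list)
    for name in names:
        groups[key(name)].append(name)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Hypothetical directory listing.
names = ["Report.txt", "report.TXT", "stra\u00dfe.doc", "STRASSE.doc", "notes.md"]

# Under simple lower-casing only the two "report" files collide.
print(find_collisions(names, key=str.lower))

# Under full case folding, the ß file now collides with STRASSE.doc as well.
print(find_collisions(names, key=str.casefold))
```

The same set of files is consistent under one rule and inconsistent under the other, which is exactly the situation a file system would have to repair.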

David.Poole SSC Guru Points: 75894 · Answer 2

February 11, 2019 at 1:13 am

#2021617

Co = Cobalt
CO = Carbon monoxide.

hes 49630 Grasshopper Points: 13 · Answer 3

Even in English, most people prefer to distinguish between upper and lower case, particularly with respect to names. I don't like having my name, e.g. in a salutation, in all upper case. Would you be happy to have your mail, documents, ..., use phiL factoR, or perhaps use other random capitalizations?
However, I remember back in the Teletype days that we managed to survive with only upper case - so we could go back to
https://en.wikipedia.org/wiki/Baudot_code#/media/File:International_Telegraph_Alphabet_2.jpg
and do away with lower case, accents, ...

Regarding Unix/Linux, nobody is forcing you to use mixed-case file names. Just don't do it. And if you are working with data which has mixed case information, most Unix/Linux commands have a case-insensitive option (often -i).

x SSC-Insane Points: 23660 · Answer 4

One thing that came to me about case-sensitive collations is the readability of code. Sure, you can use whatever case you want in insensitive collations, but if you want to ENFORCE a style of programming text, then case-sensitive collations make sense.

About zero-based indexing, I wouldn't call it a unixism. Starting your indexes at 1 is, in my opinion, a convenience offered to programmers who would rather code that way. Arrays are often implemented using base addresses and offsets, and behind the scenes, if you use 1-based indexing, any index must be decreased by 1 internally anyway, or else the first position would go unused (which is probably no biggy). Neither zero-based nor 1-based indexing is really operating-system specific. When I was writing assembler for CP/M, there were the same addresses and offsets, so I'm going to hazard a guess that Phil is confusing operating systems with programming languages.
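The base-plus-offset arithmetic can be sketched in a few lines of Python. The element size and the function names below are assumptions picked purely for illustration, not taken from any particular compiler.

```python
ELEMENT_SIZE = 4  # bytes per element; an assumed value for illustration


def address_zero_based(base: int, index: int) -> int:
    """With a zero origin, the index is used directly as the element offset."""
    return base + index * ELEMENT_SIZE


def address_one_based(base: int, index: int) -> int:
    """With a one origin, the index is reduced by 1 before scaling."""
    return base + (index - 1) * ELEMENT_SIZE


# The first element sits at the base address either way.
print(address_zero_based(1000, 0), address_one_based(1000, 1))
```

Both functions reach the same addresses; the 1-based version just hides the subtraction from the programmer.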

2 cents!

Robert Sterbal SSChampion Points: 11042 · Answer 5

This is an issue with mediawiki.

The most memorable issue was with the TRS-80 which only displayed capital letters.

412-977-3526 call/text

Phil Factor SSC-Insane Points: 20244 · Answer 6

I'm going to hazard a guess that Phil is confusing operating systems with programming languages.

I certainly could be confusing operating systems with languages, but the whole business of collation should be part of the operating system, not left to individual languages or components to implement. It is too complicated to leave to the individual language or framework. You should be able to specify the collation and have everything respond sensibly the same way by default, overriding with whatever sort order it needs when necessary. To do good collation is tricky. What do we have instead? Linux is all case sensitive. In Windows, C# is case sensitive, VB isn't. PowerShell generally isn't. JavaScript and Java are, URLs and URNs aren't, SQL Server isn't by default, HTML isn't case sensitive but XML is. One ends up defensively having to write everything in lower-case.
CP/M was, at least, written from the ground up to be case insensitive, and Gary Kildall, the author, insisted on all components being so. He felt so strongly about that and the zero index that he abandoned his C compiler to produce a PL/I Subset G compiler instead, which was a thing of great beauty. This ethos of consistency was carried forward into MS-DOS, even when it got its Xenix bits. It all got broken when Windows tried to absorb the C variants and XML. What was really needed at that stage was proper collation that was consistent for the user. It never happened.

Best wishes,
Phil Factor

x SSC-Insane Points: 23660 · Answer 7

Phil Factor - Monday, February 11, 2019 11:05 AM
I'm going to hazard a guess that Phil is confusing operating systems with programming languages.
I certainly could be confusing operating systems with languages, but the whole business of collation should be part of the operating system, not left to individual languages or components to implement. It is too complicated to leave to the individual language or framework. You should be able to specify the collation and have everything respond sensibly the same way by default, overriding with whatever sort order it needs when necessary. To do good collation is tricky. What do we have instead? Linux is all case sensitive. In Windows, C# is case sensitive, VB isn't. PowerShell generally isn't. JavaScript and Java are, URLs and URNs aren't, SQL Server isn't by default, HTML isn't case sensitive but XML is. One ends up defensively having to write everything in lower-case.
CP/M was, at least, written from the ground up to be case insensitive, and Gary Kildall, the author, insisted on all components being so. He felt so strongly about that and the zero index that he abandoned his C compiler to produce a PL/I Subset G compiler instead, which was a thing of great beauty. This ethos of consistency was carried forward into MS-DOS, even when it got its Xenix bits. It all got broken when Windows tried to absorb the C variants and XML. What was really needed at that stage was proper collation that was consistent for the user. It never happened.

Not everything on Linux is case sensitive, but that's something of a moot point because, when it comes to programming languages, it's not an operating system question. File systems can have case sensitivity; I can certainly attest to that. Even you are admitting that C on CP/M had the possibility of case sensitivity (it really needed to if it was going to have a shot at compiling existing source). I don't putz around with Postgres as much as I'd like, but I don't remember it being case sensitive on Linux. Likewise with Fortran on Linux, with the obvious exception of include files (just from what I've read; I haven't coded in Fortran on Linux). I think the key here is that you really don't want to change case sensitivity in a programming language, simply because you want to have a chance at using or porting existing source code. In any case, you want to appeal to the language's users, and that would be a pretty drastic change to sell. The same goes for Java and other languages, which often use the case distinction to tell classes apart from instances of those classes (a programming motif I seem to remember seeing).

MS-DOS had at least two C compiler offerings that I had experience with, so I don't think there was any "trying to absorb the C variants" when it came to Windows. From what I've read, 1.0 already had a C SDK, although I never used it. I only had experience using C on Windows starting with 3.1, but I'd find it hard to believe that 3.1 didn't have parts in C. From what I've read, even 1.0's file manager was written in C. Simply put, C is case sensitive; it's part of the language, and by the time Windows came around, C was enough of a thing that there wasn't going to be some great breakage of any sort. Operating systems just don't lock you in to case arguments, with the caveat, of course, of the file systems, and even then, C isn't going to change your file system's case requirements to the best of my knowledge.

With Pascal being case insensitive (if I remember correctly), it'd be a bad move to make it case sensitive on Linux, right? What about Python? It's getting popular on Windows with data folks; did it lose its case sensitivity along the way? (Honestly don't know LOL, but I am tending toward thinking it's still case sensitive to some extent.)

The new aspirants for the systems programming language choice include Go and Rust. Go is case sensitive, Rust is "case complainy" lol, but still these are independent of the OS. I'm betting Google doesn't put much search computing power on Windows servers, but they don't seem to have problems dealing with case-insensitive searches, right?

I just think it's more that case sensitivity is going to be part of any particular language's experience and culture, and to think either is inherently evil is just sort of unproductive in my opinion. If one or the other takes a bit more effort, just toss it in with the programming language's merits and drawbacks, weigh them all and see if the choice is a good one. Honestly, zero origin falls into the same category in my opinion: plenty of my experience is with zero origin, but that doesn't mean I can't handle one origin.

As far as the user is concerned, they're still going to capitalize proper names, beginnings of sentences, etc., so in their case, case sensitive it is! I'm sure an email package that throws your carefully composed and professional message into all caps might encounter some confusion heh

https://rosettacode.org/wiki/Case-sensitivity_of_identifiers

https://en.wikipedia.org/wiki/Zero-based_numbering

Robert Sterbal SSChampion Points: 11042 · Answer 8

Phil Factor - Monday, February 11, 2019 11:05 AM
I'm going to hazard a guess that Phil is confusing operating systems with programming languages.
I certainly could be confusing operating systems with languages, but the whole business of collation should be part of the operating system, not left to individual languages or components to implement. It is too complicated to leave to the individual language or framework. You should be able to specify the collation and have everything respond sensibly the same way by default, overriding with whatever sort order it needs when necessary. To do good collation is tricky. What do we have instead? Linux is all case sensitive. In Windows, C# is case sensitive, VB isn't. PowerShell generally isn't. JavaScript and Java are, URLs and URNs aren't, SQL Server isn't by default, HTML isn't case sensitive but XML is. One ends up defensively having to write everything in lower-case.
CP/M was, at least, written from the ground up to be case insensitive, and Gary Kildall, the author, insisted on all components being so. He felt so strongly about that and the zero index that he abandoned his C compiler to produce a PL/I Subset G compiler instead, which was a thing of great beauty. This ethos of consistency was carried forward into MS-DOS, even when it got its Xenix bits. It all got broken when Windows tried to absorb the C variants and XML. What was really needed at that stage was proper collation that was consistent for the user. It never happened.

https://sterbalssundrystudies.miraheze.org/wiki/Collation

412-977-3526 call/text

andrew gothard SSChampion Points: 12301 · Answer 9

hes 49630 - Monday, February 11, 2019 6:35 AM
Would you be happy to have your mail, documents, ..., use phiL factoR, or perhaps use other random capitalizations?

Why would he care? In a case-sensitive world, that's a different person.

I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.

Toby Ovod-Everett SSC Enthusiast Points: 156 · Answer 10

robert.sterbal 56890 - Monday, February 11, 2019 2:15 PM
Phil Factor - Monday, February 11, 2019 11:05 AM
What do we have instead? Linux is all case sensitive. In Windows, C# is case sensitive, VB isn't. PowerShell generally isn't. JavaScript and Java are, URLs and URNs aren't, SQL Server isn't by default, HTML isn't case sensitive but XML is.
https://sterbalssundrystudies.miraheze.org/wiki/Collation

Note that in the original quote, Phil Factor stated that URLs on Windows (I assume meaning URLs served by Windows servers) weren't case sensitive. IIS tends to be case insensitive (although one could choose to develop case-sensitive ASP.NET applications for it), whereas most Unix web servers tend to be case sensitive. Web frameworks can be case sensitive or not depending upon how they are coded. Rails routes are case sensitive, although the programmer can definitely choose to make the code accept the parameters in the route case-insensitively.

According to W3 (https://www.w3.org/TR/WD-html40-970708/htmlweb.html):

URLs in general are case-sensitive (with the exception of machine names). There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy. Users should always consider that URLs are case-sensitive.
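A hedged sketch of what that guidance means in practice, using Python's standard urllib.parse: lower-case only the scheme and the machine name, and leave the path and query alone. The function name normalize_url and the example.com URLs are made up, and the sketch assumes the URL carries no user:password@ part, since lower-casing the whole network location would alter it too.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case the scheme and host only; path and query keep their case.

    Assumes there is no userinfo (user:password@) in the URL.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        parts.query,
        parts.fragment,
    ))


print(normalize_url("HTTP://Example.COM/Docs/Index.html?Q=A"))

# Paths that differ only by case stay distinct after normalization.
print(normalize_url("http://example.com/a") == normalize_url("http://example.com/A"))
```

Whether two paths that differ only in case actually reach the same resource is then up to the server, which is the IIS versus Unix difference described above.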