A Capital Error

By Phil Factor

There was a time, in the late seventies, when we jeered at Unix users. In CP/M, we had a modern operating system. It couldn't do that much, but it would run the payroll and accounting systems of a business, and handle stock control and a lot of other commercial tasks. I wrote many database-driven applications using it for City-of-London stockbrokers (KSAM/ISAM in those days).

Unix, by contrast, had problems that limited its commercial appeal. The licensing problems were never fixed until Linus and his team rewrote it. Unix was originally written by US university geeks who had little idea of the complexities of nationalisation. It had a binary collation because that was easiest with ASCII. This meant that a frog was a different thing to a Frog. A Unix geek could, we suspected, call his three sons john, John and jOhn, and be confident that they were unique identifiers. The idea of producing a commercial software product that could be supplied to all cultures, nationalities and languages never occurred to them.

The CP/M operating system was also written from scratch by a university lecturer and some of his friends and students, but the problem of accommodating the most common languages and cultures was fixed in the early Eighties, mostly funded by Xerox, who wanted to introduce a range of CP/M-based word processors around Europe. The experience and the solutions soon spread to the new MS-DOS.

While doing some Linux-based development work recently, I was taken aback by hitting that same old 1970s US-academic-geek culture. It was like coming face-to-face with a velociraptor. The extinct lives. What is the virtue of a binary collation? Capital and lowercase are just two ways of writing the same character. To say they are different is as fat-headed as saying that italic or bold makes them different. What is the reason for saying that an accented character is necessarily different? In some cultures they are, and in others they aren't. The French seem to apply them nowadays with the same abandon as salad dressing. Nationalisation is a messy problem. I remember once doing a big nationalisation project and sitting back in my chair with satisfaction, only to be informed that the Semitic-based Middle-Eastern countries wrote backwards, from right to left.
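
For anyone who wants to see the difference rather than take my word for it, here is a rough Python sketch that contrasts a binary comparison with a case-insensitive and an accent-insensitive one. It is only an illustration: the helper names `fold_case` and `fold_accents` are made up for this example, and a real collation follows locale-specific rules that this does not attempt to reproduce.

```python
import unicodedata


def fold_case(s: str) -> str:
    """Comparison key that ignores case (Unicode case folding)."""
    return s.casefold()


def fold_accents(s: str) -> str:
    """Comparison key that ignores case and accents.

    Hypothetical helper: decomposes each character (NFD), drops the
    combining marks, then case-folds what is left.
    """
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


sons = ["john", "John", "jOhn"]

# Binary comparison: every spelling is a separate identifier.
binary_distinct = len(set(sons))

# Case-insensitive comparison: they are all the same name.
folded_distinct = len({fold_case(name) for name in sons})

# Sorting by code point puts every capital letter before every lowercase one.
words = ["frog", "Frog", "apple", "Zebra"]
binary_sorted = sorted(words)
folded_sorted = sorted(words, key=fold_case)

# Whether an accent matters depends on the culture; this key assumes it does not.
accents_match = fold_accents("Élan") == fold_accents("elan")
```

With the binary comparison the three sons really are three different identifiers, and `Zebra` sorts ahead of `apple`. With the folded keys there is one name, and the words come out in the order most people would expect.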

While we're on the topic of Unix nonsenses, what about the bizarre idea that indices start at zero rather than one? This is a ghastly error that makes a mockery of the zero concept, and of the vernacular understanding of sequence. Ah, here is john, my zeroth son. John, my son number 1, was born a year later.

What about databases? Fortunately, in SQL Server they pretty well nailed the collation problem. However, MongoDB still ships with a binary collation, though now you can impose something more sensible on the data. I still get tripped up by regular expressions, though, which ignore collation and so are case-sensitive by default.
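
The regular expression trap is easy to reproduce. The sketch below uses Python's `re` module purely as a convenient stand-in; other regex engines have their own way of switching on case-insensitive matching, and the sample text is invented for the example.

```python
import re

text = "A Frog is still a frog."
pattern = r"frog"

# Default behaviour: the match is case-sensitive, so "Frog" is missed.
case_sensitive = re.findall(pattern, text)

# Case-insensitivity has to be requested explicitly with a flag...
case_insensitive = re.findall(pattern, text, flags=re.IGNORECASE)

# ...or with an inline flag inside the pattern itself.
inline_flag = re.findall(r"(?i)frog", text)
```

Here the default search finds only the lowercase `frog`, while both flagged versions find `Frog` as well.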

It seems that anything originating in Unix/Linux is infected with this silliness of binary collation and the zero first index. It is so entrenched that people think that there is method in the madness. Actually, not: it is just madness.

Phil Factor

 