Worst Practices - Making Databases Case Sensitive (Or Anything Else)

  • First of all, I have to tell you I am pro case sensitive. Why? First programs I've ever written were for Linux (C), so I was born with CS in my blood :).

    Now let’s get back to SQL problem.

    The article specified something about small loss of performance due to changing all letters in a query to same set (capitals or lowers doesn't matter) - for writing a query it is ok, not too much overhead ("I'm sure that takes a CPU cycle or two"), but consider “like” or “=” on a column in table with several million rows, joining this with other n tables... Your CPU has no idea of letters, it knows only numbers ('A' != 'a') so for each letter the software has to load a letter in a CPU register, check if is in not in, let’s say, capitals range and if it is not, it has to add a number to it and compare it to whatever it has to compare. I counted extra 3 operations on CPU complemented by corresponding memory access (read/write). Imagine doing this for all rows in a table (worst case) and for a number of letters for each row. Maybe for small applications it will be ok, but for large ones with hundreds of queries like this per second you have a problem.

    One other thing that I could not understand from the article - why do you have to use triggers to check data? Why can't you use stored procedures to write data? In the stored procedure, before inserting a string in a column that will be searched you can apply upper() or lower() to the variable that contains the data and search procedures will do the same. This way u can apply unique indexes or whatever you like.

    A case sensitive DB application will always work on a non case sensitive server if you make sure that you do not have columns or variables that differ only by casing. The other way is not true most of the time and I am telling you be experience - recently I have lost 2 hours to make McAfee virus DB to work on a case sensitive server (cheaper than buying new sql server licence) and is not the only time I have spent to correct commercial DB's for misspellings for CS.

    For me case insensitive is for lazy people - hope nobody get's mad on me for this 😉

  • I don't think there's any overhead with CI... it's a difference of only 1 bit (32) and only an internal difference of what the mask looks like.

    I love CI because I AM lazy... hope nobody gets mad it me for that 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • it was always claimed that the binary setting produced the performance gain - not quite the same as CS I think - not sure as I normally create CI servers - however I have a server to build so maybe I'll make it binary and see if I can see a difference. in a case insensitive collation getting the case of a procedure call incorrect still produces cache misses in the proc cache, this was like this in 2000 and is still the same in 2005

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Thanks to all for your input on this

  • I think that case sensitive database make use of binary sort order

    that non case sensitive cannot.

    It's more related with performance.

  • Hi Andy, Old topic of yours and all be it a small point; MS SQL has since at least SQL 4.2 and maybe farther back I understand that the binary collation is the fastest collation that MS SQL operates under and by definition it is case sensitive. I do not know the white papers on it but at least one large ERP uses it (PeopleSoft). You mentioned how SQL would do a comparison in underlying code by using UCASE. I am thinking not since I would suspect MS did not write separate code lines for each collations. I suspect the lowest reaches most likely are not using a UCASE solution as it would exclude matches for Binary and Case Sensitve collations. FWIW, Rick

  • I suspect that most people who claim the performance gain for binary have never actually read it up. The gain is there but only for selected searches/sorts and in most cases in a prod environment you'd not actually be using them. Can't remember where it is/was in msdn/technet but it makes interesting reading.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • I've found Zero applications that actually require case sensitivity... If I ever do, I'll change the collation on the columns that do rather than setting the whole database for it.

    It's probably also worth mentioning that we have a pot wad of case sensitive Oracle servers at work... everyone I know ends up using WHERE UPPER(colname1) = UPPER(colname2) because of it... that's a little rough on index usage. Any so called gains in performance are, of course, quickly thrown to the ground because of such things.

    Oddly enough, the people I know who actually do support case sensitivety are absolutely the worst when it comes to writing readable code... they do everything in lower case or upper case except for, of course, "things in quotes".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    changing collation of only some columns get to some trouble

    when comparing/joining two columns of different collation.

    you need to use an explicit conversion which may lead to bad performance

    (not using index due to column convertion)

    http://www.sqlservercentral.com/articles/Administering/collate_part1/875/

  • yeah but you miss Jeff's point that, and I've seen this too, developers end up casting every character column to upper or lower which does screw things up big time.

    Funnily enough the code written by the guys who gave my current client a case sensitive database would probably rate as some of the world's worst. I'm campaigning to get the collation of the server changed!

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • In your argument, why do we have English sentence starting with a capital letter? Let us talk about not using that (bad practice) or just lower case everything in your email fl, tx, ma, wa, hello???

    Why don't we put A, a on the same ASCII code?

    Why don't we all drive on the right-hand side of the street? and use Metric system.:-P

Viewing 11 posts - 76 through 85 (of 85 total)

You must be logged in to reply to this topic. Login to reply