Worst Practices - Making Databases Case Sensitive (Or Anything Else)

  • Please send us any suggest you have for topics (post here, or email us directly). We gauge articles by the hits of course, but also by the amount of feedback we get from users. So far by either measure this has been one our more popular series.

    We've got a few in the pipeline, and maybe a real surprise or two to be announced shortly!

    Andy

  • Raj: your welcome. Nicely written.

    (and everyone): please do send suggestions to Andy (awarren@sqlservercentral.com) or myself (sjones@sqlservercentral.com). We are always looking for new topics. Let us know specifically if what you would like to so or think others would like.

    Also, if you are interested in us extending this to the "Best Practices" as well.

    Steve Jones

    steve@dkranch.net

  • I agree that case sensitivity is sometimes a pain but it does make you a better coder. You know when you release a product that it will work in both case sensitive and non-case sensitive databases.

    I do think it is a load of crap though for companies to require a case sensitive database for their applications that you purchanse. There is no reason that an application should be dependant on the sensitivity of a database. To me this is bad programming on the developers part. It does not take any extra effort to add case statement when comparing strings.

    I wish that all companies would design their applications against a case sensitive database so that they will work in both environments. I have downloaded many utilities and seen many articles that do not work on case sensitive database.

    Later

    Justin

  • It's a good point that tools (and sample code as we discussed earlier in the thread) often fail on case sensitive databases. Not sure how to fix this. For me to invest additional time in scripts or utilities to make sure they support case sensitivity - where is the ROI? If I'm a tool vendor, sure, but not for individuals or even corporations.

    Justin, I'm going to disagree with you about case sensitivity making you a better coder. To me, coding is getting the compiler to do what I need it to do as simply and as quickly as I can. When I look for "good" code, I'm looking for good comments, superb control flow, good readability of code, good use of available constructs and functions, code that reasonably optimized but NOT at the expense of clarity unless it's clearly in a very hard loop where every cycle counts. In my view, coding in a case sensitive language is just adapting to the tool set. With a GOOD tool/IDE, you probably wouldn't see the difference as much.

    I'm about ROI - for our companies to survive, we need to provide good solid maintainable solutions fast, or even faster. As long as your tool of choice will get the tool done as fast as my tool of choice, who cares? But when the tool set or the database is slowing things down, be prepared to justify it. This leads to my argument that setting a entire db to case sensitive is overkill, use it for columns or tables where it solves a problem instead of creating one.

    Gee, do I go on or what? Thanks for your input Justin. We'll have to agree to disagree on this one!

    Andy

  • I seem to be in the minority, but I always use binary sort order on my servers.

    For anything but, I do think that the performance penalty is somewhat severe (30-40%). Is it really worth buying that much nore hardware to compensate for this? If you have a high-throughput OLTP system, the best performance option is always binary sort. On the other hand, if the system does a lot of string operations as a contact management system might, I would probably go with case-insensitivity.

    I also agree with the view that case-sensitivity encourages cleaner code. While it is true that you can have a variable MyVar referenced as MYVAR and myvar, but would you want to do that? Is it not easier to maintain code that looks consistent? I think that it is much easier. This falls under the "a little more time up front save a lot of time down the road" category.

    - Jay

  • What types of things do you do that use binary sorts?

    Steve Jones

    steve@dkranch.net

  • Have you benchmarked to prove that there is really a 30-40% hit for going case insensitive?

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • First, the easy question. I have worked on financial industry OLTP systems. Currently, we have a trading system that requires high-throughput for 50-100 concurrent users. The fastest possible database performance is the only option. A slow system means that deals could be lost or the prices become less favorable. In these systems, string processing is a much lower priority than outright performance.

    Now, as far as the 30-40% figure. That figure came from a benchmark that I had read some years back. I have seen more recent numbers, but I will have to do some digging to find the source. Anyhow, back in the SQL Server 4.x and 6.x, I definitely remember a figure of 35% binary over any other sort order. That number came directly from an MS support tech. More recently, I have found articles that state that, while binary is still the fastest, case-insensive is the next best choice.

    If I find the benchmark information I will post it. The best that I can do for SQL Server 2000 is quote MSDN: "In binary collations, comparisons and sorting are based strictly on the bit pattern of the characters. This is the fastest option."

    The difference does seem larger than it should be. Initially, I was under the same impression as you that the cost should be no more than a few clock cycle to handle the sort order differences internally. If you read the MSDN article about collations (http://msdn.microsoft.com/library/en-us/architec/8_ar_da_3xbn.asp), it gives some insight as to why there is a performance difference. When using dictionary order, not only does case-sensitivity come into play, but also accented characters and other localizations.

    - Jay

  • I dont disagree that a binary compare should be the fastest. Depending on the implementation of the collation certainly its going to be a few clock cycles slower and at that level maybe 30-40% is valid. But how much of the overall query time does that really add up to? Are you saying that a query that takes a min to run case insensitive will run in 30 secs or less with a binary sort?

    Seems to me that making columns case sensitive is the best of both worlds.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • A long time ago, Steve wrote:

    "Case Sensitivity is good for data quality. It ensures that names are entered correctly."

    I'd like to disagree with this statement. Case sensitivity in no way forces you to store names (or anything else for that matter) in a database in the Proper Casing - case sensitivity is not a constraint. If anything, it ensures that _when_ names are stored in the wrong case, they are almost impossible to find back, a problem you'll not have with case insensitive installations.

    Consider the standard procedure of entering names: Casing is usually checked at the client where the names are entered by the user, T-SQL just doesn't have the flexibility in string handling that most application development languages have, so this makes sense. When casing is checked by the client, it doesn't really matter if the database is case sensitive or insensitive, the client will insure the names are entered in the correct casing. Even if the database is case sensitive, the client app will still have to check for correct casing, because a case sensitive installation will not prevent anyone from entering a value in ALL CAPS or all lower case, or even CaMeL cAsE.

    Jeremy

  • One source for binary sort order being faster is Microsoft SQL Server 7.0 DBA Survival Guide by Spenik and Sledge. Page 92 - 'Binary sort order is the fastesdt of the sort orders; the other sort orders are 20 to 35 percent slower ...

  • Do they give any details? It makes sense that it would be faster, but how it affects overall query time/performance?

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • I wouold agree with Andy's comment that making columns case-sensitive may be the best of both worlds.

    I guess that my real issue is that making the choice to implement a case-sensitve database should not in itself be a worst practice. Both sort orders are valid in the right situation. Not understanding why you chose one sort order over another is a worst practice. For that matter, not understanding any decision and its consequences is a worst practice (in databases, systems, and business in general).

    Jay

    - Jay

  • Cant argue with that really. Having the option to set column collations really makes SQL2K more than nice to have compared to the all or nothing in earlier versions.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • I can say that case-sensitive databases have been one of the biggest hassles at LockwoodTech. We roll out an app. which bombs on a customer's PC who has a case sensitive DB, and then have to diagnose all the case sensitive probs.

    The problem with case sensitive databases is you are dictating that any client app. that accesses your database must ALSO conform to your standards as well.

    I can see merits to those who are disciplined and prefer it (in SQL and other languages) but this option really adds extra time and hassle factor to those of us who don't, since we HAVE to now write case sensitive clients in order to work on 100% of installations.

    I keep a computer with a case sensitive database installed on it purely as a test machine to test for case sensitive problems in our applications. So now we run a Case Sensitive shop too but not from choice, rather necessity.

    Also - case sensitive conflicts between C# and VBScript "Get" vs "get" are also killing us and will result in pulling "Normalize Case" option from Proc-Blaster in next version until we come up with a workaround.

    Brian Lockwood

    President

    LockwoodTech Software

    Brian Lockwood
    President
    ApexSQL - SQL Developer Essentials

    http://www.apexsql.com/blog

    Stand up for an Independent SQL Community - be Informed

Viewing 15 posts - 16 through 30 (of 85 total)

You must be logged in to reply to this topic. Login to reply