The Number that shouldn't be a number

  • I'm a bit late to the party, but what a party it is. We manage about 30 systems from external vendors. Some are better than others but all have failings, so I could really go to town here. You'd go a long way to beat the following:

    One system stores sex as an int... well sort of. It actually stores it as two int columns. One int column is called Male and the other column is called Female. If you're a male the column values are (hopefully) 1,0 and if you're female they're 0,1. So, correct me if I'm wrong, I think this means they're taking 8 bytes of storage to record each person's sex. As far as I can make out, there's theoretically about 4 billion types of Male you could be and 4 billion types of female, and that number squared for the number of combinations of male and female you could be. I think there's some pretty funky data validation code built into the front end to keep everything lined up.

    We have only one system with with the clustered guid problem but some of the tables store the guids as varchars and some of the varchar guids are enclosed in curly brackets and some are not... in the same column. That system also has tables that store date in one column and time in another column (both of type datetime) but there's a gotcha. The times are (usually, but not always) attached to a notional date of 30 December 1899. No idea why.

    We have one system that stores its dates as nvarchar(255) and at different periods the strings have had different formatting (eg "1 July 2000" and "1/7/2000").

    I could go on. Question is why does this scenario play out over and over again? It seems that my industry (hospitals) has really struggled over the last 30 years in getting from paper data to electronic data. Sickness is a complex beast to quantify. Enterprising clinicians, frustrated at the slow pace of change, have played around with a bit of Access or File Maker Pro or ASP or whatever, and come up with an endless stream of small, specialised, backyard hacky systems that they've convinced management to let them use. Once they're established, two things happen: 1) The clinician hives off a little startup company and starts selling the product to his or her colleagues at other hospitals and 2) IT inherits the ongoing care of these systems and gets told they're "mission critical". Two things NEVER happen. 1) The database is NEVER designed by a competent data architect, 2) The decision to allow these products into the hospital IT ecosystems NEVER involves people who will have to maintain or get data out of them . It is an endless source of amusement though I suppose.

    ...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell

  • I agree with GPO, who asks "Question is why does this scenario play out over and over again?"

    I think one of the reasons, which GPO touches on, is an accumulation of small independent applications. Another, is where data gets migrated from one database manager to another, or _especially_ from files to a database manager (I wonder if that is how the "Male/Female" 2 integers came about? I've never come across that one!).

    Regarding "small applications", I've created a few in my time, and I've probably transgressed a few times too. The problem isn't so much the independent applications as when they grow, multiply and then try to join up with other applications. One answer is "data modelling" which I used to claim to do, but that answer never used to be acceptable. It was always "too much trouble" for "too little to show for it". I guess another question to add to GPO's is "Someone having got into the situation, how do we get them (and ourselves) out of it?" I don't expect any obvious answers, I've been struggling with that one for years!;-)

    Tom Gillies LinkedIn Profilewww.DuhallowGreyGeek.com[/url]

  • I also agree with GPO. I think a lot of applications start this way, even really large commercially sold packages. I am dealing with a string of "data warehouses" that have been adopted by a number of departments and they are really nothing more than sets of work tables cobbled together by business analysts who used ms excel to pull together data for reports and converted it to SQL server with no idea how a relational database works. They crash often and the developer is called a hero because he puts in 80 hour weeks to keep the system going with manual tweaks and data fixes.

    Personally I like to build things that don't require constant babysitting to keep going. I guess these guys figure the company will not need them if they automate things or that they just don't know how to do it any differently.

  • GPO (2/8/2015)


    I think there's some pretty funky data validation code built into the front end to keep everything lined up.

    And does it work? Or do you have data that makes no sense?

    I could go on. Question is why does this scenario play out over and over again?

    Good question. I suspect part of it has to do with how 'easy' it is to learn to code. Hell there's enough 'learn to program in 21 days' or similar programs and books around. Sure, it teaches people to code, but there's a big difference between coding up something to store your DVD collection and writing a multi-user, scalable, extensible system.

    I can build a garden shed that won't fall down, that doesn't mean I can build a bridge.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • jarick 15608 (2/8/2015)


    ...Personally I like to build things that don't require constant babysitting to keep going. I guess these guys figure the company will not need them if they automate things or that they just don't know how to do it any differently.

    Job security. I wouldn't speculate on how many people think in those terms, but it definitely is a problem where people do things like this just to remain employed.

    Last year I had lunch with some people that I worked with back in the '90s and ran in to someone whom I'd done a little Access grant tracking database, he said he still used it and it still worked like a champ. It was nice to know that it was still running over a decade later.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • One system stores sex as an int... well sort of. It actually stores it as two int columns. One int column is called Male and the other column is called Female. If you're a male the column values are (hopefully) 1,0 and if you're female they're 0,1. So, correct me if I'm wrong, I think this means they're taking 8 bytes of storage to record each person's sex. As far as I can make out, there's theoretically about 4 billion types of Male you could be and 4 billion types of female, and that number squared for the number of combinations of male and female you could be. I think there's some pretty funky data validation code built into the front end to keep everything lined up.

    That beats my example into a cocked hat - we have two boolean fields "IS_LATEST" and "IS_IN_SOURCE" each stored as INT. Words fail me.

  • t.pinder (2/9/2015)


    One system stores sex as an int... well sort of. It actually stores it as two int columns. One int column is called Male and the other column is called Female. If you're a male the column values are (hopefully) 1,0 and if you're female they're 0,1. So, correct me if I'm wrong, I think this means they're taking 8 bytes of storage to record each person's sex. As far as I can make out, there's theoretically about 4 billion types of Male you could be and 4 billion types of female, and that number squared for the number of combinations of male and female you could be. I think there's some pretty funky data validation code built into the front end to keep everything lined up.

    That beats my example into a cocked hat - we have two boolean fields "IS_LATEST" and "IS_IN_SOURCE" each stored as INT. Words fail me.

    Agreed. I also really hate to see negative flag columns such as "IsNotValid".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • haven't seen any remarks about concatenated ( meaningfull ) keys yet.

    Recently had to tune a query full of substrings because pos 2 to 4 had a originating status code and position 5 to 7 contained the resulting status code and so on.

    :rolleyes:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • So, we have a DB with 1600+ tables - each with the two boolean fields I mentioned above. Can I change them all to bit? How?

  • t.pinder (2/10/2015)


    So, we have a DB with 1600+ tables - each with the two boolean fields I mentioned above. Can I change them all to bit? How?

    ALTER TABLE ... ALTER COLUMN, once you've done the complete impact analysis, fixed everything that will be affected by the change, ensured that there really are no values other than 0 and 1 in the column and have ensured that all applications that use the database have been tested with the new data types and any problems fixed

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Well, it's a test server so that's exactly the right place to try it out :unsure:

  • Had to jump in on this.

    The best (worst?) system I have dealt with had every table consisting of an identity clustered PK, a created date time, a created by user and an XML field.

    The XML contained all of the data. And, it was not consistent. One row contained X number of nodes, another time it was Y number of nodes.

    I was tasked with tuning this. They expected me to perform the miracle of adding indexes. The BI/reporting people had lots of fun too.

    Oh, and the XML was stored in a text datatype.

    Needless to say, the originally delivered system was thrown away in a short period of time.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • GPO (2/8/2015)


    .......Sickness is a complex beast to quantify. Enterprising clinicians, frustrated at the slow pace of change, have played around with a bit of Access or File Maker Pro or ASP or whatever, and come up with an endless stream of small, specialised, backyard hacky systems that they've convinced management to let them use. Once they're established, two things happen: 1) The clinician hives off a little startup company and starts selling the product to his or her colleagues at other hospitals and 2) IT inherits the ongoing care of these systems and gets told they're "mission critical".

    Or the one site product that was built like you mentioned is picked up and actually reused and mingled with three other similar products. All under the label that they were created by and with direct industry input. This is done over 10 years time by an off shore. That swears their insane lack of non clustered indexing is a performance boost.

    Then once the users are unable to use the product that their IT staff can no longer support, the offshore offers the company a huge discount on their fast always-on solution that they run on their servers or in the cloud.

    So now all of the medical data that trillions was spent to keep secure and from just anybody anywhere being able to access is in a place where that can happen at any time.

    But they saved so much money doing this so it must have been right.

    In the end it is expensive to build something well.

    The truth is this. It is too easy to convince most businesses to buy a used 20 year old Honda that has emissions 1000 times worse than this years Escalade ( and less MPG) and swear that they have a Green car that has a lower TCO.

  • bdcoder (1/31/2015)


    Ha! - store *every* column as VARBINARY(max) and never worry about data types again!

    You may laugh, but actually SQL Server's symmetric key encryption requires that any encrypted columns be of datatype VARBINARY. At least that's the case for v2008. So, if you see something like BirthDate VarBinary(40), then that's what's going on.

    To make this a bit more civilized for the application and ad-hoc query users, for each table containing PHI (Protected Health Information) columns, I have a created view that wraps the DecryptByKey() and Cast() functions around the encrypted columns. The user opens the symmetric key within their session and then queries the views, which return the PHI columns in their normalized datatypes (ex: BirthDate is Date and PhoneNumber is VarChar). From the user's perspective nothing looks out of ordinary. It's fine unless they try to search on BirthDate or PhoneNumber, which presents a problem because these computed columns are not indexable.

    If v2014 has introduced a more elegant way of implementing Symmetric Key Encryption (not TDE) on columns without use of DecryptByKey() and Cast() functions, or if someone has a better suggestion, then I'd love to hear about it.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Thank you for bringing up Domains. Most developers have never even *heard* the term. I'm convinced that the relational model is no longer taught in CS or IT classes. They may teach SQL, but usually just wave it away as "the place your objects get persisted."

    The lack of foundation knowledge is killing us.

Viewing 15 posts - 61 through 75 (of 113 total)

You must be logged in to reply to this topic. Login to reply