Think like a DBA

  • Two things. (Does that make it .04 worth?) 

    Any developer that doesn't encapsulate the 'If Null make = '' ' into a static function used throughout their code needs a swift kick where their brain has settled to.

    Any developer or DBA that takes data that has been 'thrown over the wall' from another developer or DBA without running verification against it likewise needs a good atitude adjustment.  Memory, (silicon based), is cheap, the type subtle error this can gerenate is darn expensive.  Ask Microsoft how much their infamous buffer overruns have cost them.

  • Amen!  I'll take your .04 and raise you .02!

    Though I feel a static function may be a little excessive for one inline if or conditional op' statement, (less stack use and more flexible), your point is well (and humourously) made.

    It's all about getting into good habits, the kind of people that you describe are either newbies or not the kind of people who trawl through forums like these. C'est la vie.

    If anybody ever manages to get a figure out of Microsoft for this issue by the way, please post it here for my enjoyment.

    Tony

  • After reading the article and feedback to-date, in my mind the situation presented boiled down to this: from the programmer's perspective, you are storing one fact in one of two possible ways in the database (either as NULL or as ""), and somewhere this fact has to be accounted for programmatically.

    Thinking as a DBA [Aside: I have recently gone through a number of job interviews, and have re-discovered how all-encompassing and elusive in absolute definition the title "Database Administrator" is] thinking as a designer/architect, I'd want to design the structures to avoid this situation from the start, thus avoiding the need for code to scrub data with every read and write.

    If the situtation is such that you are storing two facts in one way each, but for some queries the two get consolidated down to one, the question becomes "how often does this happen"? Nine times out of ten? Half the time? One in a thousand? Figure this out, then guesstimate cost and return. If it has to be done all the time, put the code in close to the "bottom" of the architecture, preferrably (imho) as a stored procedure, or maybe a stored procedure with a switch parameter determining whether or not to cleanse. You'd have more database stuff to maintain (more and more complex procedures), but then nothing else "above" has to worry about this fiddly detail--producing less complex code, and lots of it. Conversely, if the routine gets done once in a blue moon, let that specific application do the work, and don't clutter/bog down everything else with pointless code.

    Sadly, software development can rarely be cast in black and white terms. Whichever tactic makes the most sense/performs best today may be rendered invalid by tomorrow's changing conditions or requirements. Odds are you won't have time to consider all the ins and outs, or you won't have the knowledge of the business to accurately predict future conditions; doing these requires time and <?>business savvy</?>, both of which are rare commodities in the DBAs supply cabinet. I'd pull out the old "it depends" argument, weigh things as best I can in the time alloted, make the decision and move forward from there.

          Philip

  • I do mostly contract/consulting development work, so even acting as a Deputy DBA (I like the term), I don't typically have long-term responsibilities for the database. But I like repeat work, so I try not to leave too big a mess.

    From a developer's perspective, most strongly-typed languages (C#, etc.) don't provide a lot of flexibility in dealing with nulls. If the lastName is declared as a string, you have to use a string. So the conversion issue must be dealt with.  From the database perspective, I must disagree with an earlier poster. The fact that your lastName is unknown does not make it and empty string. Your name is not blank. Null is a more correct representation.

    I like doing the conversion in the SQL. It seems simpler, and you always have some return value to work with. The biggest problem with using stored procedures for this sort of thing comes in debugging an application. Especially when a company doesn't have a solid staff of dbas and developers following good standardized practices (typical, right?) having some functionality in the database, some in the code, and some located who knows where makes it very difficult to maintain the app.

    Go ahead and sue me! Blood from a turnip...

  • Good article.

    Got me to read up on coalesce, which is pretty good since I've been using sql since 4.2!

    Thanks!

  • Barton

    I'm not sure exactly what you mean by 'flexibility in dealing with nulls'? I think C#'s IsDBNull method is pretty much as close as a language (strongly typed or otherwise) is going to get to coping with null values.  At the end of the day a value is either null or not, how flexible can you get with those constraints?

    Personally, if I was just going to cast null values to a zero length string every time I encountered them, I'd save myself the hassle, mark the field as non-nullable and put a zero-length string in the db, problem solved.

    I don't think there's a right or wrong answer, perhaps we'll have to leave it as null

    Tony

  • Well folks, personally I think that since a server needs to process multiple similtaneous requests (sometimes 1000s), why put any more strain on the thing. Almost all client machines are quite capable of processing NULL values and all good programmers handle this in their code. Distribute the load I say.

    If you're programmer is going to do a comparison (i.e. If isnull(..)) then why not speedup the process and do the assignment anyways i.e. strval = "" & rs(field).value ?!!

  • I like the possibility of sps to pre validate and pre format the result set in a way that makes front end code cleaner and easier to maintain, only when applications get bigger, and they need scalability and development is spread across different dev. teams doing different layers, it easily becomes complicated to scale your appl. and devide programming tasks across teams. therefore sometimes making it a nightmare to have a lot of formatting and validation code in sps. (Although as a DBA there is one exception here, insert and update sps ALLWAYS contain validation code in my DBs )

     Now to the null validation part:

    As Deden, pointed out earlier in this thread, Null means unknown and '' means we know that value is not present. as with deden’s example, we either don't know if a person has a 3rd name at all(null) or we DO know that the person hasn't got a 3rd name(''). this is a crucial difference. and depending on the client requirements either the application should be aware of this difference on occasion (in that case sps can't replace null values) or maybe on other occasions the application doesn't need to be aware of this difference (in that case the sp could replace null values).

    Personally I have never implemented the null replacement in sps, and I don't think I'll ever will, since I never know if the application needs to be aware of the difference between unknown and known  in the near future.

    lastly replacing null values in a database with empty strings for me is a worst practice, it’s basically lying. It’s like telling someone you know the answer to their question when in fact you don’t have a clue.

    Edward Dortland

     

  • I cant say enough how pleased I am with the replies I've seen so far. Good articles should generate thought - and thoughtful comments!

    Couple comments I would like to speak to though. One is that coalesce is a great function and worth knowing, but I prefer isNull when that's the only behavior I need - I just think it reads more clearly. The other is that I would never advocate storing empty strings instead of nulls (and you cant avoid nulls, regardless of what you consider good db design).

  • 99% of the time I use ISNULL.  It does what it says on the tin.

    COALESCE is useful when you have to have some form of construct that is IF @a IS NULL ELSE IF @b-2 Is NULL.

     

  • I too am what's been referred to as a Deputy DBA, and over the past 2 years have really come around to using SPs as much as possible.

    I'd like to add a new perspective with regards to ease of deployment. In our particular case, we have different web applications that use different databases ALL of which have a core structure which is always the same. In that case, the advantage of putting the handling of nulls in the SP, is that our middle layer DLL always remains the same - and does not require different versions for each client application - and we only need to tweak the relevant SPs in the database to return '', 'This is empty', 'etc. etc' depending on what we want for each client app.

    As for the more general question of storing empty strings instead of nulls (and you cant avoid nulls, regardless of what you consider good db design), as a Deputy DBA I would say you can easily avoid nulls. Just set defaults for the field to 0, '', or whatever else. This seems a valid alternative in cases in which null ONLY means "value not known". Or am I being too simplistic.....as a Deputy DBA ?

  • i've been programmer (VB6, .Net, ASP, using COM+ and WEB SERVICE) and i understood when you make this kind of app (3-layers), app is a "slave" of data repository (DB2, SYBASE, INFORMIX, SQLSERVER, EXCEL even MainFrame).

    the app retrieves null values .....who cares ?? this languages, all of them, have facilities to manipulate any values (null, date, int, long, etc).

     

  • If you are going to block NULLs from the database you have to put checking in the client layer to ensure that there is no way that a user can trick the application into trying to write NULLs.

    Remember users are like rats.  If there is a gap they can get through it

    The amount of error checking that you put in the database is really dependent on what your application is for and who is going to use it.

    For exampe, if you were writing a web-site that considered Amazon a small subsidiary then you would probably have very little error checking in your database simply because the performance penalties would be too high.

    If you are running a database that had several small departmental applications accessing it then you would probably put more error checking in the database.

    In fact I would say that if you were rolling out multiple applications accessing shared databases then you almost have to have the error checking in the database simply because a bug in one app could corrupt the data in the others.

  • i've been programmer (VB6, .Net, ASP, using COM+ and WEB SERVICE) and i understood when you make this kind of app (3-layers), app is a "slave" of data repository (DB2, SYBASE, INFORMIX, SQLSERVER, EXCEL even MainFrame).

    the app retrieves null values .....who cares ?? this languages, all of them, have facilities to manipulate any values (null, date, int, long, etc).

     

  • A Happy DBA is a ....... rarity maybe ;-). I think that the original point about thinking differently is true. How people think depends very much on the background they bring with them - lets face it there are DBAs who were programmers and DBAs who were Systems Analysts and yet others who were originally network support people. I cannot think of anyone I have met in recent years who didnt come with a lot of IT experience from some other IT discipline and assumed the role for one reason or another. The best ones I have come accross end up 'thinking like a DBA'. I would define this as someone whose sees the role as a service and whose focus is on the shared resource before anything else.

    There is a sense in which MS SQL Server is very unlike its compettitors especially DB2 and Oracle in that their legacy apps have been designed with the shared resource in mind and 'our' legacy apps have been designed with the client in mind. How many database designs do you have that originated with Microsoft Basic, moved on to Ashton Tate's Dbase3+, migrated to clipper or foxpro, on to access and have ended up on SQL server. A good portion of the databases that we deal with on a day to day basis have this or similar lineage.

    Who cares about nulls - anyone who thinks about the shared resource. It is true you can handle nulls at design time or at run time at the client or at the server - at run time - at the server costs in terms of database performance. ThiMk shared resource!

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply