• While I agree in general principle with the gist of the article, I don't think everyone needs to run into work this weekend, take their servers down, and rebuild all their tables with the 'correct' data types. Again, I do agree in principle--it's a great article, and its points are things that data modelers should take to heart.

    But for existing systems, I'm going to put out a slightly contrarian viewpoint here for argument's sake.

    Everything is relative. Large organizations that have huge databases generally have larger budgets, and the reality is that even at enterprise disk-space cost, a savings of a few bytes per row--even if this savings could be achieved in every table in the system--is not going to make a CFO jump for joy and run out and buy that yacht.

    And let's look at the real savings (not talking about I/O yet--just disk cost). Let's say in every table in the system you find a couple of datetimes that could be date instead (5 bytes saved per field--datetime stores 8 bytes, date stores 3) and a few varchar(2)s that could be char(2) (a byte or two saved by dropping the variable-length overhead). Let's throw in a few more to get really generous and say you found a lot of little changes and saved 30 bytes per row in every table and index in the entire system.
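
    Just to make that kind of change concrete, here's roughly what it looks like--purely a hypothetical sketch with invented table and column names, and in real life you'd also have to deal with nullability, dependent indexes, constraints, and everything else that touches the columns:

        -- Hypothetical example only: narrow a datetime that only ever holds a day
        -- down to date, and a fixed two-character code from varchar(2) to char(2).
        ALTER TABLE dbo.OrderDetail ALTER COLUMN ShipDate date;
        ALTER TABLE dbo.OrderDetail ALTER COLUMN CountryCode char(2);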

    Now let's say there are a few detail tables that have 150 million rows. Maybe some audit tables with another few hundred million. Hundreds of other smaller tables...let's say you have a total of a billion rows--no--how about 10 billion rows. I know that many organizations have even more than that, but again--everything is relative. So 10 billion rows--and things were so bad that you saved 30 bytes for every row in every table in the entire system. What have you saved? 300 gigs, if my math is right. How much is that in enterprise disk cost? And remember--this is an organization whose in-house system has 10 billion rows (and we're not talking about 3rd-party systems like SAP, with over 100,000 tables, or things like MS ISA, which can generate many billions of rows in even an average company).
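
    To sanity-check that back-of-the-envelope figure (these are just the hypothetical numbers from this comment, not measurements from any real system):

        -- 30 bytes saved per row across 10 billion rows, in decimal gigabytes.
        DECLARE @rows bigint = 10000000000;     -- 10 billion rows, the generous hypothetical above
        DECLARE @saved_per_row int = 30;        -- bytes saved per row
        SELECT (@rows * @saved_per_row) / 1000000000.0 AS saved_gb;  -- 300.0 GB (roughly 279 GiB)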

    Are you going to run to the powers that be and tout how much you're going to save after you...what?...take down the servers for how long? Test the applications with the new data types? And what about those apps that now need to be rebuilt because, say, smalldatetime will now error when a VB app passes its version of null? Go through all of that to rebuild your applications and tables for how much savings?

    And yes, I get the part about I/O cost. Again, relativity, and pick the low-hanging fruit. If there are butt-stupid models out there in which the modeler was so lazy he had a default type of char(50) and left that in for all his "status" fields--of course, by all means, I'd make the time to fix that. But for all the WTFs that might be out there in data models, the reality is that most places won't have that much low-hanging fruit (if you beg to differ on that one, then your company probably has more pressing IT issues than just this). If you're experiencing massive I/O delays on the server and you see low-hanging fruit like that, then you've probably found the cause.
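
    For what it's worth, that's the kind of quick win I mean--something like this, with hypothetical names, and assuming the real values are single-character codes (the right fix might just as well be a tinyint plus a lookup table):

        -- A "status" flag lazily left at the char(50) default: a fixed 50 bytes per row.
        -- Assuming the actual values are single-character codes, narrowing it saves
        -- 49 bytes per row in the table and in every index that carries the column.
        ALTER TABLE dbo.WorkQueue ALTER COLUMN Status char(1) NOT NULL;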

    But a more 'expensive' thing in terms of user productivity is going to be network bandwidth. No matter how efficient you are in the data model, when you return the rows over the pipe, the network is going to see the same amount of data--whether the db has to chug through a few more I/O cycles to put it together or not. That's where application-level WTFs can save bandwidth that's much more costly than some extra spindle time, but that's another article.

    Really though, excellent info, and by all means use it if you're in the modeling stage. But if you've got an existing system, you really need to do the math and see how much you're actually going to save and whether you have the resources to do it.

    --Jim