You Better Learn to Work at Scale

  • Comments posted to this topic are about the item You Better Learn to Work at Scale

  • Great points, Steve. I have been amazed at the rapid expansion of storage space as well. We used to be proud of a sector-by-sector disk backup that did 5MB in 5 minutes. Then one client said that they could not afford the disk space for SQL backups. I went to a local retailer and had an ad printed for 1TB external drives.

    As to files, there is a system that digests transaction files like table scraps down a garbage grinder. Each processed file is placed in an archive folder. Daily, those get zipped up, one zip per day. We can go back and look at every transaction since the system went on-line.
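A minimal sketch of the daily archive step described above, assuming the processed files sit loose in one folder and that a file's modification date decides which day it belongs to. The function name `zip_archive_by_day` and the `YYYY-MM-DD.zip` naming are illustrative assumptions, not details of the original system.

```python
import zipfile
from collections import defaultdict
from datetime import date, datetime
from pathlib import Path
from typing import Optional


def zip_archive_by_day(archive_dir: Path, today: Optional[date] = None) -> list:
    """Bundle processed files into one zip per day, keyed on modification date.

    Files dated today are left alone so that a day is zipped only once it is over.
    """
    today = today or date.today()
    by_day = defaultdict(list)
    for path in archive_dir.iterdir():
        if path.is_file() and path.suffix != ".zip":
            day = datetime.fromtimestamp(path.stat().st_mtime).date()
            if day < today:
                by_day[day].append(path)

    created = []
    for day, files in sorted(by_day.items()):
        zip_path = archive_dir / f"{day.isoformat()}.zip"  # hypothetical naming scheme
        # Append mode: a rerun adds to an existing day's zip instead of replacing it.
        with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
            for f in sorted(files):
                zf.write(f, arcname=f.name)
        # Delete the originals only after the zip has been written and closed.
        for f in files:
            f.unlink()
        created.append(zip_path)
    return created
```

Run once a day from a scheduler, something like this keeps the folder small while every transaction file stays retrievable by date.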

    We had figured out that the NTFS in XP and Windows 7 is a database. Avoiding the "Hammer and Nail" thinking allowed us to produce systems that scale in both directions.

    I attended a technology demo at a large freight company whose headquarters were local to me. They were scanning in every document that came into the company and storing the images on very large optical disks. The documents were run through OCR off-line and that data was compared to the on-line systems. Impressive. It was almost two decades back by now.

    "Don't worry. We will never have more than 32767 customers." The old rules for area codes in the US were that the middle digit had to be a 1 or 0 and that no area code could cross a state line. Scale considerations will come to bite you sooner or later.
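The 32767 in that quote is the largest value a signed 16-bit integer can hold. A small sketch of what happens at customer 32768, using Python's `ctypes.c_int16` as a stand-in for a 16-bit field (the helper `next_customer_id` is hypothetical):

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32767, the largest signed 16-bit value


def next_customer_id(current_id: int) -> int:
    """Return the next ID as a signed 16-bit field would store it."""
    # c_int16 keeps only the low 16 bits, so values past 32767 wrap around.
    return ctypes.c_int16(current_id + 1).value


print(next_customer_id(32766))      # 32767
print(next_customer_id(INT16_MAX))  # -32768
```

Not every system wraps silently like this. SQL Server's SMALLINT, for instance, raises an arithmetic overflow error instead, which is safer but still stops new customers from being added.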

    The tough part is figuring out what to keep on-line and what can go off-line. We tend to want to keep everything. It takes making the tough decisions as to where to draw that line.

    ATB
    Charles Kincaid

  • What irks me is that there seems to be a race to use everything that can be used up as soon as possible.

    Much to my chagrin, the developers are the ones chosen to develop the database (and not the DBAs). I am mantra-like in my telling the developers to use as small a datatype as possible. And that one customised index per stored procedure will make the DB slower, not faster, especially once 200 SPs have been coded. They don't seem to realise, or care, that when one third of a table's size is data and two thirds are indexes, every write takes 3 times longer than without the indexes.
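A rough way to see the index overhead described above, sketched with Python's built-in `sqlite3` rather than SQL Server (an assumption made so the example runs anywhere). The `orders` table, its columns, and the `pages_used` helper are all made up for illustration. It counts the pages the database occupies as indexes are added. It does not measure write time, so it illustrates the extra structures each insert has to maintain rather than verifying the "3 times" figure.

```python
import sqlite3


def pages_used(index_count: int, rows: int = 5000) -> int:
    """Fill a throwaway in-memory table that has `index_count` indexes; return pages used."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, "
        "product INTEGER, region INTEGER, placed TEXT)"
    )
    columns = ["customer", "product", "region", "placed"]
    for i in range(index_count):
        # One single-column index each, cycling through the columns.
        conn.execute(f"CREATE INDEX ix_{i} ON orders ({columns[i % len(columns)]})")
    # Every insert below must also add an entry to each index created above.
    conn.executemany(
        "INSERT INTO orders (customer, product, region, placed) VALUES (?, ?, ?, ?)",
        ((n % 997, n % 101, n % 13, f"2015-01-{n % 28 + 1:02d}") for n in range(rows)),
    )
    conn.commit()
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return pages


for count in (0, 2, 4):
    print(count, "indexes:", pages_used(count), "pages")
```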

    Especially among the developers, there seems to be a belief that everyone will follow them. If the code that they write is slow, then it is up to the sysadmins to get new, faster hardware. If the DB is running slowly, well then, let the DBAs sort it out. They like that sort of thing.

    I want the DB that I manage to be as small as it can be, so no using NVARCHAR when VARCHAR will do nicely, no using DATETIME when only a DATE is needed and so on. I want it to be lean. I want to have as much necessary information on one fill-factor'd page as there can be. I don't want an obese database that hobbles along.
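Some back-of-the-envelope arithmetic for the point about lean pages, using SQL Server's documented sizes: DATE is 3 bytes, DATETIME is 8, VARCHAR costs 1 byte per character plus 2, NVARCHAR costs 2 bytes per character plus 2, and an 8 KB page has 8096 bytes left after its header. The sketch assumes a made-up three-column table with every string filled to 40 characters, and it ignores per-row overhead (row header, null bitmap, slot array), so real counts will be somewhat lower. The helper names are hypothetical.

```python
PAGE_DATA_BYTES = 8096  # 8 KB SQL Server page minus the 96-byte page header


def column_bytes(sql_type: str, chars: int = 0) -> int:
    """Approximate storage for one column value, ignoring row overhead."""
    fixed = {"INT": 4, "DATE": 3, "DATETIME": 8}
    if sql_type in fixed:
        return fixed[sql_type]
    if sql_type == "VARCHAR":
        return chars + 2      # 1 byte per character plus 2 bytes of length info
    if sql_type == "NVARCHAR":
        return 2 * chars + 2  # 2 bytes per character plus 2 bytes of length info
    raise ValueError(f"unknown type: {sql_type}")


def rows_per_page(columns, fill_factor: int = 100) -> int:
    """Rows that fit on one page at the given fill factor (percent)."""
    row_bytes = sum(column_bytes(*col) for col in columns)
    usable = PAGE_DATA_BYTES * fill_factor // 100
    return usable // row_bytes


lean = [("INT",), ("VARCHAR", 40), ("DATE",)]       # 49 bytes per row
wide = [("INT",), ("NVARCHAR", 40), ("DATETIME",)]  # 94 bytes per row

print(rows_per_page(lean, fill_factor=90))  # 148
print(rows_per_page(wide, fill_factor=90))  # 77
```

Under these assumptions the lean table packs nearly twice as many rows into each page, which means roughly half the pages to read, cache and back up.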

  • Sean Redmond (1/30/2015)


    What irks me is that there seems to be a race to use everything that can be used up as soon as possible.

    Much to my chagrin, the developers are the ones chosen to develop the database (and not the DBAs). I am mantra-like in my telling the developers to use as small a datatype as possible. And that one customised index per stored procedure will make the DB slower, not faster, especially once 200 SPs have been coded. They don't seem to realise, or care, that when one third of a table's size is data and two thirds are indexes, every write takes 3 times longer than without the indexes.

    Especially among the developers, there seems to be a belief that everyone will follow them. If the code that they write is slow, then it is up to the sysadmins to get new, faster hardware. If the DB is running slowly, well then, let the DBAs sort it out. They like that sort of thing.

    I want the DB that I manage to be as small as it can be, so no using NVARCHAR when VARCHAR will do nicely, no using DATETIME when only a DATE is needed and so on. I want it to be lean. I want to have as much necessary information on one fill-factor'd page as there can be. I don't want an obese database that hobbles along.

    Not all of us developers are like that, but too many are. The NVARCHAR/VARCHAR decision can be tricky if one has heard that the system might need to handle internationalisation. However, using DATETIME when only a DATE is required is not only wasteful but error prone, and is usually due to developers not keeping up with database features [1].

    [1] Playing devil's advocate: it is difficult to keep up when you have your language, say C#, plus the .NET Base Class Library, XML (and related standards, XSLT etc.), T-SQL, JavaScript and Node.js (and the million other JavaScript frameworks). Types are a basic, though, and guidance from subject matter experts (DBAs, for example) should, at the very least, be considered.
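One way the "error prone" part tends to show up, sketched in Python with made-up order timestamps: once a column that should hold a date carries a time of day, a plain equality test against the date matches only the rows that happen to sit at midnight.

```python
from datetime import date, datetime

# Hypothetical rows from a column typed as a date-and-time when only the date matters.
orders = [
    datetime(2015, 1, 30, 0, 0, 0),
    datetime(2015, 1, 30, 14, 25, 7),
]

target = date(2015, 1, 30)
midnight = datetime(2015, 1, 30)  # what a bare date literal usually becomes

exact_matches = [o for o in orders if o == midnight]
same_day = [o for o in orders if o.date() == target]

print(len(exact_matches))  # 1
print(len(same_day))       # 2
```

Storing a true date removes the need to remember the truncation or range check in every query.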

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!
