Does size really matter?

  • Looking back over my interviews, I've always been asked about the size of the databases I managed in my previous positions, but as I think over those interviews, I've never truly understood the point of the question.

    I ask because the latest database I worked with comes to 4.5 GB, but the majority of the data in it is images and files stored within the database.

    Meanwhile, another database I manage holds a greater volume of text data by comparison, yet only comes to 23 MB.

    So what is the point of asking the size of the db?

  • I would guess this question helps determine your ability to write optimized queries/procs. As the size of the DB grows, so does the need for optimized queries.

    At least, that's my opinion. 😀

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • It matters beyond a point. I'd say if you've never managed anything larger than 50 GB and you're asked to run a 500 GB database, you might not be prepared.

    At some point size matters because you can't work with one file on a single disk. Backup/restore becomes an issue; index rebuilds and similar jobs become problematic because they take so long to complete. Development environments become a problem when you try to duplicate production, and more.
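
    For example, one of the first things that changes is that a single-file backup stops being practical. A minimal sketch of striping a backup across several files on separate drives so the writes run in parallel (the database name and paths are made up):

    -- Stripe the backup across three drives; SQL Server writes to all of them in parallel
    BACKUP DATABASE BigDB
    TO DISK = 'E:\Backups\BigDB_1.bak',
       DISK = 'F:\Backups\BigDB_2.bak',
       DISK = 'G:\Backups\BigDB_3.bak'
    WITH INIT, STATS = 10;

    To restore, you need all three files present, which is exactly the kind of operational wrinkle that only shows up at scale.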

    It's like me asking you to paint a bridge. You've painted bridges before, but if they've all been less than 100 ft long, you don't have an appreciation for painting the Golden Gate Bridge. That's a full-time, year-round job that never ends.

  • Size is absolutely a major factor in determining experience. In addition to what Steve mentioned about backups, how you separate out file storage, indexes, DBCC checks, data loads, reporting, statistics, reindexing, index fragmentation, execution plans: all of this changes as the data scales. As you go from 20 GB to 100 GB you open up new issues, and as you go from 100 GB to a TB, all kinds of other issues arise. Past that I can't honestly say, since I haven't supported anything bigger than 800 GB. Suffice it to say, every maintenance routine we ran on the 800 GB database is different from the routines I run regularly on the 50 GB databases I'm maintaining today.
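
    To make "reindexing changes as the data scales" concrete: at small sizes you can simply rebuild everything nightly, but at a few hundred GB you check fragmentation first and only touch what needs it. A rough sketch using the SQL 2005 DMVs (the thresholds are illustrative, not gospel):

    -- Only consider indexes that are actually fragmented and big enough to matter
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id
     AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 30
      AND ips.page_count > 1000;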

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • That makes sense. I haven't really reached more than 100 GB in my career thus far, and the one that came to 100 GB was a train wreck of a db.

    Well, when I do choose to move on, which won't be for a while, I'll have to remember that.

    Thanks everyone.

  • 😀

    Those replies just show how much I know, but then again, I haven't had any experience with large DBs.

    😀

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • Yes, it does matter. In one job I had a database that was 325 GB. I had an array of 28 disks (each 73 GB). And I had a tape library directly attached. I did backups to disk then to tape. The backup to tape took 6 hours. Where am I going with this? My next job had a database with 60 GB. They took over 12 hours to back it up. One person said they needed to cut back on the amount of data stored. I said the problem was that they were backing up to disk and then over the network, through a firewall to tape. If I had not had the experience I did, I might have gone along with the amount of data being the issue. However, I did have experience with a large database and was able to suggest that they get a dedicated tape library inside the firewall and back up the files to that.
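
    For reference, the disk stage of that two-step backup is nothing exotic; it's the topology around it that matters. A minimal sketch (path and database name are made up), with the tape library then reading the .bak file locally instead of pulling it through the firewall:

    -- Stage the full backup on directly attached disk; tape picks it up from there
    BACKUP DATABASE BigDB
    TO DISK = 'E:\SQLBackups\BigDB_Full.bak'
    WITH INIT, STATS = 10;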

    -SQLBill

  • I had a 700 GB database and it ran fairly well, but we had 5 or 6 filegroups and multiple arrays, and we had to be careful with maintenance. And we had issues trying to get stuff back to development because we didn't have another server that size.
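
    For anyone who hasn't worked with filegroups, the mechanics are simple; deciding what goes where is the hard-won part. A sketch (names, paths, and sizes are made up) of adding a filegroup whose file sits on its own array:

    -- New filegroup with a file on a separate array
    ALTER DATABASE BigDB ADD FILEGROUP Archive;

    ALTER DATABASE BigDB
    ADD FILE (
        NAME = 'BigDB_Archive1',
        FILENAME = 'F:\Data\BigDB_Archive1.ndf',
        SIZE = 50GB,
        FILEGROWTH = 5GB
    ) TO FILEGROUP Archive;

    -- Tables and indexes can then be placed on it explicitly
    CREATE TABLE dbo.OrderHistory (
        OrderID   INT      NOT NULL,
        OrderDate DATETIME NOT NULL
    ) ON Archive;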

    You can keep your TB dbs. I prefer to sleep at night.

  • SQLBill (12/21/2007)


    Yes, it does matter. In one job I had a database that was 325 GB. I had an array of 28 disks (each 73 GB). And I had a tape library directly attached. I did backups to disk then to tape. The backup to tape took 6 hours. Where am I going with this? My next job had a database with 60 GB. They took over 12 hours to back it up. One person said they needed to cut back on the amount of data stored. I said the problem was that they were backing up to disk and then over the network, through a firewall to tape. If I had not had the experience I did, I might have gone along with the amount of data being the issue. However, I did have experience with a large database and was able to suggest that they get a dedicated tape library inside the firewall and back up the files to that.

    -SQLBill

    Wouldn't that be more of a network issue than a database issue, and fall within the domain of the network admin rather than the DB admin? Though in a shop where the network admin and the DB admin are one and the same person, that would be the exception.

  • I'm doing DBA tasks for our clients (15 of them). At the moment their databases run 2-3 GB but growing, especially the ones that save PDFs. But the maintenance plan is quite easy at their small size: backups when I want, reindexing when I want... really simple.
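
    A sketch of the kind of routine that suffices at that size (the names are placeholders); the point is that a blanket rebuild like this would be unthinkable as one nightly step on a large system:

    -- Full backup plus rebuild everything: fine at 2-3 GB
    BACKUP DATABASE ClientDB
    TO DISK = 'D:\Backups\ClientDB_Full.bak'
    WITH INIT;

    ALTER INDEX ALL ON dbo.Documents REBUILD;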

    I would like to try managing larger DBs, but I know that I'm not prepared for that. So, my question is: to get in charge of larger databases, do I need to apply for a junior DBA role?

    Josep

  • This post has opened up a really great set of discussions. To capture many of the points raised:

    - Size always matters. Small generally means nimble and flexible. Large brings up more physical barriers.

    - As things get larger, you need to spend more time optimizing the system along different (and often conflicting) lines. Backups, query optimizations, updates, standards, etc. all need to be balanced.

    - The ability to optimize existing resources is a skill. Splitting I/O across multiple devices, changing the physical design, and maintaining existing designs (indexes, etc.) are all key abilities. There's no one answer (of course); it depends on the situation and objectives.

    Since a common goal is to always improve the performance of any system (when has anyone ever asked to slow the system down?), database size is simply an opening question. More important is when you have identified performance issues and what strategies you put in place to resolve the problem(s). Some strategies don't work so well as size changes, so understand the limitations of any approach in both machine terms and staffing/cost. Demonstrate an ability to be flexible in your approach and an openness to discuss new ideas, and you'll breeze past any concerns about size. (The size question is just an opener; it probes your ability to solve an existing problem, not the one that'll come up tomorrow.)
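
    On the "when have you identified performance issues" point: one concrete place to start is the plan cache. A rough sketch (SQL 2005-era DMVs) that surfaces the most CPU-hungry cached queries; what you do about them is where the strategy discussion begins:

    -- Top 10 cached queries by total CPU time
    SELECT TOP 10
           qs.total_worker_time / 1000 AS total_cpu_ms,
           qs.execution_count,
           SUBSTRING(st.text, qs.statement_start_offset / 2 + 1, 100) AS query_snippet
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_worker_time DESC;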

  • dominic cloud (12/23/2007)


    SQLBill (12/21/2007)


    Yes, it does matter. In one job I had a database that was 325 GB. I had an array of 28 disks (each 73 GB). And I had a tape library directly attached. I did backups to disk then to tape. The backup to tape took 6 hours. Where am I going with this? My next job had a database with 60 GB. They took over 12 hours to back it up. One person said they needed to cut back on the amount of data stored. I said the problem was that they were backing up to disk and then over the network, through a firewall to tape. If I had not had the experience I did, I might have gone along with the amount of data being the issue. However, I did have experience with a large database and was able to suggest that they get a dedicated tape library inside the firewall and back up the files to that.

    -SQLBill

    Wouldn't that be more of a network issue than a database issue, and fall within the domain of the network admin rather than the DB admin? Though in a shop where the network admin and the DB admin are one and the same person, that would be the exception.

    It's both. But it starts with the DBA knowing what can and can't be done. One DBA thought the problem was the size of the database and that the solution was to get rid of (archive) some of the data. With my experience, I knew the problem wasn't the size of the database and was able to look for issues elsewhere. Which type of DBA would you rather work with: one who assumed the problem was the size of the database, or one who was familiar with different database sizes and considered other issues?

    -SQLBill

  • Except for one brief mention somewhere above, everyone is talking about the physical aspects of maintaining a large database and, of course, all the tricks you need to know to maintain it: splitting files, etc, etc. While those are certainly important, I'd like to stress that performance is not all hardware, file, RAID, memory, and index optimization... it doesn't matter how well you do any of that if you let RBAR/performance-challenged code into your system. One nasty little triangular join, as either a correlated subquery or in the WHERE clause on a large table... one silly RBAR proc on steroids... one really poorly written trigger... one really badly written Sequence table update (caused an average of 640 deadlocks per day on our box)... one long-winded explicitly declared transaction... one clustered index on the wrong column(s)... even an innocent-looking update written like the following...

    UPDATE Sometable
    SET somecolumn = t1.somecolumn
    FROM Anothertable t1
    WHERE Sometable.someothercolumn = t1.someothercolumn

    ... can bring performance to its knees, and not just for the session causing the problem.

    By the way, the above query caused four 3 GHz CPUs to peg for 2 hours because Sometable is not in the (2nd) FROM clause... it only occurs when parallelism kicks in... took us 2 days to figure out the problem 'cause it just didn't look like it could be the problem. Once we rewrote it properly, it executed in about 20 seconds...

    UPDATE Sometable
    SET somecolumn = t1.somecolumn
    FROM Anothertable t1
    INNER JOIN Sometable t2
      ON t2.someothercolumn = t1.someothercolumn

    ... and, no, it wasn't the ANSI join that fixed it 'cause this worked as well...

    UPDATE Sometable
    SET somecolumn = t1.somecolumn
    FROM Anothertable t1,
         Sometable t2
    WHERE t2.someothercolumn = t1.someothercolumn

    Heh... better than that, we didn't actually have to buy any hardware to fix our performance problem. Wouldn't have helped anyway, 'cause the next day the same thing happened on an 8-CPU system, except that all 8 CPUs went to the wall.

    If you have a couple of tables with millions of rows, it doesn't matter how big the database gets... you have to make sure the code is right before it gets promoted to production, doesn't have any "Hidden RBAR", and has a strong basis in correct set-based methods. Actually, if you're like me, you write code as if there were at least a million rows in every table... even if there are only a couple thousand rows.
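
    Since triangular joins came up above, here's a minimal sketch of the correlated-subquery flavor (hypothetical table); it looks innocent, but each row re-reads every earlier row, so a million-row table touches roughly half a trillion rows:

    -- Running count via a triangular join: N rows read ~N*N/2 rows in total
    SELECT o.OrderID,
           (SELECT COUNT(*)
            FROM dbo.Orders i
            WHERE i.OrderID <= o.OrderID) AS running_count
    FROM dbo.Orders o;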

    Size does matter... that's why I always write code as if there were huge amounts of data from the 'git' on a new database. If you can keep bad/non-scalable code out of the database, the hardware and all that other stuff will just about take care of itself.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
