SQLServerCentral is supported by Redgate
 
The Largest Database



Author
Message
Steve Jones
SSC Guru

Group: Administrators
Points: 83005 Visits: 19223


It's not for me, but who out there wants to work on the largest database? The NSA is collecting phone records of calls made and brags that "It's the largest database ever assembled in the world."


If they're really recording all phone calls, then it has to be billions of rows. Heck, it's probably tens of millions or more a day! After all, there are more than 200 million people in the US, and with cell phones, home businesses, and prepaid calling cards, there's probably a good percentage of those people making a call a day.


Can you imagine having 10 or 20 billion rows of data that you want to index? How about testing a query that joins this table with a list of the people, a much smaller 200 million row table, and you forget the WHERE clause? Or what about a genius manager asking for a list of people who didn't make a phone call between two dates? Probably have time for some nice coffee breaks there.
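That "who didn't call between two dates" request is a classic anti-join. Here's a minimal sketch in Python with SQLite (the toy schema, table names, and data are my own invention, not anything from the editorial) showing the NOT EXISTS form, which at least lets the optimizer probe an index on the calls table instead of scanning it once per person:

```python
import sqlite3

# Hypothetical miniature schema: a huge calls table and a much smaller people table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE calls (call_id INTEGER PRIMARY KEY, person_id INTEGER, called_on TEXT);
CREATE INDEX ix_calls ON calls (person_id, called_on);
INSERT INTO people VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO calls VALUES (1, 1, '2006-05-10'), (2, 3, '2006-05-12');
""")

# People with NO call between two dates: an anti-join via NOT EXISTS.
rows = conn.execute("""
SELECT p.name
FROM people p
WHERE NOT EXISTS (
    SELECT 1 FROM calls c
    WHERE c.person_id = p.person_id
      AND c.called_on BETWEEN '2006-05-01' AND '2006-05-31'
)
""").fetchall()
print([r[0] for r in rows])  # -> ['Bob']
```

On 20 billion rows the same shape works, but only if that composite index exists; without it, every flavor of this query degenerates into the full scan the editorial jokes about.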


I've worked with smaller databases, 10s of GB or less, for most of my career and even on older hardware, things ran relatively fast. That includes backups, restores, broken queries, etc. Even cross joins of my largest tables might be annoying, but would run within minutes or an hour or two. It's also made me less concerned about some of the scale issues that the VLDB folks face, like maintenance windows, online operations, or even table recovery. In most of my jobs, if someone deleted a table, like me forgetting a WHERE clause, I could just restore the latest backup on another machine and then move the data across. Often I could even do this on the same server!!!


But that's not possible in a VLDB environment. Or even a large database. I had a 600GB or so database at JD Edwards and we didn't have space to restore that database onto another server to recover data. It would require a complete restore of the entire system. Fortunately it was a warehouse and could be reloaded without data loss if we ran into that type of situation.


VLDBs present complex challenges for the simplest things we take for granted. I can imagine that it's fun to work out solutions to some of the problems, like reindexing, defragmenting, and other tedious, mundane chores that are routine on smaller databases. At least until something's broken.


I know it's not fun to try and fix something in that type of environment when your boss or his boss is watching over your shoulder.


Steve Jones

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

adamcrv
Old Hand

Group: General Forum Members
Points: 346 Visits: 1
Impressive... wouldn't Google be a contender for the largest database?


------------------------------
Life is far too important to be taken seriously
Michael-284794
SSC-Enthusiastic

Group: General Forum Members
Points: 134 Visits: 1

"I know it's not fun to try and fix something in that type of environment when your boss or his boss is watching over your shoulder."

Actually, I'd suggest that there are few environments in which it's fun to fix anything with your boss looking over your shoulder...



-----------------

C8H10N4O2
Anubis-237029
SSC Rookie

Group: General Forum Members
Points: 38 Visits: 2
I'm part of a small group presently tasked with developing a database and application that is projected to become a VLDB. At some points, we have seen growth of around 2 to 10 GB or more of data per day.

This challenge has been ongoing for the past 7 years, through all sorts of software and several core changes, each with their own advantages and disadvantages. Overall, it all comes down to how you plan to use the data.

In our particular system, we have found that there is around 40 GB of data that's constantly changing and updating. The rest is simply collecting and being stored for later use or archiving. Archiving is easy, as you can distribute this data across multiple machines as necessary; for example, one of our developmental archives currently uses around 100 DB servers, each running MySQL (chosen on cost at this quantity) and holding around 70 GB of archive data.
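Spreading a mostly-static archive across that many boxes usually comes down to stable routing: the same key must always land on the same server. A tiny sketch of hash-based routing (the server names and function here are hypothetical, not the poster's actual setup):

```python
import hashlib

# Hypothetical fleet of archive servers, roughly matching the "around 100
# MySQL boxes" described in the post. Names are made up.
SERVERS = [f"archive-{i:03d}" for i in range(100)]

def shard_for(key: str) -> str:
    """Stable hash routing: the same key always maps to the same server."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SERVERS[digest % len(SERVERS)]

# Every lookup for a given archive record goes to exactly one box.
print(shard_for("record-12345"))
```

The trade-off, of course, is that changing the server count remaps most keys; schemes like consistent hashing exist for fleets that grow and shrink, but for a slowly-growing archive the simple modulo works.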

We have found that through parallel access and indexing we have achieved our desired result, but this comes more from how it's programmed. We knew beforehand how the data was going to be used and developed our scale-out based upon that. Without knowing how the data will be used (to a large extent), it would be almost impossible to handle a database over 100 GB.

All in all, it really comes down to the developer(s) behind the infrastructure. Time is money, but a wrongly or hastily developed database can cost far more in the long run when a query that should take 200 ms instead takes 200 seconds or 200 minutes.

Databases will continue to develop over time, and as technology gets better we will find better ways of handling things. At present, I/O handlers and the companies behind them are holding database development back, since I/O is the core part of a database: without I/O there is no DB. Conversely, if your I/O is slow, it doesn't matter how good the database is or how well it's indexed; it will be slow too.

I think at the scale of VLDBs you need much more planning and projection before implementation; it takes much broader, out-of-the-box thinking to achieve success. Always remember the golden rule... "Nothing is impossible."
Robert Sterbal
SSC-Enthusiastic

Group: General Forum Members
Points: 127 Visits: 1

Seems like the critical measures of database size are row count, transactions per second, user count, and model complexity.

Any thoughts on other metrics to measure size with?


drnetwork
Mr or Mrs. 500

Group: General Forum Members
Points: 547 Visits: 82

Interesting. Use is an important factor in design.

My thought devolves to the legal uses of this data - not whether you can capture the data in the first place.

Does one have a legal right to data about your actions (in this case a phone call) that is captured by a governmental entity? Further, can you subpoena the data for legal use? Say you know you are a suspect in a murder investigation, and you know the fact that you made a phone call could produce an alibi for you. Lacking that, you look pretty guilty and might even face lethal injection. Could you subpoena that data for your defense? If you could do it for this purpose, where do you draw the line backwards? Could the database be subpoenaed in a divorce proceeding to establish, through the showing of routine and frequent phone calls, that one of the parties was involved in an illicit relationship outside the marriage?





Sinisa Jazic
SSC Rookie

Group: General Forum Members
Points: 32 Visits: 172

I have been dealing with real-time and near-real-time VLDB environments housing 500 to 1000 GB for the last 4 years, and the following summarizes the size management strategies employed:

- very large (VL) table partitioning and recombining with updateable views

- multi-filegroups

- partition per filegroup

- indexes per partition

- weekly full reindexing

- daily stats updates

- archiving by rotating oldest partitions out of a view
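The "partitioned view plus rotate the oldest partition out" pattern in that list can be sketched in miniature. This uses SQLite stand-ins for illustration only (real SQL Server 2000-era partitioned views use CHECK constraints and filegroups; the monthly table and view names here are made up):

```python
import sqlite3

# One physical table per month stands in for a partition on its own filegroup.
conn = sqlite3.connect(":memory:")
for month in ("2006_04", "2006_05", "2006_06"):
    conn.execute(f"CREATE TABLE calls_{month} (id INTEGER, placed_on TEXT)")
    conn.execute(f"INSERT INTO calls_{month} VALUES (1, '{month}')")

def rebuild_view(months):
    """Recombine the chosen partitions behind one view via UNION ALL."""
    conn.execute("DROP VIEW IF EXISTS calls_current")
    union = " UNION ALL ".join(f"SELECT * FROM calls_{m}" for m in months)
    conn.execute(f"CREATE VIEW calls_current AS {union}")

rebuild_view(["2006_04", "2006_05", "2006_06"])
print(conn.execute("SELECT COUNT(*) FROM calls_current").fetchone()[0])  # -> 3

# "Archiving" is just dropping the oldest month from the view definition;
# the underlying table is untouched and can be detached or moved at leisure.
rebuild_view(["2006_05", "2006_06"])
print(conn.execute("SELECT COUNT(*) FROM calls_current").fetchone()[0])  # -> 2
```

The appeal at VLDB scale is that rotation is a metadata change: no rows are deleted or copied, so there is no giant logged DELETE to schedule into a maintenance window.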


Rudyx - the Doctor
SSCertifiable

Group: General Forum Members
Points: 6141 Visits: 2503

I believe that the size of the database mentioned by Steve is off by at least two orders of magnitude. Yes, two orders of magnitude. I know this from experience working at a CLEC (a competing telco after deregulation). We were small, maybe a few million phone numbers. Our database table for completed (billed) phone calls was on the order of 1.6 billion rows! We only kept 3 months of this detailed information online; the remainder was archived. This was just call billing information that was fixed, and one could 'optimize' the 9's.

I just cannot fathom how the government could even attempt to store, better yet analyze, this volume of data. Could you imagine tables with trillions of rows? Oh, and lest we forget our BLOBs: all of the digitized (recorded) phone calls, at least one per call record. Then what about breaking down (parsing) the digitized conversations into words for further analysis? Soon trillions turn into quadrillions and quintillions. I just do not see how this information could be gathered, loaded, and analyzed in any reasonable fashion such that it could provide any 'proactive' value.
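Rudy's two-orders-of-magnitude claim roughly checks out on the back of an envelope. All figures below are illustrative round numbers drawn from this thread (the CLEC's 1.6 billion rows over 3 months, the editorial's 200 million people), not real NSA or carrier data:

```python
# Back-of-the-envelope scaling of the CLEC's numbers to a national footprint.
clec_numbers = 3_000_000        # "a few million phone numbers" (assumed)
clec_rows_3mo = 1_600_000_000   # 1.6 billion billed-call rows kept online

calls_per_number_per_day = clec_rows_3mo / clec_numbers / 90
print(f"{calls_per_number_per_day:.1f} calls/number/day")  # -> 5.9

us_numbers = 200_000_000        # rough national figure from the editorial
us_rows_3mo = us_numbers * calls_per_number_per_day * 90
print(f"{us_rows_3mo / 1e9:.0f} billion rows per quarter")  # -> 107
```

So a single quarter nationally lands around 100+ billion rows, versus the editorial's 10 or 20 billion: not quite two full orders of magnitude, but in that direction, and multiple years of retention would get there.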



Regards
Rudy Komacsar
Senior Database Administrator

"Ave Caesar! - Morituri te salutamus."
David Reed-223505
Old Hand

Group: General Forum Members
Points: 362 Visits: 380

The production size issues for relational databases are one thing (and I agree with Rudy, I think the orders of magnitude are off), but sizing for use in development is an entirely different can of worms. It becomes a big problem when you can't create a completely isolated development environment because even a 10% sample data set won't fit inside a Virtual PC image.

I primarily do BI consulting, and processing an OLAP cube on top of three quarters of a billion rows of development data is a PITA. Especially when you know that you're working with a 10% "sample" of real production health insurance claim data for a "small" regional health insurance carrier. It's a challenge to determine whether your 10% "sample" is really representative or not...

20 billion rows is nothin'. I'll bet the boys in Vegas who do all the "real" cutting-edge BI work are playing with larger row counts by a couple of orders of magnitude...

I'd love to know how many point of sale records Wal-Mart has in their warehouse. Extracts that I've seen in the past for just a few manufacturers' product lines at a time in selected parts of the continental US for just three months of data were in the 400M record range.

[Does somebody at the NSA think that it makes people feel better about the program to brag about the size of their database(s)? Doh!]


Bob Hoffman-209065
SSC-Addicted

Group: General Forum Members
Points: 410 Visits: 262

What is really amazing is that I heard that the NSA does this all over a modem line.

You know, dial up..



