
The Largest Database
Posted Tuesday, May 16, 2006 6:55 PM


SSC-Dedicated


Group: Administrators
Last Login: Yesterday @ 5:38 PM
Points: 31,355, Visits: 15,819


It's not for me, but who out there wants to work on the largest database? The NSA is collecting phone records of calls made and brags that "It's the largest database ever assembled in the world."


If they're really recording all phone calls, then it has to be billions of rows. Heck, it's probably tens of millions or more a day! After all, there are more than 200 million people in the US, and with cell phones, home businesses, and prepaid calling cards, there's probably a good percentage of those people making a call a day.


Can you imagine having 10 or 20 billion rows of data that you want to index? How about testing a query that joins this table with a list of the people, a much smaller 200-million-row table, and forgetting the WHERE clause? Or what about a genius manager asking for a list of people who didn't make a phone call between two dates? You'd probably have time for some nice coffee breaks there.
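That "who didn't call" request is a classic anti-join. A hedged sketch of what it might look like — all table and column names here are hypothetical, since the real schema is anyone's guess:

```sql
-- People with no recorded call in the date range.
SELECT p.PersonID
FROM dbo.People AS p
WHERE NOT EXISTS (
      SELECT 1
      FROM dbo.Calls AS c
      WHERE c.CallerID = p.PersonID
        AND c.CallDate >= '20060501'
        AND c.CallDate <  '20060517'
);
-- Without a covering index on (CallerID, CallDate), that subquery
-- probes a multi-billion-row table once per person. Coffee break indeed.
```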


I've worked with smaller databases, tens of GB or less, for most of my career, and even on older hardware things ran relatively fast. That includes backups, restores, broken queries, etc. Even cross joins of my largest tables might be annoying, but they would run within minutes or an hour or two. It's also made me less concerned about some of the scale issues that the VLDB folks face, like maintenance windows, online operations, or even table recovery. In most of my jobs, if someone deleted a table's data, say by forgetting a WHERE clause like me, I could just restore the latest backup on another machine and then move the data across. Often I could even do this on the same server!!!


But that's not possible in a VLDB environment, or even with a large database. I had a 600GB or so database at JD Edwards, and we didn't have the space to restore that database onto another server to recover data; it would have required a complete restore of the entire system. Fortunately it was a warehouse and could be reloaded without data loss if we ran into that type of situation.


VLDBs present complex challenges for the simplest things we take for granted. I can imagine it's fun to work out solutions to some of the problems, like reindexing, defragmenting, and the other chores that are merely tedious and mundane on smaller databases. At least until something breaks.


I know it's not fun to try and fix something in that type of environment when your boss or his boss is watching over your shoulder.


Steve Jones






Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help

Post #280578
Posted Tuesday, May 16, 2006 10:35 PM
SSC Veteran


Group: General Forum Members
Last Login: Monday, September 3, 2007 10:17 PM
Points: 210, Visits: 1
Impressive... but wouldn't Google be a contender for the largest database?




------------------------------
Life is far too important to be taken seriously
Post #280597
Posted Wednesday, May 17, 2006 4:57 AM
Valued Member


Group: General Forum Members
Last Login: Wednesday, June 27, 2007 2:23 AM
Points: 64, Visits: 1

"I know it's not fun to try and fix something in that type of environment when your boss or his boss is watching over your shoulder."

 

Actually, I'd suggest that there are few environments in which it's fun to fix anything with your boss looking over your shoulder...




-----------------

C8H10N4O2
Post #280639
Posted Wednesday, May 17, 2006 5:30 AM
Grasshopper


Group: General Forum Members
Last Login: Thursday, October 27, 2011 5:04 AM
Points: 20, Visits: 2
I'm part of a small group presently tasked with developing a database and application that is projected to become a VLDB. At some points we have seen growth of around 2 to 10GB or more of data per day.

This challenge has been ongoing for the past 7 years, through all sorts of software and several core changes, each with its own advantages and disadvantages. Overall, it all comes down to how you plan to use the data.

In our particular system, we have found that there is around 40GB of data that's constantly changing and updating. The rest is simply collecting and being stored for later use or archiving. Archiving is easy, as you can distribute this data across multiple machines as necessary; for example, one of our development archives currently uses around 100 database servers, each running MySQL (chosen on cost grounds at this quantity) and holding around 70GB of archive data.

We have found that through parallel access and indexing we have achieved our desired result, but this comes more from how it's programmed. We knew beforehand how the data was going to be used and developed our scale-out based on that. Without knowing (to a large extent) how the data will be used, it would be almost impossible to handle a database over 100GB.

All in all, it really comes down to the developer(s) behind the infrastructure. Time is money, but a wrongly or hastily developed database can cost far more in the long run, when a query that should take 200ms instead takes 200 seconds or 200 minutes.

Databases will continue to develop over time, and as technology gets better we will find better ways of handling things. At present, I/O is what's holding database development back, since it is the core of a database: without I/O there is no DB, and conversely, if your I/O is slow, it doesn't matter how good the database is or how well it's indexed; it'll be slow too.

At the scale of VLDBs you need much greater planning and projection before implementation; it takes broader, out-of-the-box thinking to achieve success. Always remember the golden rule... "Nothing is impossible."
Post #280642
Posted Wednesday, May 17, 2006 6:30 AM
Valued Member


Group: General Forum Members
Last Login: Wednesday, August 8, 2007 2:25 PM
Points: 53, Visits: 1

Seems like the critical measures of database size are row count, transactions per second, user count, and model complexity.

Any thoughts on other metrics to measure size with?
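For the raw-size side of those metrics, SQL Server can report a few of them directly. A minimal sketch, assuming SQL Server 2005's catalog views are available:

```sql
-- Overall data and index space for the current database:
EXEC sp_spaceused;

-- Row counts per user table:
SELECT o.name AS table_name, SUM(p.rows) AS row_count
FROM sys.partitions AS p
JOIN sys.objects    AS o ON o.object_id = p.object_id
WHERE o.type = 'U'
  AND p.index_id IN (0, 1)  -- heap or clustered index only, to avoid double counting
GROUP BY o.name
ORDER BY row_count DESC;
```

Transactions per second and user count would come from Performance Monitor counters rather than the catalog, and model complexity is harder to put a number on.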

Post #280661
Posted Wednesday, May 17, 2006 7:03 AM
SSC-Addicted


Group: General Forum Members
Last Login: Wednesday, November 17, 2010 3:38 AM
Points: 445, Visits: 82

Interesting. Use is an important factor in design.

My thoughts turn to the legal uses of this data - not whether you can capture the data in the first place.

Does one have a legal right to data about one's actions (in this case a phone call) that is captured by a governmental entity? Further, can you subpoena the data for legal use? Say you know you are the suspect in a murder investigation, and you know that the fact that you made a phone call could produce an alibi for you. Lacking that, you look pretty guilty and might end up facing lethal injection. Could you subpoena that data for your defense? And if you could do it for this purpose, where do you draw the line? Could the database be subpoenaed in a divorce proceeding, to establish through a pattern of routine and frequent phone calls that one of the parties was involved in an illicit relationship outside the marriage?




Post #280676
Posted Wednesday, May 17, 2006 8:22 AM
Forum Newbie


Group: General Forum Members
Last Login: Tuesday, November 11, 2014 12:02 PM
Points: 5, Visits: 110

I have been dealing with real-time and near-real-time VLDB environments housing 500 to 1000 GB for the last 4 years; the following summarizes the size-management strategies employed:

- VL table partitioning and recombining with updateable views

- multi-filegroups

- partition per filegroup

- indexes per partition

- weekly full reindexing

- daily stats updates

- archiving by rotating oldest partitions out of a view
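The partitioning, view, and rotation items above describe roughly the SQL Server 2000-style partitioned-view pattern. A minimal sketch, with hypothetical table and column names:

```sql
-- One table per month. The CHECK constraint on the partitioning column
-- lets the optimizer prune partitions, and (with the column in the
-- primary key) makes the UNION ALL view updateable.
CREATE TABLE dbo.Calls_200604 (
    CallID   bigint   NOT NULL,
    CallDate datetime NOT NULL
        CHECK (CallDate >= '20060401' AND CallDate < '20060501'),
    PRIMARY KEY (CallID, CallDate)
);
GO
CREATE TABLE dbo.Calls_200605 (
    CallID   bigint   NOT NULL,
    CallDate datetime NOT NULL
        CHECK (CallDate >= '20060501' AND CallDate < '20060601'),
    PRIMARY KEY (CallID, CallDate)
);
GO
-- The view recombines the partitions; "archiving by rotation" is then
-- just dropping the oldest SELECT from the view definition and moving
-- that table (or its filegroup) off to cheaper storage.
CREATE VIEW dbo.Calls AS
    SELECT CallID, CallDate FROM dbo.Calls_200604
    UNION ALL
    SELECT CallID, CallDate FROM dbo.Calls_200605;
GO
```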

Post #280707
Posted Wednesday, May 17, 2006 8:27 AM
Hall of Fame


Group: General Forum Members
Last Login: Monday, December 15, 2014 4:01 PM
Points: 3,221, Visits: 2,374

I believe that the size of the database mentioned by Steve is off by at least two orders of magnitude. Yes, two orders of magnitude. I know this from experience working at a CLEC (a competing telco after deregulation). We were small, maybe a few million phone numbers, yet our table of completed (billed) phone calls was on the order of 1.6 billion rows! We only kept 3 months of this detailed information online; the remainder was archived. And this was just call billing information, fixed-format data where one could 'optimize' the 9's.

I just cannot fathom how the government could even attempt to store, let alone analyze, this volume of data. Can you imagine tables with trillions of rows? And lest we forget the BLOBs: all of the digitized (recorded) phone calls, at least one per call record. Then what about parsing the digitized conversations into words for further analysis? Soon trillions turn into quadrillions and quintillions. I just do not see how this information could be gathered, loaded, and analyzed in any reasonable fashion where it could provide any 'proactive' value.





Regards
Rudy Komacsar
Senior Database Administrator

"Ave Caesar! - Morituri te salutamus."
Post #280713
Posted Wednesday, May 17, 2006 9:35 AM


SSC Veteran


Group: General Forum Members
Last Login: Monday, June 24, 2013 11:26 AM
Points: 208, Visits: 380

The production size issues for relational databases are one thing (and I agree with Rudy, I think the orders of magnitude are off), but sizing for use in development is an entirely different can of worms. It becomes a big problem when you can't create a completely isolated development environment because even a 10% sample data set won't fit inside a Virtual PC image.

I primarily do BI consulting, and processing an OLAP cube on top of three-quarters of a billion rows of development data is a PITA. Especially when you know that you're working with a 10% "sample" of real production health insurance claim data for a "small" regional health insurance carrier. It's a challenge to determine whether your 10% "sample" is really representative or not...

20 billion rows is nothin'. I'll bet the boys in Vegas who do all the "real" cutting-edge BI work are playing with row counts a couple of orders of magnitude larger...

I'd love to know how many point-of-sale records Wal-Mart has in their warehouse. Extracts that I've seen in the past, covering just a few manufacturers' product lines at a time in selected parts of the continental US and just three months of data, were in the 400M-record range.

[Does somebody at the NSA think that it makes people feel better about the program to brag about the size of their database(s)? Doh!]

Post #280758
Posted Wednesday, May 17, 2006 10:11 AM
SSC Veteran


Group: General Forum Members
Last Login: Tuesday, April 5, 2011 12:58 PM
Points: 230, Visits: 262

What is really amazing is that I heard that the NSA does this all over a modem line.

You know, dial up..

Post #280784