SQLServerCentral (supported by Red Gate Software Ltd.)

Guest Editorial: That ain't a Database, it's a Spreadsheet
Posted Wednesday, December 17, 2008 9:55 PM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Friday, December 12, 2014 11:01 AM
Points: 594, Visits: 2,580
Comments posted to this topic are about the item Guest Editorial: That ain't a Database, it's a Spreadsheet


Best wishes,

Phil Factor
Simple Talk
Post #621770
Posted Thursday, December 18, 2008 2:18 AM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Thursday, October 4, 2012 9:20 AM
Points: 583, Visits: 1,060
The last real system I worked on, I loaded the database to a 100% fill with six months' worth of data (3 million rows) using Red Gate's data generator.

Six months was the retention-period requirement.

Then I quadrupled the data, i.e. two years' worth (12 million rows).

Half an hour or less was the process-time limit for the big stored proc that generated the predictions - it ran in just shy of eight minutes. ;)
Job done.
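(Red Gate's tool is commercial; as a rough stand-in, a numbers-table cross join can bulk-load millions of synthetic rows. The table and column names below are made up for illustration, not from the system described above.)

```sql
-- Generate ~3 million synthetic rows without an external tool.
;WITH N AS (
    SELECT TOP (3000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
    CROSS JOIN sys.all_objects c
)
INSERT INTO dbo.UsageData (RecordID, RecordedAt, Reading)
SELECT n,
       DATEADD(MINUTE, -(n % 262800), GETDATE()),  -- spread over ~6 months back
       RAND(CHECKSUM(NEWID())) * 100               -- pseudo-random reading
FROM N;
```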

--Shaun


Hiding under a desk from SSIS Implementation Work
Post #621858
Posted Thursday, December 18, 2008 4:21 AM
SSC Journeyman


Group: General Forum Members
Last Login: Sunday, August 31, 2014 10:04 AM
Points: 89, Visits: 294
This is my first reply to a topic on SQLServerCentral. I have been a DBA for 10 years. Sometimes I feel that I do not have the knowledge you all have, but, hey, I have been able to handle terabyte databases with millions of rows, so maybe I am doing something right, no?

I definitely have a point of view on this topic. Working for a telecommunications company, handling their SQL Servers, I feel I might have a point.

My point is very simple: keep the retention period as low as possible in a production environment dealing with transactional databases.
We can talk about indexing, defragmentation, vertical partitioning, you name it... all good things to do, I accept that.

But some things will not change:
1. Querying tables, especially with table scans, is so much faster with fewer rows
2. Maintenance on those tables, and in particular reindexing, is so much faster
3. Disruption of service, if it happens, is almost nothing
4. Invasive, heavy maintenance tasks become much lighter
a. backup strategies are faster to implement and execute (especially file compression)
b. restore strategies are faster to implement
c. DW extraction is faster to implement and execute
d. high-availability strategies (both load balancing and failover) are faster to implement

I think DBAs should focus more on managing data than databases or servers, mainly because the data is more important than its container.

Then, when all the data-management strategies (purging, reindexing, backing up, extracting) are in place, we can focus on optimizing the rest.

This is my experience with large databases in telecommunications companies.

All industries are different, but with this approach DBAs can clearly align business needs with a proper approach to the data.
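(The retention point above can be sketched as a batched purge. The table name, column name, and batch size are hypothetical.)

```sql
-- Hypothetical schema: dbo.CallDetail with a CreatedAt datetime column.
-- Delete in batches to keep log growth and blocking manageable.
DECLARE @Cutoff datetime = DATEADD(MONTH, -6, GETDATE());  -- six-month retention

WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM dbo.CallDetail
    WHERE CreatedAt < @Cutoff;

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to purge
END
```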





Clement
Post #621903
Posted Thursday, December 18, 2008 7:03 AM


SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, October 17, 2014 2:10 PM
Points: 153, Visits: 570
I once worked for a consulting company (national level, but not one of the biggest) that contracted to replace an older system with SQL Server. The development DBA was, shall we say, a nice guy. He and a lot of other people who were billing high consulting rates created a system that was much slower than the older system we were replacing.

I was asked to hang around after others were moved to other projects to get the animal into production.

What Phil is describing is exactly our mistake. We tested everything with teaspoons of data when the customer needed gallons. When we hit production the system fell flat on its face. When I started to look under the covers, I was shocked at how many developers were doing things like writing reports that pulled whole tables from the database and processed them on client machines instead of using concise queries. You can get away with that when the test table has 10 rows, but it's awfully clumsy when the production table has millions. What was the indexing strategy? Create one when someone complains about speed, not before. Again, you don't need a lot of indexes when you're only pulling a few rows of test data.

Yes, testing along the way with data volumes that the system is expected to handle is excellent advice.
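(The anti-pattern described above - pulling whole tables to the client versus a concise, indexed query - might look like this; the table, columns, and parameter are illustrative, not from the project in question.)

```sql
-- Anti-pattern: drag the whole table across the wire, filter client-side.
-- SELECT * FROM dbo.Orders;

-- Better: filter on the server and support the predicate with an index.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
    ON dbo.Orders (CustomerID, OrderDate)
    INCLUDE (TotalDue);

DECLARE @CustomerID int = 42;  -- example value

SELECT OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerID = @CustomerID
  AND OrderDate >= DATEADD(MONTH, -1, GETDATE());
```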


___________________________________________________
“Politicians are like diapers. They both need changing regularly and for the same reason.”
Post #622044
Posted Thursday, December 18, 2008 7:35 AM


Old Hand


Group: General Forum Members
Last Login: Monday, May 7, 2012 9:23 AM
Points: 304, Visits: 716
I don't think your editorial (like many I have seen along this topic line) is realistic. Sure, it would be great on Day One to "know" the size of your final production database - but even a statement like that is highly flawed. Think about it: is the size of a database ever fixed? No - it is always fluid. So what does "final production database" really mean? Is that the database on Day One, Day One Hundred, Day One Thousand?

Maybe in telecommunications work you can say "We have X number of customers who make X number of calls..." and from that you extrapolate expected size and load - but let me pose a question that in my experience, is more the rule than the exception...

Let's say you work for a large company - who cares what they do. How many JPG image files are there RIGHT NOW on your entire company network? Do you know? Can you guess? Is it 100, maybe 1,000, maybe 1 million, maybe 10 million? The fact is, you don't know and likely cannot easily find out. So how do you design and test a database to track them? How do you load up test records to "match" that expected load? And suppose you were counting those JPG files - how many are actually duplicates of the same file? You cannot know that any better than you know the total number.

Another example - Imagine if you got a job tomorrow in a company with a 250 node network, and you were asked to design a SQL database to track all Word documents in the company on every server and work station in the company. How would you proceed in assessing just how many records you would be dealing with to start? How many new Word documents are generated each day? Each month, year? You cannot simply "pick a number" with any degree of certainty.

This is something our companies deal with all the time. We work in businesses and applications where you cannot simply "pick" a number and say "this is where we're going to be when we're running in production". It's simplistic to assume most businesses are quantifiable - in fact, the majority are not, and though I think good design and testing are vital, presenting this as some sort of science where you just "figure" the total load you will be dealing with is just not real-world thinking for the majority of businesses.

Plan for the worst, build for the best, but don't presume you can "know" anything on Day One because 99.99% of the time, you will be wrong unless you are truly fortunate to be working in a field where these numbers are truly known on Day One.


There's no such thing as dumb questions, only poorly thought-out answers...
Post #622072
Posted Thursday, December 18, 2008 7:39 AM


SSC-Insane


Group: General Forum Members
Last Login: Thursday, November 27, 2014 4:23 AM
Points: 20,584, Visits: 9,624
Well then, what is your fully tested method of operation in those cases? Take a good guess, then go with 25 times more data, then adjust as need be - or something along those lines?
Post #622076
Posted Thursday, December 18, 2008 7:43 AM
SSC Journeyman


Group: General Forum Members
Last Login: Sunday, August 31, 2014 10:04 AM
Points: 89, Visits: 294
I agree with blandry.

My answer a couple of posts above was not really an answer.
It was more an attempt to find a solution so that your production servers can be sized.

That said, we can develop a load test with x million rows that would be considered a full-size production server.

Once we get near that amount in production, an alert would tell us to set up additional servers to balance the load.

So although theoretically you can never know the amount of data you will deal with, you can "manage" it by spreading the load across different servers.

By doing so, the load testing makes complete sense!

The sizing of the load test then depends on how frequently you extract, purge, and reallocate resources.
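(A capacity alert like the one described could be approximated by polling row counts against the load-tested volume. The threshold and table name below are hypothetical.)

```sql
-- Alert when a table nears the volume the load test validated.
DECLARE @Threshold bigint = 9000000;  -- e.g. 90% of a 10M-row tested capacity

SELECT SUM(p.rows) AS CurrentRows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.CallDetail')
  AND p.index_id IN (0, 1);   -- heap or clustered index only

-- If CurrentRows exceeds @Threshold, raise an alert (e.g. via a SQL Agent job).
```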


Clement
Post #622082
Posted Thursday, December 18, 2008 7:51 AM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Thursday, October 4, 2012 9:20 AM
Points: 583, Visits: 1,060
In my case the 3 million rows was a full population of the database for six months' worth of usage.

I had specific fixed limits in the system I was testing, and actual data would be at most 2 million rows, given the nature of the application. So doing all the load testing with 4-6 times the allowed data volume was a good test.

Hey, that's the beauty of Red Gate's SQL Data Generator though. ;)


Hiding under a desk from SSIS Implementation Work
Post #622094
Posted Thursday, December 18, 2008 7:56 AM


SSC-Addicted


Group: General Forum Members
Last Login: Thursday, July 10, 2014 1:33 PM
Points: 457, Visits: 476
I tend to agree with Phil, and yet disagree. I feel initial development should start out small, in a concept mode, using isolated data in a strictly development-only environment. The trick, however, is to have not just one or two levels of development and testing but at least four, where each level of testing and debugging is progressively more production-like before a new system or maintenance change is cut over to production.

Granted, implementing a series of separate servers (or farms) for Development, Test (1 and/or 2), Sim-Prod, and Production could be very expensive for a small business. However, the heavy production system Phil outlines is not that of a small business - or at least is no longer in the small-business category. And the firm involved must be willing to invest in a more robust data environment: hardware, software, and especially data- and code-migration and testing policies.


Ron K.

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler
Post #622100
Posted Thursday, December 18, 2008 8:20 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Thursday, December 19, 2013 7:41 AM
Points: 118, Visits: 381
I think this is a very valid point, but I would suggest working against two similarly structured databases for development purposes - one with lots of data and one with a little. The reason: when you are performance-testing your database code, you want the mega-datasets. When you are testing the accuracy of your code, you probably want smaller sets whose results you can control, to make sure you are getting exactly the results you seek from your queries. Our production databases have millions of records in some of the tables, and I definitely want to make sure my work performs well against them - but I also need to make sure that hidden in the thousands of results returned from some of my queries aren't some wrong records (and that some of the right ones aren't missing). For simple selects this isn't a big problem, but when the query logic gets complicated, it is very easy for a small mistake to have a big impact in terms of data integrity.
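
(The small-but-controlled dataset approach can be sketched as a correctness check against rows known by hand; the schema here is purely illustrative.)

```sql
-- Tiny, fully known dataset: every expected result is enumerable by hand.
CREATE TABLE #Orders (OrderID int, CustomerID int, Amount money);
INSERT INTO #Orders VALUES (1, 100, 10.00), (2, 100, 20.00), (3, 200, 5.00);

-- The query under test: per-customer totals over 15.00.
SELECT CustomerID, SUM(Amount) AS Total
FROM #Orders
GROUP BY CustomerID
HAVING SUM(Amount) > 15.00;
-- Expected: exactly one row (CustomerID 100, Total 30.00);
-- any extra or missing row signals a logic error.

DROP TABLE #Orders;
```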

--
Anye Mercy
"Service Unavailable is not an Error" -- John, ENOM support
"You keep using that word. I do not think it means what you think it means." -- Inigo Montoya in "Princess Bride"
"Civilization exists by geologic consent, subject to change without notice." -- Will Durant
Post #622135