Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Guest Editorial: That ain't a Database, it's a Spreadsheet


Guest Editorial: That ain't a Database, it's a Spreadsheet

Author
Message
Phil Factor
Phil Factor
SSC Eights!
SSC Eights! (941 reputation)SSC Eights! (941 reputation)SSC Eights! (941 reputation)SSC Eights! (941 reputation)SSC Eights! (941 reputation)SSC Eights! (941 reputation)SSC Eights! (941 reputation)SSC Eights! (941 reputation)

Group: General Forum Members
Points: 941 Visits: 2953
Comments posted to this topic are about the item Guest Editorial: That ain't a Database, it's a Spreadsheet


Best wishes,

Phil Factor
Simple Talk
Shaun McGuile
Shaun McGuile
Mr or Mrs. 500
Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)

Group: General Forum Members
Points: 597 Visits: 1060
Last real system I worked on, I loaded the database with 100% fill of six months worth of data using redgate data generator. (3 million rows).

Six months was the retention period requirement.

Then I quadrupled the data i.e. 2 years worth. (12 million rows)

Half an hour or less was the process time limit for the big stored proc that generated the predictions - it ran just shy of eight. Wink
Job Done.

--Shaun

Hiding under a desk from SSIS Implemenation Work Crazy
clementhuge
clementhuge
SSC-Enthusiastic
SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)

Group: General Forum Members
Points: 117 Visits: 301
This is my first reply to a topic on SQLServerCentral. I have been a dba for 10 years. Sometimes I feel that I do not have the knowledge you all have but, hey, I have been able to handle teraoctet databases with million rows, so maybe I am doing something right, no?

I definitely have a point of view on this topic. working for a telecommunication company, handling their SQL servers, I feel I might have a point.

My point is very simple, keep the retention period as low as possible on a production environment, dealing with transactional databases.
We can talk about indexing, defragmentation, vertical partitioning, you name it.... All good things to do, I accept that.

But one of the thing that will not change is:
1. Querying tables, especially with table scans, is so much faster with the fewest rows
2. Maintenance on those tables and in particular reindexing is so much faster
3. Disruption of service is almost nothing if it happens
4. Invasive heavy maintenance DB tasks are not as important
a. backup strategies are faster to implement and execute (especially compression files)
b. restore strategies are faster to implement
c. DW extraction is faster to implement and execute
d. High availability strategies (both load balancing and failover) are faster to implement

I think Dba should focus more on managing data and not databases or servers, mainly because the data is more important that the recipient of it.

Then when all data management strategies (purging, reindexing, backuping, extracting) are done, we can focus on optimizing the rest of it.

This is my experience with large databases in telecommunication companies.

All industries are different but Dba, with this approach can clearly synch business needs with a proper approach to the data.

Clement
Someguy
Someguy
SSC-Enthusiastic
SSC-Enthusiastic (165 reputation)SSC-Enthusiastic (165 reputation)SSC-Enthusiastic (165 reputation)SSC-Enthusiastic (165 reputation)SSC-Enthusiastic (165 reputation)SSC-Enthusiastic (165 reputation)SSC-Enthusiastic (165 reputation)SSC-Enthusiastic (165 reputation)

Group: General Forum Members
Points: 165 Visits: 579
I once worked for a consulting company (national level, but not one of the biggest) that contracted to replace an older system with SQL Server. The development DBA was, shall we say, a nice guy. He and a lot of other people who were billing high consulting rates created a system that was much slower than the older system we were replacing.

I was asked to hang around after others were moved to other projects to get the animal into production.

What Phil is describing is exactly our mistake. We tested everything with teaspoons of data when the customer needed gallons. When we hit production the system fell flat on its face. When I started to look under the covers, I was shocked at how many developers were doing things like writing reports that pulled whole tables from the database and processed them on client machines instead of using concise queries. You can get away with that when the test table has 10 rows, but it's awfully clumsy when the production table has millions. What was the indexing strategy? Create one when someone complains about speed, not before. Again, you don't need a lot of indexes when you're only pulling a few rows of test data.

Yes, testing along the way with data volumes that the system is expected to handle is excellent advice.

___________________________________________________
“Politicians are like diapers. They both need changing regularly and for the same reason.”
blandry
blandry
Old Hand
Old Hand (367 reputation)Old Hand (367 reputation)Old Hand (367 reputation)Old Hand (367 reputation)Old Hand (367 reputation)Old Hand (367 reputation)Old Hand (367 reputation)Old Hand (367 reputation)

Group: General Forum Members
Points: 367 Visits: 723
I don't think your editorial (like many I have seen along this topic line) is realistic. Sure, it would be great on Day One to "know" the size of your final production database - but even a statement such as that is highly flawed. Think about it - Is the size of a database ever fixed? No, always fluid. So what does "final production database" really mean? Is that the database on Day One, Day One Hundred, Day One Thousand?

Maybe in telecommunications work you can say "We have X number of customers who make X number of calls..." and from that you extrapolate expected size and load - but let me pose a question that in my experience, is more the rule than the exception...

Lets say you work for a large company - who cares what they do, but.... How many JPG image files are there RIGHT NOW on your entire company network? Do you know? Can you guess? Is it 100, maybe 1000, maybe 1 million, maybe 10 million - fact is, you don't know and likely cannot know easily. So how do you design and test a database to track them? How do you load up test records to "match" what that expected load is? Suppose you were counting these JPG image files - how many are actually duplicates of the same file? You cannot know that any better than you know the total number.

Another example - Imagine if you got a job tomorrow in a company with a 250 node network, and you were asked to design a SQL database to track all Word documents in the company on every server and work station in the company. How would you proceed in assessing just how many records you would be dealing with to start? How many new Word documents are generated each day? Each month, year? You cannot simply "pick a number" with any degree of certainty.

This is something our companies deal with all the time. We work in businesses and applications where you cannot simply "pick" a number and say "this is where we're going to be when we're running in production"... Its simplistic to assume most businesses are quantifiable - in fact, the majority are not and though I think good design and testing is vital - to present it as some sort of science where you just "figure" the total load you will be dealing with is just not real world thinking for the majority of businesses.

Plan for the worst, build for the best, but don't presume you can "know" anything on Day One because 99.99% of the time, you will be wrong unless you are truly fortunate to be working in a field where these numbers are truly known on Day One.

There's no such thing as dumb questions, only poorly thought-out answers...
Ninja's_RGR'us
Ninja's_RGR'us
SSC-Insane
SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)

Group: General Forum Members
Points: 22291 Visits: 9671
Well then, what is your fully tested method of operation in those cases. Take a good guess, then go with 25 times more data then adjust as need be or something along those lines?
clementhuge
clementhuge
SSC-Enthusiastic
SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)SSC-Enthusiastic (117 reputation)

Group: General Forum Members
Points: 117 Visits: 301
I agree with blandry.

My answer a couple of answers above, was not an answer.
It was kind of trying to find a solution so that your production servers are sizeable.

Saying that, we can manage to develop a load test with x millions of rows that would be consider a full-size production server.

Once we are near to reach this amount when on production, an alert would tell us to set up several servers to load balance the loads.

So although theoritically, you can never know the amount of data you will deal with, you can "manage" it by spreading the loads on different servers.

By doing so, the load testing makes entire significance!

The sizing of the load test then depends on how frequently you will extract, purge, reallocate ressource.

Clement
Shaun McGuile
Shaun McGuile
Mr or Mrs. 500
Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)Mr or Mrs. 500 (597 reputation)

Group: General Forum Members
Points: 597 Visits: 1060
In my case the 3 million rows was full population of the database for six months worth of usage.

I had specifc fixed limits in the system I was testing and actual data would be at most 2 million rows, given the nature of the application. So doing all the load testing with 4 - 6 times the allowed data volumes was a good test.

hey thats the beauty of redgate sql data generator though. Wink

Hiding under a desk from SSIS Implemenation Work Crazy
Ron Kunce
Ron Kunce
SSC-Addicted
SSC-Addicted (482 reputation)SSC-Addicted (482 reputation)SSC-Addicted (482 reputation)SSC-Addicted (482 reputation)SSC-Addicted (482 reputation)SSC-Addicted (482 reputation)SSC-Addicted (482 reputation)SSC-Addicted (482 reputation)

Group: General Forum Members
Points: 482 Visits: 496
I tend to agree with Phil and yet disagree. I feel initial development should start out small in a concept mode using isolated data in a strictly development only environment. However, the trick is have not just one or two levels of development and testing but at least four levels where each level of test and debugging is progressively more production-like prior to cutting a new system or maintenance changes over to production.

Granted, that implementing a series of separate server (farms) for Development, Test (1 and/or 2), Sim-Prod, and Production could be very expensive for a small business. However, the heavy production system Phil outlines, is not one of a small business, or at least is not any longer a small business category. And, the firm involved must be willing to invest in providing a more robust data environment in hardware, software and especially in data and code migration and testing policies.

Ron K.

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler
Anye Mercy
Anye Mercy
SSC-Enthusiastic
SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)

Group: General Forum Members
Points: 130 Visits: 381
I think this is a very valid point, but I would suggest working against two similarly structured databases for development purposes - one with lots of data and one with a little. Reason being, when you are performance testing your database code, you want the mega-datasets. When you are testing the accuracy of your code, you probably want smaller sets that you can control the results better to make sure you are getting exactly the results you seek with your queries. Our production databases have millions of records in some of the tables and I definitely do want to make sure my work can perform well against it - but I also need to make sure that hidden in the thousands of results returned from some of my queries aren't some of the wrong records (and that some of the right ones aren't missing). For simple selects this isn't a big problem but when the query logic gets complicated, it is very easy to have a small mistake that is a big impact in terms of data integrity.

--
Anye Mercy
"Service Unavailable is not an Error" -- John, ENOM support
"You keep using that word. I do not think it means what you think it means." -- Inigo Montoya in "Princess Bride"
"Civilization exists by geologic consent, subject to change without notice." -- Will Durant
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search