Log in
::
Register
::
Not logged in
Home
Tags
Articles
Editorials
Stairways
Forums
Scripts
Videos
Blogs
QotD
Books
Ask SSC
SQL Jobs
Training
Authors
About us
Contact us
Newsletters
Write for us
Recent Posts
Recent Posts
Popular Topics
Popular Topics
Home
Search
Members
Calendar
Who's On
Home
»
SQLServerCentral.com
»
Editorials
»
Guest Editorial: That ain't a Database, it's...
33 posts, Page 1 of 4
1
2
3
4
»
»»
Guest Editorial: That ain't a Database, it's a Spreadsheet
Rate Topic
Display Mode
Topic Options
Author
Message
Phil Factor
Phil Factor
Posted Wednesday, December 17, 2008 9:55 PM
Mr or Mrs. 500
Group: General Forum Members
Last Login: Sunday, May 19, 2013 4:00 AM
Points: 533,
Visits: 2,285
Comments posted to this topic are about the item
Guest Editorial: That ain't a Database, it's a Spreadsheet
Best wishes,
Phil Factor
Simple Talk
Post #621770
Shaun McGuile
Shaun McGuile
Posted Thursday, December 18, 2008 2:18 AM
Mr or Mrs. 500
Group: General Forum Members
Last Login: Thursday, October 04, 2012 9:20 AM
Points: 583,
Visits: 1,060
Last real system I worked on, I loaded the database with 100% fill of six months worth of data using redgate data generator. (3 million rows).
Six months was the retention period requirement.
Then I quadrupled the data i.e. 2 years worth. (12 million rows)
Half an hour or less was the process time limit for the big stored proc that generated the predictions - it ran just shy of eight. ;)
Job Done.
--Shaun
Hiding under a desk from SSIS Implemenation Work
Post #621858
clementhuge
clementhuge
Posted Thursday, December 18, 2008 4:21 AM
SSC Journeyman
Group: General Forum Members
Last Login: Sunday, May 19, 2013 3:27 PM
Points: 89,
Visits: 285
This is my first reply to a topic on SQLServerCentral. I have been a dba for 10 years. Sometimes I feel that I do not have the knowledge you all have but, hey, I have been able to handle teraoctet databases with million rows, so maybe I am doing something right, no?
I definitely have a point of view on this topic. working for a telecommunication company, handling their SQL servers, I feel I might have a point.
My point is very simple, keep the retention period as low as possible on a production environment, dealing with transactional databases.
We can talk about indexing, defragmentation, vertical partitioning, you name it.... All good things to do, I accept that.
But one of the thing that will not change is:
1. Querying tables, especially with table scans, is so much faster with the fewest rows
2. Maintenance on those tables and in particular reindexing is so much faster
3. Disruption of service is almost nothing if it happens
4. Invasive heavy maintenance DB tasks are not as important
a. backup strategies are faster to implement and execute (especially compression files)
b. restore strategies are faster to implement
c. DW extraction is faster to implement and execute
d. High availability strategies (both load balancing and failover) are faster to implement
I think Dba should focus more on managing data and not databases or servers, mainly because the data is more important that the recipient of it.
Then when all data management strategies (purging, reindexing, backuping, extracting) are done, we can focus on optimizing the rest of it.
This is my experience with large databases in telecommunication companies.
All industries are different but Dba, with this approach can clearly synch business needs with a proper approach to the data.
Clement
Post #621903
Someguy
Someguy
Posted Thursday, December 18, 2008 7:03 AM
SSC-Enthusiastic
Group: General Forum Members
Last Login: Monday, May 07, 2012 10:39 AM
Points: 153,
Visits: 565
I once worked for a consulting company (national level, but not one of the biggest) that contracted to replace an older system with SQL Server. The development DBA was, shall we say, a nice guy. He and a lot of other people who were billing high consulting rates created a system that was much slower than the older system we were replacing.
I was asked to hang around after others were moved to other projects to get the animal into production.
What Phil is describing is exactly our mistake. We tested everything with teaspoons of data when the customer needed gallons. When we hit production the system fell flat on its face. When I started to look under the covers, I was shocked at how many developers were doing things like writing reports that pulled whole tables from the database and processed them on client machines instead of using concise queries. You can get away with that when the test table has 10 rows, but it's awfully clumsy when the production table has millions. What was the indexing strategy? Create one when someone complains about speed, not before. Again, you don't need a lot of indexes when you're only pulling a few rows of test data.
Yes, testing along the way with data volumes that the system is expected to handle is excellent advice.
___________________________________________________
“Politicians are like diapers. They both need changing regularly and for the same reason.”
Post #622044
blandry
blandry
Posted Thursday, December 18, 2008 7:35 AM
Old Hand
Group: General Forum Members
Last Login: Monday, May 07, 2012 9:23 AM
Points: 304,
Visits: 716
I don't think your editorial (like many I have seen along this topic line) is realistic. Sure, it would be great on Day One to "know" the size of your final production database - but even a statement such as that is highly flawed. Think about it - Is the size of a database ever fixed? No, always fluid. So what does "final production database" really mean? Is that the database on Day One, Day One Hundred, Day One Thousand?
Maybe in telecommunications work you can say "We have X number of customers who make X number of calls..." and from that you extrapolate expected size and load - but let me pose a question that in my experience, is more the rule than the exception...
Lets say you work for a large company - who cares what they do, but.... How many JPG image files are there RIGHT NOW on your entire company network? Do you know? Can you guess? Is it 100, maybe 1000, maybe 1 million, maybe 10 million - fact is, you don't know and likely cannot know easily. So how do you design and test a database to track them? How do you load up test records to "match" what that expected load is? Suppose you were counting these JPG image files - how many are actually duplicates of the same file? You cannot know that any better than you know the total number.
Another example - Imagine if you got a job tomorrow in a company with a 250 node network, and you were asked to design a SQL database to track all Word documents in the company on every server and work station in the company. How would you proceed in assessing just how many records you would be dealing with to start? How many new Word documents are generated each day? Each month, year? You cannot simply "pick a number" with any degree of certainty.
This is something our companies deal with all the time. We work in businesses and applications where you cannot simply "pick" a number and say "this is where we're going to be when we're running in production"... Its simplistic to assume most businesses are quantifiable - in fact, the majority are not and though I think good design and testing is vital - to present it as some sort of science where you just "figure" the total load you will be dealing with is just not real world thinking for the majority of businesses.
Plan for the worst, build for the best, but don't presume you can "know" anything on Day One because 99.99% of the time, you will be wrong unless you are truly fortunate to be working in a field where these numbers are truly known on Day One.
There's no such thing as dumb questions, only poorly thought-out answers...
Post #622072
Ninja's_RGR'us
Ninja's_RGR'us
Posted Thursday, December 18, 2008 7:39 AM
SSC-Insane
Group: General Forum Members
Last Login: Today @ 4:55 PM
Points: 21,357,
Visits: 9,540
Well then, what is your fully tested method of operation in those cases. Take a good guess, then go with 25 times more data then adjust as need be or something along those lines?
Post #622076
clementhuge
clementhuge
Posted Thursday, December 18, 2008 7:43 AM
SSC Journeyman
Group: General Forum Members
Last Login: Sunday, May 19, 2013 3:27 PM
Points: 89,
Visits: 285
I agree with blandry.
My answer a couple of answers above, was not an answer.
It was kind of trying to find a solution so that your production servers are sizeable.
Saying that, we can manage to develop a load test with x millions of rows that would be consider a full-size production server.
Once we are near to reach this amount when on production, an alert would tell us to set up several servers to load balance the loads.
So although theoritically, you can never know the amount of data you will deal with, you can "manage" it by spreading the loads on different servers.
By doing so, the load testing makes entire significance!
The sizing of the load test then depends on how frequently you will extract, purge, reallocate ressource.
Clement
Post #622082
Shaun McGuile
Shaun McGuile
Posted Thursday, December 18, 2008 7:51 AM
Mr or Mrs. 500
Group: General Forum Members
Last Login: Thursday, October 04, 2012 9:20 AM
Points: 583,
Visits: 1,060
In my case the 3 million rows was full population of the database for six months worth of usage.
I had specifc fixed limits in the system I was testing and actual data would be at most 2 million rows, given the nature of the application. So doing all the load testing with 4 - 6 times the allowed data volumes was a good test.
hey thats the beauty of redgate sql data generator though. ;)
Hiding under a desk from SSIS Implemenation Work
Post #622094
Ron Kunce
Ron Kunce
Posted Thursday, December 18, 2008 7:56 AM
SSC-Addicted
Group: General Forum Members
Last Login: 2 days ago @ 11:56 AM
Points: 454,
Visits: 418
I tend to agree with Phil and yet disagree. I feel initial development should start out small in a concept mode using isolated data in a strictly development only environment. However, the trick is have not just one or two levels of development and testing but at least four levels where each level of test and debugging is progressively more production-like prior to cutting a new system or maintenance changes over to production.
Granted, that implementing a series of separate server (farms) for Development, Test (1 and/or 2), Sim-Prod, and Production could be very expensive for a small business. However, the heavy production system Phil outlines, is not one of a small business, or at least is not any longer a small business category. And, the firm involved must be willing to invest in providing a more robust data environment in hardware, software and especially in data and code migration and testing policies.
Ron K.
"
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
" -- Martin Fowler
Post #622100
Anye Mercy
Anye Mercy
Posted Thursday, December 18, 2008 8:20 AM
SSC-Enthusiastic
Group: General Forum Members
Last Login: Thursday, April 18, 2013 6:51 AM
Points: 118,
Visits: 380
I think this is a very valid point, but I would suggest working against two similarly structured databases for development purposes - one with lots of data and one with a little. Reason being, when you are performance testing your database code, you want the mega-datasets. When you are testing the accuracy of your code, you probably want smaller sets that you can control the results better to make sure you are getting exactly the results you seek with your queries. Our production databases have millions of records in some of the tables and I definitely do want to make sure my work can perform well against it - but I also need to make sure that hidden in the thousands of results returned from some of my queries aren't some of the wrong records (and that some of the right ones aren't missing). For simple selects this isn't a big problem but when the query logic gets complicated, it is very easy to have a small mistake that is a big impact in terms of data integrity.
--
Anye Mercy
"Service Unavailable is not an Error" -- John, ENOM support
"You keep using that word. I do not think it means what you think it means." -- Inigo Montoya in "Princess Bride"
"Civilization exists by geologic consent, subject to change without notice." -- Will Durant
Post #622135
« Prev Topic
|
Next Topic »
33 posts, Page 1 of 4
1
2
3
4
»
»»
Permissions
You
cannot
post new topics.
You
cannot
post topic replies.
You
cannot
post new polls.
You
cannot
post replies to polls.
You
cannot
edit your own topics.
You
cannot
delete your own topics.
You
cannot
edit other topics.
You
cannot
delete other topics.
You
cannot
edit your own posts.
You
cannot
edit other posts.
You
cannot
delete your own posts.
You
cannot
delete other posts.
You
cannot
post events.
You
cannot
edit your own events.
You
cannot
edit other events.
You
cannot
delete your own events.
You
cannot
delete other events.
You
cannot
send private messages.
You
cannot
send emails.
You
may
read topics.
You
cannot
rate topics.
You
cannot
vote within polls.
You
cannot
upload attachments.
You
may
download attachments.
You
cannot
post HTML code.
You
cannot
edit HTML code.
You
cannot
post IFCode.
You
cannot
post JavaScript.
You
cannot
post EmotIcons.
You
cannot
post or upload images.
Copyright © 2002-2013 Simple Talk Publishing. All Rights Reserved.
Privacy Policy.
Terms of Use.
Report Abuse.