SQLServerCentral
Quick design question for database used for statistical/analysis purposes
mrtoonces
Grasshopper
Group: General Forum Members
Points: 24 Visits: 24
I have a quick SQL Server design question on an unusual (at least for me!) database situation:

Situation: We are creating a SQL Server 2012 database to store data that is exported daily from an old mainframe system. That system is operated by another firm; we have no control over it. It collects day-to-day transactions (retail sales records). Each day, admins on that system dump out that day's activity and FTP it to us. The database we're charged with creating must store the last 30 days of data --- e.g., on Dec. 31, we'll import that day's data and delete records in our system dated Dec. 1 or earlier. It's like a perpetually changing 30-day snapshot of the mainframe's data. We'll use this data on our end for statistical/analytic purposes. It will never be edited by us. The data is pretty darn big; a month's worth of activity can be ~50 gigabytes (and space on our SQL Server boxes is unfortunately very limited right now).

My question: In creating the initial .MDF file, I'm considering a size of ~75GB (seems like this would more than cover the typical size of 30 days of data). But I have no clue as to the appropriate size for the .LDF (log file). Since we're short on server space, and since this is just a dump of data that will never be edited, should I set the initial size of the log to something very tiny (with no autogrowth)?

I hope this is not a dumb question but I'm accustomed to creating databases in which edits are made constantly by users (and thus the log is important). This one is just a revolving 30-day warehouse used only for a ton of read-only queries to explore the data. Would the log be of any real importance in this scenario? If you have any insights or suggestions as to the appropriate sizes of the .LDF (and .MDF) files, or have any other insights on the design of databases such as this, I would greatly appreciate your help! Thanks so much and best wishes for a happy holiday season!

Joe Torre
SSCertifiable
Group: General Forum Members
Points: 7118 Visits: 1347
The issue with all-at-once large data loads is log file growth. The best way to deal with that is to chunk the load into small sets, with a commit after each one. SSIS does this for you.
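To make the chunking concrete, here is a minimal T-SQL sketch of the same idea done by hand, assuming a staging table that the FTP'd file has already been bulk-loaded into. The table and column names (StagingSales, DailySales, SaleID, SaleDate, Amount) are hypothetical, invented for illustration:

```sql
-- Sketch: move rows from staging to the target in batches so each
-- commit keeps the active portion of the transaction log small.
-- Under SIMPLE recovery, log space can be reused between batches.
DECLARE @BatchSize int = 50000;
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- Move one batch out of staging and into the target table.
    DELETE TOP (@BatchSize) s
    OUTPUT deleted.SaleID, deleted.SaleDate, deleted.Amount
        INTO dbo.DailySales (SaleID, SaleDate, Amount)
    FROM dbo.StagingSales AS s;

    SET @Rows = @@ROWCOUNT;   -- capture before COMMIT resets it

    COMMIT TRANSACTION;
END;
```

With 50,000-row batches, the log only ever has to hold one batch's worth of uncommitted work, rather than a whole day's 1-2 GB load in a single transaction.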
Lynn Pettis
SSC Guru
Group: General Forum Members
Points: 385839 Visits: 42551
Two ways to accomplish this, and it depends on whether you have Enterprise Edition. If not, use partitioned views: each day goes in its own table, and you rebuild the view after each load (or use synonyms for the 30 days of tables and switch the synonym daily). If you have EE, you could use a partitioned table. While writing this I started thinking the partitioned view would be easier.
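A rough sketch of the partitioned-view layout, with hypothetical table and column names; in practice you would generate the view script dynamically each night rather than write it by hand:

```sql
-- One table per day. The CHECK constraint on the date column is what
-- lets the optimizer skip tables entirely when a query filters on
-- SaleDate (partition elimination).
CREATE TABLE dbo.Sales_20121201 (
    SaleID   bigint NOT NULL,
    SaleDate date   NOT NULL
        CONSTRAINT CK_Sales_20121201 CHECK (SaleDate = '20121201'),
    Amount   money  NOT NULL
);
-- ...one such table per day in the window...
GO

-- Rebuilt after each nightly load so it always covers the current
-- 30 days.
CREATE VIEW dbo.Sales30Days
AS
SELECT SaleID, SaleDate, Amount FROM dbo.Sales_20121201
UNION ALL
SELECT SaleID, SaleDate, Amount FROM dbo.Sales_20121202
-- ...UNION ALL one SELECT per remaining day in the window...
;
```

Read-only queries go against dbo.Sales30Days; the daily tables never need to be touched by the analysts at all.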



Lynn Pettis
Grant Fritchey
SSC Guru
Group: General Forum Members
Points: 350880 Visits: 34261
Combine Joe's answer with Lynn's answer and you're most of the way there.

To get the precise size of the log, you're going to have to do some testing once you've built out your load process. Measure that, add a little headroom for exceptions, and then you should be good to go. It will be much smaller than normal because, based on what you've told us, you'll only ever need it for the load processing.
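For the measuring step, two built-in ways to see how much log a test load actually used; both exist in SQL Server 2012:

```sql
-- Log size and percent used for every database on the instance.
DBCC SQLPERF (LOGSPACE);

-- Or, scoped to the current database:
SELECT total_log_size_in_bytes / 1048576.0 AS log_size_mb,
       used_log_space_in_bytes / 1048576.0 AS used_log_mb,
       used_log_space_in_percent           AS used_pct
FROM sys.dm_db_log_space_usage;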

----------------------------------------------------
The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
Theodore Roosevelt

The Scary DBA
Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
Product Evangelist for Red Gate Software
Lynn Pettis
SSC Guru
Group: General Forum Members
Points: 385839 Visits: 42551
If you use a single table for each day, the tables will be smaller. You would only need 31 tables: 30 of them are used in the partitioned view, and the 31st can be truncated and reused for the next day's data load.
Do a Bing or Google search on partitioned views.
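The nightly rotation might look something like this sketch; the table and constraint names are hypothetical, and the date literal would be computed by the load job:

```sql
-- Recycle the table that just fell out of the 30-day window.
-- TRUNCATE TABLE is minimally logged, so this avoids the large
-- DELETE that a single-table design would need.
TRUNCATE TABLE dbo.Sales_Day31;

-- Re-point the CHECK constraint at the new day before loading, so
-- the optimizer can still do partition elimination on this table.
ALTER TABLE dbo.Sales_Day31 DROP CONSTRAINT CK_Sales_Day31;
ALTER TABLE dbo.Sales_Day31 ADD  CONSTRAINT CK_Sales_Day31
    CHECK (SaleDate = '20130101');

-- ...bulk load tonight's file into dbo.Sales_Day31...
-- Then ALTER the view (or switch a synonym) so it references the
-- 30 current tables.
```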

Lynn Pettis
mrtoonces
Grasshopper
Group: General Forum Members
Points: 24 Visits: 24
Thanks Joe, Lynn & Grant! Your insights are all very helpful!

We do not have Enterprise Edition. A local partitioned views approach may be appropriate; we had not really considered this approach. Very interesting!

Up till now, we'd been thinking of tossing everything into one big table, such that each day we'd add new records containing last night's activity, and then run a DELETE query to nuke records that had been imported more than 30 days ago.

Given the size of the stuff to be imported, this is a lot of data churning, and thus the log could grow large quickly. That's why we were wondering what size we should set it to when we initially create the database (and what autogrowth, if any, we should allow). Once imported, there will be zero editing of the data; we'll just be running a bunch of SELECT queries on it. Perhaps my question should have been more along the lines of, "what log size would generally make sense in this sort of data-warehouse situation?"
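If the single-table design were kept, the DELETE is where the churn concentrates, and batching it bounds the log growth the same way batching the load does. A hypothetical sketch (table and column names invented for illustration):

```sql
-- Delete expired rows in batches so no single transaction has to
-- log a full day's worth of deletions at once. Under SIMPLE
-- recovery, log space is reused between batches.
DECLARE @Cutoff date = DATEADD(DAY, -30, CAST(GETDATE() AS date));
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    DELETE TOP (50000)
    FROM dbo.DailySales
    WHERE SaleDate <= @Cutoff;

    SET @Rows = @@ROWCOUNT;
END;
```

Even batched, this fully logs every deleted row, which is why the truncate-a-daily-table approach suggested above is so much cheaper on the log.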

Given that we'll be checking daily to make sure last night's import went okay, should the log be just large enough to serve as a record of what occurred during last night's import routine? Or should it hold a few days' worth of activity, to account for weekends when staff might not be able to verify that last night's import ran okay?

I apologize if my questions seem confused or uninformed. Our staff is spread across a dozen different projects with (of course!) different technologies and platforms, and none of us is a SQL Server expert. Your insights really are valuable, thanks again!


