One of the things that I've asked DBAs, and I see asked often, is how much does your data change? That affects the transaction log sizes (and backups), the load on your server, possibly the tuning and specs you want to implement, etc. But most DBAs don't have any clue what the change rate on their data is. Most probably can't even tell you quickly what their data growth rate is.
The good ones can, and many of them either track it, or they do what I used to do: they monitor backup sizes and calculate an acceleration and velocity that assists them in proactively managing servers.
That accounts for growth, but what about changes? Very few DBAs know this, and it seems like it's an advanced topic. I've never had a great, homegrown way to track this, and haven't worried about it. Most of my systems could have weekly downtime, and I'd rebuild indexes, and statistics, during that time, so it wasn't an issue.
The other day I was reading Kim Tripp's blog because, well, it's a good idea. She doesn't post a lot, but when she does, it's good stuff to know. In this post she talked about filtered indexes and statistics going stale quicker than expected. I'll let you read it if you're interested (you should be), but the interesting thing to me was her mentioning a way to know how many rows have changed.
In SQL Server 2000, there's a value in sysindexes.rcmodctr that keeps track of how many rows have changed. This is used to note when to refresh statistics (it's when 20% of the rows have changed).
In SQL Server 2005 and 2008, this is tracker per column, which may or may not be better. Read Kim's explanation of that. However the values are in sysrowsetcolumns.rcmodified for SS2K5 and sysrscols.rcmodified for SS2K8.
That's a handy piece of information for me to know. I can now easily track the number of rows that change daily, and then get some averages that might clue me in to the activity of the database. It's not that this average means anything by itself, but if I know it, and then compare the last day's change against the average, I'll have some idea of the load on my server. Might help me diagnose issues, or let me know if the server if coming under more load.
Do you know what determinism is? It's something that comes up periodically in Books Online as various SQL objects require deterministic functions. Things like indexes views.
I ran across a question on this for our ASK site, and thought it was interesting. I started to reply, then had to double check myself and make sure by looking things up in Books Online. I was right, though I'm not sure I'd have been able to actually classify all functions correctly.
For example, RAND can be either deterministic or non-deterministic, depending on how it's used. If you have a seed value, it's deterministic. CAST and CONVERT are usually deterministic but it depends on the data type. There are some good exceptions spelled out in this BOL entry: Deterministic and Nondeterministic Functions
I knew that deterministic meant you could determine the output given the input, and that these are "predictable" functions. However I didn't realize that somethings, like GetDATE() wouldn't fit. It always returns the current time/date, but it's not based on input, or the database, so it doesn't count. It's a non-deterministic function.
Non-deterministic functions like the @@ ones (@@Connections, @@idle, etc.) are listed, and I'm not sure I'd have thought of them off the top of my head, but it was good to read them in the article. Next time I need to I should remember, or at least be able to easily scan code and decide if it's deterministic or not.
I saw a post recently where someone was asking how to separate out all indexes from the data into a separate NDF file. This was the same post that I wrote about recently with the thread myth.
The poster seemed confused on a couple points. The first was that he or she thought that they could separate out the clustered and non-clustered indexes from the data. That’s alone makes me think that this user is not advanced enough to work with multiple filegroups if they don’t understand the table v clustered index structure.
The second thing was thinking a separate file improves performance if the indexes are moved. This can improve performance, but a couple things need to take place.
The user admitted this was something they heard, and would likely not separate things out. I think that’s a good move and should save them some headaches.
Someone was asking about using multiple data files recently to try and increase performance. I had answered that unless you had separate physical disks that it wouldn’t matter.
However then I remembered hearing something about threads and files for I/O. I tjhought this was a myth, but I wasn’t sure. I searched around, and then pinged Gail Shaw since I know she does a lot of internals type investigation.
She confirmed this is a myth and sent me this reference: SQL Server Urban Legends Discussed. It’s from the Microsoft Customer Service engineers and discusses the origin of the myth and how things work.
The bottom line is that SQL Server uses a thread for each unique disk drive, not files. In SQL 2000, you can fool the system if the disk drives are volumes on the same physical disk. However in SQL 2005 the system checks to see if these are the same physical disk.
There are some caveats, but if you hear this rumor, dispel it. Let people know that multiple file groups (or files) only help if you have different physical I/O paths and drives.
I posted a note awhile back that it might be nice to be able to mark an object as deprecated. Then have the system remove it after some time, to allow us to prune out old data, objects, etc that clutter the system and make upgrades more difficult.
Someone suggested I create a Connect item, so I did. I submitted Allow User objects to be deprecated and went on my way. I posted a note and got a few votes.
The other day I got a note back from the system that someone had responded at MS:
This is indeed an interesting idea, and it dovetails nicely with some thoughts we've been having around managing the surface area of database apps. No promises for this release, but it is definitely something we will consider.
It's always nice to get some feedback and feel that someone is looking at our suggestions. I do hope this isn't a standard message, but from what I've seen on the few dozen items I've voted on or submitted, I get a personal message from an individual.
create table MyTable (id int)goCreate Table #Test (id int)gocreate proc MyProc @id intas insert MyTable select @id insert #test select @idreturngoexec MyProc 2goselect * from #testgodrop table MyTabledrop procedure MyProc
I saw a question recently about temp tables and when I was answering it, I learned something. With this code:
create table MyTable (ID int)gocreate proc MyProc @id intas create table #test (id int) insert MyTable select @id insert #test select @idreturngoexec MyProc 2goselect * from #testgodrop table MyTabledrop procedure MyProc
Do you expect the SELECT to return anything? Will the temp table be in scope? It's an interesting question, and I would hope most of you would know the answer if you're coding in T-SQL. What about this code?
Does the SELECT return data? The answer to the first is "no" and the second "yes". The reason is scope. While a local temp table (single #) is visible to your entire session, which means multiple statements, it isn't always in scope.
I had to think back to basic computer science and stacks as to how this works. In the first set of code, the variables and temp table that are used in the procedure are created and put on the stack of data while the procedure is being executed. That means that as long as the procedure is being executed, those items can be accessed.
Once the procedure ends, they are popped of the stack and discarded. So that the "select * from #test" returns an error.
In the second set of code, the temp table is put on the stack when it's created. When the procedure is executed, it gets more values on the stack for it's variable, but the temp table is still on the stack. When the procedure ends, @id is lost, but the temp table remains.
The values aren't really placed on the memory stack locations. Rather their references are, with the temp table living in tempdb (or memory) and the variable in memory as well. The stack used for tracking scope is lots of complicated, I'm sure, than what we dealt with in basic Pascal and C back in high school or college, but the theory remains the same.
You have to know your scoping rules when programming, or you might easily get into trouble and spend a lot of time debugging things that should be simple to understand.
It seems that often I see posts that say “I’m out of space on my drive, what do I do?” The answer, as with most anything in SQL Server is, it depends.
The most common issue I see for running out of space is the lack of deleting database backup files. The maintenance plan wizard instructs you to perform backups, and defaults to making a separate file for each backup, but it doesn’t necessarily clue in the “Accidental DBA” to remove old backups.
You need a maintenance cleanup task here and set it to remove old backups, but be sure that you have a new, good backup first. In other words, the maintenance cleanup comes after the full backup, and then only when that task completes with success.
You can delete old backup files manually, but think about how many you need to keep. Most people probably can get by with 1 or 2.
I know lots of people that have IIS on the same box with SQL Server for small to medium web sites. Unfortunately IIS doesn’t have a maintenance cleanup task. It creates new log files on a daily or weekly basis, and those will fill up your disk when you are least expecting it.
These are in the LogFiles folder under System32 on your system drive. Set a reminder in Outlook to trim these down occasionally.
Another very common issue is that transaction log growing to fill all disk space. This happens because so many new DBAs assume that a full backup also includes a log backup. It doesn’t, for multiple reasons, and it doesn’t matter if it should or no; it doesn’t.
What you need to do here is first make sure you have time, and then clear the log. That’s a log backup with truncate in versions 2000 and prior. It’s setting to Simple mode in 2005 and above and then . Be sure you set the mode back to Full.
Once you have space back, I’d recommend you take a full backup. You can get by with a diff, or maybe wait until your next scheduled one,but I’m conservative. I’d recommend a full backup right then. Once that’s done, you need to get a good size for the transaction log. Unfortunately that requires some trial and error.
The size of your log file depends on the load your database experiences in terms of changes, but also on your log backup schedule. A log backup allows the space in the log to be resused. If you produce 10MB of changes an hour, and back up once an hour, you need a log file that’s 10MB. Actually you’d want it larger to account for a larger load some hours, but you get the idea. If you backup the log once a day, you’d need a 240MB log ( plus pad).
So, what I’d recommend is that you schedule hourly log backups. See how large the backup files are and then size your log file appropriately. As far as the initial size, you can run a DBCC SHRINKFILE to make your log file smaller. I’d lower it to 1GB to start with. That’s just a guess, and you might want to set it to 10% of your data files, but disk space cheap, and 1GB will accommodate lots of databases.
There are other reasons, like your data file growing, but I’m not trying to cover every situation. If you’re new to SQL Server, and you have a full disk, check these items first. If none of these help you, post a note in our forums and someone will answer.
This is a great write-up that shows what is important in a phone. It's how you interact with it, not the features, not the specs, it's how you use it.
I tend to agree with that. With both the G1 and the iPhone, the interaction with the device for the most part, has not been on the phone. It's been twitter, web, reading, music, and the smoothness of those two devices is unbeatable.
I hope that more phone manufacturers realize this and they start to build up their networks of apps. The app store with Apple, and integration with a PC to allow me to manage things either on the PC or the phone, it's unbeatable. If I could have read books from Barnes and Noble or Amazon on my G1, I might not have moved to an iPhone, but I couldn't and that mattered.
I'm not sure how big a deal it is for everyone else, but I see more and more people doing things on their phones that aren't involving making calls.
People building SQL Server apps, especially BI apps ought to consider these markets and look to build small applications that ensure a smooth interaction with customers. As much as I like browser apps onthe desktop, they don't work on a cell phone, even one with as large a screen as the iPhone of G1.
I was reading the most recent issue of TechNet from Microsoft one morning and flipped through the column on SQL Server. This is one of the few paper magazines I get, and for some reason I see things in there that interest me.
It’s not the topics, but often the delivery and format. I flip through it at my desk, or while cooking, and I’ll see things that I’ve missed on the Internet, or get reminded of things I’ve forgotten. The October issue contained one of those.
Paul Randal writes the SQL Q&A column and he usually has something in there that’s worth knowing. This month he answered a question about some strange messages in the SQL Server error log. Here’s the message from my test instance:
In case it’s hard to read, the message is:
CHECKDB for database 'db1' finished without errors on 2009-09-21 08:33:03.713 (local time). This is an informational message only; no user action is required.
This was on startup of the instance, right after the “starting up database” messages that you see. I have this same message for a couple other databases (db2 and master). It didn’t appear for model, msdb, tempdb, and a couple other databases I have on this instance.
Why not? Or rather, why did it appear for these databases? Is DBCC CHECKDB running on startup?
The answer is no, and Paul has a good explanation in his column. It’s not online yet, but I’ve read this before and Paul gave me a great reminder. The message is the “last known good” time for the database. It shows the last time that DBCC completed successfully, and gives you a reference point that you can use an a DR or corruption scenario to try and narrow down when things went bad. Note this is SQL Server 2005 and later.
It’s also a good reminder that you haven’t run dbcc in some time on a DB. When I first read the column, I checked my error log and didn’t see the message. That was a stark reminder that I didn’t have maintenance set up on this instance. It’s a test instance, only been installed for about 20 days, but still. I ran dbcc on those databases that showed messages (db1, db2, master) only as a test. I’ve now set up maintenance on the instance to grab a backup every night and run weekly maintenance.
If I had production instances, I’d love to be able to easily capture this data from a server, or even compare the messages to the databases on the server and then report back to me if CHECKDB hadn’t been run, or if it had been longer than xxx days.
Sankar Reddy, a fellow MVP, wrote a script that helps here, and it’s worth checking out his blog on the last clean DBCC value.
The more I think about this, the more I think it should be built into SQL Server as a default. Why not ask for the information to create an operator and email at install and then create a few nice default alerts, like one for which log is full.
It's such a common problem, why not get people to enable DBMail and enter their email address during installation? For the people that really need to lock down their servers, they could skip it, but for the average person, it might save some headaches if they knew when their log was 80% full.
If you want the general steps for creating an alert, it's on MSDN.
This would probably be a good editorial, but I thought I'd drop it out here as a short note.
You ought to be tracking the build (or version) of all your SQL Server instances and Windows. I'd suggest you automate this, maybe even drop it in a table in every (user) database every day.
With the constant release of patches, the changes you might make to a server, it's easy to lose track of what build a server is at. You might think all your servers are at SP1 or SP2, but you never know when an application might not support the latest version and you're behind.
The more time that passes between the application or deployment of patches, and a disaster, the less likely you'll be sure of what patch level you're on. And if you have dozens of instances, you won't be sure.
In a disaster, the last thing you want to do is have doubt about what you're restoring. Track the builds and you'll be fine. And bookmark my build list pages (2005 and 2008)
I saw a post recently that asked about backing up the Resource database. The person said they just realized that it wasn't included in their backup scheme and was worried.\
Don't worry, you don't need to back it up.
The page at MSDN describes the Resource database as essentially a large lookup table(s) for the SQL Server itself. It can find the system objects that are used: tables, views, DMVs, DMFs, etc. When the server is upgraded, a new copy of the Resource database is copied over. The objects don't have to be built from scripts with DROPs and CREATEs.
This means, however, that each time you patch your server, you potentially could be copying over a new version here if some system object changes. That means that you don't really want to back up this database.
If you are restoring to a new server, when you install that server, it will have the correct copy of the Resource database for your system. If you were to copy over an incorrect version, the system objects might not match up with the system code in the SQL Server executables and DLLs. That would be bad.
I haven't seen notes about the Resource database becoming corrupted, but it could. In that case, I'd suggest you copy the files from another SQL Server of the same version, or install SQL to another machine, patch it up, and then copy the files to your damaged installation. That could be a problem, but it does mean one other thing.
You ought to be tracking your SQL Server version and build on a regular basis as part of your monitoring.
We've had a few articles on how you move databases, but it seems that there are always new people using SQL Server and they end up posting a question because they aren't sure how to do something.
The other day I ran across someone trying to copy their database to a new server. They'd tried to "Copy Database Wizard" and the Copy Objects" in DTS (SQL 2000), but weren't getting default. Likely that's a bug in those processes, which I believe are the same code.
I mentioned detaching and attaching your database, which the poster wasn't aware of. Surprised me, but I'm sure many people don't know how. I decide (another) blog on the topic couldn't hurt with some simple instructions.
Now you have your data moved, all your code, defaults, rules, contraints, etc. Now logins need to be handled. There are a couple ways to do this.
1. If you care about passwords (keeping them the same), use sp_help_revlogin to script out the logins from the old server.
2. run sp_change_users_login to sync up the logins from #1 with your users, or to just create new logins on the new server if you didn't want to mess with the scripting.
Recently someone was asking how they could execute some ALTER DATABASE commands without requiring sysadmin permissions. This person's company didn't allow sysadmin to be granted to non-DBAs, and they had to recover a series of applications and wanted to do that with some type of application setup instead of scripting for the DBAs.
CAUTION: Before you use this technique, really investigate whether or not you need to actually do this or if there is another way. This can be a big security hole.
My suggestion is that you place the code in a table, or script a series of inserts into a table. This can be a simple table with an ID, a description, and the code. I'd use something like this:
create table ExecutableCode( codeid int identity(1,1) , codedescription varchar(1000) , codeitself varchar(max))
I use an ID just for reference, and then I'd include a description, such as
insert ExecutableCode select 'This code creates a database', 'create database Mydb'
That creates a single line of code that will use the defaults for building a database. Obviously you can write more complicated code if need be.
The next step is to create an execution table
create table ExecuteLog( Logid int identity(1,1) , codeit int , executestart datetime , executeend datetime , executionneeded tinyint)
The purpose of this table is to both log the execution and flag it. If I wanted to execute the inserted code, I'd need the ID, which I'll assume is 1. In that case I'd insert
insert ExecuteLog select 1, null, null, 1
I use Nulls for the dates since I'm not actually executing the job. I'm telling someone else to execute it. That someone else is SQL Agent. Now in order for this to work, the Agent that is executing the job, or a proxy, needs to have the sysadmin permissions (or whatever elevated permissions are needed). What I do is create a proc that looks for a job to execute and executes it. In that job I want to flag the start and end of the job for logging, so the psuedocode is:
-- Get codeID to execute from ExecuteLog-- update ExecuteLog for that line, update start time-- execute code from ExecutableCode using the codeId to retrieve the statement-- update ExecuteLog for that line, update end time, mark execution needed as 0
A few caveats here. First, be sure that you don't allow anyone to just put code in the ExecutableCode table. That could be a big security risk. Second, I'm not sure this makes sense for one time events, like DR. To me the DBA can just run this since they'll need to script the creation in any case. Third, you may still be limited, depending on how your SQL is structured. The EXEC() command has limited permissions, and while you can use XP_CMDSHELL or SQLCMD in Agent, that could be more of a security hole if you don't set it up properly
That's it. It's essentially what the Windows scheduler does, and in this case it allows you to work with SQL code.
The new HierarchyID datatype in SQL Server 2008 has captured my interest lately. I’ve been working on a presentation that I can give to some local user groups, and hope to have it done soon. As with other data types, an index can be built on this column to help speed up the performance of queries.
An interesting thing that I learned while working with this data type is that there are two different ways of indexing the data in this column. You can do a depth first index, or a breadth first index.
A hierarchy is represented as a tree of values. An example of this might be the following:
If you examine the hierarchy, you can see that the root is node A, it has a children as nodes B and C, and there are further children. The diagram makes it easy to see which nodes are above or below other nodes in the hierarchy. The depth first indexing assumes that you are querying for subtrees and so it groups those nodes together in the index. In this case the index would store the nodes as:
A, B, E, F, C, D, G
In other words, this would be a traversal of the hierarchy that looks like this, essentially a left-child first traversal.
This is useful if you often query for subtree and its child nodes.
In this case, all nodes at a particular level are stored together in the index. For our example above, the traversal would look like:
And that would result in the nodes being stored like this:
A, B, C, D, E, F, G
The first level is the root, node A. The next level contains the children of the first level, which are B, C, D. Finally the bottom level of E, F, and G is indexed. This is useful when you are querying for all nodes at some particular level.
Final Notes
I’ll show how to create the different indexes and some actual SQL code in another post, however one thing to note here is that there is no guarantee of uniqueness with the HierarchyID. Similar to an identity property, if you want to be sure things are unique, you need a unique index.