SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Being Cautious With SQL Based Processes

Why being careful is such an important mindset when working with SQL and databases.

At the Boston Public Schools, certain processes need to be run at particular times of the school year. There’s a couple of specific windows of time where we print choice forms for students. There’s a few occurrences where we run a process to assign students to schools. There’s a specific day [generally June 30th] where we have a significant process to archive the finished school year data and bring the next year data to the current year tables. This last one is what we call the “School Year Flip”, and I’ll likely discuss it more in an upcoming post.

All of these processes are, to a great extent, SQL-based. The eligible schools based on the student’s address, the student’s school choices and rankings, the end of year data to archive – all are stored in tables. Also, these processes [and many others] are very important – we can’t mess them up. We don’t want tens of thousands of errors on printed forms. Thankfully, we do testing to catch these types of systematic errors, so generally they haven’t happened [at least to that great of an extent]. But sometimes errors slip through and a follow-up phone call or correction letter is then needed.

One of the things I’ve found to be highly important with any of these processes is to be careful and prepared for the ‘just in case’ situation. For example:

  1. Take and save a snapshot of the data before starting the process and after finishing it. This might entail taking a database backup or simply copying a handful of integral tables to a data dumping ground and giving them some name like [TableName]_yyyymmdd_[ReasonToExist]. For example, StudentData_20110616_BeforePrintedSchoolForms. If table backups are needed frequently enough, I try to automate this type of backup/build it into the system as a beginning or ending step [whichever is appropriate]. The backup will come in handy if the process needs to be started again or if, at some point in the future, there’s a question about a particular student’s information at the time the process was run.
  2. For complicated tasks or tasks with many steps, I generally follow a document outlining the steps [especially if it is a seldom run process] just to make sure I don’t miss anything important. If something new pops up that isn’t documented, I’ll update the document for next time. The School Year Flip process is the most significant process like this, and the document for it is over 30 pages long.
  3. Run the process on a non-production environment first. A run-though on the staging environment, for example, can reveal issues that you’d rather know about before running updates, inserts, or deletes on the production tables.
  4. Anticipate counts and then double check as you’re going along. If you start with X number of rows in one table and know, based on whatever the criteria is, X-N rows should get transferred over to some other table [where, for example, N represents records that don't meet some sort of filtering condition] – take note of X and N. After running the intermediate SQL scripts in the process, confirm that there are X-N rows in the output table.
  5. Save intermediate changes frequently. It’s always possible to start from scratch using the backup saved [based on #1 above]. But – it’s much easier 80% of the way through a complicated process to only have to retrace a few steps and not lose hours of work. If there’s a risky part to some process, copy the table(s) as described above. Or even take a transactional database backup.
  6. Spot check edge cases and have multiple people review different aspects of the output. SQL validation checks and queries can only check for known issues and data integrity requirements. Sometimes developers or business users can spot things that were never anticipated or previously checked but that could cause major headaches if they went unnoticed. As new validation rules are identified, build them into some sort of validation process for the future.
  7. If something’s risky, look it over again. There’s something to be said for double checking – is this the right package? Am I on the right database? Is that the right table I’m about to run this truncate statement on? Is doing the double checking wasted time? The double check might seem like an unnecessary 30 seconds or 2 minutes of overhead 97% of the time. You’ll be glad you did it the other 3% of the time!
  8. Automate for next time – a key component of improving processes. If there’s certain parts that are manual but can be batched together and automated, do so. The more automated processes are, the less likely human error will take it’s toll in the future. But test the automated aspect thoroughly – otherwise, things could be worse off.

I’m sure this list could get much longer, and I welcome additional suggestions. But the main point is – and I’m sorry it’s cliché – but look before you leap when you’re dealing with databases and SQL.


Posted by steve.casey on 20 June 2011

Pardon the grammatical correction, but...

re point 4: one criterion, two or more criteria

Posted by brosspremier on 20 June 2011

I considering having these 8 items done in calligraphy, framed and posted on a wall in my office - above my own monitor!

Posted by kaplanm on 20 June 2011

There is a third option to handle the pre-archival snapshots if you're working with 2k8 Enterprise. SQL Database snapshots (while firmly not a recovery solution) can be utilized to handle this without the need to create new tables etc. Then when you're done if you really need to roll back you can revert to your snapshot.

There are some additional repercussions in reverting to a snapshot, and it's hard to recommend against creating a backup if you have the drive space, but it is another available option.

Posted by Heather on 20 June 2011

Excellent thoughts.  All DBA's should generate their own set of golden rules.

Posted by Alex-668179 on 21 June 2011

Although it may be difficult to achieve in your situation, I do get the impression that a lot of these issues could be avoided, or at least easier to deal with, with a better database design. Consider, for example, making "current" an attribute rather than being defined by the table a record happens to be in. In fact, "current" should be a single value in a configuration table that refers to a lookup table of all schoolyears (or whatever it is) that ever were or are "current". Other tables then reference that lookup table through a reference that typically doesn't have to change.

Leave a Comment

Please register or log in to leave a comment.