Why being careful is such an important mindset when working with SQL and databases.
At the Boston Public Schools, certain processes need to be run at particular times of the school year. There are a couple of specific windows of time when we print choice forms for students. There are a few points in the year when we run a process to assign students to schools. And there’s a specific day [generally June 30th] when we run a significant process to archive the finished school year data and bring the next year’s data into the current year tables. This last one is what we call the “School Year Flip”, and I’ll likely discuss it more in an upcoming post.
All of these processes are, to a great extent, SQL-based. The eligible schools based on the student’s address, the student’s school choices and rankings, the end of year data to archive – all are stored in tables. These processes [and many others] are also very important – we can’t mess them up. We don’t want tens of thousands of errors on printed forms. Thankfully, we do testing to catch these types of systematic errors, so generally they haven’t happened [at least not to any great extent]. But sometimes errors slip through, and a follow-up phone call or correction letter is then needed.
One of the things I’ve found to be highly important with any of these processes is to be careful and prepared for the ‘just in case’ situation. For example:
- Take and save a snapshot of the data before starting the process and after finishing it. This might entail taking a database backup or simply copying a handful of integral tables to a data dumping ground and giving them some name like [TableName]_yyyymmdd_[ReasonToExist]. For example, StudentData_20110616_BeforePrintedSchoolForms. If table backups are needed frequently enough, I try to automate this type of backup/build it into the system as a beginning or ending step [whichever is appropriate]. The backup will come in handy if the process needs to be started again or if, at some point in the future, there’s a question about a particular student’s information at the time the process was run.
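As a sketch of that naming convention [SELECT ... INTO is SQL Server syntax, which I’m assuming here; other databases use CREATE TABLE ... AS SELECT], the table snapshot can be a single statement:

```sql
-- Copy the table as-is into a dated, labeled backup table.
-- The target name follows the [TableName]_yyyymmdd_[ReasonToExist] pattern.
SELECT *
INTO   StudentData_20110616_BeforePrintedSchoolForms
FROM   StudentData;
```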
- For complicated tasks or tasks with many steps, I generally follow a document outlining the steps [especially if it is a seldom-run process] just to make sure I don’t miss anything important. If something new pops up that isn’t documented, I’ll update the document for next time. The School Year Flip process is the most significant process like this, and the document for it is over 30 pages long.
- Run the process on a non-production environment first. A run-through on the staging environment, for example, can reveal issues that you’d rather know about before running updates, inserts, or deletes on the production tables.
- Anticipate counts and then double check as you’re going along. If you start with X rows in one table and know, based on whatever the criteria are, that X-N rows should get transferred over to some other table [where, for example, N represents records that don’t meet some sort of filtering condition] – take note of X and N. After running the intermediate SQL scripts in the process, confirm that there are X-N rows in the output table.
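A count check along those lines might look like this [the table and column names here are made up for illustration]:

```sql
-- Before the transfer: note X and N.
SELECT COUNT(*) AS X FROM StudentChoices;                       -- all rows
SELECT COUNT(*) AS N FROM StudentChoices WHERE IsEligible = 0;  -- filtered out

-- After the transfer: this should come back as exactly X - N.
SELECT COUNT(*) FROM StudentChoicesToPrint;
```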
- Save intermediate changes frequently. It’s always possible to start from scratch using the backup saved [per the first point above]. But – it’s much easier, 80% of the way through a complicated process, to only have to retrace a few steps and not lose hours of work. If there’s a risky part to some process, copy the table(s) as described above. Or even take a transactional database backup.
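For a risky step, that might look something like this [hypothetical names; transaction syntax as in SQL Server]:

```sql
-- Keep a copy of the table before the risky step.
SELECT * INTO StudentChoices_BeforeStep4 FROM StudentChoices;

BEGIN TRANSACTION;

DELETE FROM StudentChoices
WHERE  SchoolYear = 2010;

-- Sanity-check @@ROWCOUNT [or rerun the count queries] here.
-- If the number looks wrong: ROLLBACK TRANSACTION and investigate.
COMMIT TRANSACTION;
```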
- Spot check edge cases and have multiple people review different aspects of the output. SQL validation checks and queries can only check for known issues and data integrity requirements. Sometimes developers or business users can spot things that were never anticipated or previously checked but that could cause major headaches if they went unnoticed. As new validation rules are identified, build them into some sort of validation process for the future.
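Validation checks of the kind described here are often just queries that should return zero rows [the tables and columns below are invented for illustration]:

```sql
-- Assignments pointing at a school that doesn't exist:
SELECT a.StudentId
FROM   StudentAssignments a
LEFT JOIN Schools s ON s.SchoolId = a.AssignedSchoolId
WHERE  s.SchoolId IS NULL;

-- Students who somehow got assigned twice:
SELECT StudentId, COUNT(*) AS Assignments
FROM   StudentAssignments
GROUP BY StudentId
HAVING COUNT(*) > 1;
```

As new rules like these are found, they can be added to the standing validation script so the same mistake is caught automatically next time.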
- If something’s risky, look it over again. There’s something to be said for double checking – is this the right package? Am I on the right database? Is that the right table I’m about to run this truncate statement on? Is doing the double checking wasted time? The double check might seem like an unnecessary 30 seconds or 2 minutes of overhead 97% of the time. You’ll be glad you did it the other 3% of the time!
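For that truncate example, the 30-second double check can itself be a couple of queries [SQL Server functions; the table name is hypothetical]:

```sql
-- Confirm which server and database this session is actually connected to.
SELECT @@SERVERNAME AS ServerName, DB_NAME() AS DatabaseName;

-- Confirm how many rows are about to disappear.
SELECT COUNT(*) AS RowsToBeRemoved FROM ChoiceFormStaging;

-- Only then:
TRUNCATE TABLE ChoiceFormStaging;
```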
- Automate for next time – a key component of improving processes. If there are parts that are manual but can be batched together and automated, do so. The more automated processes are, the less likely human error will take its toll in the future. But test the automated aspect thoroughly – otherwise, things could be worse off.
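As one example, the dated table-backup convention from earlier lends itself to automation. A minimal sketch using SQL Server dynamic SQL [the procedure name and parameters are my own invention]:

```sql
CREATE PROCEDURE dbo.SnapshotTable
    @TableName sysname,
    @Reason    varchar(100)
AS
BEGIN
    -- Build a name like StudentData_20110616_BeforePrintedSchoolForms.
    -- Style 112 of CONVERT formats the date as yyyymmdd.
    DECLARE @Target sysname =
        @TableName + '_' + CONVERT(char(8), GETDATE(), 112) + '_' + @Reason;

    DECLARE @Sql nvarchar(max) =
        N'SELECT * INTO ' + QUOTENAME(@Target) +
        N' FROM ' + QUOTENAME(@TableName) + N';';

    EXEC sp_executesql @Sql;
END;

-- Usage: EXEC dbo.SnapshotTable 'StudentData', 'BeforePrintedSchoolForms';
```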
I’m sure this list could get much longer, and I welcome additional suggestions. But the main point – and I’m sorry it’s cliché – is to look before you leap when you’re dealing with databases and SQL.