Database Code Analysis

Database code analysis will reduce the number of 'code smells' that creep into your database builds. It will alert the team to mistakes or omissions, such as missing indexes, that are likely to cause performance problems in production. It will give the Governance and Operations teams visibility into the production readiness of the code, warning them of security loopholes and vulnerabilities. William Brewer describes the two technical approaches to database code analysis, static and dynamic, and suggests some tools that can help you get started.

To do Database Lifecycle Management effectively, you will need to automate as many of the routine database tasks as possible, simply because any move toward faster development cycles, delivering changes more frequently, means that a lot of simple checks, tests, and analyses need to be done more frequently.

Once your source code is in version control and you can produce clean builds from a standing start, you can look around for other processes to automate. Database code analysis is an obvious choice because you get an immediate benefit, and it is relatively easy to automate.

This article briefly explains the benefits of database code analysis, both static and dynamic, and the tools available to perform and automate it.

What are the uses of database code analysis?

In general, code analysis is not just a help to the individual developer but is useful to the entire team. By making the state and purpose of the code more visible, it allows everyone who is responsible for delivery to get a better idea of progress, and can alert them much earlier to potential tasks and issues further down the line. It also makes everyone more aware of whatever coding standards have been agreed, and of the operational, security and compliance constraints that apply.

Database code analysis is a slightly more complicated topic than the static code analysis used in Agile application development. It is more complicated because you have the extra option of dynamic code analysis to supplement static code analysis, but also because databases have several different types of code that have different conventions and considerations: DML (Data Manipulation Language), DDL (Data Definition Language), DCL (Data Control Language) and TCL (Transaction Control Language). Each requires rather different analysis.
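
For illustration only (the table and role names here are invented), here is one familiar statement from each category; each of them calls for a different kind of check:

    -- DDL: defines the structure of a database object
    CREATE TABLE dbo.Customer
      (CustomerID int NOT NULL PRIMARY KEY,
       CustomerName nvarchar(100) NOT NULL);
    -- DML: reads or modifies the data
    SELECT CustomerID, CustomerName FROM dbo.Customer WHERE CustomerName LIKE N'A%';
    -- DCL: controls access to database objects
    GRANT SELECT ON dbo.Customer TO SalesReader;
    -- TCL: controls the scope of transactions
    BEGIN TRANSACTION;
      UPDATE dbo.Customer SET CustomerName = N'Acme Ltd' WHERE CustomerID = 1;
    COMMIT TRANSACTION;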

As regards the purpose of database code analysis, this is probably best understood by looking at the requirements of the Governance, Operations and Delivery teams, because they will all need database code analysis, but for rather different purposes.

Delivery

Even in development work, the delivery team will have at least three different objectives for code analysis, because it is a solution to several problems.

  • Developers need to be able to check their own code within their own IDE to ensure that it is fit to be merged into a development branch.
    • It prevents any obvious mistakes or ‘To Do’ or ‘Hack’ sections of code getting into a build.
    • It checks that you haven’t forgotten to test or document some code.
    • It is handy for reminders of a deprecation notice, or an alert to a careless mistake.
    • It is the ideal place to do routine checks because mistakes are quick to fix and involve no fuss. Productive developers make a lot of mistakes, and like to fix them rapidly and quietly.
  • The build process needs a way of checking the code to ensure that nothing is going to break the build. By running database code analysis in advance, you know what will break it, and so it is quicker to remedy the problem and restart the build. The regular build process is, itself, a way of ensuring that the working database can be built from what is in source control, but a check of the code can avoid the work and delay of having to repeat the build.
  • The team need to check the code for any security issues before they get into the build. Any access control code (DCL) needs to be checked before a build.

Governance

The governance team will want to run checks on the current build, to get a general idea of where the code is on the spectrum between ‘working’ and ‘production quality’.

The governance process needs to get some idea of the extent to which compromises have been made by a development team in order to meet deadlines. This is the so-called ‘technical debt’.

Operations

Ops people will also need database code analysis. An obvious requirement is a security audit, to make sure that no permissions are being escalated in code, and that there are no other issues that cut across their security policy, such as the use of xp_cmdshell, or of dynamic SQL without proper access control.
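
As a crude sketch of the kind of check involved, assuming SQL Server, a search of the module definitions in a database will at least surface candidates for review (the LIKE patterns here are deliberately rough and will produce false positives):

    -- Routines whose definitions mention xp_cmdshell, or appear to execute a string (dynamic SQL)
    SELECT OBJECT_SCHEMA_NAME(m.object_id) AS schema_name,
           OBJECT_NAME(m.object_id)        AS object_name
    FROM sys.sql_modules AS m
    WHERE m.definition LIKE '%xp[_]cmdshell%'
       OR m.definition LIKE '%EXEC%(%@%'
    ORDER BY schema_name, object_name;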

It also allows the operations team to check for several of the causes of long-running queries in production databases, and to look for deprecated code when doing a server upgrade. There are some tools written entirely in SQL that detect a wide range of omissions and potential problems in databases. Where Ops are actively working on delivery in cooperation with developers, these tests allow issues to be resolved long before they become a problem in production.
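
For deprecated code in particular, SQL Server keeps a running count of every deprecated feature used since the instance last started, exposed through a dynamic management view; a quick query (a sketch, not part of any particular tool) gives Ops a head start before an upgrade:

    -- Deprecated features that have actually been used since the instance started
    SELECT instance_name AS deprecated_feature,
           cntr_value    AS times_used
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Deprecated Features%'
      AND cntr_value > 0
    ORDER BY cntr_value DESC;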

Static and dynamic code analysis

Static code analysis parses the code as a compiler would, but with the objective of checking the syntax rather than compiling it. It is a well-entrenched technique, based originally on the Lint Unix program that was designed in 1979 as a pre-processor for a C compiler. Lint parsed the C source code, performed several functions that nowadays are more usually picked up by the compiler, and flagged up suspicious and non-portable constructs in source code that were likely to cause bugs. Most people still refer to any tool that flags suspicious coding as a ‘lint’ or a ‘linter’, regardless of which language it is written to analyse. Lint-like tools generally perform only static analysis of source code.

For databases, static analysis is very useful for checking routines, such as functions and procedures, for code smells that require the code itself to be analysed. Dynamic database code analysis then extends the range of what can be checked. A user-created relational database is, in effect, built using a system relational database, and it is simple to query the nature and structure of a user database with SQL queries against the system database. This data about your database, the so-called ‘metadata’, can be used to check the structure of the database objects, to explore how database objects inter-relate, and to look for errors and dubious practices.
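
For instance, on SQL Server the catalog views make it trivial to take an inventory of the user objects in a database, which is the starting point for most structural checks (a minimal sketch):

    -- A simple inventory of user objects, grouped by type
    SELECT o.type_desc AS object_type,
           s.name      AS schema_name,
           o.name      AS object_name
    FROM sys.objects AS o
    JOIN sys.schemas AS s ON s.schema_id = o.schema_id
    WHERE o.is_ms_shipped = 0
    ORDER BY o.type_desc, s.name, o.name;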

As well as information about the database objects, their structure and interrelationships, you also have access to information about the dynamic behaviour of the database, by using dynamic management views. This makes it far easier to do a range of checks. Once a database is built, we can, for example, check for missing indexes, unused indexes, dependency problems, missing constraints, and slow-running queries.
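
As one sketch of the kind of check this makes possible, SQL Server's missing-index DMVs record the index suggestions the optimizer has made since the last restart (the figures are cumulative and the suggestions need human judgement, so treat them as a prompt rather than a prescription):

    -- Index suggestions recorded by the optimizer since the last restart
    SELECT d.statement AS table_name,
           d.equality_columns,
           d.inequality_columns,
           d.included_columns,
           gs.user_seeks,
           gs.avg_user_impact
    FROM sys.dm_db_missing_index_details AS d
    JOIN sys.dm_db_missing_index_groups AS g
      ON g.index_handle = d.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS gs
      ON gs.group_handle = g.index_group_handle
    ORDER BY gs.user_seeks * gs.avg_user_impact DESC;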

In short, the combination of static and dynamic analysis gives you the opportunity for some very wide-ranging analysis.

Tools for Code Analysis

We’ve described several purposes for using code analysis tools and mentioned that there are two different technical approaches. It is inevitable that no single tool can cover all these different requirements. You would use an entirely different type of tool for each of the various purposes, and in some cases there is little available in the way of tooling to fill the requirement.

Tools for Static Code Analysis

Static code analysis works by parsing the code to do a ‘lint’ analysis on it, and the tools to do it come in various guises.

Code Analysis within the Development IDE

By making it easy for developers to run their own checks, you are much more likely to prevent bad code getting into the build. The time-pressures that developers live with mean that any code analysis tool must be well-integrated with the workflow and user-interface of the IDE. This would allow them to check the code of the whole database, just the code within a schema, or just the objects on which they are working.

Visual Studio with SQL Server Data Tools can detect a limited set of code issues, design issues, naming issues and performance issues. It also allows you to create new rules.

For both Visual Studio and SSMS, there are some third-party tools that you can plug in, such as SQL Enlight and SQL Code Guard.

For SSMS only, there is Devart’s T-SQL Code Analyzer tool; for Eclipse, there is SonarQube, though its T-SQL plug-in is rather rudimentary.

Database code analysis in its own IDE

SQL Cop has its own IDE and provides a range of useful reports. DB Best’s T-SQL Analyzer uses its own GUI and attaches to the live database to analyse stored procedures and user-defined functions in the database.

Database code analysis in a database monitoring tool

SQL Monitor is able to run SQL Code Guard checks on code that is showing problems on a live database.

Database code analysis in SQL Server

Policy-Based Management (PBM) is able to run a series of checks across a whole collection of servers, and is a good choice for checking and correcting security policies. It has a very comprehensive GUI for creating rules. Although this tool can be used to enforce coding standards, it requires users to add a lot of custom checks to make it comprehensive, and it has no way of analysing code via a parser.

Automated code analysis

SQL Code Guard can be used at the command line or with MSBuild, and its reports can be read by any application that can import an XML report. SQL Enlight can also be used as a command-line tool. SQL Cop has teamed up with SQL Test, so that SQL Cop tests can be incorporated directly into the SQL Test suite. Manduka is a command-line tool that runs only a very limited number of T-SQL tests. It is also possible to use DacFx to check the code in a DacPac; its built-in parser makes this easier.

Tools for dynamic database analysis

Dynamic code analysis uses the metadata of the live database and there are a few tools that can do it.

sp_Blitz is an ops-oriented tool for giving you a comprehensive report of what needs to be put right in a database. It is maintained by Brent Ozar, with a lot of community contributions, as part of the First Responder Kit, and is designed for DBAs who have inherited the maintenance of a SQL Server production database. Sp_BlitzIndex detects duplicate indexes, unused indexes and heaps; sp_BlitzCache displays your worst-performing queries; and sp_BlitzFirst gathers the most important evidence required to fix slow-running queries.
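
Typical usage looks something like this (the parameter names are those documented in the First Responder Kit at the time of writing, and ‘MyDatabase’ is a placeholder; check the version you install):

    EXEC dbo.sp_Blitz;                                    -- overall health and configuration check
    EXEC dbo.sp_BlitzIndex @DatabaseName = N'MyDatabase'; -- duplicate and unused indexes, and heaps
    EXEC dbo.sp_BlitzCache @SortOrder = 'cpu';            -- worst-performing queries in the plan cache
    EXEC dbo.sp_BlitzFirst;                               -- what is hurting the server right now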

A more specialist tool is Kimberly L. Tripp’s excellent sp_SQLskills_helpindex which will help diagnose all manner of index problems.

Glenn Berry is well-known for his SQL Server Diagnostic Information Queries, which are indispensable for operational work and give you a great deal of information about the server, as well as useful information about database settings.

Phil Factor has contributed his ‘table smells’ code that detects common problems with tables.
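
This is not Phil Factor's code, but a single example of the sort of ‘table smell’ such a script looks for: tables with no primary key, or stored as heaps with no clustered index:

    -- Tables without a primary key, or without a clustered index
    SELECT s.name + N'.' + t.name AS table_name,
           OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') AS has_primary_key,
           OBJECTPROPERTY(t.object_id, 'TableHasClustIndex') AS has_clustered_index
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0
       OR OBJECTPROPERTY(t.object_id, 'TableHasClustIndex') = 0
    ORDER BY table_name;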

Problem areas in database code

The sorts of checks that need to be made on a database before it goes into production are likely to include the following.

  • Problems with database design
  • Problems with the details of table design
  • Problems with datatypes
  • Problems with expressions
  • Problems with query syntax, including deprecated syntax
  • Problems with identifiers
  • Problems with routines
  • Security loopholes and vulnerabilities
  • Missing or duplicated indexes, or wrong type of index
  • Problems with maintenance tasks
  • Unusual server configuration requirements
  • Reliability issues in database config
  • Performance-related configuration issues
  • Issues with agent tasks
  • Issues with high availability and resilience

This is a vague list, but I hope it gives an idea of the range of checks that are possible.

Conclusions

Nobody would claim that database code analysis is essential: one can get by without it; but once you have it, it soon pays dividends. In my experience, I’ve benefitted most from having it in the IDE, as part of SSMS in my case, ready to check for problems with the development database. I’ve picked up all sorts of issues, from a team member putting in some cut-and-paste code with obsolescent syntax, to sections of code I’d commented as ‘todo’ or ‘undone’ but had forgotten about. I’ve found it very useful for timely reminders about deprecation too. SQL code analysis as a personal database tool is the place to start, I reckon. I have always supplemented this with a range of dynamic tests, written in SQL, to ferret out the more obscure ‘smells’ in code. It is, for example, great for winkling out those ‘uber-functions’ that try to do too much, simply by listing out the functions or other routines with the most dependencies.
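
A crude sketch of that last test, counting the dependencies recorded for each routine (the threshold of 20 is entirely arbitrary, there just to illustrate the idea):

    -- Routines with the most recorded dependencies: candidates for trying to do too much
    SELECT OBJECT_SCHEMA_NAME(d.referencing_id) + N'.'
           + OBJECT_NAME(d.referencing_id) AS routine_name,
           COUNT(*)                        AS dependency_count
    FROM sys.sql_expression_dependencies AS d
    JOIN sys.objects AS o ON o.object_id = d.referencing_id
    WHERE o.type IN ('FN', 'IF', 'TF', 'P')  -- scalar, inline and table functions, and procedures
    GROUP BY d.referencing_id
    HAVING COUNT(*) > 20
    ORDER BY dependency_count DESC;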

I’m not sure about using code analysis for the purpose of enforcing coding standards, mainly because best practices in SQL are far less black-and-white than in procedural code. Naming conventions certainly benefit from scrutiny, but some of the best practices I’ve seen in coding policies are ridiculous.

Where it comes to analysing technical debt via SQL code analysis, we seem to be in the dark ages. I’ve seen measures of code complexity for SQL routines that seem to work, but we need more thought about how to generate a measure that conforms with what database people consider to be technical debt. We need to understand what the equivalents of coupling and cohesion really are, how we assess readability, and how significant cyclomatic complexity is for SQL code. However, the number of warnings flagged up at the time of the build is a reasonably good measure, and is surely better than nothing.

I definitely like to have a simple way to check that code will compile without errors before the automated build process runs, because it is much simpler to fix things at that stage.
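
One cheap way to do that in SQL Server, assuming the code is to hand as a batch, is to ask the server to compile it without running it; SET NOEXEC checks object names as well as syntax, while SET PARSEONLY checks syntax alone (dbo.Customer here is just a placeholder):

    -- Compile the batch, including name resolution, without executing it
    SET NOEXEC ON;
    SELECT CustomerID, CustomerName FROM dbo.Customer WHERE CustomerID = 1;
    SET NOEXEC OFF;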

The biggest problem with SQL code analysis is in determining and agreeing on all the dubious practices and problem code: deciding the best naming conventions, and getting a consensus on what constitutes a SQL code smell, or SQL anti-pattern. The actual detection process just isn’t the hardest part.