Stairway to Exploring Database Metadata

Idempotent DDL Scripts That Always Achieve The Same Result - Making Changes Only Once: Stairway to Exploring Database Metadata Level 6

,

Idempotence is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the first time they are run. When you are making changes to a database to deploy a new version, you need to be sure that certain changes aren’t made twice, or in the wrong order. If you neglect to do this, you will generally get an error, but the consequences aren’t entirely predictable, and a chain of events can have an unhappy ending. You want to make sure your scripts are idempotent.

There are a number of motives for wanting to CREATE an object such as a stored procedure only if it doesn’t exist, and to ALTER it if it does exist. Generally, it means that you can make changes to a routine whether or not it already exists in that database or not. It is very handy for doing maintenance work that requires a whole lot of temporary routines. It is wonderful for doing integration tests that require build-up and tear-down routines that need to work in all circumstances.

In this stairway, we’ll explore various tasks that require this sort of idempotence, using the metadata.

Why Bother?

If you log every build script in a database table if it is run successfully, you could surely check to make sure that no file is ever run twice?

Firstly, if you relied on that technique, each script would need to be entirely transactional, meaning that it either executes entirely and successfully, or leaves no trace at all. This is good practice, but are all your build scripts guaranteed to be transactional? Secondly, you would have no quick remedy for the loss of particular database objects within an existing build. You would be prevented from running an essential script because you’ve run it before.

I maintain a version number for my database. I’m told that this is enough, and if each script takes me from one version increment to the next, then it can’t be run twice or in the wrong order.

Reality can take a vengeance on you for being so confident. In a perfect world this would work, but takes no account of work you need to do in a hurry, in antisocial hours, when things go wrong. Most DBAs will confess that they once left a query window open on a production server, came back to it a while later, and forgot to check what server they’d logged into before executing a DROP statement.

Even if this fixes the problems of build batches, it doesn’t help with integration tests, scalability tests, or any other of the essential development and deployment tests. You are doing these tests, right?

Create and alter statements

Everyone needs to occasionally check whether an object exists. Most frequently it is when you are working on a definition of a module. A module, also referred to as a routine, is any SQL Server object in which is defined by T-SQL code. This means such things as stored procedures, functions or views, but will also include such things as triggers, constraints and rules. You can see for yourself the range of such objects that are in your database.

/* find out all the types of modules defined in the database */SELECT DISTINCT type_desc
  FROM sys.sql_modules m
    INNER JOIN sys.objects o
    ON m.object_ID=o.object_ID

The SQL DDL of Modules are stored by their CREATE statements. More generally, you will want to ALTER these modules, so you probably store the individual modules as ALTER statements in your version control systems (VCS). The problem with DDL source code that you are working on actively in development is that sometimes a version exists that you want to over-write, and sometimes it doesn’t, so sometimes you need an ALTER statement and sometimes a CREATE statement. So what does everyone do? They put in a short batch before the CREATE statement to DROP (delete/kill) any previous version if it exists. SQL Server 2016 actually provides a DROP IF EXISTS operator to make this easier to do. This action is actually a bad idea because it wipes out any existing permissions and extended properties. However, we’ll deal with that issue later on in the Stairway because there are other more pressing problems.

Here is a simple example of a scalar function that tells you the ideal length of a dimension (e.g. width) for a box given an associated smaller dimension (height). Prior to the function definition we check to see if the function already exists, and drop it if it does.

if (object_ID('dbo.DeleteMePlease') is not null)
   DROP FUNCTION dbo.DeleteMePlease
GO
CREATE FUNCTION dbo.DeleteMePlease(@MyParameter int)
RETURNS int
WITH EXECUTE AS CALLER
AS BEGIN RETURN(@MyParameter * 1.6180339887) END
GO

We can work on this to improve it, but it is reasonably functional. It will work whether the function is there or not.

We’ve used the Metadata function object_id(). This is the recommended way of checking for existence, but what if dbo.DeleteMePlease exists already but is a table? You’ll get two errors, but nothing got dropped and nothing created so no damage was done.

Msg 3705, Level 16, State 1, Line 2
Cannot use DROP FUNCTION with 'dbo.DeleteMePlease' because 'dbo.DeleteMePlease' is a table. Use DROP TABLE.
Msg 2714, Level 16, State 3, Procedure DeleteMePlease, Line 5
There is already an object named 'DeleteMePlease' in the database.

This is because you are saying, in that first ‘guard’ batch, that if any object called dbo.DeleteMePlease exists then drop a function called dbo.DeleteMePlease. You are assuming that it is a function. Of course, if there is a table called dbo.DeleteMePlease in the database you won’t be able to create the function, so there will still be an error. It is safe but bewildering.

What if you can’t have an error? You first need to check for the existence of a function dbo.DeleteMePlease rather than just any object.

This will work but it is a bit clunky:

if exists (select 1 from sys.objects where type_desc like '%FUNCTION' and object_ID=object_ID('dbo.DeleteMePlease'))
   DROP FUNCTION dbo.DeleteMePlease

…there is a metadata function that is neater for a scalar function…

if ( OBJECTPROPERTYEX ( object_ID('dbo.DeleteMePlease'), 'IsScalarFunction')=1)
   DROP FUNCTION dbo.DeleteMePlease

…but for some weird reason the ‘IsFunction’ value for the ObjectProperty isn’t valid. The ‘IsExecuted’ value gets you views and procedures as well. This means that you’d be best to do this:

IF EXISTS 
  (SELECT * FROM sys.objects WHERE  object_id = OBJECT_ID(N'dbo.DeleteMePlease')
                  AND type IN ( 'FN','FT', 'IF', 'TF','FS' ))
  DROP FUNCTION dbo.DeleteMePlease

SQL Server doesn’t allow two objects with the same name in the same schema, so if you’ve successfully dropped the function, then you’re safe to recreate it. If there was no function called dbo.DeleteMePlease but there was a table, then you just get the error:

Msg 2714, Level 16, State 3, Procedure DeleteMePlease, Line 4
There is already an object named 'DeleteMePlease' in the database.

The statement found that there wasn’t a function called dbo.DeleteMePlease, it went on to the next batch after the GO and tried to CREATE a function when there is already a table of the same name. However, no harm was done.

If there is a clash of names, there is no way of sorting it out automatically. It must be investigated. This means raising an error.

In this example, we first see if any object exists with the specified name. Then we check whether it is a function. If so we are safe to drop it. Otherwise, we set the connection to parse but not execute all following code. In working code you’ll want to roll back and maybe set conditional logic based on the @@Error variable, but this is an example. Otherwise we can execute the following batch and remember to finish with SET NOEXEC OFF.

--check whether dbo.DeleteMePlease exists as an object in the dbo schema
IF  (object_id('dbo.DeleteMePlease') IS NOT NULL) 
 BEGIN --check whether dbo.DeleteMePlease exists as a scalar function
 IF ( objectpropertyex ( object_id('dbo.DeleteMePlease'), 'IsScalarFunction')=0)
   BEGIN --do whatever is necessary. You have to find out why
   RAISERROR ('dbo.DeleteMePlease already exists and it isn''t a scalar function',16,1)
   SET NOEXEC ON --prevent futher execution on this connection 
   END
 ELSE DROP FUNCTION dbo.DeleteMePlease --safe to drop it because it exists as a function
 END
GO -- so now we can create our new better version of the function
CREATE FUNCTION dbo.DeleteMePlease(@MyParameter INT)
  RETURNS INT
  WITH EXECUTE AS CALLER
  AS BEGIN RETURN (@MyParameter * 1.6180339887) END
GO
SET NOEXEC OFF --always remember to do this. It can be perplexing otherwise

This NOEXEC trick is odd isn’t it? Why not set a variable if we drop the function? The trouble is that CREATE statements must be at the start of a batch and SQL Server local variables have a scope of only a single batch Why not use a connection-based ‘global’ user variable? Simple, they don’t exist in SQL Server. You can use the rather awkward system function Context_Info for session-based info, but SQL Server 2016 offers a new system function SESSION_CONTEXT() to do this rather more easily. In the meantime, the NOEXEC trick is fine.

Duplicate names for objects

Let’s just take a step back. We’ve just been having to deal with duplicate names. I’ve already mentioned that SQL Server allows duplicate names for objects, which means that you either have to reference object names with their schema, or use the object_ID of the object.

The way of getting an object_ID from a two-part (<schema><object>) name is to use the metadata function.

    object_ID('<schema>.<object>'))

To get the two-part name that SQL uses to describe an object, you’ll need two functions, as shown:

    object_schema_name(<the object_ID> )+'.'+object_name(<the object_ID> )

So a round trip might look like the following:

DECLARE @MyObject int
SELECT @MyObject = object_ID('dbo.DeleteMePlease')
SELECT object_schema_name(@MyObject )+'.'+object_name(@MyObject)

It is important to emphasise that using the name alone is a bad idea. For that reason, we’ll wrap up by demonstrating how it can happen. Please use a practice database for this!

/* this simply demonstrates that you can have duplicate names for
the same object type in SQL Server */CREATE TABLE dbo.DeleteMePlease 
  (MyKey int IDENTITY); --create a dummy table in DBO
GO
CREATE SCHEMA silly;
CREATE TABLE silly.DeleteMePlease 
  (MyKey int IDENTITY);--create another dummy table in silly
GO
CREATE SCHEMA sillier;
CREATE TABLE sillier.DeleteMePlease 
  (MyKey int IDENTITY);--create another dummy table in sillier
GO
--now demonstrate the fact that using a name alone is bad news
SELECT object_schema_name(object_ID)+'.'
       + name+' ('+type_desc collate database_default +')'
  FROM sys.objects 
  WHERE name LIKE 'deleteMePlease';
/* --------- which gives...
dbo.DeleteMePlease (USER_TABLE)
silly.DeleteMePlease (USER_TABLE)
sillier.DeleteMePlease (USER_TABLE)
------*/--now we clean up
DROP TABLE dbo.DeleteMePlease; 
DROP TABLE silly.DeleteMePlease;
DROP TABLE sillier.DeleteMePlease;
DROP SCHEMA silly;
DROP SCHEMA sillier;

There are some good reasons for allowing duplicate names in different schemas, thereby creating namespaces, but you would never want to do it for general work because of the risk of getting it wrong.

A few guard clauses.

Now that we’ve established the principle of writing idempotent code by whether something needs to be done before doing it, or alternatively checking whether it has already been done, we can extend the check to all sorts of different conditions, using the catalog views.

Adding a table column

This code will only execute the code between the BEGIN and END if the specified column in the named table actually doesn’t exist.

IF NOT EXISTS(  
  SELECT * FROM sys.columns
  WHERE object_ID=object_id('HumanResources.Employee')
   AND col_name(object_ID,column_Id)='BusinessEntityID'
)
  BEGIN
--it doesn’t exist so do something
  END;

Altering the datatype of a table column

This code will prevent a change to the datatype being performed more than once, with the risk to the data, such as losing or truncating it, depending on the original and final datatypes.

IF NOT EXISTS(  
  SELECT name, TYPE_NAME(user_type_id) FROM sys.columns
  WHERE object_ID=object_id('HumanResources.Employee')
   AND name='Jobtitle'
   AND TYPE_NAME(user_type_id)='nvarchar'
 )
  BEGIN
--it isn't an NVARCHAR so do something
  END;

Creating an index only if it doesn’t exist

This code checks first to see whether an index exists and will only execute the BEGIN … END block if the index does not exist.

IF NOT EXISTS( 
  SELECT * FROM sys.indexes i
    WHERE i.object_ID=object_id('HumanResources.Employee')
      AND name ='AK_Employee_LoginID'
)
  BEGIN
--then the index doesn’t exist
  END;

ALTERing an index if it does exist and creating it first if it doesn’t

This code alters an index. If the index doesn’t exist, then it first creates it

--ALTERing an index if it does exist but creating it first if it doesn’t
IF EXISTS( 
  SELECT * FROM sys.indexes i
    WHERE i.object_ID=object_id('HumanResources.Employee')
      AND name ='AK_Employee_LoginID'
) SET NOEXEC ON;
GO
CREATE UNIQUE NONCLUSTERED INDEX [AK_Employee_LoginID] ON [HumanResources].[Employee]
(
[LoginID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
       IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, 
   ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];
GO
EXEC sys.sp_addextendedproperty @name=N'MS_Description', 
     @value=N'Unique nonclustered index.' , @level0type=N'SCHEMA',
 @level0name=N'HumanResources', @level1type=N'TABLE',@level1name=N'Employee',
 @level2type=N'INDEX',@level2name=N'AK_Employee_LoginID'
GO
SET NOEXEC OFF;
ALTER INDEX AK_Employee_LoginID ON HumanResources.Employee
REBUILD ;

Creating an index only if nothing similar exists

Here we create a particular index. Instead of guarding against a repeated execution by testing to see whether an index of the same name exists, we are checking whether an index already exists which Contains the same columns in the same order. The guard query has a string containing the list of columns being indexed. (INCLUDEd columns are ignored) and if there is already an index that uses the same list in the same order, it doesn’t create the new index.

/* See if a similar index already exists and only create it if no similar
one exists (be careful with the comma delimiter, it must have a space after it!)
Note that one can have different ideas of what constitutes a similar index.
You can have a stricter or morelax definition.
*/  
IF NOT EXISTS
  (SELECT * FROM
(SELECT 
  coalesce(stuff(
(SELECT
   ', ' + col_name(Ic.Object_Id, Ic.Column_Id)
   + CASE WHEN Is_Descending_key <> 0 THEN ' DESC' ELSE '' END
 FROM Sys.Index_Columns Ic
 WHERE Ic.Index_Id = indexes.Index_Id 
   AND Ic.Object_Id = indexes.Object_Id
   AND is_included_column=0
ORDER BY key_Ordinal
FOR XML PATH (''), TYPE).value('.', 'varchar(max)'),1,2,''), '?') AS columnList
FROM sys.indexes [indexes]
WHERE type_desc NOT IN ('heap','xml')
AND indexes.object_ID=object_id('HumanResources.Employee')
)f
  WHERE columnList LIKE 'OrganizationLevel, OrganizationNode'
  )
CREATE UNIQUE INDEX AK_NewIndex 
    ON HumanResources.Employee(OrganizationLevel, OrganizationNode)

Checking whether a column has any sort of constraint

Sometimes you need to check whether there is any type of constraint on a column before doing an action to alter a column. Here, the information schema has some advantages. You can, if you need to, even get the name of the constraint, but not the type. It performs the check easily:

IF EXISTS
  (
  SELECT *
    FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
    WHERE CONSTRAINT_COLUMN_USAGE.TABLE_SCHEMA = 'HumanResources'
      AND CONSTRAINT_COLUMN_USAGE.TABLE_NAME = 'Employee'
      AND CONSTRAINT_COLUMN_USAGE.COLUMN_NAME = 'BusinessEntityID'
)
  BEGIN
  --do something 
   END;

The method of determining whether a constraint of any type exists on a column, via the catalog views is more tortuous. I’ve enhanced the query so that you can use it to tell you the type of constraint, and you can opt to have a NOT NULL constraint in the list as well (comment it out if you don’t want that check.)

IF EXISTS( 
SELECT ConstraintType FROM --check, default, unique, primary key, foreign key, Not NULL
  (SELECT object_ID,parent_column_ID,'default' FROM sys.default_constraints
 UNION ALL
   SELECT parent_object_ID,parent_column_ID,'foreign' FROM sys.foreign_key_columns
     UNION ALL
 SELECT object_ID, column_ID, 'NOT NULL'FROM sys.columns WHERE is_nullable=0
 UNION ALL
   SELECT i.object_ID,ic.column_ID, 
     CASE WHEN is_primary_key=1 THEN 'Primary Key' ELSE 'Unique'END
     FROM sys.indexes i
    INNER JOIN sys.index_columns ic
      ON i.object_ID=ic.object_ID AND i.index_id=ic.index_ID
   WHERE is_primary_key=1 OR is_unique_constraint=1
      UNION ALL  
   SELECT object_ID,parent_column_ID, 'Check' FROM sys.check_constraints
      )allConstraints(object_ID,Column_ID, ConstraintType)
   WHERE object_ID=object_id('HumanResources.Employee')
   AND col_name(object_ID,column_ID)='BusinessEntityID'
 
)
  BEGIN
--do something
  END;

More often, you will want to test for a primary key before adding it, or checking whether a column has already been made NOT NULL before making it so. If so, then you just need to use that part of the above query to test it.

Here, for example, is a query to test whether the column is already NOT NULLable:

IF EXISTS
  (
  SELECT *
    FROM sys.columns ic
    WHERE ic.is_nullable = 0
      AND i.object_id = OBJECT_ID('HumanResources.Employee')
      AND COL_NAME(ic.object_id, ic.column_id) = 'BusinessEntityID'
)
  BEGIN
  --do something
  SELECT 1;
  END;

Checking whether a column participates in a primary key etc.

Basically, this guard query checks whether the column that you specify already participates in a primary key, but you can alter it to detect whether it participates in an index with any combination of other attributes such as whether it is unique, a unique constraint, is disabled, clustered etc.

IF EXISTS(  
  SELECT 1 FROM sys.indexes i
    INNER JOIN sys.index_columns ic
      ON i.object_ID=ic.object_ID AND i.index_id=ic.index_ID
  WHERE is_primary_key=1 
   AND i.object_ID=object_id('HumanResources.Employee')
   AND col_name(ic.object_ID,ic.column_Id)='BusinessEntityID'
)
  BEGIN
--do something
  END;

Alternatively you can do other checks for other index attributes such as checking whether the type_desc column of the sys.indexes table has the values 'CLUSTERED', 'NONCLUSTERED' or 'HEAP':

You can also check the attributes of the columns of the sys.columns table as follows:

  • is_unique =1 --or 0
  • ignore_dup_key =1 --or 0
  • is_unique_constraint=1 --or 0
  • is_padded =1--or 0
  • is_disabled =1--or 0
  • is_hypothetical =1 --or 0
  • has_filter =1--or 0

Summary

There are many occasions when doing database development work that you will need to check whether a database object or property is actually there before executing SQL code. Many DDL scripts, such as making a column NOT NULL must only be run once and a guard condition prevents it being executed a second time and thereby creating an error. This type of guard condition checks to see if the action has already been taken. Other types of guard clauses will delete an object if it exists before creating it, or may create a ‘stub’ before an ALTER statement is executed, to prevent an error if an attempt is made to alter an object that doesn’t yet exist. There is a similar range of techniques that can be used to make sure that mistakes in applying DDL code have no serious consequences.

This article is part of the parent stairway Stairway to Exploring Database Metadata

Rate

5 (6)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (6)

You rated this post out of 5. Change rating