For most DBAs, normalization is an understood concept, a bread and butter bit of knowledge. However, it is not at all unusual to review a database design by a development group for an OLTP (OnLine Transaction Processing) environment and find that the schema chosen is anything but properly normalized. This article by Brian Kelley will give you the core knowledge to data model.
I frequently use
sys.sql_modules to search for substrings in T-SQL code modules (views, stored procedures, functions, triggers, etc.). While I know I can use “View Dependencies” to identify references to tables, views, and stored procedures, I generally prefer to use
sys.sql_modules because it lets me search for arbitrary strings.
Here’s an example of querying for the T-SQL behind
ufnGetCustomerInformation in AdventureWorksLT (the sample database available for Azure SQL Database):
SELECT O.[name] AS ObjName, SM.[definition] FROM sys.sql_modules AS SM JOIN sys.objects AS O ON SM.[object_id] = O.[object_id] WHERE O.[name] = 'ufnGetCustomerInformation';
Before running this query, I recommend configuring the Results|Grid page of the Query Options for SSMS as follows. This can be done either on a per-query basis or in the SSMS Preferences. If you don’t see all of these options, consider upgrading to SSMS 2017.
- Check “Retain CR/LF on copy or save”
- Under “Maximum Characters Retrieved”, set “Non XML data” to “65535”
- Under “Maximum Characters Retrieved”, set “XML data” to “Unlimited”
The output from this query includes the
ufnGetCustomerInformation. Copy the
definition into a new query window and observe that it uses the
CustomerID column. If we wanted to identify all T-SQL objects in the database that use
CustomerID, we might use the following query:
SELECT S.[name] + N'.' + O.[name] AS ObjName, SM.[definition] FROM sys.sql_modules AS SM JOIN sys.objects AS O ON SM.[object_id] = O.[object_id] JOIN sys.schemas AS S ON O.[schema_id] = S.[schema_id] WHERE SM.[definition] LIKE '%CustomerID%';
This lists two additional T-SQL objects,
SalesLT.vGetAllCategories. Copy the
definition for dbo.ufnGetAllCategories into a new query window and observe something interesting:
CREATE FUNCTION [dbo].[ufnGetAllCategories]() RETURNS @retCategoryInformation TABLE ( -- Columns returned by the function [ParentProductCategoryName] nvarchar(50) NULL, [ProductCategoryName] nvarchar(50) NOT NULL, [ProductCategoryID] int NOT NULL ) AS -- Returns the CustomerID, first name, and last name for the specified customer. BEGIN WITH CategoryCTE([ParentProductCategoryID], [ProductCategoryID], [Name]) AS ( SELECT [ParentProductCategoryID], [ProductCategoryID], [Name] FROM SalesLT.ProductCategory WHERE ParentProductCategoryID IS NULL ....
The actual T-SQL code contains no references to
CustomerID – the only occurrence is in a incorrect comment that likely resulted from poor copy-and-paste coding! The same copy-and-paste comment mistake shows up in SalesLT.vGetAllCategories. Unfortunately, there is no easy way to match against
sys.sql_modules.definition that will avoid matching within the comments, so it would be nice to remove the comments from the T-SQL before matching the pattern. Web searches identified a number of attempts by several DBAs and developers to programatically remove comments, but none of them seemed to meet all my requirements. My requirements, starting with the most important, were:
Correctness. The code must properly ignore comment-like sequences in identifiers and string literals, handle corner cases such as
/* */and vice-versa, etc.
- No procedural code, no functions, no dependence on .NET, etc. I wanted a single T-SQL query that could run on its own.
- Understandable code. The above requirements (especially the correctness) meant this wouldn’t be a simple query, but it should be cleanly laid out.
- Reasonable performance. Because this is used for investigating databases and not production execution, performance is a benefit, but not required.
I knew that, given the limitations of T-SQL, avoiding procedural code meant using recursive Common Table Expressions to iterate through the T-SQL code and match elements. I also knew that in order to address the requirement for correctness, I would need to build a comprehensive test suite that covered a whole host of T-SQL syntax edge and corner cases. In additional articles, I will go into more detail about the T-SQL syntax requirements as well as the implementation of the query and the test suite. This first article simply covers using the query.
To use the query, download
SQL_Search_Decommenter.sql and run it in the desired database. The query will return all objects with T-SQL code (views, stored procedures, functions, triggers, etc.) in the database. The results can be filtered by specifying patterns in the
@SQLPat variables. I specify these with variables at the top to simplify locating and modifying them. The
@DecommentedOnly variable can be changed to
1 to return only objects whose decommented SQL matches
@SQLPat.(by default, the query returns all objects that match, along with the column
DecommentedHasMatch indicating whether the decommented code matches) Finally, the
@MaxIter variable can be used to control how many iterations of the recursive Common Table Expression will be permitted as part of the decommenting process.
The output columns are as follows:
ObjName: 2-part name for the object
0indicates no match
Definition: Original code from
NULLif length of original code is greater than 65,535 characters (SSMS won’t retrieve more than 65,535 characters from non-XML character datatypes).
Decommented: Code from
sys.sql_moduleswith comments stripped.
/*...*/is replaced with a space.
--...\nis replaced is with
NULLif length of original code is greater than 65,35 characters.
Definition_in_XML: If length of original code is greater than 65,535 characters, this contains the code embedded in an XML comment. Click on the XML, then copy the contents out of the XML tag and paste into an Query window. While I agree that a code module over 64 KB is a bit extreme, I have seen vendors ship triggers larger than 64K, as well as stored procedures that are multiple megabytes!
Decommented_in_XML: If length of original code is greater than 65,535 characters, this contains the decommented code embedded in an XML comment. Click on the XML, then copy the contents out of the XML tag and paste into an Query window.
Iter: Number of iterations required to decomment the code.
Using this query, we could proceed to search
AdventureWorksLT for references to
CustomerID that aren’t in the comments. Update the
@SQLPat variable at the top of the query as follows and then run the query:
@SQLPat nvarchar(1000) = ‘%CustomerID%’,
You should get three rows back, but only one of them will have
1 in the
You should be able to inspect the
Decommented columns to observe that the query effectively strips comments from the original
Definition and returns the decommented T-SQL in the
Decommented column. If you didn’t want to see the objects that had
CustomerID only in the comments, you could set the
@DecommentedOnly variable at the top to
1 and the query would filter out those rows.
In the next article, I’ll discuss the T-SQL comment behaviors that the code must support, how those were encoded to support a data-driven query design, and then demonstrate how the recursive query iterates through a sample piece of T-SQL to strip comments.