T-SQL Decommenter Part I


I frequently use sys.sql_modules to search for substrings in T-SQL code modules (views, stored procedures, functions, triggers, etc.).  While I know I can use "View Dependencies" to identify references to tables, views, and stored procedures, I generally prefer to use sys.sql_modules because it lets me search for arbitrary strings.

Here's an example of querying for the T-SQL behind ufnGetCustomerInformation in AdventureWorksLT (the sample database available for Azure SQL Database):

SELECT	O.[name] AS ObjName,
FROM sys.sql_modules AS SM
	JOIN sys.objects AS O
		ON SM.[object_id] = O.[object_id]
WHERE O.[name] = 'ufnGetCustomerInformation';

Before running this query, I recommend configuring the Results|Grid page of the Query Options for SSMS as follows.  This can be done either on a per-query basis or in the SSMS Preferences.  If you don't see all of these options, consider upgrading to SSMS 2017.

  • Check "Retain CR/LF on copy or save"
  • Under "Maximum Characters Retrieved", set "Non XML data" to "65535"
  • Under "Maximum Characters Retrieved", set "XML data" to "Unlimited"

The output from this query includes the definition for ufnGetCustomerInformation.  Copy the definition into a new query window and observe that it uses the CustomerID column.  If we wanted to identify all T-SQL objects in the database that use CustomerID, we might use the following query:

SELECT	S.[name] + N'.' + O.[name] AS ObjName,
FROM sys.sql_modules AS SM
	JOIN sys.objects AS O
		ON SM.[object_id] = O.[object_id]
	JOIN sys.schemas AS S
		ON O.[schema_id] = S.[schema_id]
WHERE SM.[definition] LIKE '%CustomerID%';

This lists two additional T-SQL objects, dbo.ufnGetAllCategories and SalesLT.vGetAllCategories.  Copy the definition for dbo.ufnGetAllCategories into a new query window and observe something interesting:

CREATE FUNCTION [dbo].[ufnGetAllCategories]()
RETURNS @retCategoryInformation TABLE
    -- Columns returned by the function
    [ParentProductCategoryName] nvarchar(50) NULL,
    [ProductCategoryName] nvarchar(50) NOT NULL,
    [ProductCategoryID] int NOT NULL
-- Returns the CustomerID, first name, and last name for the specified customer.
    WITH CategoryCTE([ParentProductCategoryID], [ProductCategoryID], [Name]) AS
        SELECT [ParentProductCategoryID], [ProductCategoryID], [Name]
        FROM SalesLT.ProductCategory
        WHERE ParentProductCategoryID IS NULL

The actual T-SQL code contains no references to CustomerID - the only occurrence is in a incorrect comment that likely resulted from poor copy-and-paste coding!  The same copy-and-paste comment mistake shows up in SalesLT.vGetAllCategories. Unfortunately, there is no easy way to match against sys.sql_modules.definition that will avoid matching within the comments, so it would be nice to remove the comments from the T-SQL before matching the pattern.  Web searches identified a number of attempts by several DBAs and developers to programatically remove comments, but none of them seemed to meet all my requirements. My requirements, starting with the most important, were:

  • Correctness.  The code must properly ignore comment-like sequences in identifiers and string literals, handle corner cases such as -- embedded in /* */ and vice-versa, etc.
  • No procedural code, no functions, no dependence on .NET, etc.  I wanted a single T-SQL query that could run on its own.
  • Understandable code.  The above requirements (especially the correctness) meant this wouldn't be a simple query, but it should be cleanly laid out.
  • Reasonable performance.  Because this is used for investigating databases and not production execution, performance is a benefit, but not required.

I knew that, given the limitations of T-SQL, avoiding procedural code meant using recursive Common Table Expressions to iterate through the T-SQL code and match elements.  I also knew that in order to address the requirement for correctness, I would need to build a comprehensive test suite that covered a whole host of T-SQL syntax edge and corner cases.  In additional articles, I will go into more detail about the T-SQL syntax requirements as well as the implementation of the query and the test suite.  This first article simply covers using the query.

To use the query, download SQL_Search_Decommenter.sql and run it in the desired database.  The query will return all objects with T-SQL code (views, stored procedures, functions, triggers, etc.) in the database.  The results can be filtered by specifying patterns in the @NamePat and @SQLPat variables.  I specify these with variables at the top to simplify locating and modifying them.  The @DecommentedOnly variable can be changed to 1 to return only objects whose decommented SQL matches @SQLPat.(by default, the query returns all objects that match, along with the column DecommentedHasMatch indicating whether the decommented code matches)  Finally, the @MaxIter variable can be used to control how many iterations of the recursive Common Table Expression will be permitted as part of the decommenting process.

The output columns are as follows:

  • ObjName: 2-part name for the object
  • DecommentedHasMatch: 1 indicates that @SQLPat matched the Decommented code, 0 indicates no match
  • Definition: Original code from sys.sql_modulesNULL if length of original code is greater than 65,535 characters (SSMS won't retrieve more than 65,535 characters from non-XML character datatypes).
  • Decommented: Code from sys.sql_modules with comments stripped.  /*...*/ is replaced with a space.  --...\n is replaced is with \r\n. NULL if length of original code is greater than 65,35 characters.
  • Definition_in_XML: If length of original code is greater than 65,535 characters, this contains the code embedded in an XML comment.  Click on the XML, then copy the contents out of the XML tag and paste into an Query window.  While I agree that a code module over 64 KB is a bit extreme, I have seen vendors ship triggers larger than 64K, as well as stored procedures that are multiple megabytes!
  • Decommented_in_XML: If length of original code is greater than 65,535 characters, this contains the decommented code embedded in an XML comment.  Click on the XML, then copy the contents out of the XML tag and paste into an Query window.
  • Iter: Number of iterations required to decomment the code.

Using this query, we could proceed to search AdventureWorksLT for references to CustomerID that aren't in the comments.  Update the @SQLPat variable at the top of the query as follows and then run the query:

 @SQLPat nvarchar(1000) = '%CustomerID%',

You should get three rows back, but only one of them will have 1 in the DecommentedHasMatch column:

You should be able to inspect the Definition and Decommented columns to observe that the query effectively strips comments from the original Definition and returns the decommented T-SQL in the Decommented column.  If you didn't want to see the objects that had CustomerID only in the comments, you could set the @DecommentedOnly variable at the top to 1 and the query would filter out those rows.

In the next article, I'll discuss the T-SQL comment behaviors that the code must support, how those were encoded to support a data-driven query design, and then demonstrate how the recursive query iterates through a sample piece of T-SQL to strip comments.



4.44 (9)




4.44 (9)