Building a complicated query like this and ensuring that it has the correct behavior may not be easy, but it is achievable when guided by a comprehensive test suite. When I set out to write this query, I started by developing a test suite and only then started writing code. As I explored the T-SQL syntax, I continued to add test cases to validate edge cases. In general, I prefer to err on the side of having too many tests rather than too few, especially for code where there are wide variety of permutations possible for the input. The full test suite along with the decommenter rigged to run against it is in the attached file Decommenter_Test.sql.
The tests in the suite are defined in the temp table
#tests, which receives two populations of tests. The columns in
ID, a unique identifier for each test
OrigSQL, the SQL to be decommented
ValidSQL, the correctly decommented SQL against which the query is validated
MaxIter, the iteration limit for the test.
#tests contains hundreds of tests for a wide variety of scenarios. Here is a drastically redacted sample of this test population:
INSERT INTO #tests (ID, OrigSQL, ValidSQL, MaxIter) SELECT ID, CAST(OrigSQL AS nvarchar(max)) AS OrigSQL, CAST(ValidSQL AS nvarchar(max)) AS ValidSQL, 100 AS MaxIter FROM ( VALUES (0, NULL, NULL), (1, '', ''), (2, ' ', ' '), (3, 'S', 'S'), (108, ' /*Foo*/', ' '), (134, '/* Foo */'+CHAR(13)+CHAR(10)+ 'SELECT *'+CHAR(13)+CHAR(10)+ '/* Bar */'+CHAR(13)+CHAR(10)+ 'FROM Bar', ' '+CHAR(13)+CHAR(10)+ 'SELECT *'+CHAR(13)+CHAR(10)+ ' '+CHAR(13)+CHAR(10)+ 'FROM Bar' ), (203, '--'+CHAR(13)+CHAR(10)+'-Foo-', CHAR(13)+CHAR(10)+'-Foo-'), (310, '/**/', ' '), (311, '[/**/]', '[/**/]'), (312, '/**/', ' '), (416, '''/*Foo*/', '''/*Foo*/'), (530, '"/**/"""', '"/**/"""'), (950, REPLICATE('/* */ ', 50), REPLICATE(' ', 49) + '/* */ '), (NULL, NULL, NULL) ) AS V(ID, OrigSQL, ValidSQL) WHERE ID IS NOT NULL;
This batch of tests focuses on correctly and completely decommenting the T-SQL, so it uses a
100 for all of the tests. The one exception to completely decommenting the T-SQL is the final test,
950, which is used to validate that
MaxIter is indeed set to
100 and that
MAXRECURSION at the end of the query is not set below
100. Also note that I'm lazy and don't like having to remember to remove the comma on the last test, so the final
(NULL, NULL, NULL) is the record without a trailing comma and is excluded by the
The second batch of tests focuses on how the decommenter degrades when the iteration limit is reached. All of the tests in this batch share the same
OrigSQL, but as
MaxIter drops the
ValidSQL begins to include comments. At
19 and above, all of the comments are removed. As
MaxIter drops to
18 and lower, the decommenter begins to leave comments at the end of the output as it runs out of iterations to remove the comments.
The test harness consists of the decommenter logic hooked up with
#tests as the source for the anchor rows. A fairly comprehensive list of columns is included in the
Decomment recursive CTE to support debugging. At the end of the whole query, all output rows from
Decomment are checked to see if the test passes.
FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Iter DESC) AS RowN FROM Decomment ) AS D CROSS APPLY ( SELECT CASE WHEN D.RowN = 1 AND D.IsFinished = 1 AND D.Iter <= D.MaxIter AND ( CAST(D.NewSQL AS varbinary(max)) = CAST(D.ValidSQL AS varbinary(max)) OR ( D.NewSQL IS NULL AND D.ValidSQL IS NULL ) ) THEN 1 WHEN D.RowN > 1 AND D.IsFinished = 0 AND D.Iter <= D.MaxIter THEN 1 ELSE 0 END AS TestPassed ) AS C1
The first thing is to determine whether a given row is the final iteration. This is handled by using
ROW_NUMBER() on the
Decomment output, partitioned by
ID and ordered on
Iter in descending order. When
1, it is the final iteration for that test.
The logic for determining
TestPassed has two main forks, one for the final iteration for a given test
(RowN = 1) and one for all of the prior iterations.
- For the final iteration, all of the following must be true for the test to pass:
Itermust not exceed
D.ValidSQLmust be identical (including trailing spaces, case, and so forth, checked using a cast to
varbinary(max)), or both must be
- For the prior iterations, the test is easier:
Itermust not exceed
WHERE clause isolates the rows where the test did not pass (
C1.TestPassed = 0). There is a commented out
WHERE clause illustrating debugging. If a test fails, substitute the failing test
ID into the alternative
WHERE clause and toggle which clause is commented out. All of the iterations for the failing test will be displayed, making it easier to understand the issue. This can also be used to investigate how the decommenter iteratively processes various tests.
Decommenter_Test.sql file is the definitive version of the decommenter - it's what I work from whenever I'm working on the decommenter code, because I need to know immediately if a change breaks any tests. However, to be useful, I want a decommenter that runs against
sys.sql_modules. That version is is
SQL_Search_Decommenter.sql, which was developed from
Decommenter_Test.sql. Whenever I update
Decommenter_Test.sql, I compare the two files and then make similar changes in
SQL_Search_Decommenter.sql to bring it into line. There are a number of differences between the two, both to adapt the code to
sys.sql_modules and to improve execution performance.
SQL_Search_Decommenter.sql starts with a
DECLARE block. I use the
DECLARE block to keep the commonly adjusted query parameters in one place at the start of the query, making them easy to set.
DECLARE @NamePat nvarchar(1000) = '%', @SQLPat nvarchar(1000) = '%', @DecommentedOnly bit = 0, @MaxIter int = 32767;
@NamePat: Matches the 2-part name for the object (e.g. to match
v_Foo*, either use
'dbo.v_Foo%', or to be precise
'dbo.v[_]Foo%'). Leave as
'%'to skip filtering by object name.
@SQLPat: Matches the SQL. For instance, to search for
'%tbl_Foo%'(or, to be precise,
'%tbl[_]Foo%'). By default, this will return objects that match in the commented SQL, whether or not the decommented SQL matches, with the
DecommentedHasMatchcolumn indicating whether a match was found in the decommented SQL.
@DecommentedOnly: Switch from
1to return only objects that match in the decommented SQL.
@MaxIter: Controls how many iterations to perform in the recursive query when decommenting. The
Itercolumn in the output indicates how many iterations were required to decomment the code. Setting this to a lower value will improve performance for lengthy objects that have lots of comments, string literals, and delimited (both
[) identifiers, but at the expense of leaving comments in the decommented code once the limit is reached.
Next is the
SourceSQL Common Table Element (CTE), which handles retrieving module definitions and their two-part object names. It specifies name filtering using
@NamePat and filtering of the commented SQL using
@SQLPat. Note that there is a subtle bug here caused by matching the commented SQL prior to decommenting. Under some circumstances,
@SQLPat can match a string that shows up only in the decommented SQL. For instance, the pattern
'%Long_Column_Name AS ShtColNm%' wouldn't match the commented SQL
'SELECT Long_Column_Name/**/AS ShtColNm', but would match it after the decommenter replaces
/**/ with a single space. The query could defer all
@SQLPat filtering until after all of
sys.sql_modules has been decommented, but that would have a substantial negative impact on performance. The query could be further complicated to detect patterns that might suffer from this issue (hint - they all have one or more of the
Replacement strings in them) and only defer matching in that scenario, coupled of course with
OPTION (RECOMPILE). I think I'll live with the subtle bug and just avoid triggering it!
SourceSQL AS ( SELECT O.[object_id] AS ID, C1.ObjName, SM.[definition] AS OrigSQL FROM sys.sql_modules AS SM JOIN sys.objects AS O ON SM.[object_id] = O.[object_id] JOIN sys.schemas AS S ON O.[schema_id] = S.[schema_id] CROSS APPLY ( SELECT S.[name] + N'.' + O.[name] AS ObjName ) AS C1 WHERE C1.ObjName LIKE @NamePat AND SM.[definition] LIKE @SQLPat ),
This is straight forward, except for the use of
CROSS APPLY to avoid redundant calculation of
ObjName. It has negligible performance impact, but I dislike repeating myself in code.
Matchers CTEs are identical to those in
Decommenter_Test.sql. There are some columns left out of the
SELECT clauses of the
Decomment CTE. As explained in the previous article, only the columns up through and including
RemSQL are required for the
Decomment CTE to function. All of the subsequent columns are only included in
Decomment_Test.sql to facilitate debugging, and so they are omitted in
SQL_Search_Decommenter.sql in order to reduce the size of the Worktable and improve performance. Finally, the one reference in the
Decomment CTE to
Old.MaxIter is replaced with
@MaxIter (since there is no need to control the iteration limit on a per-test basis).
An additional CTE,
Decommented, is added near the end to select only the final iteration for each anchor row, to use
ID to look up the
Definition columns from the
SourceSQL CTE, and to provide a clean interface for accessing the output from
Decommented AS ( SELECT D.ID, S.ObjName, CASE WHEN D.NewSQL LIKE @SQLPat THEN 1 ELSE 0 END AS DecommentedHasMatch, S.OrigSQL AS [Definition], NewSQL AS Decommented, Iter FROM Decomment AS D JOIN SourceSQL AS S ON D.ID = S.ID WHERE D.IsFinished = 1 )
Finally, there is the actual
SELECT clause that generates the result set:
SELECT ObjName, DecommentedHasMatch, CASE WHEN DATALENGTH([Definition]) <= 65535 * 2 THEN [Definition] ELSE NULL END AS [Definition], CASE WHEN DATALENGTH([Definition]) <= 65535 * 2 THEN Decommented ELSE NULL END AS Decommented, CASE WHEN DATALENGTH([Definition]) > 65535 * 2 THEN CAST(N'<?def --' + NCHAR(13)+NCHAR(10) + [Definition] + NCHAR(13)+NCHAR(10) + N'--?>' AS xml) ELSE NULL END AS Definition_in_XML, CASE WHEN DATALENGTH([Definition]) > 65535 * 2 THEN CAST(N'<?def --' + NCHAR(13)+NCHAR(10) + Decommented + NCHAR(13)+NCHAR(10) + N'--?>' AS xml) ELSE NULL END AS Decommented_in_XML, Iter FROM Decommented WHERE (@DecommentedOnly = 0 OR DecommentedHasMatch = 1) ORDER BY ObjName OPTION (MAXRECURSION 32767);
- Because SSMS won't retrieve more than 65,535 characters from non-XML text blocks, if the
DATALENGTH(which counts trailing spaces) for
[Definition]is greater than twice 65,535 (
nvarcharvalues have two bytes per character), then
Decommentedare both output as
NULLand the SQL is embedded in XML columns (since SSMS can retrieve XML of unlimited size).
- The filter on
DecommentedHasMatchis controlled by
MAXRECURSIONis set to
@MaxIterwill limit the recursion.
This completes the discussion of the T-SQL Decommenter. I hope that you find the code useful and that you learned something along the way!