Comparing Stored Procedures, Part 6

Jesse-McLain, 2009-04-15 (first published: 2009-02-26)

This picks up from an earlier article on trying to quantify the comparison of two stored procedures.

For a more complete explanation, please go to my blog entry here. This excerpt from my blog helps to explain what this does:

"Initially using an iterative approach to this problem made a lot of sense, since we need to control the order of the subsequences checked (longer before shorter). There didn't seem to be an obvious way to do this with a CTE. The real power of the CTE comes from its feature of recursion - the ability to have a set of data in the CTE refer to itself (please go here for an excellent article on this). I developed a hybrid approach that would combine iterative and recursive code. The recursive CTE in it will select off the longest matching subsequence between the two sequences it can find whose values are all unmatched, save it in a temp table, and then update the two sequences to show the matching values. It would then continue doing this until no more matching subsequences can be found."

Written by Jesse McLain

jesse@jessemclain.com

www.jessemclain.com

http://jessesql.blogspot.com

/************************************************************************************************* ** Proc: "spd_SequenceCompare_Version3" ** Desc: This stored procedure will compare the items in two sequences. The sequences should ** be stored in two tables, Seq1 and Seq2, which should have the following structure ** (the data type for CodeLineTxt can be anything, but the others should not change): ** ** CREATE TABLE Seq1 ( ** CodeLineTxt varchar(max), /* stores the original line of code */** CodeLineNum int not null identity(1,1), /* stores the line number */** MatchLineNum int /* stores the matching line of code from spd #2 */** ) ** ** This spd will compare smaller and smaller subsequences in the tables, and when it ** finds a match, it will note it in the column MatchLineNum. ** ** ** ** Return values: none ** ** Called by: ** ** Parameters: ** Input ** ---------- ** none ** ** Output ** ----------- ** none ** ** Auth: Jesse McLain ** Email: jesse@jessemclain.com ** Web: www.jessemclain.com ** Blog: http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-6.html ** ** Date: 02/16/2008 ** *************************************************************************************************** ** Change History *************************************************************************************************** ** Date: Author: Description: ** -------- -------- ------------------------------------------- ** 20080216 Jesse McLain Created script ** 20080223 Jesse McLain Added CTE code to better determine @SubSeqLen value ** 20080225 Jesse McLain Rewrote sequence matching to use recursive CTE code ** 20080226 Jesse McLain Added MAXRECURSION option to recursive CTE code. ** Chgd UPDATEs to set MatchLineNum to CodeLineNum of first ** matching value in other sequence (to show matching blocks) ** *************************************************************************************************** ** Notes *************************************************************************************************** ** 2/25/09 - This excellent article helped with this development: ** http://www.sqlservercentral.com/articles/Development/recursivequeriesinsqlserver2005/1760/ **************************************************************************************************/ CREATE PROCEDURE spd_SequenceCompare_Version3 AS SET NOCOUNT ON /* variable @SubSeqLen will store the length of the most recent subsequence matched in the loop */DECLARE @SubSeqLen int /* set @SubSeqLen to a "dummy" value to ensure that we enter loop the first time */SET @SubSeqLen = -1 WHILE @SubSeqLen <> 0 BEGIN /* In order to match the subsequences from longest first to shortest last, we're going to wrap this CTE in a loop. The CTE will pull off the first subsequence it finds, ordered by length and start position. The reason why we have to pull off one at a time is because we might have duplicate matches; let's say that we're matching the strings (sequences of letters) "ABCD" and "ABXYAB". Obviously the longest matching substrings are "AB", of length 2. If we pulled off all matching substrings of length 2, then we would get two matches of "AB" to "AB". We only want one, and we arbitrarily take the first one found. The CTE works such that it will omit any values found to be a part of a prior match, and this prevents the 2nd "AB" in "ABXYAB" from falsely matching to anything. */ ;WITH Matches (MatchingValue, Seq1Start, Seq1End, Seq2Start, Seq2End, SeqLen) AS ( SELECT Seq1.CodeLineTxt ,Seq1.CodeLineNum ,Seq1.CodeLineNum ,Seq2.CodeLineNum ,Seq2.CodeLineNum ,1 FROM Seq1 JOIN Seq2 ON Seq1.CodeLineTxt = Seq2.CodeLineTxt WHERE Seq1.MatchLineNum = 0 AND Seq2.MatchLineNum = 0 UNION ALL SELECT Seq1.CodeLineTxt ,Matches.Seq1Start ,Seq1.CodeLineNum ,Matches.Seq2Start ,Seq2.CodeLineNum ,Matches.SeqLen + 1 FROM Seq1 JOIN Seq2 ON Seq1.CodeLineTxt = Seq2.CodeLineTxt JOIN Matches ON Seq1.CodeLineNum = Matches.Seq1End + 1 AND Seq2.CodeLineNum = Matches.Seq2End + 1 WHERE Seq1.MatchLineNum = 0 AND Seq2.MatchLineNum = 0 ) SELECT TOP 1 * INTO #Matched FROM Matches ORDER BY SeqLen DESC, Seq1Start, Seq2Start OPTION (MAXRECURSION 0) /* after we process the last subsequence, table #Matched will be empty, and so we assign @SubSeqLen to 0 to indicate this, and our loop will terminate */ SET @SubSeqLen = ISNULL((SELECT SeqLen FROM #Matched), 0) /* update the sequences to note any matches - if we're out of matches, this will do nothing, and we'll proceed out of the loop */ UPDATE S SET MatchLineNum = M.Seq2Start FROM Seq1 S JOIN #Matched M ON S.CodeLineNum BETWEEN M.Seq1Start AND M.Seq1End UPDATE S SET MatchLineNum = M.Seq1Start FROM Seq2 S JOIN #Matched M ON S.CodeLineNum BETWEEN M.Seq2Start AND M.Seq2End DROP TABLE #Matched END RETURN

Introduction to Common Table Expressions

by Kathi Kellenberger

SQLServerCentral.com

T-SQL

Learn how to use common table expressions to simplify your code. Replace temp tables and correlated subqueries with this cool T-SQL feature.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

4.36 (59)

You rated this post out of 5. Change rating

2009-12-08

15,185 reads

Discuss

Power and Deception of CTEs

by Kev Riley

SQLServerCentral.com

T-SQL

A performance problem with a CTE based stored procedure, opened the developers eyes to a whole new world

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

3.57 (37)

You rated this post out of 5. Change rating

2008-12-01

9,161 reads

Discuss

Uncommon Table Expressions

by Peter Kierstead

SQLServerCentral.com

T-SQL

This article describes an alternate use of CTEs for functional data processing.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

4.37 (49)

You rated this post out of 5. Change rating

2008-08-11

9,384 reads

Discuss

Recursive Queries using Common Table Expressions (CTE) in SQL Server

by Additional Articles

MSSQLTips.com

T-SQL

The problem arises when the hierarchy level increases as SQL Server is limited to 32 levels of recursion. We need a better way to implement recursive queries in SQL Server 2005. How do we do it?

2008-07-02

5,894 reads