Technical Article

Comparing Stored Procedures, Part 6

,

This picks up from an earlier article on trying to quantify the comparison of two stored procedures.

 

For a more complete explanation, please go to my blog entry here. This excerpt from my blog helps to explain what this does:

"Initially using an iterative approach to this problem made a lot of sense, since we need to control the order of the subsequences checked (longer before shorter). There didn't seem to be an obvious way to do this with a CTE. The real power of the CTE comes from its feature of recursion - the ability to have a set of data in the CTE refer to itself (please go here for an excellent article on this). I developed a hybrid approach that would combine iterative and recursive code. The recursive CTE in it will select off the longest matching subsequence between the two sequences it can find whose values are all unmatched, save it in a temp table, and then update the two sequences to show the matching values. It would then continue doing this until no more matching subsequences can be found."

Written by Jesse McLain

jesse@jessemclain.com

www.jessemclain.com

http://jessesql.blogspot.com

/*************************************************************************************************
** Proc: "spd_SequenceCompare_Version3"
** Desc: This stored procedure will compare the items in two sequences. The sequences should
** be stored in two tables, Seq1 and Seq2, which should have the following structure
** (the data type for CodeLineTxt can be anything, but the others should not change):
** 
** CREATE TABLE Seq1 (
** CodeLineTxt varchar(max), /* stores the original line of code */** CodeLineNum int not null identity(1,1), /* stores the line number */** MatchLineNum int /* stores the matching line of code from spd #2 */** )
** 
** This spd will compare smaller and smaller subsequences in the tables, and when it
** finds a match, it will note it in the column MatchLineNum. 
** 
** 
** 
** Return values: none
** 
** Called by: 
** 
** Parameters:
** Input
** ----------
** none
**
** Output
** -----------
** none
**
** Auth: Jesse McLain
** Email: jesse@jessemclain.com
** Web: www.jessemclain.com
** Blog: http://jessesql.blogspot.com/2009/02/comparing-stored-procedures-part-6.html
**
** Date: 02/16/2008
**
***************************************************************************************************
** Change History
***************************************************************************************************
** Date:    Author:      Description:
** -------- --------     -------------------------------------------
** 20080216 Jesse McLain Created script
** 20080223 Jesse McLain Added CTE code to better determine @SubSeqLen value
** 20080225 Jesse McLain Rewrote sequence matching to use recursive CTE code 
** 20080226 Jesse McLain Added MAXRECURSION option to recursive CTE code.
**                       Chgd UPDATEs to set MatchLineNum to CodeLineNum of first 
**                       matching value in other sequence (to show matching blocks)
**
***************************************************************************************************
** Notes
***************************************************************************************************
** 2/25/09 - This excellent article helped with this development:
** http://www.sqlservercentral.com/articles/Development/recursivequeriesinsqlserver2005/1760/
**************************************************************************************************/
CREATE PROCEDURE spd_SequenceCompare_Version3 AS

SET NOCOUNT ON

/* variable @SubSeqLen will store the length of the most recent subsequence matched in the loop */DECLARE @SubSeqLen int

/* set @SubSeqLen to a "dummy" value to ensure that we enter loop the first time */SET @SubSeqLen = -1

WHILE @SubSeqLen <> 0
BEGIN

  /* In order to match the subsequences from longest first to shortest last, we're going
  to wrap this CTE in a loop. The CTE will pull off the first subsequence it finds,
  ordered by length and start position. The reason why we have to pull off one at a 
  time is because we might have duplicate matches; let's say that we're matching the
  strings (sequences of letters) "ABCD" and "ABXYAB". Obviously the longest matching
  substrings are "AB", of length 2. If we pulled off all matching substrings of length 2,
  then we would get two matches of "AB" to "AB". We only want one, and we arbitrarily 
  take the first one found. The CTE works such that it will omit any values found to 
  be a part of a prior match, and this prevents the 2nd "AB" in "ABXYAB" from falsely 
  matching to anything. */
  ;WITH Matches (MatchingValue, Seq1Start, Seq1End, Seq2Start, Seq2End, SeqLen) AS (
    SELECT
      Seq1.CodeLineTxt
      ,Seq1.CodeLineNum
      ,Seq1.CodeLineNum
      ,Seq2.CodeLineNum
      ,Seq2.CodeLineNum
      ,1
    FROM Seq1
    JOIN Seq2
      ON Seq1.CodeLineTxt = Seq2.CodeLineTxt
    WHERE Seq1.MatchLineNum = 0
      AND Seq2.MatchLineNum = 0
    UNION ALL
    SELECT
      Seq1.CodeLineTxt
      ,Matches.Seq1Start
      ,Seq1.CodeLineNum
      ,Matches.Seq2Start
      ,Seq2.CodeLineNum
      ,Matches.SeqLen + 1
    FROM Seq1
    JOIN Seq2
      ON Seq1.CodeLineTxt = Seq2.CodeLineTxt
    JOIN Matches
      ON Seq1.CodeLineNum = Matches.Seq1End + 1
      AND Seq2.CodeLineNum = Matches.Seq2End + 1
    WHERE Seq1.MatchLineNum = 0
      AND Seq2.MatchLineNum = 0
  )
  SELECT TOP 1 * 
  INTO #Matched 
  FROM Matches 
  ORDER BY SeqLen DESC, Seq1Start, Seq2Start
  OPTION (MAXRECURSION 0)


   /* after we process the last subsequence, table #Matched will be empty, and
   so we assign @SubSeqLen to 0 to indicate this, and our loop will terminate */   SET @SubSeqLen = ISNULL((SELECT SeqLen FROM #Matched), 0)


   /* update the sequences to note any matches - if we're out of matches, this
   will do nothing, and we'll proceed out of the loop */   UPDATE S
   SET MatchLineNum = M.Seq2Start
   FROM Seq1 S
   JOIN #Matched M
     ON S.CodeLineNum BETWEEN M.Seq1Start AND M.Seq1End


   UPDATE S
   SET MatchLineNum = M.Seq1Start
   FROM Seq2 S
   JOIN #Matched M
     ON S.CodeLineNum BETWEEN M.Seq2Start AND M.Seq2End


   DROP TABLE #Matched 
END

RETURN

Rate

3.5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

3.5 (2)

You rated this post out of 5. Change rating