Compare two varchar arrays one character at time until different

Question

Post reply

Compare two varchar arrays one character at time until different

Robert Szymanski-440859

SSC Eights!

Points: 957
More actions
June 28, 2007 at 1:20 pm

#117587

I need to compare two varchar character arrays one character at a time until they differ using T-SQL, can someone help?
For example given the two varchar arrays:
DECLARE @String1 VARCHAR(100)
SET @String1 = 'Smith, Alex'
DECLARE @String2 VARCHAR(100)
SET @String2 = 'Smith, Alice'
Compare them one character at a time starting from the left, until they are different.
When a difference is detected, produce the following output
String1 = 'Smith, Ale'
String2 = 'Smith, Ali'
Thanks!
Regards,
Bob Szymanski
Blue Mule Software

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Robert Szymanski-440859 SSC Eights! Points: 957 More actions · Answer 1

Call off the Dogs!!! Never mind. I figured it out myself.

DECLARE @String1 VARCHAR(100)

SET @String1 = 'Smith, Alex'

DECLARE @String2 VARCHAR(100)

SET @String2 = 'Smith, Alice'

-- Compare arrays one character at a time starting at the left until different.

-- When different produce the following results:

-- String1 = 'Smith, Ale'

-- String2 = 'Smith, Ali'

DECLARE @i INT

SET @i = 1

while @i <= LEN(@String1) begin

IF (LEFT(@String1, @i) LEFT(@String2, @i))

BEGIN

SET @String1 = SUBSTRING(@String1, 1, @i + 1)

SET @String2 = SUBSTRING(@String2, 1, @i + 1)

BREAK

END

-- increment the row counter

SET @i = @i + 1

END

PRINT @String1 + '-' + @String2

Regards,
Bob Szymanski
Blue Mule Software

Kory Becker Hall of Fame Points: 3585 More actions · Answer 2

while @i <= LEN(@String1) begin

This is executed every pass so set the length once and evaluate the length variable for performance.

Set @len = Len(@string1)

While @i <= @len etc....

Small things add up!

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 3

Very nicely done, Robert... but if you need to do this for a whole table, you'll either need to make a function, or some form of double nested While loop... some folks would even resort to using a (yeech....haaaaackk!) Cursor (Patooooiiii!)

... or ...

... we can use the set-based magic of a "Tally" table to do a whole table at once. Now, this assumes that you have some sort of Primary Key on the table and most tables should even if it's a "surrogate" key instead of a "natural" key.

We need some data to test with, so let's create, oh, say, 10,000 rows of identical data with some really large names to check... look for the "8" near the end of string 2 which is the only difference between String1 and String2...

--===== Create a table to hold the sample test data in to simulate a "real" problem
     -- and populate it with test data, on the fly...
 SELECT TOP 10000
        RowNum = IDENTITY(INT,1,1),
        String1 = 'Smith, Alexabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz',
        String2 = 'Smith, Alexabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrs8uvwxyz'
   INTO jbmCompareDemo
   FROM Master.dbo.SysObjects sc1,
        Master.dbo.SysObjects sc2

--===== Add a Primary Key to the test table
  ALTER TABLE jbmCompareDemo
    ADD PRIMARY KEY CLUSTERED (RowNum)

Heh... ya just gotta love the power of a nice restricted cross-join for making stuff in a hurry, huh?

Speaking of cross-joins, let's make a nice little "Tally" or "Numbers" table... all it is is a very well indexed table of consecutive numbers from 1 to some number (I use 11,000). It has lots and lots of interesting uses and is as fast or faster (usually) than a While loop, so you might want to make it a permanent part of your programming arsenal. The real key is how easy things become when you use it. Here's how to make one...

--===== Create and populate the Tally table on the fly

SELECT TOP 11000 --equates to more than 30 years of dates by day

IDENTITY(INT,1,1) AS N

INTO dbo.Tally

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance

ALTER TABLE dbo.Tally

ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

--===== Allow the general public to use it

GRANT SELECT ON dbo.Tally TO PUBLIC

--===== Display the content of the Tally table

SELECT * FROM dbo.Tally

You with me so far? We have a Tally table containing a bunch of sequential numbers AND we have a test table (jbmCompareDemo) containing 10,000 rows of predictable data to test with. Hmmm... wonder what would happen if we cross-joined those tables (with restrictions) using substrings in the joins... would it replace a loop for the whole table? You bet it would...

--===== Finds first character position that doesn't match

-- between the String1 and String2 columns of the test table

SELECT RowNum, MIN(N) AS N

FROM dbo.Tally WITH (NOLOCK),

jbmCompareDemo

WHERE N <= LEN(String1)

AND N <= LEN(String2)

AND SUBSTRING(String1,N,1) <> SUBSTRING(String2,N,1)

GROUP BY RowNum

So far, so good, eh? But you wanted to show the left matching part of both strings up to and including the first character that doesn't match... let's use the above query to drive all of that as a "derived table" (one of my other SQL "best friends")...

--===== Show the left matching part of both strings up to and

-- including the first character that doesn't match

SELECT LEFT(t.String1,d.N)+'-'+LEFT(t.String2,d.N)

FROM jbmCompareDemo t,

(--==== Derived table "d" finds first character position that doesn't match

-- between the String1 and String2 columns of the test table

SELECT RowNum, MIN(N) AS N

FROM dbo.Tally WITH (NOLOCK),

jbmCompareDemo

WHERE N <= LEN(String1)

AND N <= LEN(String2)

AND SUBSTRING(String1,N,1) <> SUBSTRING(String2,N,1)

GROUP BY RowNum

) d

WHERE t.RowNum = d.RowNum

Pretty cool, huh? And look at how simple the code appears! Study the code to figure out what's going on (cross-join between object table and Tally table using Substring at character level) and lemme know if you have any questions...

p.s. SQL Server 2005 has a nifty feature (RowNum) that can be used to replace the Tally table... I don't have SQL Server 2005 to test on, so I won't demo the feature. But I'll just bet someone else could...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Robert Szymanski-440859 SSC Eights! Points: 957 More actions · Answer 4

Hi Jeff, Whoa - you go dude!! That's pretty intense.

It is my intention to create a function for this in addition to some additional code.

I prefer not to use cursors as I learned to avoid them if possible.

The table tally example you show is interesting. I would be interested in seeing a version of my original question using the row_number() function you mention (I am using SQL Server 2005).

Again, thanks for your help.

Regards,
Bob Szymanski
Blue Mule Software

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 5

Ok, a function it is... still using the Tally table 'cause I don't have 2K5 to test the code with...

CREATE FUNCTION dbo.fnCharCompare

/******************************************************************************

Purpose:

Compare the two input strings and return the left matching part of both

strings (separated by a single dash) up to and including the first character

that doesn't match.

******************************************************************************/

--===== Delare the input parameters

(

@pString1 CHAR(8000),

@pString2 CHAR(8000)

)

RETURNS VARCHAR(8000)

AS

BEGIN

--===== Return the left matching part of both strings up to and

-- including the first character that doesn't match

RETURN (SELECT LEFT(@pString1,MIN(N))

+ '-'

+ LEFT(@pString2,MIN(N))

FROM dbo.Tally WITH (NOLOCK)

WHERE SUBSTRING(@pString1,N,1) <> SUBSTRING(@pString2,N,1))

END

Wonder if you might answer a question for me, Robert... what are you going to use this for? Haven't seen this particular request (problem) before...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 6

Ah crud... I've got a fly in the ointment... didn't test the majority of possibilities... I'll be back...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 7

Sorry about that... the other code had a problem when the two strings were different lengths (doh!)... I fixed the function code above...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Robert Szymanski-440859 SSC Eights! Points: 957 More actions · Answer 8

Hi Jeff, thanks for your interest. Ok, you asked for it so here goes...

Some background info first. We have an ASP.NET Web Application that serves up a list of names in a dynamic tree control. The nodes of the tree are generated on demand and loaded into the tree dynamically. When this list of names become very big (and indeed it will be), the delay takes too long. So to minimize the delay we will group the names into dynamically created folders. Each folder will contain n number of names based on some passed in limit (In my code below this is the variable @NthRow). This way we can mimimize the number of nodes to fetch yet still support a large number of names.

The final output of our function must be a table of strings in the form of X - Y where X is the starting name and Y is the ending name of the group.

For example, given the following names and @NthRow = 5

Baker, Ms. Jamie

Espinosa, Ms. Jean

Gonzales, Mr. Robert

Holland, Ms. Julia

Jimenez, Ms. Soni

Macdonald, Mr. Mickey

Miller, Ms. Jana

Noriega, Ms. Michelle

Owen, Ms. Alma

Ramirez, Ms. Stephanie

Salas, Ms. Jeanine

Sipes, Mr. Tyler

Tamayo, Ms. Laura

Timmer, Ms. Julie

Trevino, Ms. Yvonne

Young, Ms. Annia

Produce an output table of:

Ba-Ji

Ma-Ra

Sa-Tr

Y-Y

Where:

Ba-Ji represents the names Baker, Ms. Jamie thru Jimenez, Ms. Soni

Ma-Ra represents the names Macdonald, Mr. Mickey thru Ramirez, Ms. Stephanie

Sa-Tr represents the names Salas, Ms. Jeanine thru Trevino, Ms. Yvonne

Y-Y represents the name Young, Ms. Annia

Got it? Now for the code that I have so far, which works mind you, but may not be very efficient and/or elegant as I'm a T-SQL newbie. As you can probably tell from my style, I am a developer. This style may or may not lead to very efficient code in T-SQL. Any hoot, please feel free to comment on how I can better follow DBA "best practices" where you see fit.

The code itself is not in a function yet as I wanted to nail down the agloritm first. So I was just working of a script file. below is the content of what I have thus far.

I bet you're sorry you asked.

DECLARE @RowId INT

DECLARE @maxRowId INT

DECLARE @StartName VARCHAR(8000)

DECLARE @EndName VARCHAR(8000)

DECLARE @LastFolderNodeStartNameID INT

DECLARE @i INT

DECLARE @len INT

DECLARE @NodeText VARCHAR(8000)

DECLARE @FullNodeTextCount INT

DECLARE @NthRow INT

-- Temp table variable to store raw query results with an ID

DECLARE @Raw TABLE

(

ID int IDENTITY(1,1),

RawText VARCHAR(8000)

)

-- Temp table variable to store long versions of both the starting text and the ending text

DECLARE @FullNodeText TABLE

(

ID int IDENTITY(1,1),

StartText VARCHAR(8000),

EndText VARCHAR(8000)

)

-- Temp table variable to store results

DECLARE @Results TABLE

(

ID INT,

NodeText VARCHAR(8000)

)

-- Initialize tables and variables

SET @NthRow = 100 -- This controls how many nodes will be represented by each folder node. Later this will be passed in as an argument to the function

-- Populate the Raw table

INSERT INTO @Raw (RawText)

SELECT RTRIM(LastName) + ', ' + RTRIM(Title) + ' ' + RTRIM(FirstName) + ' ' + RTRIM(MiddleName) + ' ' + RTRIM(Suffix) AS RawText

FROM KeyHolders

WHERE KeyHolderID > 0

ORDER BY RawText

-- Populate the FullNodeText table with with every nth and nth + 1 raw text

INSERT INTO @FullNodeText(StartText,EndText)

SELECT R1.RawText, R2.RawText

FROM @Raw R1

INNER JOIN @Raw R2 ON R1.ID + @NthRow - 1 = R2.ID

WHERE R1.ID % @NthRow-1 = 0

-- How many text nodes were found?

SELECT @MaxRowId = MAX(ID) FROM @Raw

-- Were there any remainders?

IF @MaxRowId ((SELECT COUNT(*) FROM @FullNodeText) * @NthRow )

BEGIN

-- In the case we have a remainder, we still need to add the last folder node start and end names to our final results.

-- This occurs when there are remaining node(s) after extracting the nth and nth + 1 row (previously).

-- To do this simply calculate the IDs and insert them into the FullNodeText table.

SET @LastFolderNodeStartNameID = (SELECT COUNT(*) FROM @FullNodeText) * @NthRow + 1

SET @StartName = (SELECT RawText FROM @raw WHERE id = @LastFolderNodeStartNameID)

SET @EndName = (SELECT RawText FROM @raw WHERE id = @MaxRowId)

INSERT INTO @FullNodeText(StartText,EndText) VALUES(@StartName, @EndName)

END

-- Determine the start and end row IDs (@FullNodeText.ID is an Identity column, auto-incrementing starting at 1)

SELECT @RowId = MIN(ID), @MaxRowId = MAX(ID) FROM @FullNodeText

SET @StartName = ''

SET @EndName = ''

-- Iterate through the FullNodeText table one row at a time.

-- For each row, compare the Start text with the end text one character at a time starting

-- from the left until a difference is detected. When the difference is found insert the

-- strings into another temp table which is our final function results

WHILE @RowId <= @MaxRowId BEGIN

DECLARE @StartText VARCHAR(8000)

DECLARE @EndText VARCHAR(8000)

-- Fetch the values from the @FullNodeText table one row at a time

SELECT @StartName = StartText, @EndName = EndText

FROM @FullNodeText

WHERE ID = @RowId

-- It is possible to have a single remaining row in which case the

-- start name would

IF @StartName = @EndName

BEGIN

SET @StartText = SUBSTRING(@StartName, 1, 1)

SET @EndText = SUBSTRING(@EndName, 1, 1)

INSERT INTO @Results (ID, NodeText) VALUES(-1, @StartText + '-' + @EndText)

END

ELSE

BEGIN

-- compare on character at a time starting at the left until different.

SET @i = 1

SET @len = Len(@EndName)

WHILE @i <= @Len BEGIN

IF (UPPER(LEFT(@StartName, @i))) UPPER((LEFT(@EndName, @i)))

BEGIN

SET @StartText = SUBSTRING(@StartName, 1, @i + 1)

SET @EndText = SUBSTRING(@EndName, 1, @i + 1)

SET @NodeText = @StartText + '-' + @EndText

INSERT INTO @Results (ID, NodeText) VALUES(-1, @NodeText)

BREAK

END

SET @i = @i + 1

END

PRINT @NodeText

-- increment loop counter

SET @RowId = @RowId + 1

END

SELECT * FROM @Results

Regards,
Bob Szymanski
Blue Mule Software

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 9

There are a number of ways to returns trees in a high speed fashion in SQL Server 2000.... are you simply having performance problems or are you just concerned with the number of items in the tree.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Robert Szymanski-440859 SSC Eights! Points: 957 More actions · Answer 10

Hi Jeff, I'm aware of many tree schema/strategies. ASP.NET 2.0 and SQL Server 2005 have some very interesting approaches to implementing trees and we have research these in detail. Due to the nature of our current project, we were unable to employ such strategies. Regardless, in my opinion, either would still have issues when presenting in excess of 10,000 dynamically generated nodes. With that many nodes it turns into more of a client side issue rather than server. Besides, changing the strategy at this point is not an option.

Regards,
Bob Szymanski
Blue Mule Software

SwePeso SSC-Dedicated Points: 39758 More actions · Answer 11

Here is a nice function that works for up to 8 characters!

CREATE

FUNCTION dbo.fnFirstDifference

(

@FirstString VARCHAR(8),

@SecondString VARCHAR(8)

)

RETURNS VARCHAR(17)

AS

BEGIN

DECLARE @var VARBINARY(8),

@pos TINYINT

SELECT @var = CAST(CAST(@FirstString AS BINARY(8)) AS BIGINT) ^ CAST(CAST(@SecondString AS BINARY(8)) AS BIGINT),

@pos = PATINDEX('%[^0]%', SUBSTRING(master.dbo.fn_sqlvarbasetostr( @var), 3, 16)),

@pos = (1 + @pos) / 2

RETURN LEFT(@FirstString, @pos) + '-' + LEFT(@SecondString, @pos)

END

N 56°04'39.16"
E 12°55'05.25"

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 12

So you can't limit the query by level and tree branch on a single click?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

steve dassin SSCrazy Points: 2396 More actions · Answer 13

Bob,

For my own purposes I have used your problem as the basis for a blog post. I have modified your original problem to pose a general question concerning the creation of intervals for character data. The solution is in the D4 relational language of Dataphor using sql server as the data repository. It is essentially a non-procedural solution so as to possibly interest t-sql users If anyone is interested (and you should be) here is the link:

http://beyondsql.blogspot.com/2007/07/dataphor-create-string-intervals.html

Thanks for the overall problem and if there are questions feel free to comment on the post. I hasten to add that I could have offered a solution using the RAC utility but I refrained. Should anyone (Jeff) like to see the solution in RAC please post

best,

http://www.beyondsql.blogspot.com

http://www.rac4sql.net

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 14

Nope... I'm pretty well spammed out on RAC. How about posting the T-SQL solution that it's supposed to replace?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)