June 28, 2007 at 1:20 pm
I need to compare two varchar character arrays one character at a time until they differ using T-SQL, can someone help?
For example given the two varchar arrays:
DECLARE @String1 VARCHAR(100)
SET @String1 = 'Smith, Alex'
DECLARE @String2 VARCHAR(100)
SET @String2 = 'Smith, Alice'
Compare them one character at a time starting from the left, until they are different.
When a difference is detected, produce the following output
String1 = 'Smith, Ale'
String2 = 'Smith, Ali'
Thanks!
June 28, 2007 at 1:45 pm
Call off the Dogs!!! Never mind. I figured it out myself.
DECLARE @String1 VARCHAR(100)
SET @String1 = 'Smith, Alex'
DECLARE @String2 VARCHAR(100)
SET @String2 = 'Smith, Alice'
-- Compare arrays one character at a time starting at the left until different.
-- When different produce the following results:
-- String1 = 'Smith, Ale'
-- String2 = 'Smith, Ali'
DECLARE @i INT
SET @i = 1
while @i <= LEN(@String1) begin
IF (LEFT(@String1, @i) LEFT(@String2, @i))
BEGIN
SET @String1 = SUBSTRING(@String1, 1, @i + 1)
SET @String2 = SUBSTRING(@String2, 1, @i + 1)
BREAK
END
-- increment the row counter
SET @i = @i + 1
END
PRINT @String1 + '-' + @String2
June 28, 2007 at 7:13 pm
while @i <= LEN(@String1) begin
This is executed every pass so set the length once and evaluate the length variable for performance.
Set @len = Len(@string1)
While @i <= @len etc....
Small things add up!
June 28, 2007 at 9:09 pm
Very nicely done, Robert... but if you need to do this for a whole table, you'll either need to make a function, or some form of double nested While loop... some folks would even resort to using a (yeech....haaaaackk!) Cursor  (Patooooiiii!)
 (Patooooiiii!) 
... or ...
... we can use the set-based magic of a "Tally" table to do a whole table at once. Now, this assumes that you have some sort of Primary Key on the table and most tables should even if it's a "surrogate" key instead of a "natural" key.
We need some data to test with, so let's create, oh, say, 10,000 rows of identical data with some really large names to check... look for the "8" near the end of string 2 which is the only difference between String1 and String2...
--===== Create a table to hold the sample test data in to simulate a "real" problem -- and populate it with test data, on the fly... SELECT TOP 10000 RowNum = IDENTITY(INT,1,1), String1 = 'Smith, Alexabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz', String2 = 'Smith, Alexabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrs8uvwxyz' INTO jbmCompareDemo FROM Master.dbo.SysObjects sc1, Master.dbo.SysObjects sc2
--===== Add a Primary Key to the test table ALTER TABLE jbmCompareDemo ADD PRIMARY KEY CLUSTERED (RowNum)
Heh... ya just gotta love the power of a nice restricted cross-join for making stuff in a hurry, huh? 
Speaking of cross-joins, let's make a nice little "Tally" or "Numbers" table... all it is is a very well indexed table of consecutive numbers from 1 to some number (I use 11,000). It has lots and lots of interesting uses and is as fast or faster (usually) than a While loop, so you might want to make it a permanent part of your programming arsenal. The real key is how easy things become when you use it. Here's how to make one...
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates by day
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
--===== Display the content of the Tally table
SELECT * FROM dbo.Tally
You with me so far? We have a Tally table containing a bunch of sequential numbers AND we have a test table (jbmCompareDemo) containing 10,000 rows of predictable data to test with. Hmmm... wonder what would happen if we cross-joined those tables (with restrictions) using substrings in the joins... would it replace a loop for the whole table? You bet it would...
--===== Finds first character position that doesn't match
-- between the String1 and String2 columns of the test table
SELECT RowNum, MIN(N) AS N
FROM dbo.Tally WITH (NOLOCK),
jbmCompareDemo
WHERE N <= LEN(String1)
AND N <= LEN(String2)
AND SUBSTRING(String1,N,1) <> SUBSTRING(String2,N,1)
GROUP BY RowNum
So far, so good, eh? But you wanted to show the left matching part of both strings up to and including the first character that doesn't match... let's use the above query to drive all of that as a "derived table" (one of my other SQL "best friends")...
--===== Show the left matching part of both strings up to and
-- including the first character that doesn't match
SELECT LEFT(t.String1,d.N)+'-'+LEFT(t.String2,d.N)
FROM jbmCompareDemo t,
(--==== Derived table "d" finds first character position that doesn't match
-- between the String1 and String2 columns of the test table
SELECT RowNum, MIN(N) AS N
FROM dbo.Tally WITH (NOLOCK),
jbmCompareDemo
WHERE N <= LEN(String1)
AND N <= LEN(String2)
AND SUBSTRING(String1,N,1) <> SUBSTRING(String2,N,1)
GROUP BY RowNum
) d
WHERE t.RowNum = d.RowNum
Pretty cool, huh? And look at how simple the code appears! Study the code to figure out what's going on (cross-join between object table and Tally table using Substring at character level) and lemme know if you have any questions...
p.s. SQL Server 2005 has a nifty feature (RowNum) that can be used to replace the Tally table... I don't have SQL Server 2005 to test on, so I won't demo the feature. But I'll just bet someone else could...
--Jeff Moden
Change is inevitable... Change for the better is not.
June 29, 2007 at 9:10 am
Hi Jeff, Whoa - you go dude!! That's pretty intense.
It is my intention to create a function for this in addition to some additional code.
I prefer not to use cursors as I learned to avoid them if possible.
The table tally example you show is interesting. I would be interested in seeing a version of my original question using the row_number() function you mention (I am using SQL Server 2005).
Again, thanks for your help.
June 29, 2007 at 8:51 pm
Ok, a function it is... still using the Tally table 'cause I don't have 2K5 to test the code with...
CREATE FUNCTION dbo.fnCharCompare
/******************************************************************************
Purpose:
Compare the two input strings and return the left matching part of both
strings (separated by a single dash) up to and including the first character
that doesn't match.
******************************************************************************/
--===== Delare the input parameters
(
@pString1 CHAR(8000),
@pString2 CHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
--===== Return the left matching part of both strings up to and
-- including the first character that doesn't match
RETURN (SELECT LEFT(@pString1,MIN(N))
+ '-'
+ LEFT(@pString2,MIN(N))
FROM dbo.Tally WITH (NOLOCK)
WHERE SUBSTRING(@pString1,N,1) <> SUBSTRING(@pString2,N,1))
END
Wonder if you might answer a question for me, Robert... what are you going to use this for? Haven't seen this particular request (problem) before...
--Jeff Moden
Change is inevitable... Change for the better is not.
June 29, 2007 at 8:56 pm
Ah crud... I've got a fly in the ointment... didn't test the majority of possibilities... I'll be back...
--Jeff Moden
Change is inevitable... Change for the better is not.
June 29, 2007 at 9:09 pm
Sorry about that... the other code had a problem when the two strings were different lengths (doh!)... I fixed the function code above...
--Jeff Moden
Change is inevitable... Change for the better is not.
June 30, 2007 at 9:46 am
Hi Jeff, thanks for your interest. Ok, you asked for it so here goes...
Some background info first. We have an ASP.NET Web Application that serves up a list of names in a dynamic tree control. The nodes of the tree are generated on demand and loaded into the tree dynamically. When this list of names become very big (and indeed it will be), the delay takes too long. So to minimize the delay we will group the names into dynamically created folders. Each folder will contain n number of names based on some passed in limit (In my code below this is the variable @NthRow). This way we can mimimize the number of nodes to fetch yet still support a large number of names.
The final output of our function must be a table of strings in the form of X - Y where X is the starting name and Y is the ending name of the group.
For example, given the following names and @NthRow = 5
Baker, Ms. Jamie
Espinosa, Ms. Jean
Gonzales, Mr. Robert
Holland, Ms. Julia
Jimenez, Ms. Soni
Macdonald, Mr. Mickey
Miller, Ms. Jana
Noriega, Ms. Michelle
Owen, Ms. Alma
Ramirez, Ms. Stephanie
Salas, Ms. Jeanine
Sipes, Mr. Tyler
Tamayo, Ms. Laura
Timmer, Ms. Julie
Trevino, Ms. Yvonne
Young, Ms. Annia
Produce an output table of:
Ba-Ji
Ma-Ra
Sa-Tr
Y-Y
Where:
Ba-Ji represents the names Baker, Ms. Jamie thru Jimenez, Ms. Soni
Ma-Ra represents the names Macdonald, Mr. Mickey thru Ramirez, Ms. Stephanie
Sa-Tr represents the names Salas, Ms. Jeanine thru Trevino, Ms. Yvonne
Y-Y represents the name Young, Ms. Annia
Got it? Now for the code that I have so far, which works mind you, but may not be very efficient and/or elegant as I'm a T-SQL newbie. As you can probably tell from my style, I am a developer. This style may or may not lead to very efficient code in T-SQL. Any hoot, please feel free to comment on how I can better follow DBA "best practices" where you see fit.
The code itself is not in a function yet as I wanted to nail down the agloritm first. So I was just working of a script file. below is the content of what I have thus far.
I bet you're sorry you asked.
DECLARE @RowId INT
DECLARE @maxRowId INT
DECLARE @StartName VARCHAR(8000)
DECLARE @EndName VARCHAR(8000)
DECLARE @LastFolderNodeStartNameID INT
DECLARE @i INT
DECLARE @len INT
DECLARE @NodeText VARCHAR(8000)
DECLARE @FullNodeTextCount INT
DECLARE @NthRow INT
-- Temp table variable to store raw query results with an ID
DECLARE @Raw TABLE
(
ID int IDENTITY(1,1),
RawText VARCHAR(8000)
)
-- Temp table variable to store long versions of both the starting text and the ending text
DECLARE @FullNodeText TABLE
(
ID int IDENTITY(1,1),
StartText VARCHAR(8000),
EndText VARCHAR(8000)
)
-- Temp table variable to store results
DECLARE @Results TABLE
(
ID INT,
NodeText VARCHAR(8000)
)
-- Initialize tables and variables
SET @NthRow = 100 -- This controls how many nodes will be represented by each folder node. Later this will be passed in as an argument to the function
-- Populate the Raw table
INSERT INTO @Raw (RawText)
SELECT RTRIM(LastName) + ', ' + RTRIM(Title) + ' ' + RTRIM(FirstName) + ' ' + RTRIM(MiddleName) + ' ' + RTRIM(Suffix) AS RawText
FROM KeyHolders
WHERE KeyHolderID > 0
ORDER BY RawText
-- Populate the FullNodeText table with with every nth and nth + 1 raw text
INSERT INTO @FullNodeText(StartText,EndText)
SELECT R1.RawText, R2.RawText
FROM @Raw R1
INNER JOIN @Raw R2 ON R1.ID + @NthRow - 1 = R2.ID
WHERE R1.ID % @NthRow-1 = 0
-- How many text nodes were found?
SELECT @MaxRowId = MAX(ID) FROM @Raw
-- Were there any remainders?
IF @MaxRowId ((SELECT COUNT(*) FROM @FullNodeText) * @NthRow )
BEGIN
-- In the case we have a remainder, we still need to add the last folder node start and end names to our final results.
-- This occurs when there are remaining node(s) after extracting the nth and nth + 1 row (previously).
-- To do this simply calculate the IDs and insert them into the FullNodeText table.
SET @LastFolderNodeStartNameID = (SELECT COUNT(*) FROM @FullNodeText) * @NthRow + 1
SET @StartName = (SELECT RawText FROM @raw WHERE id = @LastFolderNodeStartNameID)
SET @EndName = (SELECT RawText FROM @raw WHERE id = @MaxRowId)
INSERT INTO @FullNodeText(StartText,EndText) VALUES(@StartName, @EndName)
END
-- Determine the start and end row IDs (@FullNodeText.ID is an Identity column, auto-incrementing starting at 1)
SELECT @RowId = MIN(ID), @MaxRowId = MAX(ID) FROM @FullNodeText
SET @StartName = ''
SET @EndName = ''
-- Iterate through the FullNodeText table one row at a time.
-- For each row, compare the Start text with the end text one character at a time starting
-- from the left until a difference is detected. When the difference is found insert the
-- strings into another temp table which is our final function results
WHILE @RowId <= @MaxRowId BEGIN
DECLARE @StartText VARCHAR(8000)
DECLARE @EndText VARCHAR(8000)
-- Fetch the values from the @FullNodeText table one row at a time
SELECT @StartName = StartText, @EndName = EndText
FROM @FullNodeText
WHERE ID = @RowId
-- It is possible to have a single remaining row in which case the
-- start name would
IF @StartName = @EndName
BEGIN
SET @StartText = SUBSTRING(@StartName, 1, 1)
SET @EndText = SUBSTRING(@EndName, 1, 1)
INSERT INTO @Results (ID, NodeText) VALUES(-1, @StartText + '-' + @EndText)
END
ELSE
BEGIN
-- compare on character at a time starting at the left until different.
SET @i = 1
SET @len = Len(@EndName)
WHILE @i <= @Len BEGIN
IF (UPPER(LEFT(@StartName, @i))) UPPER((LEFT(@EndName, @i)))
BEGIN
SET @StartText = SUBSTRING(@StartName, 1, @i + 1)
SET @EndText = SUBSTRING(@EndName, 1, @i + 1)
SET @NodeText = @StartText + '-' + @EndText
INSERT INTO @Results (ID, NodeText) VALUES(-1, @NodeText)
BREAK
END
SET @i = @i + 1
END
END
PRINT @NodeText
-- increment loop counter
SET @RowId = @RowId + 1
END
SELECT * FROM @Results
July 1, 2007 at 11:46 pm
There are a number of ways to returns trees in a high speed fashion in SQL Server 2000.... are you simply having performance problems or are you just concerned with the number of items in the tree.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 2, 2007 at 8:34 am
Hi Jeff, I'm aware of many tree schema/strategies. ASP.NET 2.0 and SQL Server 2005 have some very interesting approaches to implementing trees and we have research these in detail. Due to the nature of our current project, we were unable to employ such strategies. Regardless, in my opinion, either would still have issues when presenting in excess of 10,000 dynamically generated nodes. With that many nodes it turns into more of a client side issue rather than server. Besides, changing the strategy at this point is not an option.
July 2, 2007 at 12:11 pm
Here is a nice function that works for up to 8 characters!
CREATE
FUNCTION dbo.fnFirstDifference
(
@FirstString VARCHAR(8),
@SecondString VARCHAR(8)
)
RETURNS VARCHAR(17)
AS
BEGIN
DECLARE @var VARBINARY(8),
@pos TINYINT
SELECT @var = CAST(CAST(@FirstString AS BINARY(8)) AS BIGINT) ^ CAST(CAST(@SecondString AS BINARY(8)) AS BIGINT),
@pos = PATINDEX('%[^0]%', SUBSTRING(master.dbo.fn_sqlvarbasetostr(@var), 3, 16)),
@pos = (1 + @pos) / 2
RETURN LEFT(@FirstString, @pos) + '-' + LEFT(@SecondString, @pos)
END
N 56°04'39.16"
E 12°55'05.25"
July 2, 2007 at 11:11 pm
So you can't limit the query by level and tree branch on a single click?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 5, 2007 at 6:34 pm
Bob,
For my own purposes I have used your problem as the basis for a blog post. I have modified your original problem to pose a general question concerning the creation of intervals for character data. The solution is in the D4 relational language of Dataphor using sql server as the data repository. It is essentially a non-procedural solution so as to possibly interest t-sql users  If anyone is interested (and you should be) here is the link:
 If anyone is interested (and you should be) here is the link:
http://beyondsql.blogspot.com/2007/07/dataphor-create-string-intervals.html
Thanks for the overall problem and if there are questions feel free to comment on the post. I hasten to add that I could have offered a solution using the RAC utility but I refrained. Should anyone (Jeff) like to see the solution in RAC please post 
best,
http://www.beyondsql.blogspot.com
July 5, 2007 at 7:59 pm
Nope... I'm pretty well spammed out on RAC. How about posting the T-SQL solution that it's supposed to replace?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply