is there any difference between != and <>

  • Jeff Moden (11/15/2007)


    Me too! Anyone know where there might be a posting for it? Or, maybe post it here so we can do some testing on it?

    http://omnibuzz-sql.blogspot.com/2006/06/concatenate-values-in-column-in-sql.html

    Bottom of the page. It's actually a CSQ, but it seems to run like the wind with proper indexing (8 seconds re-concatenating the 3.54M phone numbers I extracted previously). Those aren't really long sequences to concatenate however (which was what some were reporting as tripping the thing up).

    if you're interested - the code:

    set @StartTime=getdate()

    create index ix_mattPhoneExtTest2 on #mattPhoneExtTest(rid) include (val)

    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)- IDX creation'

    set @StartTime=getdate()

    select @dummy=rID, @dummy2=stuff( ( select ','+ val

    from #mattPhoneExtTest t1

    where t2.rID = t1.rID

    for xml path('')

    ),1,1,'')

    from #mattPhoneExtTest t2

    group by rID

    order by rID

    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm) - XML path() with IDX'

    Results:

    00:00:07:827 Duration (hh:mi:ss:mmm)- IDX creation

    00:00:08:717 Duration (hh:mi:ss:mmm) - XML path() with IDX

    00:00:21:590 Duration (hh:mi:ss:mmm) - XML path() no IDX

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Ok... first things first... after a (finally) decent night's sleep, I did the Phase II code for the phone number problem. Assuming you still have the Tally table active, here's the code in whole... don't need to run any of it separately...

    Like I said, this is where SQL Server and I are gonna get our ears folded back by a CLR... SQL Server pretty much sucks at true ReGex...

    --===========================================================================================================

    PRINT REPLICATE('=',78)

    PRINT SPACE(4)+'Jeff Moden''s Tally Table Solution "Phone test - Phase 1 (Extraction)".'

    PRINT REPLICATE('-',78)

    --===========================================================================================================

    --===== Declare and start a time to measure duration with

    DECLARE @StartTime DATETIME

    SET @StartTime = GETDATE()

    --===== Suppress the auto-display of rowcounts for appearances

    SET NOCOUNT ON

    --===== If the target table already exists, drop it

    IF OBJECT_ID('TempDB.dbo.MyHead','U') IS NOT NULL

    DROP TABLE TempDB.dbo.MyHead

    --===== Declare a local variable for some dynamic SQL

    DECLARE @SQL VARCHAR(300)

    --===== Get the required data and and reserve a column for an index, as well

    -- Index will be used to get rid of "3rd" telephone number for each row.

    SELECT m.RID,

    RTRIM(SUBSTRING(m.Doc,t.N,14)) AS TelNum,

    StartPos = t.N,

    YPos = CASE

    WHEN RIGHT(SUBSTRING(m.Doc,t.N,14),1)=' ' --No space pattern

    THEN 6 --No space pattern

    ELSE 7 --Has space pattern

    END,

    0 AS [Index]

    INTO TempDB.dbo.MyHead

    FROM dbo.MattTestText m

    LEFT OUTER JOIN dbo.Tally t WITH (NOLOCK)

    ON SUBSTRING(m.Doc,t.N,1) = '(' --This check doubles the performance (halves the duration)

    AND ( SUBSTRING(m.Doc,t.N,15) LIKE '%([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] %'

    OR SUBSTRING(m.Doc,t.N,14) LIKE '%([0-9][0-9][0-9])[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] %')

    AND t.N < LEN(m.Doc)-13

    -- Display the rowcount

    PRINT STR(@@ROWCOUNT,12) + ' rows created in the Temp table...'

    --===== Create the required index

    CREATE UNIQUE INDEX IX_tmpMyHead_RID_StartPos

    ON TempDB.dbo.MyHead (RID,StartPos,Ypos)

    --===== Create the [Index] column which identifies which number telephone number in a row is represented

    -- This has to be dynamic because the index used in WITH(INDEX()) doesn't exist at compile time

    SET @SQL = '

    DECLARE @PrevRID INT, @PrevIndex INT

    SELECT @PrevRID = 0, @PrevIndex = 0

    UPDATE TempDB.dbo.MyHead WITH (TABLOCK)

    SET @PrevIndex = [Index] = CASE WHEN RID = @PrevRID THEN @PrevIndex+1 ELSE 1 END,

    @PrevRID = RID

    FROM TempDB.dbo.MyHead WITH (INDEX(IX_tmpMyHead_RID_StartPos))

    WHERE StartPos > 0'

    EXEC (@SQL)

    --===== Get rid of any "3rd" telephone numbers

    DELETE TempDB.dbo.MyHead WHERE [INDEX] = 3

    --===== Display the duration

    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('=',78)

    --===== End of test... display the first 100 rows of each table for visual check...

    SELECT TOP 100 * FROM TempDB.dbo.MyHead ORDER BY RID,StartPos

    GO

    --===========================================================================================================

    -- Create a function to Redaction the Doc column for Phase II of the test

    --===========================================================================================================

    --===== If the function already exists, drop it

    IF OBJECT_ID('dbo.fnJBMRedaction') IS NOT NULL

    DROP FUNCTION dbo.fnJBMRedaction

    GO

    --===== Create a UDF to Redaction a row

    CREATE FUNCTION dbo.fnJBMRedaction (@RID INT, @Doc VARCHAR(850))

    RETURNS VARCHAR(850)

    AS

    BEGIN

    SELECT @Doc =

    STUFF(

    STUFF(

    @Doc,

    StartPos+Ypos-1,3,'YYY'),

    StartPos+1,3,'XXX')

    FROM TempDB.dbo.MyHead

    WHERE RID = @RID

    RETURN @Doc

    END

    GO

    --===========================================================================================================

    PRINT REPLICATE('=',78)

    PRINT SPACE(12)+'Jeff Moden''s Non-RegEx Solution "Phone test - Phase 2 (Redaction)".'

    PRINT REPLICATE('-',78)

    --===========================================================================================================

    --===== Declare and start a time to measure duration with

    DECLARE @StartTime DATETIME

    SET @StartTime = GETDATE()

    --===== Copy the original data to the column to be Redactioned (DocRed)

    UPDATE dbo.MattTestText WITH (TABLOCKX)

    SET DocRed = dbo.fnJBMRedaction(RID,Doc)

    -- Display the rowcount

    PRINT STR(@@ROWCOUNT,12) + ' rows Redactioned...'

    --===== Display the duration

    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('=',78)

    --===== End of test... display the first 100 rows of each table for visual check...

    SELECT TOP 100 * FROM dbo.MattTestText ORDER BY RID

    --===========================================================================================================

    -- Houskeeping... cleanup the stuff we put on Matt's drive

    --===========================================================================================================

    --===== If the target table already exists, drop it

    IF OBJECT_ID('TempDB.dbo.MyHead','U') IS NOT NULL

    DROP TABLE TempDB.dbo.MyHead

    --===== If the function already exists, drop it

    IF OBJECT_ID('dbo.fnJBMRedaction') IS NOT NULL

    DROP FUNCTION dbo.fnJBMRedaction

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Bottom of the page. It's actually a CSQ, but it seems to run like the wind with proper indexing (8 seconds re-concatenating the 3.54M phone numbers I extracted previously). Those aren't really long sequences to concatenate however (which was what some were reporting as tripping the thing up).

    Matt,

    What does the orginal data look like and what does it end up looking like after your concatenation?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • First - test results. Took a while to get the code realigned correctly after the copy/paste operation. I made one change to your code (making the @SQL variable 1000 instead of 300 - it seemed to be truncating the statement)

    ==============================================================================

    Jeff Moden's Tally Table Solution "Phone test - Phase 1 (Extraction)".

    ------------------------------------------------------------------------------

    4026650 rows created in the Temp table...

    00:02:53:357 Duration (hh:mi:ss:mmm)

    ==============================================================================

    ==============================================================================

    Jeff Moden's Non-RegEx Solution "Phone test - Phase 2 (Redaction)".

    ------------------------------------------------------------------------------

    500000 rows Redactioned...

    00:02:19:417 Duration (hh:mi:ss:mmm)

    ==============================================================================

    There's one unknown left. Your extraction process failed to pick up 13K phone numbers, usually the last one in a DOC. Not sure why yet.

    So - not as fast, but personally I'd still say both methods are "in range" (cleaning up 3.5M phone numbers in less than 3 minutes still ain't so shabby, and that's with a "single drive installation" of SQL server.)

    The update phase - well, that's one thing regex is really good at.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff Moden (11/15/2007)


    Matt,

    What does the orginal data look like and what does it end up looking like after your concatenation?

    The original data is the output of Phone Test #1. In other words a table with RID (which repeats) and VAL (a varchar(20) to concatenate).

    sample output

    1(205)267-1886 ,(118)968-6573 ,(136) 232-1351,(192) 660-4222,(856) 398-8842

    2(126) 151-6314,(303)449-1778 ,(202)100-1299 ,(108) 147-1487,(232) 124-8413,(821) 213-2451

    3(246) 120-2076,(386) 752-1275,(114)134-1698 ,(326)173-1150 ,(110)190-1003 ,(173) 181-1402,(132) 110-1908

    4(109)268-1933 ,(710) 207-2316,(449)249-1302 ,(184)113-1468 ,(217) 198-6872,(187)138-7872 ,(184)210-1855 ,(527) 208-1880,(212)148-1957

    5(118) 162-1347,(167)198-1976 ,(566) 105-1552,(384) 219-6923,(192)203-1105 ,(639)192-8775 ,(328) 204-2221,(623)124-1244 ,(120) 162-8677

    6(195)108-2059 ,(119) 115-2446,(621) 181-1370,(644) 193-1051,(178) 176-8503,(140) 550-1360

    7(417) 132-9134,(718) 112-1727,(101) 110-1062,(188) 206-1340,(106) 909-1192,(238) 744-2143,(120) 128-1164,(104) 165-9994

    8(183)449-1619 ,(173)172-2507 ,(177)134-3391 ,(734)125-1635 ,(227) 612-2049,(295) 140-1867,(108)374-4209

    9(687) 162-3682,(144)445-7692 ,(166) 984-6919,(640)487-2014 ,(550) 521-1211,(162) 147-1325,(156) 934-1061

    10(156) 194-2554,(137)148-1946 ,(120)782-4540 ,(162)179-1139 ,(205)712-1560 ,(214) 994-1809,(326) 734-1048

    11(143)988-7749 ,(359) 141-1983,(189) 488-8303,(808) 528-2132,(146) 154-1435,(114)142-9461 ,(212)687-1136 ,(751)121-1204

    12(101)904-6971 ,(517) 106-1043,(945) 565-7758,(442)892-6715 ,(423) 311-5400,(621) 157-1308

    13(127)378-7069 ,(692)833-3567 ,(472) 541-2195,(203) 156-1671,(164)137-9184 ,(153) 128-2106,(231) 153-3532,(103)265-8522

    14(206) 611-1905,(156) 694-6223,(736)103-1067 ,(137) 143-1634

    15(971)171-1079 ,(220) 640-1698,(204)432-1451 ,(123) 140-5961,(173)166-9052 ,(104) 125-3202,(163) 128-9482,(207) 176-1887

    16(212)894-6784 ,(115)592-9209 ,(352)209-2115 ,(140)193-1376 ,(666)749-1685

    17(293)793-2040 ,(139) 877-1467,(211) 553-1489,(169) 152-1267,(148)129-6236 ,(258) 641-1745,(201)106-1371 ,(115) 844-1268

    18(107) 763-8238,(145) 187-4945,(857)837-6324 ,(203) 873-1633,(192)145-8031 ,(563)144-1590 ,(173)385-9178 ,(171)202-3628

    19(201) 178-7597,(111) 976-1995,(987)586-1023 ,(847)184-8850 ,(155)710-6013 ,(178)128-1641

    20(178)180-1055 ,(100)151-1342 ,(121) 301-4615,(118) 445-7220,(898)628-2009 ,(178)141-3891 ,(166)775-1602

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Oh, I will add in an additional challenge if you two feel like it.

    Take data of the following form, with a lot, lot more data (say 100 000 rows and 250 categories)

    CREATE TABLE #Test

    Category INT,

    Code VARCHAR(5)

    )

    INSERT INTO #Test VALUES (1,'a')

    INSERT INTO #Test VALUES (1,'t')

    INSERT INTO #Test VALUES (1,'e')

    INSERT INTO #Test VALUES (1,'s')

    INSERT INTO #Test VALUES (1,'u')

    INSERT INTO #Test VALUES (2,'u')

    INSERT INTO #Test VALUES (2,'e')

    INSERT INTO #Test VALUES (2,'n')

    INSERT INTO #Test VALUES (2,'a')

    INSERT INTO #Test VALUES (3,'i')

    INSERT INTO #Test VALUES (3,'z')

    and output data of the form. (order of the characters isn't important, since there's nothing to order by)

    1, 'a,t,e,s,u'

    2, 'u,e,n,a'

    3, 'i,z'

    My gut feel is that a user-defined aggregate will out perform any T-SQL by quite a way.

    Heh... "Gut feel"... time for some "liposuction" courtesy of some "real" T-SQL to slim down the times with :hehe:

    http://omnibuzz-sql.blogspot.com/2006/06/concatenate-values-in-column-in-sql.html

    That's the URL that Matt posted explaining how poorly the "Concatenation Function" that everyone has come to know and hate runs so terribly slow. Well, it's not SQL Server 2000, T-SQL, or the UDF that's slow...

    In every concatenation example code I've seen (including a couple that yours-truly has made the mistake of writing :blush: ), they do it absolutely the wrong way... first, they make a UDF that looks something like this... (format from the the "Omnibuzz" URL that Matt posted...--=======================================================

    -- Written by someone who doesn't really know how to use

    -- the COALESCE function. Requires the STUFF function

    -- to remove the leading comma the code produces.

    --=======================================================

    CREATE FUNCTION dbo.fnConcat (@Category INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @Result VARCHAR(1000)

    SELECT @Result = COALESCE(@Result,'')+','+Code

    FROM dbo.JBMTest

    WHERE Category = @Category

    RETURN STUFF(@Result,1,1,'')

    END

    Anyone care to identify what's wrong with that picture? :blink: Well, whoever wrote that bloody thing, doesn't really know how to use COALESCE here... if s(he) did, then they'd know that the extra STUFF function to remove the superfluous leading comma is absolutely NOT necessary if the COALESCE function is used correctly... like this......--=======================================================

    -- Better function uses the power of COALESCE to

    -- elimanate the need for the STUFF function.

    --=======================================================

    CREATE FUNCTION dbo.fnConcat (@Category INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @Result VARCHAR(1000)

    SELECT @Result = COALESCE(@Result+',','')+Code

    FROM dbo.JBMTest

    WHERE Category = @Category

    RETURN @Result

    ENDSee the difference??? Also, the problem with that code is that there is absolutely no guarantee as to the order of the parts of the result string. Ya gotta have an ORDER BY somewhere... Since I usually create tables with an IDENTITY column just for this reason, my final UDF code to solve Gails good problem definition is as follows:--=======================================================

    -- Correctly written function uses the power of the

    -- COALESCE function to eliminate the need for the STUFF

    -- function AND the ORDER BY of a "positional" column

    -- (could be an IDENTITY or DATETIME column) guarantees

    -- the parts of the result string will be in the correct

    -- order.

    --=======================================================

    CREATE FUNCTION dbo.fnConcat(@Category INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @Return VARCHAR(8000)

    SELECT @Return = COALESCE(@Return+',','')+Code

    FROM dbo.JBMTest

    WHERE Category = @Category

    ORDER BY RowNum

    RETURN @Return

    ENDNow... someone goes through all that trouble to guarantee that the function works fast and correctly, like in the above, and then they write some really terrible code to use it... probably because they don't know how GROUP BY or DISINCT works anymore than they know how COALESCE works. 😛 PLEASE don't bother running this code... I stopped it after 9 minutes...

    --=======================================================

    -- This is some really bad code that will take

    -- comparitively forever to run. Can YOU see why?

    --=======================================================

    DECLARE @Dummy1 INT, @Dummy2 VARCHAR(8000)

    SELECT @Dummy1 = Category,

    @Dummy2 = dbo.fnConcat(Category)

    FROM dbo.JBMTest

    GROUP BY Category

    So, why am I being so tough on that simple bit of seeming set-based code? The answer is that the author of that code didn't account for what's going to happen behind the scenes... if there are 100,000 rows of 250 categories, that means each category will have an (approximate) average of 400 "Codes".

    The code above will do a GROUP BY on 100,000 rows... the function will process an average of 400 rows per row... 100,000 x 400 is a whopping big 40 MILLION rows that need to not only be calculated, but then also need to be grouped as well. (Another way to state this is that the correct answer for each Category will be calculated 400 times!) And then folks have the nerve to say that T-SQL sucks at concatenation! :hehe: It's not T-SQL, it's the code writer that doesn't know how to correctly use T-SQL or what it does behind the scenes! 😀

    I know, I know... "Put your money where your mouth is, Jeff." OK... Let's get started...

    First, here's the code to setup a test table as Gail suggested... 100,000 rows, 250 "Categories", 26 different single letter "Codes"... this takes about 1 second to generate...--===== Create and populate a 100,000 row test table.

    -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

    -- Column "Category" has a range of 1 to 250 non-unique numbers

    -- Column "Code" has a range of "A-Z"

    -- Takes just over a second to execute including creation of the PK.

    SELECT TOP 100000

    RowNum = IDENTITY(INT,1,1),

    Category = ABS(CHECKSUM(NEWID()))%250+1,

    Code = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (RowNum)

    As a reminder... here's the correct function to use for concatenation of this table as Gail described...--=======================================================

    -- Correctly written function uses the power of the

    -- COALESCE function to eliminate the need for the STUFF

    -- function AND the ORDER BY of a "positional" column

    -- (could be an IDENTITY or DATETIME column) guarantees

    -- the parts of the result string will be in the correct

    -- order.

    --=======================================================

    CREATE FUNCTION dbo.fnConcat(@Category INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @Return VARCHAR(8000)

    SELECT @Return = COALESCE(@Return+',','')+Code

    FROM dbo.JBMTest

    WHERE Category = @Category

    ORDER BY RowNum

    RETURN @Return

    END

    ...and, here's the CORRECT way to use the function to concatenate as Gail described...--=======================================================

    -- The right way to make a Comedy Delimited column...

    -- elimate the duplication for the Category column up

    -- front instead of recalculating the same answer over

    -- and over and over as done with the GROUP BY on both

    -- the Category and function.

    --=======================================================

    DECLARE @Dummy1 INT, @Dummy2 VARCHAR(8000)

    SELECT @Dummy1 = d.Category,

    @Dummy2 = dbo.fnConcat(d.Category)

    FROM (--==== Derived table "d" finds distinct Category

    SELECT DISTINCT Category

    FROM dbo.JBMTest

    )dTHAT code returns in even less time (6 seconds on my slow 6year old desktop... can't wait for Matt to run it on his box!) than the XML PATH method that everyone has become so enamoured with. And, THAT's without any special indexing... you should see how fast it runs with the correctly designed index...

    --===== Create an index 6 times the performance

    CREATE INDEX IX_JBMTest_Category_Code

    ON dbo.JBMTest (Category,Code)

    Matt, if you'd like to do the honors on the same box that you tested the XML Path code on, here's the whole shootin' match in one script...

    --===== Use a safe database that everyone has

    USE TempDB

    GO

    --===== If the test table exists, drop it

    IF OBJECT_ID('dbo.JBMTest','U') IS NOT NULL

    DROP TABLE dbo.JBMTest

    GO

    PRINT 'Creating the test table in TempDB, where it''s safe...'

    --===== Create and populate a 100,000 row test table.

    -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

    -- Column "Category" has a range of 1 to 250 non-unique numbers

    -- Column "Code" has a range of "A-Z"

    -- Takes just over a second to execute including creation of the PK.

    SELECT TOP 100000

    RowNum = IDENTITY(INT,1,1),

    Category = ABS(CHECKSUM(NEWID()))%250+1,

    Code = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Create an index for additional speed

    CREATE INDEX IX_JBMTest_Category_Code

    ON dbo.JBMTest (Category,Code)

    GO

    --========================================================================================

    --===== If the function already exists, drop it

    IF OBJECT_ID('dbo.fnConcat') IS NOT NULL

    DROP FUNCTION dbo.fnConcat

    GO

    --===== Creat the function for the test

    CREATE FUNCTION dbo.fnConcat(@Category INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @Return VARCHAR(8000)

    SELECT @Return = COALESCE(@Return+',','')+Code

    FROM dbo.JBMTest

    WHERE Category = @Category

    ORDER BY RowNum

    RETURN @Return

    END

    GO

    --=======================================================================================

    PRINT REPLICATE('=',78)

    PRINT SPACE(16)+'Jeff Moden''s "Improved" Concatenation Method".'

    PRINT REPLICATE('-',78)

    --=======================================================================================

    --===== Declare and start a time to measure duration with

    DECLARE @StartTime DATETIME

    SET @StartTime = GETDATE()

    --===== Run the concatenation code into some variable to eliminate any disparity between

    -- display times on different machines

    DECLARE @Dummy1 INT, @Dummy2 VARCHAR(8000)

    SELECT @Dummy1 = d.Category,

    @Dummy2 = dbo.fnConcat(d.Category)

    FROM (--==== Derived table "d" finds distinct Category

    SELECT DISTINCT Category

    FROM dbo.JBMTest

    )d

    --===== Display the duration

    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('=',78)

    --===========================================================================================================

    -- Houskeeping... cleanup the stuff we put on Matt's drive

    --===========================================================================================================

    --===== If the target table already exists, drop it

    IF OBJECT_ID('TempDB.dbo.JBMTest','U') IS NOT NULL

    DROP TABLE TempDB.dbo.JBMTest

    --===== If the function already exists, drop it

    IF OBJECT_ID('dbo.fnConcat') IS NOT NULL

    DROP FUNCTION dbo.fnConcat

    Here's what I get for 100,000 rows with 250 categories (avg of 400 codes per category)...


    Creating the test table in TempDB, where it's safe...

    (100000 row(s) affected)

    ==============================================================================

    Jeff Moden's "Improved" Concatenation Method".

    ------------------------------------------------------------------------------

    00:00:01:110 Duration (hh:mi:ss:mmm)

    ==============================================================================

    ... and here's what I get with 1,000,000 rows with 500 categories (avg of 2000 codes per category).


    Creating the test table in TempDB, where it's safe...

    (1000000 row(s) affected)

    ==============================================================================

    Jeff Moden's "Improved" Concatenation Method".

    ------------------------------------------------------------------------------

    00:00:10:903 Duration (hh:mi:ss:mmm)

    ==============================================================================

    Fast enough for ya? 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff - one last update on Phone Test #1.

    I got the results to match. two smallish updates:

    - changed the AND t.N < LEN(m.Doc)-14 to AND t.N < LEN(m.Doc)-11

    and that got rid of MOST of the missing ones.

    - the other one involves dealing with a difference in behavior between 2000 and 2005 with regards to

    the "inner loop" process (the dynamic SQL statement). It does some funky things unless you add some dummy values to

    make sure the @ variables are updating correctly.

    So the updated dynamic looks like:

    SET @SQL = 'DECLARE @PrevRID INT, @PrevIndex INT , @dummy INT,@dummy2 INT

    SELECT @PrevRID = 0,@PrevIndex = 0

    UPDATE TempDB.dbo.MyHead WITH (TABLOCK)

    SET @PrevIndex = [Index] = CASE WHEN RID = @PrevRID THEN @PrevIndex+1 ELSE 1 END,

    @dummy=@previndex,

    @PrevRID = RID,

    @dummy2=@prevrid

    FROM TempDB.dbo.MyHead

    WITH (INDEX(IX_tmpMyHead_RID_StartPos)) WHERE StartPos > 0'

    EXEC (@SQL)

    No noticeable difference in processing time.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff - I was going to quote you...but I didn't know what to pick up. You're right - most of the examples in there are fairly disgusting SQL.

    That being said - I took your script and added two more contenders, and came out with this. The 100,000 row test was TOO fast. As Tim "the Toolman" Taylor would say - "your table lacked power...so I reworked it". (It's okay, grunting is allowed). Only change I made was to bop the rowcount to 10,000,000 and the category count to 25,000 (so we don't blow out the varchar limit).

    Results...

    (10000000 row(s) affected)

    ==============================================================================

    Jeff Moden's "Improved" Concatenation Method".

    ------------------------------------------------------------------------------

    00:00:21:547 Duration (hh:mi:ss:mmm)

    ==============================================================================

    ==============================================================================

    The XML path() method

    ------------------------------------------------------------------------------

    00:00:08:423 Duration (hh:mi:ss:mmm)

    ==============================================================================

    ==============================================================================

    The CLR UDA version

    ------------------------------------------------------------------------------

    00:00:20:517 Duration (hh:mi:ss:mmm)

    ==============================================================================

    As I said earlier - I was very surprised by this result as well... The User-defined aggregate dbo.concatenate is a copy and paste out of the BOL example, and it definitely held its own. The XML Path just tore it up....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • No noticeable difference in processing time.

    You mean on the dynamic SQL by itself or the whole Phone Test #1?

    Thanks for the great tip (we call them "oolies" in the US Navy) on the dynamic SQL in 2k5...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • ==============================================================================

    The XML path() method

    ------------------------------------------------------------------------------

    00:00:08:423 Duration (hh:mi:ss:mmm)

    ==============================================================================

    Now... THAT's impressive! It reduced the processing time by almost 2/3rds. Rare thing when you find a gem like that. Thanks!

    EDIT: What I REALLY like about it, is it make the CLR blow chunks! Of course, my T-SQL didn't fair any better, but it was a close race 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (11/15/2007)


    No noticeable difference in processing time.

    You mean on the dynamic SQL by itself or the whole Phone Test #1?

    Thanks for the great tip (we call them "oolies" in the US Navy) on the dynamic SQL in 2k5...

    No noticeable time difference processing phone Test 1 once I made the adjustments. Just got the match counts equal on both methods.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Sorry... must still have the "ranger eye's" going on from lack of sleep the other day... I just say the results you posted for the run.

    Thanks for fixing my boo-boo. I also gotta say that was a hell of a lot of fun! All of it! Learned a heck of a lot, today!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Took a while to get the code realigned correctly after the copy/paste operation

    Matt,

    Reformatting code that you've copied from those "Code" windows is actually pretty easy (although I wish they'd just fix it so it works right!!!).... put the mouse pointer just above a code window (must be outside the code window). Click and hold while you select down past the code window (again, must be outside the code window).

    Copy and paste the select into word... leading spaces will be preserved. Then, do a Search'n'Replace... replace ^l (circumflex small "L", manual line break in Word) with ^p (circumflex small "P", paragraph mark in Word).

    Then, copy from the Word document into Query Analyzer (or that "thing" for 2k5 😛 ) and all is pretty... 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (11/15/2007)


    Took a while to get the code realigned correctly after the copy/paste operation

    Matt,

    Reformatting code that you've copied from those "Code" windows is actually pretty easy (although I wish they'd just fix it so it works right!!!).... put the mouse pointer just above a code window (must be outside the code window). Click and hold while you select do past the code window (again, must be outside the code window).

    Copy and paste the select into word... leading spaces will be preserved. Then, do a Search'n'Replace... replace ^l (circumflex small "L", manual line break in Word) with ^p (circumflex small "P", paragraph mark in Word).

    Then, copy from the Word document into Query Analyzer (or that "thing" for 2k5 😛 ) and all is pretty... 😉

    I umm...know... I've been avoiding that method ever since I fat-fingered some SSC code into the middle of several very large technical documents.....:blink:

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff Moden (11/15/2007)


    Sorry... must still have the "ranger eye's" going on from lack of sleep the other day... I just say the results you posted for the run.

    Thanks for fixing my boo-boo. I also gotta say that was a hell of a lot of fun! All of it! Learned a heck of a lot, today!

    So did I - it turned out to be VERY useful. I'm still mulling over these results, because I think the best solution might be a combination of the two processes. There seems to be a giant suction noise coming from the performance factor of a CROSS APPLY. The RegexMatches TVF seems to kick some butt, but using it in a cross apply seems to even the game out. I'm wondering how a (scalar) regexMatch function hooked up to a Tally Table might do.

    ...

    Where RegexMatch would take in string to check, pattern to look for, and the occurence number, and would return the nth match in a string.

    ...

    Let me think on it this weekend. I have to be on a plane by 6AM, so tonight's not an option.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 46 through 60 (of 63 total)

You must be logged in to reply to this topic. Login to reply