substring comparison for last 2 characters

  • dwain.c (7/2/2013)


    Erik - Just because your Prod system doesn't have a million rows to test against, doesn't mean you can't test against a million rows!

    The test harness in the article demonstrates the generic approach for this (which I do not take credit for by the way).

    This would directly compare PatternSplitCM against your SVF.

    CREATE TABLE #Phones (phone VARCHAR(12));

    WITH Tally (n) AS (

    SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT 1))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    INSERT INTO #Phones

    SELECT 1000000+ABS(CHECKSUM(NEWID()))%1000000

    FROM Tally;

    DECLARE @Phone VARCHAR(12), @StartDT DATETIME;

    PRINT 'PatternSplitCM';

    SELECT @StartDT = GETDATE();

    SET STATISTICS TIME ON;

    SELECT @Phone=Item

    FROM #Phones

    CROSS APPLY dbo.PatternSplitCM(phone, '[0-9]')

    WHERE [Matched]=1;

    SET STATISTICS TIME OFF;

    SELECT ElapsedMS=DATEDIFF(millisecond, @StartDT, GETDATE());

    PRINT 'extractinteger';

    SELECT @StartDT = GETDATE();

    SET STATISTICS TIME ON;

    SELECT @Phone=dbo.extractinteger(phone)

    FROM #Phones;

    SET STATISTICS TIME OFF;

    SELECT ElapsedMS=DATEDIFF(millisecond, @StartDT, GETDATE());

    GO

    DROP TABLE #Phones;

    Edit: I modified it to SELECT the ElapsedMS of each run, which is needed due to a bias that is introduced by applying STATISTICS TIME to an SVF as demonstrated here: http://www.sqlservercentral.com/articles/T-SQL/91724/

    Hi Dwain,

    I'm having a little trouble getting your splitter to parse. After changing the way it sets local variables to work with 2k5, I get these errors:

    Msg 156, Level 15, State 1, Procedure PatternSplitLoop, Line 42

    Incorrect syntax near the keyword 'WHEN'.

    Msg 102, Level 15, State 1, Procedure PatternSplitLoop, Line 54

    Incorrect syntax near 'a'.

    In the mean time, I did run your performance test on my SVF alone:

    ElapsedMS

    401603

    But I have no idea if that's good or bad.

  • You're going to have a bit of a problem if you're using SQL 2005.

    You don't need to worry about PatternSplitLoop as it is not the fastest of the methods. Use PatternSplitCM instead.

    PatternSplitCM has one construct that is not compatible with SQL 2005. That is the method of generating the Tally table.

    WITH numbers AS (

    SELECT TOP(ISNULL(DATALENGTH(@List), 0))

    n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n))

    You'll need to change this to a Ben-gan style in-line tally table (using cascading CTEs). Like this:

    WITH Nbrs_3(n) AS (SELECT 1 UNION SELECT 0)

    ,Nbrs_2 (n) AS (SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2)

    ,Nbrs_1 (n) AS (SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2)

    ,Nbrs_0 (n) AS (SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2)

    ,Nbrs (n) AS (SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2)

    ,numbers (n) AS (SELECT TOP(ISNULL(DATALENGTH(@List), 0))

    ROW_NUMBER() OVER (ORDER BY n) As n FROM Nbrs)

    I think a direct replacement of what I've given you into the body of PatternSplitCM should make it compilable on SQL 2005 (I hope).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • @timscronin,

    From your initial and follow-up descriptions, it looks like Scott's solution works. Are you all set now or do you still need some help?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh and BTW, I was getting times of around 65000 ms (for PatternSplitCM) on my clocked down laptop against the 1,000,000 rows of random phone numbers I was generating in the test harness I gave you.

    While not directly comparable due to different machines, once you get PatternSplitCM working, I think you should see quite a difference.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Jeff Moden (7/2/2013)


    @timscronin,

    From your initial and follow-up descriptions, it looks like Scott's solution works. Are you all set now or do you still need some help?

    Good idea there to check back with the OP. Erik and I have hijacked this thread on a bit of a tangent.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (7/2/2013)


    Oh and BTW, I was getting times of around 65000 ms (for PatternSplitCM) on my clocked down laptop against the 1,000,000 rows of random phone numbers I was generating in the test harness I gave you.

    While not directly comparable due to different machines, once you get PatternSplitCM working, I think you should see quite a difference.

    You certainly were right:

    ElapsedMS

    14596

  • erikd (7/2/2013)


    dwain.c (7/2/2013)


    Oh and BTW, I was getting times of around 65000 ms (for PatternSplitCM) on my clocked down laptop against the 1,000,000 rows of random phone numbers I was generating in the test harness I gave you.

    While not directly comparable due to different machines, once you get PatternSplitCM working, I think you should see quite a difference.

    You certainly were right:

    ElapsedMS

    14596

    Glad to hear you got it working and that it worked well for you.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply