string manipulation help

  • I have the following values..

    IT

    IT.Control.Bing

    IT.Control.Bing.AC Review

    IT.FldSvc.ARTS.Admin

    IT.FldSvc.ARTS.Hardware.Printers/Copiers/Fax

    IT.ProdSupp.UPMB.OASYS

    IT.ProdSuppE.Atlanta.Accounting.Validation

    I need to parse the string and return all values after the 3rd period. The return values would be.....

    AC Review

    Admin

    Hardware.Printers/Copiers/Fax

    OASYS

    Accounting.Validation

    I also want to grouped these values to the characters before the 2nd period. In other words....

    AC Review will be grouped under IT.Control

    Admin and Hardware.Printers/Copiers/Fax will be grouped under IT.FldSvc

    OASYS will be grouped under IT.ProdSupp

    Accounting.Validation will be grouped under IT.ProdSuppE

    Help please.....

  • You can create two user defined functions. One retrieves the names (after two dot) and one retrieves the group names.

    Here is an example of user defined function. It can be run using Northwind database of SQL Server 2000.

    CREATE FUNCTION TestFun (@input nvarchar(100) )

    RETURNS nvarchar(30)

    AS

    BEGIN

    return substring(@input, 1, 2)

    END

    GO

    select dbo.TestFun(productname), count(*)

    from products

    group by dbo.TestFun(productname)

    By the way, you need to use "CHARINDEX" to find the "dot" in your user defined functions.

    http://www.speedydb.com

    Charles Zhang

  • You haven't provided your table schema so I have used the following in my query below:

    CREATE TABLE Items (

    name varchar(100) NOT NULL

    )

    GO

    INSERT INTO Items

    SELECT 'IT'

    UNION ALL SELECT 'IT.Control.Bing'

    UNION ALL SELECT 'IT.Control.Bing.AC Review'

    UNION ALL SELECT 'IT.FldSvc.ARTS.Admin'

    UNION ALL SELECT 'IT.FldSvc.ARTS.Hardware.Printers/Copiers/Fax'

    UNION ALL SELECT 'IT.ProdSupp.UPMB.OASYS'

    UNION ALL SELECT 'IT.ProdSuppE.Atlanta.Accounting.Validation'

    Here's the query.

    SELECT LEFT(I.name, I.P2 - 1), RIGHT(I.name, LEN(I.name) - I.P3)

    FROM (

    SELECT I3.name, I3.P2, P3 = CHARINDEX('.', I3.name, I3.P2 + 1)

    FROM (

    SELECT I2.name, P2 = CHARINDEX('.', I2.name, I2.P1 + 1)

    FROM (

    SELECT I1.name, P1 = CHARINDEX('.', I1.name)

    FROM Items I1

    ) I2

    ) I3

    ) I

    WHERE (I.P3 > 0)

    ORDER BY 1

    Here are the results of this query using the data above. I'm not sure how well the query will perform on a very large table.

    [font="Courier New"]IT.Control AC Review

    IT.FldSvc Admin

    IT.FldSvc Hardware.Printers/Copiers/Fax

    IT.ProdSupp OASYS

    IT.ProdSuppE Accounting.Validation[/font]

  • The following variation has a level of SELECT nesting that is one fewer, but gives the same results.

    SELECT LEFT(I.name, I.P2 - 1), RIGHT(I.name, LEN(I.name) - CHARINDEX('.', I.name, I.P2 + 1))

    FROM (

    SELECT I2.name, P2 = CHARINDEX('.', I2.name, I2.P1 + 1)

    FROM (

    SELECT I1.name, P1 = CHARINDEX('.', I1.name)

    FROM Items I1

    ) I2

    ) I

    WHERE (CHARINDEX('.', I.name, I.P2 + 1) > 0)

    ORDER BY 1

  • SELECT Name,

    SUBSTRING(Name,NULLIF(CHARINDEX('.',Name,NULLIF(CHARINDEX('.',Name,NULLIF(CHARINDEX('.',Name),0)+1),0)+1),0)+1,100)

    FROM dbo.ITEMS

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • hi

    i think smith is right

    [font="Courier New"]Aram Koukia: http://www.koukia.ca[/font]
  • aram_golbaghi (2/8/2009)


    hi

    i think smith is right

    About what and why?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Who can say if I'm right other than the OP, it depends on the OP's precise requirements.

    OP does state the following which Jeff's query doesn't do.

    I also want to grouped these values to the characters before the 2nd period.

    However, my queries do not return rows corresponding to the following strings because they do not contain at least 3 '.' characters. Only the OP knows how these should be dealt with.

    IT

    IT.Control.Bing

    I've tried comparing the performance of 4 different variations of my query where the number of nested select statements varies from 1 to 4. These 4 queries produce identical output. My testing used an Items table with 294,407 rows. There were no indexes defined on the table.

    The results of this testing shows that the number of nested SELECT statements make little difference to the execution time (< 5% difference in both CPU time and elapsed time between best and worst cases), and that the actual execution plans appear to be the same. It seems that the query optimizer is able to recognise that all 4 queries only require a single pass (in this case a table scan).

  • Here are the 4 queries I used for the testing mentioned in my previous post. Unlike my original queries, they do return a row corresponding to every row in the Items table. This is achieved by using CASE WHEN clauses in the SELECT list and removing the WHERE clause.

    PRINT ''

    PRINT '--------------------------'

    PRINT '4 nested select statements'

    SET STATISTICS TIME ON

    SELECT

    CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,

    CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END

    FROM (

    SELECT I3.name, I3.P2, P3 = CHARINDEX('.', I3.name, I3.P2 + 1)

    FROM (

    SELECT I2.name, P2 = CHARINDEX('.', I2.name, I2.P1 + 1)

    FROM (

    SELECT I1.name, P1 = CHARINDEX('.', I1.name)

    from Items I1

    ) I2

    ) I3

    ) I

    ORDER BY 1

    SET STATISTICS TIME OFF

    GO

    PRINT ''

    PRINT '--------------------------'

    PRINT '3 nested select statements'

    SET STATISTICS TIME ON

    SELECT

    CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,

    CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END

    FROM (

    SELECT I3.name, I3.P2, P3 = CHARINDEX('.', I3.name, I3.P2 + 1)

    FROM (

    SELECT I2.name, P2 = CHARINDEX('.', I2.name, CHARINDEX('.', I2.name) + 1)

    FROM Items I2

    ) I3

    ) I

    ORDER BY 1

    SET STATISTICS TIME OFF

    GO

    PRINT ''

    PRINT '--------------------------'

    PRINT '2 nested select statements'

    SET STATISTICS TIME ON

    SELECT

    CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,

    CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END

    FROM (

    SELECT

    I3.name,

    P2 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1),

    P3 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1) + 1)

    FROM Items I3

    ) I

    ORDER BY 1

    SET STATISTICS TIME OFF

    GO

    PRINT ''

    PRINT '--------------------------'

    PRINT 'Single select statement'

    SET STATISTICS TIME ON

    SELECT

    CASE WHEN (CHARINDEX('.', name, CHARINDEX('.', name) + 1) > 0) THEN

    LEFT(name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) - 1)

    ELSE name END,

    CASE WHEN (CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1) > 0) THEN

    RIGHT(name, LEN(name) - CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1))

    ELSE NULL END

    FROM Items

    ORDER BY 1

    SET STATISTICS TIME OFF

    GO

    Here are my timing statistics for the 4 queries.

    [font="Courier New"]--------------------------

    4 nested select statements

    (294407 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 5180 ms, elapsed time = 5586 ms.

    --------------------------

    3 nested select statements

    (294407 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 5117 ms, elapsed time = 5450 ms.

    --------------------------

    2 nested select statements

    (294407 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 5054 ms, elapsed time = 5507 ms.

    --------------------------

    Single select statement

    (294407 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 5102 ms, elapsed time = 5498 ms.[/font]

    Given that the performance of the 4 queries are equivalent, we are free to consider other factors such as readability when choosing between them. I personally find the 3rd query with a single derived table the most transparent of the four.

    SELECT

    CASE WHEN (I.P2 > 0) THEN LEFT(I.name, I.P2 - 1) ELSE I.name END,

    CASE WHEN (I.P3 > 0) THEN RIGHT(I.name, LEN(I.name) - I.P3) ELSE NULL END

    FROM (

    SELECT

    I3.name,

    P2 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1),

    P3 = CHARINDEX('.', I3.name, CHARINDEX('.', I3.name, CHARINDEX('.', I3.name) + 1) + 1)

    FROM Items I3

    ) I

    ORDER BY 1

  • Incidently, avoiding the use of the NULLIF function in these queries does give a significant performance improvement. The following compares Jeff's query with a functionally equivalent query that does not use the NULLIF function.

    PRINT ''

    PRINT '--------------------------'

    PRINT 'Without NULLIF'

    SET STATISTICS TIME ON

    SELECT

    name,

    CASE WHEN (CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1) > 0) THEN

    RIGHT(name, LEN(name) - CHARINDEX('.', name, CHARINDEX('.', name, CHARINDEX('.', name) + 1) + 1))

    ELSE NULL END

    FROM dbo.Items

    SET STATISTICS TIME OFF

    GO

    PRINT ''

    PRINT '--------------------------'

    PRINT 'With NULLIF'

    SET STATISTICS TIME ON

    SELECT

    name,

    SUBSTRING(name,NULLIF(CHARINDEX('.',name,NULLIF(CHARINDEX('.',name,NULLIF(CHARINDEX('.',name),0)+1),0)+1),0)+1,100)

    FROM dbo.Items

    SET STATISTICS TIME OFF

    GO

    Here are my timing statistics comparing the two queries:

    [font="Courier New"]--------------------------

    Without NULLIF

    (294407 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2621 ms, elapsed time = 3561 ms.

    --------------------------

    With NULLIF

    (294407 row(s) affected)

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 5070 ms, elapsed time = 5461 ms.[/font]

  • The task itself is stupid enough not to bother to run any benchmark tests.

    SQL Server is a RDMS, which stands for Relational Database Management System.

    Can you show me any piece of relational database here?

    Andrew, VB Script running against a flat file will beat any of your SQL options by times and times.

    So, I don't see any point to enter the losers' competition.

    If you mean to do it in SQL you need to create relational database first.

    Then you need to populate that database by parsing incoming strings.

    After that you'll get really effective output.

    _____________
    Code for TallyGenerator

  • Sergiy,

    The task itself is stupid enough not to bother to run any benchmark tests.

    I have to admit getting a bit carried away with this string manipulation problem. The only thing I'd say is that sometimes you are forced by circumstances to do things in SQL Server that it wasn't designed for.

  • Thanks Andrew. Your script is exactly what I needed. The data is already in a sql table and I am using SRS to create the report.

  • andrewd.smith (2/8/2009)


    Who can say if I'm right other than the OP, it depends on the OP's precise requirements.

    OP does state the following which Jeff's query doesn't do.

    I also want to grouped these values to the characters before the 2nd period.

    However, my queries do not return rows corresponding to the following strings because they do not contain at least 3 '.' characters. Only the OP knows how these should be dealt with.

    IT

    IT.Control.Bing

    I've tried comparing the performance of 4 different variations of my query where the number of nested select statements varies from 1 to 4. These 4 queries produce identical output. My testing used an Items table with 294,407 rows. There were no indexes defined on the table.

    The results of this testing shows that the number of nested SELECT statements make little difference to the execution time (< 5% difference in both CPU time and elapsed time between best and worst cases), and that the actual execution plans appear to be the same. It seems that the query optimizer is able to recognise that all 4 queries only require a single pass (in this case a table scan).

    Yep... you're definitely correct on that... I forgot the grouping. Sorry about that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • andrewd.smith (2/9/2009)


    The only thing I'd say is that sometimes you are forced by circumstances to do things in SQL Server that it wasn't designed for.

    Unless you're a professional.

    _____________
    Code for TallyGenerator

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply