Catch Data between symbols in SQL

  • Hi geniuses,

    I need to catch some data from a column named AllLevels, that contains, in each line:

    Analysis»Letters»Numbers»Detail

    I want to be able to separate the data, in order to organize it this way:

    LevelI LevelIILevelIIILevelIV

    Analysis LettersNumbers Detail

    Thanks in advance.

    Regards

  • Can you post ddl and sample data. See the first link in my signature.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I'm creating the query from scratch, can't really expose data.

    Can you start based on that? This is mostly a query problem.

    AllLevels

    Analysis»Letters»Numbers»Detail

    Description»Letters»Numbers»Detail

    Planning»Letters»Symbols»Detail

    Thanks in advance.

    Regards

  • As always the case with matching/parsing tasks, it's helpful to actually see the real data to be able to offer a meaningful solution/advise.

    If you cannot expose sensitive data, can you make an effort and generate a couple [or more] of bogus records that are similar to the original ones in complexity and pattern?

  • are there deeper than 4 levels in the "real" data? are there always exactly four levels?

    the right solution depends a lot on some assumptions we can make with the data

    here's just one example, using PARSENAME, which expects a maximum of four sections:

    With MySampleData (AllLevels)

    AS

    (

    SELECT 'Analysis»Letters»Numbers»Detail' UNION ALL

    SELECT 'Description»Letters»Numbers»Detail' UNION ALL

    SELECT 'Planning»Letters»Symbols»Detail'

    )

    SELECT

    PARSENAME(REPLACE(AllLevels,'»','.'),4) As Level1,

    PARSENAME(REPLACE(AllLevels,'»','.'),3) As Level2,

    PARSENAME(REPLACE(AllLevels,'»','.'),2) As Level3,

    PARSENAME(REPLACE(AllLevels,'»','.'),1) As Level4

    FROM MySampleData

    other possibilities are using a custom function to do the same, or the DelimitedSplit8K function and a PIVOT to get the data in the desired format.

    here, a scalar function seems to be an easier implementation for me, i think.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • You know I was about to reply with a query to parse the string out but then noticed this in Sean's signature.

    Need to split a string? Try Jeff Moden's splitter[/url].

    I would read that and try out the function he has near the bottom. It will split your string into multiple rows. Use cross apply to join it to your table, then you can group the different rows back together to give you your columns.

    Or you can write some parsing code using LEFT, SUBSTRING and CHARINDEX. Even assuming that all 4 levels will always be there it isn't exactly trivial (at least not for me.)

    Here is a start to it. You can see using Jeff's splitter might be easier though.

    declare @temp varchar(250)

    SET @temp = 'Analysis»Letters»Numbers»Detail'

    SELECT LEFT(@temp,charindex('»',@temp)-1),

    SUBSTRING(@temp, charindex('»',@temp)+1, charindex('»',@temp,charindex('»',@temp)+1)-charindex('»',@temp)-1)

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

  • It is very easy to create a small script that creates the table and insert the data into the table. Since it does save some time from anyone that tries to help you, you should include in your question such a script. Here is a small script that creates the table and inserts data:

    create table Demo (vc varchar(200))

    go

    insert into Demo (vc) values ('Analysis»Letters»Numbers»Detail')

    insert into Demo (vc) values ('Description»Letters»Numbers»Detail')

    insert into Demo (vc) values ('Planning»Letters»Symbols»Detail')

    If you don't have dots inside the strings, you can use the function parsename. This function is intended to parse full name of objects that are referenced in the database and give the name of the server or database or schema or object. Since those parts are separated by period, the code bellow will work if there is no period inside your strings. If you can have periods inside the strings, you'll have to modify it.

    select parsename(replace(vc,'»','.'),4), parsename(replace(vc,'»','.'),3), parsename(replace(vc,'»','.'),2), parsename(replace(vc,'»','.'),1)

    from Demo

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thanks Guys.

    Lowell, yes there are more levels indeed.

    Regards

  • davdam8 (2/6/2013)


    Thanks Guys.

    Lowell, yes there are more levels indeed.

    Regards

    Then I would suggest you look at the article in my signature about splitting strings. Then look at the articles in my signature about cross tabs and pivots. The techniques found in those will get what you need.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I worked it out. This is the query that would get you the results you want:

    --Creating Table

    Create Table Ex1

    (

    Id Int Identity(1,1),

    AllLevels Varchar(MAX)

    )

    --Inserting Sample Data

    Insert Into Ex1

    Select 'Analysis»Letters»Numbers»Detail'

    Union ALL

    Select 'Description»Letters»Numbers»Detail'

    Union ALL

    Select 'Planning»Letters»Symbols»Detail'

    --Query for your requirement

    Select Id,

    MAX(Case When rn = 'Level1' Then df Else '' End) As Level1,

    MAX(Case When rn = 'Level2' Then df Else '' End) As Level2,

    MAX(Case When rn = 'Level3' Then df Else '' End) As Level3,

    MAX(Case When rn = 'Level4' Then df Else '' End) As Level4

    From

    (

    Select Id, AllLevels, a.df, 'Level' + Cast(ROW_NUMBER() Over(Partition By Id Order By Id) As Varchar) As rn From Ex1 As b

    Cross Apply

    (

    SELECT SUBSTRING('»' + b.AllLevels + '»',N+1,CHARINDEX('»','»' + b.AllLevels + '»',N+1)-N-1) As df

    FROM dbo.Tally As a

    WHERE a.N < LEN('»' + b.AllLevels + '»')

    AND SUBSTRING('»' + b.AllLevels + '»',N,1) = '»'

    ) a

    ) As p

    Group By Id

    But, it takes a 8 looooong seconds to get the result for this teeeny sample data. So, for tables with huge amounts of data this would definitely not be a good one.

    This means that your design is not good. Inserting Delimited strings and pivoting them is not what SQL Server is meant to do.

    You should rather do it with a reporting tool.

    This is what I could come up with...may be someone might come up with something better.

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • You can switch the Cross Apply in the query with an Outer Apply, that would reduce the time by more than 70%, as follows :

    Select Id,

    MAX(Case When rn = 'Level1' Then df Else '' End) As Level1,

    MAX(Case When rn = 'Level2' Then df Else '' End) As Level2,

    MAX(Case When rn = 'Level3' Then df Else '' End) As Level3,

    MAX(Case When rn = 'Level4' Then df Else '' End) As Level4

    From

    (

    Select Id, AllLevels, a.df, 'Level' + Cast(ROW_NUMBER() Over(Partition By Id Order By Id) As Varchar) As rn From Ex1 As b

    Outer Apply

    (

    SELECT SUBSTRING('»' + b.AllLevels + '»',N+1,CHARINDEX('»','»' + b.AllLevels + '»',N+1)-N-1) As df

    FROM dbo.Tally As a

    WHERE a.N < LEN('»' + b.AllLevels + '»')

    AND SUBSTRING('»' + b.AllLevels + '»',N,1) = '»'

    ) a

    ) As p

    Group By Id

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • Even Better.......if you create a Non Clustered index On the Tally table then you get the results even fatser.

    Create NonClustered Index NCLIX_Tally On Tally(n)

    Select Id,

    MAX(Case When rn = 'Level1' Then df Else '' End) As Level1,

    MAX(Case When rn = 'Level2' Then df Else '' End) As Level2,

    MAX(Case When rn = 'Level3' Then df Else '' End) As Level3,

    MAX(Case When rn = 'Level4' Then df Else '' End) As Level4

    From

    (

    Select Id, AllLevels, a.df, 'Level' + Cast(ROW_NUMBER() Over(Partition By Id Order By Id) As Varchar) As rn From Ex1 As b

    Outer Apply

    (

    SELECT SUBSTRING('»' + b.AllLevels + '»',N+1,CHARINDEX('»','»' + b.AllLevels + '»',N+1)-N-1) As df

    FROM dbo.Tally As a

    WHERE a.N < LEN('»' + b.AllLevels + '»')

    AND SUBSTRING('»' + b.AllLevels + '»',N,1) = '»'

    ) a

    ) As p

    Group By Id

    The time and IO of the query after adding the index is as follows:

    Table 'Tally'. Scan count 3, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Ex1'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 119 ms.

    If you don't know what a Tally table is then you need to check the following link. It shows how to create a tally Table and how to split a string using it.

    Jeoff Moden's Splitter using Tally Table[/url]

    Hope this helps. 🙂

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • ok, still sticking with a scalar function, here's a parsename enhancement I slapped together a year and a half ago; it takes an unlimited number of elements:

    the original PARSENAME thinks of elements as going right to left...that is the first element is the right most;

    my example is the opposite... the leftmost substring is 1, then 2 etc., which actually fits better with this requirement.

    see the thread here if you want the details:

    http://www.sqlservercentral.com/Forums/Topic1176642-1292-1.aspx#bm1176804

    also see this thread: dwainc posted a solution using the DelimitedSplit8K solution, and his solution is roughly eight times faster on a million rows of data:

    http://www.sqlservercentral.com/Forums/Topic1372793-1292-1.aspx

    With MySampleData (AllLevels)

    AS

    (

    SELECT 'Analysis»Letters»Numbers»Detail' UNION ALL

    SELECT 'Description»Letters»Numbers»Detail' UNION ALL

    SELECT 'Planning»Letters»Symbols»Detail' UNION ALL

    SELECT 'Analysis»Letters»Numbers»Detail»SubDetail' UNION ALL

    SELECT 'Description»Letters»Numbers»Detail»Version»Major' UNION ALL

    SELECT 'Planning»Letters»Symbols»Detail»Version»Major»Minor»Revision»SubDetail»'

    )

    SELECT

    dbo.fn_parsename(AllLevels,'»',1) As Level1,

    dbo.fn_parsename(AllLevels,'»',2) As Level2,

    dbo.fn_parsename(AllLevels,'»',3) As Level3,

    dbo.fn_parsename(AllLevels,'»',4) As Level4,

    dbo.fn_parsename(AllLevels,'»',5) As Level5,

    dbo.fn_parsename(AllLevels,'»',6) As Level6,

    dbo.fn_parsename(AllLevels,'»',7) As Level7,

    dbo.fn_parsename(AllLevels,'»',8) As Level8,

    dbo.fn_parsename(AllLevels,'»',9) As Level9

    FROM MySampleData

    and here's my version for you to enhance or change if you want:

    CREATE FUNCTION dbo.fn_parsename

    (

    @pString VARCHAR(7999),

    @pDelimiter CHAR(1),

    @Occurrance int

    )

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @Results VARCHAR(8000)

    --===== "Inline" CTE Driven "Tally Table” produces values up to

    -- 10,000... enough to cover VARCHAR(8000)

    ;WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4) ,

    --===== Do the split

    InterResults

    AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,

    SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item

    FROM cteTally

    WHERE N < LEN(@pString) + 2

    AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter

    )

    SELECT @Results = Item FROM InterResults WHERE ItemNumber = @Occurrance

    return @Results

    END --FUNCTION

    GO

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • vinu512 (2/7/2013)


    I worked it out. This is the query that would get you the results you want:

    --Creating Table

    Create Table Ex1

    (

    Id Int Identity(1,1),

    AllLevels Varchar(MAX)

    )

    --Inserting Sample Data

    Insert Into Ex1

    Select 'Analysis»Letters»Numbers»Detail'

    Union ALL

    Select 'Description»Letters»Numbers»Detail'

    Union ALL

    Select 'Planning»Letters»Symbols»Detail'

    --Query for your requirement

    Select Id,

    MAX(Case When rn = 'Level1' Then df Else '' End) As Level1,

    MAX(Case When rn = 'Level2' Then df Else '' End) As Level2,

    MAX(Case When rn = 'Level3' Then df Else '' End) As Level3,

    MAX(Case When rn = 'Level4' Then df Else '' End) As Level4

    From

    (

    Select Id, AllLevels, a.df, 'Level' + Cast(ROW_NUMBER() Over(Partition By Id Order By Id) As Varchar) As rn From Ex1 As b

    Cross Apply

    (

    SELECT SUBSTRING('»' + b.AllLevels + '»',N+1,CHARINDEX('»','»' + b.AllLevels + '»',N+1)-N-1) As df

    FROM dbo.Tally As a

    WHERE a.N < LEN('»' + b.AllLevels + '»')

    AND SUBSTRING('»' + b.AllLevels + '»',N,1) = '»'

    ) a

    ) As p

    Group By Id

    But, it takes a 8 looooong seconds to get the result for this teeeny sample data. So, for tables with huge amounts of data this would definitely not be a good one.

    This means that your design is not good. Inserting Delimited strings and pivoting them is not what SQL Server is meant to do.

    You should rather do it with a reporting tool.

    This is what I could come up with...may be someone might come up with something better.

    Sooooo close Vinu! Just a few little tweaks and it runs like a rocket:

    --Query for your requirement

    SELECT b.*, x.*

    FROM #Ex1 As b

    CROSS APPLY ( -- x

    SELECT

    MAX(Case When rn = 'Level1' Then df Else '' End) As Level1,

    MAX(Case When rn = 'Level2' Then df Else '' End) As Level2,

    MAX(Case When rn = 'Level3' Then df Else '' End) As Level3,

    MAX(Case When rn = 'Level4' Then df Else '' End) As Level4

    FROM ( -- p

    SELECT

    a.df,

    rn = 'Level' + CAST(ROW_NUMBER() OVER(Partition By Id Order By Id) As Varchar)

    FROM ( -- a

    SELECT df = LEFT(b.AllLevels, CHARINDEX('»',b.AllLevels,1)-1)

    UNION ALL

    SELECT df = SUBSTRING(b.AllLevels,N+1,

    ISNULL(NULLIF(CHARINDEX('»',b.AllLevels,n+1),0)-(n+1),8000))

    FROM dbo.Tally As a

    WHERE a.N < = DATALENGTH(b.AllLevels)

    AND SUBSTRING(b.AllLevels,N,1) = '»'

    ) a

    ) p

    ) x

    ORDER BY Id

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • How about using the DelimitedSplit8k? This seems to be super fast to me.

    Select Id,

    MAX(Case When ItemNumber = 1 Then Item Else '' End) As Level1,

    MAX(Case When ItemNumber = 2 Then Item Else '' End) As Level2,

    MAX(Case When ItemNumber = 3 Then Item Else '' End) As Level3,

    MAX(Case When ItemNumber = 4 Then Item Else '' End) As Level4

    From

    (

    select *

    from Ex1

    cross apply dbo.DelimitedSplit8K(ex1.AllLevels, '»')

    ) As p

    Group By Id

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply