Split String into 3 parts with a twist!

  • I have the following sample data (original and desired results)

    with cteOriginal (CompanyName, CompName1, CompName2, CompName3)

    as

    (

    select 'A V H S L ALQUILER Y VENTA DE HOGARES', NULL, NULL, NULL union all

    select 'TAXI GIL PIERRE', NULL, NULL, NULL union all

    select 'DE GOUDSMID M HEIJKOOP', NULL, NULL, NULL union all

    select 'S P S S', NULL, NULL, NULL union all

    select 'BOUCHERIE CHARCUTERIE ST HENRI', NULL, NULL, NULL union all

    select 'A A M RODERKERKEN', NULL, NULL, NULL union all

    select 'O MUNDO E A NOSSA CASA - C H E', NULL, NULL, NULL)

    select * from cteOriginal

    ;with cteResult (CompanyName, CompName1, CompName2, CompName3)

    as

    (

    select 'A V H S L ALQUILER Y VENTA DE HOGARES', 'AVHS', 'ALQUILER', 'Y' union all

    select 'TAXI GIL PIERRE', 'TAXI', 'GIL', 'PIERRE' union all

    select 'DE GOUDSMID M HEIJKOOP', 'DE', 'GOUDSMID', 'M' union all

    select 'S P S S', 'SPSS', NULL, NULL union all

    select 'BOUCHERIE CHARCUTERIE ST HENRI', 'BOUCHERIE', 'CHARCUTERIE', 'ST' union all

    select 'A A M RODERKERKEN', 'AAM', 'RODERKERKEN', NULL union all

    select 'O MUNDO E A NOSSA CASA - C H E', 'O', 'MUNDO', 'E')

    select * from cteResult

    I have functions that extract first and second words but this isn't working for records like number 4.

    Any clever suggestions?

    Thanks.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Not exactly clever nor pretty but this might get you close:

    with cteOriginal (CompanyName, CompName1, CompName2, CompName3)

    as

    (

    select 'A V H S L ALQUILER Y VENTA DE HOGARES', NULL, NULL, NULL union all

    select 'TAXI GIL PIERRE', NULL, NULL, NULL union all

    select 'DE GOUDSMID M HEIJKOOP', NULL, NULL, NULL union all

    select 'S P S S', NULL, NULL, NULL union all

    select 'BOUCHERIE CHARCUTERIE ST HENRI', NULL, NULL, NULL union all

    select 'A A M RODERKERKEN', NULL, NULL, NULL union all

    select 'O MUNDO E A NOSSA CASA - C H E', NULL, NULL, NULL)

    SELECT CompanyName, CompName1, CompName2

    ,CompName3=LTRIM(LEFT(CompName3, CHARINDEX(' ', CompName3 + ' ')))

    FROM (

    SELECT CompanyName, CompName1

    ,CompName2=LTRIM(LEFT(CompName2, CHARINDEX(' ', CompName2 + ' ')))

    ,CompName3=LTRIM(RIGHT(CompName2, LEN(CompName2)-LEN(LEFT(CompName2, CHARINDEX(' ', CompName2 + ' ')))))

    FROM (

    SELECT CompanyName, CompName1=REPLACE(CompName1, ' ', '')

    ,CompName2=LTRIM(SUBSTRING(CompanyName, DATALENGTH(CompName1)+1, LEN(CompanyName)))

    FROM (

    SELECT CompName1=

    CASE WHEN PATINDEX('[A-Z][ ][A-Z][ ][A-Z][ ][A-Z][ ]%', CompanyName + ' ') = 1

    THEN LEFT(CompanyName, 8)

    WHEN PATINDEX('[A-Z][ ][A-Z][ ][A-Z][ ]%', CompanyName + ' ') = 1

    THEN LEFT(CompanyName, 6)

    WHEN PATINDEX('[A-Z][ ][A-Z][ ]%', CompanyName + ' ') = 1

    THEN LEFT(CompanyName, 4)

    WHEN PATINDEX('[A-Z][ ]%', CompanyName + ' ') = 1

    THEN LEFT(CompanyName, 2)

    ELSE LEFT(CompanyName, CHARINDEX(' ', CompanyName)-1) END

    ,CompanyName

    FROM cteOriginal) a) a) a


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thanks dwain.

    You should see what I came up with lol... ii's shockingly bad compared to yours.

    Mine has so many IF statments lol!

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • I should have been clearer with my problem definition!

    The rules are as follows:

    I have a string = 'A B C D EFG ProperWord QR S T'

    I want to generate a comma delimited 3 part string with the following rules:

    1) Part1 = Concatenate all single chacaters in the string until you reach part of string where two or more characters are joined

    So based on the above string Part1 would be ABCD

    2) Part2 = EFG

    3) Part3 = ProperWord

    This is some of the code I'be writing but I just can't get this to work well!

    ALTER FUNCTION dbo.fn_NormaliseOrgName(@OrgName NVARCHAR(200))

    RETURNS NVARCHAR(200)

    AS

    BEGIN

    declare @OrgName NVARCHAR(200) = 'RA M I R A DI PETTI C & C SAS'

    DECLARE @Part1 NVARCHAR(200) = ''

    DECLARE @Part2 NVARCHAR(200) = ''

    DECLARE @Part3 NVARCHAR(200) = ''

    IF PATINDEX('%[A-Z][A-Z]%', @OrgName) > 1

    BEGIN

    SET @Part1 = LEFT(@OrgName, PATINDEX('%[A-Z][A-Z]%', @OrgName)-1)

    IF LTRIM(RTRIM(REPLACE(@OrgName, @Part1, '') = ''

    BEGIN

    RETURN @Part1 + ',' + @Part2 + ',' + @Part3

    END

    ELSE

    select @Part1 = case when PATINDEX('%[A-Z][A-Z]%', @OrgName) > 1

    then left(@OrgName, PATINDEX('%[A-Z][A-Z]%', @OrgName)-1)

    else LEFT(@OrgName, patindex('%[ ]%', @OrgName + ' ')-1) end

    select @part1

    select @Part2 = substring(@OrgName, len(@Part1)+1 , LEN(@OrgName))

    select @Part2

    --select @Part1

    select @Part2 = left( ltrim(substring(@OrgName, len(@Part1)+1 , LEN(@OrgName))), charindex(' ', ltrim(substring(@OrgName, len(@Part1)+1 , LEN(@OrgName))))-1)

    select @Part2

    --RETURN @Part1

    RETURN @Part2

    --RETURN @Part3

    END

    select dbo.fn_NormaliseOrgName('R A M I R A DI PETTI C & C SAS')

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Abu,

    I think my code will handle the sample you provided in your text, albeit it doesn't join the 3 parts into a comma delimited string (but that's easy).

    On the other hand, I am unsure of how to handle the example in your code:

    R A M I R A DI PETTI C & C SAS

    If your intent is to combine to RAMIRA, you could do that by introducing two additional WHEN clauses at the top of my CASE on PATINDEX.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hi

    This isn't exactly pretty ... but it's another way to skin the cat

    ;with cteOriginal (CompanyName, CompName1, CompName2, CompName3)

    as

    (

    select 'A V H S L ALQUILER Y VENTA DE HOGARES', NULL, NULL, NULL union all

    select 'TAXI GIL PIERRE', NULL, NULL, NULL union all

    select 'DE GOUDSMID M HEIJKOOP', NULL, NULL, NULL union all

    select 'S P S S', NULL, NULL, NULL union all

    select 'BOUCHERIE CHARCUTERIE ST HENRI', NULL, NULL, NULL union all

    select 'A A M RODERKERKEN', NULL, NULL, NULL union all

    select 'O MUNDO E A NOSSA CASA - C H E', NULL, NULL, NULL)

    -- Compress up the first single letters in the string

    ,preprocess as (

    select CompanyName

    ,stuff(

    CompanyName

    ,1

    ,isnull(nullif(patindex('% [a-z][a-z]%',substring(companyname,1,8)),0),isnull(nullif(charindex(' ',CompanyName,7),0),999))

    ,replace(substring(companyname,1,isnull(nullif(patindex('% [a-z][a-z]%',substring(companyname,1,8)),0),isnull(nullif(charindex(' ',CompanyName,7),0),999))),' ','') + ' '

    ) StuffedCompanyName

    from cteOriginal

    )

    -- extract each company name in turn.

    ,r1 as (

    select CompanyName

    ,rtrim(substring(StuffedCompanyName,1,isnull(nullif(charindex(' ',StuffedCompanyName),0),999))) Company1

    ,ltrim(substring(StuffedCompanyName,isnull(nullif(charindex(' ',StuffedCompanyName),0),999),999)) leftover

    from preprocess

    )

    ,r2 as (

    select CompanyName

    ,Company1

    ,nullif(rtrim(substring(leftover,1,isnull(nullif(charindex(' ',leftover),0),999))),'') Company2

    ,ltrim(substring(leftover,isnull(nullif(charindex(' ',leftover),0),999),999)) leftover

    from r1

    )

    ,r3 as (

    select CompanyName

    ,Company1

    ,Company2

    ,nullif(rtrim(substring(leftover,1,isnull(nullif(charindex(' ',leftover),0),999))),'') Company3

    from r2

    )

    select * from r3

    Micky

  • mickyT thanks so much for this! I forgot to thank you in my other thread!

    Your help is much appreciated!

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • No problem and you're welcome

  • It took a little effort, but here's a replace function that will do the job and more. Won't swear it can't be improved upon or what kind of performance curve it may have. For this small example it's quick. But consider it nothing more than a first draft. If you have thoughts of using it in a production environment you should certainly do some stress testing since it's a scalar function. If you are going to use it (for example) to just import data once a day then it should work just fine. I just wouldn't stick it in the SELECT statement of a heavily-used query in production without testing it thoroughly.

    EDIT: I read your notes above after posting this. The function should work fine as it is, but you might need to rearrange a few columns in the pivot table query to get exactly what you need.

    CREATE FUNCTION dbo.svfSplitReplaceByDelimiter

    (

    @STR VARCHAR(8000)

    ,@delimiter CHAR(1)

    ,@subchar1 CHAR(1) = '@'

    ,@subchar2 CHAR(1) = '~'

    ,@replaceval VARCHAR(50) = '|'

    )

    RETURNS VARCHAR(MAX)

    AS

    BEGIN

    /*

    Parameters:

    @STR VARCHAR(8000) -- the input string

    ,@delimiter CHAR(1) -- the delimiter to use for the split

    ,@subchar1 CHAR(1) = '@' -- pick a character not in the input string (internal use)

    ,@subchar2 CHAR(1) = '~' -- pick a character not in the input string (internal use)

    ,@replaceval VARCHAR(50) = '|' -- the delimiter used for the final array; can be up to 50 chars

    per tag and can be useful for building XML or HTML strings

    */

    DECLARE

    @pos INT

    ,@len INT

    ,@strRep VARCHAR(8000)

    ,@strTemp VARCHAR(8000)

    ,@strRFrag VARCHAR(50)

    ,@strOut VARCHAR(8000)

    ,@counter INT

    SET @STR = RTRIM(LTRIM(@str))

    SET @strRep = REPLACE(@str,@delimiter,@subchar1)

    SET @strTemp = @strRep

    SET @strOut = ''

    --get the last chunk for use later

    SET @pos = CHARINDEX(@subchar1,REVERSE(@strTemp))-1

    SET @strRFrag = RIGHT(@strTemp,@pos)

    --start looking for single characters

    --and if found aggregate them

    SET @counter = 1

    SET @pos = CHARINDEX(@subchar1,@strRep)

    SET @len = LEN(@strTemp)

    WHILE @counter <= @len

    BEGIN

    SET @strTemp = LEFT(@strRep,@pos)

    IF RIGHT(@strTemp,1) = @subchar1

    BEGIN

    IF @pos = 2

    SET @strTemp = REPLACE(@strTemp,@subchar1,@subchar2)

    ELSE

    SET @strTemp = REPLACE(@strTemp,@subchar1,@delimiter)

    IF @pos > 2

    SET @strOut = @strOut + @delimiter + @strTemp

    ELSE

    SET @strOut = @strOut + @strTemp

    IF RIGHT(@strOut,1) = @subchar2

    SET @strOut = LEFT(@strOut,LEN(@strOut)-1)

    SET @pos = CHARINDEX(@subchar1,@strRep)

    SET @strRep = RIGHT(@strRep,LEN(@strRep)-@pos)

    END

    SET @pos = CHARINDEX(@subchar1,@strRep)

    SET @counter = @counter + 1

    END

    IF LEN(@strRFrag) = 1

    SET @strOut = @strOut + @strRFrag

    ELSE

    SET @strOut = @strOut + @delimiter + @strRFrag

    SET @strOut = REPLACE(REPLACE(@strOut,@delimiter+@delimiter,@delimiter),' ',' ')

    SET @strOut = REPLACE(REPLACE(@strOut,@delimiter,@replaceval),' ',' ')

    IF LEFT(@strOut,1) = @delimiter

    SET @strOut = RIGHT(@strOut,LEN(@strOut)-1)

    IF RIGHT(@strOut,1) = @delimiter

    SET @strOut = LEFT(@strOut,LEN(@strOut)-1)

    SET @strOut = RTRIM(LTRIM(@strOut))

    RETURN @strOut

    END

    GO

    Now load the sample data and test the function. (Requires the function DelimitedSplit8K which can be found by searching this site. It's been posted so many times and it's such an indispensable function that I figure most people already have it.)

    IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL

    DROP TABLE #TempTable

    CREATE TABLE #TempTable (

    [ID] INT IDENTITY(1,1) NOT NULL,

    [OrigCompanyName] NVARCHAR(150) NULL,

    [CompanyName] NVARCHAR(150) NULL,

    PRIMARY KEY (ID))

    ;WITH cteOriginal(CompanyName,CompName1,CompName2,CompName3)

    AS (

    SELECT

    'A V H S L ALQUILER Y VENTA DE HOGARES'

    ,NULL

    ,NULL

    ,NULL

    UNION ALL

    SELECT

    'TAXI GIL PIERRE'

    ,NULL

    ,NULL

    ,NULL

    UNION ALL

    SELECT

    'DE GOUDSMID M HEIJKOOP'

    ,NULL

    ,NULL

    ,NULL

    UNION ALL

    SELECT

    'S P S S'

    ,NULL

    ,NULL

    ,NULL

    UNION ALL

    SELECT

    'BOUCHERIE CHARCUTERIE ST HENRI'

    ,NULL

    ,NULL

    ,NULL

    UNION ALL

    SELECT

    'A A M RODERKERKEN'

    ,NULL

    ,NULL

    ,NULL

    UNION ALL

    SELECT

    'O MUNDO E A NOSSA CASA - C H E'

    ,NULL

    ,NULL

    ,NULL

    )

    INSERT INTO #TempTable

    SELECT

    CompanyName AS OrigCompanyName

    ,(SELECT dbo.svfSplitReplaceByDelimiter(cte.CompanyName,' ','@','~','|')) AS CompanyName

    FROM

    cteOriginal AS cte

    --get rid of stray leading and trailing characters

    UPDATE #TempTable

    SET CompanyName =

    (CASE

    WHEN LEFT(CompanyName,1) = '|' THEN RIGHT(CompanyName,LEN(CompanyName)-1)

    WHEN RIGHT(CompanyName,1) = '|' THEN LEFT(CompanyName,LEN(CompanyName)-1)

    ELSE CompanyName

    END)

    --create the pivot table result

    SELECT

    r1.ID

    ,(SELECT OrigCompanyName FROM #TempTable WHERE ID = r1.ID) AS CompanyName

    ,r1.Part1

    ,r1.Part2

    ,r1.Part3

    ,r1.Part4

    ,r1.Part5

    ,r1.Part6

    FROM

    (

    SELECT

    r.ID

    ,MAX(CASE WHEN r.ItemNumber = 1 THEN r.Item END) AS Part1

    ,MAX(CASE WHEN r.ItemNumber = 2 THEN r.Item END) AS Part2

    ,MAX(CASE WHEN r.ItemNumber = 3 THEN r.Item END) AS Part3

    ,MAX(CASE WHEN r.ItemNumber = 4 THEN r.Item END) AS Part4

    ,MAX(CASE WHEN r.ItemNumber = 5 THEN r.Item END) AS Part5

    ,MAX(CASE WHEN r.ItemNumber = 6 THEN r.Item END) AS Part6

    FROM

    (

    SELECT

    ID

    ,ItemNumber

    ,Item

    FROM

    #TempTable AS tt

    CROSS APPLY

    dbo.DelimitedSplit8K(tt.CompanyName,'|') AS dsk1

    WHERE

    dsk1.Item <> ''

    ) r

    GROUP BY

    r.ID

    ) r1

    The output:

    IDCompanyNamePart1Part2Part3Part4Part5Part6

    1A V H S L ALQUILER Y VENTA DE HOGARESAVHSLALQUILERYVENTADEHOGARES

    2TAXI GIL PIERRETAXIGILPIERRENULLNULLNULL

    3DE GOUDSMID M HEIJKOOPDEGOUDSMIDMHEIJKOOPNULLNULL

    4S P S SSPSSNULLNULLNULLNULLNULL

    5BOUCHERIE CHARCUTERIE ST HENRIBOUCHERIECHARCUTERIESTHENRINULLNULL

    6A A M RODERKERKENAAMRODERKERKENNULLNULLNULLNULL

    7O MUNDO E A NOSSA CASA - C H EOMUNDOEANOSSACASA-CHE

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply