Split string

  • I have a table where one field contains values like this:

    There are multiple rows; we need to find the distinct values across all of those rows and create a lookup table.

    Here are 3 rows from this table:

    1       ["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]

    2     ["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]

    3     ["Acct Vert - Chemical"]

    I was thinking of using a split-string function to extract the data.

    After doing that, I need to create another table that holds the main table ID and the lookup ID, and create a record for each pair.

    Here is an example.

    After creating the lookup table, the data will look like this:

    1 Acct Vert - Chemical

    2 Acct Vert - Energy

    3 Acct Vert - Seaport

    Here is the other table we need:

    1 1

    1 2

    1 3

    2 1

    2 2

    2 2

    3 1

    Create table Maintable ( id int IDENTITY(1,1), multiname varchar(500) )

    Create table looktable ( id int IDENTITY(1,1), name varchar(100) )

    Create table facttable ( MainTableid int , Lookuptableid int )

    insert into Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')

    insert into Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')

    insert into Maintable Values ('["Acct Vert - Chemical"]')

  • Did you search SSC for a split string article?

    "Tally OH! An Improved SQL 8K “CSV Splitter” Function"

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Here's a quick example solution that may work depending on the data size, processing frequency, etc. (Temporary tables used here for example dataset processing)

    IF OBJECT_ID('tempdb..#Maintable') IS NULL
    BEGIN
    --DROP TABLE #Maintable

    Create table #Maintable ( id int IDENTITY(1,1), multiname varchar(500) );
    insert into #Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')
    insert into #Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')
    insert into #Maintable Values ('["Acct Vert - Chemical"]')

    END

    IF OBJECT_ID('tempdb..#Maintable_Split') IS NULL
    BEGIN
    --DROP TABLE #Maintable_Split

    ; WITH cte AS (
    SELECT
    x.id
    , REPLACE(REPLACE(REPLACE(
    x.multiname
    ,'["','')-- replace the array square brace and opening item beginning double quote
    ,'"]','')-- replace the array square brace and closing item ending double quote
    ,'","',CHAR(30))-- replace the (double quote - comma - double quote) record delimiter with a single, sparsely used hidden character for use in the string split later
    AS multiname
    FROM #Maintable x
    )

    SELECT
    x.id
    , y.value AS name
    INTO #Maintable_Split
    FROM cte x
    OUTER APPLY STRING_SPLIT(x.multiname,CHAR(30)) y -- split the string on the single, sparsely used hidden character created above in the cte

    END

    IF OBJECT_ID('tempdb..#looktable') IS NULL
    BEGIN
    --DROP TABLE #looktable

    Create table #looktable ( id int IDENTITY(1,1), name varchar(100) )

    INSERT INTO #looktable(name)
    SELECT
    x.name
    FROM #Maintable_Split x
    LEFT JOIN #looktable y ON y.name = x.name
    WHERE y.id IS NULL
    GROUP BY
    x.name

    END

    IF OBJECT_ID('tempdb..#facttable') IS NULL
    BEGIN
    --DROP TABLE #facttable

    Create table #facttable ( MainTableid int , Lookuptableid int )

    INSERT INTO #facttable (MainTableid,Lookuptableid)
    SELECT
    x.id AS MainTableid
    , y.id AS Lookuptableid
    FROM #Maintable_Split x
    JOIN #looktable y ON y.name = x.name
    LEFT JOIN #facttable z ON z.MainTableid = x.id AND z.Lookuptableid = y.id
    WHERE z.MainTableid IS NULL

    END

    SELECT *
    FROM #facttable x
    WHERE 1=1
  • @sks_989 ,

    You're using SQL Server 2017.  It has a nice little goodie known as TRIM() that will easily replace the need for multiple REPLACES.

    Also, get in the habit of leaving some simple bread-crumbs in the form of comments that provide both visual code separators and a short logical description of what the code does for the next person to touch the code... which might be you a year from now. 😀

    This will do as you ask except I'll let you have some of the fun by writing the two FK's that are necessary for the #FactTable.

    --=============================================================================
    -- Create and populate the MainTable from the given data.
    -- This is NOT a part of the solution. This is just setting up the data.
    --=============================================================================
    DROP TABLE IF EXISTS #MainTable
    ;
    CREATE TABLE #MainTable
    (
    ID int IDENTITY(1,1)
    ,MultiName varchar(500) NOT NULL
    ,CONSTRAINT PK_MainTable PRIMARY KEY CLUSTERED (ID)
    )
    ;
    INSERT INTO #MainTable WITH (TABLOCK)
    VALUES ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')
    ,('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')
    ,('["Acct Vert - Chemical"]')
    ;
    --=============================================================================
    -- Create and populate a working table that contains the split/clean data.
    --=============================================================================
    DROP TABLE IF EXISTS #Split
    ;
    SELECT MainTableID = mt.ID
    ,t1.Name
    INTO #Split
    FROM #Maintable mt
    CROSS APPLY STRING_SPLIT(multiname,',')s1
    CROSS APPLY (VALUES(TRIM('[]"' FROM s1.Value)))t1(Name)
    ;
    --=============================================================================
    -- Create and populate the Lookup table from the unique split data.
    --=============================================================================
    DROP TABLE IF EXISTS #LookUp
    ;
    CREATE TABLE #LookUp
    (
    LookUpTableID int IDENTITY(1,1)
    ,Name varchar(100) NOT NULL
    ,CONSTRAINT PK_LookUp PRIMARY KEY CLUSTERED (LookUpTableID)
    ,INDEX AK_Name UNIQUE (Name)
    )
    ;
    INSERT INTO #LookUp WITH (TABLOCK)
    (Name)
    SELECT DISTINCT
    Name
    FROM #Split
    ORDER BY Name
    OPTION (MAXDOP 1) --Keep parallelism from messing up the order
    ;
    --=============================================================================
    -- Create and populate the FactTable "bridge" table joining by VERT names.
    --=============================================================================
    DROP TABLE IF EXISTS #FactTable
    ;
    CREATE TABLE #FactTable
    (
    MainTableID int NOT NULL
    ,LookupTableID int NOT NULL
    ,CONSTRAINT PK_FactTable PRIMARY KEY CLUSTERED (MainTableID,LookupTableID)
    )
    ;
    INSERT INTO #FactTable WITH (TABLOCK)
    (MainTableID, LookUpTableID)
    SELECT s.MainTableID
    ,l.LookUpTableID
    FROM #Split s
    JOIN #LookUp l ON s.Name = l.Name
    ;
    --===== All done... Drink BEER!

     

    Also, I did like Justin did... I used Temp Tables for this demo because I use DROP TABLE to make demo reruns easier, and I don't want you to drop a real table by mistake. 😀

    Also... if it were me, I'd split the "VERT" names into a VertCategory and a VertType column.  For that, you'd need to use the DelimitedSplit8K function that Johan provided the link for above, because STRING_SPLIT() does not return a position ordinal until SQL Server 2022 (what the hell was MS thinking when they made that mistake?).  It'll make future data analysis and reporting a shedload easier.  I didn't want to presume to do it that way for you, though.
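    To illustrate that VertCategory/VertType idea, here's a minimal sketch assuming SQL Server 2022 or later, where STRING_SPLIT() accepts the enable_ordinal argument (on 2017 you'd substitute DelimitedSplit8K); the #Split table is the working table built above, and the output column names are just placeholders:

    ```sql
    --===== Hypothetical sketch: split each cleaned "VERT" name on the hyphen
         -- into a category and a type. Requires SQL Server 2022+ for the
         -- enable_ordinal argument of STRING_SPLIT().
    SELECT  s.MainTableID
           ,s.Name
           ,VertCategory = MAX(CASE WHEN p.ordinal = 1 THEN TRIM(p.value) END)
           ,VertType     = MAX(CASE WHEN p.ordinal = 2 THEN TRIM(p.value) END)
      FROM #Split s
     CROSS APPLY STRING_SPLIT(s.Name, '-', 1) p
     GROUP BY s.MainTableID, s.Name
    ;
    ```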

    @justin... welcome aboard, mate!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Quick thought, this can be simplified by using OPENJSON

    😎

    Here is an example:

    USE TEEST;
    GO
    SET NOCOUNT ON;
    GO
    ---------------------------------------------------------------------
    -- Declare the test variable
    ---------------------------------------------------------------------
    DECLARE @TSTRING NVARCHAR(MAX) = N'["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]';
    ---------------------------------------------------------------------
    -- Return the Ordinal position (key) and the Value for each entry in
    -- the test string
    ---------------------------------------------------------------------
    ;WITH PRE_PARSE(VKEY,VVAL) AS
    (
    SELECT
    SJ.[key]
    ,SJ.[value]
    FROM OPENJSON(@TSTRING) SJ
    )
    ---------------------------------------------------------------------
    -- Split each value by the "-" delimiter
    ---------------------------------------------------------------------
    ,SPLIT_VALUES(RRID,VKEY,SVAL) AS
    (
    SELECT
    ROW_NUMBER() OVER (PARTITION BY PP.VKEY ORDER BY PP.VKEY) AS RRID
    ,PP.VKEY
    ,SP.[value] AS SVAL
    FROM PRE_PARSE PP
    CROSS APPLY string_split(PP.VVAL,N'-') SP
    )
    ---------------------------------------------------------------------
    -- Final output
    ---------------------------------------------------------------------
    SELECT
    SV.VKEY
    ,MAX(CASE WHEN RRID = 1 THEN SV.SVAL END) AS VCOL
    ,MAX(CASE WHEN RRID = 2 THEN SV.SVAL END) AS VVAL
    FROM SPLIT_VALUES SV
    GROUP BY SV.VKEY
    ;

    The result set:

    VKEY  VCOL        VVAL
    ----- ----------- --------------------
    0     Acct Vert   Chemical
    1     Acct Vert   Energy
    2     Acct Vert   Seaport
    3     Prod Vert   Access Control
    4     Prod Vert   Structured Cabling
    5     Prod Vert   Video
  • Jeff's code seems to generate primary keys and produces the correct output afaik.  Creating foreign keys should be straightforward for the OP.  The split could be accomplished with OPENJSON without specifying a schema and without the need for STRING_SPLIT imo

    Using OPENJSON (Jeff's code)

    --===== Before getting started... Drink beer NOW!
    --=============================================================================
    -- Create and populate a working table that contains the split/clean data.
    --=============================================================================
    DROP TABLE IF EXISTS #Split
    ;
    SELECT MainTableID = mt.ID
    ,oj.value [Name]
    INTO #Split
    FROM #Maintable mt
    CROSS APPLY openjson(mt.MultiName) oj
    ;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können ("No one shall expel us from the paradise that Cantor has created for us")

  • Although JSON is a simple solution, I generally avoid it for single level JSON because it's a fair bit slower than DelimitedSplit8k, Eirikur's wonderful "Lead" addition to the DelimitedSplit8k function, or STRING_SPLIT().  Just sayin'... 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Comma is not an escape character in JSON (towards the bottom of the page) 😉

    https://www.json.org/json-en.html

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    Comma is not an escape character in JSON (towards the bottom of the page) 😉

    https://www.json.org/json-en.html

    M$-JSon might not be strict 😉

    😎

     

  • Steve Collins wrote:

    Jeff's code seems to generate primary keys and produces the correct output afaik.  Creating foreign keys should be straightforward for the OP.  The split could be accomplished with OPENJSON without specifying a schema and without the need for STRING_SPLIT imo

    Using OPENJSON (Jeff's code)

    --===== Before getting started... Drink beer NOW!
    --=============================================================================
    -- Create and populate a working table that contains the split/clean data.
    --=============================================================================
    DROP TABLE IF EXISTS #Split
    ;
    SELECT MainTableID = mt.ID
    ,oj.value [Name]
    INTO #Split
    FROM #Maintable mt
    CROSS APPLY openjson(mt.MultiName) oj
    ;

    The same can be done without specifying a schema with just about any method capable of performing a Split.  The OP asked for tables, though... probably for building some form of Normalized data for much larger data or maybe even a DW.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you load up one of the strings with a bunch of commas it causes an error afaik.  JSON doesn't care how many commas in a row.  A tiny point maybe 🙂

    Try with this input

       DROP TABLE IF EXISTS #MainTable
    ;
    CREATE TABLE #MainTable
    (
    ID int IDENTITY(1,1)
    ,MultiName varchar(500) NOT NULL
    ,CONSTRAINT PK_MainTable PRIMARY KEY CLUSTERED (ID)
    )
    ;
    INSERT INTO #MainTable WITH (TABLOCK)
    VALUES ('["Acct Vert - Chemical,,, etc","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')
    ,('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')
    ,('["Acct Vert - Chemical"]')
    ;
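    A quick way to see the difference, sketched against a single variable (the names here are just for illustration): splitting that first value on commas shreds it, while OPENJSON keeps it whole because the commas sit inside a quoted JSON string.

    ```sql
    DECLARE @s nvarchar(200) = N'["Acct Vert - Chemical,,, etc","Acct Vert - Energy"]';

    SELECT Method = 'STRING_SPLIT', [value]
      FROM STRING_SPLIT(@s, ',')   -- 5 rows; the first name is shredded
    UNION ALL
    SELECT Method = 'OPENJSON', [value]
      FROM OPENJSON(@s);           -- 2 rows; commas inside strings survive
    ```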

    Also, Jeff could you explain a little about the (TABLOCK) hint?  Also, " OPTION (MAXDOP 1) --Keep parallelism from messing up the order"?

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    Also, Jeff could you explain a little about the (TABLOCK) hint?  Also, " OPTION (MAXDOP 1) --Keep parallelism from messing up the order"?

    Sure...

    About the WITH (TABLOCK)... it's being used on a Temp table and TempDB is in the Simple Recovery Model.  We're inserting more than one row into a Temp table that has a Clustered Index on it.  WITH (TABLOCK) will cause it to be minimally logged because of the Simple Recovery Model.  It's become an automatic habit for me.  I do it even if there's no Clustered Index involved or something else may prevent the Minimal Logging, because it'll still prevent "eventual escalation" from row to page and make things a bit faster.  Lots faster if there's a fair bit of data and it has a Clustered Index on it when starting out empty.  It'll also cause minimal logging if inserting into a heap.  You can't get Minimal Logging for INSERTs without it, so I add it to code in TempDB as a matter of rote.

    As for the MAXDOP 1 thing, that's another habit of mine when I want the IDENTITY column to reflect the order of the data.  Even if parallelism weren't likely, using MAXDOP in such a fashion pretty much guarantees it won't happen; if it did, things might not end up in the order I want them.  It's also a visual indicator (to me, at least) that my intent for the IDENTITY column matching the sort order was of primary concern.

    For inserts that also have variables in the code, I'll also include the RECOMPILE option, because Minimal Logging frequently doesn't happen without it if variables are present.  Sometimes it's not required, but I don't trust MS or the optimizer to make such a decision for me.
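    Put together, the habits described above look something like this minimal sketch (the table and variable names are made up for illustration):

    ```sql
    DECLARE @Rows int = 1000000; -- a variable in the INSERT, hence RECOMPILE below

    DROP TABLE IF EXISTS #Demo;
    CREATE TABLE #Demo (ID int NOT NULL PRIMARY KEY CLUSTERED);

    INSERT INTO #Demo WITH (TABLOCK) (ID) -- TABLOCK for Minimal Logging in TempDB
    SELECT TOP (@Rows)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2
    OPTION (RECOMPILE, MAXDOP 1); -- RECOMPILE helps Minimal Logging with variables;
                                  -- MAXDOP 1 keeps the insert order deterministic
    ```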

    Some people claim it's all overkill.  Ok... whatever.  Someday they'll need to wear "Depends" and then they'll understand. 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Interesting and much appreciated.  Happy Thanksgiving!

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
