Split string

  • I have a table where one field contains values like this:

    There are multiple rows; we need to find the distinct values across all of those rows and create a lookup table.

    Here are 3 rows from this table:

    1       ["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]

    2     ["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]

    3     ["Acct Vert - Chemical"]

    I was thinking of using a split-string function to extract the data.

    After doing that, I need to create another table that holds the main table ID and the lookup ID, and create a record for each pair.

    Here is an example.

    After creating the lookup table, the data will look like this:

    1 Acct Vert - Chemical

    2 Acct Vert - Energy

    3 Acct Vert - Seaport

    Here is the other table we need:

    1 1

    1 2

    1 3

    2 1

    2 2

    2 2

    3 1

    Create table Maintable ( id int IDENTITY(1,1), multiname varchar(500) )

    Create table looktable ( id int IDENTITY(1,1), name varchar(100) )

    Create table facttable ( MainTableid int , Lookuptableid int )

    insert into Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')

    insert into Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')

    insert into Maintable Values ('["Acct Vert - Chemical"]')

  • Did you search SSC for a split string article?

    "Tally OH! An Improved SQL 8K “CSV Splitter” Function"

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Here's a quick example solution that may work depending on the data size, processing frequency, etc. (Temporary tables used here for example dataset processing)

    IF OBJECT_ID('tempdb..#Maintable') IS NULL
    BEGIN
    --DROP TABLE #Maintable

    Create table #Maintable ( id int IDENTITY(1,1), multiname varchar(500) );
    insert into #Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')
    insert into #Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')
    insert into #Maintable Values ('["Acct Vert - Chemical"]')

    END

    IF OBJECT_ID('tempdb..#Maintable_Split') IS NULL
    BEGIN
    --DROP TABLE #Maintable_Split

    ; WITH cte AS (
    SELECT
    x.id
    , REPLACE(REPLACE(REPLACE(
    x.multiname
    ,'["','')-- replace the array square brace and opening item beginning double quote
    ,'"]','')-- replace the array square brace and closing item ending double quote
    ,'","',CHAR(30))-- replace the (double quote - comma - double quote) record delimiter with a single, sparsely used hidden character for use in the string split later
    AS multiname
    FROM #Maintable x
    )

    SELECT
    x.id
    , y.value AS name
    INTO #Maintable_Split
    FROM cte x
    OUTER APPLY STRING_SPLIT(x.multiname,CHAR(30)) y -- split the string on the single, sparsely used hidden character created above in the cte

    END

    IF OBJECT_ID('tempdb..#looktable') IS NULL
    BEGIN
    --DROP TABLE #looktable

    Create table #looktable ( id int IDENTITY(1,1), name varchar(100) )

    INSERT INTO #looktable(name)
    SELECT
    x.name
    FROM #Maintable_Split x
    LEFT JOIN #looktable y ON y.name = x.name
    WHERE y.id IS NULL
    GROUP BY
    x.name

    END

    IF OBJECT_ID('tempdb..#facttable') IS NULL
    BEGIN
    --DROP TABLE #facttable

    Create table #facttable ( MainTableid int , Lookuptableid int )

    INSERT INTO #facttable (MainTableid,Lookuptableid)
    SELECT
    x.id AS MainTableid
    , y.id AS Lookuptableid
    FROM #Maintable_Split x
    JOIN #looktable y ON y.name = x.name
    LEFT JOIN #facttable z ON z.MainTableid = x.id AND z.Lookuptableid = y.id
    WHERE z.MainTableid IS NULL

    END

    SELECT *
    FROM #facttable x
    WHERE 1=1
  • @sks_989 ,

    You're using SQL Server 2017.  It has a nice little goodie known as TRIM() that will easily replace the need for multiple REPLACES.

    Also, get in the habit of leaving some simple bread-crumbs in the form of comments that provide both visual code separators and a short logical description of what the code does for the next person to touch the code... which might be you a year from now. 😀

    This will do as you ask except I'll let you have some of the fun by writing the two FK's that are necessary for the #FactTable.

    --=============================================================================
    -- Create and populate the MainTable from the given data.
    -- This is NOT a part of the solution. This is just setting up the data.
    --=============================================================================
    DROP TABLE IF EXISTS #MainTable
    ;
    CREATE TABLE #MainTable
    (
    ID int IDENTITY(1,1)
    ,MultiName varchar(500) NOT NULL
    ,CONSTRAINT PK_MainTable PRIMARY KEY CLUSTERED (ID)
    )
    ;
    INSERT INTO #MainTable WITH (TABLOCK)
    VALUES ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')
    ,('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')
    ,('["Acct Vert - Chemical"]')
    ;
    --=============================================================================
    -- Create and populate a working table that contains the split/clean data.
    --=============================================================================
    DROP TABLE IF EXISTS #Split
    ;
    SELECT MainTableID = mt.ID
    ,t1.Name
    INTO #Split
    FROM #Maintable mt
    CROSS APPLY STRING_SPLIT(multiname,',')s1
    CROSS APPLY (VALUES(TRIM('[]"' FROM s1.Value)))t1(Name)
    ;
    --=============================================================================
    -- Create and populate the Lookup table from the unique split data.
    --=============================================================================
    DROP TABLE IF EXISTS #LookUp
    ;
    CREATE TABLE #LookUp
    (
    LookUpTableID int IDENTITY(1,1)
    ,Name varchar(100) NOT NULL
    ,CONSTRAINT PK_LookUp PRIMARY KEY CLUSTERED (LookUpTableID)
    ,INDEX AK_Name UNIQUE (Name)
    )
    ;
    INSERT INTO #LookUp WITH (TABLOCK)
    (Name)
    SELECT DISTINCT
    Name
    FROM #Split
    ORDER BY Name
    OPTION (MAXDOP 1) --Keep parallelism from messing up the order
    ;
    --=============================================================================
    -- Create and populate the FactTable "bridge" table joining by VERT names.
    --=============================================================================
    DROP TABLE IF EXISTS #FactTable
    ;
    CREATE TABLE #FactTable
    (
    MainTableID int NOT NULL
    ,LookupTableID int NOT NULL
    ,CONSTRAINT PK_FactTable PRIMARY KEY CLUSTERED (MainTableID,LookupTableID)
    )
    ;
    INSERT INTO #FactTable WITH (TABLOCK)
    (MainTableID, LookUpTableID)
    SELECT s.MainTableID
    ,l.LookUpTableID
    FROM #Split s
    JOIN #LookUp l ON s.Name = l.Name
    ;
    --===== All done... Drink BEER!

     

    Also, I did like Justin did... I used Temp Tables for this demo because I use DROP TABLE to make demo reruns easier, and I don't want you to drop a real table by mistake. 😀

    Also... if it were me, I'd split the "VERT" names into a VertCategory and a VertType column.  For that, you'd need to use the DelimitedSplit8K function that Johan provided the link for above, because STRING_SPLIT() does not return a position ordinal until SQL Server 2022 (what the hell was MS thinking when they made that mistake?).  It'll make future data analysis and reporting a shedload easier.  I didn't want to presume to do it that way for you, though.
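    To illustrate that VertCategory/VertType idea, here's a minimal sketch assuming SQL Server 2022 or later, where STRING_SPLIT() accepts the enable_ordinal argument (on 2017 you'd substitute DelimitedSplit8K); the #Split table is the working table built above, and the output column names are just placeholders:

    ```sql
    --===== Hypothetical sketch: split each cleaned "VERT" name on the hyphen
         -- into a category and a type. Requires SQL Server 2022+ for the
         -- enable_ordinal argument of STRING_SPLIT().
    SELECT  s.MainTableID
           ,s.Name
           ,VertCategory = MAX(CASE WHEN p.ordinal = 1 THEN TRIM(p.value) END)
           ,VertType     = MAX(CASE WHEN p.ordinal = 2 THEN TRIM(p.value) END)
      FROM #Split s
     CROSS APPLY STRING_SPLIT(s.Name, '-', 1) p
     GROUP BY s.MainTableID, s.Name
    ;
    ```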

    @justin... welcome aboard, mate!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Quick thought, this can be simplified by using OPENJSON

    😎

    Here is an example:

    USE TEEST;
    GO
    SET NOCOUNT ON;
    GO
    ---------------------------------------------------------------------
    -- Declare the test variable
    ---------------------------------------------------------------------
    DECLARE @TSTRING NVARCHAR(MAX) = N'["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]';
    ---------------------------------------------------------------------
    -- Return the Ordinal position (key) and the Value for each entry in
    -- the test string
    ---------------------------------------------------------------------
    ;WITH PRE_PARSE(VKEY,VVAL) AS
    (
    SELECT
    SJ.[key]
    ,SJ.[value]
    FROM OPENJSON(@TSTRING) SJ
    )
    ---------------------------------------------------------------------
    -- Split each value by the "-" delimiter
    ---------------------------------------------------------------------
    ,SPLIT_VALUES(RRID,VKEY,SVAL) AS
    (
    SELECT
    ROW_NUMBER() OVER (PARTITION BY PP.VKEY ORDER BY PP.VKEY) AS RRID
    ,PP.VKEY
    ,SP.[value] AS SVAL
    FROM PRE_PARSE PP
    CROSS APPLY string_split(PP.VVAL,N'-') SP
    )
    ---------------------------------------------------------------------
    -- Final output
    ---------------------------------------------------------------------
    SELECT
    SV.VKEY
    ,MAX(CASE WHEN RRID = 1 THEN SV.SVAL END) AS VCOL
    ,MAX(CASE WHEN RRID = 2 THEN SV.SVAL END) AS VVAL
    FROM SPLIT_VALUES SV
    GROUP BY SV.VKEY
    ;

    The result set:

    VKEY  VCOL        VVAL
    ----- ----------- --------------------
    0     Acct Vert   Chemical
    1     Acct Vert   Energy
    2     Acct Vert   Seaport
    3     Prod Vert   Access Control
    4     Prod Vert   Structured Cabling
    5     Prod Vert   Video
  • Jeff's code seems to generate primary keys and produces the correct output afaik.  Creating foreign keys should be straightforward for the OP.  The split could be accomplished with OPENJSON without specifying a schema and without the need for STRING_SPLIT imo

    Using OPENJSON (Jeff's code)

    --===== Before getting started... Drink beer NOW!
    --=============================================================================
    -- Create and populate a working table that contains the split/clean data.
    --=============================================================================
    DROP TABLE IF EXISTS #Split
    ;
    SELECT MainTableID = mt.ID
    ,oj.value [Name]
    INTO #Split
    FROM #Maintable mt
    CROSS APPLY openjson(mt.MultiName) oj
    ;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können ("No one shall expel us from the paradise that Cantor has created for us")

  • Although JSON is a simple solution, I generally avoid it for single level JSON because it's a fair bit slower than DelimitedSplit8k, Eirikur's wonderful "Lead" addition to the DelimitedSplit8k function, or STRING_SPLIT().  Just sayin'... 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Comma is not an escape character in JSON (towards the bottom of the page) 😉

    https://www.json.org/json-en.html

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    Comma is not an escape character in JSON (towards the bottom of the page) 😉

    https://www.json.org/json-en.html

    M$-JSon might not be strict 😉

    😎

     

  • Steve Collins wrote:

    Jeff's code seems to generate primary keys and produces the correct output afaik.  Creating foreign keys should be straightforward for the OP.  The split could be accomplished with OPENJSON without specifying a schema and without the need for STRING_SPLIT imo

    Using OPENJSON (Jeff's code)

    --===== Before getting started... Drink beer NOW!
    --=============================================================================
    -- Create and populate a working table that contains the split/clean data.
    --=============================================================================
    DROP TABLE IF EXISTS #Split
    ;
    SELECT MainTableID = mt.ID
    ,oj.value [Name]
    INTO #Split
    FROM #Maintable mt
    CROSS APPLY openjson(mt.MultiName) oj
    ;

    The same can be done without specifying a schema with just about any method capable of performing a Split.  The OP asked for tables, though... probably for building some form of Normalized data for much larger data or maybe even a DW.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you load up one of the strings with a bunch of commas it causes an error afaik.  JSON doesn't care how many commas in a row.  A tiny point maybe 🙂

    Try with this input

       DROP TABLE IF EXISTS #MainTable
    ;
    CREATE TABLE #MainTable
    (
    ID int IDENTITY(1,1)
    ,MultiName varchar(500) NOT NULL
    ,CONSTRAINT PK_MainTable PRIMARY KEY CLUSTERED (ID)
    )
    ;
    INSERT INTO #MainTable WITH (TABLOCK)
    VALUES ('["Acct Vert - Chemical,,, etc","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')
    ,('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')
    ,('["Acct Vert - Chemical"]')
    ;
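    A quick way to see the difference, sketched against a single variable (the names here are just for illustration): splitting that first value on commas shreds it, while OPENJSON keeps it whole because the commas sit inside a quoted JSON string.

    ```sql
    DECLARE @s nvarchar(200) = N'["Acct Vert - Chemical,,, etc","Acct Vert - Energy"]';

    SELECT Method = 'STRING_SPLIT', [value]
      FROM STRING_SPLIT(@s, ',')   -- 5 rows; the first name is shredded
    UNION ALL
    SELECT Method = 'OPENJSON', [value]
      FROM OPENJSON(@s);           -- 2 rows; commas inside strings survive
    ```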

    Also, Jeff could you explain a little about the (TABLOCK) hint?  Also, " OPTION (MAXDOP 1) --Keep parallelism from messing up the order"?

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    Also, Jeff could you explain a little about the (TABLOCK) hint?  Also, " OPTION (MAXDOP 1) --Keep parallelism from messing up the order"?

    Sure...

    About the WITH (TABLOCK)... it's being used on a Temp table and TempDB is in the Simple Recovery Model.  We're inserting more than one row into a Temp table that has a Clustered Index on it.  WITH (TABLOCK) will cause it to be minimally logged because of the Simple Recovery Model.  It's become an automatic habit for me.  I do it even if there's no Clustered Index involved or something else may prevent the Minimal Logging, because it'll still prevent "eventual escalation" from row to page and make things a bit faster.  Lots faster if there's a fair bit of data and it has a Clustered Index on it when starting out empty.  It'll also cause minimal logging if inserting into a heap.  You can't get Minimal Logging for INSERTs without it, so I add it to code in TempDB as a matter of rote.

    As for the MAXDOP 1 thing, that's another habit of mine when I want the IDENTITY column to reflect the order of the data.  Even if parallelism weren't likely, using MAXDOP in such a fashion pretty much guarantees it won't happen; if it did, things might not end up in the order I want them.  It's also a visual indicator (to me, at least) that my intent for the IDENTITY column matching the sort order was of primary concern.

    For inserts that also have variables in the code, I'll also include the RECOMPILE option, because Minimal Logging frequently doesn't happen without it if variables are present.  Sometimes it's not required, but I don't trust MS or the optimizer to make such a decision for me.
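    Put together, the habits described above look something like this minimal sketch (the table and variable names are made up for illustration):

    ```sql
    DECLARE @Rows int = 1000000; -- a variable in the INSERT, hence RECOMPILE below

    DROP TABLE IF EXISTS #Demo;
    CREATE TABLE #Demo (ID int NOT NULL PRIMARY KEY CLUSTERED);

    INSERT INTO #Demo WITH (TABLOCK) (ID) -- TABLOCK for Minimal Logging in TempDB
    SELECT TOP (@Rows)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2
    OPTION (RECOMPILE, MAXDOP 1); -- RECOMPILE helps Minimal Logging with variables;
                                  -- MAXDOP 1 keeps the insert order deterministic
    ```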

    Some people claim it's all overkill.  Ok... whatever.  Someday they'll need to wear "Depends" and then they'll understand. 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Interesting and much appreciated.  Happy Thanksgiving!

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
