Alternative Way to Parse JSON String?

  • I know there are JSON parsers out there, but I'm hoping to avoid using a separate function as the query is being built using a builder in a C++ application. That said I'd like to get this done given the parameters outlined below. Hoping someone can point me in the right direction.

    Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:

    {"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}

    I'd like to know if its possible to split the data out using a combination of MSSQL FUNCTIONS SUCH AS ( SUBSTRING, CHARINDEX, PATINDEX, LEN, LEFT, RIGHT, STUFF etc ) and without setting any DECLARE @VARS

    What I have so far is this: (NOTE: This is part of a larger query that returns other data, hence the alias at the end)

    SELECT SUBSTRING( CLIENT_DATA, ( CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 ), CHARINDEX( '","', CLIENT_DATA ) ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LASTNAME

    This returns:

    Doe","FIRSTNAME":

    When It shoould only return:

    Doe

    I understand why this is happening, because the value i'm passing in the length field of the SUBSTRING function is being determined by the first occurrence found of '","'. The challenge I have is that the fields in the json string are not always in the same place, that is, LASTNAME could be the first for one record or somewhere in the middle for another etc etc.

    I've tried different alterations but nothing coming close. I'd like to believe this can be done with only string functions but I'm stumped on the correct order.

    THIS IS THE FULL QUERY IF INTERESTED:

    SELECT DISTINCT

    C.CLIENT_ID,

    ( SELECT TOP 1 CLIENT_DATA FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS CLIENT_DATA,

    ( SELECT TOP 1 SUBSTRING( CLIENT_DATA, ( CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 ), CHARINDEX( '","', CLIENT_DATA ) ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LASTNAME,

    ( SELECT TOP 1 CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS STARTING,

    ( SELECT TOP 1 CHARINDEX( '","', CLIENT_DATA ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LENGTH,

    ( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'P' ) AS PASS,

    ( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'A' ) AS AMBG,

    ( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'F' ) AS FAIL,

    ( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'N' ) AS ERRR,

    ( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS TOTAL

    FROM

    SVE_VERIFY_ANALYSIS_T AS C

  • A purely T-SQL solution:

    Consuming JSON Strings in SQL Server[/url] by Phil Factor

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • I have a far better idea...

    Drive to the building where the person who sent you the hierarchical, denormalized crap and beat the hell out of them for being so stupid. Then, hum a rock at Phil for delivering it as an Edge table. 😉

    Remember that, like XML, JSON is a managed-code encoding of data and it's easier to parse there than in a database. If someone is sending you JSON encoded data to be included in a database, then they're just being lazy because it's a whole lot easier for them to send you something more compatible with databases.

    I won't even bring up the obvious tag bloat and the totally unnecessary hierarchical encoding of the data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • omegadev21 (11/20/2015)


    I know there are JSON parsers out there, but I'm hoping to avoid using a separate function as the query is being built using a builder in a C++ application. That said I'd like to get this done given the parameters outlined below. Hoping someone can point me in the right direction.

    Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:

    {"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}

    I'd like to know if its possible to split the data out using a combination of MSSQL FUNCTIONS SUCH AS ( SUBSTRING, CHARINDEX, PATINDEX, LEN, LEFT, RIGHT, STUFF etc ) and without setting any DECLARE @VARS

    What I have so far is this: (NOTE: This is part of a larger query that returns other data, hence the alias at the end)

    SELECT SUBSTRING( CLIENT_DATA, ( CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 ), CHARINDEX( '","', CLIENT_DATA ) ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LASTNAME

    This returns:

    Doe","FIRSTNAME":

    When It shoould only return:

    Doe

    I understand why this is happening, because the value i'm passing in the length field of the SUBSTRING function is being determined by the first occurrence found of '","'. The challenge I have is that the fields in the json string are not always in the same place, that is, LASTNAME could be the first for one record or somewhere in the middle for another etc etc.

    I've tried different alterations but nothing coming close. I'd like to believe this can be done with only string functions but I'm stumped on the correct order.

    THIS IS THE FULL QUERY IF INTERESTED:

    SELECT DISTINCT

    C.CLIENT_ID,

    ( SELECT TOP 1 CLIENT_DATA FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS CLIENT_DATA,

    ( SELECT TOP 1 SUBSTRING( CLIENT_DATA, ( CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 ), CHARINDEX( '","', CLIENT_DATA ) ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LASTNAME,

    ( SELECT TOP 1 CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS STARTING,

    ( SELECT TOP 1 CHARINDEX( '","', CLIENT_DATA ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LENGTH,

    ( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'P' ) AS PASS,

    ( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'A' ) AS AMBG,

    ( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'F' ) AS FAIL,

    ( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'N' ) AS ERRR,

    ( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS TOTAL

    FROM

    SVE_VERIFY_ANALYSIS_T AS C

    Do you have a sample of the JSON that you're actually trying to split? Yes, I realize there's some PII in it and you'll have to obfuscate that but it would be good to see an example of what you're trying to work with.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ah... maybe I missed it. Is this the data example?

    Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:

    {"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/20/2015)


    Ah... maybe I missed it. Is this the data example?

    Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:

    {"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}

    @jeff Moden, yeah that's the sample...

    If only I could take a blowtorch or rock to the newb the company hired to work on the front end (u know the one who sent the json string)...but alas the purge isn't real so have to keep my hypothetical murderous tendencies at bay lmao rofl lmao...In any event I'm working with a previous db structure until I can convince the powers that be to restructure the schema.

  • omegadev21 (11/20/2015)


    Jeff Moden (11/20/2015)


    Ah... maybe I missed it. Is this the data example?

    Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:

    {"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}

    @jeff Moden, yeah that's the sample...

    If only I could take a blowtorch or rock to the newb the company hired to work on the front end (u know the one who sent the json string)...but alas the purge isn't real so have to keep my hypothetical murderous tendencies at bay lmao rofl lmao...In any event I'm working with a previous db structure until I can convince the powers that be to restructure the schema.

    I missed your example too. To parse that stuff out you can start with this (using Jeff's DelimitedSplit8K):

    DECLARE @json varchar(max) =

    '{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}';

    SELECT

    ItemNumber,

    aLabel = SUBSTRING(item,2,s-2),

    aValue = SUBSTRING(item,s+3,LEN(item)-(s+3))

    FROM

    (

    SELECT ds1.ItemNumber, ds1.Item, s=CHARINDEX('":"',ds1.Item)

    FROM dbo.DelimitedSplit8K(REPLACE(REPLACE(@json,'}',''),'{',''),',') ds1

    ) j;

    and get this:

    ItemNumber aLabel aValue

    -------------------- ------------ --------------------

    1 LASTNAME Doe

    2 FIRSTNAME John

    3 EMAILADDRESS something@email.com

    4 USERNAME exusername

    5 PHONENUMBER 15555555555

    edit: added output.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (11/20/2015)


    omegadev21 (11/20/2015)


    Jeff Moden (11/20/2015)


    Ah... maybe I missed it. Is this the data example?

    Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:

    {"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}

    @jeff Moden, yeah that's the sample...

    If only I could take a blowtorch or rock to the newb the company hired to work on the front end (u know the one who sent the json string)...but alas the purge isn't real so have to keep my hypothetical murderous tendencies at bay lmao rofl lmao...In any event I'm working with a previous db structure until I can convince the powers that be to restructure the schema.

    I missed your example too. To parse that stuff out you can start with this (using Jeff's DelimitedSplit8K):

    DECLARE @json varchar(max) =

    '{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}';

    SELECT

    ItemNumber,

    aLabel = SUBSTRING(item,2,s-2),

    aValue = SUBSTRING(item,s+3,LEN(item)-(s+3))

    FROM

    (

    SELECT ds1.ItemNumber, ds1.Item, s=CHARINDEX('":"',ds1.Item)

    FROM dbo.DelimitedSplit8K(REPLACE(REPLACE(@json,'}',''),'{',''),',') ds1

    ) j;

    and get this:

    ItemNumber aLabel aValue

    -------------------- ------------ --------------------

    1 LASTNAME Doe

    2 FIRSTNAME John

    3 EMAILADDRESS something@email.com

    4 USERNAME exusername

    5 PHONENUMBER 15555555555

    edit: added output.

    @alan.B...If this work, this may be the last resort. As I indicated In my original post I'm trying to avoid using DECLARES if it is even possible to do so. I'll give it a go in the morning.

  • omegadev21 (11/20/2015)


    Alan.B (11/20/2015)


    omegadev21 (11/20/2015)


    Jeff Moden (11/20/2015)


    Ah... maybe I missed it. Is this the data example?

    Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:

    {"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}

    @jeff Moden, yeah that's the sample...

    If only I could take a blowtorch or rock to the newb the company hired to work on the front end (u know the one who sent the json string)...but alas the purge isn't real so have to keep my hypothetical murderous tendencies at bay lmao rofl lmao...In any event I'm working with a previous db structure until I can convince the powers that be to restructure the schema.

    I missed your example too. To parse that stuff out you can start with this (using Jeff's DelimitedSplit8K):

    DECLARE @json varchar(max) =

    '{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}';

    SELECT

    ItemNumber,

    aLabel = SUBSTRING(item,2,s-2),

    aValue = SUBSTRING(item,s+3,LEN(item)-(s+3))

    FROM

    (

    SELECT ds1.ItemNumber, ds1.Item, s=CHARINDEX('":"',ds1.Item)

    FROM dbo.DelimitedSplit8K(REPLACE(REPLACE(@json,'}',''),'{',''),',') ds1

    ) j;

    and get this:

    ItemNumber aLabel aValue

    -------------------- ------------ --------------------

    1 LASTNAME Doe

    2 FIRSTNAME John

    3 EMAILADDRESS something@email.com

    4 USERNAME exusername

    5 PHONENUMBER 15555555555

    edit: added output.

    @alan.B...If this work, this may be the last resort. As I indicated In my original post I'm trying to avoid using DECLARES if it is even possible to do so. I'll give it a go in the morning.

    Understand that the DECLARE in Alan's code is just for his example code. You can read from the entire table column for this. I'm in the middle of a deployment at work (waiting for a script to complete) so can't spend much time on this right now. I'll be back after the deployment to show you what I mean if someone doesn't beat me to it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... caught a break from work.

    Here's a two row test table. This is NOT a part of the solution. It's just a test table to show the possibilities. Notice how the postion of the first and last names has been switched in the second row of data.

    CREATE TABLE #TestTable

    (Client_Data VARCHAR(MAX))

    ;

    INSERT INTO #TestTable

    (Client_Data)

    SELECT '{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}'

    UNION ALL

    SELECT '{"FIRSTNAME":"Sally","LASTNAME":"Jones","EMAILADDRESS":"sjones@email.com","USERNAME":"yyusername","PHONENUMBER":"19999999999"}'

    ;

    Since the data is flat rather than hierarchical, we can actually split it and return a result set that looks more like a properly normalized table using the following code. It's also a bit tolerant of spurious double quotes, commas, and colons that my be embedded in the data as well as being tolerant of the position of the JSON field order. It will NOT automatically pick up on additional fields as it currently stands be we could also make that happen with a little dynamic SQL, if needed.

    WITH ctePreprocess AS

    (

    SELECT RowNum = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    ,Client_Data = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(Client_Data,'{',''),'}',''),'","',CHAR(30)),'":"',CHAR(31)),'"','')

    FROM #TestTable

    )

    SELECT LastName = MAX(CASE WHEN Split1.Item LIKE 'LASTNAME%' THEN Split2.Item ELSE '' END)

    ,FirstName = MAX(CASE WHEN Split1.Item LIKE 'FIRSTNAME%' THEN Split2.Item ELSE '' END)

    ,EMailAddress = MAX(CASE WHEN Split1.Item LIKE 'EMAILADDRESS%' THEN Split2.Item ELSE '' END)

    ,UserName = MAX(CASE WHEN Split1.Item LIKE 'USERNAME%' THEN Split2.Item ELSE '' END)

    ,PhoneNumber = MAX(CASE WHEN Split1.Item LIKE 'PHONENUMBER%' THEN Split2.Item ELSE '' END)

    FROM ctePreprocess d

    CROSS APPLY dbo.DelimitedSplit8K(d.Client_Data,CHAR(30)) split1

    CROSS APPLY dbo.DelimitedSplit8K(split1.Item ,CHAR(31)) Split2

    WHERE Split2.ItemNumber = 2

    GROUP BY d.RowNum

    ;

    Here are the results from the test above.

    LastName FirstName EMailAddress UserName PhoneNumber

    ----------- ----------- ------------------- ---------- -----------

    Doe John something@email.com exusername 15555555555

    Jones Sally sjones@email.com yyusername 19999999999

    You can get a copy of the DelimitedSplit8K function in the "Resources" section of the following article...

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    Note that the function is limited to VARCHAR(8000) and will run slower when you pass it a VARCHAR(MAX) but it should be ok for you. If you DO have values in the column that are wider than 8000 characters, consider using the CLR function, which is in the same place as where you get the DelimitedSplit8K function from.

    If you're DBA squawks about either, the DelimitedSplit8K function is an iTVF (Inline Table Valued Function) and does NOT have the same performance problems as typical Scalar or mTVF (Multi-statement Table Valued Functions) and the CLR has been deeply checked by me (source code is also included in the RESOURCEs section)).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @jeff: Thanks for this. I will run this by our DBA when I get in the office on Monday, meanwhile I'll implement in my test environment at home and look at the results. Interesting using the WITH clause, I had not thought to use it, then again it was late in my day on Friday.

  • omegadev21 (11/21/2015)


    @Jeff: Thanks for this. I will run this by our DBA when I get in the office on Monday, meanwhile I'll implement in my test environment at home and look at the results. Interesting using the WITH clause, I had not thought to use it, then again it was late in my day on Friday.

    NP. The "WITH clause" is better known as a "CTE".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well, none of the parsing solutions can be named correct.

    This part:

    REPLACE(REPLACE(@json,'}',''),'{','')

    might work for the given simplified set of data.

    But if simply removing "{" and "}" from a JSON string would not possibly corrupt the data nobody would put them into JSON format in the first place.

    Just like with quoted delimited strings - correct parsing of them takes a bit more than removing quotes.

    _____________
    Code for TallyGenerator

  • Sergiy (11/29/2015)


    Well, none of the parsing solutions can be named correct.

    This part:

    REPLACE(REPLACE(@json,'}',''),'{','')

    might work for the given simplified set of data.

    But if simply removing "{" and "}" from a JSON string would not possibly corrupt the data nobody would put them into JSON format in the first place.

    Just like with quoted delimited strings - correct parsing of them takes a bit more than removing quotes.

    I couldn't agree more. The Dev shouldn't have coded this stuff as JSON. It's flat data and that's all I did was treat it as flat data. Hopefully, they don't get any hierarchical stuff here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @ Jeff

    Thanks again for the input. Its nice to know this can be done but in the end, I made the developer cave and follow a normalized schema that I designed, because of two main reasons: (1) the DBA was seeing some performance spikes, and two I was getting inconsistent data from the dev.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply