November 20, 2015 at 3:09 pm
I know there are JSON parsers out there, but I'm hoping to avoid using a separate function as the query is being built using a builder in a C++ application. That said I'd like to get this done given the parameters outlined below. Hoping someone can point me in the right direction.
Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:
{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}
I'd like to know if it's possible to split the data out using a combination of MSSQL functions such as SUBSTRING, CHARINDEX, PATINDEX, LEN, LEFT, RIGHT, STUFF, etc., and without setting any DECLARE @VARS.
What I have so far is this: (NOTE: This is part of a larger query that returns other data, hence the alias at the end)
SELECT SUBSTRING( CLIENT_DATA, ( CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 ), CHARINDEX( '","', CLIENT_DATA ) ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LASTNAME
This returns:
Doe","FIRSTNAME":
When it should only return:
Doe
I understand why this is happening: the value I'm passing as the length argument of the SUBSTRING function is being determined by the first occurrence of '","'. The challenge I have is that the fields in the JSON string are not always in the same place; that is, LASTNAME could be first for one record or somewhere in the middle for another.
I've tried different variations, but nothing comes close. I'd like to believe this can be done with only string functions, but I'm stumped on the correct order.
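For what it's worth, one way to bound the length without any DECLAREs is to use CHARINDEX's third argument (the start position), so the closing quote is searched for only after the value begins. A sketch against the sample data, assuming the key always appears as "LASTNAME":" and the value contains no escaped quotes:

```sql
-- Find the value start once (the 12 is LEN('"LASTNAME":"')), then search
-- for the closing quote FROM that position, so the length is measured
-- inside the right key/value pair regardless of where the key sits.
SELECT SUBSTRING(
           CLIENT_DATA,
           CHARINDEX('"LASTNAME":"', CLIENT_DATA) + 12,
           CHARINDEX('"', CLIENT_DATA, CHARINDEX('"LASTNAME":"', CLIENT_DATA) + 12)
             - (CHARINDEX('"LASTNAME":"', CLIENT_DATA) + 12)
       ) AS LASTNAME
FROM SVE_VERIFY_ANALYSIS_T;
-- Against the sample row this yields: Doe
```

The repeated CHARINDEX expression is the price of avoiding variables; it can be computed once via CROSS APPLY if the repetition bothers you.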
THIS IS THE FULL QUERY IF INTERESTED:
SELECT DISTINCT
C.CLIENT_ID,
( SELECT TOP 1 CLIENT_DATA FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS CLIENT_DATA,
( SELECT TOP 1 SUBSTRING( CLIENT_DATA, ( CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 ), CHARINDEX( '","', CLIENT_DATA ) ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LASTNAME,
( SELECT TOP 1 CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS STARTING,
( SELECT TOP 1 CHARINDEX( '","', CLIENT_DATA ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LENGTH,
( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'P' ) AS PASS,
( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'A' ) AS AMBG,
( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'F' ) AS FAIL,
( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'N' ) AS ERRR,
( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS TOTAL
FROM
SVE_VERIFY_ANALYSIS_T AS C
November 20, 2015 at 4:46 pm
A purely T-SQL solution:
Consuming JSON Strings in SQL Server by Phil Factor
-- Itzik Ben-Gan 2001
November 20, 2015 at 9:28 pm
I have a far better idea...
Drive to the building where the person who sent you the hierarchical, denormalized crap and beat the hell out of them for being so stupid. Then, hum a rock at Phil for delivering it as an Edge table. 😉
Remember that, like XML, JSON is a managed-code encoding of data and it's easier to parse there than in a database. If someone is sending you JSON encoded data to be included in a database, then they're just being lazy because it's a whole lot easier for them to send you something more compatible with databases.
I won't even bring up the obvious tag bloat and the totally unnecessary hierarchical encoding of the data.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 20, 2015 at 9:45 pm
omegadev21 (11/20/2015)
I know there are JSON parsers out there, but I'm hoping to avoid using a separate function as the query is being built using a builder in a C++ application. That said I'd like to get this done given the parameters outlined below. Hoping someone can point me in the right direction. Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:
{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}
I'd like to know if it's possible to split the data out using a combination of MSSQL functions such as SUBSTRING, CHARINDEX, PATINDEX, LEN, LEFT, RIGHT, STUFF, etc., and without setting any DECLARE @VARS.
What I have so far is this: (NOTE: This is part of a larger query that returns other data, hence the alias at the end)
SELECT SUBSTRING( CLIENT_DATA, ( CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 ), CHARINDEX( '","', CLIENT_DATA ) ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LASTNAME
This returns:
Doe","FIRSTNAME":
When it should only return:
Doe
I understand why this is happening: the value I'm passing as the length argument of the SUBSTRING function is being determined by the first occurrence of '","'. The challenge I have is that the fields in the JSON string are not always in the same place; that is, LASTNAME could be first for one record or somewhere in the middle for another.
I've tried different alterations but nothing coming close. I'd like to believe this can be done with only string functions but I'm stumped on the correct order.
THIS IS THE FULL QUERY IF INTERESTED:
SELECT DISTINCT
C.CLIENT_ID,
( SELECT TOP 1 CLIENT_DATA FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS CLIENT_DATA,
( SELECT TOP 1 SUBSTRING( CLIENT_DATA, ( CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 ), CHARINDEX( '","', CLIENT_DATA ) ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LASTNAME,
( SELECT TOP 1 CHARINDEX( 'LASTNAME', CLIENT_DATA ) + 11 FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS STARTING,
( SELECT TOP 1 CHARINDEX( '","', CLIENT_DATA ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS LENGTH,
( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'P' ) AS PASS,
( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'A' ) AS AMBG,
( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'F' ) AS FAIL,
( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID AND RESULT = 'N' ) AS ERRR,
( SELECT COUNT( * ) FROM SVE_VERIFY_ANALYSIS_T AS P WHERE C.CLIENT_ID = P.CLIENT_ID ) AS TOTAL
FROM
SVE_VERIFY_ANALYSIS_T AS C
Do you have a sample of the JSON that you're actually trying to split? Yes, I realize there's some PII in it and you'll have to obfuscate that but it would be good to see an example of what you're trying to work with.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 20, 2015 at 9:51 pm
Ah... maybe I missed it. Is this the data example?
Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:
{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}
--Jeff Moden
Change is inevitable... Change for the better is not.
November 20, 2015 at 10:07 pm
Jeff Moden (11/20/2015)
Ah... maybe I missed it. Is this the data example?
Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:
{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}
@jeff Moden, yeah that's the sample...
If only I could take a blowtorch or rock to the newb the company hired to work on the front end (u know the one who sent the json string)...but alas the purge isn't real so have to keep my hypothetical murderous tendencies at bay lmao rofl lmao...In any event I'm working with a previous db structure until I can convince the powers that be to restructure the schema.
November 20, 2015 at 10:48 pm
omegadev21 (11/20/2015)
Jeff Moden (11/20/2015)
Ah... maybe I missed it. Is this the data example?
Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:
{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}
@jeff Moden, yeah that's the sample...
If only I could take a blowtorch or rock to the newb the company hired to work on the front end (u know the one who sent the json string)...but alas the purge isn't real so have to keep my hypothetical murderous tendencies at bay lmao rofl lmao...In any event I'm working with a previous db structure until I can convince the powers that be to restructure the schema.
I missed your example too. To parse that stuff out you can start with this (using Jeff's DelimitedSplit8K):
DECLARE @json varchar(max) =
'{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}';
SELECT
ItemNumber,
aLabel = SUBSTRING(item,2,s-2),
aValue = SUBSTRING(item,s+3,LEN(item)-(s+3))
FROM
(
SELECT ds1.ItemNumber, ds1.Item, s=CHARINDEX('":"',ds1.Item)
FROM dbo.DelimitedSplit8K(REPLACE(REPLACE(@json,'}',''),'{',''),',') ds1
) j;
and get this:
ItemNumber aLabel aValue
-------------------- ------------ --------------------
1 LASTNAME Doe
2 FIRSTNAME John
3 EMAILADDRESS something@email.com
4 USERNAME exusername
5 PHONENUMBER 15555555555
edit: added output.
-- Itzik Ben-Gan 2001
November 20, 2015 at 10:55 pm
Alan.B (11/20/2015)
omegadev21 (11/20/2015)
Jeff Moden (11/20/2015)
Ah... maybe I missed it. Is this the data example?
Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:
{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}
@jeff Moden, yeah that's the sample...
If only I could take a blowtorch or rock to the newb the company hired to work on the front end (u know the one who sent the json string)...but alas the purge isn't real so have to keep my hypothetical murderous tendencies at bay lmao rofl lmao...In any event I'm working with a previous db structure until I can convince the powers that be to restructure the schema.
I missed your example too. To parse that stuff out you can start with this (using Jeff's DelimitedSplit8K):
DECLARE @json varchar(max) =
'{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}';
SELECT
ItemNumber,
aLabel = SUBSTRING(item,2,s-2),
aValue = SUBSTRING(item,s+3,LEN(item)-(s+3))
FROM
(
SELECT ds1.ItemNumber, ds1.Item, s=CHARINDEX('":"',ds1.Item)
FROM dbo.DelimitedSplit8K(REPLACE(REPLACE(@json,'}',''),'{',''),',') ds1
) j;
and get this:
ItemNumber aLabel aValue
-------------------- ------------ --------------------
1 LASTNAME Doe
2 FIRSTNAME John
3 EMAILADDRESS something@email.com
4 USERNAME exusername
5 PHONENUMBER 15555555555
edit: added output.
@Alan.B... If this works, it may be the last resort. As I indicated in my original post, I'm trying to avoid using DECLAREs if it is even possible to do so. I'll give it a go in the morning.
November 21, 2015 at 5:57 am
omegadev21 (11/20/2015)
Alan.B (11/20/2015)
omegadev21 (11/20/2015)
Jeff Moden (11/20/2015)
Ah... maybe I missed it. Is this the data example?
Given the field CLIENT_DATA of data type VARCHAR(MAX) with data:
{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}
@jeff Moden, yeah that's the sample...
If only I could take a blowtorch or rock to the newb the company hired to work on the front end (u know the one who sent the json string)...but alas the purge isn't real so have to keep my hypothetical murderous tendencies at bay lmao rofl lmao...In any event I'm working with a previous db structure until I can convince the powers that be to restructure the schema.
I missed your example too. To parse that stuff out you can start with this (using Jeff's DelimitedSplit8K):
DECLARE @json varchar(max) =
'{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}';
SELECT
ItemNumber,
aLabel = SUBSTRING(item,2,s-2),
aValue = SUBSTRING(item,s+3,LEN(item)-(s+3))
FROM
(
SELECT ds1.ItemNumber, ds1.Item, s=CHARINDEX('":"',ds1.Item)
FROM dbo.DelimitedSplit8K(REPLACE(REPLACE(@json,'}',''),'{',''),',') ds1
) j;
and get this:
ItemNumber aLabel aValue
-------------------- ------------ --------------------
1 LASTNAME Doe
2 FIRSTNAME John
3 EMAILADDRESS something@email.com
4 USERNAME exusername
5 PHONENUMBER 15555555555
edit: added output.
@Alan.B... If this works, it may be the last resort. As I indicated in my original post, I'm trying to avoid using DECLAREs if it is even possible to do so. I'll give it a go in the morning.
Understand that the DECLARE in Alan's code is just for his example code. You can read from the entire table column for this. I'm in the middle of a deployment at work (waiting for a script to complete) so can't spend much time on this right now. I'll be back after the deployment to show you what I mean if someone doesn't beat me to it.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 21, 2015 at 1:55 pm
Ok... caught a break from work.
Here's a two-row test table. This is NOT a part of the solution. It's just a test table to show the possibilities. Notice how the position of the first and last names has been switched in the second row of data.
CREATE TABLE #TestTable
(Client_Data VARCHAR(MAX))
;
INSERT INTO #TestTable
(Client_Data)
SELECT '{"LASTNAME":"Doe","FIRSTNAME":"John","EMAILADDRESS":"something@email.com","USERNAME":"exusername","PHONENUMBER":"15555555555"}'
UNION ALL
SELECT '{"FIRSTNAME":"Sally","LASTNAME":"Jones","EMAILADDRESS":"sjones@email.com","USERNAME":"yyusername","PHONENUMBER":"19999999999"}'
;
Since the data is flat rather than hierarchical, we can actually split it and return a result set that looks more like a properly normalized table using the following code. It's also a bit tolerant of spurious double quotes, commas, and colons that may be embedded in the data, as well as being tolerant of the JSON field order. It will NOT automatically pick up on additional fields as it currently stands, but we could also make that happen with a little dynamic SQL, if needed.
WITH ctePreprocess AS
(
SELECT RowNum = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
,Client_Data = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(Client_Data,'{',''),'}',''),'","',CHAR(30)),'":"',CHAR(31)),'"','')
FROM #TestTable
)
SELECT LastName = MAX(CASE WHEN Split1.Item LIKE 'LASTNAME%' THEN Split2.Item ELSE '' END)
,FirstName = MAX(CASE WHEN Split1.Item LIKE 'FIRSTNAME%' THEN Split2.Item ELSE '' END)
,EMailAddress = MAX(CASE WHEN Split1.Item LIKE 'EMAILADDRESS%' THEN Split2.Item ELSE '' END)
,UserName = MAX(CASE WHEN Split1.Item LIKE 'USERNAME%' THEN Split2.Item ELSE '' END)
,PhoneNumber = MAX(CASE WHEN Split1.Item LIKE 'PHONENUMBER%' THEN Split2.Item ELSE '' END)
FROM ctePreprocess d
CROSS APPLY dbo.DelimitedSplit8K(d.Client_Data,CHAR(30)) split1
CROSS APPLY dbo.DelimitedSplit8K(split1.Item ,CHAR(31)) Split2
WHERE Split2.ItemNumber = 2
GROUP BY d.RowNum
;
Here are the results from the test above.
LastName FirstName EMailAddress UserName PhoneNumber
----------- ----------- ------------------- ---------- -----------
Doe John something@email.com exusername 15555555555
Jones Sally sjones@email.com yyusername 19999999999
You can get a copy of the DelimitedSplit8K function in the "Resources" section of the following article...
http://www.sqlservercentral.com/articles/Tally+Table/72993/
Note that the function is limited to VARCHAR(8000) and will run slower when you pass it a VARCHAR(MAX), but it should be ok for you. If you DO have values in the column that are wider than 8000 characters, consider using the CLR function, which is available in the same place as the DelimitedSplit8K function.
If your DBA squawks about either, the DelimitedSplit8K function is an iTVF (Inline Table Valued Function) and does NOT have the same performance problems as typical scalar or mTVF (Multi-statement Table Valued) functions, and the CLR function has been deeply checked by me (the source code is also included in the Resources section).
--Jeff Moden
Change is inevitable... Change for the better is not.
November 21, 2015 at 2:15 pm
@Jeff: Thanks for this. I will run this by our DBA when I get in the office on Monday; meanwhile I'll implement it in my test environment at home and look at the results. Interesting use of the WITH clause, I had not thought to use it, then again it was late in my day on Friday.
November 21, 2015 at 2:55 pm
omegadev21 (11/21/2015)
@Jeff: Thanks for this. I will run this by our DBA when I get in the office on Monday, meanwhile I'll implement in my test environment at home and look at the results. Interesting using the WITH clause, I had not thought to use it, then again it was late in my day on Friday.
NP. The "WITH clause" is better known as a "CTE".
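For anyone following along, a CTE is just a named derived table declared ahead of the statement that uses it, which keeps the preprocessing step readable and reusable. A minimal, hypothetical sketch of the pattern used above:

```sql
-- The CTE names an intermediate result; the outer SELECT reads from it
-- exactly as if it were a table. Here it just pre-cleans a literal.
WITH cteClean AS
(
    SELECT CleanData = REPLACE(REPLACE('{"K":"V"}', '{', ''), '}', '')
)
SELECT CleanData
FROM cteClean;
```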
--Jeff Moden
Change is inevitable... Change for the better is not.
November 29, 2015 at 10:19 pm
Well, none of the parsing solutions can be called correct.
This part:
REPLACE(REPLACE(@json,'}',''),'{','')
might work for the given simplified set of data.
But if simply removing "{" and "}" from a JSON string could never corrupt the data, nobody would have put them into the JSON format in the first place.
Just like with quoted delimited strings - correct parsing of them takes a bit more than removing quotes.
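To illustrate the point with hypothetical rows: a perfectly legal JSON value containing a comma, or an escaped quote, defeats the REPLACE-and-split approach, since the splitter has no way to tell a structural delimiter from the same character inside a value.

```sql
-- "Doe, Jr." is valid JSON, but splitting the cleaned string on ','
-- cuts that value in half; likewise, stripping every '"' destroys the
-- escaped \" inside "O\"Brien". Neither row survives the simple parser.
SELECT '{"LASTNAME":"Doe, Jr.","FIRSTNAME":"John"}' AS BreaksOnComma,
       '{"LASTNAME":"O\"Brien","FIRSTNAME":"Pat"}'  AS BreaksOnQuote;
```

Handling these cases correctly requires tracking whether each character is inside or outside a quoted value, which is exactly what a real JSON parser does.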
_____________
Code for TallyGenerator
December 1, 2015 at 4:00 pm
Sergiy (11/29/2015)
Well, none of the parsing solutions can be called correct. This part:
REPLACE(REPLACE(@json,'}',''),'{','')
might work for the given simplified set of data.
But if simply removing "{" and "}" from a JSON string could never corrupt the data, nobody would have put them into the JSON format in the first place.
Just like with quoted delimited strings - correct parsing of them takes a bit more than removing quotes.
I couldn't agree more. The Dev shouldn't have coded this stuff as JSON. It's flat data, and all I did was treat it as flat data. Hopefully, they don't get any hierarchical stuff here.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 1, 2015 at 4:16 pm
@ Jeff
Thanks again for the input. It's nice to know this can be done, but in the end I made the developer cave and follow a normalized schema that I designed, for two main reasons: (1) the DBA was seeing some performance spikes, and (2) I was getting inconsistent data from the dev.