How To Flatten JSON Data?

  • I receive data in JSON files and for one particular type of data I am having problems getting it into a sensible SQL format. The data consists of an ID and a JSON array of bank accounts belonging to that ID. For the data below, I want one row for ID 4710592 with the data where field "pri" is Bert and another row where it is Bob (note: this field isn't necessarily populated in the actual data).

    CREATE TABLE #temptable ( [id] nvarchar(4000), [entry] nvarchar(4000) )
    INSERT INTO #temptable ([id], [entry])
    ( N'4710592', N'[
    ,{"acc_name":"Mr. J. Smith"}
    ,{"acc_nick_name":"J. Smith--GBP"}
    ,{"bank_addr":"NATIONAL WESTMINSTER BANK PLC Leicester"}
    ,{"acc_name":"Mr. J. Smith No. 2 account"}
    ,{"acc_nick_name":"J. Smith--USD"}
    ,{"bank_addr":"SANTANDER Moorgate"}

    When I run the query below, I get as many rows per ID value as there are columns and only one column is populated on each row. I've tried using PIVOT, but that only gives me a single row per ID with the MAX (or MIN or whatever other aggregate function I use) per column. I'm sure there must be a relatively easy way to do this, but I'm stumped. Any assistance would be gratefully received.

    SELECT *
    FROM #temptable t
    WITH (
    pri NVARCHAR(4000),
    is_pens_pay NVARCHAR(4000),
    acc_nr NVARCHAR(4000),
    curry_id NVARCHAR(4000),
    limit NVARCHAR(4000),
    acc_name NVARCHAR(4000),
    acc_nick_name NVARCHAR(4000),
    init_deposit_amt NVARCHAR(4000),
    is_main NVARCHAR(4000),
    is_third_party NVARCHAR(4000),
    bank_name NVARCHAR(4000),
    bank_addr NVARCHAR(4000),
    sort_code_nr NVARCHAR(4000)
    ) oj

    PS: The only unique identifier for the entries in the array would be a combination of columns - for the sample data the sort_code_nr and acc_nr combination is unique.

  • Maybe this?

    j2.[key] AS field,
    j2.value AS val,
    row_number() over(partition by j2.[key] order by cast(j.[key] as int)) as rn
    FROM #temptable t
    CROSS APPLY OPENJSON(j.value) j2
    SELECT id,
    MAX(CASE WHEN field='pri' THEN val END) AS pri,
    MAX(CASE WHEN field='is_pens_pay' THEN val END) AS is_pens_pay,
    MAX(CASE WHEN field='acc_nr' THEN val END) AS acc_nr,
    MAX(CASE WHEN field='curry_id' THEN val END) AS curry_id,
    MAX(CASE WHEN field='limit' THEN val END) AS limit,
    MAX(CASE WHEN field='acc_name' THEN val END) AS acc_name,
    MAX(CASE WHEN field='acc_nick_name' THEN val END) AS acc_nick_name,
    MAX(CASE WHEN field='init_deposit_amt' THEN val END) AS init_deposit_amt,
    MAX(CASE WHEN field='is_main' THEN val END) AS is_main,
    MAX(CASE WHEN field='is_third_party' THEN val END) AS is_third_party,
    MAX(CASE WHEN field='bank_name' THEN val END) AS bank_name,
    MAX(CASE WHEN field='bank_addr' THEN val END) AS bank_addr,
    MAX(CASE WHEN field='sort_code_nr' THEN val END) AS sort_code_nr
    GROUP BY id,rn
    ORDER BY id,rn;


    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum
  • Thanks. That works perfectly.

  • In case anyone else comes across this as the solution to a similar problem, I had to add to the partition in order to get it to work with my full data set (ie. the original sample data didn't cover all scenarios).

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply