Parsing XML data into multiple Datasets

  • Hi,

    Thanks in advance for your help.  I’m trying to download some data from a website and import it into a SQL Database  Here is the website:

    Disclosure Table – The Takeover Panel

    The website posts the data in multiple formats; however I think the most useful format they provide (for this use case) is the XML version (Found below):

    https://www.thetakeoverpanel.org.uk/new/disclosureTable/v3/disclosuretable.xml

    The XLM file has multiple sections which I’d like to parse into multiple datasets (Table variables or Temp tables):

    [additions] [deletions], [amendments] and [maintable]

    DECLARE @Additions AS TABLE
    (
    [RowID] INT IDENTITY(1,1),
    [CaseID] INT,
    [Offeree_Name] NVARCHAR(MAX),
    [Offer_Period_Commenced] DATETIME,
    [Name] NVARCHAR(250),
    [ISIN] NVARCHAR(150),
    [NSI] NVARCHAR(150),
    [Offeror_Name] NVARCHAR(MAX),
    [Offeror_Rule_26_Deadline] NVARCHAR(100),
    [offeror_identified] DATETIME,
    [offeror_empty_2.10] NVARCHAR(MAX)
    )

    DECLARE @Deletions AS TABLE
    (
    [RowID] INT IDENTITY(1,1),
    [CaseID] INT,
    [Offeree_Name] NVARCHAR(MAX),
    [Offer_Period_Commenced] DATETIME,
    [Name] NVARCHAR(250),
    [ISIN] NVARCHAR(150),
    [NSI] NVARCHAR(150),
    [Offeror_Name] NVARCHAR(MAX),
    [Offeror_Rule_26_Deadline] NVARCHAR(100),
    [offeror_identified] DATETIME,
    [offeror_empty_2.10] NVARCHAR(MAX)
    )

    DECLARE @MainTable AS TABLE
    (
    [RowID] INT IDENTITY(1,1),
    [CaseID] INT,
    [Offeree_Name] NVARCHAR(MAX),
    [Offer_Period_Commenced] DATETIME,
    [Name] NVARCHAR(250),
    [ISIN] NVARCHAR(150),
    [NSI] NVARCHAR(150),
    [Offeror_Name] NVARCHAR(MAX),
    [Offeror_Rule_26_Deadline] NVARCHAR(100),
    [offeror_identified] DATETIME,
    [offeror_empty_2.10] NVARCHAR(MAX)
    )

    Each dataset has the same format.

    I’ve been able to extract individual nodes (ISIN) in this example:

    SELECT
    T.C.value('.', 'varchar(100)') AS [maintable_ISIN]
    FROM
    @myDoc.nodes('(/disclosure_table/maintable/case/offeree/twoten_information/twoten_line/ISIN)') as T(C)

    But I’d like to extract the whole thing directly into their relevant datasets so that they look like the attached screenshot (Example_Results.png).

    Can anyone help me extract the data into separate datasets?

    Thanks

    Attachments:
    You must be logged in to view attached files.
  • Posted this question elsewhere and got a great answer:

    https://learn.microsoft.com/en-us/answers/questions/1153653/parsing-xml-data-into-multiple-datasets.html#comment-1154683

     

         SELECT
    ROW_NUMBER()OVER(ORDER BY Caseid.value('@caseid[1]', 'int')) AS RowID,
    Caseid.value('@caseid[1]', 'int') AS caseid,
    offeree.value('@name[1]', 'NVARCHAR (200)') AS Offeree_Name,
    offeree.value('@offer_period_commenced[1]', 'VARCHAR(20)') AS Offer_Period_Commenced,
    twoten_information.value('name[1]', 'VARCHAR(20)') AS Name,
    twoten_information.value('ISIN[1]', 'VARCHAR(20)') AS ISIN,
    twoten_information.value('NSI[1]', 'VARCHAR(20)') AS NSI,
    offeror.value('@name[1]', 'NVARCHAR (200)') AS Offeror_Name,
    offeror.value('@rule_26_deadline[1]', 'NVARCHAR (200)') AS Offeror_rule_26_deadline,
    offeror.value('@offeror_identified[1]', 'NVARCHAR (200)') AS Offeror_identified,
    offeror.value('empty_2.10[1]', 'NVARCHAR (200)') AS [Offeror_empty_2.10]
    FROM @myDoc.nodes('/disclosure_table/maintable/case') AS XMLtable1(Caseid)
    CROSS APPLY Caseid.nodes('offeree') XMLtable2(offeree)
    CROSS APPLY Caseid.nodes('offeror') XMLtable3(offeror)
    CROSS APPLY offeree.nodes('twoten_information/twoten_line') XMLtable4(twoten_information)

     

    "To improve your answer:

    There is no need to use [1] for the attributes. Each attribute is unique in the context of its element.

    While addressing elements, it is better to use the following XPath expression:

    '(ISIN/text())[1]' instead of the 'ISIN[1]'

    It will produce a dramatic performance improvement.£

  • Aye... thanks for posting the link for that.  Much appreciated.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply