XML Import into multiple tables with data translation

  • Hi All, I've never had to do this before, so forgive me if I'm not explaining the question well, I'll try.

    I need to consume a live data feed from a golf tournament. And by consume, I really mean insert (merge) into our own SQL Server database on a regular intervals as a tournament progresses.

    This site didn't let me upload an XML file, but you can see a sample of the data feed here:

    http://images.pgalinks.com/scoring/TourFeedSmallSample.xml

    I need to insert this data into 2 tables, Player_Holes and Player_Shots. But while doing the insert, I need to lookup several things such as our player ID match to theirs on an external_id against the players table. The shot types translation, and some other logic about the process overall.

    The columns in my player_holes tables are: id, player_id, hole_id, round, shots (this is a total # of strokes) and date_created/date_modified.

    Shots table is similar: id, player_id, hole_id, round, shot_number, shot_type_id, club, distance, date_created/date_modified.

    The only way I know how to do it, is inefficient. I would parse the XML in ColdFusion (please no comments on ColdFusion, that's what we use for webdev), and then loop over it and do inserts for each player, each hole for each round, and the shots would probably be separate for each hole.

    It would be so much better and more efficient if I could do it in SQL directly. I've done some research and SQL Server Data Tools looks promising. I've never used it, so would have to learn, but also I'm not sure if that'd work in this application when we want to run is as a scheduled task every few minutes.

    I'm really hoping there is an efficient way to just do it with T-SQL.

    Any advice greatly appreciated.

  • Sounds like a good time to use Integration Services. If you created a package, you could then use SQL Agent to run the package you created on some interval (every X minutes, for example). The package would consist of several steps... something like this:

    1. an FTP command to get the file(s).

    2. XQuery to query the XML file so that you can select the data to insert.

    3. With 2, use some INSERT statements to write the parsed data from step 2 into the tables you need.

  • Here is a quick suggestion on how to use a blob table and views to parse the xml

    😎

    USE tempdb;

    GO

    /* Create the import table */

    CREATE TABLE dbo.TBL_XML_DOC

    ( XML_DOC_ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED NOT NULL

    ,XML_DOCUMENT XML NOT NULL

    );

    GO

    /* Load the xml */

    INSERT INTO dbo.TBL_XML_DOC(XML_DOCUMENT)

    SELECT CAST(BulkColumn AS XML) AS XML_DATA FROM OPENROWSET(

    BULK 'C:\IMPORT\TourFeedSmallSample.xml',

    SINGLE_BLOB

    ) AS X

    /* Create HEADER view */

    CREATE VIEW dbo.XML_HEADER

    AS

    SELECT

    TX.XML_DOC_ID AS XML_DOC_ID

    ,IMP.DOC.value('@T_ID' ,'VARCHAR(12)') AS T_ID

    ,IMP.DOC.value('@eventId' ,'INT') AS eventId

    ,IMP.DOC.value('@generated','DATETIME2(0)') AS generated

    ,IMP.DOC.value('@source' ,'VARCHAR(12)') AS [source]

    ,IMP.DOC.value('msg[1]/@msgId','INT') AS msgId

    ,IMP.DOC.query('(.)') AS DOC_XML

    FROM dbo.TBL_XML_DOC TX

    OUTER APPLY TX.XML_document.nodes('ConsolidatedPlayers') AS IMP(DOC)

    GO

    /* Create ROUNDS view */

    CREATE VIEW dbo.XML_ROUNDS

    AS

    SELECT

    HEADER.XML_DOC_ID

    ,IMP.DOC.value('@T_ID','VARCHAR(12)') AS T_ID

    ,P.DOC.value('@id' ,'INT') AS ROUNDS_id

    ,RND.DOC.value('@n' ,'INT') AS ROUNDS_n

    ,RND.DOC.query('(.)') AS ROUND_XML

    FROM dbo.XML_HEADER HEADER

    OUTER APPLY HEADER.DOC_XML.nodes('ConsolidatedPlayers') AS IMP(DOC)

    OUTER APPLY IMP.DOC.nodes('p') AS P (DOC)

    OUTER APPLY P.DOC.nodes('rnd') AS RND(DOC)

    /* Create HOLES view */

    CREATE VIEW dbo.XML_HOLES

    AS

    SELECT

    ROUNDS.XML_DOC_ID

    ,ROUNDS.T_ID

    ,ROUNDS.ROUNDS_id

    ,ROUNDS.ROUNDS_n

    ,HOLES.DOC.value('@n' ,'INT') AS HOLES_n

    ,HOLES.DOC.value('@cNum','INT') AS HOLES_cNum

    ,HOLES.DOC.value('@sc' ,'INT') AS HOLES_sc

    ,HOLES.DOC.value('@pDay','INT') AS HOLES_pDay

    ,HOLES.DOC.value('@pTot','INT') AS HOLES_pTot

    ,HOLES.DOC.query('(.)') AS HOLES_XML

    FROM dbo.XML_ROUNDS ROUNDS

    OUTER APPLY ROUNDS.ROUND_XML.nodes('rnd/holes/h') AS HOLES(DOC)

    /* Create SHOT view */

    CREATE VIEW dbo.XML_SHOT

    AS

    SELECT

    HOLES.XML_DOC_ID

    ,HOLES.T_ID

    ,HOLES.ROUNDS_id

    ,HOLES.ROUNDS_n

    ,HOLES.HOLES_n

    ,SHOT.DOC.value('@n' ,'INT') AS SHOT_n

    ,SHOT.DOC.value('@pid' ,'INT') AS SHOT_pid

    ,SHOT.DOC.value('@time','INT') AS SHOT_time

    ,SHOT.DOC.value('@putt','CHAR(1)') AS SHOT_putt

    ,SHOT.DOC.value('@t' ,'CHAR(1)') AS SHOT_t

    ,SHOT.DOC.value('@prv' ,'CHAR(1)') AS SHOT_prv

    ,SHOT.DOC.value('@tee' ,'CHAR(1)') AS SHOT_tee

    ,SHOT.DOC.value('@cup' ,'CHAR(1)') AS SHOT_cup

    ,SHOT.DOC.value('@from','VARCHAR(10)') AS SHOT_from

    ,SHOT.DOC.value('@to' ,'VARCHAR(10)') AS SHOT_to

    ,SHOT.DOC.value('@asc' ,'INT') AS SHOT_asc

    ,SHOT.DOC.value('@dist','INT') AS SHOT_dist

    ,SHOT.DOC.value('@left','INT') AS SHOT_left

    ,SHOT.DOC.value('@x' ,'FLOAT') AS SHOT_x

    ,SHOT.DOC.value('@y' ,'FLOAT') AS SHOT_y

    ,SHOT.DOC.value('@z' ,'FLOAT') AS SHOT_z

    ,SHOT.DOC.value('@club','VARCHAR(10)') AS SHOT_club

    ,SHOT.DOC.value('@con' ,'VARCHAR(50)') AS SHOT_con

    ,SHOT.DOC.value('shottext[1]' ,'VARCHAR(250)') AS SHOT_shottext

    FROM dbo.XML_HOLES HOLES

    OUTER APPLY HOLES.HOLES_XML.nodes('h/shot') AS SHOT(DOC)

  • Check This

  • Wow, this didn't even cross my mind at all. Could you explain some of the thinking behind it?

    I spent quite some time on it last week, first trying to work with .nodes(), but it turned out really slow for some reason when I had to get elements from parent nodes. Maybe I don't quite understand it, and the whole point of doing the APPLY there.

    Then I switch to just using OPENXML, which worked a lot faster, 3 seconds to deal with 100k rows of XML data and 8500 that actually would come back from the query. But it's just a select of the scores for now, nothing for the shots:

    DECLARE @xmlText varchar(max)

    SET @xmlText = (SELECT ConsolidatedPlayersCombinedText FROM tour_feeds WHERE id = 1)

    DECLARE @hDoc int

    EXEC sp_xml_preparedocument @hDoc output, @xmlText

    SELECT

    player_id, round_num, hole_number, shots, last_shot_time

    FROM OPENXML(@hdoc, '/ConsolidatedPlayers/p/rnd/holes/h/shot[last()]', 1)

    WITH (

    player_idvarchar(16) '../../../../@id',

    round_numsmallint '../../../@n',

    hole_numbersmallint '../@cNum',

    shots smallint '../@sc',

    last_shot_timevarchar(6) '@time'

    )

    EXEC sp_xml_removedocument @hDoc

  • There is a well-known issue with the xml datatype and methods and parent axis.

    Try this instead:

    IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp

    GO

    SELECT CAST( ConsolidatedPlayersCombinedText AS XML ) ConsolidatedPlayersCombinedText

    INTO #tmp

    FROM tour_feeds WHERE id = 1

    SELECT

    p.c.value('@id', 'INT') player_id,

    r.c.value('@n', 'INT') round_num,

    h.c.value('@cNum', 'INT') hole_number,

    h.c.value('@sc', 'INT') shots,

    s.c.value('@time', 'VARCHAR(6)') last_shot_time

    FROM #tmp t

    CROSS APPLY t.ConsolidatedPlayersCombinedText.nodes('ConsolidatedPlayers/p') p(c)

    CROSS APPLY p.c.nodes('rnd') r(c)

    CROSS APPLY r.c.nodes('holes/h') h(c)

    CROSS APPLY h.c.nodes('shot[last()]') s(c)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply