Loading XML into SQL via XML Loader Tool

  • Hello,

    I am trying to load an XML file into SQL Server via the XML Bulk Load Tool.

    However I am having difficulty in writing the mapping file with the XML Bulk Load Tool needs.

    My document follows the following DTD spec. I need only the mailbox element and its children.

    Can you please help me in writing the mapping file.

    <!ELEMENT site (regions, categories, catgraph, people, open_auctions, closed_auctions)>

    <!ELEMENT categories (category+)>

    <!ELEMENT category (name, description)>

    <!ATTLIST category id ID #REQUIRED>

    <!ELEMENT name (#PCDATA)>

    <!ELEMENT description (text | parlist)>

    <!ELEMENT text (#PCDATA | bold | keyword | emph)*>

    <!ELEMENT bold (#PCDATA | bold | keyword | emph)*>

    <!ELEMENT keyword (#PCDATA | bold | keyword | emph)*>

    <!ELEMENT emph (#PCDATA | bold | keyword | emph)*>

    <!ELEMENT parlist (listitem)*>

    <!ELEMENT listitem (text | parlist)*>

    <!ELEMENT catgraph (edge*)>

    <!ELEMENT edge EMPTY>

    <!ATTLIST edge from IDREF #REQUIRED to IDREF #REQUIRED>

    <!ELEMENT regions (africa, asia, australia, europe, namerica, samerica)>

    <!ELEMENT africa (item*)>

    <!ELEMENT asia (item*)>

    <!ELEMENT australia (item*)>

    <!ELEMENT namerica (item*)>

    <!ELEMENT samerica (item*)>

    <!ELEMENT europe (item*)>

    <!ELEMENT item (location, quantity, name, payment, description, shipping, incategory+, mailbox)>

    <!ATTLIST item id ID #REQUIRED

    featured CDATA #IMPLIED>

    <!ELEMENT location (#PCDATA)>

    <!ELEMENT quantity (#PCDATA)>

    <!ELEMENT payment (#PCDATA)>

    <!ELEMENT shipping (#PCDATA)>

    <!ELEMENT reserve (#PCDATA)>

    <!ELEMENT incategory EMPTY>

    <!ATTLIST incategory category IDREF #REQUIRED>

    <!ELEMENT mailbox (mail*)>

    <!ELEMENT mail (from, to, date, text)>

    <!ELEMENT from (#PCDATA)>

    <!ELEMENT to (#PCDATA)>

    <!ELEMENT date (#PCDATA)>

  • Umm? Something seems missing here. You've given us a DTD, what do you want us to do with it?

    (ps., an XSD would probably be easier)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I only have the DTD and a very large XML File (1 GB).

    I need to write the mapping file. I am not able to find any guidance on how to write this mapping file. The Sample on MS site is very simple where the node being loaded is right under the root node of the XML document.

    In my case, the node which is to be loaded into the database occurs inside multiple elements and can occur much deaper in the hierarchy.

    I need some guidance on writing the mapping file where the node is deep inside the XML document and not right at the top as shown in samples at MS site.

  • Do you know the structure of your XML file?

    If so, can you post some sample data to describe the structure?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • The DTD here gives a good idea of the structure of XML file.

    If I could fire an XPATH expression of //mailbox, then I can get all the mailbox elements which exist in the document irrespective of the path/structure of the XML file.

    However I have no idea if XPath expressions like the one above can be used in the mapping file of XML Bulk Load Tool.

  • lmu92 (2/28/2010)


    Do you know the structure of your XML file?

    If so, can you post some sample data to describe the structure?

    I agree with Lutz. We can't help you very much unless you give us something concrete to work with.

    So far, you've given us the Input format (the DTD) so that's good. What we still need are:

    1) Output Format: This is the definition of the Table(s) it should go into. Please provide this as CREATE TABLE statement(s). Include your primary key and any foreign keys or other table constraints.

    2) Sample Input data: Take your 1GB XML File and edit it down to just 2 to 5 second level element-groups (I assume their is only one top-level element group). Zip that up and attach it to a post here.

    Hopefully, this should be enough for us to work with...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Please find the test data file attached. it is zipped. (1.73 MB)

    I am only interested in the mail element other data can be discarded.

    the table where I want to load the content of the mailbox element is

    CREATE TABLE [dbo].[Messages](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [to] [nvarchar](500) NOT NULL,

    [from] [nvarchar](50) NOT NULL,

    [date] [datetime] NOT NULL,

    [text] [ntext] NOT NULL,

    CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

  • whoa! Did you know that this is embedding subordinate elements inside of text content? I'm not even sure if that's valid XML...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (3/2/2010)


    whoa! Did you know that this is embedding subordinate elements inside of text content? I'm not even sure if that's valid XML...

    To me it looks more likt html formatting rather than subordinate elements. See the following example inside

    <bold>childish carrions imaginary wooden preventions bounteous sounded consider sayings fishified fine prime may</bold>

    ponderous doubtful rite

    <emph>dotage discipline choughs mew here</emph>

    vill discontent manage beatrice straight muse shame prays maecenas any conveyance fingers whereupon child case

    <keyword>season presently victory women beating</keyword>

    @SQL-Student: would you please confirm?

    Edit: when reading the content of the text element, do you need to include the < keyword > or < emph > tags, too? Or would the plain text be sufficient?

    If the former, it's going to be hard to do since it seems like XQUERY ignores the tags while extracting the content...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Lutz:

    No, its in the DTD too (plus, if it was supposed to be embedded HTML, it would have to be entitized). So I guess it is valid XML after all. But its the kind of thing that I like to avoid, because it sure can give lots of XML consumers fits.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • SQL-Student:

    Can you prefix the DTD into the beginning of your XML file before importing, or do you have to keep them separate?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I just realized, there are actually three different things that the doc refers to as "XML Bulk Load" at one point or another. Which one did you mean? Can you post us an example of the command or code that you are trying to use?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • The XML is 100% well formed. Since the XML Bulk loader tool was not working I wrote a .NET client to load the data using the XmlTextReader API and it worked.

    I would still like to learn how this could have been done using XML Bulk Load tool.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply