SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Loading XML Data into SQL Server (SQL Spackle)


Loading XML Data into SQL Server (SQL Spackle)

Author
Message
SQL-DBA
SQL-DBA
SSC Eights!
SSC Eights! (889 reputation)SSC Eights! (889 reputation)SSC Eights! (889 reputation)SSC Eights! (889 reputation)SSC Eights! (889 reputation)SSC Eights! (889 reputation)SSC Eights! (889 reputation)SSC Eights! (889 reputation)

Group: General Forum Members
Points: 889 Visits: 463
XML is not going to work with HTML as content unless you enclose it with CDATA tags, like so...

<Content>
<![CDATA[
<div id="contentcenter">
<p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p>
</div>
]]>
</Content>

Steve
Amy.G
Amy.G
SSC-Addicted
SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)

Group: General Forum Members
Points: 459 Visits: 290
Your sample mainly contains display information. Get the data in XML format (without the display information) and you're all set with Waynes article.

Lutz,

Thank you for that link on the difference btw xml and html. Exactly what I needed.

There was actual data included in the xml file I was sent, but it was mixed in with all that other stuff. The person who sent me the file seemed pretty perplexed that I didn't know how to deal with it and load it into our database. With the comments I have received, I'm still not exactly sure.
weharrelson
weharrelson
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 20
Sorry for the delay guys:

DECLARE @x XML
SELECT @x= CONVERT(xml, BulkColumn, 2) FROM
OPENROWSET(BULK 'C:\data.xml', SINGLE_BLOB) AS x
Amy.G
Amy.G
SSC-Addicted
SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)

Group: General Forum Members
Points: 459 Visits: 290
SQL-DBA (1/11/2011)
XML is not going to work with HTML as content unless you enclose it with CDATA tags, like so...

<Content>
<![CDATA[
<div id="contentcenter">
<p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p>
</div>
]]>
</Content>

Steve


Steve,
Do you mean I need to go into the file and type in [CDATA[... where all HTML appears?

Amy
WayneS
WayneS
SSC-Insane
SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)

Group: General Forum Members
Points: 22472 Visits: 10657
weharrelson (1/11/2011)
Sorry for the delay guys:

DECLARE @x XML
SELECT @x= CONVERT(xml, BulkColumn, 2) FROM
OPENROWSET(BULK 'C:\data.xml', SINGLE_BLOB) AS x



Ahh, I see. You loading the file in as I showed in the article, except that you're taking it straight to a variable. Then you're using that variable in sp_xml_preparedocument.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings

WayneS
WayneS
SSC-Insane
SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)

Group: General Forum Members
Points: 22472 Visits: 10657
Amy.G (1/11/2011)
Steve,
Do you mean I need to go into the file and type in [CDATA[... where all HTML appears?

Amy


Well, run this little test:
declare @xml1 XML, @xml2 XML;
set @xml1 = '<Content>
<![CDATA[
<div id="contentcenter">
<p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p>
</div>
]]>
</Content>';

set @xml2 = '<Content>
<div id="contentcenter">
<p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p>
</div>
</Content>';
select @xml1, @xml2;


@xml1 has the <![CDATA[ ... ]]>, and @xml2 doesn't.
Notice that in @xml2, that all of the HTML has become tokenized. From what I understand about HTML (very little), there are some tokens where the closing token is optional. If you have any of those in the XML file, SQL is probably going to throw an error.

As is stands, since the HTML is tokenized, it might not get in the way of extracting the data. You'd have to test it to see. Basically, if it can be put into an XML datatype (either variable or table column), then you will probably be able to work with it without having to add the CDATA stuff.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings

LutzM
LutzM
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24677 Visits: 13559
Here's a short example to query data like you've posted (please note that I added a few end tags to change it into a valid xml structure as per SQL requirements):
DECLARE @x XML
SET @x='<p>
<div id="content">
<a name="ada-content" />
<div id="browseBox">
<div class="roundedBox_top-left" />
<div class="roundedBox_top-right" />
<div class="licenseTitle">
<h1>
<span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/custom:Title[1]">Aboveground Storage Tank (AST) Permits (>1 Million Gallons Total)</span>
</h1>
<div class="licenseDescription">
<span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/customBigGrinescription[1]">Minnesota Pollution Control Agency (MPCA)</span>
</div>
</div>
<div class="roundedBox_bottom-buff" />
</div>
<div class="roundedBox_bottom-left" />
<div class="roundedBox_bottom-right" />
<div id="licenseDetail">
<div id="licenseRightColumn">
<h3 style="margin:0;line-height:100%;">Agency contact information:</h3>
</div>
</div>
</div>
</p>'

;WITH cte AS
(
SELECT @x.query('//span') col
)
SELECT c.value('.','varchar(200)') span_result
FROM cte
CROSS APPLY col.nodes('span') T(c)

/* result set
Aboveground Storage Tank (AST) Permits (>1 Million Gallons Total)
Minnesota Pollution Control Agency (MPCA)
*/





Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Amy.G
Amy.G
SSC-Addicted
SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)

Group: General Forum Members
Points: 459 Visits: 290
LutzM (1/11/2011)
Here's a short example to query data like you've posted


I'm really impressed that out of that whole mess you were able to identify the only actual piece of data needed. I have been practicing and got the license title like you did, but then just as I felt I had it licked, a snag:


DECLARE @y XML
SET @y = '
<h1>
<p xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1999/xhtml">
Facilities that have more than one million gallons capacity must obtain an individual permit from the MPCA according to Minnesota Rules Chapter <a href="http://www.revisor.leg.state.mn.us/arule/7001/">7001</a>.4200-4250.

</br>

</br>
Owners of Aboveground Storage Tanks larger than 1,100 gallons must notify the Minnesota Pollution Control Agency (MPCA) of the existence of these tanks. In addition, tank owners are required to notify the MPCA of change of product or change of status of ASTs.

</h1>
'
SELECT
y.data.value('p[1]', 'varchar(500)') as Fees
from @y.nodes('h1') y(data);



I think the "xmlns"tcm..." is the problem, b/c when I delete that part, it works fine. But going through the whole file deleting would go against the whole point of this. Do you know a way to ignore it?

Looking back, I guess I was hoping someone would've said "whoever sent you that file is an idiot, it's impossible to read!". Now I'm getting obsessed with it.

Amy
LutzM
LutzM
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24677 Visits: 13559
Ok, then we'd need to add the namespace declaration (result as above).

Regarding your comment about the person sending you the file, here's what I'd say:
Whoever sent you that file should remember that SQL Server is still a RDBMS and not a HTML garbage cleaner. But it can do that, too, if there's absolutely nothing else to do.
But, if all you're asking SQL Server to do is to deal with this kind of stuff, you should consider getting a different tool... ;-)

The following code is like using a hammer to drive in a screw. Pinch So, it's for demonstration purposes only, NOT a recommended solution! The recommended solution would be to get clean XML data.

DECLARE @x XML
SET @x=' <p xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1999/xhtml">
<div id="content">
<a name="ada-content" />
<div id="browseBox">
<div class="roundedBox_top-left" />
<div class="roundedBox_top-right" />
<div class="licenseTitle">
<h1>
<span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/custom:Title[1]">Aboveground Storage Tank (AST) Permits (>1 Million Gallons Total)</span>
</h1>
<div class="licenseDescription">
<span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/customBigGrinescription[1]">Minnesota Pollution Control Agency (MPCA)</span>
</div>
</div>
<div class="roundedBox_bottom-buff" />
</div>
<div class="roundedBox_bottom-left" />
<div class="roundedBox_bottom-right" />
<div id="licenseDetail">
<div id="licenseRightColumn">
<h3 style="margin:0;line-height:100%;">Agency contact information:</h3>
</div>
</div>
</div>
</p>'

;WITH cte AS
(
SELECT @x.query('
declare default element namespace "http://www.w3.org/1999/xhtml";
//span') col
)
SELECT c.value('.','varchar(200)') span_result
FROM cte
CROSS APPLY col.nodes('declare default element namespace "http://www.w3.org/1999/xhtml"; span') T(c)





Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Amy.G
Amy.G
SSC-Addicted
SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)SSC-Addicted (459 reputation)

Group: General Forum Members
Points: 459 Visits: 290
The following code is like using a hammer to drive in a screw. Pinch So, it's for demonstration purposes only, NOT a recommended solution! The recommended solution would be to get clean XML data.


I'm going to consider this permission to let it go and do other work.

And thanks for that article. It has proved very helpful.

Amy
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search