Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««23456»»»

Loading XML Data into SQL Server (SQL Spackle) Expand / Collapse
Author
Message
Posted Tuesday, January 11, 2011 4:24 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Today @ 2:55 AM
Points: 80, Visits: 337
XML is not going to work with HTML as content unless you enclose it with CDATA tags, like so...

<Content>
<![CDATA[
<div id="contentcenter">
<p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p>
</div>
]]>
</Content>

Steve
Post #1045790
Posted Tuesday, January 11, 2011 7:53 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, January 10, 2014 9:03 AM
Points: 103, Visits: 289
Your sample mainly contains display information. Get the data in XML format (without the display information) and you're all set with Waynes article.

Lutz,

Thank you for that link on the difference btw xml and html. Exactly what I needed.

There was actual data included in the xml file I was sent, but it was mixed in with all that other stuff. The person who sent me the file seemed pretty perplexed that I didn't know how to deal with it and load it into our database. With the comments I have received, I'm still not exactly sure.
Post #1045910
Posted Tuesday, January 11, 2011 9:28 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, July 28, 2011 11:02 AM
Points: 5, Visits: 20
Sorry for the delay guys:

DECLARE @x XML
SELECT @x= CONVERT(xml, BulkColumn, 2) FROM
OPENROWSET(BULK 'C:\data.xml', SINGLE_BLOB) AS x

Post #1045974
Posted Tuesday, January 11, 2011 9:34 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, January 10, 2014 9:03 AM
Points: 103, Visits: 289
SQL-DBA (1/11/2011)
XML is not going to work with HTML as content unless you enclose it with CDATA tags, like so...

<Content>
<![CDATA[
<div id="contentcenter">
<p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p>
</div>
]]>
</Content>

Steve


Steve,
Do you mean I need to go into the file and type in [CDATA[... where all HTML appears?

Amy
Post #1045981
Posted Tuesday, January 11, 2011 10:18 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Friday, October 17, 2014 7:46 AM
Points: 5,364, Visits: 8,948
weharrelson (1/11/2011)
Sorry for the delay guys:

DECLARE @x XML
SELECT @x= CONVERT(xml, BulkColumn, 2) FROM
OPENROWSET(BULK 'C:\data.xml', SINGLE_BLOB) AS x



Ahh, I see. You loading the file in as I showed in the article, except that you're taking it straight to a variable. Then you're using that variable in sp_xml_preparedocument.


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #1046003
Posted Tuesday, January 11, 2011 10:28 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Friday, October 17, 2014 7:46 AM
Points: 5,364, Visits: 8,948
Amy.G (1/11/2011)
Steve,
Do you mean I need to go into the file and type in [CDATA[... where all HTML appears?

Amy


Well, run this little test:
declare @xml1 XML, @xml2 XML;
set @xml1 = '<Content>
<![CDATA[
<div id="contentcenter">
<p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p>
</div>
]]>
</Content>';

set @xml2 = '<Content>
<div id="contentcenter">
<p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p>
</div>
</Content>';
select @xml1, @xml2;

@xml1 has the <![CDATA[ ... ]]>, and @xml2 doesn't.
Notice that in @xml2, that all of the HTML has become tokenized. From what I understand about HTML (very little), there are some tokens where the closing token is optional. If you have any of those in the XML file, SQL is probably going to throw an error.

As is stands, since the HTML is tokenized, it might not get in the way of extracting the data. You'd have to test it to see. Basically, if it can be put into an XML datatype (either variable or table column), then you will probably be able to work with it without having to add the CDATA stuff.


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #1046011
Posted Tuesday, January 11, 2011 2:17 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: 2 days ago @ 1:52 PM
Points: 6,840, Visits: 13,354
Here's a short example to query data like you've posted (please note that I added a few end tags to change it into a valid xml structure as per SQL requirements):
DECLARE @x XML
SET @x='<p>
<div id="content">
<a name="ada-content" />
<div id="browseBox">
<div class="roundedBox_top-left" />
<div class="roundedBox_top-right" />
<div class="licenseTitle">
<h1>
<span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/custom:Title[1]">Aboveground Storage Tank (AST) Permits (>1 Million Gallons Total)</span>
</h1>
<div class="licenseDescription">
<span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/custom:Description[1]">Minnesota Pollution Control Agency (MPCA)</span>
</div>
</div>
<div class="roundedBox_bottom-buff" />
</div>
<div class="roundedBox_bottom-left" />
<div class="roundedBox_bottom-right" />
<div id="licenseDetail">
<div id="licenseRightColumn">
<h3 style="margin:0;line-height:100%;">Agency contact information:</h3>
</div>
</div>
</div>
</p>'

;WITH cte AS
(
SELECT @x.query('//span') col
)
SELECT c.value('.','varchar(200)') span_result
FROM cte
CROSS APPLY col.nodes('span') T(c)

/* result set
Aboveground Storage Tank (AST) Permits (>1 Million Gallons Total)
Minnesota Pollution Control Agency (MPCA)
*/





Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Post #1046159
Posted Tuesday, January 11, 2011 2:37 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, January 10, 2014 9:03 AM
Points: 103, Visits: 289
LutzM (1/11/2011)
Here's a short example to query data like you've posted


I'm really impressed that out of that whole mess you were able to identify the only actual piece of data needed. I have been practicing and got the license title like you did, but then just as I felt I had it licked, a snag:

DECLARE @y XML
SET @y = '
<h1>
<p xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1999/xhtml">
Facilities that have more than one million gallons capacity must obtain an individual permit from the MPCA according to Minnesota Rules Chapter <a href="http://www.revisor.leg.state.mn.us/arule/7001/">7001</a>.4200-4250.

</br>

</br>
Owners of Aboveground Storage Tanks larger than 1,100 gallons must notify the Minnesota Pollution Control Agency (MPCA) of the existence of these tanks. In addition, tank owners are required to notify the MPCA of change of product or change of status of ASTs.

</h1>
'
SELECT
y.data.value('p[1]', 'varchar(500)') as Fees
from @y.nodes('h1') y(data);

I think the "xmlns"tcm..." is the problem, b/c when I delete that part, it works fine. But going through the whole file deleting would go against the whole point of this. Do you know a way to ignore it?

Looking back, I guess I was hoping someone would've said "whoever sent you that file is an idiot, it's impossible to read!". Now I'm getting obsessed with it.

Amy


Post #1046166
Posted Tuesday, January 11, 2011 3:22 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: 2 days ago @ 1:52 PM
Points: 6,840, Visits: 13,354
Ok, then we'd need to add the namespace declaration (result as above).

Regarding your comment about the person sending you the file, here's what I'd say:
Whoever sent you that file should remember that SQL Server is still a RDBMS and not a HTML garbage cleaner. But it can do that, too, if there's absolutely nothing else to do.
But, if all you're asking SQL Server to do is to deal with this kind of stuff, you should consider getting a different tool...

The following code is like using a hammer to drive in a screw. So, it's for demonstration purposes only, NOT a recommended solution! The recommended solution would be to get clean XML data.

DECLARE @x XML
SET @x=' <p xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1999/xhtml">
<div id="content">
<a name="ada-content" />
<div id="browseBox">
<div class="roundedBox_top-left" />
<div class="roundedBox_top-right" />
<div class="licenseTitle">
<h1>
<span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/custom:Title[1]">Aboveground Storage Tank (AST) Permits (>1 Million Gallons Total)</span>
</h1>
<div class="licenseDescription">
<span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/custom:Description[1]">Minnesota Pollution Control Agency (MPCA)</span>
</div>
</div>
<div class="roundedBox_bottom-buff" />
</div>
<div class="roundedBox_bottom-left" />
<div class="roundedBox_bottom-right" />
<div id="licenseDetail">
<div id="licenseRightColumn">
<h3 style="margin:0;line-height:100%;">Agency contact information:</h3>
</div>
</div>
</div>
</p>'

;WITH cte AS
(
SELECT @x.query('
declare default element namespace "http://www.w3.org/1999/xhtml";
//span') col
)
SELECT c.value('.','varchar(200)') span_result
FROM cte
CROSS APPLY col.nodes('declare default element namespace "http://www.w3.org/1999/xhtml"; span') T(c)





Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Post #1046186
Posted Tuesday, January 11, 2011 3:28 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, January 10, 2014 9:03 AM
Points: 103, Visits: 289
The following code is like using a hammer to drive in a screw. So, it's for demonstration purposes only, NOT a recommended solution! The recommended solution would be to get clean XML data.


I'm going to consider this permission to let it go and do other work.

And thanks for that article. It has proved very helpful.

Amy
Post #1046187
« Prev Topic | Next Topic »

Add to briefcase «««23456»»»

Permissions Expand / Collapse