|
|
|
Valued Member
      
Group: General Forum Members
Last Login: Thursday, May 09, 2013 1:13 AM
Points: 73,
Visits: 270
|
|
XML is not going to work with HTML as content unless you enclose it with CDATA tags, like so...
<Content> <![CDATA[ <div id="contentcenter"> <p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p> </div> ]]> </Content>
Steve
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Friday, May 10, 2013 9:35 AM
Points: 94,
Visits: 258
|
|
Your sample mainly contains display information. Get the data in XML format (without the display information) and you're all set with Waynes article.
Lutz,
Thank you for that link on the difference btw xml and html. Exactly what I needed.
There was actual data included in the xml file I was sent, but it was mixed in with all that other stuff. The person who sent me the file seemed pretty perplexed that I didn't know how to deal with it and load it into our database. With the comments I have received, I'm still not exactly sure.
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Thursday, July 28, 2011 11:02 AM
Points: 5,
Visits: 20
|
|
Sorry for the delay guys:
DECLARE @x XML SELECT @x= CONVERT(xml, BulkColumn, 2) FROM OPENROWSET(BULK 'C:\data.xml', SINGLE_BLOB) AS x
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Friday, May 10, 2013 9:35 AM
Points: 94,
Visits: 258
|
|
SQL-DBA (1/11/2011) XML is not going to work with HTML as content unless you enclose it with CDATA tags, like so...
<Content> <![CDATA[ <div id="contentcenter"> <p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p> </div> ]]> </Content>
Steve
Steve, Do you mean I need to go into the file and type in [CDATA[... where all HTML appears?
Amy
|
|
|
|
|
SSCertifiable
       
Group: General Forum Members
Last Login: Yesterday @ 8:32 PM
Points: 6,367,
Visits: 8,224
|
|
weharrelson (1/11/2011) Sorry for the delay guys:
DECLARE @x XML SELECT @x= CONVERT(xml, BulkColumn, 2) FROM OPENROWSET(BULK 'C:\data.xml', SINGLE_BLOB) AS x
Ahh, I see. You loading the file in as I showed in the article, except that you're taking it straight to a variable. Then you're using that variable in sp_xml_preparedocument.
Wayne Microsoft Certified Master: SQL Server 2008 If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it! Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines, CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
|
|
|
|
|
SSCertifiable
       
Group: General Forum Members
Last Login: Yesterday @ 8:32 PM
Points: 6,367,
Visits: 8,224
|
|
Amy.G (1/11/2011) Steve, Do you mean I need to go into the file and type in [CDATA[... where all HTML appears?
Amy
Well, run this little test:
declare @xml1 XML, @xml2 XML; set @xml1 = '<Content> <![CDATA[ <div id="contentcenter"> <p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p> </div> ]]> </Content>';
set @xml2 = '<Content> <div id="contentcenter"> <p>Perform a <b>hardware</b> check with the utility to assure the quality of the system driver, etc., etc.</p> </div> </Content>'; select @xml1, @xml2; @xml1 has the <![CDATA[ ... ]]>, and @xml2 doesn't. Notice that in @xml2, that all of the HTML has become tokenized. From what I understand about HTML (very little), there are some tokens where the closing token is optional. If you have any of those in the XML file, SQL is probably going to throw an error.
As is stands, since the HTML is tokenized, it might not get in the way of extracting the data. You'd have to test it to see. Basically, if it can be put into an XML datatype (either variable or table column), then you will probably be able to work with it without having to add the CDATA stuff.
Wayne Microsoft Certified Master: SQL Server 2008 If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it! Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines, CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
|
|
|
|
|
SSCertifiable
       
Group: General Forum Members
Last Login: Wednesday, April 24, 2013 3:17 PM
Points: 6,731,
Visits: 12,131
|
|
Here's a short example to query data like you've posted (please note that I added a few end tags to change it into a valid xml structure as per SQL requirements):
DECLARE @x XML SET @x='<p> <div id="content"> <a name="ada-content" /> <div id="browseBox"> <div class="roundedBox_top-left" /> <div class="roundedBox_top-right" /> <div class="licenseTitle"> <h1> <span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/custom:Title[1]">Aboveground Storage Tank (AST) Permits (>1 Million Gallons Total)</span> </h1> <div class="licenseDescription"> <span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/custom:Description[1]">Minnesota Pollution Control Agency (MPCA)</span> </div> </div> <div class="roundedBox_bottom-buff" /> </div> <div class="roundedBox_bottom-left" /> <div class="roundedBox_bottom-right" /> <div id="licenseDetail"> <div id="licenseRightColumn"> <h3 style="margin:0;line-height:100%;">Agency contact information:</h3> </div> </div> </div> </p>'
;WITH cte AS ( SELECT @x.query('//span') col ) SELECT c.value('.','varchar(200)') span_result FROM cte CROSS APPLY col.nodes('span') T(c)
/* result set Aboveground Storage Tank (AST) Permits (>1 Million Gallons Total) Minnesota Pollution Control Agency (MPCA) */
Lutz A pessimist is an optimist with experience.
How to get fast answers to your question How to post performance related questions Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Friday, May 10, 2013 9:35 AM
Points: 94,
Visits: 258
|
|
LutzM (1/11/2011) Here's a short example to query data like you've posted
I'm really impressed that out of that whole mess you were able to identify the only actual piece of data needed. I have been practicing and got the license title like you did, but then just as I felt I had it licked, a snag:
DECLARE @y XML SET @y = ' <h1> <p xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1999/xhtml"> Facilities that have more than one million gallons capacity must obtain an individual permit from the MPCA according to Minnesota Rules Chapter <a href="http://www.revisor.leg.state.mn.us/arule/7001/">7001</a>.4200-4250. </br> </br> Owners of Aboveground Storage Tanks larger than 1,100 gallons must notify the Minnesota Pollution Control Agency (MPCA) of the existence of these tanks. In addition, tank owners are required to notify the MPCA of change of product or change of status of ASTs. </h1> ' SELECT y.data.value('p[1]', 'varchar(500)') as Fees from @y.nodes('h1') y(data);
I think the "xmlns"tcm..." is the problem, b/c when I delete that part, it works fine. But going through the whole file deleting would go against the whole point of this. Do you know a way to ignore it?
Looking back, I guess I was hoping someone would've said "whoever sent you that file is an idiot, it's impossible to read!". Now I'm getting obsessed with it.
Amy
|
|
|
|
|
SSCertifiable
       
Group: General Forum Members
Last Login: Wednesday, April 24, 2013 3:17 PM
Points: 6,731,
Visits: 12,131
|
|
Ok, then we'd need to add the namespace declaration (result as above).
Regarding your comment about the person sending you the file, here's what I'd say: Whoever sent you that file should remember that SQL Server is still a RDBMS and not a HTML garbage cleaner. But it can do that, too, if there's absolutely nothing else to do. But, if all you're asking SQL Server to do is to deal with this kind of stuff, you should consider getting a different tool... 
The following code is like using a hammer to drive in a screw. So, it's for demonstration purposes only, NOT a recommended solution! The recommended solution would be to get clean XML data.
DECLARE @x XML SET @x=' <p xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1999/xhtml"> <div id="content"> <a name="ada-content" /> <div id="browseBox"> <div class="roundedBox_top-left" /> <div class="roundedBox_top-right" /> <div class="licenseTitle"> <h1> <span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/custom:Title[1]">Aboveground Storage Tank (AST) Permits (>1 Million Gallons Total)</span> </h1> <div class="licenseDescription"> <span id="_SE_FLD" _SE_FLD="tcm:Content/custom:Content/custom:Description[1]">Minnesota Pollution Control Agency (MPCA)</span> </div> </div> <div class="roundedBox_bottom-buff" /> </div> <div class="roundedBox_bottom-left" /> <div class="roundedBox_bottom-right" /> <div id="licenseDetail"> <div id="licenseRightColumn"> <h3 style="margin:0;line-height:100%;">Agency contact information:</h3> </div> </div> </div> </p>'
;WITH cte AS ( SELECT @x.query(' declare default element namespace "http://www.w3.org/1999/xhtml"; //span') col ) SELECT c.value('.','varchar(200)') span_result FROM cte CROSS APPLY col.nodes('declare default element namespace "http://www.w3.org/1999/xhtml"; span') T(c)
Lutz A pessimist is an optimist with experience.
How to get fast answers to your question How to post performance related questions Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Friday, May 10, 2013 9:35 AM
Points: 94,
Visits: 258
|
|
The following code is like using a hammer to drive in a screw.  So, it's for demonstration purposes only, NOT a recommended solution! The recommended solution would be to get clean XML data.
I'm going to consider this permission to let it go and do other work.
And thanks for that article. It has proved very helpful.
Amy
|
|
|
|