Thanks jeff. Yes I just need the string with no HTML tags. IF there is new line I do not need another line, but need one string for each html string. Hope I am making sense. Thanks again.
Understood. Let's nail the requirements right to the floor now. If you have the following string, what precisely do you want to get back?
<data1>somedata1a</data1><data2>somedata2a</data2>
<data1>somedata1b</data1><data2>somedata2b</data2>
<data1>somedata1c</data1><data2>somedata2c</data2>
--Jeff Moden
Change is inevitable... Change for the better is not.
Jeff Moden (10/30/2011)
Understood. Let's nail the requirements right to the floor now. If you have the following string, what precisely do you want to get back?
<data1>somedata1a</data1><data2>somedata2a</data2>
<data1>somedata1b</data1><data2>somedata2b</data2>
<data1>somedata1c</data1><data2>somedata2c</data2>
If these strings are loaded in three rows then I need three rows with no html tags... I need to replace the html tags with '' on a column which has html text. The data type is ntext for that sql column.
Thanks a lot.
Got it. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
I refactored the function that I found at the following URL... look for the answer posted by "dudeNumber4"...
CREATE FUNCTION dbo.StripHtmlTags
(@HtmlText XML )
RETURNS NVARCHAR(MAX)
AS
BEGIN
RETURN (
SELECT contents.value('.', 'nvarchar(max)')
FROM (
SELECT contents = chunks.chunk.query('.') FROM @HtmlText.nodes('/') AS chunks(chunk)
) doc
)
END
;
I tried turning it into an iTVF instead of a scalar function and it took twice as long to run.
--Jeff Moden
Change is inevitable... Change for the better is not.
Thanks a lot Jeff. This is a good solution for well formatted html. Thanks a lot again Jeff.
Jeff Moden (10/30/2011)
I tried turning it into an iTVF instead of a scalar function and it took twice as long to run.
I also tried and it took half as long....
CREATE FUNCTION dbo.StripHtmlTags2
(@HtmlText XML )
RETURNS TABLE
AS
RETURN
(
SELECT
@HtmlText.value('(.)[1]', 'nvarchar(max)') AS result
)
Is this how yours ended up?
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Jeff Moden (10/30/2011)
I refactored the function that I found at the following URL... look for the answer posted by "dudeNumber4"...
CREATE FUNCTION dbo.StripHtmlTags
(@HtmlText XML )
RETURNS NVARCHAR(MAX)
AS
BEGIN
RETURN (
SELECT contents.value('.', 'nvarchar(max)')
FROM (
SELECT contents = chunks.chunk.query('.') FROM @HtmlText.nodes('/') AS chunks(chunk)
) doc
)
END
;
I tried turning it into an iTVF instead of a scalar function and it took twice as long to run.
This is pretty slick. Of course for must of the stuff our marketing department puts together it will fail because they seem incapable of putting together actual valid html. They routinely leave off closing tags which will cause this to fail parsing the html string.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
mister.magoo (10/31/2011)
Jeff Moden (10/30/2011)
I tried turning it into an iTVF instead of a scalar function and it took twice as long to run.I also tried and it took half as long....
CREATE FUNCTION dbo.StripHtmlTags2
(@HtmlText XML )
RETURNS TABLE
AS
RETURN
(
SELECT
@HtmlText.value('(.)[1]', 'nvarchar(max)') AS result
)
Is this how yours ended up?
No... that's even more clever than the one I found and posted. I'll have to give that I try. Thanks, Magoo.
--Jeff Moden
Change is inevitable... Change for the better is not.
Found this code for removing HTML tags from my data, but I am ashamed to say, I have no clue how it works. Could someone point me in the right direction so I can understand what it is doing?
create function dbo.StripHTML( @text varchar(max) ) returns varchar(max) as
begin
declare @textXML xml
declare @result varchar(max)
set @textXML = @text;
with doc(contents) as
(
select chunks.chunk.query('.') from @textXML.nodes('/') as chunks(chunk)
)
select @result = contents.value('.', 'varchar(max)') from doc
return @result
end
go
What the heck is chunks.chunk.query?? I consider myself to be a Sr. DBA but this has me stumped. I figure it is a xml function of some type but can't seem to find any information on it. Any help greatly appreicated!!!
Jim
I know it has been quite some time since this was posted, but I have a question. The code is looking for an EXACT match to the start tag. However, if you have additional info (e.g. <span style=...> or <div class=...> and </span> or </div> you get a parsing error.
Is there any way to avoid that? I've found some code that parses character by character but we know that is, like
Jim Youmans-439383 (4/6/2012)
Found this code for removing HTML tags from my data, but I am ashamed to say, I have no clue how it works. Could someone point me in the right direction so I can understand what it is doing?create function dbo.StripHTML( @text varchar(max) ) returns varchar(max) as
begin
declare @textXML xml
declare @result varchar(max)
set @textXML = @text;
with doc(contents) as
(
select chunks.chunk.query('.') from @textXML.nodes('/') as chunks(chunk)
)
select @result = contents.value('.', 'varchar(max)') from doc
return @result
end
go
What the heck is chunks.chunk.query?? I consider myself to be a Sr. DBA but this has me stumped. I figure it is a xml function of some type but can't seem to find any information on it. Any help greatly appreicated!!!
Jim
chunks.chunk is the name of the derived table and column name created in the FROM clause (as chunks(chunk)), that's all.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply