Hidden carriage returns

  • My front end is a .Net web application. Users can enter text (which will subsequently be used as the 'body' text in an email) in a custom 'rich text box' in the browser. Something like the box I am typing in now - where you can format the text with html markup.

    The custom box in question - let's call it HTMLBox1 - has these methods ... HTMLBox1.Text and HTMLBox1.Plain

    The first method gives you the html of what is entered in the box. The second method is supposed to give you plain text.

    I save the html text and the plain text in separate text fields in a SQL Server 2008 database.

    For example: The data entered in the rich text box might be ...

    Dear Customer,

    We'd like to invite you to a seminar etc.

    In the database field that stores the HTML version of the text, this might get stored as

    <p>Dear Customer,<br />We'd like to invite you to a seminar</p>

    In the database field that stores the PLAIN version of the text, this is getting stored as:

    Dear Customer, & n b s p; We'd like to invite you to a seminar

    The custom Rich Text Box is outputting the html markup for a space (which I have put spaces in above so it shows) in what is supposed to be the PLAIN text output. Later a web service retrieves the data and sends emails using the data as the body of the email. (The .Net smtpClient sends two versions of the email, an HTML version and an alternate PLAIN version.) Most people who receive the email, who view it as Plain text, seem to see it okay. Most email clients seem to translate the html markup for a space into a carriage return.

    But, some users - who are viewing their emails as Plain text - are complaining that they see the html markup.

    So, my question is, when the 'Plain Text' version of whatever is entered in the Rich Text box is being written to the database, how can I replace the html markup for a space with a carriage return that, when the data is subsequently retrieved and sent as the body of an email, will contain a carriage return that all email clients are happy with?

    As an aside, I note that when I store data entered into a <textarea> in the front end into a text field in SQL Server, if I look at the text (using the example above) the data in the SQL Server Text field looks like this:

    Dear Customer, We'd like to invite you to a seminar

    If I retrieve this data and display it in a web browser, it displays like this:

    Dear Customer,

    We'd like to invite you to a seminar

    If you View Source in the browser there is a '<br />' after the comma in the first line - to give you your line feed.

    But what is being stored in SQL Server and returned to the front end that makes the front end say 'Ahh, I need to put a <br /> here'? If you look in the SQL Server text field there are no characters showing that would indicate a carriage return (or line feed) is being stored. I'm baffled. How does the front end know there is a carriage return (or line feed) embedded in what is being returned from the database?

    Thanks for any help.

  • I'm pretty sure there is a CHAR(13) + CHAR(10) in there that you are not seeing but that is part of the data. THAT is what is getting replaced with a <br /> tag, not a space.

    This is especially true in SSMS when looking at data in the grid: the CrLf is in the data but is NOT shown in the grid view, because you cannot scroll DOWN in a cell, so it is stripped out for presentation purposes in SSMS. Switching to Text view (Ctrl+T) would prove this.

    for example, i'm sure your data looks like this:

    Dear Customer, & n b s p; [CrLf] We'd like to invite you to a seminar[CrLf]

    and when it is converted to HTML, the front end either replaces each [CrLf] with a <br /> tag or appends one after it.

    If you use a different text editor, for example EditPlus, you can see spaces, tabs and CrLf a lot more easily. Most Notepad replacements like EditPlus, UltraEdit, Notepad++ and many others do the same.

    see how a space is a floating dot, a CrLf is the paragraph sign and a tab is double arrow thingy...makes it much easier to visualize using a better text editor like this.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Hi, and thank you for your reply.

    I ran 'Select TextPlain FROM DistListEmailTbl' and looked at it in Text View. The text looked like this:

    Dear Committee Member,

    We'd like to invite you to etc.

    But there was no sign of a carriage return or CHAR(13) ... or anything else. Would you expect that these would still be invisible?

    If this proves there is some sort of a return in there, I'd like to know exactly what it is. Users are pasting data into the Rich Text box from Outlook, where they use Word as a text editor. What gets pasted in there is truly awful: loads of Office-specific tags, references to style sheets that don't exist outside Office, etc. The plain text generated by the Rich Text box's HTMLBox1.Plain method is not accurate, so I want to write a routine to strip all the HTML tags out of the text and create my own 'plain text' version. What should I embed as a carriage return that will go into the SQL Server field and, when retrieved, provide carriage returns in email clients that only interpret plain text and which, for example, are running on UNIX rather than Windows?

    Thanks again.

  • I'm sure the CrLf is in there; it's just not visible with the tools you are using.

    Proof: you said it looks like this:

    Dear Committee Member,

    We'd like to invite you to etc.

    two lines means there is a CrLf there.
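
    If you want to see exactly which characters are in there, one option is to swap them for visible markers. This is only a sketch: @Sample is a made-up stand-in for one value of your column, and the commented-out query at the end assumes the table and column names from the SELECT you ran.

```sql
-- Sketch only: @Sample stands in for one value of the TextPlain column.
DECLARE @Sample VARCHAR(MAX) =
    'Dear Committee Member,' + CHAR(13) + CHAR(10) +
    'We''d like to invite you to etc.';

-- Swap the invisible characters for visible markers.
SELECT REPLACE(REPLACE(@Sample, CHAR(13), '[CR]'), CHAR(10), '[LF]') AS Marked;

-- Position of the first CR and the first LF (0 means none was found).
SELECT CHARINDEX(CHAR(13), @Sample) AS FirstCR,
       CHARINDEX(CHAR(10), @Sample) AS FirstLF;

-- Against the real table (the CAST is an assumption, needed only if
-- TextPlain is a TEXT column, which REPLACE does not accept directly):
-- SELECT REPLACE(REPLACE(CAST(TextPlain AS VARCHAR(MAX)),
--                CHAR(13), '[CR]'), CHAR(10), '[LF]') AS Marked
-- FROM   DistListEmailTbl;
```

    A CrLf pair shows up as [CR][LF] in the Marked column; a lone [LF] would mean the text came in with Unix-style line endings.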

    I've used this code to strip HTML out.

    try it on one of your columns, view it in TextMode, and confirm the results span more than one line;

    If it does not, you might want to replace <br /> tags with CHAR(13) + CHAR(10) first, before stripping the remaining tags. That will make it readable as plain text on both Windows and Unix (which uses just CHAR(10), I believe, for a new line).

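    --===== Replace all HTML tags with nothing
    -- Keep looping while there is a '<' with a matching '>' somewhere after it
    WHILE CHARINDEX('<', @HTMLText) > 0
      AND CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) > 0
        SELECT @HTMLText = STUFF(@HTMLText,
                                 CHARINDEX('<', @HTMLText),
                                 CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText)) - CHARINDEX('<', @HTMLText) + 1,
                                 '')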
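
    Putting the pieces together, here is a rough sketch of the whole clean-up using the sample from the first post. The list of tags treated as line breaks and the handling of the non-breaking-space entity are my assumptions about what your editor emits, so adjust them to what you actually find in the data.

```sql
-- Sample HTML taken from the first post; @CrLf is a helper for readability.
DECLARE @HTMLText VARCHAR(MAX) =
    '<p>Dear Customer,<br />We''d like to invite you to a seminar</p>';
DECLARE @CrLf CHAR(2) = CHAR(13) + CHAR(10);

-- 1. Turn line-breaking tags into CrLf BEFORE the tags are stripped.
--    (Assumed tag spellings; Word-pasted HTML may need more variants.)
SET @HTMLText = REPLACE(@HTMLText, '<br />', @CrLf);
SET @HTMLText = REPLACE(@HTMLText, '<br/>',  @CrLf);
SET @HTMLText = REPLACE(@HTMLText, '<br>',   @CrLf);
SET @HTMLText = REPLACE(@HTMLText, '</p>',   @CrLf);

-- 2. Turn the non-breaking-space entity into an ordinary space.
SET @HTMLText = REPLACE(@HTMLText, '&nbsp;', ' ');

-- 3. Strip whatever tags are left.
DECLARE @Start BIGINT = CHARINDEX('<', @HTMLText);
DECLARE @End   BIGINT = CHARINDEX('>', @HTMLText, @Start);

WHILE @Start > 0 AND @End > 0
BEGIN
    SET @HTMLText = STUFF(@HTMLText, @Start, @End - @Start + 1, '');
    SET @Start = CHARINDEX('<', @HTMLText);
    SET @End   = CHARINDEX('>', @HTMLText, @Start);
END;

SELECT @HTMLText AS PlainText;
```

    View the result in Text mode rather than the grid, otherwise the CrLf will be hidden again. A literal '<' typed by a user will be treated as the start of a tag if a '>' appears anywhere after it, so this is not a full HTML parser.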

    Lowell



  • Again, thanks very much for your help. I'll give that a try.

    I'm wondering if there is a way to strip out all HTML tags but leave in the ones that begin with <a?

    Plain text seems to handle hyperlinks okay and I'll need to leave them in there.

    Cheers
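
    One possible approach, as an untested sketch rather than a definitive answer: walk through the text one tag at a time, skip over anchor tags and remove everything else. The sample HTML, the example.com URL and the variable names are all made up for illustration.

```sql
DECLARE @HTMLText VARCHAR(MAX) =
    '<p>See <a href="http://example.com/seminar">the seminar page</a><br />for details.</p>';
DECLARE @Pos   BIGINT = 1;      -- where the next search starts
DECLARE @Start BIGINT;
DECLARE @End   BIGINT;
DECLARE @Tag   VARCHAR(MAX);

WHILE 1 = 1
BEGIN
    SET @Start = CHARINDEX('<', @HTMLText, @Pos);
    IF @Start = 0 BREAK;                       -- no more tags

    SET @End = CHARINDEX('>', @HTMLText, @Start);
    IF @End = 0 BREAK;                         -- unclosed '<', leave it alone

    SET @Tag = SUBSTRING(@HTMLText, @Start, @End - @Start + 1);

    IF @Tag LIKE '<a[ >]%' OR @Tag = '</a>'
        -- Anchor tag: keep it and carry on searching after it.
        SET @Pos = @End + 1;
    ELSE
    BEGIN
        -- Any other tag: remove it and search again from the same spot.
        SET @HTMLText = STUFF(@HTMLText, @Start, @End - @Start + 1, '');
        SET @Pos = @Start;
    END;
END;

SELECT @HTMLText AS TextWithLinks;
```

    The pattern '<a[ >]%' only matches '<a' followed by a space or '>', so tags such as <abbr> or <address> are still removed.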
