Regex Expression to remove RTF tags

  • Hi There,

    I have a file of values pulled from a Microsoft Lync 2013 conversation, and it contains RTF formatting tags. Here is an example file:

    {\rtf1\fbidis\ansi\ansicpg1254\deff0\nouicompat\deflang1055{\fonttbl{\f0\fnil\fcharset162 Segoe UI;}{\f1\fnil\fcharset238 Segoe UI;}{\f2\fnil Segoe UI;}}

    {\colortbl ;\red0\green0\blue0;}

    {\*\generator Riched20 15.0.4420}{\*\mmathPr\mwrapIndent1440 }\viewkind4\uc1

    \pard\cf1\embo\f0\fs20 emaillerini\embo0 \embo al \embo0 \f2\par

    {\*\lyncflags rtf=1}}

    I want to remove the RTF tags and just pull out the text of the conversation. So the result of my function should be:

    emaillerini al

    BTW, I have been using Microsoft SQL Server Report Builder for this. I have an expression like the one below, but it's not working.

    =SWITCH(Fields!ContentType.Value = "text/rtf",Code.ConvertRtfToTextRegex(Fields!Body.Value),

    Fields!ContentType.Value = "text/plain",Fields!Body.Value,

    Fields!ContentType.Value = "text/html",System.Text.RegularExpressions.Regex.Replace(Fields!Body.Value, "\<[^\>]+\>", ""),

    Fields!ContentType.Value <> "", Fields!Body.Value

    )

    Thanks & Regards,
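
The expression above picks a conversion by content type. As a rough illustration of that dispatch outside Report Builder, here is a minimal Python sketch. The function name `body_to_text` and the `rtf_converter` parameter are made up for this example; `rtf_converter` stands in for `Code.ConvertRtfToTextRegex`, whose source is not shown in the thread.

```python
import re
from typing import Callable

# Same idea as the "\<[^\>]+\>" pattern in the expression: drop anything in angle brackets.
HTML_TAG = re.compile(r"<[^>]+>")

def body_to_text(content_type: str, body: str,
                 rtf_converter: Callable[[str], str]) -> str:
    """Return plain text for one message body, chosen by its content type."""
    if content_type == "text/rtf":
        # Hypothetical stand-in for Code.ConvertRtfToTextRegex.
        return rtf_converter(body)
    if content_type == "text/html":
        return HTML_TAG.sub("", body)
    # text/plain and every other type are passed through unchanged.
    # (With an empty content type the SWITCH matches no branch; this
    # sketch simply returns the body in that case too.)
    return body

print(body_to_text("text/html", "<b>hi</b> there", lambda s: s))
```

One thing that may be worth checking in the report itself: the VB `Switch` function evaluates all of its arguments, not only the branch that matches, so the RTF converter also runs on HTML and plain-text rows. If the custom code throws on non-RTF input, the whole expression can fail for those rows.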

  • Any suggestions?

    Sel what I've done myself is create a CLR function that converts RTF to text, or vice versa.

    That's the best way I know of, because the RichTextBox from Windows.Forms handles all the rules you might miss with more complex RTF documents if you rely on string manipulation alone.

    If it's an option, I can post an example project.

    Lowell
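
The CLR project itself was not posted in the thread, so the following is not Lowell's code. It is only a small Python sketch of the kind of rules that plain string manipulation tends to miss: nested groups, ignorable destinations such as `{\*\generator ...}`, the single delimiter space after a control word, and `\'hh` escapes decoded with the code page from `\ansicpg`. The names `TOKEN`, `SKIP_DESTINATIONS` and `rtf_to_text` are invented for the example, and it is far from a complete RTF reader (it ignores `\uN` Unicode escapes, for instance).

```python
import re

# One token per match, tried in this order.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, e.g. \fs20 (one trailing space is a delimiter)
    r"|\\'([0-9a-fA-F]{2})"      # hex escape, e.g. \'fc
    r"|\\([^a-zA-Z])"            # control symbol, e.g. \* or \{
    r"|([{}])"                   # group open / close
    r"|[\r\n]+"                  # raw line breaks in the file are not text
    r"|([^\\{}\r\n]+)"           # plain text
)

# Groups whose content is metadata, not conversation text (assumed short list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

def rtf_to_text(rtf: str, codepage: str = "cp1252") -> str:
    out = []
    stack = []        # "skipping" flag of each enclosing group
    skipping = False
    for m in TOKEN.finditer(rtf):
        word, param, hexcode, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif symbol == "*" or word in SKIP_DESTINATIONS:
            skipping = True               # ignore the rest of this group
        elif word == "ansicpg" and param:
            codepage = "cp" + param       # e.g. cp1254 for Turkish
        elif skipping:
            continue
        elif word in ("par", "line"):
            out.append("\n")
        elif word == "tab":
            out.append("\t")
        elif hexcode:
            out.append(bytes.fromhex(hexcode).decode(codepage, errors="replace"))
        elif symbol in ("\\", "{", "}"):
            out.append(symbol)            # escaped literal character
        elif text:
            out.append(text)
    return "".join(out).strip()

sample = r"{\rtf1\ansi\ansicpg1254{\fonttbl{\f0\fnil Segoe UI;}}{\*\generator Riched20 15.0.4420}\pard\f0\fs20 g\'fcnayd\'fdn\par}"
print(rtf_to_text(sample))  # günaydın
```

The sample string is made up for the sketch. A real converter such as RichTextBox covers far more of the format, which is the point of the CLR suggestion.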



  • Lowell (3/13/2013)


    Sel what I've done myself is create a CLR function that converts RTF to text, or vice versa.

    That's the best way I know of, because the RichTextBox from Windows.Forms handles all the rules you might miss with more complex RTF documents if you rely on string manipulation alone.

    If it's an option, I can post an example project.

    First of all, thanks for your response. Can you please post an example project? Secondly, how would I integrate your project into mine? Please clarify.

    My project is here : https://skydrive.live.com/#cid=2FA6294B3E381151&id=2FA6294B3E381151!128

    Thanks,

  • Any suggestions ?

  • I'm assuming this reply is too late for your use but in case anyone else needs...

    This works well for me:

    \\\w+|\{.*?\}|}
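
For anyone who wants to see what that pattern does before wiring it into a report, here is a minimal sketch in Python, not the report's VB custom code. The function name `strip_rtf_regex` and the shortened sample are mine. It applies the pattern to a cut-down version of the RTF from the first post and then collapses the leftover whitespace. For a pattern this simple, .NET's `Regex` should behave the same way, but that is an assumption I have not verified inside Report Builder.

```python
import re

# Pattern from the post above: control words, brace groups up to the
# first closing brace, and any stray closing braces left behind.
RTF_TAGS = re.compile(r"\\\w+|\{.*?\}|}")

def strip_rtf_regex(rtf: str) -> str:
    """Remove RTF markup with the pattern, then collapse whitespace."""
    without_tags = RTF_TAGS.sub("", rtf)
    return " ".join(without_tags.split())

# Shortened version of the sample in the first post.
sample = r"""{\rtf1\ansi\ansicpg1254\deff0{\fonttbl{\f0\fnil\fcharset162 Segoe UI;}{\f2\fnil Segoe UI;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1
\pard\cf1\embo\f0\fs20 emaillerini\embo0 \embo al \embo0 \f2\par
{\*\lyncflags rtf=1}}"""

print(strip_rtf_regex(sample))  # emaillerini al
```

Note that the pattern depends on the message text sitting outside any simple brace group: `\{.*?\}` removes everything up to the first `}`, so `{\rtf1 hello}` on its own comes back empty. Escapes such as `\'fc` are also left in the output, because `\w` does not match the apostrophe.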

  • erik 25824 (4/19/2013)


    I'm assuming this reply is too late for your use but in case anyone else needs...

    This works well for me:

    \\\w+|\{.*?\}|}

    As you suggested, I've changed the regex, but I'm still having the same problem. Can you send me the report file (the .rdl file) that you are using?

    Regards and thanks,
