HTML to plain text
k***@gmail.com
2006-12-01 13:38:44 UTC
Hello,

I need an easy way to convert HTML email to plain text in Delphi. In VB
it's easy with CDONTS and the AutoGenerateTextBody property (so even an
example with CreateOLEObject("cdo.message") will do).
The problem with the simple functions and free components you find on the
internet is that they don't support everything they should (HTML4,
style sheets, stripping JavaScript code, etc.).

In VB it's:
Set msg = CreateObject("cdo.message")
msg.AutoGenerateTextBody = True

then msg.TextBody is your friend. How do I do this with Delphi?

M.K.
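
For reference, the two VB lines above map fairly directly onto late-bound
OLE calls in Delphi. The following is only a minimal sketch: it assumes
CDO for Windows 2000 (CDOSYS) is registered on the machine, the sample
HTML string and the procedure name ShowGeneratedText are made up for
illustration, and the quality of the generated text is whatever CDO
produces.

```pascal
program CdoTextBodySketch;

{$APPTYPE CONSOLE}

uses
  ActiveX, ComObj;

// Hypothetical helper: assigns an HTML body and prints the text body
// that CDO derives from it.
procedure ShowGeneratedText(const Html: string);
var
  Msg: OleVariant;
begin
  Msg := CreateOleObject('CDO.Message');
  Msg.AutoGenerateTextBody := True;  // same property as in the VB snippet
  Msg.HTMLBody := Html;              // CDO builds TextBody from this
  Writeln(Msg.TextBody);
end;                                 // Msg is released here, before CoUninitialize

begin
  CoInitialize(nil);
  try
    ShowGeneratedText('<html><body><p>Hello <b>world</b></p></body></html>');
  finally
    CoUninitialize;
  end;
end.
```

Keeping the OleVariant local to a procedure means the COM object is
released before CoUninitialize runs.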
Dennis Passmore
2006-12-01 15:33:03 UTC
Just go to http://delphi.icm.edu.pl/newl/d40/f014_002.htm
and download htmlprsr.zip as it contains a HTMparser that will let you
save the Text to file.
k***@gmail.com
2006-12-01 19:40:56 UTC
Why doesn't this code show anything?

var
  oMsg, oStm: OleVariant;
  s: string;

oMsg := CreateOleObject('CDO.Message');
oStm := oMsg.GetStream;
oStm.LoadFromFile('D:\test.eml');
s := oMsg.HTMLBody;
ShowMessage(s);
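
One likely cause, offered as an assumption rather than a confirmed
diagnosis: data loaded into the stream returned by GetStream is not
committed to the message until the stream's Flush method is called, so
HTMLBody is still empty at the point where it is read. A minimal sketch
with that call added (the function name LoadHtmlBody is hypothetical;
the file path is the one from the snippet above):

```pascal
program ShowEmlHtmlBody;

{$APPTYPE CONSOLE}

uses
  ActiveX, ComObj;

// Hypothetical helper: loads an .eml file into a CDO message and
// returns its HTML body.
function LoadHtmlBody(const FileName: string): string;
var
  Msg, Stm: OleVariant;
begin
  Msg := CreateOleObject('CDO.Message');
  Stm := Msg.GetStream;         // ADO Stream over the serialized message
  Stm.LoadFromFile(FileName);   // replaces the stream contents
  Stm.Flush;                    // commit the stream contents back into Msg
  Result := Msg.HTMLBody;
end;

begin
  CoInitialize(nil);
  try
    Writeln(LoadHtmlBody('D:\test.eml'));
  finally
    CoUninitialize;
  end;
end.
```

If the message has no HTML part at all, HTMLBody would presumably still
come back empty; in that case TextBody is the property to look at.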


Unfortunately, the problem is a bit more complex than just removing tags
with a parser. OK, it's not too hard to exclude scripts and styles
using a parser, but there's much more junk: whitespace that browsers
sometimes ignore, columns, different layouts...
I'm not saying CDONTS is perfect, but it seems to do more than just
remove tags (though Microsoft describes it as a tag remover).
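
To make the "easy part" concrete, here is a rough sketch of a hand-rolled
stripper that drops tags, skips the bodies of script and style elements,
and collapses runs of whitespace the way a browser would. Everything in
it (the HtmlToText name, the short BreakTags list) is an assumption made
for illustration. It does not decode entities, handle comments that
contain ">", or attempt any layout, which is exactly the junk mentioned
above.

```pascal
program StripHtmlSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

const
  // Tags treated as word separators in this sketch (an arbitrary short list).
  BreakTags = ' br p div tr td li h1 h2 h3 ';

function HtmlToText(const Html: string): string;
var
  Lower, Name: string;
  I, J, TagEnd: Integer;
  Closing, PendingSpace: Boolean;
begin
  Result := '';
  Lower := LowerCase(Html);   // ASCII-only lowercasing, so indexes still line up
  PendingSpace := False;
  I := 1;
  while I <= Length(Html) do
  begin
    if Html[I] = '<' then
    begin
      TagEnd := PosEx('>', Html, I);
      if TagEnd = 0 then
        Break;                // unterminated tag: drop the rest
      // Extract the tag name, noting whether this is a closing tag.
      J := I + 1;
      Closing := (J < TagEnd) and (Lower[J] = '/');
      if Closing then
        Inc(J);
      Name := '';
      while (J < TagEnd) and (Lower[J] in ['a'..'z', '0'..'9']) do
      begin
        Name := Name + Lower[J];
        Inc(J);
      end;
      if (Name <> '') and (Pos(' ' + Name + ' ', BreakTags) > 0) then
        PendingSpace := True;
      I := TagEnd + 1;
      // Skip the whole body of <script> and <style> elements.
      if (not Closing) and ((Name = 'script') or (Name = 'style')) then
      begin
        J := PosEx('</' + Name, Lower, I);
        if J = 0 then
          Break;              // no closing tag: drop the rest
        I := J;               // the closing tag is consumed on the next pass
      end;
    end
    else if Html[I] <= ' ' then
    begin
      PendingSpace := True;   // collapse any run of whitespace
      Inc(I);
    end
    else
    begin
      if PendingSpace and (Result <> '') then
        Result := Result + ' ';
      PendingSpace := False;
      Result := Result + Html[I];
      Inc(I);
    end;
  end;
end;

begin
  Writeln(HtmlToText(
    '<html><head><style>p { color: red }</style></head><body>' +
    '<p>Hello,   <b>world</b></p><script>alert("x")</script>' +
    '<p>Bye</p></body></html>'));
end.
```

Inline tags such as b are removed without inserting a space, so a word
that is only partly bold stays in one piece.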

M.K.
Ralf Junker - http://www.yunqa.de/delphi/
2006-12-02 10:55:04 UTC
Post by k***@gmail.com
I need an easy way to convert HTML email to plain text in Delphi. In VB
it's easy with CDONTS and the AutoGenerateTextBody property (so even an
example with CreateOLEObject("cdo.message") will do).
The problem with the simple functions and free components you find on the
internet is that they don't support everything they should (HTML4,
style sheets, stripping JavaScript code, etc.).
DIHtmlParser (http://www.yunqa.de/delphi/) recognizes everything known to HTML,
plus a bit more (ASP, PHP, etc.). There is a demo project to extract plain text
from HTML. Unicode is supported, too.

Regards,

Ralf

---
The Delphi Inspiration
http://www.yunqa.de/delphi/
