Chilkat Python HTML Conversion Library is a library for transforming HTML into well-formed XML.
Once HTML is converted to XHTML (i.e. well-formed XML), any existing XML parsing API can be leveraged to extract data.
Chilkat Python HTML Conversion Library also converts HTML to the best possible plain-text representation.
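To illustrate why well-formed output matters, here is a minimal sketch of extracting data from XHTML with a stock XML parser. It uses only Python's standard xml.etree.ElementTree module, not the Chilkat API. The sample markup and the extract_links helper are hypothetical and stand in for whatever a conversion step would produce.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML, standing in for the output of an HTML-to-XML conversion.
XHTML = """<html>
  <body>
    <h1>Release notes</h1>
    <ul>
      <li><a href="/zip">Zip</a></li>
      <li><a href="/tar">TAR</a></li>
    </ul>
  </body>
</html>"""


def extract_links(xhtml: str) -> list[tuple[str, str]]:
    """Return (href, link text) pairs for every <a> element in the document."""
    root = ET.fromstring(xhtml)  # raises ParseError if the input is not well-formed
    return [(a.get("href", ""), (a.text or "").strip()) for a in root.iter("a")]


links = extract_links(XHTML)
print(links)
```

The same parser would reject typical "tag soup" HTML (unclosed tags, unquoted attributes), which is why the conversion to well-formed XML has to come first.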
Here are some key features of "Chilkat Python HTML Conversion Library":
· File-to-file HTML to XML conversion.
· Memory-to-memory HTML to XML conversion.
· File-to-file HTML to plain-text conversion.
· Memory-to-memory HTML to plain-text conversion.
· Convert character encodings during the conversion process.
· Flexibility in controlling how HTML entities are handled.
· Automatically convert HTML entities to corresponding 8-bit characters.
· Optionally drop all text formatting tags from the output.
· Drop/undrop specific tags from the output.
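As a rough illustration of what converting HTML entities to 8-bit characters means, the sketch below uses Python's standard html module rather than the Chilkat library. The entities_to_8bit helper and the choice of ISO-8859-1 as the target charset are assumptions made for the example.

```python
import html


def entities_to_8bit(text: str, charset: str = "iso-8859-1") -> bytes:
    """Decode HTML entities, then encode the result in a single-byte charset."""
    decoded = html.unescape(text)   # "&eacute;" becomes "é", "&copy;" becomes "©"
    return decoded.encode(charset)  # raises UnicodeEncodeError if a character does not fit


result = entities_to_8bit("Caf&eacute; &copy; 2010")
print(result)
```

Each entity in the input ends up as exactly one byte in the output, which is only possible when the character exists in the chosen 8-bit charset.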
Limitations:
· 30-day trial.
What's New in This Release:
· (Email Object) The AspUnpack and AspUnpack2 methods were fixed to prevent the creation of duplicate HTML files.
· (Zip) On Linux systems only, the AppendFiles method failed in a rare specific circumstance.
· (TAR) Fixed Base256 internal decoding problem to support TAR archives larger than 8GB.
· (Email, MIME, Crypt2) Internal PKCS7 signed-data issue fixed for cases where the signed data contained no authenticated attributes.