Extract PDF
- From: "Mark Storer" <MStorer (at) cardiff.com>
- Date: Mon, 13 Nov 2006 17:08:30 -0800
How do you expect to manipulate the PDF? Text in particular can be unlike anything you might expect:
- Custom encodings (byte values may have nothing to do with ASCII, Unicode, or anything else; encodings are often whipped up on the fly)
- Absent encodings (raw glyph indexes into a font with no character mapping info)
- Images (line or raster art rather than real text)
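To make the encoding problem concrete, here is a minimal sketch in Python rather than iText. The byte-to-character table is entirely made up for illustration; a real PDF would carry its own mapping (or none at all) per font. It shows the same bytes read naively and read through the custom mapping, and what is left when the mapping is absent.

```python
# Hypothetical custom encoding of the kind a PDF producer might invent
# for one font. The byte values are arbitrary and mean nothing in ASCII.
CUSTOM_ENCODING = {0x01: "H", 0x02: "e", 0x03: "l", 0x04: "o"}


def decode_with_map(raw: bytes, mapping: dict) -> str:
    """Translate each byte through the mapping; unknown codes become U+FFFD."""
    return "".join(mapping.get(b, "\ufffd") for b in raw)


def decode_naively(raw: bytes) -> str:
    """What a tool gets if it assumes the bytes are ordinary Latin-1 text."""
    return raw.decode("latin-1")


# The string as it might sit in the page content: five opaque byte values.
raw_string = bytes([0x01, 0x02, 0x03, 0x03, 0x04])

print(repr(decode_naively(raw_string)))              # control characters, not text
print(decode_with_map(raw_string, CUSTOM_ENCODING))  # readable only with the map
print(decode_with_map(raw_string, {}))               # absent encoding: nothing recoverable
```

With an absent encoding the last call is the best a generic extractor can do: it knows five glyphs were drawn, but not which characters they stand for.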
The order in which elements appear in a PDF page's content is dictated solely by Z-order, not by reading order. All the instances of a particular image might be drawn in succession, followed by all the characters in font X, then font Y, and so on. It's legal (though unheard of) to draw all the 'a's, then all the 'b's, and so on.
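A rough sketch of what that means for an extractor, again in Python and not tied to iText: the Fragment type, the coordinates, and the line_tolerance parameter are all assumptions made for this example. The fragments are listed in draw order (all the 'a's, then all the 'b's), and the text only makes sense after regrouping them by position.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """One piece of drawn text with its position (hypothetical structure)."""
    x: float
    y: float
    text: str


def reading_order(fragments: List[Fragment], line_tolerance: float = 2.0) -> List[str]:
    """Group fragments into lines by baseline, then sort each line left to right.

    Assumes PDF-style coordinates, where larger y is higher on the page.
    """
    lines = []  # each entry: (baseline_y, [fragments on that line])
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    return [
        "".join(f.text for f in sorted(frags, key=lambda f: f.x))
        for _, frags in lines
    ]


# Draw order: every 'a' on the page first, then every 'b'.
drawn = [
    Fragment(10, 700, "a"), Fragment(40, 700, "a"),
    Fragment(20, 680, "a"), Fragment(30, 680, "a"),
    Fragment(20, 700, "b"), Fragment(30, 700, "b"),
    Fragment(10, 680, "b"), Fragment(40, 680, "b"),
]

print("".join(f.text for f in drawn))  # text in draw order
print(reading_order(drawn))            # text regrouped by position
```

Sorting by position is only a heuristic. Multi-column layouts, rotated text, and overlapping elements all defeat it, which is part of why a generic "dump to XML" tool is harder than it sounds.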
The PDF's resources (fonts, images, color definitions, and many others) are accessible, but may not be terribly useful. As I mentioned earlier, a font in a PDF can be stripped of everything but the bare necessities needed to display the glyphs used within that PDF.
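One cheap way to spot such a stripped-down font is its name: by PDF convention, a subset font's name starts with a tag of six uppercase letters followed by a plus sign. The sketch below (Python, with made-up font names, and a helper name of my own choosing) checks for that tag. It says nothing about how much of the font survives, only that it was subsetted.

```python
import re
from typing import Tuple

# Subset tag convention: six uppercase letters, then '+', then the real name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+(.+)$")


def split_subset_name(base_font: str) -> Tuple[bool, str]:
    """Return (is_subset, font name without the subset tag)."""
    match = SUBSET_TAG.match(base_font)
    if match:
        return True, match.group(1)
    return False, base_font


# Hypothetical font names, as they might appear in a PDF's font resources.
for name in ["ABCDEF+Helvetica", "Helvetica", "XYZ+NotATag"]:
    print(name, "->", split_subset_name(name))
```

A font flagged this way is unlikely to be reusable outside the PDF it came from, since glyphs the document never used may simply not be there.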
--Mark Storer
Senior Software Engineer
Cardiff.com
#include <disclaimer>
typedef std::Disclaimer<Cardiff> DisCard;
> -----Original Message-----
> From: itext-questions-bounces (at) lists.sourceforge.net
> [mailto:itext-questions-bounces (at) lists.sourceforge.net]On Behalf Of S3
> Sent: Monday, November 13, 2006 4:16 PM
> To: itext-questions (at) lists.sourceforge.net
> Subject: [iText-questions] Extract PDF
>
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Is there any standard utility to extract everything
> in a PDF to an XML file (and images and fonts
> in separate files) for easy manipulation?
> (If not, I should write one!)
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.5-ecc0.1.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFFWQrRxzVgPqtIcfsRAvAbAJ41ubS9m+lEQ+PjK8Rq7C0K9xyFBACeNggb
> PhFIKcIUKhyeNlwF/V+Q4UQ=
> =u9lg
> -----END PGP SIGNATURE-----
>
> --------------------------------------------------------------
> -----------
> Using Tomcat but need to do more? Need to support web
> services, security?
> Get stuff done quickly with pre-integrated technology to make
> your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on
> Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
iText-questions mailing list
iText-questions (at) lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions