Extract PDF
- From: S3 <stein (at) duvel.ir.iit.edu>
- Date: Mon, 13 Nov 2006 19:36:01 -0600
I realize that there are plenty of strange things that
a PDF could potentially do, but I bet that in most common circumstances
this is not an issue. I _DON'T_ want the output in a format like the one
the PDF was made from (turning it back into LaTeX, for example).
I just want the raw data contained in the PDF.
Since you can usually select, copy, and paste text,
I am guessing that the text is usually stored in lines.
So, for example, instead of writing
dictionaries (hashes) in angle brackets (<< >>),
the output could just use an XML tag:
<dictionary>
<value key="pdfname">...</value>
...
</dictionary>
Similarly for the other data structures.
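
To make the idea concrete, here is a rough sketch of that mapping. It assumes
the PDF objects have already been parsed into plain Python dicts, lists, and
scalars; it does not read a PDF and does not use iText. The <array> and
<item> element names are my own placeholders, since only <dictionary> and
<value key="..."> appear above.

```python
import xml.etree.ElementTree as ET


def to_xml(obj):
    """Convert an already-parsed, PDF-like container into an XML element.

    Assumption: dicts stand in for PDF dictionaries and lists for PDF arrays.
    """
    if isinstance(obj, dict):
        node = ET.Element("dictionary")
        for key, value in obj.items():
            entry = ET.SubElement(node, "value", key=str(key))
            _fill(entry, value)
        return node
    if isinstance(obj, list):
        node = ET.Element("array")  # hypothetical element name
        for item in obj:
            entry = ET.SubElement(node, "item")  # hypothetical element name
            _fill(entry, item)
        return node
    raise TypeError("expected a dict or a list, got %r" % type(obj).__name__)


def _fill(entry, value):
    """Attach a nested container as a child element, or a scalar as text."""
    if isinstance(value, (dict, list)):
        entry.append(to_xml(value))
    else:
        entry.text = str(value)


def to_xml_string(obj):
    """Serialize the XML tree to a str (no XML declaration)."""
    return ET.tostring(to_xml(obj), encoding="unicode")


if __name__ == "__main__":
    page = {"Type": "Page", "MediaBox": [0, 0, 612, 792]}
    print(to_xml_string(page))
```

Indirect references, names versus strings, and streams would each need their
own element in a real dump; the sketch above only covers nesting.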
Also, it would be useful to extract the various embedded files.
So, for example, JPEGs could be extracted as plain JPEG files.
I don't know much about the representation, just that there are
layers of filters. So, would extracting a file basically
mean undoing all but the last filter?
_______________________________________________
iText-questions mailing list
iText-questions (at) lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions