
Extract PDF



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I realize that there are plenty of strange things that a PDF
could potentially do, but I bet that in most common circumstances
this is not an issue.  I _DON'T_ want it in a format like the one
the PDF was made from (turning it back into LaTeX, for example).
I just want the raw data contained in the PDF.
Since you can usually select, copy, and paste text,
I am guessing that the text is usually stored in lines.

So, for example, instead of having dictionaries (hashes)
in angle brackets, << >>, it could just be an XML tag:
<dictionary>
  <value key="pdfname">...</value>
...
</dictionary>

Similarly for the other data structures.
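
To make the idea concrete, here is a minimal sketch in Python of the
mapping I have in mind.  It assumes the PDF objects have already been
parsed into plain Python dicts and lists by some other tool; the function
names and the <array>/<item> tags are made up for illustration and are
not part of iText or any other library.

```python
import xml.etree.ElementTree as ET


def pdf_object_to_xml(obj):
    """Turn a dict or list standing in for a parsed PDF object into XML."""
    if isinstance(obj, dict):
        node = ET.Element("dictionary")
        for key, value in obj.items():
            # One <value key="..."> element per dictionary entry.
            entry = ET.SubElement(node, "value", key=str(key))
            _attach(entry, value)
        return node
    if isinstance(obj, (list, tuple)):
        # Hypothetical tag names for arrays; the post only shows <dictionary>.
        node = ET.Element("array")
        for item in obj:
            entry = ET.SubElement(node, "item")
            _attach(entry, item)
        return node
    raise TypeError("expected a dict or list, got %r" % type(obj))


def _attach(parent, value):
    """Nest containers as child elements; store anything else as text."""
    if isinstance(value, (dict, list, tuple)):
        parent.append(pdf_object_to_xml(value))
    else:
        parent.text = str(value)


catalog = {"Type": "Catalog", "Kids": [1, 2]}
xml_text = ET.tostring(pdf_object_to_xml(catalog), encoding="unicode")
print(xml_text)
```

Names, numbers, and strings all end up as plain text here, so a real
converter would probably want a type attribute to tell them apart.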
Also, it would be useful to extract the various files.
So, for example, JPEGs could be extracted as plain JPEG files.
I don't know much about the representation, just that there are
layers of filters.  So would extracting a file basically
mean undoing all but the last filter?
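
Here is a rough sketch in Python of what I mean by undoing all but the
last filter.  It assumes the raw stream bytes and the list of filter
names (in the order a reader would apply them, without the leading
slash) have already been pulled out of the PDF by some other tool.
The function names are made up, only two filters are handled, and
decode parameters such as predictors are ignored.

```python
import binascii
import zlib


def _ascii_hex_decode(data):
    """Decode ASCIIHexDecode data: hex digits, optional whitespace, '>' at the end."""
    digits = b"".join(data.split())      # drop all whitespace
    digits = digits.split(b">", 1)[0]    # stop at the end-of-data marker
    if len(digits) % 2:
        digits += b"0"                   # an odd final digit is padded with 0
    return binascii.unhexlify(digits)


# Filters this sketch knows how to undo.
DECODERS = {
    "FlateDecode": zlib.decompress,
    "ASCIIHexDecode": _ascii_hex_decode,
}

# Filters whose output is already a usable file format (DCTDecode is JPEG).
IMAGE_FILTERS = {"DCTDecode"}


def peel_filters(data, filters):
    """Undo filters in order, stopping before the first image filter.

    Returns the partly decoded bytes and the filters left untouched.
    """
    remaining = list(filters)
    while remaining and remaining[0] not in IMAGE_FILTERS:
        name = remaining.pop(0)
        if name not in DECODERS:
            raise ValueError("no decoder for filter " + name)
        data = DECODERS[name](data)
    return data, remaining


# Stand-in bytes, not a real image: wrap them the way a PDF writer might.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image \xff\xd9"
stream = binascii.hexlify(zlib.compress(fake_jpeg)) + b">"
payload, left = peel_filters(stream, ["ASCIIHexDecode", "FlateDecode", "DCTDecode"])
```

If the only filter left is DCTDecode, the payload could be written
straight to a .jpg file.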
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5-ecc0.1.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFWR2BxzVgPqtIcfsRAu3KAJ9elZzha6hvEMfZ+4XCsPwapZihNwCbBe3a
APK4NHEVkAVEjNlnzQGNeT0=
=Ude+
-----END PGP SIGNATURE-----

_______________________________________________
iText-questions mailing list
iText-questions (at) lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions