I am developing a program that converts HTML source to PDF.
I searched the mailing list and found the HTMLWorker and HTMLParser classes.
HTMLParser does not seem to support CJK strings (when I tested it, all CJK strings came out blank), so I decided to use HTMLWorker.
I wrote the following code (using iTextSharp 3.1.5):
===============================================================================
' Assumes: Imports System.IO, System.Collections, iTextSharp.text, iTextSharp.text.pdf
Private Sub Test_HTMLWorker()
    ' Read the whole HTML file into memory, then release the file handles.
    Dim fs As New FileStream("test.html", FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
    Dim sr As New StreamReader(fs, System.Text.Encoding.Default)
    Dim sReader As New StringReader(sr.ReadToEnd)
    sr.Close()
    fs.Close()

    ' A4 page with 20pt margins on every side.
    Dim document As Document = New Document(PageSize.A4, 20, 20, 20, 20)
    PdfWriter.GetInstance(document, New FileStream("test_output.pdf", FileMode.Create))

    ' Register a CJK-capable font (VB string literals do not need escaped backslashes).
    FontFactory.Register("c:\windows\fonts\gulim.ttc")

    ' Apply the font, encoding and leading to everything inside <body>.
    Dim st As StyleSheet = New StyleSheet
    st.LoadTagStyle("body", "face", "Gulim")
    st.LoadTagStyle("body", "encoding", "Identity-H")
    st.LoadTagStyle("body", "leading", "12,0")

    document.Open()
    Dim worker As html.simpleparser.HTMLWorker = New html.simpleparser.HTMLWorker(document)

    ' Parse the HTML into a list of elements and add them one by one,
    ' separated by an empty paragraph.
    Dim p As ArrayList = worker.ParseToList(sReader, st)
    For k As Integer = 0 To p.Count - 1
        document.Add(p.Item(k))
        document.Add(New Paragraph(vbCrLf))
    Next

    document.Close()
    sReader.Close()
End Sub
=================================================================================
This code works fine for HTML sources that contain only text.
However, it does not work for HTML sources with img tags: the layout of the generated PDF differs from that of the original HTML.
Also, if I do not set the width and height attributes on an img tag, the image is not inserted into the generated PDF at all.
I suspect the problem is that HTMLWorker does not account for the space an image occupies, especially when the img tag is inside a <p> tag.
I then tried inserting vertical space equal to the height of the image, but the position of the image did not change (I was able to locate the Chunk objects that contain the images).
I have attached a sample HTML file and the generated PDF file for testing.
If you could take a few minutes to answer my questions, I would really appreciate it.
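For the missing width/height case, one workaround I am considering is to pre-process the HTML so that every img tag carries explicit width and height attributes before it reaches HTMLWorker. The following is only a minimal sketch: the module name ImgTagFixer, the function AddMissingImgSize and the fixed default size are my own hypothetical choices, not part of iTextSharp, and I have not confirmed that this fixes the layout problem.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical helper (not part of iTextSharp): adds width/height
' attributes to <img> tags that do not already have them.
Module ImgTagFixer

    Function AddMissingImgSize(html As String, defaultWidth As Integer, defaultHeight As Integer) As String
        ' Match every <img ...> tag, case-insensitively.
        Dim imgTag As New Regex("<img\b[^>]*>", RegexOptions.IgnoreCase)
        Return imgTag.Replace(html, Function(m As Match) FixTag(m.Value, defaultWidth, defaultHeight))
    End Function

    Private Function FixTag(tag As String, w As Integer, h As Integer) As String
        Dim extra As String = ""
        ' Simple check: an attribute name preceded by whitespace and followed by "=".
        If Not Regex.IsMatch(tag, "\swidth\s*=", RegexOptions.IgnoreCase) Then
            extra &= " width=""" & w.ToString() & """"
        End If
        If Not Regex.IsMatch(tag, "\sheight\s*=", RegexOptions.IgnoreCase) Then
            extra &= " height=""" & h.ToString() & """"
        End If
        If extra = "" Then Return tag

        ' Insert the new attributes just before the closing ">" or "/>".
        Dim closeLen As Integer = If(tag.EndsWith("/>"), 2, 1)
        Dim head As String = tag.Substring(0, tag.Length - closeLen).TrimEnd()
        Return head & extra & tag.Substring(tag.Length - closeLen)
    End Function

End Module
```

The result of AddMissingImgSize(html, 100, 50) would then be wrapped in the StringReader instead of the raw file contents. A fixed default size will distort images, so in practice the real pixel size would have to be looked up for each src (for example with System.Drawing.Image.FromFile) instead of passing constants.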
Best regards,
S. H. Park
Test HTML sample

| 1    | 2    | 3    | 4    |
| asdf | sdf  | sdf  | sdf  |
| dfdf | dfdf | dfdf |      |
Attachment:
test_output.pdf
Description: Adobe PDF document
_______________________________________________ iText-questions mailing list iText-questions (at) lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/itext-questions