CFDocument, baby!
Calphool



Joined: 18/02/2008 08:49:05
Messages: 29
Offline

Hey all you happy Smith users...

I've got something I think folks will enjoy.

I've roughed out an implementation of CFDocument. It uses the Mozilla Gecko engine (MozSwing) and iText to render the PDF (can you say perfect fidelity?!)

I haven't yet got all the extra attributes and subtags working, and the renderer is only set up to size the PDF to the browser DOM tree content, which isn't ideal (it creates weirdly sized "paper"). Still, it's fully functional, looks better than Adobe's CFDocument, and it's faster than you might expect. (I'm using an object pool of stand-by Gecko renderers to keep it speedy.)
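
For readers unfamiliar with the object-pool idea mentioned above, here is a minimal sketch in plain Java. It is not the code from the patch: `RendererPool`, `borrow`, and `giveBack` are hypothetical names, and the pooled type is left generic because the actual MozSwing renderer class is not shown in this thread. The point is only that expensive instances are built once, up front, and then reused across requests.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Minimal fixed-size pool of pre-built, reusable objects (hypothetical sketch). */
final class RendererPool<T> {
    private final BlockingQueue<T> idle;

    RendererPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the start-up cost once, up front
        }
    }

    /** Waits up to timeoutMs for an idle instance; returns null on timeout or interrupt. */
    T borrow(long timeoutMs) {
        try {
            return idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    /** Hands an instance back so another request can reuse it. */
    void giveBack(T instance) {
        idle.offer(instance);
    }

    /** Number of instances currently waiting in the pool. */
    int idleCount() {
        return idle.size();
    }
}
```

A caller would typically `borrow` a renderer, use it inside a `try` block, and `giveBack` in `finally` so that a failed render does not drain the pool.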

I'm still learning about the smith engine internals, but I think I'm beginning to understand how tags are implemented. It would be nice if someone could create a little explanation document for the template.cfg file. It's not exactly intuitive -- you basically break stuff, look at the exceptions, and keep fixing the template until you get something working. Still, it hasn't been too bad. Kind of fun actually.

So, I'm going to try to get at least the media handling (paper size) stuff fixed before I upload the code for everybody. I need to make a small list of to-dos as well... lots of little things to clean up before it's production quality.

Cheers!

--jr
orcus



Joined: 22/01/2007 16:10:52
Messages: 136
Offline

Hi Calphool, that's great news!
Calphool



Joined: 18/02/2008 08:49:05
Messages: 29
Offline

Got it coded and at least ready for beta-testing... I'll write up some step-by-step instructions for the build... more to follow....
Calphool



Joined: 18/02/2008 08:49:05
Messages: 29
Offline

Ok, I've got all the source code packaged up with instructions for how to install all the dependencies (necessary jars for mozswing and itext mostly). Where do I put the code? I can't seem to do attachments here in the forums (blows up with a 500 error message).

Can someone direct me to where I can send this code?

Thanks!
orcus



Joined: 22/01/2007 16:10:52
Messages: 136
Offline

Hi Calphool,

You can send the code to the Smith SourceForge patch tracker. It supports attachments (look for the "Submit New" link just below the menu).

Best regards,
orcus
Calphool



Joined: 18/02/2008 08:49:05
Messages: 29
Offline

I accidentally attached it to the "Bug Tracker" one. Hopefully that's ok. I didn't include all the .jars that make up MozSwing, or the tools.jar. Hopefully you can work around that (download MozSwing yourself and put the jars where necessary).
orcus



Joined: 22/01/2007 16:10:52
Messages: 136
Offline

"Bug Tracker" is not a problem...

Thanks for your effort!
orcus
MatthewReinbold



Joined: 06/11/2007 00:43:59
Messages: 23
Offline

This is extremely exciting and one of the last 'hurdles' that I needed in Smith for a major project. How soon can we see this in the builds? How can we get this up and running?
orcus



Joined: 22/01/2007 16:10:52
Messages: 136
Offline

Hi Matthew,

I hope it will be included in the next build of Smith. It's hard to say exactly when that's going to be. The Smith team tries to make a new build monthly, but if we are busy with other projects, it may take a bit longer.

Note that Calphool has provided very detailed instructions on how to merge his implementation with Smith, so you can do it yourself if you have short deadlines for this.

Go to the SourceForge tracker, scroll down and, in the "Attached Files" section, look for the "Download" link on the right-hand side of the page. The zip file contains "Smith_CFDocument_Install_Instructions.rtf" with detailed instructions. As Calphool stated above, you'll also need to download mozswing, but that is all explained in the instructions.

Best regards,
orcus
Calphool



Joined: 18/02/2008 08:49:05
Messages: 29
Offline

In the interest of "full disclosure", I think I should share one of the limitations with the approach that was taken with CFDOCUMENT.

Essentially, there is only one open source "pure java" HTML rendering engine out there right now, the Cobra browser engine. While Cobra is a nice project, and they make gradual improvements to the engine from time to time, it's just not ready for prime time. It has tons of rendering issues right now (it can't even render its own project home page right...which is kind of ironic). If Cobra were working well, I would rather have integrated with Cobra than Mozilla, and I'll explain why here in a minute.

Instead, I used Mozilla/Gecko's rendering engine. Mozilla/Gecko is built around XPCOM, which is a lot like Microsoft COM/COM+. In fact, you could say that the Mozilla team basically created a cross-platform version of COM, built the Gecko layout engine on top of it, and then added XUL, an XML-based language for describing user interfaces that Gecko itself renders.

Why is that uncool you say?
Well, while you can embed Gecko by virtue of using XPCOM, as far as I can tell there's no way to feed Gecko a java.awt.Graphics2D instance. Since you can't do that, you can't get scalable vectorized fonts and such. (You can't give Gecko a rendering context to draw in -- it controls its entire rendering surface internally.) So, even though iText supports acquiring a java.awt.Graphics2D compatible rendering context for PDF, we can't really use it the way we'd like to (feed it to Gecko as its rendering surface).
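
To make "feeding a renderer a Graphics2D" concrete, here is a rough sketch using only java.awt. `Painter` and `SurfaceDemo` are made-up names for illustration; as far as this thread establishes, Gecko exposes nothing like this, which is exactly the limitation. A renderer written in this style draws into whatever Graphics2D the caller hands it. Below, the surface comes from a BufferedImage, but a PDF-backed Graphics2D could be passed in the same way, and the output would then stay vector rather than pixels.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** A renderer that draws into a caller-supplied surface (hypothetical interface). */
interface Painter {
    void paint(Graphics2D g, int width, int height);
}

final class SurfaceDemo {
    /** Example painter: fills the whole surface with red. */
    static void redBox(Graphics2D g, int width, int height) {
        g.setColor(Color.RED);
        g.fillRect(0, 0, width, height);
    }

    /** Lets the painter draw into an in-memory raster image and returns it. */
    static BufferedImage renderToImage(Painter painter, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics(); // the caller decides what the surface is
        try {
            painter.paint(g, width, height);
        } finally {
            g.dispose(); // release the graphics context
        }
        return img;
    }
}
```

For example, `SurfaceDemo.renderToImage(SurfaceDemo::redBox, 200, 100)` produces a 200x100 image; the painter itself never knows what kind of surface it drew on.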

So, if we have that limitation, why is Calphool talking about pure fidelity and how good the PDFs look?

Well, the approach that was taken was to basically feed the HTML to Gecko, let Gecko do its thing in a hidden window, and then snatch the contents of Gecko's rendering surface into an image in memory. Then, using simple iText image manipulation, we put the image into the PDF per the margintop, marginleft, and scaling parameters, etc. Finally, we walk Gecko's DOM and acquire the locations of all links in the source HTML, and we use iText's Annotation functionality to duplicate those links into the PDF. It looks great, but it's not ideal for two reasons: 1) As you zoom into the PDF, the text becomes blocky rather than smooth, and 2) The text cannot be selected for copy/paste operations (because as far as the PDF engine is concerned, it's not text -- it's an image).
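
The placement and link-mapping arithmetic described above can be sketched roughly as follows. This is not the code from the patch: `PagePlacement` and `linkRect` are hypothetical names, and the sketch assumes the image is simply scaled to fit inside the margins. The one subtlety it shows is the coordinate flip: screenshot pixels are measured from the top-left corner, while PDF coordinates are measured in points from the bottom-left.

```java
/** Maps pixel coordinates from a rendered screenshot onto a PDF page (hypothetical sketch). */
final class PagePlacement {
    final double scale;   // PDF points per screenshot pixel
    final double imageX;  // lower-left corner of the placed image, in points
    final double imageY;
    private final double top; // y coordinate of the image's top edge, in points

    PagePlacement(double pageW, double pageH,
                  double marginTop, double marginBottom,
                  double marginLeft, double marginRight,
                  int imgW, int imgH) {
        double availW = pageW - marginLeft - marginRight;
        double availH = pageH - marginTop - marginBottom;
        // Assumption: fit the whole image inside the printable area.
        scale = Math.min(availW / imgW, availH / imgH);
        top = pageH - marginTop;
        imageX = marginLeft;
        imageY = top - imgH * scale;
    }

    /**
     * Converts a link box given in screenshot pixels (top-left origin)
     * to {llx, lly, urx, ury} in PDF points (bottom-left origin).
     */
    double[] linkRect(int px, int py, int pw, int ph) {
        double llx = imageX + px * scale;
        double ury = top - py * scale; // flip the y axis
        return new double[] { llx, ury - ph * scale, llx + pw * scale, ury };
    }
}
```

On a 612 x 792 point page (US Letter) with 36-point margins, a 1080 x 1000 pixel screenshot comes out at half a point per pixel, so a link box located at pixel (100, 200) with size 300 x 20 maps to the rectangle (86, 646) to (236, 656). Numbers like these are what would then be handed to the PDF library for positioning the image and the link annotations.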

So, what we have is an almost perfectly rendered representation of the HTML (Gecko's rendering engine is superb -- it looks way better than CF8, which uses the proprietary [and somewhat nasty] IceBrowser renderer). However, we do have a bit of a limitation as mentioned above.

Some day, as Cobra matures, we will probably want to revisit this implementation strategy (maybe even give CFDOCUMENT a "RenderingEngine" parameter to let you decide which way you prefer -- pure fidelity vs. vectorized fonts with poorer HTML processing fidelity).

Cheers everyone! I'm off to dig into CFEXECUTE.... shouldn't be that hard...
Calphool



Joined: 18/02/2008 08:49:05
Messages: 29
Offline

FYI everyone, CFExecute patch is out on sourceforge now.
 