[sword-devel] XSLT vs. C++
greg.hellings at gmail.com
Wed Dec 1 08:19:23 MST 2010
On Wed, Dec 1, 2010 at 8:13 AM, Jonathan Morgan <jonmmorgan at gmail.com> wrote:
> Speaking as a BPBible developer, I would tend to prefer C++ filters to
> XSLT. Here are some reasons why:
> 1. It works now (well, OK, it doesn't always work as well as one might like,
> but it does work).
It works for our historical collection of modules, but the current
implementations of some of the filters are rigid and very difficult to
update or modify. But yes, it more or less works now.
> 2. It is (fairly) readily able to be customised by application developers
> using the magic of inheritance. BPBible at least takes advantage of this,
> and 0.4.7 contained about 800 lines of Python in our filter code. For 0.5
> the OSIS filter has doubled in size. By contrast, if we were to maintain an
> app-specific XSLT file, we would probably need to duplicate the whole file
> and then make changes to it, and any changes made to the base XSLT file
> would have to be manually merged in. Bye-bye to the idea of having only one
> lot of library source to maintain.
XSLT is easily extensible. SAX is easily extensible.
In XSLT I can import another XSL file and provide overrides - no need
to merge in changes from someone else and maintain identical copies,
etc. When I'm creating my current set of modules I have 2 XSL files
that go from the proprietary SGML to HTML and ThML. Obviously there
is a lot of overlap between those two. The ThML stylesheet simply
imports the HTML stylesheet and overrides a few of the templates to
produce <scripRef> and other ThML-specific elements. That way, if
there is a bug in how I translate a table display, for instance, I can
change it in the HTML stylesheet and I get the fix for free in my ThML
without touching anything.
SAX is simply an API in any desired language. If I want to override
the behavior of a single element, I just override the processing
method and check something like
if(is element to override)
All the discussion for XSL above applies to SAX processing as well.
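As a rough sketch of that override idea (this is not SWORD's or BPBible's actual filter code), here is how it might look with Python's standard xml.sax module. The handler classes, the tiny element subset and the sample markup are all made up for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape, quoteattr


class HtmlHandler(ContentHandler):
    """Hypothetical base filter: renders a tiny OSIS-like subset as HTML."""

    def __init__(self):
        super().__init__()
        self.out = []

    def startElement(self, name, attrs):
        if name == "reference":
            self.out.append("<a href=%s>" % quoteattr("#" + attrs.get("osisRef", "")))
        elif name == "hi":
            self.out.append("<i>")

    def endElement(self, name):
        if name == "reference":
            self.out.append("</a>")
        elif name == "hi":
            self.out.append("</i>")

    def characters(self, content):
        self.out.append(escape(content))


class ThmlHandler(HtmlHandler):
    """Overrides only <reference>; everything else is inherited unchanged."""

    def startElement(self, name, attrs):
        if name == "reference":  # the "is element to override" check
            self.out.append("<scripRef passage=%s>" % quoteattr(attrs.get("osisRef", "")))
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if name == "reference":
            self.out.append("</scripRef>")
        else:
            super().endElement(name)


def render(handler_cls, xml_text):
    """Parse xml_text with a fresh handler and return the rendered string."""
    handler = handler_cls()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)


SAMPLE = ('<verse>See <reference osisRef="John.3.16">John 3:16</reference>'
          ' <hi>now</hi></verse>')
```

A fix to the <hi> handling in the base class shows up in the ThML output without touching ThmlHandler, which is the same benefit described for the imported stylesheet.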
> 3. It allows developers to use sources that are outside the document being
> transformed. This has had some issues for us (from memory, the filter code
> isn't re-entrant), but we use this functionality to do things like expanding
> a list of cross-references in the user's locale, looking up the headwords
> for Strong's Numbers, and looking up the text in the current version for a
> passage in a harmony. By contrast, unless we have some good way to call
> into C++/Python from XSLT we will not be able to use sources outside the
> current document unless we do some complex post-processing. If we do have
> such a way it could just increase complexity.
A SAX model, of course, is able to handle the full range of what your
programming language of choice has, so you're all set there.
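To make that concrete, here is a minimal sketch of a SAX handler that consults a source outside the document, in the spirit of the Strong's headword lookup mentioned above. The lexicon dictionary stands in for whatever external lookup an application really has; it, the class name and the sample markup are assumptions for illustration only.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape


class StrongsHandler(ContentHandler):
    """Appends a headword after each <w> element, using an external lexicon."""

    def __init__(self, lexicon):
        super().__init__()
        self.lexicon = lexicon  # hypothetical external data source (a plain dict here)
        self.out = []
        self.pending = []       # headwords waiting for their closing </w>

    def startElement(self, name, attrs):
        if name == "w":
            key = attrs.get("lemma", "").replace("strong:", "")
            self.pending.append(self.lexicon.get(key))

    def endElement(self, name):
        if name == "w":
            headword = self.pending.pop()
            if headword:  # unknown numbers are silently skipped
                self.out.append(" [%s]" % escape(headword))

    def characters(self, content):
        self.out.append(escape(content))


def render_with_lexicon(xml_text, lexicon):
    """Parse xml_text and return the text with headwords appended."""
    handler = StrongsHandler(lexicon)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)


LEXICON = {"G26": "agape"}  # made-up one-entry lexicon
SAMPLE = '<verse>God is <w lemma="strong:G26">love</w>.</verse>'
```

Because the lookup happens in ordinary Python, the same spot could just as well call into a module, a locale table or anything else the application has loaded.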
XSL has many ways of bringing in data from the outside. Arguments
and variables can be passed in by the caller (run man xsltproc and you'll
see the option --param PARAMNAME PARAMVALUE; programmatic invocation
of XSL can use the same parameter mechanism), values can be pulled out
of static XML files which the XSL can include, and there is a rather
straightforward way of creating custom functions in your invoking
language. When I am using XSL to parse my SGML files, I have a number
of custom functions written in Python which I invoke from XSL to do
any type of processing raw XSL can't handle (example: transforming
inline RTF styles into inline CSS styles).
Increasing complexity? That really depends on the methods used and
whether they are appropriate.
> 4. It allows us to share common functionality between the ThML filters and
> the OSIS filters (which we do). I think this proposal would have us still
> using C++ ThML filters while moving the OSIS filters to XSLT, which would
> make the results further apart.
The same can be done with XSL simply by factoring the shared
functionality into a single stylesheet which the ThML and
OSIS-specific stylesheets include.
SAX... well, I think you get the idea there.
> 5. I would be concerned if performance dropped at all, as I suspect it would
> (especially if calls into C++ were involved as well).
Calls into the parent language don't really slow down XSL unless they
invoke a method which is excruciatingly slow. Of course, no one
really has implementations of both technologies currently in place for
us to compare SWORD's performance at present. apt-cache showpkg
libxml2 shows me around 1000 libraries and applications in Ubuntu
which currently depend directly on libxml2 including things as diverse
as Pidgin, PHP, Postgres, VMWare, rpm2html, strigi, nautilus, Gnome,
gstreamer, abiword, xscreensaver and so on. Performance of that
library is apparently good enough for some people. There are even two
sets of bindings in Python (python-libxml2 and python-lxml) both of
which I have used with great success.
To give an idea of how quickly it processes, I am able to load, parse,
render and write out XML files up to a few megs each faster than my
hard drive can keep up. And that is using the Python bindings and XSL
- both of which are slower than directly invoking a SAX processor
through C or C++. Slower than SWORD's filters? Maybe - I don't know
how to directly test that at present. Slow enough to matter,
especially for the size of a chapter or even just a single book?
> 6. Currently our rendering works on a verse-by-verse basis. I'm not sure
> what it would look like if we were trying to do something like a chapter at
> once. Do we run through the chapter in one go? What kind of well formed
> OSIS document can we get from a single verse or collection of verses to pass
> into an XSLT? Is there much cost to fire up an XSLT engine just for the one
> verse we have in our search preview? What would you do if you wanted to
> have a discontinuous range of verses or to show versions in parallel
> verse-by-verse? We also surround each verse and a rendered section with
> other extra stuff which varies depending on the context. I'm not sure where
> this would fit in the XSLT (if at all).
Creating a well-formed and valid OSIS document for a single verse,
chapter, book or the entire work is no different. Create the header,
add the verses, write the footer. JSword already does it. And then
the XSL knows how to handle each element it encounters - a SAX
processor would need to be so programmed if that was selected instead
- with no real care for whether it is seeing one verse or 50,000 of
them. Discontinuous ranges are also no problem. As long as the document
is well formed and valid, the XML processing can handle it.
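The header/verses/footer recipe can be sketched in a few lines of Python. The wrapper below is deliberately simplified and is an assumption of mine: a real OSIS document also needs the namespace and a proper header to be schema-valid, and this sketch only checks well-formedness, not validity.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical wrapper; not a schema-valid OSIS header.
HEADER = '<osis><osisText osisIDWork="Example"><div>'
FOOTER = '</div></osisText></osis>'


def wrap_verses(verse_fragments):
    """Create the header, add the verses, write the footer."""
    return HEADER + "".join(verse_fragments) + FOOTER


def verse_ids(document):
    """Parse the document (raises ParseError if not well formed) and list its verses."""
    root = ET.fromstring(document)
    return [verse.get("osisID") for verse in root.iter("verse")]


# One verse or a discontinuous selection is wrapped the same way.
single = wrap_verses(['<verse osisID="John.3.16">For God so loved...</verse>'])
discontinuous = wrap_verses([
    '<verse osisID="Gen.1.1">In the beginning...</verse>',
    '<verse osisID="John.3.16">For God so loved...</verse>',
])
```

The processing side never needs to know how many verses it was handed; it just walks whatever elements are inside the wrapper.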