[sword-devel] XSLT vs. C++

Wed Dec 1 07:13:53 MST 2010

Speaking as a BPBible developer, I would tend to prefer C++ filters to
XSLT.  Here are some reasons why:
1. It works now (well, OK, it doesn't always work as well as one might like,
but it does work).

2. It is (fairly) readily able to be customised by application developers
using the magic of inheritance.  BPBible at least takes advantage of this,
and 0.4.7 contained about 800 lines of Python in our filter code.  For 0.5
the OSIS filter has doubled in size.  By contrast, if we were to maintain an
app-specific XSLT file, we would probably need to duplicate the whole file
and then make changes to it, and any changes made to the base XSLT file
would have to be manually merged in.  Bye-bye to the idea of having only one
lot of library source to maintain.
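
Point 2 can be made concrete with a small C++ sketch of customising a filter by inheritance. The application overrides only the cases it cares about and defers everything else to the library. BaseHTMLFilter, AppHTMLFilter and handleStartTag are hypothetical names used for illustration; they are not SWORD's actual filter classes. BPBible does the equivalent from Python.

```cpp
#include <string>

// Hypothetical library-side filter: returns the HTML emitted for an OSIS
// start tag. Illustration only, not SWORD's real filter API.
class BaseHTMLFilter {
public:
    virtual ~BaseHTMLFilter() = default;

    virtual std::string handleStartTag(const std::string &name) const {
        if (name == "title") return "<h1>";
        if (name == "q") return "<span class=\"quote\">";
        return "";  // unknown tags are dropped
    }
};

// Hypothetical application subclass: overrides one case and defers the
// rest to the base class, so later fixes in the library are inherited
// automatically instead of being merged by hand into a copied file.
class AppHTMLFilter : public BaseHTMLFilter {
public:
    std::string handleStartTag(const std::string &name) const override {
        if (name == "title") return "<h2 class=\"app-title\">";
        return BaseHTMLFilter::handleStartTag(name);
    }
};
```

The contrast with a copied XSLT file is that the subclass contains only the difference, not a duplicate of the whole stylesheet.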

3. It allows developers to use sources that are outside the document being
transformed.  This has had some issues for us (from memory, the filter code
isn't re-entrant), but we use this functionality to do things like expanding
a list of cross-references in the user's locale, looking up the headwords
for Strong's Numbers, and looking up the text in the current version for a
passage in a harmony.  By contrast, unless we have some good way to call
into C++/Python from XSLT we will not be able to use sources outside the
current document unless we do some complex post-processing.  If we do have
such a way it could just increase complexity.
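
Here is a sketch of the pattern in point 3. It assumes the filter is handed a callback for data that lives outside the document being transformed, in this case a Strong's headword lookup. StrongsRenderer and HeadwordLookup are hypothetical names and are not SWORD or BPBible API.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical callback: given a Strong's number such as "G25", return its
// headword, or an empty string if unknown. In a frontend this might query
// a lexicon module; here it can be any callable.
using HeadwordLookup = std::function<std::string(const std::string &)>;

// Hypothetical renderer that enriches a Strong's number with data taken
// from outside the text being rendered.
class StrongsRenderer {
public:
    explicit StrongsRenderer(HeadwordLookup lookup) : lookup_(std::move(lookup)) {}

    std::string render(const std::string &strongs) const {
        const std::string headword = lookup_(strongs);
        if (headword.empty())
            return "<span class=\"strongs\">" + strongs + "</span>";
        return "<span class=\"strongs\" title=\"" + headword + "\">" + strongs + "</span>";
    }

private:
    HeadwordLookup lookup_;
};
```

Because the lookup is just a callable, an application can pass in a real lexicon query while a test passes a plain std::map.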

4. It allows us to share common functionality between the ThML filters and
the OSIS filters (which we do).  I think this proposal would have us still
using C++ ThML filters while moving the OSIS filters to XSLT, which would
push the output of the two further apart.
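
Point 4 can be sketched as formatting logic that is written once in a shared base and used by both an OSIS filter and a ThML filter, so that both produce identical output for the same concept. The class names are hypothetical, and the passage:// link scheme is an assumption.

```cpp
#include <string>

// Hypothetical shared helper: formatting written once for every markup.
class CommonRendering {
protected:
    static std::string crossRefLink(const std::string &ref) {
        return "<a class=\"xref\" href=\"passage://" + ref + "\">" + ref + "</a>";
    }
};

// OSIS marks a cross-reference as <reference osisRef="...">.
class OSISFilter : protected CommonRendering {
public:
    std::string reference(const std::string &osisRef) const { return crossRefLink(osisRef); }
};

// ThML marks a cross-reference as <scripRef passage="...">.
class ThMLFilter : protected CommonRendering {
public:
    std::string scripRef(const std::string &passage) const { return crossRefLink(passage); }
};
```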

5. I would be concerned if performance dropped at all, as I suspect it would
(especially if calls into C++ were involved as well).

6. Currently our rendering works on a verse-by-verse basis.  I'm not sure
what it would look like if we were trying to do something like a chapter at
once.  Do we run through the chapter in one go?  What kind of well-formed
OSIS document can we get from a single verse or collection of verses to pass
into an XSLT?  Is there much cost to fire up an XSLT engine just for the one
verse we have in our search preview?  What would you do if you wanted to
have a discontinuous range of verses or to show versions in parallel
verse-by-verse?  We also surround each verse and a rendered section with
other extra stuff which varies depending on the context.  I'm not sure where
this would fit in the XSLT (if at all).
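
A sketch of the verse-by-verse approach in point 6. It assumes a filter can be called on one verse's markup at a time. Verse, VerseFilter and renderVerses are hypothetical names.

```cpp
#include <functional>
#include <string>
#include <vector>

// One verse as stored in a module: its reference and raw markup.
struct Verse {
    std::string ref;
    std::string markup;
};

// Hypothetical per-verse filter: turns one verse's markup into HTML.
using VerseFilter = std::function<std::string(const std::string &)>;

// Render verses one at a time (they need not be contiguous), wrapping each
// in markup chosen by the caller's context. No single well-formed document
// covering the whole range has to be assembled first.
std::string renderVerses(const std::vector<Verse> &verses, const VerseFilter &filter,
                         bool showVerseNumbers) {
    std::string out;
    for (const Verse &v : verses) {
        out += "<div class=\"verse\">";
        if (showVerseNumbers) out += "<sup>" + v.ref + "</sup> ";
        out += filter(v.markup);
        out += "</div>";
    }
    return out;
}
```

Since each verse is filtered separately, a discontinuous selection or a single search-preview verse uses the same code path as a full chapter.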

In short, as a BPBible developer I much prefer implementation in C++ because
it allows us to do things we want to do much more easily than with XSLT
(though if Troy or anyone else wants to improve the present implementation
they are welcome to).  I cannot speak to the pros and cons from a module
creator's point of view.

Jon

On Wed, Dec 1, 2010 at 6:08 AM, Troy A. Griffitts <scribe at crosswire.org> wrote:

> Having finally returned from a hectic 2 weeks of conferences, and lots
> to do before leaving for Christmas, I'm not sure I'm up for a heated,
> passionate debate about technologies right now, but by all means, please
> commence the public discussion.
>
> Let me start by saying that everyone (I believe) agrees that we would
> like to have an HTML output from the engine which is more generic and
> would allow CSS to be applied if a frontend would like to do this.
> Currently HTMLHREF output from the engine is used by the widest number
> of frontends (to my knowledge) and would benefit everyone involved by
> becoming much more generic. e.g.,
>
> <title> -> <h1>
> rather than
> <title> -> <b><br />
>
> <transChange type="added"> -> <span class="tcAdded">
> rather than
> <transChange type="added"> -> <i>
>
> etc.
>
> I believe this will solve a number of issues and possibly get the BT and
> MacSword teams on board with using the same HTML output filters as the
> other projects involved (or at least subclassing them and using the
> majority of their functionality).
>
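
The element mappings above could be kept in a table rather than scattered through the code. The following is a minimal sketch. It assumes only the element name and its type attribute matter. The table entries come from the examples in the message, while Tag, kGenericHTML and openHTML are hypothetical names.

```cpp
#include <map>
#include <string>
#include <utility>

// A start tag reduced to what this sketch needs: element name and its
// "type" attribute (empty when absent).
struct Tag {
    std::string name;
    std::string type;
};

// Hypothetical table of OSIS (element, type) -> generic, CSS-friendly HTML.
const std::map<std::pair<std::string, std::string>, std::string> kGenericHTML = {
    {{"title", ""}, "<h1>"},
    {{"transChange", "added"}, "<span class=\"tcAdded\">"},
};

// Opening HTML for a tag, or "" if unmapped. Falls back from (name, type)
// to (name, "") when the specific type has no entry of its own.
std::string openHTML(const Tag &tag) {
    auto it = kGenericHTML.find({tag.name, tag.type});
    if (it == kGenericHTML.end()) it = kGenericHTML.find({tag.name, ""});
    return it == kGenericHTML.end() ? std::string() : it->second;
}
```

With this layout, adding a rule takes one line in the table. Closing tags would need a parallel table.
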
>
> Now, as to the other issue of using XSLT internally in the engine to
> process OSIS -> HTML
>
> I will throw a few melons into the air for target practice, and let the
> shooting commence.
>
> _____________________________
> *Multiple Language*
>
> XSLT is a programming language in the same sense that C++ is a
> programming language.
>
> The SWORD Project C++ engine is written in C++.  It is not a Python
> engine; it is not a Perl engine; it is not a Java engine; it is C++.
>
> One might say, "Well, you can use XSLT from C++.  Doesn't JSword do this
> from Java?"  Well, yes, of course you can, and DM can comment, if he
> feels the desire to recommend his decision to incorporate an XSLT engine
> into the JSword logic flow.  But simply because one CAN doesn't mean one
> SHOULD.  We COULD incorporate a Perl text processing engine in our C++
> code, or an Awk processing engine...  that doesn't mean we SHOULD.  I'm
> sure some would say we SHOULD.  And obviously DM has thought he SHOULD
> incorporate XSLT processing for JSword, so I'm not intending to say it
> is a BAD decision, just that it is not a decision I would make; in the
> same way as our projects each chose C++ vs. Java to implement our
> objective.
>
> _______________________
> *XSLT better than C++*
>
> One might say, "well, XSLT is better suited to process XML than C++."
> That's a loaded and unquantified statement.
>
> Certainly the C++ language specification doesn't include facilities to
> easily process XML, but that doesn't mean a plethora of C++ libraries
> don't exist to assist with this task.
>
> The SWORD engine includes classes like XMLTag and SWBasicFilter which
> implement a SAX processing model.
>
> The current filters do not all use SWBasicFilter, nor XMLTag.  They've
> been written over 15 years, many of them before these classes existed.  Some
> are ugly and need to be rewritten for readability, certainly.  But not
> necessarily in a different programming language.
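
In the SAX model, the markup is scanned once, and the filter reacts to each tag or run of text as it arrives, rather than building a tree first. The function below is a deliberately simplified stand-alone illustration of that model. The name saxWalk is hypothetical, and this is not how SWORD's XMLTag or SWBasicFilter are implemented.

```cpp
#include <functional>
#include <string>

// Minimal SAX-style walker: one left-to-right pass, calling onTag for the
// contents of each <...> and onText for each run of text between tags.
// No validation; comments, CDATA and entities are not handled.
void saxWalk(const std::string &markup,
             const std::function<void(const std::string &tag)> &onTag,
             const std::function<void(const std::string &text)> &onText) {
    std::string::size_type pos = 0;
    while (pos < markup.size()) {
        std::string::size_type lt = markup.find('<', pos);
        if (lt == std::string::npos) {  // trailing text, no more tags
            onText(markup.substr(pos));
            break;
        }
        if (lt > pos) onText(markup.substr(pos, lt - pos));
        std::string::size_type gt = markup.find('>', lt);
        if (gt == std::string::npos) {  // unterminated tag: treat rest as text
            onText(markup.substr(lt));
            break;
        }
        onTag(markup.substr(lt + 1, gt - lt - 1));  // text between < and >
        pos = gt + 1;
    }
}
```
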
>
> ________________________
> *COMPLEXITY*
>
> The task of enumerating all types of OSIS <title> tags, and deciding
> what to do with each, and how to classify all <title> tags from all
> possible OSIS documents into our enumeration is still going to be a
> complex task using XSLT.  <title> is a complex example, but certainly
> not the most complex.
>
> It is a tall order to generalize all elements of all documents from all
> publishers into one conceptual model with one chosen output for a
> frontend-- whether that be for an audience on the Desktop, web-based, or
> a handheld.
>
> The complex processing required by the engine will require long, complex
> XSLT, which will likely incorporate callbacks to C++.  It will not be
> simpler, only mixed-language.
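
The classification step can be illustrated with a small sketch that maps a <title>'s type attribute onto a hypothetical set of display categories. The values chapter, psalm and parallel are OSIS title types. TitleKind and classifyTitle are illustrative names, and treating an untyped title as a section heading is an assumption.

```cpp
#include <string>

// Hypothetical display categories a frontend might style differently.
enum class TitleKind { Section, Chapter, PsalmSuperscription, ParallelPassage, Other };

// Classify an OSIS <title> by its "type" attribute alone. A real filter
// would also weigh attributes such as canonical="true", the title's
// position, and publisher-specific "x-" types; that complexity remains
// whichever language the filter is written in.
TitleKind classifyTitle(const std::string &type) {
    if (type.empty()) return TitleKind::Section;  // assumption
    if (type == "chapter") return TitleKind::Chapter;
    if (type == "psalm") return TitleKind::PsalmSuperscription;
    if (type == "parallel") return TitleKind::ParallelPassage;
    return TitleKind::Other;  // includes any "x-..." extension type
}
```
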
> _______________________
> *Semantic vs. Display*
>
> Some will say (and have), "well, let everything be display oriented and
> let the publisher decide".  Fine, then you lose 2 things: the ability to
> display differently per user preference, per display device; and you
> also give up the promise to actually do any interesting research on the
> text.  When you lose semantic markup, then you lose all interesting
> information about WHAT is being marked up.
>
> _______________________
> *More than a Rendering Engine*
>
> The SWORD C++ Engine is more than simply a text rendering engine-- it is
> a Biblical text research engine.
>
> If I'd like to know the morphology of word 3 in 2Thes 2.13 of the WHNU
> Greek text, the entire program to do such is:
>
> SWMgr library;
> SWModule *whnu = library.getModule("WHNU");
> whnu->setKey("2th.2.13");
> whnu->RenderText();
>
> cout << "The morphology of word three is: " <<
> whnu->getEntryAttributes()["Word"]["003"]["Morph"] << endl;
>
>
> That reads nicely (at least in my opinion).  I don't need to know about
> XML or XSLT, or care what markup the WHNU module uses; I don't even have to
> know how to make a SWORD filter.  The current filters do all the work of
> breaking out these attributes and making them available in a nice and
> interesting map.
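
The shape of that map can be sketched with standard containers. This sketch mimics the three-level lookup in the example (attributes["Word"]["003"]["Morph"]) and shows how a filter might fill it while rendering. The names and the zero-padded key format are assumptions taken from that example; this is not SWORD's implementation. Against the real library, the snippet above would also need the SWORD headers and a check that getModule() did not return a null pointer.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Three-level map: attribute type -> entry key -> attribute name -> value.
using AttrValue = std::map<std::string, std::string>;
using AttrList = std::map<std::string, AttrValue>;
using AttrTypeList = std::map<std::string, AttrList>;

struct Word {
    std::string text;
    std::string morph;
};

// What a filter might do while rendering: record each word's attributes,
// keyed by its 1-based position zero-padded to three digits ("001", ...).
AttrTypeList collectWordAttributes(const std::vector<Word> &words) {
    AttrTypeList attrs;
    for (std::size_t i = 0; i < words.size(); ++i) {
        char key[16];
        std::snprintf(key, sizeof key, "%03zu", i + 1);
        attrs["Word"][key]["Text"] = words[i].text;
        attrs["Word"][key]["Morph"] = words[i].morph;
    }
    return attrs;
}
```
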
>
> ______________________
>
>
> And finally, if bullets aren't flying already, I'll stir the heat up
> with...
>
> XSLT sucks.  A good C++ programmer can do anything in C++ better than
> any XSLT programmer.
>
>
> :)
>
> *duck*
> Have fun.
>
> Troy
>
> PS.  In summary, I understand the current filters are sometimes overly
> complex and need cleanup, standardization, etc.  It comes down to the
> fact that they mostly work while other things don't, and those get priority,
> so the filters don't get much attention.  But honestly, I think one might be
> oversimplifying the problem at hand without realizing it, if one simply
> thinks switching to XSLT will make things easier.
>
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
>