[sword-devel] XSLT vs. C++

Tue Nov 30 12:08:44 MST 2010

Having finally returned from a hectic 2 weeks of conferences, and with
lots to do before leaving for Christmas, I'm not sure I'm up for a
heated, passionate debate about technologies right now, but by all
means, please commence the public discussion.

Let me start by saying that everyone (I believe) agrees that we would
like to have an HTML output from the engine which is more generic and
would allow CSS to be applied if a frontend would like to do this.
Currently the HTMLHREF output from the engine is used by the largest
number of frontends (to my knowledge), and everyone involved would
benefit if it became much more generic.  For example:

<title> -> <h1>
rather than
<title> -> <b><br />

<transChange type="added"> -> <span class="tcAdded">
rather than
<transChange type="added"> -> <i>

etc.

I believe this will solve a number of issues and possibly get the BT and
MacSword teams on board with using the same HTML output filters as the
other projects involved (or at least subclassing them and using the
majority of their functionality).
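
To make the mapping above concrete, here is a minimal sketch.  The
function name and the fallback rule are made up for illustration and are
not part of the engine; only the <title> and <transChange> cases come
from the examples above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not SWORD API): choose a generic, CSS-friendly HTML
// open tag for an OSIS element and its optional "type" attribute.
std::string osisToHtmlOpenTag(const std::string &element,
                              const std::string &type = "") {
    assert(!element.empty());  // an element name is required

    // The two mappings proposed in the text above.
    if (element == "title") return "<h1>";
    if (element == "transChange" && type == "added")
        return "<span class=\"tcAdded\">";

    // Assumed fallback for this sketch only: expose the element name as a
    // CSS class so a frontend can still style it.
    return "<span class=\"" + element + "\">";
}
```

A frontend that wants italics for added words would then say so in its
own stylesheet (a rule on .tcAdded) instead of having <i> baked into the
engine output.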

Now, as to the other issue of using XSLT internally in the engine to
process OSIS -> HTML

I will throw a few melons into the air for target practice, and let the
shooting commence.

_____________________________
*Multiple Languages*

XSLT is a programming language in the same sense that C++ is a
programming language.

The SWORD Project C++ engine is written in C++.  It is not a Python
engine; it is not a Perl engine; it is not a Java engine; it is C++.

One might say, "Well, you can use XSLT from C++.  Doesn't JSword do this
from Java?"  Well, yes, of course you can, and DM can comment, if he
feels the desire to recommend his decision to incorporate an XSLT engine
into the JSword logic flow.  But simply because one CAN doesn't mean one
SHOULD.  We COULD incorporate a Perl text processing engine in our C++
code, or an Awk processing engine...  that doesn't mean we SHOULD.  I'm
sure some would say we SHOULD.  And obviously DM has thought he SHOULD
incorporate XSLT processing for JSword, so I'm not intending to say it
is a BAD decision, just that it is not a decision I would make; in the
same way as our projects each chose C++ vs. Java to implement our objective.

_______________________
*XSLT better than C++*

One might say, "well, XSLT is better suited to process XML than C++."
That's a loaded and unquantified statement.

Certainly the C++ language specification doesn't include facilities to
easily process XML, but that doesn't mean a plethora of C++ libraries
don't exist for assisting in this task.

The SWORD engine includes classes like XMLTag and SWBasicFilter which
implement a SAX processing model.

The current filters do not all use SWBasicFilter, nor XMLTag.  They've
been written over 15 years, many of them before these classes existed.
Some are ugly and need to be rewritten for readability, certainly.  But
not necessarily in a different programming language.
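
For anyone who hasn't worked with a SAX-style model in plain C++, here
is a rough, self-contained sketch of the idea.  It does NOT use
SWBasicFilter or XMLTag, and every name in it (MarkupHandler,
scanMarkup, titleToH1) is invented for this example; the real classes do
considerably more, such as parsing attributes.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical callbacks, named for this sketch only (not SWORD's API).
struct MarkupHandler {
    std::function<void(const std::string &)> onTag;   // tag body without < and >
    std::function<void(const std::string &)> onText;  // run of text between tags
};

// Walk the input once, firing one callback per tag or text run (SAX style).
void scanMarkup(const std::string &input, const MarkupHandler &h) {
    assert(h.onTag && h.onText);  // both callbacks must be set
    std::string::size_type pos = 0;
    while (pos < input.size()) {
        if (input[pos] == '<') {
            std::string::size_type end = input.find('>', pos);
            if (end == std::string::npos) {  // unterminated tag: emit as text
                h.onText(input.substr(pos));
                return;
            }
            h.onTag(input.substr(pos + 1, end - pos - 1));
            pos = end + 1;
        } else {
            std::string::size_type next = input.find('<', pos);
            if (next == std::string::npos) next = input.size();
            h.onText(input.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Example filter on top of the scanner: <title> becomes <h1>, and anything
// else passes through unchanged.  Tags carrying attributes are not matched
// in this simplified sketch.
std::string titleToH1(const std::string &osis) {
    std::string out;
    MarkupHandler h;
    h.onTag = [&out](const std::string &tag) {
        if (tag == "title") out += "<h1>";
        else if (tag == "/title") out += "</h1>";
        else out += "<" + tag + ">";
    };
    h.onText = [&out](const std::string &text) { out += text; };
    scanMarkup(osis, h);
    return out;
}
```

The point is only that the per-tag decisions live in ordinary C++
callbacks, so a filter written this way can be read and debugged like
any other code in the engine.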

________________________
*COMPLEXITY*

The task of enumerating all types of OSIS <title> tags, and deciding
what to do with each, and how to classify all <title> tags from all
possible OSIS documents into our enumeration is still going to be a
complex task using XSLT.  <title> is a complex example, but certainly
not the most complex.

It is a tall task to generalize all elements of all documents from all
publishers into one conceptual model with one chosen output for a
frontend-- whether that be for an audience on the Desktop, web-based, or
a handheld.

The complex processing required by the engine will require long, complex
XSLT-- which likely will incorporate callbacks to C++.  It will not be
simpler-- only mixed language.

_______________________
*Semantic vs. Display*

Some will say (and have), "well, let everything be display oriented and
let the publisher decide".  Fine, then you lose 2 things: the ability to
display differently per user preference, per display device; and you
also give up the promise to actually do any interesting research on the
text.  When you lose semantic markup, then you lose all interesting
information about WHAT is being marked up.

_______________________
*More than a Rendering Engine*

The SWORD C++ Engine is more than simply a text rendering engine-- it is
a Biblical text research engine.

If I'd like to know the morphology of word 3 in 2Thes 2.13 of the WHNU
Greek text, the entire program to do such is:

SWMgr library;
SWModule *whnu = library.getModule("WHNU");
whnu->setKey("2th.2.13");
whnu->RenderText();

cout << "The morphology of word three is: " <<
whnu->getEntryAttributes()["Word"]["003"]["Morph"] << endl;

That reads nicely (at least in my opinion).  I don't need to know about
XML or XSLT, or care what markup the WHNU module uses; I don't even have
to know how to make a SWORD filter.  The current filters do all the work
of breaking out these attributes and making them available in a nice and
interesting map.
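
As a rough picture of what that map looks like from the filter's side,
here is a self-contained sketch.  The type aliases and recordMorph() are
stand-ins invented for this example using std::string and std::map; they
are not the engine's own types, and the zero-padded key format is
assumed from the "003" in the lookup above.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Stand-in types for this sketch: category -> entry -> attribute -> value.
using AttrValues = std::map<std::string, std::string>;
using AttrEntries = std::map<std::string, AttrValues>;
using AttrCategories = std::map<std::string, AttrEntries>;

// Hypothetical helper: what a filter might do when it meets a word carrying
// morphology.  Stores the value under ["Word"]["NNN"]["Morph"], where NNN is
// the word number zero-padded to three digits.
void recordMorph(AttrCategories &attrs, int wordNumber,
                 const std::string &morph) {
    assert(wordNumber >= 0 && wordNumber <= 999);  // key is three digits
    char key[16];
    std::snprintf(key, sizeof key, "%03d", wordNumber);
    attrs["Word"][key]["Morph"] = morph;
}
```

After a filter has called something like recordMorph(attrs, 3, morph)
while rendering, the caller reads the value back with the same three
subscripts as in the program above, without ever seeing the markup.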

______________________

And finally, if bullets aren't flying already, I'll stir the heat up with...

XSLT sucks.  A good C++ programmer can do anything in C++ better than
any XSLT programmer.

:)

*duck*
Have fun.

Troy

PS.  In summary, I understand the current filters are sometimes overly
complex and need cleanup, standardization, etc.  It comes down to the
fact that they mostly work, and other things, which don't, get priority,
so the filters don't get much attention.  But honestly, I think one
might be oversimplifying the problem at hand without realizing it, if
one simply thinks switching to XSLT will make things easier.