[sword-devel] XML DOM
dmsmith555 at yahoo.com
Fri Mar 9 12:28:30 MST 2007
DJ Ortley wrote:
> Looking through the source code, it seems to me (which are key words
> that indicate this is only an opinion, one which may not be worth
> much) that using a library such as Xerces or some sort of XML DOM like
> library would be of benefit.
> I was wondering if any thought had been given to that previously?
This is the approach that JSword uses. We actually use JAXP which is an
interface layer over a plug-in implementation of XML. So in some cases
we use Crimson and in others we use Xerces. It all depends upon what is
bundled with the user's JDK. SAX is a better model for most processing
than DOM, as most processing does not need an object representation of
the document.
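
As a rough illustration of the JAXP/SAX model described above, here is a minimal sketch that pulls only the character data out of a fragment without building a tree. It is not JSword's code: the class and method names (SaxTextExtractor, extractText) and the sample markup are made up for this example, and only the standard javax.xml.parsers and org.xml.sax APIs are used.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of JSword.
class SaxTextExtractor {

    /** Collects character data only; no object tree is ever built. */
    static final class TextHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses a well-formed XML fragment and returns its text content. */
    static String extractText(String xml) throws Exception {
        // JAXP hands back whichever parser implementation is available
        // at runtime (e.g. the one bundled with the JDK).
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        TextHandler handler = new TextHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative markup only; real module source will differ.
        String verse = "<verse osisID=\"Gen.1.1\">In the <w>beginning</w> God created</verse>";
        System.out.println(extractText(verse));
    }
}
```

The handler only sees a stream of events, so memory use stays flat regardless of document size, which is the main reason to prefer it over DOM when the tree itself is not needed.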
That said, I think that there are significant advantages and also
disadvantages to using it.
To me the most significant advantages are that it is a full
implementation of an XML parser and we don't need to maintain it.
The most significant disadvantage is the same thing: it is a full
implementation of an XML parser, and Sword doesn't need one. Our
documents have a well-defined vocabulary (i.e. the DTD specs) and we
only need a parser sufficient to parse that vocabulary.
Parsing serves two purposes: search/indexing, i.e. stripping out only
the text from the "verse", and display, i.e. converting the raw module
source into some kind of presentation source. The former benefits from
being very fast. Sword's "stripping" routines are built for speed, and
it would be a huge performance loss to use a true XML parser there.
Parsing for display, for the most part, can afford to be slower, because
it will likely still be fast enough.
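
To make the "stripping" idea concrete, here is a minimal sketch of a single-pass tag stripper. It is only an illustration of the technique, written in Java for consistency with the JSword discussion; it is not Sword's actual routine (Sword is C++), and the name TagStripper is hypothetical. It assumes the input never has a '>' inside an attribute value and does not handle entities, comments or CDATA, which is the kind of shortcut a known vocabulary allows.

```java
// Hypothetical example class; not Sword's or JSword's code.
class TagStripper {

    /** Single pass: copy characters outside <...>, skip everything inside. */
    static String strip(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        boolean inTag = false;
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '<') {
                inTag = true;          // start of a tag: stop copying
            } else if (c == '>') {
                inTag = false;         // end of a tag: resume copying
            } else if (!inTag) {
                out.append(c);         // plain text between tags
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative markup only; real module source will differ.
        System.out.println(strip("<verse osisID=\"Gen.1.1\">In the <w>beginning</w></verse>"));
    }
}
```

There is no well-formedness checking, no namespace handling and no event dispatch, which is where the speed comes from and also why such a routine is less accurate than a true parser.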
The other thing is that the Sword library has taken a least common
denominator approach to its requirements. It is targeted to small
handhelds (phones, PDAs and the like) and to computers of all ages,
colors and creeds. Introducing a fairly large library would need to be
optional (like curl, icu4c and lucene) and it would still leave the need
for the current custom parsing.
Earlier I submitted a patch to make the parser more accurate, and it was
rejected as a performance hit and as too big and risky a change. The
reasons above are the ones I was given.