[sword-devel] XML DOM

Greg Hellings greg.hellings at gmail.com
Fri Mar 9 14:04:02 MST 2007


When I asked this same question in the past, also in connection with
the utilities, I finally got some insight into how the Sword library
stores its files.  Most XML parsers hide the actual number of bytes
that have been read, and the Sword library generates an index file for
each module that records how many bytes into the data file a given
entry is located.  For that reason, I was told, using a DOM or SAX
parser is not viable.
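
To illustrate why byte positions matter, here is a minimal sketch in
Python of a byte-offset index.  The record layout (a 4-byte offset plus
a 2-byte size) and the names ENTRY, build_module and fetch_entry are
made up for this example; this is not Sword's actual file format or API.

```python
import io
import struct

# Hypothetical index record: 4-byte little-endian offset + 2-byte size.
# Illustration only; NOT Sword's real on-disk layout.
ENTRY = struct.Struct("<IH")


def build_module(entries):
    """Concatenate entries into a data blob and record where each one starts."""
    data = io.BytesIO()
    index = io.BytesIO()
    for text in entries:
        raw = text.encode("utf-8")
        # The index stores the byte position of the entry in the data blob,
        # so the writer must know exactly how many bytes came before it.
        index.write(ENTRY.pack(data.tell(), len(raw)))
        data.write(raw)
    return data.getvalue(), index.getvalue()


def fetch_entry(data, index, n):
    """Look up entry n by reading its (offset, size) record from the index."""
    offset, size = ENTRY.unpack_from(index, n * ENTRY.size)
    return data[offset:offset + size].decode("utf-8")


if __name__ == "__main__":
    verses = [
        "<verse>In the beginning</verse>",
        "<verse>Καὶ ἡ γῆ</verse>",  # non-ASCII: byte length != character count
        "<verse>third entry</verse>",
    ]
    data, index = build_module(verses)
    print(fetch_entry(data, index, 1))
```

The lookup only works if the offsets are exact byte counts.  A parser
that hands back characters or nodes without saying how many bytes it
consumed gives the index writer nothing to record.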

On your question about an API reference... I think this one may be
based on a previous version of the source code, and it would be nice
to get an updated version for 1.5.9, but it is the most complete
reference that I've been able to locate so far:
http://www.crosswire.org/~mgruner/sword-apidoc/html/

Cheers and good luck!

--Greg

On 3/9/07, DJ Ortley <djortley at gmail.com> wrote:
> In regards to the lowest common denominator comment you made, that's one of
> the things I thought would probably come up, and I can understand it.  I
> didn't know that there was a lot of focus on speed, but it makes sense.
> I've been impressed with how fast Sword seems to work at times.
>
> I don't know much about how Sword stores its modules, as there is no
> documentation I can find, and I've yet to actually ask what's going on.
> Right now I'm still working through the API trying to understand what's
> happening, with the goal of finding a suitable way to implement access to the
> various Deuterocanons.  Looking through the archives, I come to the
> conclusion that implementing such changes, as long as it is done the
> right way, is mostly acceptable to the community.
>
> The thing that prompted me to ask about DOM support was when I was looking
> through the source in the utilities folder.  It seemed that a lot of work
> could be saved if some library were used.
>
> Maybe things could be broken into two parts.  The core API and the
> utilities, with the utilities having greater allowance for use of third
> party libraries that might not necessarily be suitable for a hand held...
> One isn't going to be using a handheld to make modules anyways (well,
> hopefully not at least.)
>
> Just a thought.  Maybe not a good one, but there it is.
>
> By the way, aside from poking around through the code, is there some sort of
> documentation or outline (aside from the API primer) of what's going on
> anywhere?  If not, could someone give me a quick and dirty sketch of some
> sort?
>
> Thanks.
>
> -DJ
>
>
>
> On 3/9/07, DM Smith <dmsmith555 at yahoo.com> wrote:
> > DJ Ortley wrote:
> > > Looking through the source code, it seems to me (which are key words
> > > that indicate this is only an opinion, one which may not be worth
> > > much) that using a library such as Xerces or some sort of XML DOM-like
> > > library would be of benefit.
> > >
> > > I was wondering if any thought had been given to that previously?
> >
> > This is the approach that JSword uses. We actually use JAXP which is an
> > interface layer over a plug-in implementation of XML. So in some cases
> > we use Crimson and in others we use Xerces. It all depends upon what is
> > bundled with the user's JDK. SAX is a better model for most processing
> > than DOM, as most processing does not need an object representation of
> >
> > That said, I think that there are significant advantages and also
> > disadvantages to using it.
> > To me the most significant advantages are that it is a full
> > implementation of an XML parser and we don't need to maintain it.
> >
> > Disadvantages:
> > It is a full implementation of the XML parser. Sword doesn't need a full
> > implementation of the parser. Our documents have a well defined
> > vocabulary (i.e. the DTD specs) and we only need a parser sufficient to
> > parse that vocabulary.
> >
> > Parsing serves two purposes: search/indexing, i.e. stripping out only
> > the text from the "verse"; and display, i.e. converting the module raw
> > source into some kind of presentation source. The former benefits from
> > being very fast. Sword's "stripping" routines are built for speed. It
> > would be a huge performance loss to use a true XML parser. Parsing for
> > conversion to a display representation can, for the most part, be
> > slower, because it will likely still be fast enough.
> >
> > The other thing is that the Sword library has taken a least common
> > denominator approach to its requirements. It is targeted to small
> > handhelds (phones, PDAs and the like) and to computers of all ages,
> > colors and creeds. Introducing a fairly large library would need to be
> > optional (like curl, icu4c and lucene) and it would still leave the need
> > for the current custom parsing.
> >
> > Earlier I submitted a patch to make the parser more accurate and it was
> > rejected as a performance hit and too big/risky of a change. And these
> > were the reasons that I was given.
> >
> > _______________________________________________
> > sword-devel mailing list: sword-devel at crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page
> >
>
>
>


