[sword-devel] Ideas for using ThML for general books

Harry Plantinga sword-devel@crosswire.org
Mon, 23 Apr 2001 07:35:34 -0400

> Does anyone have any suggestions for how this format ought to be done?
> Should we design a format that subdivides a text into arbitrarily small
> divisions?  The Early Church Fathers* series could be broken down into set,
> volume, section, & chapter for example.  Schaff's History of the Christian
> Church into volume, chapter, & section.  Etc.  So the divisions meet the
> needs of the text rather than making the text conform to a set division
> structure like we have with Bibles, where the text must break down to book,
> chapter, & verse.  From there, we just create index files similar to the
> bss, css, & vss files but for as many ranks of division as the book
> requires, and set the value of the smallest division equal to the start
> position & length of the section of text within the data file (identical
> to the way all the other module formats are done).
> Ideas/comments?

ThML has the <div1>, <div2>, etc. tags, which are intended for this purpose.
Every division or section is a <divn>, where n is the nesting level.

Here's a suggestion for how you might use ThML for "general books" at a
basic level:

- Get any of the metadata that you might need from the ThML.head section

- Use the stylesheets (internal and external) from the ThML.head section

- Split the ThML text at the <divn> ... </divn> tags.  Those are your text
subdivisions. Use the title= attribute for the name of the section.

- Build a table of contents out of the <divn> tags you have found, using
the level n for nesting

- Convert each of the sections you found into straight HTML and display
with an HTML widget. For the most part, the sections are already straight
HTML, except that they also have <note>, <scripRef>, <pb>, and maybe
a few other ThML tags.  So you will want to find the <note> tags and
handle them appropriately, find the <scripRef> tags and link the scripture
reference, find the <pb> tags and put in some kind of notation showing that
this is a new page of the print edition, etc.

- If you want to get fancier, build indexes out of the <scripRef>, <index>,
<name>, etc. tags.
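
As a rough illustration of the splitting and table-of-contents steps above,
here is a minimal Python sketch. It is not how any existing tool does it. The
names split_divisions and table_of_contents are made up for this example, and
it assumes the title attribute is double-quoted and contains no ">". It uses
regular expressions rather than an XML parser so that unvalidated texts do
not make it fail outright.

```python
import re

# Opening or closing <div1>, <div2>, ... tags (loose match, no validation).
DIV_TAG = re.compile(r'<(/?)div(\d+)\b([^>]*)>', re.IGNORECASE)
# Assumption: titles are written as title="..." with double quotes.
TITLE_ATTR = re.compile(r'\btitle\s*=\s*"([^"]*)"', re.IGNORECASE)


def split_divisions(thml):
    """Return one dict per <divN> element, in document order.

    Each dict holds the level N, the title, and the start/end offsets of
    the element in the source text. "end" stays None if no closing tag
    was found, so thml[start:end] then runs to the end of the text.
    """
    sections = []
    open_divs = []  # stack of indexes into sections for unclosed divisions
    for m in DIV_TAG.finditer(thml):
        closing, level, attrs = m.group(1), int(m.group(2)), m.group(3)
        if not closing:
            title = TITLE_ATTR.search(attrs)
            sections.append({
                "level": level,
                "title": title.group(1) if title else "(untitled)",
                "start": m.start(),
                "end": None,
            })
            open_divs.append(len(sections) - 1)
        else:
            # Close the nearest open division of the same level; anything
            # opened inside it and never closed is dropped from the stack.
            for i in range(len(open_divs) - 1, -1, -1):
                if sections[open_divs[i]]["level"] == level:
                    sections[open_divs[i]]["end"] = m.end()
                    del open_divs[i:]
                    break
    return sections


def table_of_contents(sections):
    """Indent each title by two spaces per level below <div1>."""
    return ["  " * (s["level"] - 1) + s["title"] for s in sections]
```

The text of a given section is then thml[s["start"]:s["end"]], which can be
handed to whatever does the HTML conversion. Storing offsets rather than
copies of the text is close to the start position and length index described
in the quoted message.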


Note that the majority of the ThML texts on the CCEL site are not validated.
I'm working in that direction, but if you require validation, you will
either have a much smaller set of texts to work with or have to spend extra
time validating the texts you want.

You might want to build your own, looser "parser" that acts as described
above and feeds the resulting HTML sections to an HTML widget.
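
To make the idea of a looser "parser" a bit more concrete, here is a small
Python sketch that rewrites the ThML-specific tags in one section into plain
HTML and leaves everything else alone. Everything about the output is an
assumption made for illustration: the function name thml_to_html, the
"bible:" link scheme, the class names, and the choice to show notes inline.
It also assumes scripRef carries a passage attribute and pb carries an n
attribute; where those are missing it falls back to the tag's text or a
generic label.

```python
import re
from urllib.parse import quote

FLAGS = re.IGNORECASE | re.DOTALL


def _attr(attrs, name):
    """Return the value of a double-quoted attribute, or None."""
    m = re.search(r'\b' + re.escape(name) + r'\s*=\s*"([^"]*)"',
                  attrs, re.IGNORECASE)
    return m.group(1) if m else None


def _scripref(m):
    attrs, text = m.group(1), m.group(2)
    # Fall back to the visible text (minus any tags) if passage= is absent.
    passage = _attr(attrs, "passage") or re.sub(r'<[^>]+>', '', text)
    # "bible:" is a hypothetical scheme; use whatever your widget handles.
    return f'<a class="scripRef" href="bible:{quote(passage)}">{text}</a>'


def _pagebreak(m):
    n = _attr(m.group(1), "n")
    label = f"p. {n}" if n else "page break"
    return f'<span class="pb">[{label}]</span>'


def _note(m):
    # Simplest possible handling: show the note inline, in small type.
    return f'<small class="note">[Note: {m.group(2)}]</small>'


def thml_to_html(section):
    """Replace <scripRef>, <pb>, and <note> with plain HTML equivalents."""
    html = re.sub(r'<scripRef\b([^>]*)>(.*?)</scripRef>', _scripref,
                  section, flags=FLAGS)
    html = re.sub(r'<pb\b([^>]*?)/?>', _pagebreak, html,
                  flags=re.IGNORECASE)
    html = re.sub(r'<note\b([^>]*)>(.*?)</note>', _note, html, flags=FLAGS)
    return html
```

Because this works by pattern substitution, a tag that is malformed is simply
left in the output instead of stopping the conversion, which is the tradeoff
you want for texts that have not been validated.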


-Harry Plantinga