[sword-devel] Bible Chapter Titles?

Greg Hellings greg.hellings at gmail.com
Wed Jul 9 14:22:44 MST 2008

On Tue, Jun 17, 2008 at 6:46 AM, DM Smith <dmsmith555 at yahoo.com> wrote:
> On Jun 17, 2008, at 1:26 AM, Greg Hellings wrote:
>> On Mon, Jun 16, 2008 at 9:42 PM, DM Smith <dmsmith555 at yahoo.com>
>> wrote:
>>> On Jun 16, 2008, at 9:23 PM, Greg Hellings wrote:
>>>> I'm looking through the mod2osis.cpp file, trying to bring its output
>>>> closer into the form of the module inputs (basing it off of the result
>>>> of running the tool as compared to the KJV input files).  So far I
>>>> seem to have the following problems - I can't seem to find where (or
>>>> if) the following information is maintained and retrieved from the
>>>> Sword API:
>>> I don't think mod2osis has been kept current with the changes to osis
>>> nor with osis2mod.
>>> mod2osis, if I understand, will also create osis output for plaintext,
>>> gbf and ThML modules. I don't think these filters are robust.
>> Right now, all of the problems appear to be on the mod2osis side,
>> since the module that I'm working from was an OSIS source.  However,
>> I've only been hammering away at the first few discrepancies.  So far
>> the most common discrepancies that I have encountered are inverted
>> order of the morph= and lemma= attributes when they occur on a <w ....>
>> tag as well as switching up the order of such attributes as type="x-p"
>> marker="¶" (sometimes with a subType="x-added" also) on the
>> <milestone...> element.
>> The order of attributes is something beyond the scope of the mod2osis
>> and needs to be updated/changed in the filters themselves.
> Order of attributes is unimportant in xml. Every xml processor is free
> to re-arrange attributes as they see fit. It is also permissible for
> an xml processor to remove non-required attributes that match the
> default or add those attributes with their default if the attribute
> was missing.
>>  Right now
>> I'm running a basic python script on the output of mod2osis to
>> manually reorder those, since I don't believe that the XML will really
>> be affected by that (and also because I have combed through the OSIS
>> filters and cannot figure out how to make that order change - anyone
>> know how to do that?  Currently the order is lemma-morph and it needs
>> to be morph-lemma as well as the x-p things need to be type-marker
>> instead of marker-type).
> What requires the order? That program needs to be re-written to not
> require it.

Currently, diff requires that ordering, since that's what I'm using to
find divergences between the input in the kjv.xml file and the output
of mod2osis.  Hence, I'm doing the reordering with a Python script at
the moment.  This is not a bug in either osis2mod, OSIS or mod2osis -
it's just a nuisance since diff doesn't understand anything other than
"This isn't identical to that."
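
One way to take attribute order out of the comparison is to
canonicalize both files before diffing.  A minimal sketch, assuming
Python 3.8 or later (for xml.etree.ElementTree.canonicalize, which
writes attributes in a fixed, sorted order).  The function name and the
sample fragments are made up for illustration, and this is not the
reordering script mentioned above:

```python
import difflib
from xml.etree.ElementTree import canonicalize


def canonical_diff(expected_xml, actual_xml):
    """Diff two XML strings after C14N, so attribute order is ignored."""
    # canonicalize() returns the canonical form as a string when no
    # output file is given; attributes come out in sorted order.
    expected = canonicalize(xml_data=expected_xml).splitlines()
    actual = canonicalize(xml_data=actual_xml).splitlines()
    return list(difflib.unified_diff(expected, actual, lineterm=""))


# Hypothetical fragments: same attributes, different order.
source = '<w morph="robinson:N-NSM" lemma="strong:G2424">Jesus</w>'
output = '<w lemma="strong:G2424" morph="robinson:N-NSM">Jesus</w>'

for line in canonical_diff(source, output):
    print(line)
```

With this approach an empty diff means the two inputs differ at most in
attribute order (and other details that canonicalization normalizes),
while real content differences still show up as ordinary diff lines.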

>> I consider that to be trivial changes which don't affect the actual
>> functioning of the tool, versus the fact that it was producing invalid
>> osisID attributes for chapters and books (a problem which was relatively
>> simple to work out).
>>> Since you are talking about being able to round trip a module created
>>> with osis2mod, I'll mention what it does.
>>>> 1) Where is the equivalent information from the OSIS block below
>>>> maintained?  Is it maintained?
>>> osis2mod takes an xml file which is presumed to be valid OSIS and
>>> based upon that assumption, looks for testament, book, chapter and
>>> verse content.
>>> It ignores everything in the header element.
>>>> There is brief mention of Strongs data
>>>> and such in the .conf file, but is that enough to go off of to
>>>> recreate this information in general?
>>> There is not quite enough info in the conf to recreate the header.
>>> Specifically, there are several variants of the work prefix for
>>> Strong's numbers and for morphology. Without digging into the module,
>>> it is not possible to know what the work ids are. It is possible for
>>> us to have a generic header that encodes all the possibilities.
>>> Also, the conf does not encode the scope of the work, which is a
>>> typical part of the header. To get it exact, one would have to dig
>>> into the module.
>> These are things which an XSLT could remedy.  The XSLT could produce a
>> .conf from the OSIS document that does include those things and has
>> blank lines on the other absolutely necessary .conf entries.  A module
>> maintainer/creator could run the XSLT to auto-create the .conf file
>> and then manually fill in the additional fields which are not normally
>> part of the OSIS file (or which were missing from the OSIS file).  If
>> we do that, then we can preserve this information for mod2osis to
>> recreate.
>>>> Perhaps this information should
>>>> be part of a standard .xsl file which we include in tools available
>>>> for module creators to run.  Have it output a basic .conf file with
>>>> the information from the OSIS document and preserve information like
>>>> this in it somewhere?
>>>> <   <work osisWork="strong">
>>>> <     <refSystem>Dict.Strongs</refSystem>
>>>> <   </work>
>>>> <   <work osisWork="robinson">
>>>> <     <refSystem>Dict.Robinsons</refSystem>
>>>> <   </work>
>>>> <   <work osisWork="strongMorph">
>>>> <     <refSystem>Dict.strongMorph</refSystem>
>>>> <   </work>
>>>> 2. Chapter titles?
>>>> How do you test for the presence of a chapter title?
>>> There are testament, book and chapter titles. These have special
>>> notations using 0 as the index.
>>> For example John 1:0 is the chapter title for chapter 1 and John 0:0
>>> is the book title.
>>> In osis2mod, the content of these is determined by the placement of
>>> the text. To simplify: If it stands after the opening of a book but
>>> before the opening of a chapter, then it is a book title. If it stands
>>> after the opening of a chapter, but before the beginning of a verse,
>>> it is a chapter title.
>> This is the least cumbersome way I can figure out to try and access
>> this - however, it seems to be having some issues (which I added to
>> mod2osis, starting right after the sprintf call on line 165 or so,
>> that produces the <div type="book" ...> tag):
>> [code]
>> char* name = new char(100);
>> strcpy(name, tmpKey.getOSISBookName());
>> name = strcat(name, "0:0");
>> inModule->setKey(new VerseKey(name));
>> SWBuf title = inModule->getRawEntry();
>> inModule->setKey(tmpKey);
>> if(strlen(title.c_str()) > 0)
>>     sprintf(buf, "\t<title type=\"main\">%s</title>\n", title.c_str());
>> [/code]
>> That is my attempt to grab the book title and print it out.  However,
>> what I'm getting out is the title tag surrounding the OSIS output of
>> chapter 1, verse 1 of the book, instead of the title.  Then, the
>> intrigue mounts as, just a few lines later, the program segfaults on
>> this line:
>> [code]
>> if ((vkey->Chapter() != lastChap) || newBook) {
>> [/code]
>> Does anyone else have a less cumbersome way of doing this or, more
>> importantly, know how to work that so that it does not segfault at the
>> next block of code?
> Ahh, this is not Java, so I cannot "readily" help :)

True.  Maybe someone can give me a hand with this?  The segmentation
fault has been overcome, and now the problem is the following... This
			char newref[16];
			sprintf(newref, "%s 0:0", vkey->getOSISBookName());
			*vkey = newref;
			SWBuf title = inModule->getRawEntry();
results in the "title" object having the contents of chapter 1,
verse 1.  The same happens if I give newref the format "%s %i:0" with
the book name and chapter number.  I'm not sure, but it seems to me
this shouldn't be happening.  Am I doing something wrong here?
Bibletime displays the titles happily, so I know the information is
hiding in there somewhere.
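
For reference, the 0-indexed addressing DM describes above (Book 0:0
for a book title, Book N:0 for a chapter title, chosen by where the
title text sits) can be sketched outside of Sword.  This is a minimal
Python illustration over a made-up OSIS fragment.  It is not osis2mod's
code, the function name is hypothetical, and it assumes container-style
<chapter> and <verse> elements without namespaces:

```python
import io
import xml.etree.ElementTree as ET

# Made-up fragment, loosely modeled on the start of Genesis.
SAMPLE = """<div type="book" osisID="Gen">
  <title type="main">THE FIRST BOOK OF MOSES CALLED GENESIS</title>
  <chapter osisID="Gen.1">
    <title type="chapter">CHAPTER 1.</title>
    <verse osisID="Gen.1.1">In the beginning God created the heaven and the earth.</verse>
  </chapter>
</div>"""


def title_keys(osis_xml):
    """Map each title that precedes any verse to a 0-indexed key."""
    book, chapter, seen_verse = None, 0, False
    keys = {}
    source = io.BytesIO(osis_xml.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "div" and elem.get("type") == "book":
                # New book: chapter 0 stands for "before any chapter".
                book, chapter, seen_verse = elem.get("osisID"), 0, False
            elif elem.tag == "chapter":
                chapter = int(elem.get("osisID").split(".")[1])
                seen_verse = False
            elif elem.tag == "verse":
                seen_verse = True
        elif elem.tag == "title" and book and not seen_verse:
            # Title closed before any verse opened: book or chapter title.
            keys["%s %d:0" % (book, chapter)] = elem.text
    return keys


print(title_keys(SAMPLE))
```

Titles that appear after the first verse of a chapter are skipped here;
those are the pre-verse titles discussed below, which are stored with
the verse content instead.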

>>> We can also have titles that are between verses. These are pre-pended
>>> to the verse content and marked as pre-verse.
>> It sounds like those are irrecoverable as titles, then, with that type
>> of setup, or did I misunderstand you?
>>>> In the following
>>>> block, the chapter title itself is easy enough to recreate but at the
>>>> expense of portability to someone else who wants to give
>>>> chapterTitle="The E Creation Tale" or some such thing, but I can't
>>>> find access to the information maintained in the <title...> tag.  Is
>>>> this information maintained, and if so, how is it accessed?
>>> The only thing that is maintained is the actual content of the verse,
>>> chapters, books, ..., but not of those elements themselves.
>> In the case of the KJV module that you've created, the content of the
>> chapterTitle= attribute on the chapters is identical to the content of
>> the <title...> element that immediately follows it, at least near the
>> beginning of Genesis.
> This is a bit of a tug-of-war between the OSIS spec and what we
> actually do in osis2mod. The OSIS spec gives 2 ways to encode a title.
> The KJV OSIS uses both, but osis2mod ignores the attribute.
>>  It appears that, if we aren't going to be
>> utilizing the chapterTitle= attribute, then we can afford to lose
>> track of it in the *2mod->mod2osis trip.
> True. I think your goal should be the following transformation:
> osis module -> osis xml -> osis module -> osis xml
> such that the osis modules are identical and the xml files are
> identical.

That's what I'm working toward.  Within the semantics of XML, they are
very close to identical now with just the few changes I've already
made.  If I could persuade the titles to come out, then I'd be very
nearly there.

>>>> It seems
>>>> like it would be useful to have, as many Bible editors insert
>>>> information like this into the flow of the text.
>>>> < <title type="main">THE FIRST BOOK OF MOSES CALLED GENESIS</title>
>>>> < <chapter osisID="Gen.1" chapterTitle="CHAPTER 1.">
>>>> 3. Milestoneable verse boundaries?
>>>> It doesn't seem that mod2osis has any support for milestone verse
>>>> tags, is this correct?
>>> I'm not sure I understand. The module contains no notion of verse
>>> tags, milestoned or otherwise. In reconstructing the module, it is
>>> important to know as one outputs the content of a verse whether it is
>>> well-formed, in and of itself, or not. And since OSIS requires that if
>>> the milestoned form is used in one location, it is used consistently
>>> everywhere, the only safe output from mod2osis for a verse tag is
>>> milestoned.
>>>> How would one programmatically detect this, as
>>>> well as other milestone elements?  Somewhere, though, it's producing
>>>> output like this:
>>>> <milestone type="x-extra-p"/>
>>>> Is that coming from the markup filter?  That's the only explanation I
>>>> can find for it.  However, I'm not sure that there's an example of
>>>> milestone-support in the KJV document which can be used for testing
>>>> that support.
>>> osis2mod in order to construct well-formed verses takes the <p>
>>> element (which is the only container element in OSIS that cannot be
>>> milestoned) and replaces it with <lb type="x-paragraph-begin"/> and
>>> <lb type="x-paragraph-end"/> (I am doing this from memory, so the
>>> attribute value might be a bit different.)
>> Currently the KJV has the <verse...> *some text* </verse> syntax,
>> which is maintained by mod2osis.  However, it does use <milestone.../>
>> for some things (currently the most prevalent appears to be
>> type="x-p", to the point that I haven't encountered any others, though
>> I haven't gotten very far into the text yet).  It seems safe, at least
>> for now, that, if we're going to only accept <verse>...</verse> syntax
>> and not allow the <p>...</p> syntax, it's not a problem.  However, I
>> thought that the purpose was to force people to use <p>...</p>, which
>> can often break the <verse>...</verse> syntax, due to editorial
>> choices.  Why have we gone the exact opposite way?
> This would make for a good separate thread, but let me see if I can
> summarize.
> Verse numbers were added late in Christian history (about 1000 years
> ago), even chapters and paragraphs are not original. In the original
> Greek manuscripts, even lower case letters, spaces, diacritics and
> punctuation were absent.
> Some argue that the proper OSIS structure is that of a document upon
> which verses are imposed. This would be Book, Chapter, Section, and
> Paragraph.
> Others would argue that this is secondary to the ability of software
> to process the document in a manner that users want to use the Bible.
> Most of our applications require a verse to be well-formed and
> meaningful in isolation (such as a search result list; parallel view).
> This is especially true of applications that render HTML.
> Most users still want to see verse numbers and think of the Bible as
> being structured by verses.
> The osis2mod process is tasked with taking all valid input and
> creating a module that works for all SWORD engine processes. It will
> transform it as needed into structures which may not be particularly
> good OSIS, but will be valid OSIS. As a process, osis2mod is not able
> to handle "all" valid input in this fashion. Our coding is reactive,
> it is sufficient for what we have encountered so far.
> I fall into the camp that believes that the verse is the key
> structure. Since I wrote the KJV OSIS file, and improved osis2mod, you
> will see that I did not structure the file by paragraph.
> Also, the KJV is not structured by paragraph. In the KJV, some books
> have paragraph markers. The rest don't. Also in the KJV, there are no
> quote marks, but there are quotes.

The decision of the use of Chapter/Verse or Section/Paragraph ought to
lie, in my opinion, completely with the translator and module creator.
 My own preference is for Section/Paragraph and that is how I do my
own translation work, modeling it largely after the Jerusalem Bible's

However, users will almost always expect to work with Chapter/Verse,
so the software needs that flexibility.  My concern with updating
mod2osis is to be able to recreate markup which is either identical
(if possible) or semantically equivalent (if identical is not
possible), at least as far as Sword's understanding of OSIS is
concerned.

It sounds as though identical output may not be possible if the
original import format had milestoned verses in favor of paragraph
containers, but perhaps that information could be maintained by the
modules once the move to the VerseTreeKey is complete.  Then the
structure of the tree could reflect the wishes of the module
creator/translator, and a simple switch indicating which form was used
on import could be added somewhere (.conf file?).

For the meantime, I'm basing the "correctness" of the mod2osis output
on its similarity to the KJV, and most other OSIS Bibles that we are
dealing with are probably similar (?), so verse-as-container is proper
for the job.

>>> Hope that helps.
>>>> I'll pass along other questions as I see them.
>>> Looking forward to them.
>>> You might want to look at JSword's
>>> org.crosswire.jsword.examples.BibleToOSIS that I used to re-create the
>>> KJV OSIS from the module when I was working on the current version of
>>> the KJV module. Currently, it just wraps the raw text, with minor
>>> modifications to produce the module. However, with a simple change
>>> this can be tied to very robust filters for GBF, PlainText, ThML and
>>> TEI.
>>> In Him,
>>>       DM
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
