[sword-devel] Bible Chapter Titles?

Greg Hellings greg.hellings at gmail.com
Mon Jun 16 22:26:43 MST 2008


On Mon, Jun 16, 2008 at 9:42 PM, DM Smith <dmsmith555 at yahoo.com> wrote:
>
> On Jun 16, 2008, at 9:23 PM, Greg Hellings wrote:
>
>> I'm looking through the mod2osis.cpp file, trying to bring its output
>> closer into the form of the module inputs (basing it off of the result
>> of running the tool as compared to the KJV input files).  So far I
>> seem to have the following problems - I can't seem to find where (or
>> if) the following information is maintained and retrieved from the
>> Sword API:
>
> I don't think mod2osis has been kept current with the changes to osis
> nor with osis2mod.
>
> mod2osis, if I understand, will also create osis output for plaintext,
> gbf and ThML modules. I don't think these filters are robust.

Right now, all of the problems appear to be on the mod2osis side,
since the module that I'm working from was an OSIS source.  However,
I've only been hammering away at the first few discrepancies.  So far
the most common discrepancies that I have encountered are inverted
order of the morph= and lemma= attributes when they occur on a <w ...>
tag as well as switching up the order of such attributes as type="x-p"
marker="¶" (sometimes with a subType="x-added" also) on the
<milestone...> element.

The order of attributes is beyond the scope of mod2osis and needs to
be changed in the filters themselves.  Right now I'm running a basic
Python script on the output of mod2osis to reorder them, since I don't
believe the XML is really affected by attribute order (and also
because I have combed through the OSIS filters and cannot figure out
how to change that order - does anyone know how?  Currently the order
is lemma-morph and it needs to be morph-lemma; likewise the x-p
milestones need to be type-marker instead of marker-type).
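
For illustration, the reordering could be sketched in C++ roughly as follows. This is a minimal sketch and not the script mentioned above: the function names are hypothetical, attribute values are assumed not to contain a double quote, and only pairs that sit directly next to each other are swapped (an intervening subType= attribute is not handled).

```cpp
#include <regex>
#include <string>

// Rewrite lemma="..." morph="..." as morph="..." lemma="..." wherever the
// two attributes are adjacent. Assumes values contain no double quote.
std::string morphBeforeLemma(const std::string& xml) {
    static const std::regex pair(R"rx((lemma="[^"]*")\s+(morph="[^"]*"))rx");
    return std::regex_replace(xml, pair, "$2 $1");
}

// Same idea for adjacent marker="..." type="..." pairs on <milestone>.
std::string typeBeforeMarker(const std::string& xml) {
    static const std::regex pair(R"rx((marker="[^"]*")\s+(type="[^"]*"))rx");
    return std::regex_replace(xml, pair, "$2 $1");
}
```

Text that is already in the wanted order does not match either pattern and passes through unchanged.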

I consider those to be trivial changes which don't affect the actual
functioning of the tool, versus the fact that it was producing invalid
osisID attributes for chapters and books (a problem which was
relatively simple to work out).

>
> Since you are talking about being able to round trip a module created
> with osis2mod, I'll mention what it does.
>
>>
>>
>> 1) Where is the equivalent information from the OSIS block below
>> maintained?  Is it maintained?
>
> osis2mod takes an xml file which is presumed to be valid OSIS and
> based upon that assumption, looks for testament, book, chapter and
> verse content.
>
> It ignores everything in the header element.
>
>
>> There is brief mention of Strongs data
>> and such in the .conf file, but is that enough to go off of to
>> recreate this information in general?
>
> There is not quite enough info in the conf to recreate the header.
> Specifically, there are several variants of the work prefix for
> Strong's numbers and for morphology. Without digging into the module,
> it is not possible to know what the work ids are. It is possible for
> us to have a generic header that encodes all the possibilities.
>
> Also, the conf does not encode the scope of the work, which is a
> typical part of the header. To get it exact, one would have to dig
> into the module.

These are things which an XSLT could remedy.  The XSLT could produce a
.conf from the OSIS document that includes those things and leaves the
other required .conf entries blank.  A module maintainer/creator could
run the XSLT to auto-create the .conf file and then manually fill in
the additional fields which are not normally part of the OSIS file (or
which were missing from the OSIS file).  If we do that, then we can
preserve this information for mod2osis to recreate.

>
>
>> Perhaps this information should
>> be part of a standard .xsl file which we include in tools available
>> for module creators to run.  Have it output a basic .conf file with
>> the information from the OSIS document and preserve information like
>> this in it somewhere?
>>
>> <   <work osisWork="strong">
>> <     <refSystem>Dict.Strongs</refSystem>
>> <   </work>
>> <   <work osisWork="robinson">
>> <     <refSystem>Dict.Robinsons</refSystem>
>> <   </work>
>> <   <work osisWork="strongMorph">
>> <     <refSystem>Dict.strongMorph</refSystem>
>> <   </work>
>>
>>
>> 2. Chapter titles?
>> How do you test for the presence of a chapter title?
>
> There are testament, book and chapter titles. These have special
> notations using 0 as the index.
>
> For example John 1:0 is the chapter title for chapter 1 and John 0:0
> is the book title.
>
> In osis2mod, the content of these are determined by the placement of
> the text. To simplify: If it stands after the opening of a book but
> before the opening of a chapter, then it is a book title. If it stands
> after the opening of a chapter, but before the beginning of a verse,
> it is a chapter title.
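
The placement rule described above can be sketched as a small classifier. This is only an illustration of the rule as stated, not osis2mod's actual code; the Context struct and titleKeyFor function are hypothetical names, and the key text follows the "John 0:0" / "John 1:0" examples given above.

```cpp
#include <string>

// What has been opened so far when a title is encountered.
struct Context {
    std::string book;  // e.g. "John"; empty if no book is open
    int chapter = 0;   // 0 until a chapter has been opened
    int verse = 0;     // 0 until a verse has been opened in this chapter
};

// Key text for a title at this position, using 0 as the index.
// Returns an empty string when the title is neither a book title nor a
// chapter title (for example, one that appears after a verse has begun).
std::string titleKeyFor(const Context& c) {
    if (c.book.empty()) return std::string();
    if (c.chapter == 0) return c.book + " 0:0";  // book title
    if (c.verse == 0) {                          // chapter title
        return c.book + " " + std::to_string(c.chapter) + ":0";
    }
    return std::string();
}
```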

This is the least cumbersome way I can figure out to try and access
this.  I added it to mod2osis right after the sprintf call (on line
165 or so) that produces the <div type="book" ...> tag, but it seems
to be having some issues:
[code]
char* name = new char(100);
strcpy(name, tmpKey.getOSISBookName());
name = strcat(name, "0:0");
inModule->setKey(new VerseKey(name));
SWBuf title = inModule->getRawEntry();
inModule->setKey(tmpKey);
if(strlen(title.c_str()) > 0)
    sprintf(buf, "\t<title type=\"main\">%s</title>\n", title.c_str());
[/code]
That is my attempt to grab the book title and print it out.  However,
what I'm getting out is the title tag surrounding the OSIS output of
chapter 1, verse 1 of the book, instead of the title.  Then, the
intrigue mounts as, just a few lines later, the program segfaults on
this line:
[code]
if ((vkey->Chapter() != lastChap) || newBook) {
[/code]

Does anyone else have a less cumbersome way of doing this or, more
importantly, know how to work that so that it does not segfault at the
next block of code?
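
One possible explanation, offered as a sketch rather than a confirmed diagnosis: in C++, new char(100) allocates a single char initialized to the value 100, not a 100-byte buffer (that would be new char[100]), so the strcpy and strcat calls write past the allocation.  That is undefined behavior and could corrupt memory badly enough to crash a few lines later.  The concatenation also produces text like "Gen0:0" with no separator.  The helper names below (bookTitleKeyText, titleElement) are hypothetical, and whether VerseKey resolves "Gen 0:0" to the book title instead of falling back to 1:1 is an assumption that would need checking against the Sword API.

```cpp
#include <string>

// Key text for a book title, following the "John 0:0" form quoted above.
// std::string sizes itself, so there is no fixed buffer to overrun.
std::string bookTitleKeyText(const std::string& osisBook) {
    return osisBook + " 0:0";
}

// The <title> line to emit, or an empty string when no title was found.
std::string titleElement(const std::string& title) {
    if (title.empty()) return std::string();
    return "\t<title type=\"main\">" + title + "</title>\n";
}
```

The result of bookTitleKeyText(...).c_str() could then be passed wherever the char* name is used in the snippet above.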

>
> We can also have titles that are between verses. These are pre-pended
> to the verse content and marked as pre-verse.

It sounds like those are irrecoverable as titles, then, with that type
of setup, or did I misunderstand you?

>
>
>>  In the following
>> block, the chapter title itself is easy enough to recreate but at the
>> expense of portability to someone else who wants to give
>> chapterTitle="The E Creation Tale" or some such thing, but I can't
>> find access to the information maintained in the <title...> tag.  Is
>> this information maintained, and if so, how is it accessed?
>
> The only thing that is maintained is the actual content of the verse,
> chapters, books, ..., but not of those elements themselves.

In the case of the KJV module that you've created, the content of the
chapterTitle= attribute on the chapters is identical to the content of
the <title...> element that immediately follows it, at least near the
beginning of Genesis.  It appears that, if we aren't going to be
utilizing the chapterTitle= attribute, then we can afford to lose
track of it in the *2mod->mod2osis trip.

>
>>  It seems
>> like it would be useful to have, as many Bible editors insert
>> information like this into the flow of the text.
>>
>> < <title type="main">THE FIRST BOOK OF MOSES CALLED GENESIS</title>
>> < <chapter osisID="Gen.1" chapterTitle="CHAPTER 1.">
>>
>>
>> 3. Milestoneable verse boundaries?
>> It doesn't seem that mod2osis has any support for milestone verse
>> tags, is this correct?
>
> I'm not sure I understand. The module contains no notion of verse
> tags, milestoned or otherwise. In reconstructing the module, it is
> important to know as one outputs the content of a verse whether it is
> well-formed, in and of itself, or not. And since OSIS requires that if
> the milestoned form is used in one location, it is used consistently
> everywhere, the only safe output from mod2osis for a verse tag is
> milestoned.
>
>>  How would one programmatically detect this, as
>> well as other milestone elements?  Somewhere, though, it's producing
>> output like this:
>> <milestone type="x-extra-p"/>
>> Is that coming from the markup filter?  That's the only explanation I
>> can find for it.  However, I'm not sure that there's an example of
>> milestone-support in the KJV document which can be used for testing
>> that support.
>
> osis2mod in order to construct well-formed verses takes the <p>
> element (which is the only container element in OSIS that cannot be
> milestoned) and replaces it with <lb type="x-paragraph-begin"/> and
> <lb type="x-paragraph-end"/> (I am doing this from memory, so the
> attribute value might be a bit different.)


Currently the KJV uses the <verse...> *some text* </verse> syntax,
which is maintained by mod2osis.  However, it does use <milestone.../>
for some things (so far the only type I have encountered is
type="x-p", though I haven't gotten very far into the text yet).  As
long as we only accept the <verse>...</verse> syntax and do not allow
the <p>...</p> syntax, that seems safe, at least for now.  However, I
thought that the purpose was to force people to use <p>...</p>, which
can often break the <verse>...</verse> syntax due to editorial
choices.  Why have we gone the exact opposite way?

--Greg

>
> Hope that helps.
>>
>>
>> I'll pass along other questions as I see them.
>>
>
> Looking forward to them.
>
> You might want to look at JSword's
> org.crosswire.jsword.examples.BibleToOSIS that I used to re-create the
> KJV OSIS from the module when I was working on the current version of
> the KJV module. Currently, it just wraps the raw text, with minor
> modifications to produce the module. However, with a simple change
> this can be tied to very robust filters for GBF, PlainText, ThML and
> TEI.
>
> In Him,
>        DM
>
>
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
>


