[sword-devel] OSIS questions

DM Smith dmsmith at crosswire.org
Sat Jan 26 09:05:21 MST 2013

Others should chime in too. This is just a partial answer.

Basic overview:
OSIS should be written to the OSIS specification. We recommend a document-centric representation in which Book, Chapter, Section, Paragraph, Line Group and Line elements are dominant and verse elements are milestoned.
osis2mod transforms this into a Book, Chapter, Verse representation.
Front-ends then use a SWORD renderer to transform module content into presentational output. This is driven by values in a module's conf.

So questions regarding OSIS fall into:
How should it be done in OSIS?
How does osis2mod handle it?
How do SWORD renderers handle such content?
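
To make the "milestoned verse" idea in the overview concrete, here is a rough Python sketch. It is not how osis2mod is implemented; it only shows that text marked with verse sID/eID milestones can be regrouped by verse even when a verse crosses a paragraph boundary. The sample markup is made up and omits the OSIS namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Made-up sample: verse Gen.1.2 starts in one paragraph and ends in the next.
OSIS_SNIPPET = """
<div type="book" osisID="Gen">
  <chapter osisID="Gen.1">
    <p><verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning.<verse eID="Gen.1.1"/>
    <verse sID="Gen.1.2" osisID="Gen.1.2"/>The earth was</p>
    <p>without form.<verse eID="Gen.1.2"/></p>
  </chapter>
</div>
"""


def verses_from_milestones(xml_text):
    """Collect the text found between each verse sID/eID milestone pair."""
    verses = {}
    current = None  # sID of the verse that is currently open, if any

    def walk(elem):
        nonlocal current
        if elem.tag == "verse":
            if "sID" in elem.attrib:
                current = elem.attrib["sID"]
                verses[current] = []
            elif "eID" in elem.attrib:
                current = None
        if elem.text and current is not None:
            verses[current].append(elem.text)
        for child in elem:
            walk(child)
            # Text after a child's closing tag belongs to the open verse.
            if child.tail and current is not None:
                verses[current].append(child.tail)

    walk(ET.fromstring(xml_text))
    # Join the pieces and collapse runs of whitespace.
    return {key: " ".join("".join(parts).split()) for key, parts in verses.items()}


if __name__ == "__main__":
    for osis_id, text in verses_from_milestones(OSIS_SNIPPET).items():
        print(osis_id, "=>", text)
```

The paragraph structure is thrown away here, which a real converter would not do; the point is only that the milestones carry enough information to recover the verses.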

On Jan 26, 2013, at 10:13 AM, <araj at critos.co.uk> wrote:

> I'm trying to convert a number of USFM documents to OSIS using my own
> software, and then to Sword using osis2mod.  However, I'm relatively new to
> OSIS, and am struggling somewhat.  Don't know if anyone can comment on any
> of the following issues?

Peter and Chris maintain a converter for USFM to OSIS, written in Perl. You might find that helpful. Offhand I don't know where that is. You can get more information by searching this list or from them.

Also, Kahunapule Michael Johnson on this list has converters that he uses. He has many, many texts in USFM.

> *	<osisText osisIDWork={NAME} ...>: "Normalized name of the Bible
> version (Usually 3 letters for language, 3 for translation)".  Does this
> have any significance to anything (ie does it matter that I get it "right")?
> If so, I assume the first three characters are the ISO language code?  What
> should you do where you also have a variant code?  Does the "translation"
> portion of the name need to follow any particular convention?

The documentation that we've provided relates to an OSIS document used as a source for a SWORD module. Most of our answers will also be SWORD centric.

To SWORD, the value does not matter. Osis2mod strips out everything until the first meaningful <div> element. At some time in the future, osis2mod will examine the header info to provide a best guess for key conf elements.

> *	According to http://crosswire.org/wiki/OSIS_Bibles, the minimal
> document header consists of just <work osisWork={Name}/>.  This does not
> seem at first sight to square with Appendix L of the OSIS user manual
> (dealing with conformance requirements), which appears to require a scope
> definition, to which the document must conform.  Is the minimal header shown
> above in fact adequate?  And what do you lose by not giving a fuller header
> (for example, by not giving "scope")?

In a SWORD context, you lose nothing. At least until we improve osis2mod to suggest a conf. But it is a great place to document information that will be helpful to you or others later.

> *	An example text I have picked up includes the refSystem tag in a
> number of places.  I'd prefer to avoid using this if I can, since it is not
> always immediately apparent what versification schemes have been used in the
> texts I have available to me.  Is it a problem if the tag is not supplied?
> (If it must be supplied, then I presume it has to come from some predefined
> list of valid schemes?  Where do you get the details of these schemes?)

Reference systems are rather bothersome. Your goal is to determine the best fit of those that SWORD provides. IIRC, Greg has put together a script that'll help figure out which reference system (which osis2mod refers to as a "versification" system) is most encompassing.

Osis2mod is good in that it retains all verse material. But when you pick the wrong ref system, osis2mod will warn you that the verse is not in the versification and that it is being appended to the prior verse.

The details are rather cryptic (held in arrays in SWORD header files). Basically, most of us just try each one to see which produces the fewest warnings, and from there try to understand those warnings.
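
The "try each one and count the warnings" routine can be sketched in Python along these lines. It assumes you have already saved the output of one osis2mod run per candidate versification. The marker string and the sample log lines below are invented placeholders, not real osis2mod output, so substitute whatever wording your build actually prints.

```python
# Placeholder marker: replace with the text your osis2mod build prints
# for a verse that is not in the chosen versification.
MARKER = "not in versification"

# Invented log excerpts for illustration only; not real osis2mod output.
SAMPLE_LOGS = {
    "KJV": "start\nBook.1.98 not in versification\nBook.1.99 not in versification\n",
    "NRSV": "start\n",
    "Synodal": "start\nBook.1.99 not in versification\n",
}


def count_warnings(log_text, marker):
    """Count the log lines that contain the warning marker."""
    return sum(1 for line in log_text.splitlines() if marker in line)


def rank_versifications(logs_by_name, marker):
    """Return (name, warning_count) pairs, fewest warnings first."""
    counts = {name: count_warnings(text, marker) for name, text in logs_by_name.items()}
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    for name, count in rank_versifications(SAMPLE_LOGS, MARKER):
        print(f"{name}: {count} warning(s)")
```

The lowest count only tells you where to start looking; the remaining warnings still need to be read and understood.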

Chris has defined these. So he might be able to clarify.

> *	http://crosswire.org/wiki/OSIS_Bibles includes <div
> type="bookGroup">.  I presume this enables you to group together, say OT or
> NT books?  What are the implications of having it (or of not having it)?

It is not needed for a SWORD module. All that osis2mod requires are the <div type="book"> elements.

Having it has no impact.

> *	OSIS appears to support a tableOfContents marker, which I believe
> corresponds to USFM toc/toc1/toc2/toc3.  How is it used?  Suppose I have the
> text for Matthew, and somewhere else I want a table of contents.  Does the
> marker go at the start of Matthew itself, to mark the place to which the
> table of contents should point?  Or does it go into the table of contents,
> to indicate that you want it to include a reference to Matthew?  And either
> way what does the full tableOfContents tag look like?  (I tried the former
> of the two options - putting <div type='tableofContents'>Matthew</div> into
> Matthew itself, but the only effect seemed to be that the word Matthew was
> output as part of the text.)

I don't know the answer to this specifically (how to encode it properly in OSIS), but can give insight into how it might be handled by SWORD.

The SWORD rendering of a module does not handle a table of contents, other than to output it as inline, unstructured text. (I think I have this right.)

Basically, notes and references are out-of-line content to SWORD. All else is inline. IIRC, if a SWORD renderer sees a tag it doesn't understand, it merely processes the contents as if the tag were not there.

What some of us do is have a fully specified OSIS file and then prune (using xslt) that which SWORD doesn't handle well, to produce input to osis2mod.
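
We do the pruning with XSLT, but the same idea can be sketched in Python with the standard library, which may be easier to try out. This is only an illustration: the rule for what to drop (here, the tableofContents div from the question) is an assumption for the example, and the sample markup is made up and omits the OSIS namespace.

```python
import xml.etree.ElementTree as ET

# Made-up sample containing the div from the question above.
SOURCE = (
    '<div type="book" osisID="Matt">'
    '<div type="tableofContents">Matthew</div>'
    '<title type="main">Matthew</title>'
    "<p>Text</p>"
    "</div>"
)


def prune(root, should_drop):
    """Remove every element for which should_drop(elem) is true.

    Text that follows a removed element (its tail) is kept by attaching
    it to the previous sibling or, failing that, to the parent.
    """
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if should_drop(child):
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return root


def is_toc_div(elem):
    """Example rule (an assumption): drop the table-of-contents div."""
    return elem.tag == "div" and elem.get("type") == "tableofContents"


if __name__ == "__main__":
    pruned = prune(ET.fromstring(SOURCE), is_toc_div)
    print(ET.tostring(pruned, encoding="unicode"))
```

The fully specified OSIS file stays untouched as the master copy; only the pruned output is fed to osis2mod.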

> *	In at least one of the vernaculars I'm dealing with, the translator
> has included in the USFM at the start of each book both a "h" and an "mt"
> tag (in that order, in case it's of any interest).  According to the OSIS
> user manual, both of these should give rise to title tags (of type "short"
> and "main" respectively, although I gather Sword ignores this).
> Unsurprisingly, I end up with two titles appearing at the start of the book.
> Clearly with a certain amount of effort I can address this by filtering the
> data while I'm generating the OSIS, but does OSIS itself (or Sword) have any
> convention as to what to do in these circumstances?

Osis2mod uses the type="main" on a title to indicate that a title between the start of a chapter and the first verse is a chapter title (and stuffed in a verse 0 slot) and not a verse title (stuffed in a verse 1 slot).

SWORD renderers don't care.
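
The osis2mod title rule just described boils down to the following Python sketch. The function name and parameters are hypothetical, chosen only to restate the rule, and it says nothing about titles found anywhere else.

```python
def title_slot(title_type, before_first_verse):
    """Return the verse slot for a title that follows a chapter start.

    before_first_verse: True when the title sits between the start of a
    chapter and the first verse, which is the only case described here.
    """
    if not before_first_verse:
        return None  # outside the rule described above
    # type="main" marks a chapter title (verse 0); anything else is
    # treated as a title for the first verse (verse 1).
    return 0 if title_type == "main" else 1


if __name__ == "__main__":
    print(title_slot("main", True))
    print(title_slot(None, True))
```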

> *	Some USFM tags appear to need to be converted into right-justified
> paragraphs.  Does OSIS support right justification?

I don't know if OSIS supports it (would need to look), but SWORD renderers do not. Same with JSword renderers.

> *	Occasionally we work with right-to-left languages in which the verse
> number needs to come at the end of the verse rather than the start, and in
> some cases also the verse number needs to be decorated in some manner.  Does
> OSIS cater for this at all?

To readers of left-to-right languages it merely appears to be at the end of the verse. The verse number is in fact at the start of the verse, which in a right-to-left text is on the right. This is handled by front-ends.

In the SWORD module's conf, the Direction= entry needs to be set for front-ends to pick up the orientation. JSword doesn't use that value; it uses the language code instead.
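
For illustration, here is a Python sketch that reads the Direction entry out of a conf. The module name, DataPath and Lang below are invented, and falling back to LtoR when the entry is missing is my assumption for this example. Python's configparser is used only as a convenient reader for this simple case; it is not what SWORD uses.

```python
import configparser

# Invented conf for a hypothetical right-to-left module.
SAMPLE_CONF = """\
[ExampleRtL]
DataPath=./modules/texts/ztext/examplertl/
Lang=ar
Direction=RtoL
"""


def text_direction(conf_text):
    """Return the Direction value from the first section of a conf."""
    # strict=False tolerates repeated keys; interpolation=None leaves
    # any '%' characters in values alone.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(conf_text)
    section = parser.sections()[0]
    # Assumption: treat a missing Direction entry as left-to-right.
    return parser[section].get("Direction", fallback="LtoR")


if __name__ == "__main__":
    print(text_direction(SAMPLE_CONF))
```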

In His Service,

> Thanks in advance,
> "Jamie" Jamieson
