[sword-devel] TEI markup support

DM Smith dmsmith555 at yahoo.com
Mon May 12 20:35:42 MST 2008

On May 12, 2008, at 9:29 PM, Troy A. Griffitts wrote:

> My one concern about saying that we support TEI for dictionary encoding
> is the confusion it might bring to our support of OSIS.
> From what I remember, the current OSIS plan is to include some set of
> TEI markup to support dictionary markup.  I wonder if things like <ref>
> would be included, since OSIS already includes <reference osisRef=...>.

My 2 cents:

From what I can see, there are a few differences between TEI and OSIS:
1) TEI has <ref target="xxxx"> while OSIS has <reference osisRef="xxxx">.
Chris has suggested that we use OSIS reference syntax for xxxx, since TEI
does not define the encoding of the target. With this, it would be
trivial to transform the one to the other.

Related to this is that <reference> in OSIS is not free to be placed
anywhere; it is only allowed in <note>. TEI uses the <xr> element for
similar containment. Such a thing would be appropriate to add to OSIS.
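
As a rough illustration of the <ref> to <reference> transformation described
in point 1, here is a minimal Python sketch. It assumes the target values
already use OSIS reference syntax (as Chris suggested) and that the elements
carry no XML namespace; real TEI files normally declare one, so the tag
lookup would need adjusting. The function name tei_ref_to_osis and the
sample entry are made up for illustration.

```python
import xml.etree.ElementTree as ET

def tei_ref_to_osis(root):
    """Rename TEI <ref target="..."> to OSIS <reference osisRef="..."> in place.

    Assumption: elements are un-namespaced and target already holds an
    OSIS-style reference, so the value is copied over unchanged.
    """
    # Collect the matches first so renaming does not disturb the iteration.
    for el in list(root.iter("ref")):
        el.tag = "reference"
        target = el.attrib.pop("target", None)
        if target is not None:
            el.set("osisRef", target)
    return root

# Hypothetical sample entry, not taken from any real module.
sample = '<entryFree n="example">See <ref target="John.3.16">John 3:16</ref>.</entryFree>'
converted = ET.tostring(tei_ref_to_osis(ET.fromstring(sample)), encoding="unicode")
print(converted)
```

The same rename could be written as a short XSLT template; Python is used
here only to keep the sketch self-contained.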

2) Both TEI and OSIS have the <hi> element. TEI uses the attribute  
rend and OSIS uses type to indicate the nature of highlighting. Again,  
a simple transformation would be sufficient.
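
The <hi> case in point 2 could be sketched the same way. This is only a
sketch: it assumes un-namespaced elements, and it passes rend values
through unchanged unless an optional mapping says otherwise. The function
name tei_hi_to_osis, the rend_to_type parameter, and the sample values are
hypothetical; a real converter would need a mapping table agreed for the
values actually used in a given source.

```python
import xml.etree.ElementTree as ET

def tei_hi_to_osis(root, rend_to_type=None):
    """Move the TEI rend attribute of <hi> to the OSIS type attribute, in place.

    rend_to_type is an optional dict translating rend values to type values;
    values without an entry are copied as they are (an assumption).
    """
    rend_to_type = rend_to_type or {}
    for el in list(root.iter("hi")):
        rend = el.attrib.pop("rend", None)
        if rend is not None:
            el.set("type", rend_to_type.get(rend, rend))
    return root

# Hypothetical sample: "it" stands in for a source-specific abbreviation.
sample = '<entryFree>A <hi rend="bold">headword</hi> and a <hi rend="it">gloss</hi>.</entryFree>'
root = tei_hi_to_osis(ET.fromstring(sample), rend_to_type={"it": "italic"})
converted_hi = ET.tostring(root, encoding="unicode")
print(converted_hi)
```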

3) TEI has rich content markup for dictionary elements, such as  
pronunciation, etymology, orthography, definition, senses.... From  
what I was told, OSIS plans to include a set of these, though what  
they are is not defined yet.

The container for a dictionary element in TEI is <entry> for  
structured entries, <entryFree> for any child elements allowed, and  
<superEntry> to collect entries into a larger one. In OSIS there is a  
type attribute value of "entry" for <div>.

My experience with dictionaries is limited. Chris can correct me if I
am wrong. I am under the impression that <entry>, because it is so
highly structured, will find little practical use in any dictionaries
we create. When I tried to encode Lockman's NAS Lexicons, my first
concern was to preserve Lockman's content as provided. At first I tried
<entry>, but I had to rearrange some of what Lockman had, and I
eliminated everything that was not structural. I was able to create a
style sheet that would reproduce what Lockman originally had.

The problem with this approach was that such a style sheet would be
appropriate for Lockman's NAS Lexicons but probably not for other
dictionaries. My guess is that we don't want to have style sheets on a
per-module basis.

So I re-coded it using <entryFree> and even with no styling (i.e.  
using the PlainFilter) the text is exactly as Lockman had it.
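
To illustrate why <entryFree> survives the removal of all markup, here is a
small Python sketch that simply concatenates the text content of an entry.
It is not SWORD's PlainFilter, only a stand-in for the idea, and the sample
entry is invented for illustration (it is not Lockman's content). Because
the punctuation and ordering sit in the text itself rather than being
supplied by a style sheet, the plain text reads as the source did.

```python
import xml.etree.ElementTree as ET

# Invented entry: the literal punctuation lives between the elements.
sample = (
    '<entryFree n="example">'
    '<orth>example</orth> (<pron>ig-ZAM-pul</pron>): '
    '<def>a representative instance</def>.'
    '</entryFree>'
)

def plain_text(xml_string):
    """Return the entry's text with every tag dropped, in document order."""
    return "".join(ET.fromstring(xml_string).itertext())

text = plain_text(sample)
print(text)
```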

4) TEI for dictionaries does not have milestoned container elements.
They simply are not needed. OSIS allows the <div> entry to be milestoned
with sID and eID. For dictionaries, milestoning should be discouraged.

Some other random thoughts/opinions:
For dictionaries of the kind that are currently encoded in TEI, I think
it does not make sense to re-encode them in OSIS. For secular works such
as Webster's Dictionary, I think it makes sense to encode them in TEI
and make them available. And in the header of the XML, put our signature
with a note that it can be freely used under such-and-such a license
that requires attribution to be retained.

TEI is a well defined standard with a robust definition. My suggestion  
would be for OSIS to adopt Chris' TEI schema as a part of OSIS with  
minor changes.

I would also suggest that it be a separate stand-alone schema, as  
there is very little overlap between the elements in a dictionary and  
the elements in a Bible.

In His Service,
