[sword-devel] TEI Dictionaries was Re: Bible Software Review

Chris Little chrislit at crosswire.org
Thu May 1 14:51:59 MST 2008



DM Smith wrote:
> The TEI filter does not do much styling. May I suggest that you add 
> styling for the elements I've used for the NASB lexicons.

Actually... I was looking at your encoding just now and comparing it to 
my own TEI. :)

We should probably figure out what we want to do regarding TEI now, 
rather than later, since there isn't yet any content in the wild, aside 
from the Perseus content in beta. So feel free to say, "no that's a bad 
idea" to any of my suggestions below.

First, I would recommend we support P4 for backward compatibility with 
Perseus content and a few other sites publishing P4 content. But I would 
recommend that we only produce P5 content ourselves when converting 
non-TEI content. P4 support by TEI will only extend until 2012, and P5 
itself has a number of improvements, in my opinion, not the least of 
which is that it is more in line with modern XML usage and schema 
validation.

The P5 dictionary reference is here: 
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html

> Here are the ones that I use with the styling that I am using in 
> BibleDesktop:
> orth bold
> pron italic
> etym haven't decided
> def italic
> usg plain
> 
> Also TEI used rend and not type for the hi element.
> 
> Some of these may already be handled.

I don't know the condition of our TEI filters in Sword, but I suspect 
whatever is there remains rudimentary and I take from Karl's email 
regarding the NAS dictionaries that I only wrote RTF filters.

> For Strong's references, I am using <ref target="id">key text</ref>.
> For BDB references, I am using <xref doc="bdb" to="id (target)">text</xref>
> (I don't expect either of these to be handled except to have their text 
> shown, which is what SWORD does!)

I think the dictionary cross-reference element (P4 & P5) is just <xr>, 
whereas <ref> is a more generic element found in the core module.

The example from the manual for <xr> embeds a <ref> within an <xr> thus:

<entry>
  <form>
   <orth>lavage</orth>
  </form>
  <etym>[Fr. &lt; <mentioned>laver</mentioned>;
    L. <mentioned>lavare</mentioned>,
    to wash; <xr>see <ref>lather</ref>
   </xr>].
  </etym>
</entry>

The only thing I would add to that is some typing. For "see" references, 
I've been using <xr type="see">. So I think, putting it all together, we 
would want: <xr type="see">see <ref target="lather">lather</ref></xr>.
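
To make the typed form concrete, here is a rough Python sketch of how a 
reader might pull the type, target and display text out of such an 
entry. The function name cross_references is hypothetical and this is 
not how the Sword filters work. It also assumes un-namespaced elements, 
whereas real P5 documents normally sit in the TEI namespace 
(http://www.tei-c.org/ns/1.0).

```python
import xml.etree.ElementTree as ET


def cross_references(entry_xml):
    """Return (xr type, ref target, ref text) for each <ref> inside an <xr>.

    Assumption: elements carry no XML namespace. A real P5 document
    would need namespace-qualified tag names instead.
    """
    root = ET.fromstring(entry_xml)
    found = []
    for xr in root.iter("xr"):
        for ref in xr.iter("ref"):
            text = "".join(ref.itertext())
            found.append((xr.get("type"), ref.get("target"), text))
    return found


sample = (
    "<entry><form><orth>lavage</orth></form>"
    '<etym>to wash; <xr type="see">see '
    '<ref target="lather">lather</ref></xr></etym></entry>'
)
refs = cross_references(sample)
```

A <ref> that is not wrapped in an <xr> is skipped by this sketch, which 
may or may not be what we want for generic core-module references.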

We can interpret TEI in basically the same way as we do OSIS, by 
expecting a "word:ref" type of target if the reference is outside the 
current document/module and expecting just "ref" if it is within the 
current document/module. So, in P5, I would encode your examples as:

<xr type="xref"><ref target="key text">key text</ref></xr> and
<xr type="xref"><ref target="BDB:id">text</ref></xr>
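
As a minimal sketch of that "word:ref" convention in Python: the 
function name split_target and the current_module argument are made up 
for illustration and are not part of Sword. It assumes the first colon 
always separates the module name from the key, so a key that itself 
contains a colon would need some other rule.

```python
def split_target(target, current_module):
    """Split a <ref> target into (module, key).

    "BDB:id" points into another module; a bare "id" points into the
    current one. Assumption: only the first colon is significant.
    """
    module, sep, key = target.partition(":")
    if sep:
        return module, key
    return current_module, target


external = split_target("BDB:id", "Strongs")
internal = split_target("key text", "Strongs")
```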

> Also, can you see a way that we can combine the Greek and Hebrew 
> lexicons into 1?

I thought we had decided to encode Greek and Hebrew keys as 
/[GH][0-9]{4}/, that is, with a G or H prefix and 4 digits with leading 
zeros.
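
For illustration only, here is a small Python sketch of building and 
checking keys in that form. The name make_key is hypothetical, and 
rejecting numbers that do not fit in 4 digits is my assumption rather 
than anything we agreed.

```python
import re

# The key shape described above: G or H followed by exactly 4 digits.
KEY_PATTERN = re.compile(r"[GH][0-9]{4}")


def make_key(language, number):
    """Build a key such as "H0430" from a language letter and a number."""
    prefix = language.upper()
    if prefix not in ("G", "H"):
        raise ValueError("language must be 'G' or 'H'")
    key = f"{prefix}{number:04d}"
    # Catches negative numbers and numbers longer than 4 digits.
    if not KEY_PATTERN.fullmatch(key):
        raise ValueError(f"{key!r} does not fit the key pattern")
    return key


hebrew = make_key("H", 430)
greek = make_key("g", 26)
```

With a shared prefix scheme like this, Greek and Hebrew entries can sit 
in one module without their numbers colliding.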

> I also noticed the schema for TEI dictionaries on the wiki has osisID 
> and osisRef. I didn't study the schema, but at a glance I didn't see 
> where or how these are used. Would you shed some light?

I'm open to suggestions for the schema as well. I put osisID and osisRef 
within the att.global.linking attribute group, so they are present on 
all (or at least almost all) elements. I've been thinking about whether 
this is appropriate and think it may be better to only put osisRef on 
<ref> or within a more limited attribute group, such as att.pointing.

--Chris


