[sword-devel] TEI Dictionaries was Re: Bible Software Review

DM Smith dmsmith555 at yahoo.com
Thu May 1 15:49:06 MST 2008

On May 1, 2008, at 5:51 PM, Chris Little wrote:

> DM Smith wrote:
>> The TEI filter does not do much styling. May I suggest that you add
>> styling for the elements I've used for the NASB lexicons.
> Actually... I was looking at your encoding just now and comparing it to
> my own TEI. :)
> We should probably figure out what we want to do regarding TEI now,
> rather than later, since there isn't yet any content in the wild, aside
> from the Perseus content in beta. So feel free to say, "no that's a bad
> idea" to any of my suggestions below.
> First, I would recommend we support P4 for backward compatibility with
> Perseus content and a few other sites publishing P4 content. But I would
> recommend that we only produce P5 content ourselves when converting
> non-TEI content. P4 support by TEI will only extend until 2012, and P5
> itself has a number of improvements, in my opinion, not the least of
> which is that it is more in line with modern XML usage and schema
> validation.
> The P5 dictionary reference is here:
> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html

Either is fine with me. When I started a year or so ago, P5 looked  
like it was being firmed up. I'll make changes to the NASB lexicons to  
match P5.

>> Here are the ones that I use with the styling that I am using in
>> BibleDesktop:
>> orth bold
>> pron italic
>> etym haven't decided
>> def italic
>> usg plain
>> Also TEI used rend and not type for the hi element.
>> Some of these may already be handled.
> I don't know the condition of our TEI filters in Sword, but I suspect
> whatever is there remains rudimentary and I take from Karl's email
> regarding the NAS dictionaries that I only wrote RTF filters.

There are two filters: teiplain and teirtf.

Troy had suggested that we use the OSIS HTML filter, merely augmenting  
it as needed.

The plain filter handles the following:
p, entryFree, sense, div, and etym

The rtf filter handles the following:
Same as plain but also:
hi with attribute of rend allowing ital, bold, sup (I'm not sure if  
these are valid values).
pos, gen, case, gram, number, mood, tr (all in italic)
note with attributes type and swordFootnote.

For the <etym> tag, the filter surrounds the content with [ ... ].
I don't know if it is a good idea for us to generate this kind of  
presentation, at least not inside entryFree. If you look at your  
example below, it would result in redundant [].
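
To make the redundancy concrete, here is a minimal Python sketch. It is not what teiplain or teirtf do today; the function name wrap_etym and the "skip if already bracketed" rule are assumptions for illustration only.

```python
def wrap_etym(content):
    """Surround <etym> text with [ ... ] unless the source already opens with '['.

    Hypothetical helper: the guard is one possible way to avoid doubled
    brackets, not the behaviour of the existing SWORD filters.
    """
    text = content.strip()
    if text.startswith("["):
        # The encoder already supplied brackets; adding more would give "[[...]]".
        return text
    return f"[{text}]"


# Content without brackets gets them added.
plain = wrap_etym("Fr. laver; L. lavare, to wash")
# Content that already carries brackets (as in the quoted example) is left alone.
bracketed = wrap_etym("[Fr. laver; L. lavare, to wash; see lather].")
```

The other obvious choice is to never add brackets in the filter and leave that presentation to the encoder.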

>> For Strong's references, I am using <ref target="id">key text</ref>.
>> For BDB references, I am using
>> <xref doc="bdb" to="id (target)">text</xref>
>> (I don't expect either of these to be handled except to have their text
>> shown, which is what SWORD does!)
> I think the dictionary cross-reference element (P4 & P5) is just <xr>,
> whereas <ref> is a more generic element found in the core module.
> The example from the manual for <xr> embeds a <ref> within an <xr> thus:
> <entry>
>  <form>
>   <orth>lavage</orth>
>  </form>
>  <etym>[Fr. < <mentioned>laver</mentioned>;
>    L. <mentioned>lavare</mentioned>,
>    to wash; <xr>see <ref>lather</ref>
>   </xr>].
>  </etym>
> </entry>
> The only thing I would add to that is some typing. For "see" references,
> I've been using <xr type="see">. So I think (putting it all together) we
> would want: <xr type="see">see <ref target="lather">lather</ref></xr>.

I saw the <xr> element in P4, but the NAS Lex source does not make it  
clear where the text referring to the reference begins or whether it  
is present. But the actual reference is clearly marked.

So, I was just using the <ref> element, but not within <xr>.

> We can interpret TEI in basically the same way as we do OSIS, by
> expecting a "word:ref" type of target if the reference is outside the
> current document/module and expecting just "ref" if it is within the
> current document/module. So, in P5, I would encode your examples as:
> <xr type="xref"><ref target="key text">key text</ref></xr> and
> <xr type="xref"><ref target="BDB:id">text</ref></xr>

Dictionaries such as Nave's have Bible references.
Would that be
<xr type="xref"><ref target="Bible:key">key text</ref></xr>?

I thought that we decided that the form was:

target="key" for a Bible reference, using an osisRef.
target="this:key" for a key in the same work. (I'm not sure if it was  
"this" or "same" or "self")
target="modname:key" for external works.

Since most of the references in a dictionary will be internal, I'd go  
with target="key" for internal and target="Bible:Matt.1.1" for biblical.

I don't really care what we decide or whether we change our mind, as  
we have not implemented it yet.

And all of it is a simple change to my perl scripts.
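
As a rough sketch of the "key" / "Bible:key" / "modname:key" option, here is how a front end might split a target. This is Python rather than perl, nothing here is settled, and the function name parse_target and the module names in the usage lines are made up for illustration.

```python
def parse_target(target, current_module):
    """Split a <ref target="..."> value into (module, key).

    Assumed convention (one of the options under discussion, not a decision):
      "key"            -> an entry in the current module
      "Bible:Matt.1.1" -> a Bible reference given as an osisRef
      "modname:key"    -> an entry in another module
    """
    module, sep, key = target.partition(":")
    if not sep:
        # No prefix at all: treat the whole value as a key in this module.
        return current_module, target
    return module, key


# Hypothetical module names, for illustration only.
internal = parse_target("lather", "SomeLexicon")
biblical = parse_target("Bible:Matt.1.1", "SomeLexicon")
external = parse_target("BDB:id", "SomeLexicon")
```

Swapping to the other option (bare key means Bible, "this:key" means internal) would only change the no-prefix branch.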

>> Also, can you see a way that we can combine the Greek and Hebrew
>> lexicons into 1?
> I thought we had decided to encode Greek and Hebrew keys as
> /[GH][0-9]{4}/, that is, with a G or H prefix and 4 digits with leading
> zeros.

I do seem to recall that now. That would put the Greek before the  
Hebrew (because G < H). Is that OK?

By the way, the current SWORD engine will need to be modified to do  
lookups with the prefixed key.

Also, for the <ref> element would G4 be ok for the target? This is  
something that JSword already handles and IIRC, Sword does too.
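
For what it's worth, accepting a short target like G4 and mapping it onto the padded key form is cheap. The Python sketch below is only an illustration of that idea; normalize_strongs_key is a hypothetical name and this is not the code in either Sword or JSword.

```python
import re

# Prefix letter, optional leading zeros, then up to four significant digits.
_STRONGS = re.compile(r"^([GHgh])0*([0-9]{1,4})$")


def normalize_strongs_key(key):
    """Turn 'G4', 'g0004' or 'H430' into the padded form, e.g. 'G0004'."""
    match = _STRONGS.match(key.strip())
    if match is None:
        raise ValueError(f"not a Strong's-style key: {key!r}")
    prefix, digits = match.groups()
    return f"{prefix.upper()}{int(digits):04d}"


# With padded keys a plain string sort puts all Greek entries before Hebrew ones.
keys = sorted(normalize_strongs_key(k) for k in ["H430", "G5624", "G4"])
```

Because every key has the same width, string order and numeric order agree within each language.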

>> I also noticed the schema for TEI dictionaries on the wiki has osisID
>> and osisRef. I didn't study the schema, but at a glance I didn't see
>> where or how these are used. Would you shed some light?
> I'm open to suggestions for the schema as well. I put osisID and osisRef
> within the att.global.linking attribute group, so they are present on
> all (or at least almost all) elements. I've been thinking about whether
> this is appropriate and think it may be better to only put osisRef on
> <ref> or within a more limited attribute group, such as att.pointing.

Since the OSIS folks said they plan to adopt TEI dictionary as their  
own, I think it would make sense to work in that direction.

I think that OSIS already has a robust <reference> element. If we go  
with your suggestion for <ref> above, then it is a simple migration as  
they are orthogonal.

Though, for SWORD, I don't think there's much point to OSIS supporting  
Dictionaries if TEI is fully supported.
