[sword-devel] TEI dictionaries and front-ends: a suggestion for formatting entries

DM Smith dmsmith555 at yahoo.com
Mon May 19 04:35:22 MST 2008

On May 19, 2008, at 12:41 AM, Daniel Owens wrote:

> The HTML didn't come through very well. Here is a screenshot of the  
> Lexique-formatted entry:
> <entry from Lexique.jpg>
> Daniel
> Daniel Owens wrote:
>> I have been working on some TEI dictionaries, and (this is obvious, I
>> know) vanilla TEI produces very boring entries in the front-ends. I
>> point this out as a preface to offering a suggestion for front-end
>> developers preparing to introduce TEI support. Here is a typical TEI
>> entry:
>>     <entry key="an toạ">
>>     <form><orth>an toạ</orth><pron>(phonetic representation)</pron></form>
>>     <gramGrp><pos>verb</pos></gramGrp>
>>     <def>To take a seat, to be seated</def>
>>     <eg><q>mời các vị an toạ</q></eg><trans><tr>pray, everyone, take a seat</tr></trans>
>>     </entry>
>> Here is what it looks like in BibleTime:
>>     AN TOẠ  an toạ(phonetic representation)verb To take a seat, to be seated mời
>>     các vị an toạpray, everyone, take a seat
Is BibleTime built with the latest from SVN? If not, it will fall back
to the Plaintext filter. It is the TEI filter that does the styling.

>> I'm not meaning to pick on BibleTime--BibleCS only formats the part of
>> speech in italics.

The TEI filters could stand some improvement. They only style a few
elements, and <orth> and <pron> are not among them. For example, <orth>
could be bold and <pron> italic.
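
As a rough illustration of that kind of element-based styling, here is a
minimal Python sketch. It is not the SWORD or JSword filter code; the
STYLES table and the render() function are hypothetical names made up
for this example, and the choice of HTML tags is only an assumption.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from TEI element names to HTML tags.
# This is an illustration only, not the table any real filter uses.
STYLES = {"orth": "b", "pron": "i", "pos": "i"}

def render(elem):
    """Return the element's text in document order, wrapping styled elements."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))
    body = "".join(parts)
    tag = STYLES.get(elem.tag)
    return f"<{tag}>{body}</{tag}>" if tag else body

entry = ET.fromstring(
    '<entry key="an toạ">'
    "<form><orth>an toạ</orth><pron>(phonetic representation)</pron></form>"
    "<gramGrp><pos>verb</pos></gramGrp>"
    "<def>To take a seat, to be seated</def>"
    "</entry>"
)
print(render(entry))
```

Note that the sketch only wraps text in tags. It neither reorders the
content nor adds separators, so the pieces still run together.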

I've made some suggestions and implemented them in BibleDesktop.

So take a look at BibleDesktop for an example of what can be done.

>> Here's the suggestion. Recently a friend of mine pointed me to an
>> SIL-developed program that can be used to create and publish lexicons.
>> It's called Lexique Pro, and you can download it at
>> http://www.lexiquepro.com/download.htm . They use a TeX-like method of
>> tagging data, but there's no reason why what they have done can't be
>> applied to XML data. Here is the above example formatted by Lexique Pro:
>>     *an toạ* /verb. /[(phonetic representation)];To take a seat, to be seated.
>>     *mời các vị an toạ* pray, everyone, take a seat.
>> Notice that they have varied the font, font size, font color, bold, and
>> italics of each part of the entry so that it is easier to read. They
>> have also added punctuation to separate parts of the entry.

The problem with <entry> as opposed to <entryFree> is that it is  
difficult to encode the entry as found in the printed work.

<entry> is more like a database entry. It requires elements to be in a
particular order and nested in a particular fashion, and it may not
allow text in places where one would want it.

<entryFree> is more like a document. The elements can come in any
order, nested in any fashion, and text can be interspersed as desired.
With <entryFree>, it is important not to add "punctuation", as one
should assume that every "jot and tittle" is already present.

When it comes to the SWORD engine (and JSword), our filters do not
invent punctuation; they only apply styling. Nor do they reorder
content. They merely emit the text content, styled according to the
element that contains it.

To handle both <entry> and <entryFree> properly, a filter probably needs
to note which of the two it is in and use that to decide whether to add
punctuation.
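
A minimal Python sketch of that decision, under assumptions of my own:
the SEPARATORS table, flatten() and render_entry() are hypothetical and
are not part of SWORD or JSword. The idea is only that generated
separators are added inside <entry> and never inside <entryFree>.

```python
import xml.etree.ElementTree as ET

# Hypothetical separators a front-end might append after these elements
# when the entry is a structured <entry>. The values are assumptions.
SEPARATORS = {"orth": " ", "pron": " ", "pos": ". ", "def": ". "}

def flatten(elem, add_punct):
    """Concatenate text in document order, optionally adding separators."""
    out = [elem.text or ""]
    for child in elem:
        out.append(flatten(child, add_punct))
        if add_punct:
            out.append(SEPARATORS.get(child.tag, ""))
        out.append(child.tail or "")
    return "".join(out)

def render_entry(xml_text):
    root = ET.fromstring(xml_text)
    # Only a structured <entry> gets generated punctuation; <entryFree> is
    # assumed to carry every character of the printed work already.
    return flatten(root, add_punct=(root.tag == "entry")).strip()

structured = (
    "<entry><form><orth>an toạ</orth><pron>(x)</pron></form>"
    "<gramGrp><pos>verb</pos></gramGrp><def>To take a seat</def></entry>"
)
free = (
    "<entryFree><orth>an toạ</orth> <pos>verb.</pos> "
    "<def>To take a seat.</def></entryFree>"
)
print(render_entry(structured))
print(render_entry(free))
```

Styling is left out here to keep the sketch short; the point is just
that the root element name drives the punctuation decision.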

I think that as we transform e-texts into TEI, <entryFree> will be
what is used. <entry> seems more appropriate for original works.

>> Before I heard about the upcoming TEI support I had put together a
>> dictionary using THML, complete with punctuation and line breaks to
>> help make it easier to read the entry. That's not the role of the TEI
>> xml file, though. Lexique Pro's way of handling entries is not the only
>> way, but I suggest it as ONE useful way developed by people who deal
>> with lexicons daily.

I haven't looked at Lexique, but it sounds interesting.

In Him,
