[sword-devel] Open Hebrew Lexicon. (David Troidl), (Daniel Owens)

Daniel Owens dhowens at pmbx.net
Sun Sep 4 19:48:44 MST 2011


On 09/04/2011 01:12 PM, David Troidl wrote:
> Hi Aaron,
> On 9/4/2011 10:50 AM, Aaron Christianson wrote:
>> Daniel,
>> This does sound very much like what I am interested in doing, but 
>> unfortunately, you seem to be using WeSay, which appears to have some 
>> deficiencies in its Linux versions that will make it unusable for 
>> editing a work of this kind (no support for non-Latin scripts, and 
>> issues copying and pasting non-ASCII characters).  I'm afraid that I 
>> use Linux exclusively, and my ability to contribute to this project 
>> would be severely limited.
Yes, for a Linux-only or a Mac person, this is a significant problem. I 
am a mainly-Linux person, and I am waiting eagerly for WeSay to be fully 
functional in Linux (without holding my breath). The reason I chose 
WeSay was to encourage non-techies with an easy-to-use application that 
supports structured collaboration using a version control system. It 
works great with Unicode in Windows, handles multiple contributors 
easily, and is developed by people trained and experienced in creating 
lexica. One additional useful feature is that it offers the ability to 
add semantic domain information. However, for our purposes WeSay is 
basically limited to Windows at this point.
> I was going to write to Daniel privately, but maybe this is a topic 
> that needs to be brought up here.  My concern is the proliferation of 
> formats, trying to accomplish the same thing.  With Daniel's LIFT 
> dictionary, the SWORD TEI-based lexicon format, whatever you would use 
> and my ad hoc schema, all with similar goals, there could be a lot of 
> duplication of effort.
Yes, I also don't like the idea of duplicating efforts.
> I made my schema just to get into the work, and with the intention of 
> making it easy to transform to another format, when there was 
> something better.  I know that the TEI could handle all the 
> requirements, but it's huge and forbidding.  The SWORD format examples 
> I've seen appear dense and hard to understand.  I'm not certain if it 
> has all the capabilities my lexicon needs.  I was going to ask Daniel 
> if his LIFT dictionary could handle it all, and what would be required 
> to transform between the two.  Also if his setup could import 
> transformed entries.  Now if WeSay is a problem with Linux, is that 
> insurmountable?  Could the LIFT dictionary be used in another 
> context?  Or what other format would be better?
On formats: SWORD's implementation of TEI for a lexicon is probably not 
the best format. At least I have not considered it to be a good format 
for creating a lexicon. I chose LIFT XML because it is a format that 
several SIL programs use (WeSay and FieldWorks). It is designed for 
lexica, so I imagine it can handle anything we need. WeSay allows you to 
create custom fields, which makes it easy to work with. LIFT is just an 
XML standard, so there is nothing to prevent one from creating an 
application to write to a LIFT XML file.
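
As a rough illustration of that last point, here is a minimal sketch of 
a script that writes a LIFT-style file using only Python's standard 
library. The element names (lift, entry, lexical-unit, form, text, 
sense, gloss), the version string "0.13", and the helper names 
make_entry, build_lift and write_lift are assumptions made for the 
example; they should be checked against the LIFT specification and 
against a file actually exported from WeSay before relying on them.

```python
import xml.etree.ElementTree as ET


def make_entry(entry_id, lexeme, gloss, lang="he", gloss_lang="en"):
    """Build one LIFT-style <entry> element.

    The element and attribute names here are assumptions based on the
    general shape of LIFT files, not a validated schema.
    """
    entry = ET.Element("entry", {"id": entry_id})

    # Headword: <lexical-unit><form lang="he"><text>...</text></form>
    lexical_unit = ET.SubElement(entry, "lexical-unit")
    form = ET.SubElement(lexical_unit, "form", {"lang": lang})
    ET.SubElement(form, "text").text = lexeme

    # One sense with a single gloss in the gloss language
    sense = ET.SubElement(entry, "sense")
    gloss_el = ET.SubElement(sense, "gloss", {"lang": gloss_lang})
    ET.SubElement(gloss_el, "text").text = gloss
    return entry


def build_lift(entries, version="0.13"):
    """Wrap entries in a <lift> root; the version value is an assumption."""
    root = ET.Element("lift", {"version": version})
    root.extend(entries)
    return ET.ElementTree(root)


def write_lift(tree, path):
    """Write the tree as UTF-8 so Hebrew text is stored unescaped."""
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

Something like write_lift(build_lift([make_entry("ab_1", "אב", 
"father")]), "sample.lift") would then produce a one-entry file. Whether 
WeSay accepts such a file unchanged is untested here.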

On applications: I have been ruminating on the problem of WeSay being 
Windows-only and wondering if a browser-based solution written in PHP or 
something like that would be a "quick" solution for Mac and Linux users. 
The PHP code and LIFT file could reside on the contributor's machine 
with Mercurial negotiating the differences with the server. That would 
mean the PHP program would have to be written to work well with WeSay, 
which could be a job in itself. I just don't have the time or expertise 
to pull it off. But if someone could do that, it would open up 
possibilities for contributors.

Our project is moving so slowly that I am open to changing the way we do 
it. Data format questions aside, the following features are needed for 
an interface for developing a Hebrew lexicon:

    * Support for right-to-left (RTL) Unicode
    * Easy to use for non-techies (virtually brainless, if possible)
    * Changes stored using a version control system, allowing for
      structured collaboration among multiple contributors
    * Support for features that are commonly accepted as good linguistic
      practice, such as semantic domains
    * Customizable for our needs

So far WeSay works the best for that, but it is limited to Windows. I am 
open to new ideas.
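
For the right-to-left Unicode requirement in the list above, any 
alternative editing tool would need some way to confirm that headwords 
really are stored as RTL text. A minimal sketch of such a check, using 
Python's standard unicodedata module, might look like the following. 
The function name is_rtl_headword and the rule it applies (every 
non-combining character must be strongly right-to-left) are assumptions 
for illustration, not part of WeSay or LIFT.

```python
import unicodedata


def is_rtl_headword(text):
    """Return True if every base character is strongly right-to-left.

    Combining marks such as Hebrew vowel points have the bidirectional
    class "NSM" and are skipped. Spaces and punctuation are NOT skipped,
    so multi-word headwords would fail this deliberately strict check.
    """
    base = [ch for ch in text if unicodedata.bidirectional(ch) != "NSM"]
    # "R" covers Hebrew letters; "AL" covers Arabic-script letters.
    return bool(base) and all(
        unicodedata.bidirectional(ch) in ("R", "AL") for ch in base
    )
```

A check like this only inspects the stored characters; it says nothing 
about whether a given editor displays or accepts them correctly.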