[sword-devel] R/w CVS

Victor Porton sword-devel@crosswire.org
Thu, 06 Jun 2002 02:33:06 +0600

> We are happy to assist you in the use of the Sword API, but I think 
> giving us some more concrete explanations of what you're working on 
> would help us in that role.

First, what I am doing:

I am writing software that will make it convenient to classify every word of the 
Hebrew Bible *in the form in which it is encountered*, not in its primary form 
(the primary form being singular, masculine, 1st person, etc.). More exactly, I 
am writing software for creating and using a dictionary of the words of the 
Hebrew Bible in which the keys are the words as they appear in the Bible, 
instead of the customary primary forms used in most dictionaries. Note that I 
ignore vowels.
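Ignoring vowels means that a dictionary key is just the consonantal skeleton of the word as it appears in the text. A minimal sketch of that normalization, assuming the text is pointed Unicode Hebrew (the examples below use Latin transliteration instead) and using a hypothetical `strip_points` helper. It drops every nonspacing mark, so dagesh, shin/sin dots and cantillation marks disappear together with the vowel points:

```python
import unicodedata


def strip_points(word: str) -> str:
    """Hypothetical key normalizer: keep only the consonantal skeleton.

    Decomposes to NFD, then drops every nonspacing mark (category "Mn"):
    vowel points, dagesh, shin/sin dots and cantillation marks.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# First word of Genesis 1:1 with points, compared with its unpointed form.
pointed = "\u05d1\u05bc\u05b0\u05e8\u05b5\u05d0\u05e9\u05c1\u05b4\u05d9\u05ea"
print(strip_points(pointed) == "\u05d1\u05e8\u05d0\u05e9\u05d9\u05ea")  # True
```

Transliterated keys such as "LQWE" contain no combining marks and pass through unchanged.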

I am going to incorporate it into general-purpose Bible study tools such as 
BibleTime and GnomeSword, creating special widget sets for this. Among other 
things, it will display a tree (as a tree widget) of all the grammatical forms 
of a given word in the Hebrew Bible, allowing the user to see all the possible 
meanings of the word.

I have considered many possible file formats. My latest variant (if there are 
no complaints about it, I will most probably stick with it, even though I have 
already changed my "final" decision about the file format several times :-) ) 
is the following:

I create _one_ LD (lexicon/dictionary) module in which there are keys of _two_ 
kinds:

1. Just a Hebrew word. (Note that I use non-existent Hebrew words and 
non-existent grammatical forms in the examples, so as not to spend time finding 
real ones.) Example entry, for the word "LQWE":

<form ref="LQWE:pqr@abc"/>
<form ref="LQWE:zzz@abc"/>

After the colon comes a "coded" (machine-readable) description of a Hebrew 
grammatical form (which may be decoded as something like "noun sing. masc."). 
So this means that "LQWE" can be read as a word in two different grammatical 
forms: "pqr@abc" and "zzz@abc".

2. An entry for a Hebrew word in a specified grammatical form (its key is the 
word, a colon, and the coded form, as in the references above):

<sense root="word" short="to word">
<!-- an HTML fragment here -->
</sense>
<sense root="sense" short="being senseful">
<!-- an HTML fragment here -->
</sense>

This would mean that "LQWE", in an adverb-like grammatical form, has two 
senses: "to word" and "being senseful".
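To show how the two kinds of keys combine into the tree of forms and senses mentioned earlier, here is a minimal Python sketch. The dict `LD` is a hypothetical in-memory stand-in for the module, and `elements` and `build_tree` are hypothetical helpers, not Sword API calls. It assumes each entry body is a sequence of `<form>` or `<sense>` elements without an enclosing root (so a dummy root is added before parsing) and that the HTML fragments are well-formed XML; real HTML such as `<br>` would need a more lenient parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for the single LD module (key -> entry text).
# Kind 1 keys are bare words; kind 2 keys are "WORD:FORMCODE".
LD = {
    "LQWE": '<form ref="LQWE:pqr@abc"/><form ref="LQWE:zzz@abc"/>',
    "LQWE:pqr@abc": (
        '<sense root="word" short="to word"><p>...</p></sense>'
        '<sense root="sense" short="being senseful"><p>...</p></sense>'
    ),
    # No entry for "LQWE:zzz@abc": the form is listed but not described yet.
}


def elements(entry_text: str, tag: str) -> list[ET.Element]:
    """Wrap an entry body in a dummy root, parse it, return all <tag> elements."""
    return list(ET.fromstring(f"<entry>{entry_text}</entry>").iter(tag))


def build_tree(ld: dict[str, str], word: str) -> dict[str, list[str]]:
    """Map each grammatical form of `word` to the short glosses of its senses."""
    tree: dict[str, list[str]] = {}
    for form in elements(ld.get(word, ""), "form"):
        key = form.get("ref", "")  # a kind 2 key, e.g. "LQWE:pqr@abc"
        _, _, code = key.partition(":")  # split at the first colon
        tree[code] = [s.get("short", "") for s in elements(ld.get(key, ""), "sense")]
    return tree


print(build_tree(LD, "LQWE"))
# {'pqr@abc': ['to word', 'being senseful'], 'zzz@abc': []}
```

Each level of the returned dict corresponds to one level of the tree widget: forms under the word, sense glosses under each form.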

My question: do you consider this file format reasonable? There is one problem 
with it: to enumerate all the Hebrew words present in the dictionary, one would 
need to enumerate all the entries and throw away those containing colons, thus 
spending CPU time etc. on enumerating unneeded entries.
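The cost described above can be seen in a minimal sketch with a hypothetical `bare_words` helper, assuming the module's keys can be iterated one by one (stood in for here by a plain list). The filter itself is trivial; the point is that every key, including every "WORD:FORMCODE" key, still has to be visited:

```python
from typing import Iterable, Iterator


def bare_words(keys: Iterable[str]) -> Iterator[str]:
    """Yield kind 1 keys (bare words) and skip kind 2 "WORD:FORMCODE" keys.

    Every key is still visited, so the cost grows with the total number of
    entries (words plus all their forms), not with the number of words.
    """
    for key in keys:
        if ":" not in key:
            yield key


keys = ["LQWE", "LQWE:pqr@abc", "LQWE:zzz@abc"]
print(list(bare_words(keys)))  # ['LQWE']
```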

I could use two separate LDs, one for the bare words and one for the particular 
grammatical forms of the words, but this would create two modules even though 
conceptually they form a whole and should always be used together. Or maybe I 
am mistaken: can one module include two LDs?
Victor Porton (porton@narod.ru)