[sword-devel] lemma html markup lack, and dictionary sorting

Karl Kleinpaste karl at kleinpaste.org
Thu Mar 19 08:20:08 MST 2009


I had a request this morning to add what could be a very nice feature in
Xiphos: When hovering or clicking on a lemma, look it up in my
dictionary module InvStrongsRealGreek.  This module is the conversion of
my StrongsRealGreek module to inverted keying, that is, keyed by word
instead of by Strong's ref.

In principle, this is a nifty idea, and seems very obvious in hindsight.
I've implemented a first pass of the feature -- it was trivial -- but I
have two problems.

[1] When looking up an ordinary Strong's ref, the URL that gets dumped
into the markup by the engine is of this form:
     passagestudy.jsp?action=showStrongs&type=Greek&value=423
Note the presence of "type=Greek".  The similar URL for a lemma is:
     passagestudy.jsp?action=showStrongs&type=&value=%E1%BC%80%CE%BD%CE%B5%CF%80%CE%AF%CE%BB%CE%B7%CF%80%CF%84%CE%BF%CF%82
Note "type=" with empty content.  There is no indication of the language
to be looked up.

As it happens, we currently have no Hebrew modules with lemmatization,
so for the time being I've implemented the feature aggressively: it
picks InvStrongsRealGreek alone, treating the lack of a language type as
the discriminant between a normal Strong's numeric lookup and a lemma
lookup.  This will need some kind of enhancement in the future so as to
make it possible to determine what dictionary should be consulted, if we
ever gain Hebrew with lemmas.  (I have InvStrongsRealHebrew, too.)
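
As a rough illustration of the discriminant described above, here is a
minimal Python sketch that splits the two URL forms apart and
percent-decodes the lemma.  It is not the Xiphos code; the function name
classify_lookup and the "strongs"/"lemma" labels are hypothetical,
chosen only for this example.

```python
from urllib.parse import urlsplit, parse_qs


def classify_lookup(url):
    """Return (kind, language, value) for a showStrongs-style URL.

    Hypothetical helper: an empty "type" field is taken to mean a
    lemma lookup, a non-empty one a numeric Strong's lookup.
    """
    # keep_blank_values=True so that "type=" survives as an empty string
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    language = query.get("type", [""])[0]
    value = query.get("value", [""])[0]  # parse_qs decodes %XX as UTF-8
    if language:
        return ("strongs", language, value)
    # No language given: treat as a lemma (Greek-only for the time being)
    return ("lemma", None, value)
```

Fed the lemma URL quoted in [1], the value comes back as the decoded
Greek word, diacriticals included, which is what would then be handed to
the inverted dictionary as a key.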

[2] The bigger problem is that the lookup is quite poor.  It appears to
me (and I'm wide open to re-education on this) that the key sort in
InvStrongsRealGreek is badly warped by the failure of diacriticals to
sort well.  So in 1Tim 3:2, when looking up references on
"ανεπιλημπτον", the normal Strong's number reference (above) works fine,
and I get G423 as expected.  But when I then try the lemma lookup, the
dictionary reference into the inverted module misses the mark, returning
"ανεξικακος", which is actually G420.

This is a comparatively mild example -- at least the first few
characters match well.  But other match failures are much worse.
Looking at "λογος", G3056, the lookup of exactly that word coughs up
"Κως", G2972.  This is at best distracting and at worst outright
misleading.
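
To make the suspected sorting effect concrete, here is a small Python
sketch comparing plain codepoint order with an order computed on keys
that have had their diacriticals stripped.  It only shows how Unicode
codepoint comparison treats accented Greek; whether the engine or the
module actually compares keys this way is an assumption, as are the
accented spellings used as sample keys and the helper name
strip_diacriticals.

```python
import unicodedata


def strip_diacriticals(word):
    """Drop combining marks (breathings, accents) from a Greek word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Sample keys (assumed spellings).  Normalise to precomposed form so the
# demonstration does not depend on how the text was typed.
words = [unicodedata.normalize("NFC", w)
         for w in ["λόγος", "ἀνεπίληπτος", "Κῶς", "ἀνεξίκακος"]]

# Plain codepoint order: alpha-with-breathing lives in the Greek
# Extended block (U+1F00...), above every basic Greek letter.
by_codepoint = sorted(words)

# Order on folded keys: diacriticals removed, case ignored.
by_folded = sorted(words, key=lambda w: strip_diacriticals(w).lower())

print(by_codepoint)
print(by_folded)
```

In codepoint order both words beginning with the breathing-marked alpha
land after "Κῶς" and "λόγος", whereas the folded keys come out in the
expected alphabetical order.  A lookup that searches a key list sorted
one way while comparing the other way could plausibly end up on an
unrelated entry.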

Is there something I should have done differently when generating
InvStrongsRealGreek so as to get better keying, or is there a more
fundamental problem in how to get the correct sort in such keys?


