[sword-devel] imp2ld and alphabetization

Chris Little chrislit at crosswire.org
Sun Oct 28 21:49:12 MST 2007


DM Smith wrote:
> I'm not sure if I am reading the Sword code correctly, but it appears  
> that it is sorting at a byte level and not a character level. That  
> isn't by code points.

I'm pretty sure you're right about what Sword is actually doing, but I 
believe byte order is also codepoint order, simply by the nature of UTF-8 
itself. I could be wrong.
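A quick way to check the UTF-8 point: sort the same strings once by their raw UTF-8 bytes, which is what a byte-wise comparison sees, and once by their sequence of code points, then compare the results. This is only an illustrative sketch. The word list is arbitrary and the helper names are made up for the example.

```python
def utf8_key(s: str) -> bytes:
    # Byte-level key: what a strcmp-style comparison of UTF-8 data sees.
    return s.encode("utf-8")


def code_point_key(s: str) -> list[int]:
    # Character-level key: the sequence of Unicode code points.
    return [ord(c) for c in s]


def byte_order_matches_code_point_order(words: list[str]) -> bool:
    # True when sorting by bytes and sorting by code points agree.
    return sorted(words, key=utf8_key) == sorted(words, key=code_point_key)


# Mix of ASCII, Latin-1, Greek, Hebrew, CJK and a character above U+FFFF.
words = ["zebra", "\u00c4rger", "\u03bb\u03cc\u03b3\u03bf\u03c2",
         "\u05e9\u05dc\u05d5\u05dd", "\u65e5\u672c", "\U0001D538lpha",
         "ab", "abc"]
print(byte_order_matches_code_point_order(words))  # True
```

The agreement holds because UTF-8 lead bytes grow with the code point range and continuation bytes preserve the order of the remaining bits. UTF-16 code units do not have this property: characters above U+FFFF are stored as surrogates (0xD800 to 0xDFFF), which sort before U+E000 to U+FFFF.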

> One simple way for any application to provide this is to create a  
> Lucene index similar to what we do for a Bible for the dictionary

I don't think mandating Lucene in order to access the contents of a 
module is a simple solution. We can't require Lucene without cutting off 
a number of supported platforms. For example, it is unreasonable to 
require Lucene on handheld platforms like PocketPC, and MacSword would be 
obligated to use Lucene just to read LD modules.

We might be able to do a lexicon with the GenBook driver and just keep 
every entry at the same level. I don't know how badly this would hurt 
key lookup.
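To give a rough idea of the key-lookup cost, here is a sketch comparing two cases. In the first, lookup runs over a sorted key list, which allows binary search. In the second, it walks one flat level of siblings kept in whatever order the module author chose. It assumes, hypothetically, that a flat GenBook level gives the lookup no sort guarantee to exploit. It is a toy model, not how either Sword driver is actually implemented, and the function names are invented for illustration.

```python
from bisect import bisect_left
from typing import Optional


def sorted_lookup(sorted_keys: list[str], typed: str) -> Optional[str]:
    # Hypothetical LD-style index: keys are kept sorted, so we can
    # binary-search (O(log n)) for the first key >= the typed text.
    i = bisect_left(sorted_keys, typed)
    return sorted_keys[i] if i < len(sorted_keys) else None


def flat_lookup(children: list[str], typed: str) -> Optional[str]:
    # Hypothetical flat GenBook level: siblings are in author-defined
    # order, so finding a key means checking them one by one (O(n)).
    for key in children:
        if key == typed:
            return key
    return None


keys = sorted(["aaron", "abba", "abel", "abide"])
print(sorted_lookup(keys, "abe"))                              # abel
print(flat_lookup(["abide", "aaron", "abel", "abba"], "abel"))  # abel
```

The trade-off is that the flat layout lets the module author decide the display order, while the sorted layout gives cheap nearest-match lookup but fixes the order to whatever the comparison function produces.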

> There are some related problems to this:
> A user may expect to be able to find a Hebrew word in a Hebrew  
> dictionary independent of the pointing of the word in the dictionary.  
> (i.e. a user may wish to search without specifying accents)

It's possible to have multiple keys share a single entry, so pointed and 
unpointed keys can point to the same entry. We've done this 
experimentally with dictionaries in the past to permit lookup by a 
Strong's number or the lemma it represents.
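One way to generate the extra keys is sketched below. Each headword gets an unpointed alias by decomposing it and dropping nonspacing marks (Unicode category Mn, which covers Hebrew vowel points, dagesh, shin/sin dots and cantillation accents). Optionally, it also gets extra keys such as a Strong's number. All keys map back to the same headword, so the entry text is stored once. The function names and the index layout are assumptions for illustration, not Sword's module format.

```python
import unicodedata
from collections import defaultdict
from typing import Optional


def strip_pointing(word: str) -> str:
    # Decompose, then drop nonspacing marks (category "Mn").
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def build_key_index(headwords: list[str],
                    extra_keys: Optional[dict[str, list[str]]] = None
                    ) -> dict[str, list[str]]:
    # Map every lookup key (pointed form, unpointed form, extra keys such
    # as Strong's numbers) to the headword(s) whose entry it should open.
    # A list is used because unpointed forms are often ambiguous.
    index: dict[str, list[str]] = defaultdict(list)
    for hw in headwords:
        keys = {hw, strip_pointing(hw)}
        keys.update((extra_keys or {}).get(hw, []))
        for k in keys:
            if hw not in index[k]:
                index[k].append(hw)
    return dict(index)


# shalom, pointed: shin+qamats+shin dot, lamed, vav+holam, final mem
shalom = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
index = build_key_index([shalom], {shalom: ["H7965"]})
```

Here the pointed headword, its unpointed form, and H7965 (the Strong's number for shalom) all resolve to the same single entry.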

> A user may expect to find a word by stem not just by prefix.

I'm not sure whether this is a sort-order issue or a lookup/search issue. 
Presumably a user would know the word they want and type it in with its 
prefix, even if it is sorted to group with other words sharing the same 
stem.

> A user may expect to be able to type "photos" (a transliteration) and  
> find the real Greek word in a Greek dictionary.

I'm willing to write these users off. We could transliterate back to 
Greek, but I don't think it's worth the effort or processor cycles. I 
don't believe that people who don't know how to read Greek use Greek 
lexicons other than as a novelty.
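To show why transliterating back to Greek is more work than it looks, here is a deliberately tiny sketch. It uses a hypothetical Latin-to-Greek table, tries digraphs before single letters, and expands ambiguous letters (o as omicron or omega, e as epsilon or eta) into every combination. The table is an assumption covering only a few letters, and the output still lacks accents and breathings. Each candidate would then have to be matched against the module's keys.

```python
from itertools import product

# Hypothetical, incomplete table. Digraphs are tried before single letters;
# ambiguous letters map to more than one Greek letter.
TABLE = {
    "ph": ["\u03c6"], "th": ["\u03b8"], "ch": ["\u03c7"], "ps": ["\u03c8"],
    "a": ["\u03b1"], "b": ["\u03b2"], "d": ["\u03b4"], "e": ["\u03b5", "\u03b7"],
    "g": ["\u03b3"], "i": ["\u03b9"], "k": ["\u03ba"], "l": ["\u03bb"],
    "m": ["\u03bc"], "n": ["\u03bd"], "o": ["\u03bf", "\u03c9"], "p": ["\u03c0"],
    "r": ["\u03c1"], "s": ["\u03c3"], "t": ["\u03c4"], "u": ["\u03c5"],
    "x": ["\u03be"], "z": ["\u03b6"],
}


def candidates(latin: str) -> list[str]:
    latin = latin.lower()
    pieces: list[list[str]] = []
    i = 0
    while i < len(latin):
        if latin[i:i + 2] in TABLE:        # digraph first, e.g. "ph"
            pieces.append(TABLE[latin[i:i + 2]])
            i += 2
        elif latin[i] in TABLE:
            pieces.append(TABLE[latin[i]])
            i += 1
        else:
            raise ValueError(f"no mapping for {latin[i]!r}")
    results = []
    for combo in product(*pieces):       # every choice for ambiguous letters
        word = "".join(combo)
        if word.endswith("\u03c3"):      # word-final sigma becomes final sigma
            word = word[:-1] + "\u03c2"
        results.append(word)
    return results


print(candidates("photos"))  # four candidates: two o's, two choices each
```

Even this short word yields four unaccented guesses, and the number doubles with every ambiguous vowel.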

--Chris


