[sword-devel] can't do lucene Hebrew searches in KJV

DM Smith dmsmith at crosswire.org
Thu Jan 20 10:30:22 MST 2011

On 01/20/2011 11:29 AM, Karl Kleinpaste wrote:
> BTW, belated thanx to Nic for pointing that out for us.
> I have to note that the Strong's content isn't zero-prefixed so as to
> generate exactly-5-digits entries, either.  Gen 1:1...
> |<w lemma="strong:H07225">In the beginning</w>  <w
> | lemma="strong:H0430">God</w>  <w lemma="strong:H0853 strong:H01254"
> | morph="strongMorph:TH8804">created</w>  <w lemma="strong:H08064">the
> | heaven</w>  <w lemma="strong:H0853">and</w>  <w lemma="strong:H0776">the
> | earth</w>.
> It's just an arbitrary, single, leading zero on all entries.  Even Gen
> 2:24's use of H1 is encoded as H01.
> "sed -e 's/strong:H0/strong:H/g'" has a salutary and satisfying effect.
> I've just replaced my KJV content with the result of doing so.  Much nicer.
> Interestingly, similar encoding is not present for the NT Greek,
> so no such fix is needed there.
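
A rough Python equivalent of the sed substitution above, applied to a fragment of the quoted Gen 1:1 markup. It is only a sketch to show the effect; the function name is hypothetical, and like the sed command it removes exactly one zero after each `strong:H` prefix.

```python
def strip_hebrew_strongs_zero(osis: str) -> str:
    """Drop one leading zero after every "strong:H" prefix, like sed 's/strong:H0/strong:H/g'."""
    # str.replace rewrites every non-overlapping occurrence, matching sed's /g flag.
    return osis.replace("strong:H0", "strong:H")


sample = ('<w lemma="strong:H0853 strong:H01254" '
          'morph="strongMorph:TH8804">created</w>')
print(strip_hebrew_strongs_zero(sample))
# prints: <w lemma="strong:H853 strong:H1254" morph="strongMorph:TH8804">created</w>
```

Because it strips a single character rather than parsing the number, this only works if every Hebrew entry carries exactly one leading zero, which is what Karl observed in the module.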

As the keeper of the KJV module, I find this interesting.

Going back to the baseline of the current effort (i.e., the KJV2003
project), the encoding has not changed.

Since this is the first I have heard of the problem, I'm guessing
that a change in the SWORD engine has produced a regression? Looking at
the code, I don't see anything out of the ordinary. To search, the user
has to supply the Strong's number exactly as it appears in the module
(e.g., H07225 rather than H7225). It looks like it has been this way
"forever".

For a search to work, the search request and the stored key need to be 
the same. In JSword, we satisfy this by normalizing the Strong's number 
when constructing the Lucene index. We normalize the user's request the 
same way.
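
To make the "normalize both sides" idea concrete, here is a minimal Python sketch. It is not JSword's code (JSword is Java and builds a real Lucene index); a plain dict stands in for the index, the names `normalize_strongs`, `build_index` and `search` are hypothetical, and the canonical key (upper-case letter plus the number without leading zeros) is an assumed choice.

```python
import re
from collections import defaultdict

# Assumed shape of a Strong's token: optional "strong:" prefix, H or G, then digits.
STRONGS_RE = re.compile(r"^(?:strong:)?([HGhg])(\d+)$")


def normalize_strongs(token: str) -> str:
    """Canonical key: upper-case testament letter plus the number without leading zeros."""
    m = STRONGS_RE.match(token.strip())
    if m is None:
        raise ValueError(f"not a Strong's number: {token!r}")
    letter, digits = m.groups()
    return f"{letter.upper()}{int(digits)}"


def build_index(verses: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized Strong's key to the verses whose lemma attributes contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for ref, lemmas in verses.items():
        for lemma in lemmas:
            # A single lemma attribute may hold several space-separated entries.
            for token in lemma.split():
                index[normalize_strongs(token)].add(ref)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Normalize the user's query exactly as the stored keys were normalized."""
    return sorted(index.get(normalize_strongs(query), set()))


# Toy data taken from the lemma attributes quoted above (Gen 2:24 is only a fragment).
verses = {
    "Gen.1.1": ["strong:H07225", "strong:H0430", "strong:H0853 strong:H01254",
                "strong:H08064", "strong:H0853", "strong:H0776"],
    "Gen.2.24": ["strong:H01"],
}
index = build_index(verses)
print(search(index, "H7225"))   # ['Gen.1.1']
print(search(index, "h07225"))  # ['Gen.1.1']
print(search(index, "H1"))      # ['Gen.2.24']
```

Because the same function produces both the stored keys and the query key, it no longer matters whether the module pads with zeros or the user types them.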

We also normalize the Strong's number when displaying it; there is no
sense in the user seeing the internal representation.
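
A small sketch of display-side normalization, assuming the user should see something like "H853" rather than "strong:H0853". The function name and the exact display form are hypothetical; JSword's actual presentation may differ.

```python
import re

# Assumed internal form: "strong:" + upper-case H or G + digits, possibly zero-padded.
# Lemma tokens that do not match this pattern are simply skipped.
_STRONGS_TOKEN = re.compile(r"strong:([HG])0*(\d+)")


def display_strongs(lemma: str) -> str:
    """Render a lemma attribute for the user, hiding the prefix and any leading zeros."""
    return ", ".join(m.group(1) + m.group(2) for m in _STRONGS_TOKEN.finditer(lemma))


print(display_strongs("strong:H0853 strong:H01254"))  # H853, H1254
```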

So, it seems to me that the question is: What is the proper way to fix 
the problem?

In Him,
