[sword-devel] can't do lucene Hebrew searches in KJV

Thu Jan 20 11:40:29 MST 2011

Historically, from the OLB days, Strong's Hebrew numbers were prefixed
with a '0' to disambiguate them from the Greek.  We started using G or H
for this purpose.  The KJV OT uses a source on which Larry Pierce does
not claim copyright (he claimed copyright on the NT, thus our KJV2003
effort) but which was produced initially for OLB.

I've not noticed this myself, since typically the frontend software I
work with provides Strongs searching as a clickable option, so the
correct key is simply pulled from the modules and works either way.  For
example:

http://crosswire.org/study/parallelstudy.jsp?del=all&add=KJV&add=WLC&add=NASB&add=LXX&key=Gen.1.1

Click on a word in any translation (except the WLC) and you'll be
offered a search option for the Strong's number.  Notice the difference
between the KJV and the NASB, which I'd never noticed before.

We should really normalize these numbers, presumably by removing the
legacy leading '0'.  We should probably also provide a way for the user
never to have to see a Strong's number unless they want to, as
mentioned in previous emails on this thread.  My current means of
selecting per word doesn't solve the problem of searching for multiple
words, so I'm open to suggestions.
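
As a rough illustration of the kind of normalization discussed in this
thread, here is a minimal Python sketch.  It is not SWORD or JSword
code: the names normalize_strongs and normalize_query are hypothetical,
and it assumes keys always carry a G or H prefix, optionally preceded
by "strong:" as in the KJV markup quoted below.  The same function
would have to be applied both when building the index and when reading
the user's request so that the two sides match.

```python
import re

# Hypothetical helper names; not part of the SWORD or JSword APIs.
# Assumes every key carries a G or H prefix, optionally preceded by
# "strong:" as in the KJV lemma attributes.
_STRONGS = re.compile(r"^(?:strong:)?([GH])0*(\d+)$", re.IGNORECASE)


def normalize_strongs(key: str) -> str:
    """Drop the 'strong:' prefix and any leading zeros: 'H07225' -> 'H7225'."""
    match = _STRONGS.match(key.strip())
    if match is None:
        raise ValueError(f"not a Strong's key: {key!r}")
    prefix, digits = match.groups()
    return f"{prefix.upper()}{digits}"


def normalize_query(query: str) -> list[str]:
    """Normalize each whitespace-separated key in a multi-word request."""
    return [normalize_strongs(token) for token in query.split()]


if __name__ == "__main__":
    print(normalize_strongs("strong:H07225"))
    print(normalize_query("H0853 strong:H01254"))
```

This only makes the stored and requested forms agree; it does not
address hiding the numbers from the user.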

Nice catch, Karl.

Troy

On 01/20/2011 05:30 PM, DM Smith wrote:
> On 01/20/2011 11:29 AM, Karl Kleinpaste wrote:
>> BTW, belated thanx to Nic for pointing that out for us.
>>
>> I have to note that the Strong's content isn't zero-prefixed so as to
>> generate exactly-5-digits entries, either.  Gen 1:1...
>>
>> |<w lemma="strong:H07225">In the beginning</w>  <w
>> | lemma="strong:H0430">God</w>  <w lemma="strong:H0853 strong:H01254"
>> | morph="strongMorph:TH8804">created</w>  <w lemma="strong:H08064">the
>> | heaven</w>  <w lemma="strong:H0853">and</w>  <w
>> | lemma="strong:H0776">the
>> | earth</w>.
>>
>> It's just an arbitrary, single, leading zero on all entries.  Even Gen
>> 2:24's use of H1 is encoded as H01.
>>
>> "sed -e 's/strong:H0/strong:H/g'" has a salutary and satisfying effect.
>> I've just replaced my KJV content with the result of doing so.  Much
>> nicer.
>>
>> Interesting that a similar encoding is not present for the NT Greek,
>> so no such fix is needed.
> 
> I find this interesting as the keeper of the KJV module.
> 
> Going back to the baseline of the current effort (i.e. the KJV2003
> project) the encoding has not changed.
> 
> Since this is the first that I have heard of the problem, I'm guessing
> that a change in the SWORD engine has produced a regression? Looking at
> the code, I don't see anything out of the ordinary. To search, the user
> has to supply the Strong's number exactly as it is in the module. It
> looks like it has been this way "forever".
> 
> For a search to work, the search request and the stored key need to be
> the same. In JSword, we satisfy this by normalizing the Strong's number
> when constructing the Lucene index. We normalize the user's request the
> same way.
> 
> When displaying the Strong's number we apply a normalization too.
> No sense in the user seeing the internal representation.
> 
> So, it seems to me that the question is: What is the proper way to fix
> the problem?
> 
> In Him,
>     DM