[jsword-devel] Term counts is double what it should be

Chris Burrell chris at burrell.me.uk
Thu Feb 7 11:38:22 MST 2013


Ah right - I'd just found that Gal.3.1 was returning the content of
Romans 15.4! Will try the pull request.

Chris



On 7 February 2013 18:25, DM Smith <dmsmith at crosswire.org> wrote:

> Just put up a pull request.
>
> The bug in the code was in !s.equals(termText). I also found inappropriate
> handling of errors.
>
> In Him,
> DM
>
> On Feb 7, 2013, at 12:33 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>
> I see you've raised a bug in JIRA... Any pointers on the bug?
> http://www.crosswire.org/tracker/browse/JS-243
>
> I can tell from Luke, that this is the content:
>
> G3588 G1063 G3745 G4270 G1519 G2251 G1319 G4270 G2443 G2192 G1223 G3588
> G5281 G2532 G3588 G3874 G3588 G1124 G2192 G3588 G1680
>
> which bears very little resemblance to what OsisUtil#getStrongsNumbers
> returns:
>
> G5599 G453 G1052 G5101 G940 G5209 G3982 G3361 G3982 G3588 G225 G2596 G3739
> G3788 G2424 G5547 G4270 G4717 G1722 G5213
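
To make the mismatch between the two lists concrete, they can be tallied and compared per Strong's number. The following is a minimal sketch in plain Java, with no Lucene or JSword dependency; the class `StrongsTally` and its methods are hypothetical names introduced only for illustration, and the two constants simply copy the lists quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of JSword or Lucene.
class StrongsTally {
    // The term list read from the index with Luke (copied from the message).
    static final String FROM_LUKE =
        "G3588 G1063 G3745 G4270 G1519 G2251 G1319 G4270 G2443 G2192 G1223 G3588 "
        + "G5281 G2532 G3588 G3874 G3588 G1124 G2192 G3588 G1680";

    // The list reported for OsisUtil#getStrongsNumbers (copied from the message).
    static final String FROM_OSIS_UTIL =
        "G5599 G453 G1052 G5101 G940 G5209 G3982 G3361 G3982 G3588 G225 G2596 G3739 "
        + "G3788 G2424 G5547 G4270 G4717 G1722 G5213";

    /** Counts how often each whitespace-separated token occurs. */
    static Map<String, Integer> tally(String numbers) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : numbers.trim().split("\\s+")) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the tokens that occur in both lists, in sorted order. */
    static Set<String> overlap(String a, String b) {
        Set<String> common = new TreeSet<>(tally(a).keySet());
        common.retainAll(tally(b).keySet());
        return common;
    }

    public static void main(String[] args) {
        System.out.println(tally(FROM_LUKE));
        System.out.println(overlap(FROM_LUKE, FROM_OSIS_UTIL));
    }
}
```

With these inputs the Luke list holds 21 tokens, G3588 occurs five times in it, and only G3588 and G4270 appear in both lists, which fits the two lists belonging to different verses.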
>
> Chris
>
>
>
> On 7 February 2013 14:52, DM Smith <dmsmith at crosswire.org> wrote:
>
>> There's a bug in StrongsNumberFilter. Looking at it now.
>> -- DM
>>
>> On Feb 7, 2013, at 9:03 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>>
>> Yes - it is double.
>>
>> Gal.3.1 gives me an explanation of. When I try with my own lucene
>> indexes, they look alright. However, the pattern is not consistent. The
>> counts are often correct. Haven't got an idea of proportion yet. (see
>> explanation in screenshot below)
>>
>> <image.png>
>>
>>
>> On 7 February 2013 14:01, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>> Does luke give you access to the counts? Is it double too? -- DM
>>>
>>> On Feb 7, 2013, at 8:22 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>
>>> No doubt that would cause issues too, but my case here is actually for
>>> most words, even those not split.
>>>
>>> I think a term vector allows you to store the position/offsets of the
>>> terms in each document, so that you can accurately work out where it was in
>>> the original sentence/verse even though you may not have the original
>>> stored any longer.
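
The idea of a term vector described above can be sketched without Lucene: for one document, map each term to the positions at which it occurs. This is only a conceptual illustration in plain Java. `TermVectorSketch` and `positions` are hypothetical names, not Lucene or JSword API, and the sample input is the first six Strong's numbers of Rev 22.5 in KJV word order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual stand-in for a per-document term vector; not the Lucene API.
class TermVectorSketch {
    /** Maps each term to the 0-based token positions at which it occurs. */
    static Map<String, List<Integer>> positions(String document) {
        Map<String, List<Integer>> vector = new LinkedHashMap<>();
        String[] tokens = document.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            vector.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> vector =
            positions("G2532 G2071 G3756 G3571 G1563 G2532");
        for (Map.Entry<String, List<Integer>> e : vector.entrySet()) {
            // The per-document frequency is simply the number of positions.
            System.out.println(e.getKey() + " freq=" + e.getValue().size()
                + " positions=" + e.getValue());
        }
    }
}
```

The frequency of a term falls out as the length of its position list, which is why positions are not strictly needed if only counts are wanted.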
>>>
>>> For the purpose of counts I don't think it's necessary, although I
>>> haven't tried without yet.
>>> Chris
>>>
>>>
>>>
>>> On 7 February 2013 13:12, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>> Not sure if this is the problem:
>>>> In the KJV, there are a lot of splits of Greek as it translates into
>>>> English.
>>>>
>>>> For example, in Rev 22.5 look at φωτιζει αυτους which translates
>>>> directly into English as "gives light to them", but is translated in the
>>>> KJV as "giveth them light", so "them" splits "giveth light":
>>>> <verse osisID="Rev.22.5" sID="Rev.22.5"/>
>>>> <w src="1" lemma="strong:G2532 tr:και" morph="robinson:CONJ">And</w>
>>>> <w src="4" lemma="strong:G2071 tr:εσται"
>>>> morph="robinson:V-FXI-3S">there shall be</w>
>>>> <w src="3" lemma="strong:G3756 tr:ουκ" morph="robinson:PRT-N">no</w>
>>>> <w src="2" lemma="strong:G3571 tr:νυξ" morph="robinson:N-NSF">night</w>
>>>> <w src="5" lemma="strong:G1563 tr:εκει" morph="robinson:ADV">there</w>;
>>>> <w src="6" lemma="strong:G2532 tr:και" morph="robinson:CONJ">and</w>
>>>> <w src="9" lemma="strong:G2192 tr:εχουσιν"
>>>> morph="robinson:V-PAI-3P">they</w>
>>>> <w src="7" lemma="strong:G5532 tr:χρειαν"
>>>> morph="robinson:N-ASF">need</w>
>>>> <w src="8" lemma="strong:G3756 tr:ουκ" morph="robinson:PRT-N">no</w>
>>>> <w src="10" lemma="strong:G3088 tr:λυχνου"
>>>> morph="robinson:N-GSM">candle</w>,
>>>> <w src="11" lemma="strong:G2532 tr:και"
>>>> morph="robinson:CONJ">neither</w>
>>>> <w src="12" lemma="strong:G5457 tr:φωτος"
>>>> morph="robinson:N-GSN">light</w>
>>>> <w src="13" lemma="strong:G2246 tr:ηλιου" morph="robinson:N-GSM">of the
>>>> sun</w>;
>>>> <w src="14" lemma="strong:G3754 tr:οτι" morph="robinson:CONJ">for</w>
>>>> <w src="15" lemma="strong:G2962 tr:κυριος" morph="robinson:N-NSM">the
>>>> Lord</w>
>>>> <w src="16 17" lemma="strong:G3588 strong:G2316 tr:ο tr:θεος"
>>>> morph="robinson:T-NSM robinson:N-NSM">God</w>
>>>> <w src="18" lemma="strong:G5461 tr:φωτιζει" morph="robinson:V-PAI-3S"
>>>> type="x-split-3868">giveth</w>
>>>> <w src="19" lemma="strong:G846 tr:αυτους"
>>>> morph="robinson:P-APM">them</w>
>>>> <w src="18" lemma="strong:G5461 tr:φωτιζει" morph="robinson:V-PAI-3S"
>>>> type="x-split-3868">light</w>:
>>>> <w src="20" lemma="strong:G2532 tr:και" morph="robinson:CONJ">and</w>
>>>> <w src="21" lemma="strong:G936 tr:βασιλευσουσιν"
>>>> morph="robinson:V-FAI-3P">they shall reign</w>
>>>> <w src="22" lemma="strong:G1519 tr:εις" morph="robinson:PREP">for</w>
>>>> <w src="23 24" lemma="strong:G3588 strong:G165 tr:τους tr:αιωνας"
>>>> morph="robinson:T-APM robinson:N-APM">ever</w>
>>>> <w src="25 26" lemma="strong:G3588 strong:G165 tr:των tr:αιωνων"
>>>> morph="robinson:T-GPM robinson:N-GPM">and ever</w>.
>>>> <milestone type="x-strongsMarkup" resp="pdy 2003-12-31-00:30"/>
>>>> <verse eID="Rev.22.5"/>
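
The split hypothesis can be illustrated with a small sketch: both halves of a split word carry the same src and lemma, so a naive count of `strong:` entries counts G5461 twice in this verse. The code below is plain Java using regular expressions, not JSword's indexing code. `SplitAwareStrongsCount` is a hypothetical name, the `tr:` values are left out of the sample to keep it ASCII, and keying repeats on the `src` value assumes the input is a single verse.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; JSword does not necessarily work this way.
class SplitAwareStrongsCount {
    private static final Pattern W_TAG = Pattern.compile("<w\\s+([^>]*)>");
    private static final Pattern SRC = Pattern.compile("src=\"([^\"]*)\"");
    private static final Pattern STRONG = Pattern.compile("strong:([GH]\\d+)");

    // Three <w> elements from Rev 22.5 (tr: values omitted).
    static final String SAMPLE =
        "<w src=\"18\" lemma=\"strong:G5461\" morph=\"robinson:V-PAI-3S\" "
        + "type=\"x-split-3868\">giveth</w>\n"
        + "<w src=\"19\" lemma=\"strong:G846\" morph=\"robinson:P-APM\">them</w>\n"
        + "<w src=\"18\" lemma=\"strong:G5461\" morph=\"robinson:V-PAI-3S\" "
        + "type=\"x-split-3868\">light</w>";

    /** Counts Strong's numbers; if mergeSplits is set, a repeated split piece is skipped. */
    static Map<String, Integer> count(String osis, boolean mergeSplits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Set<String> seenSplitSrc = new HashSet<>();
        Matcher w = W_TAG.matcher(osis);
        while (w.find()) {
            String attrs = w.group(1);
            Matcher src = SRC.matcher(attrs);
            String srcValue = src.find() ? src.group(1) : "";
            boolean isSplit = attrs.contains("type=\"x-split-");
            // Only the first piece of a split word is counted when merging.
            if (mergeSplits && isSplit && !seenSplitSrc.add(srcValue)) {
                continue;
            }
            Matcher strong = STRONG.matcher(attrs);
            while (strong.find()) {
                counts.merge(strong.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("naive:  " + count(SAMPLE, false));
        System.out.println("merged: " + count(SAMPLE, true));
    }
}
```

On the sample, the naive count gives G5461 twice and the merged count gives it once; G846 is counted once either way.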
>>>>
>>>> BTW, I'm not sure what a TermVector is nor how it would be used.
>>>>
>>>> In Him,
>>>>         DM
>>>>
>>>> On Feb 7, 2013, at 6:36 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>>
>>>> > Hi
>>>> >
>>>> > Using Luke, and my own code to look at the indexes created by JSword
>>>> shows that the term count is double what it should be...
>>>> >
>>>> > Any ideas why that might be? I can't quite follow the logic in
>>>> StrongAnalyser but I attempted to step through it in the debugger and it didn't
>>>> look like it was double counting. Might need to do that again.
>>>> >
>>>> > DM, haven't checked, but apparently the TermVector may not be what
>>>> I'm using..
>>>> >
>>>> > Chris
>>>> >
>>>> > _______________________________________________
>>>> > jsword-devel mailing list
>>>> > jsword-devel at crosswire.org
>>>> > http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>
>>>>
>>>
>>>
>>
>>
>
>