<div dir="ltr">No doubt that would cause issues too, but my case here applies to most words, even those that are not split.<div><br></div><div style>I think a term vector lets you store the positions/offsets of the terms in each document, so that you can accurately work out where each term was in the original sentence/verse even if the original text is no longer stored.</div>
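<div style><br></div><div style>As a rough illustration of what I mean by that, here is a minimal sketch in plain Python. It is not Lucene's or JSword's actual API: the function name build_term_vector and the simple word-splitting rule are assumptions made up for the example. It only shows the idea of keeping, per term, its token position and character offsets, from which a count falls out for free.</div>

```python
import re
from collections import defaultdict


def build_term_vector(text):
    """Map each lower-cased term to a list of (position, start, end) entries.

    position is the token index in the text; start/end are character offsets.
    Hypothetical helper for illustration only, not a Lucene call.
    """
    vector = defaultdict(list)
    for position, match in enumerate(re.finditer(r"\w+", text)):
        term = match.group().lower()
        vector[term].append((position, match.start(), match.end()))
    return dict(vector)


verse = "And there shall be no night there"
vector = build_term_vector(verse)

# The term frequency is simply the number of entries recorded for each term.
counts = {term: len(entries) for term, entries in vector.items()}

# The offsets let us find a term in the original text again.
position, start, end = vector["night"][0]
recovered = verse[start:end]
```

<div style>If only the counts are wanted, the positions and offsets are extra baggage, which is why I suspect the term vector is not needed for that purpose.</div>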
<div style><br></div><div style>For the purpose of counts I don&#39;t think it&#39;s necessary, although I haven&#39;t tried without one yet.</div><div style>Chris</div><div style><br></div></div><div class="gmail_extra"><br>
<br><div class="gmail_quote">On 7 February 2013 13:12, DM Smith <span dir="ltr">&lt;<a href="mailto:dmsmith@crosswire.org" target="_blank">dmsmith@crosswire.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Not sure if this is the problem:<br>
In the KJV, there are a lot of splits of Greek as it translates into English.<br>
<br>
For example, in Rev 22.5 look at φωτιζει αυτους which translates directly into English as &quot;gives light to them&quot;, but is translated in the KJV as &quot;giveth them light&quot;, so &quot;them&quot; splits &quot;giveth light&quot;:<br>

&lt;verse osisID=&quot;Rev.22.5&quot; sID=&quot;Rev.22.5&quot;/&gt;<br>
&lt;w src=&quot;1&quot; lemma=&quot;strong:G2532 tr:και&quot; morph=&quot;robinson:CONJ&quot;&gt;And&lt;/w&gt;<br>
&lt;w src=&quot;4&quot; lemma=&quot;strong:G2071 tr:εσται&quot; morph=&quot;robinson:V-FXI-3S&quot;&gt;there shall be&lt;/w&gt;<br>
&lt;w src=&quot;3&quot; lemma=&quot;strong:G3756 tr:ουκ&quot; morph=&quot;robinson:PRT-N&quot;&gt;no&lt;/w&gt;<br>
&lt;w src=&quot;2&quot; lemma=&quot;strong:G3571 tr:νυξ&quot; morph=&quot;robinson:N-NSF&quot;&gt;night&lt;/w&gt;<br>
&lt;w src=&quot;5&quot; lemma=&quot;strong:G1563 tr:εκει&quot; morph=&quot;robinson:ADV&quot;&gt;there&lt;/w&gt;;<br>
&lt;w src=&quot;6&quot; lemma=&quot;strong:G2532 tr:και&quot; morph=&quot;robinson:CONJ&quot;&gt;and&lt;/w&gt;<br>
&lt;w src=&quot;9&quot; lemma=&quot;strong:G2192 tr:εχουσιν&quot; morph=&quot;robinson:V-PAI-3P&quot;&gt;they&lt;/w&gt;<br>
&lt;w src=&quot;7&quot; lemma=&quot;strong:G5532 tr:χρειαν&quot; morph=&quot;robinson:N-ASF&quot;&gt;need&lt;/w&gt;<br>
&lt;w src=&quot;8&quot; lemma=&quot;strong:G3756 tr:ουκ&quot; morph=&quot;robinson:PRT-N&quot;&gt;no&lt;/w&gt;<br>
&lt;w src=&quot;10&quot; lemma=&quot;strong:G3088 tr:λυχνου&quot; morph=&quot;robinson:N-GSM&quot;&gt;candle&lt;/w&gt;,<br>
&lt;w src=&quot;11&quot; lemma=&quot;strong:G2532 tr:και&quot; morph=&quot;robinson:CONJ&quot;&gt;neither&lt;/w&gt;<br>
&lt;w src=&quot;12&quot; lemma=&quot;strong:G5457 tr:φωτος&quot; morph=&quot;robinson:N-GSN&quot;&gt;light&lt;/w&gt;<br>
&lt;w src=&quot;13&quot; lemma=&quot;strong:G2246 tr:ηλιου&quot; morph=&quot;robinson:N-GSM&quot;&gt;of the sun&lt;/w&gt;;<br>
&lt;w src=&quot;14&quot; lemma=&quot;strong:G3754 tr:οτι&quot; morph=&quot;robinson:CONJ&quot;&gt;for&lt;/w&gt;<br>
&lt;w src=&quot;15&quot; lemma=&quot;strong:G2962 tr:κυριος&quot; morph=&quot;robinson:N-NSM&quot;&gt;the Lord&lt;/w&gt;<br>
&lt;w src=&quot;16 17&quot; lemma=&quot;strong:G3588 strong:G2316 tr:ο tr:θεος&quot; morph=&quot;robinson:T-NSM robinson:N-NSM&quot;&gt;God&lt;/w&gt;<br>
&lt;w src=&quot;18&quot; lemma=&quot;strong:G5461 tr:φωτιζει&quot; morph=&quot;robinson:V-PAI-3S&quot; type=&quot;x-split-3868&quot;&gt;giveth&lt;/w&gt;<br>
&lt;w src=&quot;19&quot; lemma=&quot;strong:G846 tr:αυτους&quot; morph=&quot;robinson:P-APM&quot;&gt;them&lt;/w&gt;<br>
&lt;w src=&quot;18&quot; lemma=&quot;strong:G5461 tr:φωτιζει&quot; morph=&quot;robinson:V-PAI-3S&quot; type=&quot;x-split-3868&quot;&gt;light&lt;/w&gt;:<br>
&lt;w src=&quot;20&quot; lemma=&quot;strong:G2532 tr:και&quot; morph=&quot;robinson:CONJ&quot;&gt;and&lt;/w&gt;<br>
&lt;w src=&quot;21&quot; lemma=&quot;strong:G936 tr:βασιλευσουσιν&quot; morph=&quot;robinson:V-FAI-3P&quot;&gt;they shall reign&lt;/w&gt;<br>
&lt;w src=&quot;22&quot; lemma=&quot;strong:G1519 tr:εις&quot; morph=&quot;robinson:PREP&quot;&gt;for&lt;/w&gt;<br>
&lt;w src=&quot;23 24&quot; lemma=&quot;strong:G3588 strong:G165 tr:τους tr:αιωνας&quot; morph=&quot;robinson:T-APM robinson:N-APM&quot;&gt;ever&lt;/w&gt;<br>
&lt;w src=&quot;25 26&quot; lemma=&quot;strong:G3588 strong:G165 tr:των tr:αιωνων&quot; morph=&quot;robinson:T-GPM robinson:N-GPM&quot;&gt;and ever&lt;/w&gt;.<br>
&lt;milestone type=&quot;x-strongsMarkup&quot; resp=&quot;pdy 2003-12-31-00:30&quot;/&gt;<br>
&lt;verse eID=&quot;Rev.22.5&quot;/&gt;<br>
<br>
BTW, I&#39;m not sure what a TermVector is nor how it would be used.<br>
<br>
In Him,<br>
        DM<br>
<div><div class="h5"><br>
On Feb 7, 2013, at 6:36 AM, Chris Burrell &lt;<a href="mailto:chris@burrell.me.uk">chris@burrell.me.uk</a>&gt; wrote:<br>
<br>
&gt; Hi<br>
&gt;<br>
&gt; Using Luke and my own code to look at the indexes created by JSword shows that the term count is double what it should be...<br>
&gt;<br>
&gt; Any ideas why that might be? I can&#39;t quite follow the logic in StrongAnalyser, but I stepped through it in the debugger and it didn&#39;t look like it was double counting. I might need to do that again.<br>
&gt;<br>
&gt; DM, haven&#39;t checked, but apparently the TermVector may not be what I&#39;m using..<br>
&gt;<br>
&gt; Chris<br>
&gt;<br>
</div></div>
<br>
</blockquote></div><br></div>