<div dir="ltr">Sounds good. A few notes: <div style><ul style><li style>The STEP interlinear functionality tries (tried?) to use this to provide better interlinears. We currently don&#39;t use either x-split or src, but could use either. </li>

</ul><ul style><li style>With H00, it was accepted that H00 almost always referred to the next tag. When a word was triple tagged, it was the Strong&#39;s number next to H00 that was used; e.g. &quot;H00 H1 H2&quot; means that the next occurrence of H1 is split. Is there such a convention with x-split? (There aren&#39;t that many occurrences of this, but there are a few all the same.)</li>

</ul><ul style><li style>We were intending to include H00 in the ESV that Tyndale are tagging. If that&#39;s not to be the case, can we decide the most appropriate way to do this? src sounds good, although it sounds like it might be difficult to do properly for the KJV without a lot of manual work. We&#39;re hoping to have something ready-ish very soon.<br>

</li></ul><div style>How easy would it be to restore the data via x-split?</div><div style>Chris</div><div style><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 4 January 2013 21:57, DM Smith <span dir="ltr">&lt;<a href="mailto:dmsmith@crosswire.org" target="_blank">dmsmith@crosswire.org</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><div class="im"><div>On Jan 4, 2013, at 4:34 PM, Chris Burrell &lt;<a href="mailto:chris@burrell.me.uk" target="_blank">chris@burrell.me.uk</a>&gt; wrote:</div>

<br></div><div class="im"><blockquote type="cite"><div dir="ltr"><div>There are two separate issues here.</div><div><br></div><div>1- The fact that we retrieve the closest match to a strong number is IMHO rather obscure and confusing in itself. I&#39;ve hit this several times and found through rather laborious investigation that a module was using a bad strong number, or some piece of code hadn&#39;t quite formatted the number right, etc.</div>

</div></blockquote><div><br></div></div>This is a feature of a dictionary lookup. This will typically find the longest common prefix.</div><div><br></div><div>It&#39;d probably be good to mark some dictionaries as exact match only. Strong&#39;s, Robinson&#39;s, and maybe daily devotions seem like candidates.</div>
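<div>To illustrate the difference, here is a minimal sketch (not JSword code) contrasting a nearest-key lookup with an exact-only lookup. The class name, the key format and the tiny key list are all hypothetical, and &quot;nearest&quot; is assumed here to mean the next key in sort order, which may differ from what a real dictionary backend does.</div>

```java
import java.util.Arrays;

public class LookupDemo {
    // Hypothetical, sorted dictionary keys (not real module data).
    static final String[] KEYS = {"H0001", "H0002", "H0430", "H6779"};

    // Exact-only lookup: returns the key if present, otherwise null.
    static String exact(String key) {
        int idx = Arrays.binarySearch(KEYS, key);
        return idx >= 0 ? KEYS[idx] : null;
    }

    // Nearest lookup (assumed semantics): falls back to the next key in sort order,
    // or to the last key when the requested key sorts after everything.
    static String nearest(String key) {
        int idx = Arrays.binarySearch(KEYS, key);
        if (idx >= 0) {
            return KEYS[idx];
        }
        int insertionPoint = -idx - 1;
        return KEYS[Math.min(insertionPoint, KEYS.length - 1)];
    }

    public static void main(String[] args) {
        System.out.println("exact(H00)   = " + exact("H00"));
        System.out.println("nearest(H00) = " + nearest("H00"));
    }
}
```

<div>Under these assumptions an exact-only dictionary reports a miss for H00, while the nearest-key variant silently resolves it to a neighbouring entry.</div>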

<div><div class="im"><br><blockquote type="cite"><div dir="ltr">

<div><br></div>2- H00: The KJV is the most obvious example of a module that has/had it. It looks like someone has removed them all in the KJV2006 project (<a href="http://www.crosswire.org/~dmsmith/kjv2006/index.html" target="_blank">http://www.crosswire.org/~dmsmith/kjv2006/index.html</a>). Version 2.3 of the module still has it. Did we replace this with something else? H00 was used to indicate that the first occurrence of the strong number was the same original word as the second one. We were going to put them into the ESV. <div>


<br></div><div>So for example Gen 2.9, used to read something like this:</div><div><br><div><div>&lt;div&gt;&lt;title type=&quot;x-gen&quot;&gt;Genesis 2:9&lt;/title&gt;</div><div>&lt;verse osisID=&quot;Gen.2.9&quot;&gt;</div>


<div><span style="white-space:pre-wrap">        </span>&lt;w lemma=&quot;strong:H04480&quot;&gt;And out&lt;/w&gt; </div><div><span style="white-space:pre-wrap">        </span>&lt;w lemma=&quot;strong:H0127&quot;&gt;of the ground&lt;/w&gt; </div>


<div><span style="white-space:pre-wrap">        </span><b>&lt;w lemma=&quot;strong:H00 strong:H06779&quot;&gt;made&lt;/w&gt; </b></div><div><span style="white-space:pre-wrap">        </span>&lt;w lemma=&quot;strong:H03068&quot;&gt;the &lt;seg&gt;&lt;divineName&gt;Lord&lt;/divineName&gt;&lt;/seg&gt;&lt;/w&gt; </div>


<div><span style="white-space:pre-wrap">        </span><b>&lt;w lemma=&quot;strong:H0430&quot;&gt;God&lt;/w&gt; </b></div><div><span style="white-space:pre-wrap">        </span>&lt;w lemma=&quot;strong:H06779&quot; morph=&quot;strongMorph:TH8686&quot;&gt;to grow&lt;/w&gt; </div>


<div>         [ ... ... ... some more stuff goes here ... ... ...]</div><div>&lt;/verse&gt;&lt;/div&gt;<br></div></div></div><div><br></div><div>In the above, this indicates that the translators split the word H06779 into &quot;made&quot; and into &quot;to grow&quot;. </div>


<div><br></div><div>It seems someone has removed all of these marks. However we don&#39;t have the &quot;src&quot; tag either so can anyone suggest how I can tell which bits go together and which bits go apart? What was the reasoning behind this change?</div>

</div></blockquote><br></div>I maintain the KJV. I couldn&#39;t find a purpose for H00, so I took it out as being wrong. If it marks the splitting of words, we have a mechanism for that in the NT, which could be used. It uses src=&quot;XX&quot; (which for the NT ties back to the XX word in the verse in a particular Greek module), type=&quot;x-split&quot; and subType=&quot;x-NN&quot;, where NN is a unique number within the verse having a value greater than the greatest value of src=&quot;XX&quot;. I&#39;m not at all sure that subType is still needed. Each of src and type is sufficient on its own to solve the problem.</div>
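<div>For reference, the H00 convention as described in this thread could be detected along these lines. This is only a sketch: the class and method names are made up, and it assumes the lemma attribute is a whitespace-separated list in which strong:H00 directly precedes the number whose word is split.</div>

```java
public class SplitMarkerDemo {
    // Marker as it appears in the Gen 2:9 example quoted in this thread.
    static final String MARKER = "strong:H00";

    // Returns the lemma token that directly follows the H00 marker,
    // or null if there is no marker or the marker is the last token.
    static String splitTarget(String lemma) {
        String[] tokens = lemma.trim().split("\\s+");
        for (int i = 0; i != tokens.length - 1; i++) {
            if (MARKER.equals(tokens[i])) {
                return tokens[i + 1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(splitTarget("strong:H00 strong:H06779"));
        System.out.println(splitTarget("strong:H0430"));
    }
}
```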

<div><br></div><div>A bit more exploring to do on the KJV...<div class="im"><br><div><br><div></div></div><blockquote type="cite"><div dir="ltr">

<div><br></div><div>Chris</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 4 January 2013 21:07, DM Smith <span dir="ltr">&lt;<a href="mailto:dmsmith@crosswire.org" target="_blank">dmsmith@crosswire.org</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">H00 is not a valid Strong&#39;s number. The modules that have it should be re-done. Do you know which are the problem modules?<br>


<br>

The problem with allowing H00 is that it will not find an entry in a Strong&#39;s dictionary and will get the nearest one. Which is better? An error filling the console or confusing the user?<br>

<br>

I don&#39;t mind changing the regex to be simpler, but it should not create further problems.<br>

<br>

The part at the end is an optional extension. We have a module in the wings that has it.<br>
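<br>
As a quick check of how the two patterns in this thread treat H00, here is a small sketch. It assumes the whole string must match (Matcher.matches()), which may not be how StrongsNumber actually applies the pattern; the class name and the sample G1234!a (standing in for the optional extension) are hypothetical.<br>

```java
import java.util.regex.Pattern;

public class StrongsPatternDemo {
    // Pattern quoted in this thread as the one currently in use.
    static final Pattern CURRENT =
            Pattern.compile("([GgHh])0*([1-9][0-9]*)!?([A-Za-z]+)?");
    // Simpler pattern proposed in this thread.
    static final Pattern PROPOSED = Pattern.compile("[GgHh][0-9]+");

    // Full-string match against the current pattern (assumed usage).
    static boolean acceptedByCurrent(String s) {
        return CURRENT.matcher(s).matches();
    }

    // Full-string match against the proposed pattern (assumed usage).
    static boolean acceptedByProposed(String s) {
        return PROPOSED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // G1234!a is a made-up sample of a number with a trailing extension.
        String[] samples = {"H00", "H06779", "H0430", "G1234!a"};
        for (String s : samples) {
            System.out.println(s + " current=" + acceptedByCurrent(s)
                    + " proposed=" + acceptedByProposed(s));
        }
    }
}
```

With full-string matching, the current pattern rejects H00 because it requires a non-zero digit after the leading zeros, while the simpler pattern accepts H00 but no longer accepts the suffixed form.<br>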

<br>

In Him,<br>

        DM<br>

<div><div><br>

On Jan 4, 2013, at 3:34 PM, Chris Burrell &lt;<a href="mailto:chris@burrell.me.uk" target="_blank">chris@burrell.me.uk</a>&gt; wrote:<br>

<br>

&gt; Hi<br>

&gt;<br>

&gt; Can I suggest a fix to the StrongNumberFilter, which currently relies on<br>

&gt; org.crosswire.jsword.book.study.StrongsNumber<br>

&gt;<br>

&gt; The regular expression used to match the Strong number is:<br>

&gt; private static final Pattern STRONGS_PATTERN = Pattern.compile(&quot;([GgHh])0*([1-9][0-9]*)!?([A-Za-z]+)?&quot;);<br>

&gt;<br>

&gt; Unfortunately, some texts use H00 as a strong number to indicate that the tagged word is in 2 places (i.e. this is only the first part of the tag).<br>

&gt;<br>

&gt; The above expression causes huge amounts of logging to be output to the console.<br>

&gt;<br>

&gt; I suggest we change it to something like<br>

&gt;<br>

&gt; [GgHh][0-9]+<br>

&gt;<br>

&gt; Also, what&#39;s the stuff at the end of the regex? Haven&#39;t come across any like that...<br>

&gt;<br>

&gt; Chris<br>

&gt;<br>

</div></div>&gt; _______________________________________________<br>

&gt; jsword-devel mailing list<br>

&gt; <a href="mailto:jsword-devel@crosswire.org" target="_blank">jsword-devel@crosswire.org</a><br>

&gt; <a href="http://www.crosswire.org/mailman/listinfo/jsword-devel" target="_blank">http://www.crosswire.org/mailman/listinfo/jsword-devel</a><br>

<br>

</blockquote></div><br></div>

</blockquote></div></div><br></div></div></blockquote></div><br></div>