[sword-devel] KJV2006 and divineName

Fri Apr 21 04:38:04 MST 2006

Ted Walther wrote:
> On Sat, Mar 25, 2006 at 06:45:58PM -0700, Troy A. Griffitts wrote:
>> <w lemma="strong:1">word1 word2</w> <w lemma="strong:2">word3 word4 
>> word5</w> <w lemma="strong:3">word6</w> <w lemma="strong:4">word7 word8 
>> word8</w>
>>
>> Most printed Bibles with Strong's numbers merely insert numbers into
>> the text, implying the previous word or some number of words are related
>> to that number.  Our NT human tagging allowed us to be exact, even
>> non-contiguous.  We don't have this level of markup in the OT.
>>
>> Dude, I'm so excited about all this work you're putting into this data!
>> I'm sure so many projects (inside and outside of CrossWire) will be
>> blessed by this!
>
> Indeed.  I was just getting my project kicked off based on the KJV2003
> when I noticed some problems with the Strong's number markup in the OT.
> I don't really want to delay my project, but if KJV2006 is less than a
> few months away, I can wait.

The next beta release should be the last one with no changes until the 
final release. Look for an announcement of the final beta "any day now." 
You can get the current work at www.crosswire.org/~dmsmith/kjv2006.

When it is released really depends upon the ability of the Windows
version of SWORD to handle it. There are 4 software changes that need to
be made before it is released: 3 are in the SWORD API and one is in
osis2mod. These changes are being worked on; my guess is that they will
be completed after the Spring semester, which ends soon.

If these are not changed in the code, then I can use XSLT to transform
the master document into one that works around these problems.

>
> Really, I don't think the connective words like "And the" should be
> included as part of the Strong's number.  They should be outside.

The approach has been fairly simple. The KJV uses italics in the printed
copies to indicate what was added to the Greek or Hebrew. These are
marked with <transChange>And the</transChange>. The remainder of the
words are understood to be translated from the Greek and Hebrew. To
that extent they should be surrounded with Strong's numbers.
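
To make the distinction concrete, here is a minimal Python sketch that
separates added text from translated text in a fragment marked up this
way. The fragment is made up (placeholder words and numbers, in the
style of Troy's example), the <verse> wrapper and the classify() name
are hypothetical, and it assumes the elements carry no XML namespace,
which the real OSIS file may not satisfy.

```python
import xml.etree.ElementTree as ET

# Made-up fragment in the style of the markup discussed in this thread.
# The words and numbers are placeholders, not real KJV2006 data, and the
# <verse> wrapper is only there to give the parser a single root element.
FRAGMENT = (
    "<verse>"
    "<transChange>And the</transChange> "
    '<w lemma="strong:1">word1 word2</w> '
    '<w lemma="strong:2">word3</w>'
    "</verse>"
)


def classify(fragment):
    """Return (kind, text, lemma) tuples for each child element.

    kind is "added" for <transChange> (italic in print) and
    "translated" for <w> (carries a Strong's number in its lemma).
    """
    root = ET.fromstring(fragment)
    result = []
    for child in root:
        if child.tag == "w":
            result.append(("translated", child.text or "", child.get("lemma")))
        elif child.tag == "transChange":
            result.append(("added", child.text or "", None))
    return result


for kind, text, lemma in classify(FRAGMENT):
    print(kind, repr(text), lemma)
```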

There are some empty Strong's numbers, as not everything in the original
needed to be translated.

As Troy noted, the OT was programmatically tagged: the Strong's number
that fell at a point in the text was made to surround everything from
the previous number that was not italic.
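
A rough Python sketch of that rule, as I read it from the description
above and not taken from the actual tagging program. The token format
and the tag_ot() and render() names are hypothetical. One assumption is
mine alone: when italic text falls in the middle of a span, the sketch
splits the span and gives each non-italic piece the same number.

```python
def tag_ot(tokens):
    """Apply the "surround everything since the previous number" rule.

    tokens is a list of ("word", text, is_italic) or ("strong", number)
    in text order. Returns [kind, text, number] segments, where kind is
    "w" (tagged), "italic" (added text) or "plain" (no number followed).
    """
    out = []   # segments in text order
    start = 0  # index in `out` where the not-yet-tagged run begins
    for tok in tokens:
        if tok[0] == "word":
            _, text, is_italic = tok
            kind = "italic" if is_italic else "plain"
            if len(out) > start and out[-1][0] == kind:
                out[-1][1] += " " + text  # extend the current run
            else:
                out.append([kind, text, None])
        else:
            # A number tags every non-italic run since the previous number.
            for seg in out[start:]:
                if seg[0] == "plain":
                    seg[0], seg[2] = "w", tok[1]
            start = len(out)
    return out


def render(segments):
    """Render segments as markup (placeholder text, so no XML escaping)."""
    parts = []
    for kind, text, number in segments:
        if kind == "w":
            parts.append(f'<w lemma="strong:{number}">{text}</w>')
        elif kind == "italic":
            parts.append(f"<transChange>{text}</transChange>")
        else:
            parts.append(text)  # trailing words with no following number
    return " ".join(parts)


tokens = [
    ("word", "And", True), ("word", "the", True),
    ("word", "word1", False), ("word", "word2", False), ("strong", 1),
    ("word", "word3", False), ("strong", 2),
]
print(render(tag_ot(tokens)))
```

Because the rule looks only at position, any non-italic connective that
precedes a number ends up inside that number's span.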

The NT was not programmatically tagged but done by people using a
software tool. The result in the NT is that, verse by verse, every
Strong's number in the TR is present in the KJV NT. The empty ones are
not necessarily at a good location.
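
That verse-by-verse property can be checked mechanically. The sketch
below is only an illustration: the function names and the sample verse
are hypothetical, and it assumes a lemma attribute may hold several
space-separated "strong:" values and that an empty number appears as an
empty <w/> element.

```python
import re
from collections import Counter

LEMMA_RE = re.compile(r'lemma="([^"]*)"')
STRONG_RE = re.compile(r"strong:(\w+)")


def strongs_in(osis_verse):
    """Count the Strong's numbers found in the lemma attributes of a verse."""
    nums = Counter()
    for attr in LEMMA_RE.findall(osis_verse):
        nums.update(STRONG_RE.findall(attr))
    return nums


def missing_numbers(source_numbers, osis_verse):
    """Return the source-text numbers that the tagged verse lacks."""
    # Counter subtraction keeps only the positive differences.
    return Counter(source_numbers) - strongs_in(osis_verse)


# Placeholder verse: the last <w/> is an "empty" number with no English text.
verse = (
    '<w lemma="strong:1">word1</w> '
    '<w lemma="strong:2 strong:3">word2</w> '
    '<w lemma="strong:4"/>'
)
print(missing_numbers(["1", "2", "3", "4"], verse))
```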

All this to say, fixing the tagging is a manual, analytical exercise. It 
should be done, but is outside the scope of this effort.
>
>
> I've noticed some verses have the first word not surrounded by
> appropriate tags giving the strongs number, yet other words are labelled
> as "NIH" which is very convenient.  Could we have all words put inside
> tags like that for easier parsing?

At this time, the established goals of this effort have been reached. If
there are specific, identified mistakes, we can fix those up until the
release. However, there are other things that could be done that have
not been done. Any other changes will be for a following effort. If
there is a clear algorithm that can be applied, one that does not swap
one set of problems for another, I think it would make sense to make
that change.

I will be checking the final beta into SVN and then it will be open to
fixes such as these. When enough have accumulated, we can re-release.
(At least that is my thought.)