<html>

<body>

Dear Rob<br><br>

I've CC'd this to the Group so others can chip in. <br>

Here's a summary of where we are: <br><br>

We are trying to make a version of ESV with Strongs tagging, using the

tagged NASB text as a starting point. <br><br>

The process we are attempting is: <br><br>

1) convert the NASB XML text to something which looks like a BibleWorks

exported text <br>

  (i.e. each verse on one line, starting with a simple ref, e.g. Gen 1:1

In the beginning...)<br><br>

2) use the Word 2003+ text comparison tools (which are far superior to

those in Word 97) to compare the text of both versions, producing something like: 

<dl>

<dd>Gen 1:2  <w H776>The earth</w> was

<b><s>formless</s> </b><w H8414><b>formless</b></w> <w

H922>and void</w>, and <w H2822>darkness </w> <w

H5921>was over</w> the <w H6440><b><s>sur</s></b>face

</w> <w H8415>of the deep</w><b><s>, and</s> . And

</b><w H7307>the Spirit </w> <w H430>of God</w>

was <w H7363!b><b><s>moving</s> hovering </b></w> <w

H5921>over</w> the <w H6440><b><s>sur</s></b>face

</w> <w H4325>of the waters.  </w>.<br><br>

</dl>3) create a site where a human can easily correct this automatic

markup<br>

 - eg the proof of concept

<a href="http://www.slowley.com/tagger-proof-of-concept/example.html">

here</a>. <br><br>

4) merge the resultant text with the verb parsing in the tagged

KJV<br><br>
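A minimal sketch, in Python, of reading the one-verse-per-line tagged format used in the examples in this thread (a simple ref followed by w-tagged phrases). The names <code>parse_verse_line</code> and <code>TAG_RE</code> are made up for illustration, and the sketch assumes each verse is already on a single line with the Word comparison markup (the b and s tags) resolved. <br><br>

<pre>
```python
import re

# Matches one tagged phrase such as  <w H7363!b>hovering </w>.
# The optional "!b"-style suffix is kept as part of the Strong's tag.
TAG_RE = re.compile(r"<w\s+(H\d+(?:![a-z])?)>(.*?)</w>")


def parse_verse_line(line):
    """Split 'Gen 1:1  <w H7225>In the beginning,</w> ...' into a reference
    and a list of (text, strongs) pairs; untagged text gets strongs=None."""
    ref_match = re.match(r"(.+? \d+:\d+)\s+", line)
    if ref_match is None:
        raise ValueError("line does not start with a reference: %r" % line)
    ref = ref_match.group(1)
    rest = line[ref_match.end():]

    segments = []
    pos = 0
    for m in TAG_RE.finditer(rest):
        untagged = rest[pos:m.start()].strip()
        if untagged:
            segments.append((untagged, None))
        segments.append((m.group(2).strip(), m.group(1)))
        pos = m.end()
    tail = rest[pos:].strip()
    if tail:
        segments.append((tail, None))
    return ref, segments


if __name__ == "__main__":
    sample = ("Gen 1:1  <w H7225>In the beginning,</w> <w H430>God</w> "
              "<w H1254!a>created</w> <w H8064>the heavens</w> "
              "<w H776>and the earth </w>.")
    ref, segments = parse_verse_line(sample)
    print(ref)
    for text, strongs in segments:
        print(strongs, text)
```
</pre>

Untagged words and punctuation come back with <code>None</code> in place of a Strong's number, so a correction tool can show them as boxes that still need attention. <br><br>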

Since starting this, I've heard from Troy who originally organised the

team who tagged the NASB. He says his method is: <br><br>

<dl>

<dd>1) starts with a lemma tagged text, the KJV, and CrossWay's ESV data

in OSIS format.

<dd>2) the ESV module is iterated one verse at a time, and each verse is processed

as follows:

<dd>3) the OSIS markup is stripped from the ESV text and positioning

information is retained

<dd>4) a word table is built from the KJV text:

<dd>       KJV Word 1    |    Strongs #

<dd>       KJV Word 2    |    Strongs #

<dd>5) a second table is built from the ESV text:

<dd>       ESV Word 1    |

<dd>       ESV Word 2    |

<dd>6) these tables are passed to a function which is responsible solely

for the logic to fill in the second part of the second table with

Strong's numbers.

<dd>7) the returned table is used to reconstitute the OSIS tags to

the ESV text including word-level Strong's markup.
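<dd>(A minimal sketch of steps 4 to 6, in Python. This is not Troy's code: his message does not say what logic the fill function uses, so the matching rule here, where an ESV word takes the number of the first unused KJV word with the same spelling, is purely an assumption for illustration, as are the names <code>fill_strongs</code> and <code>norm</code>.)

<pre>
```python
def norm(word):
    """Lower-case a word and drop surrounding punctuation for comparison."""
    return word.strip(".,;:!?'\"").lower()


def fill_strongs(kjv_table, esv_words):
    """Return the ESV table as [(esv_word, strongs_or_None), ...].

    kjv_table -- list of (kjv_word, strongs) pairs (step 4)
    esv_words -- list of ESV words in verse order (step 5)

    Hypothetical rule: each ESV word takes the Strong's number of the
    first not-yet-used KJV word with the same normalised spelling.
    """
    used = [False] * len(kjv_table)
    filled = []
    for esv_word in esv_words:
        number = None
        for i, (kjv_word, strongs) in enumerate(kjv_table):
            if not used[i] and norm(kjv_word) == norm(esv_word):
                used[i] = True
                number = strongs
                break
        filled.append((esv_word, number))
    return filled


if __name__ == "__main__":
    # Illustrative Gen 1:1 data; only the tagged KJV words are listed.
    kjv = [("beginning", "H7225"), ("God", "H430"), ("created", "H1254"),
           ("heaven", "H8064"), ("earth", "H776")]
    esv = "In the beginning, God created the heavens and the earth.".split()
    for word, number in fill_strongs(kjv, esv):
        print(word, number)
```
</pre>

With this naive rule "heavens" stays untagged because the KJV has "heaven", which is the kind of gap a human corrector would then fill in.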

<dd>A screenshot of the community collaboration tool for the KJV Strongs

markup project is at

<a href="http://crosswire.org/sword/kjv2003/#ss">

http://crosswire.org/sword/kjv2003/#ss</a>

<dd>We're hoping to convert it to a web application instead of a

standalone Java GUI, but that hasn't happened yet.<br>

<dd>I'd love to work together on this effort.  Please keep me posted

on any progress and let me know if I can help in any way.<br>

<dd>Troy<br>

<br>

<br>

<br>

<br><br>

</dl>At 10:18 17/03/2011, Robert Slowley wrote:<br>

<blockquote type=cite class=cite cite="">So, presumably if you could

script it to break each chapter in to a<br>

separate file, do the comparisons, and then re-export as a single

file<br>

we could import that in to a tool like mine so a human could fix the<br>

errors and do the bits the auto-comparison failed to do.<br><br>

On Tue, Mar 15, 2011 at 8:19 AM, David Instone-Brewer<br>

<davidinstonebrewer@gmail.com> wrote:<br>

> From the automatic comparisons produced by Word, we get:<br>

><br>

> Gen 1:1  <w H7225>In the beginning,</w> <w

H430>God</w> <w<br>

> H1254!a>created</w> <w H8064>the heavens</w>

<w H776>and the earth </w>.<br>

> Gen 1:2  <w H776>The earth</w> was <w

H8414>without form</w> <w H922>and<br>

> void</w>, and <w H2822>darkness</w> <w

H5921>was over</w> the <w<br>

> H6440>face</w> <w H8415>of the deep</w>. And

<w H7307>the Spirit</w> <w<br>

> H430>of God</w> was <w H7363!b>hovering </w>

<w H5921>over</w> the <w<br>

> H6440>face</w> <w H4325>of the waters 

</w>.<br>

><br>

> - ie the first two verses are already perfectly tagged. In fact

there aren't<br>

> any problems in Gen.1 till we get to v.5:<br>

><br>

> Gen 1:5  <w H430>God</w> <w

H7121>called</w> <w H216>the light</w> <w<br>

> H3117>Day</w>, <w H2822>and the darkness</w>

<w H7121>he called</w> <w<br>

> H3915>Night.</w>. And <w H6153>there was

evening</w> <w H1242>and there was<br>

> morningthe first</w>, <w H259>one</w> <w

H3117>day</w>.<br>

><br>

> The problem is that Word gives up making these comparisons after a

few<br>

> chapters.<br>

> Some of these problems can be cleared up by macros.<br>

><br>

> David IB<br>

><br>

> At 00:43 15/03/2011, Robert Slowley wrote:<br>

><br>

>> I think I can produce a better text to produce something which

has less to<br>

>> correct.<br>

> What do you mean?<br>

><br>

>> It would be useful to have transliterated Hebrew and a

single-word meaning<br>

>> instead of the numbers.<br>

> I have an electronic copy of the stuff you get on popups on<br>

>

<a href="http://classic.net.bible.org/verse.php?search=Genesis%201:30&book=genesis&chapter=1&verse=30" eudora="autourl">

http://classic.net.bible.org/verse.php?search=Genesis%201:30&book=genesis&chapter=1&verse=30</a>

<br>

> for Strongs already - which I was planning to integrate. If the<br>

> numbers are replaced with 'transliterated Hebrew' or a

'single-word<br>

> meaning' what specifically would that mean?<br>

><br>

> For instance on<br>

>

<a href="http://classic.net.bible.org/verse.php?search=Genesis%201:30&book=genesis&chapter=1&verse=30" eudora="autourl">

http://classic.net.bible.org/verse.php?search=Genesis%201:30&book=genesis&chapter=1&verse=30</a>

<br>

> for the strongs reference h03651, which is the transliterated

hebrew,<br>

> and which is the single word meaning?<br>

><br>

>> It would be useful to divide the top line by the tagging, not by

any<br>

>> English<br>

>> parsing<br>

>>  eg Gen.1.30  || and to every thing (h3605 )||<br>

>>   instead of     || and to every

(h3605) ||  thing (h3605 ) ||<br>

> In the case of Genesis 1:30 the text behind it is:<br>

> NASB: ... <w H3605>and to every</w> <w

H3605>thing</w> ...<br>

><br>

> Presumably there is a reason for the text to have two separate sets

of<br>

> words both tagged individually with H3605? Or is it just a

markup<br>

> error?<br>

><br>

> Presumably in some cases words should be merged if they have

the<br>

> same strongs and are next to each other, but in other cases,

this<br>

> isn't the case, e.g. Isa 6:3<br>

>

<a href="http://classic.net.bible.org/verse.php?search=isa%206:3&book=isa&chapter=6&verse=3" eudora="autourl">

http://classic.net.bible.org/verse.php?search=isa%206:3&book=isa&chapter=6&verse=3</a>

<br>

><br>

> Has:<br>

><br>

> <w H6918>Holy</w>, <w H6918>Holy</w>, <w

H6918>Holy</w>, is the <w<br>

> H3068>Lord</w> <w H6635>of hosts</w><br>

><br>

> because the Hebrew has swdq repeated 3 times, and I assume that

the<br>

> reader who understands Strong's gets this indication by it

being<br>

> repeated rather than there being <w H6918>Holy, Holy,

Holy</w>. Is<br>

> that right?<br>

><br>

>> It might be better to have the bottom line with a separate box

for every<br>

>> word. Sometimes we will want to divide things up

differently<br>

> As I see it we have 'phrases' (a set of one or more words) which

may<br>

> have one or more strongs references. In some cases a set of words

will<br>

> have a shared strongs reference, but in other cases like Isa 6:3

sets<br>

> of contiguous words may have the same strongs references but still

be<br>

> separate 'phrases'. As I see it there's no way of automatically working

this<br>

> out.<br>

><br>

> What I was thinking was to have some algorithm that tries to<br>

> automatically map the NASB strongs annotations on to the ESV

text,<br>

> similar to what I have already crudely done here. That can either

try<br>

> to group things as the NASB does (where a set of contiguous

words<br>

> share a strongs reference), or do what I have done here (which

is<br>

> easier) which is to automatically group words in to a 'phrase'

where<br>

> they share the same strongs references.<br>
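<br>
The second, easier option described above (grouping adjacent words that share a Strong's reference into one phrase) might be sketched in Python as follows. The name <code>group_into_phrases</code> and the (word, strongs) pair layout are assumptions for illustration, not the tagger's actual code. <br><br>

<pre>
```python
from itertools import groupby


def group_into_phrases(tagged_words):
    """Merge runs of adjacent words that carry the same Strong's number.

    tagged_words -- list of (word, strongs_or_None) pairs in verse order.
    Untagged words (strongs is None) are left as separate one-word boxes.
    """
    phrases = []
    for strongs, run in groupby(tagged_words, key=lambda pair: pair[1]):
        words = [word for word, _ in run]
        if strongs is None:
            phrases.extend((word, None) for word in words)
        else:
            phrases.append((" ".join(words), strongs))
    return phrases


if __name__ == "__main__":
    gen_1_30 = [("and", "H3605"), ("to", "H3605"),
                ("every", "H3605"), ("thing", "H3605")]
    print(group_into_phrases(gen_1_30))

    isa_6_3 = [("Holy,", "H6918"), ("Holy,", "H6918"), ("Holy,", "H6918"),
               ("is", None), ("the", None), ("Lord", "H3068")]
    print(group_into_phrases(isa_6_3))
```
</pre>

On the Gen 1:30 words this gives the single phrase wanted, but on Isa 6:3 it also merges the three separate occurrences of "Holy" into one phrase, which is the case where a manual "demerge" is needed. <br>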

><br>

> Either way not all of the ESV can be automatically annotated in

this<br>

> way, the annotation will be wrong in some cases, and the

automated<br>

> grouping may be wrong in some cases. So I was thinking of making

the<br>

> interface such that once the automated grouping has been attempted

the<br>

> end user can click on a box which will make it selected, then click

on<br>

> the next box to the left or right (and so on), when this is done

a<br>

> button for "merging in to a phrase" would appear - then if

this is<br>

> clicked they would be made in to a phrase and could have their

strongs<br>

> references assigned. Alternatively clicking on a box that represents

a<br>

> phrase of one or more words will cause a "demerge" button

to appear<br>

> that will separate out all the words. This will allow the end user

to<br>

> handle both types of situation.<br>

><br>

> I also thought some sort of "This verse is tagged

correctly" button<br>

> would be good. In some cases the program will annotate everything,

but<br>

> it will still need to be checked by a human - and a human may

wish<br>

> their annotation to be checked by someone else for quality

purposes.<br>

> When a verse is marked as correct, it can have a tick or

something,<br>

> and there can be a page of "verses that need work" which

it would<br>

> automatically be removed from. Does that sound sensible?<br>

><br>

> We have easy access to the SBLGNT (with apparatus) and

Leningrad<br>

> Codex. Is it worthwhile including those for each verse? I don't

know<br>

> what process an annotator would go through, and what level of<br>

> knowledge of the original languages they would use.<br>

><br>

> I worked a bit today on tidying up the classes I've written,

and<br>

> improving the processing of the text (in the next few weeks I'll

send<br>

> you a list of the suspicious stuff I found while processing your

files<br>

> ;-) ). I'm away next week for my 1st year's anniversary holiday -

but<br>

> after that can start work on making this in to an actual web app

that<br>

> would be useful rather than a static web page demo of the sort

of<br>

> thing I had in mind.<br>

><br>

> Any thoughts / comments / ideas appreciated!<br>

><br>

> It'd probably be a good idea to see if we can improve the

automatic<br>

> annotation of the ESV from the NASB if we can, as any progress

made<br>

> here before people start manually annotating / checking will

reduce<br>

> the amount of man hours needed to complete the task.<br>

><br>

> -Rob<br>

> --<br>

>

<a href="http://www.slowley.com/" eudora="autourl">

http://www.slowley.com/</a><br>

><br>

> "On two occasions, I have been asked [by members of

Parliament],<br>

> 'Pray, Mr. Babbage, if you put into the machine wrong figures,

will<br>

> the right answers come out?' I am not able to rightly apprehend

the<br>

> kind of confusion of ideas that could provoke such a

question."<br>

> -- Charles Babbage (1791-1871)<br><br>

<br><br>


</blockquote></body>

</html>