<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

    <title></title>

  </head>

  <body text="#000000" bgcolor="#ffffff">

    Nice job guys.  Just a point of clarification:<br>

    <br>

    On 03/19/2011 01:04 PM, David Instone-Brewer wrote:

    <br>

    <br>

    > 4) merge the resultant text with the verb parsing in the tagged

    KJV<br>

    <br>

    I'm a bit confused about where the NASB and KJV come into play with
    your tagging efforts.<br>

    <br>

    <br>

    > Since starting this, I've heard from Troy who originally

    organised the

    team who tagged the NASB.<br>

    > He says his method is: <br>

    <br>

    We did not tag the NASB.  We tagged the KJV.  I would not use the

    NASB markup if I were doing this project, to avoid any copyright

    infringement of Lockman's data.<br>

    <br>

    <br>

    Troy<br>

    <br>

    <br>

    <br>

    <br>

    On 03/19/2011 09:54 PM, David Instone-Brewer wrote:

    <blockquote cite="mid:4d85261d.5f90d80a.73ae.3b8c@mx.google.com"

      type="cite">

      Dear Rob<br>

      <br>

      I've been doing some experiments with Gen.1 to work out a system.

      <br>

      I've found a method which works really well - almost the whole tagging of
      Gen.1 has been done correctly by automatic comparisons, and it has gone
      wrong in only a few verses. <br>

      I've tried using Stanford's parsing engine at

      <a moz-do-not-send="true"

        href="http://nlp.stanford.edu:8080/parser/">

        http://nlp.stanford.edu:8080/parser/</a> <br>

      but this didn't fix it. I've attached a file listing my

      experiments and

      their results. <br>

      <br>

      I think what would fix it is a semantic domain dictionary. What's

      happened is that the two versions are too different in v. 11: <br>

      <br>

      ESV: And God said, "Let the earth sprout vegetation, plants

      yielding seed, and fruit trees <b>bearing fruit in which is their

        seed,

        each according to its kind, on the earth</b>." And it was so.<br>

      NASB: Then God said, "Let the earth sprout vegetation: plants

      yielding seed, <i>and </i>fruit trees <b>on the earth bearing

        fruit after

        their kind with seed in them</b>"; and it was so.<br>

      <br>

      The change in order in the words in bold makes it too difficult

      for the

      comparison program to match things up. <br>

      <br>

      I think we will need humans at these points, but I think we can

      highlight

      the likely places where problems exist. <br>

      <br>
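      As a rough sketch of how the likely problem verses might be highlighted: score how well the two versions line up word for word, and flag any verse that falls below a cut-off. This assumes Python's standard difflib rather than the Word comparison, and the names words, similarity and flag_for_review, as well as the 0.8 threshold, are made up for illustration. <br>
      <br>
<pre>
```python
import difflib
import re


def words(text):
    """Split a verse into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def similarity(verse_a, verse_b):
    """Score from 0 to 1 for how well the two word sequences align in order."""
    matcher = difflib.SequenceMatcher(
        None, words(verse_a), words(verse_b), autojunk=False
    )
    return matcher.ratio()


def flag_for_review(verses, threshold=0.8):
    """Return the refs whose two versions align worse than the threshold.

    verses: iterable of (ref, esv_text, nasb_text) tuples.
    threshold: hypothetical cut-off; it would need tuning on real data.
    """
    return [ref for ref, esv, nasb in verses if similarity(esv, nasb) < threshold]


verses = [
    ("Gen 1:1",
     "In the beginning, God created the heavens and the earth.",
     "In the beginning God created the heavens and the earth."),
    ("Gen 1:11",
     'And God said, "Let the earth sprout vegetation, plants yielding seed, '
     "and fruit trees bearing fruit in which is their seed, each according "
     'to its kind, on the earth." And it was so.',
     'Then God said, "Let the earth sprout vegetation: plants yielding seed, '
     "and fruit trees on the earth bearing fruit after their kind with seed "
     'in them"; and it was so.'),
]

flagged = flag_for_review(verses)
print(flagged)
```
</pre>
      On these two verses only Gen 1:11 is flagged, because the reordered phrase breaks the alignment. The cut-off would need tuning against real data. <br>
      <br>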

      Tomorrow I'll have a go at producing the whole text of Genesis, so

      you

      have some data to play with<br>

      <br>

      David IB<br>

      <br>

      <br>

      ============<br>

      The process we are attempting is: <br>

      <br>

      1) convert the NASB XML text to something which looks like a

      BibleWorks

      exported text <br>

        (ie each verse on one line starting with a simple ref (eg Gen

      1:1

      In the beginning...))<br>

      <br>

      2) use the Word 2003+ text comparison tools (which are much

      superior to

      Word 97) to compare the text of both versions producing something

      like:

      <dl>

        <dd>Gen 1:2  <w H776>The earth</w> was

          <b><s>formless</s> </b><w H8414><b>formless</b></w>

          <w

          H922>and void</w>, and <w H2822>darkness

          </w> <w

          H5921>was over</w> the <w H6440><b><s>sur</s></b>face

</w>

          <w H8415>of the deep</w><b><s>, and</s> . And

          </b><w H7307>the Spirit </w> <w H430>of

          God</w>

          was <w H7363!b><b><s>moving</s> hovering </b></w>

          <w

          H5921>over</w> the <w H6440><b><s>sur</s></b>face

</w>

          <w H4325>of the waters.  </w>.<br>

          <br>

        </dd>

      </dl>

      3) create a site where humans can easily correct this automatic

      markup<br>

       - eg the proof of concept

      <a moz-do-not-send="true"

        href="http://www.slowley.com/tagger-proof-of-concept/example.html">

        here</a>. <br>

      <br>

      4) merge the resultant text with the verb parsing in the tagged

      KJV<br>

      <br>

      Since starting this, I've heard from Troy who originally organised

      the

      team who tagged the NASB. He says his method is: <br>

      <br>

      <dl>

        <dd>1) starts with a lemma tagged text, the KJV, and CrossWay's

          ESV data

          in OSIS format. </dd>

        <dd>2) the ESV module is iterated each verse at a time and is

          processed

          as such: </dd>

        <dd>3) the OSIS markup is stripped from the ESV text and

          positioning

          information is retained </dd>

        <dd>4) a word table is built from the KJV text: </dd>

        <dd>       KJV Word 1   

          |    Strongs # </dd>

        <dd>       KJV Word 2   

          |    Strongs #    </dd>

        <dd>5) a second table is built from the ESV text: </dd>

        <dd>       ESV Word 1    | </dd>

        <dd>       ESV Word 2    | </dd>

        <dd>6) these tables are passed to a function which is

          responsible solely

          for the logic to fill in the second part of the second table

          with

          Strong's numbers. </dd>

        <dd>7) the returned table is used to reconstitute the OSIS

          tags to

          the ESV text including word-level Strong's markup. </dd>

        <dd>A screenshot of the community collaboration tool for the
          KJV Strong's markup project is at

          <a moz-do-not-send="true"

            href="http://crosswire.org/sword/kjv2003/#ss">

            http://crosswire.org/sword/kjv2003/#ss</a> </dd>

        <dd>We're hoping to convert it to a web application instead of a

          standalone Java GUI, but that hasn't happened yet.

        </dd>

        <dd>I'd love to work together on this effort.  Please keep me

          posted

          on any progress and let me know if I can help in any way.

        </dd>

        <dd>Troy<br>

          <br>

          <br>

          <br>

          <br>

          <br>

        </dd>

      </dl>
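      A minimal sketch of steps 4 to 7, assuming Python and its standard difflib. Troy's description does not say how the fill-in function decides which Strong's number goes where, so the alignment below (copy the number across wherever the same word lines up in both versions) is only a stand-in, and fill_strongs, kjv_table and esv_words are hypothetical names. The Strong's numbers are the ones quoted for Gen 1:1 in this thread, spread over single words for illustration, not real KJV module data. <br>
      <br>
<pre>
```python
import difflib


def fill_strongs(kjv_table, esv_words):
    """Copy Strong's numbers onto ESV words that align with the same KJV word.

    kjv_table: list of (word, strongs) pairs for one verse.
    esv_words: list of ESV words for the same verse.
    Returns a list of (word, strongs_or_None) pairs for the ESV text.
    """
    kjv_lower = [word.lower() for word, _ in kjv_table]
    esv_lower = [word.lower() for word in esv_words]
    filled = [(word, None) for word in esv_words]
    matcher = difflib.SequenceMatcher(None, kjv_lower, esv_lower, autojunk=False)
    # Each matching block is a run of identical words in the same order.
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            strongs = kjv_table[block.a + offset][1]
            filled[block.b + offset] = (esv_words[block.b + offset], strongs)
    return filled


# Illustrative data only (see the note above).
kjv_table = [
    ("In", "H7225"), ("the", "H7225"), ("beginning", "H7225"),
    ("God", "H430"), ("created", "H1254"),
    ("the", "H8064"), ("heaven", "H8064"),
    ("and", "H776"), ("the", "H776"), ("earth", "H776"),
]
esv_words = "In the beginning God created the heavens and the earth".split()

esv_table = fill_strongs(kjv_table, esv_words)
for word, strongs in esv_table:
    print(word, strongs)
```
</pre>
      Here "heavens" is left untagged because the KJV word is "heaven". Gaps like that are what a stemmer, a synonym list or a human would have to fill. <br>
      <br>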

      At 10:18 17/03/2011, Robert Slowley wrote:<br>

      <blockquote type="cite" class="cite" cite="">So, presumably if you

        could

        script it to break each chapter in to a<br>

        separate file, do the comparisons, and then re-export as a

        single

        file<br>

        we could import that in to a tool like mine so a human could fix

        the<br>

        errors and do the bits the auto-comparison failed to do.<br>

        <br>
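        A rough sketch of that splitting and re-joining, assuming Python and BibleWorks-style lines such as "Gen 1:1 In the beginning...". The names split_by_chapter and rejoin are hypothetical, and the comparison step itself is left out. <br>
        <br>
<pre>
```python
import re

# Assumes one verse per line, starting with a simple ref such as "Gen 1:1".
VERSE_LINE = re.compile(r"^(.+?) (\d+):(\d+)\s+(.*)$")


def split_by_chapter(lines):
    """Group verse lines under (book, chapter) keys, keeping verse order."""
    chapters = {}
    for line in lines:
        match = VERSE_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or malformed lines
        key = (match.group(1), int(match.group(2)))
        chapters.setdefault(key, []).append(line.strip())
    return chapters


def rejoin(chapters):
    """Flatten the per-chapter groups back into a single list of lines."""
    return [line for key in chapters for line in chapters[key]]


lines = [
    "Gen 1:1 In the beginning, God created the heavens and the earth.",
    "Gen 1:2 The earth was without form and void...",
    "Gen 2:1 Thus the heavens and the earth were finished...",
]
chapters = split_by_chapter(lines)
for (book, chapter), verses in chapters.items():
    print(book, chapter, len(verses))
```
</pre>
        Each chapter's list could be written to its own file for comparison and the results passed back through rejoin for a single export. <br>
        <br>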

        On Tue, Mar 15, 2011 at 8:19 AM, David Instone-Brewer<br>

        <a class="moz-txt-link-rfc2396E" href="mailto:davidinstonebrewer@gmail.com"><davidinstonebrewer@gmail.com></a> wrote:<br>

        > From the automatic comparisons produced by Word, we get:<br>

        ><br>

        > Gen 1:1  <w H7225>In the beginning,</w> <w

        H430>God</w> <w<br>

        > H1254!a>created</w> <w H8064>the

        heavens</w>

        <w H776>and the earth </w>.<br>

        > Gen 1:2  <w H776>The earth</w> was <w

        H8414>without form</w> <w H922>and<br>

        > void</w>, and <w H2822>darkness</w> <w

        H5921>was over</w> the <w<br>

        > H6440>face</w> <w H8415>of the

        deep</w>. And

        <w H7307>the Spirit</w> <w<br>

        > H430>of God</w> was <w H7363!b>hovering

        </w>

        <w H5921>over</w> the <w<br>

        > H6440>face</w> <w H4325>of the waters 

        </w>.<br>

        ><br>

        > - ie the first two verses are already perfectly tagged. In

        fact

        there aren't<br>

        > any problems in Gen.1 till we get to v.5:<br>

        ><br>

        > Gen 1:5  <w H430>God</w> <w

        H7121>called</w> <w H216>the light</w>

        <w<br>

        > H3117>Day</w>, <w H2822>and the

        darkness</w>

        <w H7121>he called</w> <w<br>

        > H3915>Night.</w>. And <w H6153>there was

        evening</w> <w H1242>and there was<br>

        > morningthe first</w>, <w H259>one</w>

        <w

        H3117>day</w>.<br>

        ><br>

        > The problem is that Word gives up making these comparisons

        after a

        few<br>

        > chapters.<br>

        > Some of these problems can be cleared up by macros.<br>

        ><br>

        > David IB<br>

        ><br>

        > At 00:43 15/03/2011, Robert Slowley wrote:<br>

        ><br>

        >> I think I can produce a better text to produce

        something which

        has less to<br>

        >> correct.<br>

        > What do you mean?<br>

        ><br>

        >> It would be useful to have transliterated Hebrew and a

        single-word meaning<br>

        >> instead of the numbers.<br>

        > I have an electronic copy of the stuff you get on popups on<br>

        >

        <a moz-do-not-send="true"

href="http://classic.net.bible.org/verse.php?search=Genesis%201:30&book=genesis&chapter=1&verse=30"

          eudora="autourl">

http://classic.net.bible.org/verse.php?search=Genesis%201:30&book=genesis&chapter=1&verse=30</a>

        <br>

        > for Strongs already - which I was planning to integrate. If

        the<br>

        > numbers are replaced with 'transliterated Hebrew' or a

        'single-word<br>

        > meaning' what specifically would that mean?<br>

        ><br>

        > For instance on<br>

        >

        <a moz-do-not-send="true"

href="http://classic.net.bible.org/verse.php?search=Genesis%201:30&book=genesis&chapter=1&verse=30"

          eudora="autourl">

http://classic.net.bible.org/verse.php?search=Genesis%201:30&book=genesis&chapter=1&verse=30</a>

        <br>

        > for the strongs reference h03651, which is the

        transliterated

        hebrew,<br>

        > and which is the single word meaning?<br>

        ><br>

        >> It would be useful to divide the top line by the

        tagging, not by

        any<br>

        >> English<br>

        >> parsing<br>

        >>  eg Gen.1.30  || and to every thing (h3605 )||<br>

        >>   instead of     || and to every

        (h3605) ||  thing (h3605 ) ||<br>

        > In the case of Genesis 1:30 the text behind it is:<br>

        > NASB: ... <w H3605>and to every</w> <w

        H3605>thing</w> ...<br>

        ><br>

        > Presumably there is a reason for the text to have two

        separate sets

        of<br>

        > words both tagged individually with H3605? Or is it just a

        markup<br>

        > error?<br>

        ><br>

        > Presumably in some cases words should be merged if they

        have

        the<br>

        > same strongs and are next to each other, but in other

        cases,

        this<br>

        > isn't the case, e.g. Isa 6:3<br>

        >

        <a moz-do-not-send="true"

href="http://classic.net.bible.org/verse.php?search=isa%206:3&book=isa&chapter=6&verse=3"

          eudora="autourl">

http://classic.net.bible.org/verse.php?search=isa%206:3&book=isa&chapter=6&verse=3</a>

        <br>

        ><br>

        > Has:<br>

        ><br>

        > <w H6918>Holy</w>, <w

        H6918>Holy</w>, <w

        H6918>Holy</w>, is the <w<br>

        > H3068>Lord</w> <w H6635>of hosts</w><br>

        ><br>

        > because the Hebrew has swdq repeated 3 times, and I assume

        that

        the<br>

        > reader who understands Strong's gets this indication by it

        being<br>

        > repeated rather than there being <w H6918>Holy, Holy,

        Holy</w>. Is<br>

        > that right?<br>

        ><br>

        >> It might be better to have the bottom line with a

        separate box

        for every<br>

        >> word. Sometimes we will want to divide things up

        differently<br>

        > As I see it we have 'phrases' (a set of one or more words)

        which

        may<br>

        > have one or more strongs references. In some cases a set of

        words

        will<br>

        > have a shared strongs reference, but in other cases like

        Isa 6:3

        sets<br>

        > of contiguous words may have the same strongs references

        but still

        be<br>

        > separate 'phrases'. As I see it there's no way of automatically

        working

        this<br>

        > out.<br>

        ><br>

        > What I was thinking was to have some algorithm that tries

        to<br>

        > automatically map the NASB strongs annotations on to the

        ESV

        text,<br>

        > similar to what I have already crudely done here. That can

        either

        try<br>

        > to group things as the NASB does (where a set of contiguous

        words<br>

        > share a strongs reference), or do what I have done here

        (which

        is<br>

        > easier) which is to automatically group words in to a

        'phrase'

        where<br>

        > they share the same strongs references.<br>

        ><br>
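        A minimal sketch of that easier grouping, assuming Python and a verse already reduced to (word, Strong's) pairs with punctuation dropped. group_phrases is a hypothetical name, and untagged words carry None. <br>
        <br>
<pre>
```python
from itertools import groupby


def group_phrases(tagged_words):
    """Merge runs of adjacent words that share a Strong's number into phrases.

    tagged_words: list of (word, strongs_or_None) pairs in verse order.
    Returns a list of (phrase, strongs_or_None) pairs.
    """
    phrases = []
    for strongs, run in groupby(tagged_words, key=lambda pair: pair[1]):
        phrases.append((" ".join(word for word, _ in run), strongs))
    return phrases


# The two cases discussed in this thread.
gen_1_30 = [("and", "H3605"), ("to", "H3605"), ("every", "H3605"),
            ("thing", "H3605")]
isa_6_3 = [("Holy", "H6918"), ("Holy", "H6918"), ("Holy", "H6918"),
           ("is", None), ("the", None),
           ("Lord", "H3068"), ("of", "H6635"), ("hosts", "H6635")]

print(group_phrases(gen_1_30))
print(group_phrases(isa_6_3))
```
</pre>
        Gen 1:30 comes out as the single phrase wanted, but Isa 6:3 collapses into one "Holy Holy Holy" phrase, which is the kind of case a human would have to split apart again. <br>
        <br>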

        > Either way not all of the ESV can be automatically

        annotated in

        this<br>

        > way, the annotation will be wrong in some cases, and the

        automated<br>

        > grouping may be wrong in some cases. So I was thinking of

        making

        the<br>

        > interface such that once the automated grouping has been

        attempted

        the<br>

        > end user can click on a box which will make it selected,

        then click

        on<br>

        > the next box to the left or right (and so on), when this is

        done

        a<br>

        > button for "merging in to a phrase" would appear - then if

        this is<br>

        > clicked they would be made in to a phrase and could have

        their

        strongs<br>

        > references assigned. Alternatively clicking on a box that

        represents

        a<br>

        > phrase of one or more words will cause a "demerge" button

        to appear<br>

        > that will separate out all the words. This will allow the

        end user

        to<br>

        > handle both types of situation.<br>

        ><br>

        > I also thought some sort of "This verse is tagged

        correctly" button<br>

        > would be good. In some cases the program will annotate

        everything,

        but<br>

        > it will still need to be checked by a human - and a human

        may

        wish<br>

        > their annotation to be checked by someone else for quality

        purposes.<br>

        > When a verse is marked as correct, it can have a tick or

        something,<br>

        > and there can be a page of "verses that need work" which

        it would<br>

        > automatically be removed from. Does that sound sensible?<br>

        ><br>

        > We have easy access to the SBLGNT (with apparatus) and

        Leningrad<br>

        > Codex. Is it worthwhile including those for each verse? I

        don't

        know<br>

        > what process an annotator would go through, and what level

        of<br>

        > knowledge of the original languages they would use.<br>

        ><br>

        > I worked a bit today on tidying up the classes I've

        written,

        and<br>

        > improving the processing of the text (in the next few weeks

        I'll

        send<br>

        > you a list of the suspicious stuff I found while processing

        your

        files<br>

        > ;-) ). I'm away next week for my 1st year's anniversary

        holiday -

        but<br>

        > after that can start work on making this in to an actual

        web app

        that<br>

        > would be useful rather than a static web page demo of the

        sort

        of<br>

        > thing I had in mind.<br>

        ><br>

        > Any thoughts / comments / ideas appreciated!<br>

        ><br>

        > It'd probably be a good idea to see if we can improve the

        automatic<br>

        > annotation of the ESV from the NASB if we can, as any

        progress

        made<br>

        > here before people start manually annotating / checking

        will

        reduce<br>

        > the amount of man hours needed to complete the task.<br>

        ><br>

        > -Rob<br>

        > --<br>

        >

        <a moz-do-not-send="true" href="http://www.slowley.com/"

          eudora="autourl">

          http://www.slowley.com/</a><br>

        ><br>

        > "On two occasions, I have been asked [by members of

        Parliament],<br>

        > 'Pray, Mr. Babbage, if you put into the machine wrong

        figures,

        will<br>

        > the right answers come out?' I am not able to rightly

        apprehend

        the<br>

        > kind of confusion of ideas that could provoke such a

        question."<br>

        > -- Charles Babbage (1791-1871)<br>

        <br>

        <br>

        <br>

        -- <br>

        <a moz-do-not-send="true" href="http://www.slowley.com/"

          eudora="autourl">

          http://www.slowley.com/</a><br>

        </blockquote>

      <pre wrap="">


_______________________________________________

tyndale-devel mailing list

<a class="moz-txt-link-abbreviated" href="mailto:tyndale-devel@crosswire.org">tyndale-devel@crosswire.org</a>

<a class="moz-txt-link-freetext" href="http://www.crosswire.org/mailman/listinfo/tyndale-devel">http://www.crosswire.org/mailman/listinfo/tyndale-devel</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>