[sword-devel] ESV Markup Challenge

Greg Hellings greg.hellings at gmail.com
Thu Sep 11 17:56:01 MST 2008

Sorry, I attached a version of the tarball that had the executable in
it and the list moderation caught it.  Here's the cleaned version.
See the detailed summary below.


On Thu, Sep 11, 2008 at 7:52 PM, Greg Hellings <greg.hellings at gmail.com> wrote:
> Troy,
> The task that I'm currently working on as research for my dissertation
> can possibly be leveraged.  We are attempting to sort out image
> annotations (in an effort to learn how to automatically create them).
> As such, we are given a list of terms which annotate the contents of
> an image - but we want to know how similar the semantics of some of
> the terms are.  Here is where I think parallels can be drawn:
> We use established semantic relatedness measurement techniques (see
> wn-similarity.sourceforge.net for some of the best tools currently
> available for that) to construct a graph connecting each term with all
> the other annotating terms, where the edge weight of the graph is the
> average of all of the semantic measures that the WordNet::Similarity
> package returns (in time we will take a weighted
> average with all the values normalized between [0..1], since some
> measures only scale from [0..1/2] and others can take values up to
> 16,000 and more).  We then do some strange graph partitioning tricks,
> etc -- that's someone else's domain.
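A rough Python sketch of the edge-weight idea above, in case it helps: each raw score is scaled into [0..1] by its measure's upper bound, and the edge weight is the (optionally weighted) average. The measure names and upper bounds are placeholders I made up for illustration, not values taken from WordNet::Similarity.

```python
def normalize(score, upper_bound):
    """Scale a raw score into [0, 1], given an assumed upper bound for its measure."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    return min(max(score / upper_bound, 0.0), 1.0)


def edge_weight(scores, upper_bounds, weights=None):
    """Weighted average of the normalized scores for one pair of terms.

    scores, upper_bounds and weights are dicts keyed by measure name.
    With weights=None every measure counts equally (a plain average).
    """
    if not scores:
        raise ValueError("need at least one measure")
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    weighted = sum(
        weights[name] * normalize(scores[name], upper_bounds[name])
        for name in scores
    )
    return weighted / total


# Hypothetical measures: one on [0..1], one on [0..1/2], one reaching 16,000.
UPPER_BOUNDS = {"measure_a": 1.0, "measure_b": 0.5, "measure_c": 16000.0}
example = edge_weight(
    {"measure_a": 0.5, "measure_b": 0.25, "measure_c": 8000.0}, UPPER_BOUNDS
)
```

Clamping to [0..1] guards against a score that exceeds its assumed bound; a measure with no real upper limit would need a different scaling.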
> However, you could possibly utilize the following modification of the
> technique.  For each term in the ESV, find the similarity between it
> and every term in the KJV.  If they are identical, set the value to 1,
> otherwise, use the WordNet::Similarity tools to produce a value.  Then
> weight the value of the link by their relative positions in the text
> (that way two occurrences of the same term can be differentiated), for
> example, divide by 1 + abs(position(ESV) - position(KJV)) (the 1 avoids
> dividing by zero when the two positions coincide) or something
> similar.  Then assign the value for each term based on the word that
> it most closely resembles.
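A minimal Python sketch of the position-weighted matching described above. It is not the code in esvtag.cpp: the function names are mine, the comparison ignores case as an assumption, the semantic measure is passed in as a plain callable standing in for WordNet::Similarity, and the denominator has 1 added so that words at the same position do not divide by zero.

```python
def term_similarity(a, b, semantic=None):
    """Return 1.0 for identical terms, otherwise the semantic score (0.0 if none given)."""
    if a.lower() == b.lower():  # assumption: case-insensitive identity
        return 1.0
    if semantic is None:
        return 0.0
    return semantic(a, b)


def best_matches(esv_words, kjv_words, semantic=None):
    """Map each ESV word index to the KJV word index it most closely resembles.

    Each similarity is divided by 1 + |i - j|, so nearer occurrences of a
    repeated word win. ESV words with no positive score are left unmapped.
    """
    matches = {}
    for i, esv in enumerate(esv_words):
        best_j, best_score = None, 0.0
        for j, kjv in enumerate(kjv_words):
            score = term_similarity(esv, kjv, semantic) / (1 + abs(i - j))
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matches[i] = best_j
    return matches


esv = "In the beginning God created the heavens and the earth".split()
kjv = "In the beginning God created the heaven and the earth".split()
matches = best_matches(esv, kjv)
```

With no semantic measure supplied, "heavens" stays unmapped because nothing in the KJV list is identical to it; each of the three occurrences of "the" maps to the occurrence at the same position.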
> This is very similar to what you're already doing, but not identical.
> I have modified the esvtag.cpp to use the included similarity.py to
> get the semantic distance from a few of the metrics that
> WordNet::Similarity uses (however, it scrapes a webpage to do so - you
> will do better, if you decide to use this system, to install the local
> Perl data and run the system locally) whenever the terms are not
> identical.  It still works for Gen 1:1, but the program then pegs out
> my processor and does not appear to have any intention of completing Gen
> 1:2 -- I don't know where the fault for that lies, but it does that
> both in your original version and in this version.  Obviously, the
> weighting I proposed would work best when the version being used
> maintains very similar phrase ordering and structuring to the KJV, but
> I suppose any metric we use will require human supervision anyway.
> As a bonus, I also have it sticking contiguous terms which are part of
> the same source -- "In the beginning" -- into the same <w> tag.
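The merging of contiguous terms can be sketched in Python roughly as follows. This is an illustration only, not the logic in esvtag.cpp: the input is assumed to be a list of (word, source) pairs, and the source ids and the lemma attribute value are hypothetical placeholders rather than real KJV markup.

```python
from itertools import groupby
from xml.sax.saxutils import escape, quoteattr


def merge_contiguous(tagged_words):
    """Wrap each run of adjacent words that share a source in a single <w> tag.

    tagged_words is a list of (word, source) pairs; source is a string id,
    or None for a word with no known source (left untagged).
    """
    pieces = []
    for source, group in groupby(tagged_words, key=lambda pair: pair[1]):
        text = escape(" ".join(word for word, _ in group))
        if source is None:
            pieces.append(text)
        else:
            pieces.append("<w lemma={}>{}</w>".format(quoteattr(source), text))
    return " ".join(pieces)


merged = merge_contiguous([
    ("In", "src1"), ("the", "src1"), ("beginning", "src1"),
    ("God", "src2"), ("created", "src3"),
])
```

groupby only groups adjacent items, so two separate occurrences of the same source that are not next to each other keep their own tags.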
> --Greg
> P.S. The attached tarball will clobber any current esvtag directory
> that's a child of where you unpack it - so be careful about that.
> On Thu, Sep 11, 2008 at 4:02 PM, Troy A. Griffitts <scribe at crosswire.org> wrote:
>> Hey guys.  I have a fun and useful challenge for anyone wishing to show off
>> their prowess at problem solving and basic world domination.
>> We have morphological data for the KJV.  Lots of work by many people went
>> into this data, to mark up each English word in the Bible text to the
>> corresponding Hebrew or Greek word in the original text.
>> We have many other Bibles with /similar/ wording to the KJV which are not
>> yet marked up.
>> Lane Dennis from Crossway (ESV publishers) is here at Tyndale House visiting
>> and we've talked in the past about helping them mark up their ESV text to the
>> original.
>> I have done most all of the grunt work for you!
>> Attached is source for a program which attempts to insert <w> markup into
>> the ESV markup using the KJV data.
>> It is HEAVILY commented, requires latest SVN of the SWORD engine INSTALLED
>> on your system, both the KJV and ESV modules INSTALLED, and has a nice
>> little method:
>> void matchWords(...)
>> where you're given:
>> a word list from ESV
>> a word list from KJV
>> a map from KJV word to an XMLTag "<w...>"
>> and all you have to do is fill out the equivalent:
>> map from ESV word to an XMLTag.
>> As a sample, it currently has a really silly algorithm that actually works for
>> Gen.1.1, so you have an example of the work you need to do.
>> All you have to do is add the real magic that figures out which words in the
>> ESV map to which words in the KJV (well, you get the idea).
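For anyone who wants to try ideas outside the C++ program, the shape of the task can be mimicked in Python. This is only a stand-in for matchWords(...), whose real signature is elided above: the function name is made up, the maps are assumed to be keyed by word position (since words repeat), and tags are plain strings here rather than XMLTag objects. The baseline simply hands each ESV word the tag of the first unused identical KJV word.

```python
def fill_esv_tags(esv_words, kjv_words, kjv_tags):
    """Baseline: copy the tag of the first unused identical KJV word.

    kjv_tags maps a KJV word position to its tag string; the result maps
    ESV word positions to tag strings. Unmatched ESV words are omitted.
    """
    used = set()
    esv_tags = {}
    for i, word in enumerate(esv_words):
        for j, other in enumerate(kjv_words):
            if j in used or j not in kjv_tags:
                continue
            if word.lower() == other.lower():  # assumption: ignore case
                esv_tags[i] = kjv_tags[j]
                used.add(j)
                break
    return esv_tags


esv = "In the beginning God created the heavens and the earth".split()
kjv = "In the beginning God created the heaven and the earth".split()
# Placeholder tags, one per KJV position; not real KJV markup.
kjv_tags = {j: "<w n=\"{}\">".format(j) for j in range(len(kjv))}
esv_tags = fill_esv_tags(esv, kjv, kjv_tags)
```

Marking KJV positions as used keeps repeated words such as "the" from all taking the same tag, but it still depends on both texts keeping the same word order.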
>> Have fun!  And I'm sure you can see where this is going and how useful it
>> can be for future work!
>>        -Troy.
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
-------------- next part --------------
A non-text attachment was scrubbed...
Name: esvtag.tar.gz
Type: application/x-gzip
Size: 4978 bytes
Desc: not available
Url : http://www.crosswire.org/pipermail/sword-devel/attachments/20080911/1ec2ef69/attachment.gz 