Posted by ChuckMcKnight at Feb 12, 2010 9:15:50 PM
cross-comparing parallel Bible tool idea
Whenever I do an in-depth study of any portion of the Bible, I like to compare multiple different versions. The best way to do this is through a parallel bible tool. The problem is, none of them that I've found seem to do it very well.

Xiphos comes pretty close, in that it has the verses stacked right on top of each other, so you can pretty much look between the verses and see the differences, but it could be so much better.

On the other hand, BibleDesktop has a pretty neat comparison feature that actually marks the differences, but it can only do two versions in one comparison, and it's not the most easily readable solution.

So, I brainstormed for a while, and I believe I have found a way of displaying the verses that makes it really easy for cross-comparisons.

You can view my sample mock-up here.

It would work by analyzing the different versions word-by-word, finding which words match, and lining all the words up truly parallel to each other. It would then change the color of words that differ so they stand out. Words used by the majority of the selected versions would simply remain black.

It should be able to do other little things like pointing out capitalization or punctuation differences as well.

Preferably, the versions should be selectable and arrangeable by the user, as opposed to most other parallel tools that simply grab all the installed modules and order them alphabetically. Also, the colors should be user-selectable.

This is my concept. The problem is, while I am a computer guy, I'm not really much of a programmer. I can do some very basic stuff, but a project like this would be quite a bit beyond me.

So, if there's any interest, I would have to ask for some developers here to work on it.

This could be done either as a stand-alone program (preferred) or as an add-on to one of the other Sword-based programs.

BTW, the fact that the mock-up is html does not mean I think it should be a browser-based program. That was just the easiest way for me to demonstrate it.

Anywho, that's the idea. I think it would be an awesome tool if anyone would be interested in developing it. Feel free to ask any questions you might have.

Thanks for any help you might be able to offer!

Posted by mdbergmann at Feb 14, 2010 11:59:10 AM
Re: cross-comparing parallel Bible tool idea
I think that's a good idea.
It could be done for example as a library/framework which can be reused in a stand-alone program or integrated in our bible tools.

How far could you go in building an algorithm that would do this?


Manfred

Posted by ChuckMcKnight at Feb 14, 2010 1:26:34 PM
Re: cross-comparing parallel Bible tool idea
If by building an algorithm you just mean specifying the exact rules to arrange the words, I should be able to do all that pretty well; however, I don't know any C++ (I believe that is the language Sword programs are written in), so I would have to basically do it in pseudo-code and let someone else actually code it.

Posted by mdbergmann at Feb 15, 2010 1:57:07 AM
Re: cross-comparing parallel Bible tool idea
Yeah, pseudo-code would be totally fine I guess.
We can then code it in whichever language.

The SWORD library/engine is written in C++. The front-ends that use this library, however, are written in various languages (C, C++, Objective-C, Python, maybe others). We also have a Java-based SWORD engine called JSword, whose best-known front-ends are BibleDesktop and Alkitab.


Manfred

Posted by jonmmorgan at Feb 15, 2010 4:49:48 AM
Re: cross-comparing parallel Bible tool idea
It's certainly an interesting idea, and you look like you have thought it out. I have entered it in the BPBible issue tracker as a reminder to myself with comments, though as it's marked Wish List it is really not going to happen anytime soon and may well never happen (http://code.google.com/p/bpbible/issues/detail?id=122).

A few comments:
1. I'm not sure how well this layout would work if any of the lines of text had to be wrapped, since it relies on comparing lines above and below. A lot of people, including me, would only have verse comparison taking up a small part of the screen, and so this would happen frequently.

2. I'm not sure how well any version comparison works if the texts are not from a reasonably similar textual tradition. I imagine for example that RSV and ESV would work quite well (as the AV and RV work in the traditional Interlinear Bible). ESV to KJV would probably work OK too. Once you have very few similar words to link it is unlikely to be very helpful (reading the full text of verses stacked as in Xiphos would be more helpful). I'm not sure how well the NET would compare with the ESV, for example, and as for comparing the Message with almost anything else it's not even worth thinking about.

3. The file you link to appears to have disappeared. Is this intentional?

4. Another possibly relevant suggestion is to borrow the style of the 26 Version New Testament, which works on a phrase-by-phrase basis and selects readings from a few versions that it thinks give the most significant variations. This is not as complete as actually comparing all 26 versions, but it is probably easier to read and more useful as well. However, I'm not sure how well you could determine significant variations of wording in software, so I suspect such a thing would have to be done by hand (and it would probably have considerable copyright implications and require lots of publisher permission if you chose to include copyrighted versions in the comparison).

Posted by ChuckMcKnight at Feb 15, 2010 5:54:28 AM
Re: cross-comparing parallel Bible tool idea
> It's certainly an interesting idea, and you look like you have thought it out. I have entered it in the BPBible issue tracker as a reminder to myself with comments, though as it's marked Wish List it is really not going to happen anytime soon and may well never happen (http://code.google.com/p/bpbible/issues/detail?id=122).

Thanks!

> A few comments:
> 1. I'm not sure how well this layout would work if any of the lines of text had to be wrapped, since it relies on comparing lines above and below. A lot of people, including me, would only have verse comparison taking up a small part of the screen, and so this would happen frequently.

There are two possibilities I can think of to fix this. The most preferable would be to simply wrap all lines starting with the word that goes past the screen. I'm sure this would take a little extra programming to figure out, but since all the words are analyzed individually already, I would think it would be doable.

Alternatively, we could always just add a scroll-bar. It's not the most convenient solution, but it would work, and it should be pretty easy to implement.

> 2. I'm not sure how well any version comparison works if the texts are not from a reasonably similar textual tradition. I imagine for example that RSV and ESV would work quite well (as the AV and RV work in the traditional Interlinear Bible). ESV to KJV would probably work OK too. Once you have very few similar words to link it is unlikely to be very helpful (reading the full text of verses stacked as in Xiphos would be more helpful). I'm not sure how well the NET would compare with the ESV, for example, and as for comparing the Message with almost anything else it's not even worth thinking about.

That is true. This is primarily intended for formal equivalence translations. Some of the in-betweens such as the NET would still be likely to work, while others like the NIV are quite a bit less likely to. Dynamic equivalence translations such as the aforementioned Message almost certainly would not. That is simply a limitation I would expect from the tool, due to the nature of how it works.

> 3. The file you link to appears to have disappeared. Is this intentional?

Hmm, it looks like it's still there for me. Try a hard-refresh?

> 4. Another possibly relevant suggestion is to borrow the style of the 26 Version New Testament, which works on a phrase-by-phrase basis and selects readings from a few versions that it thinks give the most significant variations. This is not as complete as actually comparing all 26 versions, but it is probably easier to read and more useful as well. However, I'm not sure how well you could determine significant variations of wording in software, so I suspect such a thing would have to be done by hand (and it would probably have considerable copyright implications and require lots of publisher permission if you chose to include copyrighted versions in the comparison).


Interesting. I had not heard of the 26 Version New Testament. I will have to look that up some time.

I have started work on the algorithm now. I will post it once it is complete. (Hopefully my classes will allow me to finish it relatively quickly.)

Posted by ChuckMcKnight at Feb 16, 2010 2:02:35 PM
Re: cross-comparing parallel Bible tool idea
Still working on getting the algorithm sorted out.

Meanwhile, I've added another sample to the mock-up page to demonstrate what a wrapped verse would look like. (I believe the longest verse in the Bible is called for here.)

The sample is a fake fixed-wrap, but hopefully the final product could be dynamic. If nothing else, we could just have the wrap points computed based on the size of the window at the time, and if the window is re-sized it could just be recomputed.
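One way those wrap points could be computed, sketched in Python under the assumption that the versions have already been aligned into equal-length rows with an empty string as the filler: give every column the width of its widest word, then start a new block of lines whenever the next column would overflow the window. Because every version breaks at the same column, the words stay lined up after wrapping. The names `column_widths`, `wrap_points` and `render` are hypothetical; on a resize, `wrap_points` would just be called again with the new width.

```python
def column_widths(aligned):
    """aligned: one list of cells per version, all the same length; '' marks a filler."""
    if not aligned:
        return []
    return [max(len(row[i]) for row in aligned) for i in range(len(aligned[0]))]


def wrap_points(aligned, max_width, gap=1):
    """Split the columns into (start, end) ranges that each fit within max_width characters.

    All versions wrap at the same column, so the words stay lined up.
    A single column wider than max_width is kept on its own line rather than split.
    """
    widths = column_widths(aligned)
    ranges = []
    start, used = 0, 0
    for i, w in enumerate(widths):
        extra = w if i == start else gap + w
        if i > start and used + extra > max_width:
            ranges.append((start, i))  # column i would overflow: wrap before it
            start, used = i, w
        else:
            used += extra
    if widths:
        ranges.append((start, len(widths)))
    return ranges


def render(aligned, max_width, gap=1):
    """Plain-text lines: one line per version per wrapped block, a blank line between blocks."""
    widths = column_widths(aligned)
    lines = []
    for start, end in wrap_points(aligned, max_width, gap):
        for row in aligned:
            cells = [row[i].ljust(widths[i]) for i in range(start, end)]
            lines.append((" " * gap).join(cells).rstrip())
        lines.append("")
    return lines


aligned = [
    ["In", "the", "beginning", "God", "created"],
    ["In", "", "beginning", "God", "made"],
]
for line in render(aligned, max_width=20):
    print(line)
```

With a 20-character window the first four columns fit on one line and "created"/"made" wrap together onto the next; widening the window to 80 puts the whole verse on one line per version.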

BTW, is the sample working for you now, jonmmorgan? I use a pretty cheap web hosting company, and their downtime can be bad at times, so it might just have been down when you tried to check it before.

Posted by ChuckMcKnight at Feb 20, 2010 5:55:29 PM
Re: cross-comparing parallel Bible tool idea
Sorry it took me so long to get back to this. As mentioned earlier, my classes are keeping me pretty busy right now.

Anywho, here's the first rough draft of the algorithm. I ended up not doing it in pseudo-code per se, but hopefully it should still be understandable. If anything seems unclear, just ask and I'll try to expound on it.

I also updated the samples page once again, this time actually running through the algorithm I made (previously, I had done the sample page by just looking at the verses and figuring out what seemed to match). The Romans 8 passage stayed exactly the same, and Genesis 2:16 had a slight change in structure, but still works just as well. Esther 8:9, however, had a bit of a hiccup (now underlined in red).

The algorithm will probably need to be adjusted a bit for such cases, and we'll probably find quite a few more such anomalies, so this is very much a rough-draft algorithm.

With all that said, here you go:


Take the verses for each version and split them into lists of words, probably delimited by the space character.

Start with just the lists from the first two versions.

Compare the first word from the second version with the first word of the first version to see if they are equivalent.*

- If they are equivalent, move on to the next word in both versions.

- If they are not equivalent, compare the first word from the second version with the second word from the first version to see if they are equivalent.

- - If they are equivalent, move all the words in the second version down the list by one, and insert a filler in the first slot.

- - If they are not equivalent, compare the second word from the second version with the first word from the first version to see if they are equivalent.

- - - If they are equivalent, move all the words in the first version down the list by one, and insert a filler in the first slot.

- - - If they are not equivalent, compare the first word from the second version with the third word from the first version...

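- - - - Next compare the third word of the second version with the first word of the first version, then the first word of the second version with the fourth word of the first version, then the fourth word of the second version with the first word of the first version, and so on.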

- - - - - If, after comparing all the way through the verses, there is no equivalent match, simply leave the first words of both versions together and move on to the next word in both versions.

[Post too long. Continued next.]

Posted by ChuckMcKnight at Feb 20, 2010 5:55:56 PM
Re: cross-comparing parallel Bible tool idea
[Post continued here.]

Move to the second word in both versions and repeat the above process.

- Continue moving through the verse in this manner until all words have been compared.
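A fairly literal Python reading of the two-version steps above might look like this. It is a sketch, not a tested implementation: equivalence is reduced to a case- and punctuation-insensitive comparison (a thesaurus-based test could be passed in as `eq`), an empty string stands in for the filler, and when a match is found k words ahead, k fillers are inserted, which is how the "etc." in the search order is interpreted here. Because the search is greedy, a common word such as "the" several words ahead can win over a better alignment, so this is likely where adjustments would be needed.

```python
import string

FILLER = ""  # empty slot where one version has no matching word


def normalize(word):
    """Ignore case and surrounding punctuation when testing equivalence."""
    return word.strip(string.punctuation).lower()


def equivalent(x, y):
    return x != FILLER and y != FILLER and normalize(x) == normalize(y)


def align_pair(first, second, eq=equivalent):
    """Line up two word lists by inserting fillers, following the step-by-step search."""
    a, b = list(first), list(second)
    i = 0
    while i < len(a) and i < len(b):
        if eq(a[i], b[i]):
            i += 1  # already lined up
            continue
        shift = 0
        # Search order: b[i] vs a[i+1], b[i+1] vs a[i], b[i] vs a[i+2], b[i+2] vs a[i], ...
        for k in range(1, max(len(a), len(b)) - i):
            if i + k < len(a) and eq(b[i], a[i + k]):
                b[i:i] = [FILLER] * k  # move the second version down k slots
                shift = k
                break
            if i + k < len(b) and eq(b[i + k], a[i]):
                a[i:i] = [FILLER] * k  # move the first version down k slots
                shift = k
                break
        # Skip past the matched pair; with no match, the two words simply stay together.
        i += shift + 1
    # Pad the shorter list so both versions have the same number of slots.
    n = max(len(a), len(b))
    a += [FILLER] * (n - len(a))
    b += [FILLER] * (n - len(b))
    return a, b


a, b = align_pair("In the beginning God created".split(), "In beginning God created".split())
print(a)
print(b)
```

Here "the" has no counterpart, so the second version gets one filler and the remaining words line up.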

Once the first two versions have been compared and matched, add the third version.

- Repeat a similar process to the above, only comparing the third version with both the first two versions, based on their newly spaced out lists.

- If a word in the third version is equivalent to the word in either the first or the second version, it still counts as a match.

Continue repeating this process until all the versions are compared and matched.
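One way to read the "compare with both" rule, as a Python sketch: treat the versions already aligned as a list of columns, and let a new word match a column if it is equivalent to any word in that column. When the new version needs a gap, a filler goes into it; when the group needs a gap, a whole filler column is inserted so that every earlier version shifts down together. The function names are hypothetical, equivalence is again just a case- and punctuation-insensitive comparison, and versions are folded in the order given (which could be the user's chosen order), so a different order may give a slightly different alignment.

```python
import string

FILLER = ""  # empty slot where a version has no matching word


def normalize(word):
    """Case- and punctuation-insensitive form (a stand-in for a real equivalence test)."""
    return word.strip(string.punctuation).lower()


def matches(column, word):
    """A word matches a column if it is equivalent to the word of ANY version already in it."""
    return word != FILLER and any(
        cell != FILLER and normalize(cell) == normalize(word) for cell in column
    )


def merge(columns, words):
    """Align one more version against the columns built so far; returns the new columns."""
    cols = [list(c) for c in columns]
    new = list(words)
    width = len(cols[0]) if cols else 0  # number of versions already in the group
    i = 0
    while i < len(cols) and i < len(new):
        if matches(cols[i], new[i]):
            i += 1
            continue
        shift = 0
        for k in range(1, max(len(cols), len(new)) - i):
            if i + k < len(cols) and matches(cols[i + k], new[i]):
                new[i:i] = [FILLER] * k  # gap in the new version
                shift = k
                break
            if i + k < len(new) and matches(cols[i], new[i + k]):
                # gap in every earlier version: insert whole filler columns
                cols[i:i] = [[FILLER] * width for _ in range(k)]
                shift = k
                break
        i += shift + 1
    n = max(len(cols), len(new))
    cols += [[FILLER] * width for _ in range(n - len(cols))]
    new += [FILLER] * (n - len(new))
    return [col + [w] for col, w in zip(cols, new)]


def align_all(versions):
    """Fold the (non-empty) versions in the given order; returns one equal-length row per version."""
    columns = [[w] for w in versions[0]]
    for words in versions[1:]:
        columns = merge(columns, words)
    return [list(row) for row in zip(*columns)]


rows = align_all([
    "In the beginning God created".split(),
    "In beginning God created".split(),
    "In the beginning, God made".split(),
])
for row in rows:
    print(row)
```

In this example the third version's "the" matches the column that holds "the" from the first version and a filler from the second, which is the "either the first or second version" case.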

With all the versions compared and matched, look for differences to highlight (change to the color selected for that version).

- If all versions use the exact same word with the same punctuation and capitalization, leave the word black (or whatever the default color is).

- When words, punctuation, or capitalization do not all match, check if any of the versions do match for that word.

- - If one word, punctuation, or capitalization variant has a majority, leave that variant black and highlight all the others.

- - If there are no matches, or if there is a tie so that no single variant has a majority, highlight the difference in all versions.

- Check separately for words, punctuation, and capitalization.

- - Where the difference is the whole word, highlight the whole word, but the punctuation might stay black if it is the same across the versions or in a majority.

- - If the word itself is the same but punctuation is different, highlight the punctuation itself that is not in the majority.

- - If the word itself is the same but capitalization is different, highlight the first letter of words that are not in the majority.
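A Python sketch of these highlighting rules for a single aligned slot. It assumes "majority" means one most-common variant shared by at least two versions (a plurality; a strict more-than-half rule would be a small change in the hypothetical helper `odd_ones_out`). Each token is split into leading punctuation, the word itself, and trailing punctuation so the three checks can be made separately. A filler counts as a vote for "no word here", so a word that only one version adds stands out, but a filler is never highlighted itself because there is nothing to color; likewise punctuation is only flagged on versions that actually have some. Mapping the returned flags to the user's chosen colors is left out.

```python
import string
from collections import Counter

FILLER = ""  # empty slot where a version has no word


def split_token(token):
    """Split e.g. 'beginning,' into ('', 'beginning', ',')."""
    core = token.strip(string.punctuation)
    if not core:
        return token, "", ""
    lead = token[: len(token) - len(token.lstrip(string.punctuation))]
    trail = token[len(token.rstrip(string.punctuation)):]
    return lead, core, trail


def odd_ones_out(keys):
    """Indices that differ from the single most common key.

    If no key is shared by two or more entries, or the top count is tied,
    there is no majority and every index is returned.
    """
    if not keys:
        return set()
    ranked = Counter(keys).most_common(2)
    top_key, top_n = ranked[0]
    if top_n < 2 or (len(ranked) > 1 and ranked[1][1] == top_n):
        return set(range(len(keys)))
    return {i for i, k in enumerate(keys) if k != top_key}


def highlight_column(cells):
    """cells: one aligned slot, one entry per version.
    Returns {version_index: set of parts to color}: 'word', 'punctuation', 'capital'."""
    parts = [split_token(c) if c != FILLER else None for c in cells]
    flags = {i: set() for i in range(len(cells))}

    # Whole-word vote; a filler counts as a vote for "no word here".
    word_keys = [p[1].lower() if p else FILLER for p in parts]
    for i in odd_ones_out(word_keys):
        if parts[i]:
            flags[i].add("word")

    present = [i for i, p in enumerate(parts) if p]

    # Punctuation vote, independent of whether the word itself matched.
    punct_keys = [(parts[i][0], parts[i][2]) for i in present]
    for j in odd_ones_out(punct_keys):
        if parts[present[j]][0] or parts[present[j]][2]:
            flags[present[j]].add("punctuation")

    # First-letter capitalization, only where the word is not already highlighted.
    cap_keys = [parts[i][1][:1].isupper() for i in present]
    for j in odd_ones_out(cap_keys):
        if "word" not in flags[present[j]]:
            flags[present[j]].add("capital")
    return flags


for column in (["beginning", "beginning", "beginning,"],
               ["created", "created", "made"],
               ["Lord", "lord", "Lord"]):
    print(column, highlight_column(column))
```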


*Determining equivalence where the word is not the same will probably be the more difficult part of the project.

The most effective way I can think to do it would be to have a thesaurus of possible equivalent words. For example: {you, thou, thee, ye, ...}, {LORD, Jehovah, Yahweh, ...}, {servant, bond-servant, bond-slave, slave, ...}, ...

The problem is that this could probably take quite a while to compile. If such a tool already exists somewhere and could be borrowed for this project, that would be great. Otherwise it might just have to be a slowly growing list that is compiled as people find more words that are equivalent. Any suggestions for how to solve this problem more efficiently would be greatly appreciated.
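As a small illustration of how such a thesaurus could be stored and queried, here is a Python sketch that turns each group into an index from word to group number, so two words count as equivalent if they are the same word (ignoring case and punctuation) or sit in the same group. The group list is a hypothetical starter built only from the examples in the post; the "..." entries are left open, since the real list would keep growing. In this simple index a word that appears in two groups keeps only its last group. A function like `equivalent` could serve as the equivalence test in the word-matching steps.

```python
import string

# Hypothetical starter list, taken only from the examples given above.
EQUIVALENT_GROUPS = [
    {"you", "thou", "thee", "ye"},
    {"lord", "jehovah", "yahweh"},
    {"servant", "bond-servant", "bond-slave", "slave"},
]


def build_index(groups):
    """Map each word to the number of the group it belongs to (last group wins on duplicates)."""
    index = {}
    for group_id, group in enumerate(groups):
        for word in group:
            index[word.lower()] = group_id
    return index


INDEX = build_index(EQUIVALENT_GROUPS)


def normalize(word):
    return word.strip(string.punctuation).lower()


def equivalent(x, y, index=INDEX):
    """Same word (ignoring case/punctuation), or both words listed in the same group."""
    a, b = normalize(x), normalize(y)
    if a == b:
        return True
    return a in index and index.get(b) == index[a]


print(equivalent("Thou", "you"), equivalent("LORD,", "Jehovah"), equivalent("servant", "thee"))
```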

Posted by joshaven at Nov 30, 2010 6:59:20 PM
Re: cross-comparing parallel Bible tool idea
I've given this idea a bit of thought in the past. I would like to cross-compare Greek & an English translation... I thought of using the Strong's reference numbers as a join table, so to speak...
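The "join table" idea can be pictured as a dictionary lookup, sketched here in Python under the assumption that every word in both texts has already been tagged with a Strong's number (or `None` when untagged). Words sharing a number are paired; untagged source words get `None`, and target words with no counterpart are simply not listed by this sketch. The short John 1:1 sample below was tagged by hand for illustration, not taken from any module, and the function name is hypothetical.

```python
from collections import defaultdict


def join_on_strongs(source, target):
    """Pair each source word with the target word(s) carrying the same Strong's number.

    source, target: lists of (word, strongs_number_or_None).
    Untagged source words get None; target words with no counterpart are not listed.
    """
    by_number = defaultdict(list)
    for word, number in target:
        if number:
            by_number[number].append(word)
    pairs = []
    for word, number in source:
        found = by_number.get(number, []) if number else []
        pairs.append((word, " ".join(found) if found else None))
    return pairs


# John 1:1a, tagged by hand for illustration only.
greek = [("Ἐν", "G1722"), ("ἀρχῇ", "G746"), ("ἦν", "G2258"), ("ὁ", "G3588"), ("λόγος", "G3056")]
english = [("In", "G1722"), ("the", None), ("beginning", "G746"),
           ("was", "G2258"), ("the", "G3588"), ("Word", "G3056")]

for pair in join_on_strongs(greek, english):
    print(pair)
```

The untagged English "the" before "beginning" has no Greek counterpart, which is exactly the kind of word a stacked display would need to show with a filler.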

After considering the issues jonmmorgan raised, I wondered if the ideal way to stack sentences is not based upon a common reference point, Strong's numbers (with Greek) or the same word (within the same language), but rather upon parts of speech.

This would require a method of identifying and aligning each part of speech rather than exact word matches. It would also require a lookup table to find the part of speech for each word.
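As a rough illustration of aligning by parts of speech, this Python sketch tags each word from a tiny hand-made lookup table (hypothetical, covering only the example words) and then aligns the two tag sequences with the standard library's `difflib.SequenceMatcher`, which is a swapped-in general-purpose sequence aligner rather than anything proposed in the thread. Words whose tags line up are paired even when the words differ, so "created" and "made" end up together without a thesaurus. A plain table cannot settle words that can be more than one part of speech, which a real tagger would have to handle.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-made lookup table covering only the example words.
POS = {
    "in": "PREP", "the": "DET", "beginning": "NOUN", "god": "NOUN",
    "created": "VERB", "made": "VERB", "heaven": "NOUN", "and": "CONJ", "earth": "NOUN",
}


def tags(words):
    """Look up each word's part of speech; unknown words get 'X'."""
    return [POS.get(w.strip(".,;:").lower(), "X") for w in words]


def align_by_pos(a, b):
    """Pair words whose part-of-speech sequences line up; '' fills the gaps."""
    matcher = SequenceMatcher(None, tags(a), tags(b), autojunk=False)
    rows = []
    for _op, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = a[i1:i2], b[j1:j2]
        n = max(len(left), len(right))
        rows.extend(zip(left + [""] * (n - len(left)), right + [""] * (n - len(right))))
    return rows


first = "In the beginning God created the heaven and the earth".split()
second = "In the beginning God made heaven and earth".split()  # invented wording for the demo
for pair in align_by_pos(first, second):
    print(pair)
```

The two extra articles in the first text are paired with fillers, while every noun, verb and conjunction finds a partner.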

Aligning by parts of speech would allow for a yet unknown (at least to me) way of cross-comparing parallel verses... for instance, you could cross-compare two accounts of the same story from the Gospels within the same translation, or even add more translations. Or compare two different verses on the same subject. This could be taken even further to compare completely different ideas.

If you aligned by parts of speech you could properly align text even in different languages which would not work with a word matching algorithm.