[sword-devel] verse parsing

Sat Mar 25 20:55:36 MST 2006

Troy,
My 2 cents.

I see this as a mapping between an external name (i.e. what the user 
knows the verse as) and an internal name (what the engine knows the 
verse as).
As you pointed out, there is a whole host of issues with taking user 
input and deciphering it into the internal name, especially when you 
allow pretty sophisticated ranges and have taught the user that we can 
guess what they mean with great accuracy.

I think that there are two basic functions that the engine needs to provide:
    translation from the user input into a verse key
    translation from a verse key to an external representation that is 
appropriate for the work.

Today with the KJV, we number each book from Gen to Rev, in the order 
that it appears in the KJV.
Then we know how many chapters are in each book and how many verses are 
in each chapter.
We also know the ordinal value for each verse or can compute it readily.
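
To make the "compute it readily" part concrete, here is a minimal sketch 
of deriving a verse's ordinal within one book from per-chapter verse 
counts. The Book type and verseOrdinal() are names I made up for 
illustration; they are not the engine's actual tables or API.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Verse counts per chapter for one book (hypothetical layout, not SWORD's
// actual versification tables).
using Book = std::vector<int>;

// Returns the 1-based position of chapter:verse counted from the start of
// the book, or 0 if the reference is out of range.
int verseOrdinal(const Book& versesPerChapter, int chapter, int verse) {
    if (chapter < 1 ||
        static_cast<std::size_t>(chapter) > versesPerChapter.size()) {
        return 0;
    }
    if (verse < 1 || verse > versesPerChapter[chapter - 1]) {
        return 0;
    }
    // Sum the verses of all preceding chapters, then add the verse offset.
    int preceding = std::accumulate(versesPerChapter.begin(),
                                    versesPerChapter.begin() + (chapter - 1),
                                    0);
    return preceding + verse;
}
```

The same sum over preceding books would extend this to an ordinal across 
the whole work.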

But today, we assume that the internal and external names for chapters 
and verses are the same. In your example, they are not the same for 
verses. Everything is fine until verse 31, but after that we have 31a, 
32, 33, 33a, 33b, and 33c. Positionally these are 32, 33, 34, 35, 36, 
and 37.
What is needed is a mapping function External <=> Internal, 31a <=> 32, ....

I think you have pointed out that all the UIs will need to change no 
matter what. If that is the case, then perhaps you can increase the 
number of methods in VerseKey.
Essentially you would add
String getVerseName()
String getChapterName()

so
int x = vk.Verse()
would give 37 for 33c. That is the offset from the beginning of the chapter.
char* s = vk.getVerseName()
would give 33c.
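
As a rough sketch of that External <=> Internal mapping, assuming the 
labels for a chapter are simply stored in order: ChapterLabels, nameOf(), 
positionOf() and makeExampleChapter() are all hypothetical names, and 
this is not the real VerseKey class.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: one chapter whose verses carry external labels.
// This is not the real SWORD VerseKey class.
class ChapterLabels {
public:
    explicit ChapterLabels(std::vector<std::string> labels)
        : labels_(std::move(labels)) {}

    // Internal position (1-based) -> external name; "" if out of range.
    std::string nameOf(int position) const {
        if (position < 1 ||
            static_cast<std::size_t>(position) > labels_.size()) {
            return "";
        }
        return labels_[position - 1];
    }

    // External name -> internal position (1-based); 0 if unknown.
    int positionOf(const std::string& name) const {
        for (std::size_t i = 0; i < labels_.size(); ++i) {
            if (labels_[i] == name) return static_cast<int>(i) + 1;
        }
        return 0;
    }

private:
    std::vector<std::string> labels_;
};

// Builds labels "1".."31" followed by the suffixed verses from the example.
ChapterLabels makeExampleChapter() {
    std::vector<std::string> labels;
    for (int v = 1; v <= 31; ++v) labels.push_back(std::to_string(v));
    for (const char* s : {"31a", "32", "33", "33a", "33b", "33c"}) {
        labels.push_back(s);
    }
    return ChapterLabels(std::move(labels));
}
```

With this, positionOf("33c") plays the role of vk.Verse() and nameOf(37) 
the role of vk.getVerseName(). A linear search is used only to keep the 
sketch short.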

As to parsing ranges, I think you may need another algorithm. The 
current one assumes a lot about KJV versification and its traditions, 
such as using roman numerals as in II Sam 2:1 and using numbers for 
chapters and verses.

Having fixed bugs in JSword's parsing of user input, I know how 
difficult it is, since the following is allowed:
B[[.C].V][-([[B.]C.]V] | B.[C[.V]]])
where B, C and V stand for book, chapter and verse respectively. And - 
represents the set of all allowable range indicators and . represents 
the set of allowable part separators (and the separators between B & C 
may be different than between C & V).
(I think this is at least close)
And as a shortcut "ff" is allowed, which means to the end of the parent 
unit.
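
To illustrate just the "ff" shortcut, here is a small sketch that parses 
the verse part of a reference inside a chapter of known length. It 
assumes integer verses and '-' as the only range indicator, which is 
much narrower than what is really allowed; parseVerseRange() is a 
made-up name, not JSword or SWORD code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Parses a verse part such as "5", "5-9" or "5ff" inside a chapter that has
// `lastVerse` verses. On success stores the range in first/last and returns
// true. Integer verses and '-' as the sole range indicator are assumptions.
bool parseVerseRange(const std::string& text, int lastVerse,
                     int& first, int& last) {
    std::size_t i = 0;
    // Reads a run of digits at position i; returns false if there is none.
    auto readInt = [&](int& out) -> bool {
        std::size_t start = i;
        out = 0;
        while (i < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[i]))) {
            out = out * 10 + (text[i++] - '0');
            if (out > 99999) return false;  // reject absurdly long numbers
        }
        return i > start;
    };

    if (!readInt(first)) return false;
    last = first;
    if (text.compare(i, std::string::npos, "ff") == 0) {
        last = lastVerse;  // "ff" runs to the end of the parent unit
    } else if (i < text.size() && text[i] == '-') {
        ++i;
        if (!readInt(last) || i != text.size()) return false;
    } else if (i != text.size()) {
        return false;  // trailing characters we do not understand
    }
    return first >= 1 && last >= first && last <= lastVerse;
}
```

Note that a suffixed verse like 33a is rejected here, which is exactly 
the limitation under discussion.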

This becomes difficult with multipart book names and book names that 
begin with numbers, where those numbers can be roman numerals or 
digits, because these force the code to look ahead or look behind to 
determine whether a token is the prefix or suffix of a book name.
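
A toy version of that look-ahead, assuming whitespace-separated tokens 
and a tiny made-up book list (kNumberedBooks, kPlainBooks, 
bookNumberPrefix() and readBook() are all hypothetical), might look like 
this. Real input such as "Ijn2-3:12" has no spaces and is harder.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical, tiny book lists; a real parser would load names and
// abbreviations per locale.
static const std::set<std::string> kNumberedBooks = {"Sam", "Kgs", "John"};
static const std::set<std::string> kPlainBooks = {"Gen", "Isa", "John"};

// Maps a leading "I"/"II"/"III" or "1"/"2"/"3" to its number, else 0.
int bookNumberPrefix(const std::string& token) {
    if (token == "1" || token == "I") return 1;
    if (token == "2" || token == "II") return 2;
    if (token == "3" || token == "III") return 3;
    return 0;
}

// Reads a book name from the front of `input` and returns a normalized form
// such as "2Sam", or "" if no known book is found. `rest` receives the text
// that follows the book name.
std::string readBook(const std::string& input, std::string& rest) {
    rest.clear();
    std::istringstream in(input);
    std::string first;
    if (!(in >> first)) return "";

    std::string book;
    int n = bookNumberPrefix(first);
    if (n != 0) {
        // Look ahead: the number is a prefix only if a numbered book follows.
        std::string second;
        if ((in >> second) && kNumberedBooks.count(second)) {
            book = std::to_string(n) + second;
        } else {
            return "";
        }
    } else if (kPlainBooks.count(first)) {
        book = first;
    } else {
        return "";
    }

    std::getline(in, rest);                      // remainder of the input
    rest.erase(0, rest.find_first_not_of(' '));  // drop leading spaces
    return book;
}
```

So "II Sam 2:1" and "2 Sam 2:1" both come back as 2Sam, while "John 3:16" 
and "1 John 4:8" are told apart only by checking what follows the first 
token.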

If the code is going to be generally useful for a BCV kind of scheme, 
where C and V may not be integers, then it will require a new algorithm. 
When the versification is not KJV, the engine would use the newer one.

I would suggest that we add a V11N= key to the conf with the default of 
KJV. This could be used to get the appropriate algorithm.
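
A sketch of how such a key could be read, assuming the conf is plain 
Key=Value lines: parseConf() and versification() are made-up helpers, 
and the value LXX in the usage note is only an example, not an agreed 
name.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses "Key=Value" lines from a module conf body into a map.
// Section headers like "[LXXM]" and lines without '=' are skipped.
std::map<std::string, std::string> parseConf(const std::string& text) {
    std::map<std::string, std::string> entries;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;
        entries[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return entries;
}

// Returns the versification name, falling back to "KJV" when the proposed
// V11N key is absent.
std::string versification(const std::map<std::string, std::string>& conf) {
    std::map<std::string, std::string>::const_iterator it = conf.find("V11N");
    return it == conf.end() ? std::string("KJV") : it->second;
}
```

A conf containing V11N=LXX would then select the newer algorithm, and 
every existing module without the key would keep the KJV behavior.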

Hope this helps,
    DM

P.S. My solution from the other day skirted this issue.

Troy A. Griffitts wrote:
> Hey guys (especially frontend writers),
>
>     I've been working on providing a VerseKey key interface for 
> traversing modules like the LXXM:
>
> http://crosswire.org/study/bookdisplay.jsp?mod=LXXM&gbsEntry=%2FJoshB%2F24%2F1 
>
>
>     I'm having some difficulty fitting this into the exposed VerseKey 
> interface.
>
>     Obviously, my goal was to save everyone as much modification as 
> possible, but there just doesn't seem to be a good fit for 
> modules like these.
>
>     Here's a little background on what I was trying and where I ran 
> into trouble, and why I've come to this conclusion:
>
> First, I attempted to redo this module using OSIS book names for 
> everything, and discovered that there just wasn't a nice book list we 
> could display to the user.  For example, JoshB (from the link above) 
> seems to be the standard book of Joshua we'd all expect, but then 
> JoshA (browse to it using the left index) contains 3 chapters: 15, 18, 
> and 19.  Not sure exactly what these are, but I'm guessing they are 
> replacements or additions to Joshua or some other book.  Actually, I 
> just have no idea.
>
> The next thing I began to realize is that this module uses a,b,c type 
> suffixes on verses (click on the first link in this email again and 
> scroll to the bottom of the page).  This does not fit nicely into our 
> integer concept for verses.  I considered adding a 5th level: 
> Testament/Book/Chapter/Verse/Sub.  But this really breaks the whole 
> paradigm anyway, as sub will mostly be blank except when there might 
> be a letter tacked to the end.  It really doesn't solve any problems, 
> e.g. key.Verse(key.Verse()+1) still will break.  key++ would work, I 
> guess, but you'd have to always check if Sub was set to anything.  And 
> who knows what Sub really means.  Is it a replacement?  Is it really a 
> subdivision of the verse?  It just doesn't seem like it solves any 
> problems nicely.  It seems like the LXX really is sequentially 31, 
> 31a, 32, 33, 33a, 33b, whereas I know that other Bibles and commentaries 
> mean the first part of 33 when they say 33a.  So adding Sub doesn't 
> seem like it gives us much except keeping Verse an integer.
>
>
>
>     So, I have a few ideas, and would like to hear from you.
>
> Basically, I think the way we present and display the LXXM with 
> swordweb (the link above) is actually pretty ok.  There are a few 
> deficiencies:
>
> The 'reference' is displayed like:
>
> /JoshB/24/1
>
> We could add a flag which says to display using a BK CH:VS format.  I 
> was thinking about adding a pattern, like letting the modules.conf 
> file specify something like:
> KeyDisplay=%1 %2:%3
> but I think this is more work for everyone than it benefits.  Besides, 
> other languages probably prefer other formats (BK CH.VS).  So I think 
> we'd like to just say something like KeyFormat=BCV
>
> The other problem is parsing...
> Currently VerseKey provides all the nice parsing functionality that 
> figures out:
>
> Ijn2-3:12
>
> It can do this because it has a set of books that it knows about, along 
> with all kinds of abbreviations, translated into a number of 
> languages.  Our current parser also drops suffixed letters.
>
>
> Finally, if we solve these problems, and place an entry in LXXM: 
> Category=Biblical Texts, it will probably break most frontends which 
> expect all Biblical Texts to use a VerseKey.  I don't know how to 
> solve this problem.
>
> I also considered a major change to VerseKey which would make all 
> levels strings and not integers.  I realize many frontends use integer 
> spin controls to increase/decrease chapter and verse.  There may also 
> be linear logic regarding these things.
>
> I guess the real question is, would it be easier for everyone to add 
> parsing and display support to treekey and leave versekey alone?  This 
> is the direction I'm leaning right now.  Any thoughts to sway me would 
> be appreciated.
>
>
>     -Troy.