[jsword-devel] [sword-devel] Method to find if BibleBook is contained in a Book

Martin Denham mjdenham at gmail.com
Tue Apr 15 01:43:45 MST 2014


I took a stab at this here:
<https://github.com/mjdenham/and-bible/blob/development/AndBible/src/net/bible/android/control/navigation/DocumentBibleBooks.java>.
It was elegant until I catered for IBT module anomalies.

My initial experiments suggest it works really well: it is fast and gives a
quick 'heads-up' about which Bible books are in a module. That is useful not
only for partial DC support, which seems to be the norm, but also for partial
Bibles and commentaries, e.g. NT-only or still-developing modules.

I have integrated this into the Passage selector and also page prev/next.

Cheers
Martin


On 14 April 2014 23:50, DM Smith <dmsmith at crosswire.org> wrote:

> It still is manual. I think there's a fairly optimal way to compute this,
> but it is not perfect.
>
> The problem is that a module does not have to be laid down in order.
> Osis2mod has an "append" flag that allows additional material to be
> appended to a module. This is useful for doing a book at a time. It is also
> useful for fixing a verse and appending the fix to the module. Both the old
> and the new will be in the module but only the new will be in the index.
>
> Also, if the module has books, chapters or verses out of order, these will
> be reassembled into the right order (it is the nature of the index file),
> but the data files will have the content in the order that is in the module.
>
> The following is true about the index and data files:
> Each verse in the data file is laid down in the order that it is read from
> the input file.
> The index contains the start of each verse in the data file.
> There are separate index files for the OT and the NT. DC when present is
> in one or the other.
>
> If the data is laid down in the proper order then we can use that
> knowledge to figure out if the book or chapter has content.
> The difference between the starts of the books (or chapters) can be used
> to guess what is present. For example, if Genesis has a start of 10 and an
> end of 4000, Exodus has a start and end of 0, and Lev has a start of 4000
> and end of 10000, then we can guess that Genesis and Lev exist but Exodus
> does not.
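A minimal Java sketch of the start-offset heuristic described above, assuming the per-book start offsets have already been read from the index and that a start of 0 means the book has no entry (as in the example). `BookPresenceHeuristic` and `presentBooks` are hypothetical names, not JSword API, and like the heuristic itself the result is only meaningful when the data was laid down in order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the start-offset heuristic; not part of JSword. */
public final class BookPresenceHeuristic {

    /**
     * @param bookNames  book names in versification order
     * @param bookStarts data-file offset where each book starts (0 = no entry, assumed)
     * @param dataLength total size of the data file in bytes
     * @return names of books that appear to have content
     */
    static List<String> presentBooks(String[] bookNames, long[] bookStarts, long dataLength) {
        List<String> present = new ArrayList<>();
        for (int i = 0; i < bookStarts.length; i++) {
            if (bookStarts[i] == 0) {
                continue; // no index entry at all: treat the book as absent
            }
            // The book ends where the next book with an entry starts, or at end of file.
            long end = dataLength;
            for (int j = i + 1; j < bookStarts.length; j++) {
                if (bookStarts[j] != 0) {
                    end = bookStarts[j];
                    break;
                }
            }
            if (end - bookStarts[i] > 0) {
                present.add(bookNames[i]);
            }
        }
        return present;
    }

    public static void main(String[] args) {
        String[] names = {"Gen", "Exod", "Lev"};
        long[] starts = {10, 0, 4000};
        System.out.println(presentBooks(names, starts, 10000)); // [Gen, Lev]
    }
}
```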
>
> Alternatively other sample points could be used. E.g. middle of the
> chapters.
>
> This is only a heuristic.
>
> We can also note that if the OT files don't exist or the OT data file has 0
> size, then the module is NT alone. Or the other way around.
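The testament-level check can be sketched the same way. This assumes the caller already knows the paths of the module's OT and NT data files (their names depend on the module driver); `TestamentCheck` and `Coverage` are hypothetical names. Because DC material lives in one of the two testaments, this check says nothing about DC books.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of JSword. */
public final class TestamentCheck {

    enum Coverage { BOTH, OT_ONLY, NT_ONLY, NEITHER }

    /** True if the data file is missing or has zero size. */
    static boolean isAbsent(Path dataFile) {
        try {
            return !Files.exists(dataFile) || Files.size(dataFile) == 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * @param otData the module's OT data file (file name depends on the driver)
     * @param ntData the module's NT data file
     */
    static Coverage coverage(Path otData, Path ntData) {
        boolean ot = !isAbsent(otData);
        boolean nt = !isAbsent(ntData);
        if (ot && nt) {
            return Coverage.BOTH;
        }
        if (ot) {
            return Coverage.OT_ONLY;
        }
        if (nt) {
            return Coverage.NT_ONLY;
        }
        return Coverage.NEITHER;
    }
}
```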
>
> I do think we need to make the module's conf be "immutable" as downloaded,
> but have a "sidecar" conf file with settings we want to have. I think once
> computed, it should be stored there. Maybe it can be computed on the server
> and stored there for download.
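A possible shape for the sidecar idea, sketched with `java.util.Properties` as the storage format. The file location, the format and the `getOrCompute` helper are all assumptions for illustration; the point is only that the computed value is written once to a separate file and the downloaded .conf is never modified.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.function.Supplier;

/** Hypothetical sketch; sidecar location and format are assumptions. */
public final class SidecarConf {

    /**
     * Returns the value for key from the sidecar file, computing and persisting
     * it on first use. The module's own .conf is never written to.
     */
    static String getOrCompute(Path sidecar, String key, Supplier<String> compute) {
        Properties props = new Properties();
        try {
            if (Files.exists(sidecar)) {
                try (Reader in = Files.newBufferedReader(sidecar)) {
                    props.load(in);
                }
            }
            String value = props.getProperty(key);
            if (value == null) {
                // First use: compute once and store for all later runs.
                value = compute.get();
                props.setProperty(key, value);
                try (Writer out = Files.newBufferedWriter(sidecar)) {
                    props.store(out, "Computed values; the module's .conf stays pristine");
                }
            }
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```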
>
> -- dm
>
>
> On Apr 14, 2014, at 4:42 PM, Chris Burrell <christopher at burrell.me.uk>
> wrote:
>
> Hi
>
> What's the latest on this? At the moment, STEP looks up auto-suggestions
> based on versifications but this is annoying for Greek texts that do offer
> the OT, but the OSMHB (OSHB) or WLC don't.
>
> What I'm really looking for is to query a book for its BibleBooks, rather
> than have to rely on the Versification. The versification is not great from
> that point of view. It tells the frontend what might be in the book, rather
> than what is in the book.
>
> If there's nothing there at the moment, I could settle for:
> 1. calculate once and store scope (as an OSIS, or read it from conf file).
> Then read the key and do some kind of parsing to get all books.
> 2. check for all Bk.1.1 on start-up/first call and check for that
> 3. Do a combination of both, i.e. calculate once and store on install (or
> store if not stored before), then use that to check for all Bk.1.1 first
> time round.
> 4. Store a number of flags such as Gen.1.1=true, Ex.1.1=true, etc.
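Options 2 and 4 combined could look roughly like this: probe `<book>.1.1` once per book and keep the answers as flags. `BookFlags` and the `hasVerse` predicate (standing in for whatever per-verse check the frontend uses) are hypothetical. The one-off computation still costs one lookup per book, and a book whose chapter 1 verse 1 is missing would be reported as absent.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: probe Bk.1.1 once per book and keep the results as flags. */
public final class BookFlags {

    /** Probes "<book>.1.1" for each book and records the result, e.g. "Gen.1.1" -> true. */
    static Map<String, Boolean> computeFlags(List<String> osisBooks, Predicate<String> hasVerse) {
        Map<String, Boolean> flags = new LinkedHashMap<>();
        for (String book : osisBooks) {
            String key = book + ".1.1";
            flags.put(key, hasVerse.test(key));
        }
        return flags;
    }

    /** Cheap lookup after the one-off computation; unknown books count as absent. */
    static boolean hasBook(Map<String, Boolean> flags, String osisBook) {
        return flags.getOrDefault(osisBook + ".1.1", false);
    }
}
```

The resulting map could then be persisted (for example as `Gen.1.1=true` lines) so that later runs skip the probing entirely.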
>
>
> Bar 4, none of these options are efficient however. All of them require at
> least 66 lookups for a standard module. And on small devices, this may be
> an issue.
>
> Chris
>
>
>
> On 28 March 2014 20:50, DM Smith <dmsmith at crosswire.org> wrote:
>
>> It will be performant with Bibles.
>>
>> JSword is stable at the tip. I've just checked in the bug fix that Chris
>> supplied.
>>
>> This change will be stable.
>>
>> -- DM Smith
>>
>> On Mar 28, 2014, at 4:34 PM, Martin Denham <mjdenham at gmail.com> wrote:
>>
>> I was only thinking of using it with SwordBook/AbstractPassageBook but if
>> it is not performant then maybe it is not worth continuing and we should
>> look at Scope.  I thought that it was already being calculated in
>> ZVerseBackend.contains() using the idxRaf.
>>
>> btw is it safe to get the tip of JSword yet?
>>
>> Martin
>>
>>
>> On 28 March 2014 20:19, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>> I think it would be good to support Scope formally, even if it never
>>> makes it into SWORD. As a different issue, we'll be changing JSword to keep
>>> a module's conf pristine and the things that we write to it, will be put
>>> into a side-car conf. This will be the perfect place for us to compute the
>>> value once for all time per module.
>>>
>>> The getRawTextLength is not as easy as I'd like. It's mostly done. A bit
>>> more to do. For a couple of module types, both compressed, it is not
>>> performant. It merely calls getRawText and then length. The problem is that
>>> one has to uncompress the text to see how long it is.
>>>
>>> -- DM
>>>
>>> On Mar 28, 2014, at 3:31 PM, Martin Denham <mjdenham at gmail.com> wrote:
>>>
>>> An alternative method might be to use the Scope value which IBT have
>>> placed in the .conf file, but I can't seem to get access to it via JSword.
>>>
>>> This is printed:
>>> WARNING: Extra entry in kaz of Scope
>>>
>>> And in ConfigEntryTable:
>>>     log.warn("Extra entry in {} of {}", internal, configEntry.getName());
>>>     extra.put(key, configEntry);
>>>
>>> But I can't see any way to get the value from the extra map?  Is it
>>> possible - I am a bit confused by the initialisation and retrieval of
>>> metadata and properties in JSword.
>>>
>>> *Example scopes from IBT modules*
>>>
>>> Scope for kaz:
>>> Scope=Gen-Josh.24.33 Judg-2Chr Ezra-Neh Esth-Ps.150 Prov.0-Prov.4.27
>>> Prov.5-Prov.13.25 Prov.14-Prov.18.24 Prov.19-Song Isa-Lam Ezek-Dan.3.33
>>> Dan.4-Dan.12 Hos-Mal Matt-Rev
>>>
>>> Scope for kylsc:
>>> Scope=Matt-Rev
>>>
>>> I don't know if the strings used are compatible with PassageKeyFactory
>>> but if we only look at the start and end of the scope we may be able to
>>> deduce all that is required because I think IBT are the only people who use
>>> scope.
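Following the idea of looking only at the start and end of each range, here is a rough sketch that turns a Scope value into a set of books. It does not use PassageKeyFactory; instead it assumes an ordered list of OSIS book names for the module's versification is available. `ScopeBooks` is a hypothetical name. A partially covered book such as `Dan.4-Dan.12` counts as present.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; does not use JSword's PassageKeyFactory. */
public final class ScopeBooks {

    /** "Josh.24.33" -> "Josh"; "Matt" -> "Matt". */
    private static String bookOf(String osisRef) {
        int dot = osisRef.indexOf('.');
        return dot < 0 ? osisRef : osisRef.substring(0, dot);
    }

    /**
     * Returns the books touched by a space-separated Scope value, expanding
     * ranges such as "Gen-Josh.24.33" over the given book order.
     */
    static Set<String> booksInScope(String scope, List<String> bookOrder) {
        Set<String> books = new LinkedHashSet<>();
        for (String range : scope.trim().split("\\s+")) {
            String[] ends = range.split("-", 2);
            int from = bookOrder.indexOf(bookOf(ends[0]));
            int to = bookOrder.indexOf(bookOf(ends[ends.length - 1]));
            if (from < 0 || to < 0 || to < from) {
                throw new IllegalArgumentException("Cannot interpret scope range: " + range);
            }
            books.addAll(bookOrder.subList(from, to + 1));
        }
        return books;
    }
}
```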
>>>
>>> Martin
>>>
>>>
>>>
>>> On 28 March 2014 14:12, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>> I'll add the method SwordBook.getRawTextLength(Key key), or something
>>>> like it. -- DM
>>>>
>>>> On Mar 26, 2014, at 6:47 PM, Martin Denham <mjdenham at gmail.com> wrote:
>>>>
>>>> Given the above explanations and that many users have already
>>>> downloaded such modules I have experimented with a work-around by adding
>>>> some extra logic to And Bible to specifically cater for the IBT Synodal
>>>> modules.  I did this by making the assumption that all the empty verses
>>>> start with: "<chapter eID=" which appears true and unique.  It is a
>>>> bit of a hack but it almost worked.
>>>>
>>>> The only problem is that after adding the extra getRawText checks it
>>>> takes too long, even on my Nexus 4, to load the book list for IBT modules.
>>>>  However, a simpler way to avoid the getRawText calls would be to add a
>>>>     public int SwordBook.getRawTextLength(Key key)
>>>> which would be identical code to contains(Key key)
>>>> (->ZVerseBackend.contains) but return verse length instead of a boolean
>>>> (contains() calculates verse length to determine if a verse exists).  What
>>>> do you think?  This would help because IBT empty verse stubs are very short
>>>> and so normally the getRawText would not be required as part of the
>>>> elaborated contains() check in And Bible.
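The proposed shortcut could be sketched as follows. `VerseSource` is a hypothetical stand-in for the proposed `getRawTextLength` plus the existing `getRawText`; the 64-character stub limit is an assumption, as is the `<chapter eID=` prefix test, which the message above reports only for IBT modules. Long entries are accepted without fetching their text; only short ones pay for a `getRawText`-style call.

```java
/** Hypothetical sketch of the length-first empty-verse check; not JSword API. */
public final class EmptyVerseCheck {

    /** Stand-in for the proposed getRawTextLength plus getRawText. */
    interface VerseSource {
        int rawTextLength(String osisRef);
        String rawText(String osisRef);
    }

    /** Assumed upper bound on the size of an IBT empty-verse stub. */
    static final int STUB_MAX_LENGTH = 64;

    static boolean hasRealContent(VerseSource src, String osisRef) {
        int length = src.rawTextLength(osisRef);
        if (length == 0) {
            return false; // nothing stored for this verse
        }
        if (length > STUB_MAX_LENGTH) {
            return true; // too long to be a stub, so skip fetching the text
        }
        // Short entry: fetch the text and look for the chapter-end stub pattern.
        return !src.rawText(osisRef).startsWith("<chapter eID=");
    }
}
```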
>>>>
>>>> *Note:*
>>>> I have discovered that this problem does not just affect
>>>> deuterocanonical books in IBT Synodal modules, it also affects OT books in
>>>> IBT NT-only modules e.g. KYLSC, which return text like "<chapter eID="gen4"
>>>> osisID="Gen.1"/>".
>>>>
>>>> Martin
>>>>
>>>>
>>>> On 26 March 2014 14:49, DM Smith <dmsmith at crosswire.org> wrote:
>>>>
>>>>> John,
>>>>>
>>>>> Putting this up on sword-devel, since that is a more appropriate
>>>>> location for the discussion to continue. This is really not about JSword,
>>>>> but rather about module making.
>>>>>
>>>>> The nature of osis2mod is to retain all markup except <verse> and
>>>>> </verse> (or their equivalent milestoned version.) This means that the
>>>>> markup for a chapter is put in the module's storage for that chapter and
>>>>> noted in the index. In the case of the chapter that is given below, it is
>>>>> split into 2 parts, Verse 0 and Verse 1.
>>>>> Verse 0 will get the preamble of the chapter:
>>>>> <chapter osisID="EpJer.1">
>>>>> Verse 1 will get:
>>>>> </chapter>
>>>>> (These will have been transformed into their milestoned versions.)
>>>>>
>>>>> Also, verse 2 to 72 will be "linked" to verse 1, meaning that in the
>>>>> index they are given the same location as verse 1.
>>>>>
>>>>> So, verse 0 has chapter start content and verse 1 to 72 have chapter
>>>>> end content.
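One way to exploit the linking described above, sketched against a hypothetical in-memory model of index entries rather than the real SWORD index format: if verses 1 to n of a chapter all share one entry, the chapter probably contains only chapter markup. This is a heuristic under stated assumptions; a module that legitimately links every verse of a chapter together would also match.

```java
import java.util.List;

/** Hypothetical model of per-verse index entries; not the actual SWORD index layout. */
public final class LinkedVerseCheck {

    /** Where a verse's text lives in the data file. Linked verses share an entry. */
    record IndexEntry(long offset, int size) {}

    /**
     * Heuristic: if every verse from 1 to n points at the same index entry,
     * the chapter probably holds only chapter-end markup.
     *
     * @param versesFromOne index entries for verses 1..n (verse 0 excluded)
     */
    static boolean looksLikeEmptyChapter(List<IndexEntry> versesFromOne) {
        if (versesFromOne.size() < 2) {
            return false; // a single verse cannot be linked, so no evidence either way
        }
        IndexEntry first = versesFromOne.get(0);
        return versesFromOne.stream().allMatch(first::equals);
    }
}
```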
>>>>>
>>>>> Also, osis2mod does not complain if a verse is missing. Never has,
>>>>> never will. It does "complain" of a verse being present that is not in the
>>>>> versification. Always has, always will.
>>>>>
>>>>> That emptyvss indicates that all verses are present means exactly
>>>>> that: All verses are present. This is not good if the module is in fact
>>>>> incomplete.
>>>>>
>>>>> That JSword indicates that these "empty" verses are present means that
>>>>> they have non-zero length in the module.
>>>>>
>>>>> JSword is graceful in handling this. It determines that the module has
>>>>> content for the verse by examining the index. What Martin is trying to do
>>>>> is find out which books, chapters and verses should be displayed to users
>>>>> in pick lists. The only way this can be done at this time, by either SWORD
>>>>> or JSword with the module in question, is to render each verse and
>>>>> determine that it renders nothing. This is far too expensive an operation
>>>>> to consider.
>>>>>
>>>>> The only way to efficiently determine scope is to examine the index
>>>>> for each verse and see if the length is 0. The Scope entry in the conf has
>>>>> been ruled out. It would have been computed using the reverse logic of
>>>>> emptyvss. Go through the v11n from first verse to last and rather than
>>>>> noting what is missing, note what is present.
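The reverse-of-emptyvss idea, sketched in Java: walk the v11n from first verse to last and record runs of verses that have content. The ordered verse list and the `hasContent` test (for example, a non-zero index length) are supplied by the caller and are assumptions; `ScopeBuilder` is a hypothetical name. The output is verse-granular, so it is longer than hand-written values such as `Gen-Josh.24.33`.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.function.Predicate;

/** Hypothetical sketch; verse order and the content test come from the caller. */
public final class ScopeBuilder {

    /** Walks the v11n in order and emits one range per run of verses with content. */
    static String computeScope(List<String> v11nOrder, Predicate<String> hasContent) {
        StringJoiner scope = new StringJoiner(" ");
        String runStart = null;
        String runEnd = null;
        for (String ref : v11nOrder) {
            if (hasContent.test(ref)) {
                if (runStart == null) {
                    runStart = ref; // a new run of present verses begins
                }
                runEnd = ref;
            } else if (runStart != null) {
                scope.add(range(runStart, runEnd)); // the run ended at the previous verse
                runStart = null;
            }
        }
        if (runStart != null) {
            scope.add(range(runStart, runEnd));
        }
        return scope.toString();
    }

    private static String range(String start, String end) {
        return start.equals(end) ? start : start + "-" + end;
    }
}
```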
>>>>>
>>>>> Today, most of our frontends display pick lists based on the v11n, not
>>>>> on the module content. This has long confused end users of modules
>>>>> that don't contain all the verses in the v11n.
>>>>>
>>>>> In my view, this is a module problem. It is far easier and faster to
>>>>> rebuild and redistribute a module. We can tell a user to upgrade to the
>>>>> most recent version of a module far easier than making and releasing a code
>>>>> change and having them get a new version of the program. When the change is
>>>>> a work-around for something that shouldn't be in module, I think we should
>>>>> avoid that. For example, the NET Bible has some bugs that should be fixed.
>>>>> But instead we have some special code that is essentially: if module is NET
>>>>> then fix such-and-so when it occurs.
>>>>>
>>>>> Together in His Service,
>>>>> DM Smith
>>>>>
>>>>>
>>>>>
>>>>> On Mar 25, 2014, at 11:43 PM, John Austin <gpl.programs.info at gmail.com>
>>>>> wrote:
>>>>>
>>>>> There has been a lot of discussion about how missing material in a
>>>>> v11n should be treated (the discussion of the meaning and use of Scope was
>>>>> part of that). Tools such as osis2mod generated warnings whenever OSIS
>>>>> files lacked any part of the chosen v11n. The Scope conf param was, for a
>>>>> time at least, the recommended method of describing what part of a v11n was
>>>>> covered by a module. For these reasons, many existing modules (IBT alone
>>>>> has at least 26 such modules) are currently encoded so as to encompass the
>>>>> entire v11n, returning empty-string verse content for all verses in the
>>>>> v11n that are not included in the module, and using the .conf Scope param
>>>>> to define exactly what is present in the module.
>>>>>
>>>>> So even though current module making best practice may be different,
>>>>> it would be good for JSword to be graceful with modules that are encoded
>>>>> somewhat differently if at all possible, at least for a time. There are
>>>>> many modules out there, old and new, which don't contain the complete v11n,
>>>>> so determining book coverage is important.
>>>>>
>>>>> -John
>>>>>
>>>>>
>>>>>
>>>>> On 03/25/2014 08:19 PM, DM Smith wrote:
>>>>>
>>>>> Those verses exist since they are defined in the OSIS input file to
>>>>> osis2mod. Osis2mod retains everything in its input. This is a well
>>>>> documented behavior of osis2mod.
>>>>>
>>>>> The end chapter markup will be put in the last verse that is in the
>>>>> chapter, which might be verse 0.
>>>>>
>>>>> They should use xslt to strip empty verses, chapters and books out of
>>>>> their file into an intermediate file and give that as input to
>>>>> osis2mod.
>>>>>
>>>>> Alternatively they can use <!-- ... --> to comment out huge swaths of
>>>>> the input file.
>>>>>
>>>>>
>>>>> -- DM
>>>>>
>>>>> On Mar 25, 2014, at 7:48 AM, Martin Denham <mjdenham at gmail.com> wrote:
>>>>>
>>>>> IBT have just passed me more information regarding their handling of
>>>>> empty verses to help clarify if this is an IBT module issue or not.
>>>>> The following is an extract from IBT's e-mail:
>>>>>
>>>>>    Here are examples of how IBT's OSIS source defines empty verses in
>>>>>    the markup:
>>>>>
>>>>>    Empty book (Epistle of Jeremiah):
>>>>>    <div type="x-Synodal-non-canonical"><div type="book"
>>>>>    osisID="EpJer"><chapter osisID="EpJer.1"><verse sID="EpJer.1.1-72"
>>>>>    osisID="EpJer.1.1 EpJer.1.2 EpJer.1.3 EpJer.1.4 EpJer.1.5
>>>>>    EpJer.1.6 EpJer.1.7 EpJer.1.8 EpJer.1.9 EpJer.1.10 EpJer.1.11
>>>>>    EpJer.1.12 EpJer.1.13 EpJer.1.14 EpJer.1.15 EpJer.1.16 EpJer.1.17
>>>>>    EpJer.1.18 EpJer.1.19 EpJer.1.20 EpJer.1.21 EpJer.1.22 EpJer.1.23
>>>>>    EpJer.1.24 EpJer.1.25 EpJer.1.26 EpJer.1.27 EpJer.1.28 EpJer.1.29
>>>>>    EpJer.1.30 EpJer.1.31 EpJer.1.32 EpJer.1.33 EpJer.1.34 EpJer.1.35
>>>>>    EpJer.1.36 EpJer.1.37 EpJer.1.38 EpJer.1.39 EpJer.1.40 EpJer.1.41
>>>>>    EpJer.1.42 EpJer.1.43 EpJer.1.44 EpJer.1.45 EpJer.1.46 EpJer.1.47
>>>>>    EpJer.1.48 EpJer.1.49 EpJer.1.50 EpJer.1.51 EpJer.1.52 EpJer.1.53
>>>>>    EpJer.1.54 EpJer.1.55 EpJer.1.56 EpJer.1.57 EpJer.1.58 EpJer.1.59
>>>>>    EpJer.1.60 EpJer.1.61 EpJer.1.62 EpJer.1.63 EpJer.1.64 EpJer.1.65
>>>>>    EpJer.1.66 EpJer.1.67 EpJer.1.68 EpJer.1.69 EpJer.1.70 EpJer.1.71
>>>>>    EpJer.1.72"/><verse eID="EpJer.1.1-72"/></chapter></div></div>
>>>>>
>>>>>    I'm not sure how osis2mod handles all this when importing to the
>>>>>    module, but it works perfectly without warnings or errors. Also,
>>>>>    when the resulting module is passed to the "emptyvss" tool, it
>>>>>    passes this test without warnings.
>>>>>
>>>>>
>>>>>
>>>>> On 25 March 2014 11:38, Martin Denham <mjdenham at gmail.com> wrote:
>>>>>
>>>>>    I am having problems getting a list of BibleBooks contained in
>>>>>    some AV modules which we know do not contain certain books.  I
>>>>>    can't work out if the problem is with JSword, the modules, or
>>>>>    osis2mod.
>>>>>
>>>>>    There are 2 related problems I can see:
>>>>>
>>>>>     1. book.contains(nonExistingVerse) returns TRUE
>>>>>     2. book.getRawText(nonExistingVerse) returns <chapter end tag>
>>>>>
>>>>>    Here is a simple test to show the problem using KAZ which has
>>>>>    Synodal v11n but does not contain any deuterocanonical books:
>>>>>
>>>>>    SwordBook kaz = (SwordBook) Books.installed().getBook("KAZ");
>>>>>    Verse esd11Verse = new Verse(kaz.getVersification(), BibleBook.ESD1, 1, 1);
>>>>>    System.out.println(kaz.contains(esd11Verse));   // prints: true
>>>>>    System.out.println(kaz.getRawText(esd11Verse)); // prints: <chapter eID="gen7" osisID="1Esd.1"/>
>>>>>    Verse esd12Verse = new Verse(kaz.getVersification(), BibleBook.ESD1, 1, 2);
>>>>>    System.out.println(kaz.contains(esd12Verse));   // prints: true
>>>>>    System.out.println(kaz.getRawText(esd12Verse)); // prints: <chapter eID="gen7" osisID="1Esd.1"/>
>>>>>
>>>>>    So how does "<chapter eID="gen7" osisID="1Esd.1"/>" get into verse
>>>>>    content unexpectedly?
>>>>>
>>>>>    It seems to me like it could be either:
>>>>>
>>>>>     1. a module problem; but IBT say they do not add empty verse slots
>>>>>     2. Sword osis2mod issue
>>>>>     3. JSword issue: why is JSword returning a chapter end tag
>>>>>        instead of verse content
>>>>>
>>>>>    Any ideas what might cause this problem?
>>>>>
>>>>>    Thanks
>>>>>    Martin
>>>>>
>>>>>
>>>>>    On 11 March 2014 12:15, DM Smith <dmsmith at crosswire.org> wrote:
>>>>>
>>>>>        We haven't pushed this down into JSword. So far it is the
>>>>>        responsibility of the front-end. Chris B has made it efficient
>>>>>        to ask a Book whether it contains a Verse.
>>>>>
>>>>>        Essentially, when it comes to asking a module if it has
>>>>>        meaningful content, you want containsAny(Key verses, boolean
>>>>>        includeIntros) and containsAny(Key verses) { return
>>>>>        containsAny(verses, false); }
>>>>>
>>>>>        I think it should ignore verse 0 by default. If it doesn't
>>>>>        have verse content, then does the content really mean something?
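A minimal sketch of the `containsAny` pair described above. To stay self-contained it takes a `Verse` record and a `hasContent` predicate instead of a JSword `Key`; both are hypothetical stand-ins, not the JSword signature. Verse 0 (intro material) is skipped unless `includeIntros` is true, matching the suggested default.

```java
import java.util.function.Predicate;

/** Hypothetical sketch of the proposed containsAny; not an existing JSword method. */
public final class ContainsAnySketch {

    /** Minimal stand-in for a verse reference. */
    record Verse(String book, int chapter, int verse) {}

    /** True if at least one verse has content; verse 0 is skipped unless requested. */
    static boolean containsAny(Iterable<Verse> verses, boolean includeIntros, Predicate<Verse> hasContent) {
        for (Verse v : verses) {
            if (!includeIntros && v.verse() == 0) {
                continue; // intro-only content does not count by default
            }
            if (hasContent.test(v)) {
                return true;
            }
        }
        return false;
    }

    /** Default form: ignore intros. */
    static boolean containsAny(Iterable<Verse> verses, Predicate<Verse> hasContent) {
        return containsAny(verses, false, hasContent);
    }
}
```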
>>>>>
>>>>>        As you have noted contains(Key) is confusing. There are a few
>>>>>        places where it means containsAny. Usually it means
>>>>>        containsAll. The name, contains, was chosen early as we derived
>>>>>        from a container class where the argument was an element of
>>>>>        the container.  That is, contains is supposed to mean
>>>>>        isMemberOf. Later we changed the inheritance as it wasn't an
>>>>>        "is a" relationship.
>>>>>
>>>>>        But we need to be careful of not introducing more confusion.
>>>>>
>>>>>        By the way, the list serve was holding mail for a few days.
>>>>>
>>>>>        In Him,
>>>>>                DM
>>>>>
>>>>>        On Mar 8, 2014, at 5:26 PM, Martin Denham <mjdenham at gmail.com> wrote:
>>>>>
>>>>>        > Is there an efficient way to find if a BibleBook is
>>>>>        contained in a Book (Bible or commentary) using JSword?
>>>>>        >
>>>>>        > I recall this subject being discussed but can't recall the
>>>>>        outcome.
>>>>>        >
>>>>>        > Thanks
>>>>>        > Martin
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>
>>>>
>>>> _______________________________________________
>>>> jsword-devel mailing list
>>>> jsword-devel at crosswire.org
>>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>>