[sword-devel] search idea

Matthias Ansorg sword-devel@crosswire.org
Mon, 3 Jan 2000 22:04:20 +0100

On Mon, 03 Jan 2000, you wrote:

>On Saturday, 1 January, 2000 21:07:21, Matthias Ansorg <aNsis@gmx.de> wrote:
>> Trevor et al.,
>> Some ideas that might perhaps be useful to integrate when planning advanced
>> search features:
>> It would be useful for advanced searching to be better able to distinguish the
>> semantic meaning of text. Examples:
>> 1. Searching for a number like, say, 33 currently produces hits like
>> Psalms 18:32 in some translations such as the "1952 Schlachter Bibel". Here
>> the "33" is part of a string indicating that this verse was originally
>> verse 33. It would be useful to have a search, say FIND numbers(33), that
>> finds only real numbers contained in the Bible text.
>An interesting search. I tried the same thing in Online Bible for Macintosh
>only to discover that it was searching on Strong's numbers. Bizarre, as the
>translation I searched in (RSV) does not include them.
>My view is that text is text and annotation is annotation. The default would
>be to search only the text. The search syntax I'm proposing includes "field
>names" so that items like annotations could be dealt with sensibly.

OK, agreed. The Bible is the Bible; annotation is an addition. In any case, with your new search
system one will get hits only on the Bible text by default. Good.
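To make the field-name idea concrete, here is a minimal Python sketch, assuming a hypothetical module in which each verse keeps its text and its annotation as separate fields. The module layout, the placeholder verse contents and the find() function are invented for illustration and are not SWORD's API; the point is only that a search defaulting to the "text" field no longer hits the "(33)" original-verse-number note.

```python
import re

# Hypothetical toy module: each verse keeps its fields apart, so a note giving
# the original verse number lives in "annotation", not in "text".
MODULE = {
    "Ps 18:32": {"text": "placeholder verse text", "annotation": "(33)"},
    "Example 1:1": {"text": "placeholder text mentioning 33 people", "annotation": ""},
}

def find(module, term, field="text"):
    """Return the keys whose chosen field contains term as a whole word.
    Only the Bible text itself is searched unless another field is named."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    return [key for key, fields in module.items()
            if pattern.search(fields.get(field, ""))]

print(find(MODULE, "33"))                      # ['Example 1:1']
print(find(MODULE, "33", field="annotation"))  # ['Ps 18:32']
```

Searching annotations then becomes an explicit choice of field rather than an accident of the text layout.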

>The introduction of these field names is a function of the schema that
>accompanies each module. The schema might be implicit for existing modules or
>could be made explicit. For translations some obvious field names are BOOK,
>CHAPTER, VERSE. (These names would have to be internationalised at user's
>> 2. Strong's numbers: It would be useful to do a search that finds only hits
>> tagged with a given Strong's number and not additional verses that contain
>> this number in their text. Perhaps FIND strongs(0929). It's interesting to
>> monitor the use of a specific Hebrew word through the Bible using
>> BibleTime's graphical analysis feature!
>The presence of Strong's numbers in a translation would (implicitly)
>introduce a suitable field into the schema.
>> 3. Names: Imagine you have forgotten the name of a person mentioned in the
>> Bible, a commentary or a book, except for one or two letters. You could
>> avoid unnecessary hits by restricting the search to names only, perhaps
>> FIND names(Ben*). ThML provides markup for names.
>Again the presence of such information in a translation would (implicitly)
>extend the schema with a suitable field.
>The GBF markup scheme does not appear to have the same feature of
>distinguishing names.

This might be worth implementing in GBF2 during the current redesign.
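A rough sketch of the implicit schema described above, under these assumptions: each verse record carries only the fields its markup provides (Strong's tags, ThML-style name markup), the schema is simply the union of those field names, and, as Trevor notes further down, a search on a field the module lacks comes up with no hits. All names here (implicit_schema, find, the sample records) are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical verse records. Which fields exist depends on the markup the
# module carries: "strongs" from Strong's tagging, "names" from name markup.
MODULE = {
    "v1": {"text": "placeholder verse text", "strongs": ["0929"]},
    "v2": {"text": "placeholder text that mentions 0929", "names": ["Benjamin"]},
}

def implicit_schema(module):
    """The schema is simply every field name that occurs in some record."""
    return set().union(*(fields.keys() for fields in module.values()))

def find(module, field, pattern):
    """FIND field(pattern): match shell-style wildcards against one field only."""
    if field not in implicit_schema(module):
        return []  # the module lacks this data, so the search has no hits
    hits = []
    for key, fields in module.items():
        values = fields.get(field, [])
        if isinstance(values, str):
            values = [values]  # "text" is a single string, tag fields are lists
        if any(fnmatchcase(value, pattern) for value in values):
            hits.append(key)
    return hits

print(find(MODULE, "strongs", "0929"))  # ['v1'], not v2 with 0929 in its text
print(find(MODULE, "names", "Ben*"))    # ['v2']
print(find(MODULE, "mood", "joy"))      # [] because no record has a "mood" field
```

Wildcards via fnmatch are only one possible reading of "Ben*"; a real query language might use its own pattern syntax.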

>(By the way, where is a specification of ThML?)

Try http://www.ccel.org/ThML/ThML1.0.htm
Also available in pdf and rtf format in the same directory.
The first official version (1.0) of ThML is now out.

>> 4. Annotations: show only hits that occur in annotations, or only hits that
>> do not, to reduce the number of unnecessary hits to look through.
>Again the presence of annotations extends the schema.
>> 5. Meta information: find (perhaps across several modules at once)
>> information that is stored in meta fields, such as the publication date or
>> author of commentaries or (not yet implemented, of course) general books,
>> provided there is appropriate markup like ThML. For example, "give me all
>> books written by Darby" would be FIND meta.author(Darby) IN modules.books
>I'm beginning to sound like a parrot. :-)
>However, I do not see a need to distinguish this "meta" data from other
>associated data provided with a text. To me "meta data" implies the
>structure of the database itself, i.e. the schema. What might be useful is a
>search like FIND modules.meta=(annotation and names and ThML).

"Meta information" was an inappropriate expression on my part. I was trying to describe data that
does not belong to any specific part of the text itself, such as copyright info, author, title,
translator, ISBN of the printed edition, or ... .
I am, however, glad that you will already integrate corresponding search features in your
specifications :-) This shows how little I understand of what "full-text retrieval" is.
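The kind of module-level data meant here could be searched separately from the verse text. A minimal sketch, assuming each module carries a plain dictionary of such data and a "kind" label; the Module class, find_meta() and the sample modules are hypothetical and only mirror the FIND meta.author(Darby) IN modules.books example.

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    name: str
    kind: str  # e.g. "bible", "commentary", "book"
    meta: dict = field(default_factory=dict)  # module-level data, not tied to any verse

# Hypothetical modules; names and metadata are invented for illustration.
MODULES = [
    Module("BookA", "book", {"author": "Darby", "title": "Hypothetical title"}),
    Module("BookB", "book", {"author": "Someone Else"}),
    Module("CommentaryC", "commentary", {"author": "Darby"}),
]

def find_meta(modules, key, value, kind=None):
    """FIND meta.<key>(<value>) IN modules.<kind>: substring match on module-level data."""
    return [m.name for m in modules
            if (kind is None or m.kind == kind) and value in m.meta.get(key, "")]

print(find_meta(MODULES, "author", "Darby", kind="book"))  # ['BookA']
print(find_meta(MODULES, "author", "Darby"))               # ['BookA', 'CommentaryC']
```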

>> 6. Scripture references: given appropriate markup for scripture references,
>> as in ThML, one could search for every reference to a given verse or verse
>> range in every commentary and (perhaps later) even every book you have. That
>> way, one could find nearly everything written about a specific verse, not
>> only what is included in the corresponding portions of your commentaries.
>> 7. Texts with other semantic markup, like dates or anything else that might
>> be of some use. Searching for verses that are written in a certain mood and
>> are about a certain topic (proposed by Jerry Hastings earlier in this thread)
>> is related to this but perhaps easier to handle in code: this meta
>> information is the same for every Bible translation and therefore does not
>> need to be marked up in the Bible text itself.
>I'll defer on this one given the followup from Mads Kiilerich.
>> IMHO, the idea of using index files for searching is great, for it opens up
>> possibilities like creating a (perhaps semi-hand-written) index file with a
>> table of contents for an MP3 module, such as an audio sermon, if that ever
>> becomes a kind of SWORD module (rather than being done via an href to a
>> file, as in BibleTime at the moment).
>I wonder about this "hand-written" index file. Would it not be better to have
>an installation-specific contents file, which perhaps should be a module in
>its own right?

That is, each audio module comes with its own contents file, which would perhaps show up when
choosing the module in a SWORD frontend? And the index file is generated automatically from this
contents file? Agreed, for this nice idea makes these hypothetical audio modules easier to handle
on the user's side: one could implement a jump to the corresponding mark in the audio file when
the user clicks on an entry in the contents file.
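One way this could work, sketched in Python under the assumption of a simple contents-file format of one "MM:SS title" entry per line (the format, the sample entries and both functions are hypothetical, not an existing SWORD convention): the contents file is parsed into time offsets, and the search index is generated from it automatically, so a hit or a click yields the offset a frontend would seek to.

```python
import re
from collections import defaultdict

# Hypothetical contents file for an audio module: one "MM:SS title" entry per
# line. This format is an assumption, not an existing SWORD convention.
CONTENTS = """\
00:00 Introduction
04:30 Reading of Psalm 23
12:05 The shepherd and the sheep
"""

def parse_contents(text):
    """Turn each contents line into (offset in seconds, title)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        stamp, title = line.split(" ", 1)
        minutes, seconds = stamp.split(":")
        entries.append((int(minutes) * 60 + int(seconds), title))
    return entries

def build_index(entries):
    """Generate the search index automatically from the contents entries:
    every lower-cased word maps to the offsets of the entries it occurs in."""
    index = defaultdict(list)
    for offset, title in entries:
        for word in set(re.findall(r"\w+", title.lower())):
            index[word].append(offset)
    return index

entries = parse_contents(CONTENTS)
index = build_index(entries)
print(index["shepherd"])  # [725], the offset a frontend would seek to
```

Keeping the contents file as the only hand-edited source means the index never drifts out of step with it.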

>> Please discard those portions of this message that are only "technical toys"
>> and not useful in a Bible study tool aimed at furthering HIS kingdom.
>Personally I don't see your requests as technical toys. Because of my
>professional involvement in full-text retrieval systems I want these same
>features for the more important job of understanding the scriptures.
>Yesterday I had lunch at my mother-in-law's house. She has a radio in the
>kitchen tuned to BBC Radio 4. The audio quality is poor because she has it
>on the long wave frequency (AM) rather than the better quality of the VHF
>service (FM). The transformation is incredible when you switch her receiver
>from AM to FM. For me the use of full-text searches is equivalent to that
>same switch over from low quality to high quality.

A fitting comparison.

>One final comment concerning these extensions to the underlying schema. If
>the corresponding data is not in the module then it will not be possible to
>search with these additional field names. They'd always come up with no hits.
>Regards, Trevor
>British Sign Language is not inarticulate handwaving; it's a living
>language. So recognise it now.
> <>< Re: deemed!

In Christ.