[sword-devel] Thesaurus

Trevor Jenkins sword-devel@crosswire.org
Fri, 31 Dec 1999 05:22:58 +0000


On Thursday, 30 December, 1999 18:40:04, darwin@ichristian.com 
<darwin@ichristian.com> wrote:

> I have enjoyed the messages concerning the use of a thesaurus type search,
> and I remember ideas skirting my ideas.  I would love the ability to search
> on several different levels.  Using the example of "jewel" that was
> discussed earlier, I will try to show what I would find ideal (until I
> learn otherwise).

Let me repeat the (outline) structure I posted a few days ago:

>> The structure of a thesaurus could be:
>>
>>    term
>>        broader term
>>        narrower terms
>>        related terms
>>        use for
>>        scope
>>        synonyms
>>
>> There is an ANSI standard for this structure, which is similar to the one
>> I've given above. The trick is to use the same inverted-file scheme for
>> thesaurus files as for other text.
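
The inverted-file idea can be sketched in Python. This is a minimal illustration, not SWORD's actual index format. The function name `build_inverted_file`, the naive whitespace tokenisation and the sample documents are all hypothetical. The same function indexes an ordinary text and a thesaurus entry stored as text, which is the point of reusing one scheme for both.

```python
from collections import defaultdict


def build_inverted_file(documents):
    """Map each lower-cased term to its (doc_id, position) postings.

    `documents` maps a document identifier to its raw text. Postings are
    appended in the order documents and words are visited.
    """
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for position, token in enumerate(text.lower().split()):
            term = token.strip(".,;:!?\"'()")
            if term:  # skip tokens that were pure punctuation
                index[term].append((doc_id, position))
    return dict(index)


# Hypothetical sample: one ordinary text and one thesaurus entry held as text.
documents = {
    "text-1": "A ruby, an opal.",
    "thesaurus/jewel": "jewel BT stone NT ruby NT opal SYN gem",
}
index = build_inverted_file(documents)
print(index["ruby"])  # [('text-1', 1), ('thesaurus/jewel', 4)]
```

A lookup for ruby now finds both the text that mentions it and the thesaurus entry that lists it as a narrower term, using one index structure.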

For each term (notice that this is not "word"), there can be zero or more of
each of these concepts. There would be separate entries for ruby, for opal
and for jewel.
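
A record following that outline might look like the Python sketch below. The field names, the UF/SYN abbreviations in the comments and the reciprocity check are illustrative assumptions, not taken from the ANSI standard or from SWORD. The check reflects a common thesaurus convention: a narrower-term link in one entry should be mirrored by a broader-term link in the other.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ThesaurusEntry:
    """One thesaurus record; every relation may hold zero or more terms."""
    term: str
    broader: List[str] = field(default_factory=list)   # BT
    narrower: List[str] = field(default_factory=list)  # NT
    related: List[str] = field(default_factory=list)   # RT
    use_for: List[str] = field(default_factory=list)   # UF (help text)
    scope: str = ""                                    # scope note (help text)
    synonyms: List[str] = field(default_factory=list)  # SYN


# Hypothetical entries: separate records for jewel, ruby and opal.
THESAURUS: Dict[str, ThesaurusEntry] = {
    "jewel": ThesaurusEntry("jewel", broader=["stone"],
                            narrower=["ruby", "opal"], synonyms=["gem"]),
    "ruby": ThesaurusEntry("ruby", broader=["jewel"]),
    "opal": ThesaurusEntry("opal", broader=["jewel"]),
}


def missing_reciprocals(thesaurus: Dict[str, ThesaurusEntry]) -> List[Tuple[str, str]]:
    """Return (term, narrower_term) pairs whose NT link lacks a matching BT link."""
    problems = []
    for entry in thesaurus.values():
        for child in entry.narrower:
            other = thesaurus.get(child)
            if other is None or entry.term not in other.broader:
                problems.append((entry.term, child))
    return problems


print(missing_reciprocals(THESAURUS))  # []
```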

And now to relate them to your requests:

> I would like a tree constructed for a variety of search possibilities.
>
> The first is obvious, and it is peers, also known as synonyms.  This could
> include words like gem.

These map directly onto the "synonyms". :-)

> The second is to search for the word and its "descendents".  This would
> include words like ruby and opal.

These are the "narrower terms". :-)

> The third is to search for the word and its "predecessors".  This would
> include words like stone.  The next level up could include mineral.

These are the "broader terms". :-)

One could also argue that gold and silver are related terms of jewel.

> The syntax for such possibilities is a key issue that I haven't considered
> yet, I am open to almost anything.

I prefer the search syntax of the Common Command Language (CCL), which is an
ISO standard. I can't lay hold of my copy at the moment, but the convention
for differentiating between search terms and thesaurus terms is:

FIND jewel

would search for the term jewel in the appropriate text

FIND NT(jewel)

would search the thesaurus for the term jewel, extract the narrower terms
(e.g. ruby, opal) and then search for those terms in the appropriate text.

The NT could be replaced by RT for related terms, BT for broader terms, SYN
for synonym terms. The use for and scope elements exist as user help text
rather than for searches.
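
The FIND convention described above can be sketched as a query-expansion step followed by an ordinary search. Everything here is hypothetical: the regular expression accepts only the two forms shown (FIND term and FIND OP(term)), the miniature thesaurus and texts are invented, and the text search is a naive word match rather than anything SWORD provides. Following the description, FIND NT(jewel) searches for the narrower terms only, not for jewel itself.

```python
import re

# Hypothetical miniature thesaurus: term -> relation code -> related terms.
THESAURUS = {
    "jewel": {"BT": ["stone"], "NT": ["ruby", "opal"], "RT": [], "SYN": ["gem"]},
}

# Accepts "FIND term" or "FIND OP(term)" where OP is NT, BT, RT or SYN.
QUERY = re.compile(r"^FIND\s+(?:(NT|BT|RT|SYN)\((\w+)\)|(\w+))$", re.IGNORECASE)


def expand_query(query, thesaurus):
    """Turn a FIND query into the list of terms to look for in the text."""
    match = QUERY.match(query.strip())
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    relation, thesaurus_term, plain_term = match.groups()
    if plain_term is not None:
        return [plain_term.lower()]  # plain search: no thesaurus lookup
    entry = thesaurus.get(thesaurus_term.lower(), {})
    return entry.get(relation.upper(), [])


def find(query, texts, thesaurus):
    """Return the references of texts containing any of the expanded terms."""
    terms = set(expand_query(query, thesaurus))
    hits = []
    for ref, text in texts.items():
        words = {w.strip(".,;:!?").lower() for w in text.split()}
        if words & terms:
            hits.append(ref)
    return hits


texts = {  # hypothetical texts, not real module content
    "text-1": "A ruby was set in gold.",
    "text-2": "An opal shone.",
    "text-3": "A plain stone.",
}
print(find("FIND NT(jewel)", texts, THESAURUS))  # ['text-1', 'text-2']
print(find("FIND BT(jewel)", texts, THESAURUS))  # ['text-3']
```

Whether the original term should also be searched alongside its expansions is a design choice. Adding it to `terms` would give the "word and its descendents" behaviour asked for in the earlier message.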

Because I would want to search Bible texts, commentary texts and whatever
other texts might be held in SWORD, I don't want to limit the feature to
Bible texts alone.

> There are of course questions of scope, and there should be a default scope
> (x levels) set in a configuration file.  The user should be prompted to
> expand or narrow the search using some reasonable dialog.

I presume that you mean how many distinct terms are extracted from the
thesaurus before the search in the appropriate text is made. There is one
limit I've not (yet) mentioned: a thesaurus is a directed acyclic graph, so
any expansion of terms would eventually terminate. It might need lots of
memory, though, if you started too far up the graph.
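
A depth-limited expansion along one relation can be sketched as a breadth-first walk. The function name, the graph layout and the `max_depth` parameter (standing in for the "x levels" default that would come from a configuration file) are assumptions for illustration. Because a well-formed thesaurus is acyclic, the walk would terminate anyway. The `seen` set stops a term from being returned twice when two paths lead to it and also guards against a cycle introduced by bad data. `max_depth` bounds how much of the graph is pulled into memory.

```python
from collections import deque

# Hypothetical fragment of a thesaurus graph: term -> relation -> terms.
GRAPH = {
    "mineral": {"NT": ["stone"]},
    "stone": {"BT": ["mineral"], "NT": ["jewel"]},
    "jewel": {"BT": ["stone"], "NT": ["ruby", "opal"]},
    "ruby": {"BT": ["jewel"]},
    "opal": {"BT": ["jewel"]},
}


def expand(term, graph, relation, max_depth):
    """Breadth-first collection of terms reachable from `term` via `relation`,
    at most `max_depth` links away. The starting term is not included."""
    seen = {term}
    result = []
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the configured limit
        for neighbour in graph.get(current, {}).get(relation, []):
            if neighbour not in seen:
                seen.add(neighbour)
                result.append(neighbour)
                queue.append((neighbour, depth + 1))
    return result


print(expand("jewel", GRAPH, "BT", 1))    # ['stone']
print(expand("jewel", GRAPH, "BT", 2))    # ['stone', 'mineral']
print(expand("mineral", GRAPH, "NT", 3))  # ['stone', 'jewel', 'ruby', 'opal']
```

Starting from mineral and following NT, each extra level of `max_depth` pulls in a whole further layer of the hierarchy. That is where the memory cost of starting too far up the graph comes from.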

> I don't know if these items are assumed in the discussion of a thesaurus
> lookup, but I thought I would throw in my opinion early in the process.

As you can see from the above, they were just obscured by my assumption
that everyone was familiar with the conventions used to describe such
thesauri. :-)

>    Creation is more scientifically valid than evolution!

Amen to that! :-|

Regards, Trevor

British Sign Language is not inarticulate handwaving; it's a living
language. So recognise it now.

--

<>< Re: deemed!