[sword-devel] Why Sword?

Trevor Jenkins sword-devel@crosswire.org
Mon, 11 Feb 2002 11:30:32 +0000 (GMT)


On 11 Feb 2002, David White <dave@whitevine.com> wrote:

> On Mon, 2002-02-11 at 02:26, Trevor Jenkins wrote:
>
> > For those who don't know or have forgotten I proposed an inverted file
> > search system. Every word to be indexed and its position recorded. The
> > position pointers should be of two forms. For a Bible search program the
> > obvious one is position as book, chapter, verse, word within verse.
> > However, there are enough examples in the scriptures where the position
> > should be book, chapter, sentence, word within sentence because sentences
> > cross verse (even chapter boundaries). Actually it should include
> > translation too.
> >
>
> Although I use a system like this in my program, and it works well, I
> don't necessarily think this is needed. A bible in plain text is a few
> megabytes of data, it should take no time at all to iterate over. An
> indexing system could give even better results, but isn't necessary to
> give reasonable results.

This is a common criticism of inverted file systems. Running a few
ad-hoc queries with grep is cheaper than building the indices. However,
once you go beyond a few such queries, especially ones involving
positional information, a "grep"-style approach quickly slows. If the
user is likely to make many searches in the text then the cost of each
grep search remains constant, whereas the cost per search of an inverted
file, setup included, falls with every query. Where the break-even point
is I'll leave to others to decide. For each Sword user the setup cost of
the inverted file will take some time to recoup unless each module is
distributed with a pre-built copy, in which case the user's processor
time to create it is nil.
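
To make the trade-off concrete, here is a minimal sketch in Python. It is
not Sword code: the function names, the tokenizer, and the three sample
verses are assumptions chosen purely for illustration. It records only the
book, chapter, verse, word-within-verse form of position, and compares a
grep-style scan with a lookup in a positional inverted file.

```python
import re
from collections import defaultdict

# Hypothetical sample data for illustration: (book, chapter, verse) -> text.
VERSES = {
    ("John", 11, 35): "Jesus wept.",
    ("Gen", 1, 1): "In the beginning God created the heaven and the earth.",
    ("John", 1, 1): "In the beginning was the Word, and the Word was with God, "
                    "and the Word was God.",
}

def tokenize(text):
    """Lower-case the text and split it into words (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(verses):
    """One-off setup cost: map each word to every position where it occurs."""
    index = defaultdict(list)
    for (book, chapter, verse), text in verses.items():
        for pos, word in enumerate(tokenize(text), start=1):
            index[word].append((book, chapter, verse, pos))
    return index

def scan_search(verses, word):
    """Grep-style search: re-reads every verse on every query."""
    word = word.lower()
    return sorted(ref for ref, text in verses.items() if word in tokenize(text))

def index_search(index, word):
    """Inverted-file search: one lookup, no pass over the text."""
    return sorted({pos[:3] for pos in index.get(word.lower(), [])})

def phrase_search(index, first, second):
    """Positional query: verses where `second` immediately follows `first`."""
    following = set(index.get(second.lower(), []))
    return sorted({
        (b, c, v)
        for (b, c, v, w) in index.get(first.lower(), [])
        if (b, c, v, w + 1) in following
    })

index = build_index(VERSES)
print(scan_search(VERSES, "beginning"))
print(index_search(index, "beginning"))
print(phrase_search(index, "the", "Word"))
```

Both single-word searches return the same verses; the difference is that
scan_search pays for a full pass over the text each time, while
build_index pays once and could be shipped pre-built with a module. The
phrase query is the kind of positional search that is awkward to express
as a plain scan.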

Regards, Trevor

British Sign Language is not inarticulate handwaving; it's a living language.
Support the campaign for formal recognition by the British government now!
Details at http://www.fdp.org.uk/ or http://www.bsl-march.co.uk/

-- 

<>< Re: deemed!