[sword-devel] Coming soon: new improved sword searching

sword-devel@crosswire.org sword-devel@crosswire.org
Sun, 15 Sep 2002 14:18:15 +0600


> On September 14, 2002 21:39, David White wrote:
> This is exactly how I did it for the Bible program I was working on. It must 
> be that great minds think alike ;-) For Sword, things will be a little 
> different. I'll post a page soon with the description of what I am planning 
> to use as an index file format.

It seems that everyone overlooked the idea I raised in an earlier message:

Use this index format for rare words, but bit vectors for frequently
occurring words.
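
To make the idea concrete, here is a minimal sketch in Python. It is
illustrative only: the function names and the `threshold` parameter are
hypothetical and are not part of Sword, and verse numbers are simply
positions in a list. Rare words keep a plain list of verse numbers, while
words found in at least `threshold` verses are stored as a bit vector (a
Python integer used as a bitmask).

```python
from collections import defaultdict

def build_hybrid_index(verses, threshold):
    """Map each word to ('list', [verse numbers]) or ('bits', bitmask)."""
    postings = defaultdict(list)
    for number, text in enumerate(verses):
        # set() so that a word is recorded once per verse
        for word in set(text.lower().split()):
            postings[word].append(number)
    index = {}
    for word, numbers in postings.items():
        if len(numbers) >= threshold:
            bits = 0
            for n in numbers:
                bits |= 1 << n          # one bit per verse
            index[word] = ("bits", bits)
        else:
            index[word] = ("list", numbers)
    return index

def as_bits(entry):
    """Convert either kind of entry to a bitmask so they can be combined."""
    kind, data = entry
    if kind == "bits":
        return data
    bits = 0
    for n in data:
        bits |= 1 << n
    return bits

def search_all(index, words):
    """Return the sorted verse numbers that contain every given word."""
    result = None
    for word in words:
        entry = index.get(word.lower())
        if entry is None:
            return []
        bits = as_bits(entry)
        result = bits if result is None else result & bits
    if result is None:
        return []
    return [n for n in range(result.bit_length()) if (result >> n) & 1]
```

With a bit vector, the cost of a frequent word is fixed at one bit per
verse no matter how often the word occurs, and combining two frequent
words is a single AND.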

You can also make the index compress better (with ZIP or whatever) by
storing not the absolute positions of words but positions relative to the
previous occurrence. Better still would be to use just two bytes per entry,
pointing to absolute verse numbers rather than book/chapter/verse.
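
A rough sketch of both points, again in Python with hypothetical helper
names (none of this is an existing Sword format). It assumes the verses of
a Bible are numbered consecutively from the start; the KJV has roughly
31,000 verses, so every verse number and every gap fits in an unsigned
16-bit value.

```python
import struct
import zlib

def encode_absolute(verse_numbers):
    """Pack sorted verse numbers as unsigned 16-bit little-endian values."""
    return struct.pack("<%dH" % len(verse_numbers), *verse_numbers)

def encode_deltas(verse_numbers):
    """Pack the gap from the previous occurrence instead of the number."""
    deltas = []
    previous = 0
    for n in verse_numbers:
        deltas.append(n - previous)
        previous = n
    return struct.pack("<%dH" % len(deltas), *deltas)

def decode_deltas(data):
    """Rebuild the absolute verse numbers by summing the stored gaps."""
    deltas = struct.unpack("<%dH" % (len(data) // 2), data)
    numbers = []
    total = 0
    for d in deltas:
        total += d
        numbers.append(total)
    return numbers

def compressed_size(data):
    """Size in bytes after zlib (the same deflate method ZIP uses)."""
    return len(zlib.compress(data))
```

Both encodings take two bytes per occurrence before compression. The gaps
are mostly small and repetitive, which is what lets a general-purpose
compressor do better on them than on the absolute numbers; how much better
depends on the real word distribution, which I have not measured.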

> > Anyhow, if I stored the
> > locations of all words, the index file ballooned out to over 2 megabytes
> > (for the KJV, or any typical Bible). When I stored only words that
> > appeared less than 2000 times, it was just over 1 megabyte. That's going
> > to be a substantial difference on space-limited devices.
> 
> Do you really think 1 megabyte makes that much difference? How space-limited 
> is the most limited device that Sword currently exists on? Very space-limited 
> devices may want to have no index at all. Certainly on most PCs, +/- 1 Mb 
> isn't a big deal. Perhaps some of the handheld frontend maintainers can give 
> some specs.

Download time over the Internet also matters (my connection gives about
2.5 Kbyte/sec, so an extra megabyte takes roughly seven minutes). Internet
access is free of charge only in the USA (and probably Britain).

P.S. Please make the search framework two-level: the lower level should
support only the index and other general things, while the upper level can
be anything, such as regex or whatever one adds later.
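
A minimal sketch of such a layering, in Python. The class and function
names are hypothetical and this is not the Sword API; it only shows the
split between the two levels. The lower level knows nothing but which
verses contain which words, and the upper level (here a regex search) uses
it solely to narrow the set of verses it has to scan.

```python
import re

class VerseIndex:
    """Lower level: knows only which verses contain which whole words."""

    def __init__(self, verses):
        self.verses = list(verses)
        self.words = {}
        for number, text in enumerate(self.verses):
            for word in set(re.findall(r"[a-z']+", text.lower())):
                self.words.setdefault(word, set()).add(number)

    def candidates(self, words):
        """Verse numbers containing every given word (all verses if none)."""
        result = set(range(len(self.verses)))
        for word in words:
            result &= self.words.get(word.lower(), set())
        return result

def regex_search(index, pattern, required_words=()):
    """Upper level: run a regex only over the verses the index lets through."""
    compiled = re.compile(pattern, re.IGNORECASE)
    hits = []
    for number in sorted(index.candidates(required_words)):
        if compiled.search(index.verses[number]):
            hits.append(number)
    return hits
```

Any other kind of search (phrase, proximity, and so on) could sit on the
upper level in the same way without the index code changing.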

P.P.S. Does the search architecture depend on whether you are going to
implement searching in paragraphs now? (Remember the discussion about
paragraphs that I started!) I mean, would it be too difficult to add
paragraph searching in the future if it is not done now? If it does depend,
please tell me and I will set aside my current work on Sword Server and its
clients, plan a paragraph-searching architecture with bit vectors, write the
code, and test it for speed. I VERY much want paragraph searching.
-- 
Victor Porton (porton@ex-code.com)