[sword-devel] A possible way to speed up was Re: Search optimized (still too slow)

Thu, 8 Apr 2004 22:33:56 +0100

What happened to clucene? I've been trying to get it to work, but no
luck as yet. There has been a lot of talk about speeding up searches,
and although I don't know too much about searching, I think the only
sensible way to search anything biggish is to create an index. Yes,
with faster computers and more memory we can just read the Bibles in
and run through them fast. However, searches can get complicated, and
modules bigger.

Perhaps an index could be created the first time a module is searched,
much in the same way MacSword and BibleTime cache the contents of
lexicons to speed them up. Ideally we wouldn't have to do this
either.
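
To make the idea concrete, here is a rough sketch of an inverted index that is built lazily on the first search and reused afterwards. Everything in it is an assumption for illustration: `ModuleIndex` and `Verses` are made-up names, not SWORD or clucene API, and a real module would also write the index to disk rather than keep it only in memory.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a module: entry i holds the plain text of verse i.
using Verses = std::vector<std::string>;

class ModuleIndex {
public:
    explicit ModuleIndex(const Verses &verses) : verses_(verses) {}

    // Returns the verse numbers containing `word` (case-insensitive).
    // The index is built on the first call and reused on later calls.
    const std::set<std::size_t> &search(const std::string &word) {
        if (!built_) build();
        static const std::set<std::size_t> empty;
        auto it = index_.find(lower(word));
        return it == index_.end() ? empty : it->second;
    }

    bool built() const { return built_; }

private:
    static std::string lower(std::string s) {
        for (char &c : s)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return s;
    }

    // One pass over the text: split on non-letters, record word -> verse numbers.
    void build() {
        for (std::size_t v = 0; v < verses_.size(); ++v) {
            std::string word;
            for (char c : verses_[v]) {
                if (std::isalpha(static_cast<unsigned char>(c))) {
                    word += c;
                } else if (!word.empty()) {
                    index_[lower(word)].insert(v);
                    word.clear();
                }
            }
            if (!word.empty()) index_[lower(word)].insert(v);
        }
        built_ = true;
    }

    Verses verses_;  // copy of the module text (a real module would stream it)
    std::map<std::string, std::set<std::size_t>> index_;
    bool built_ = false;
};
```

The first call to `search` pays for one full pass over the text; every later search is a map lookup, which is the trade the caching in MacSword and BibleTime makes for lexicons.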

Using indexes would not be helped by separating content and markup.
Other things might be, such as rendering speed; I don't know.

–Will

On 8 Apr 2004, at 19:27, Daniel Glassey wrote:

> Hiya,
> I was going to wait until I had thought this through (and had got
> somewhere) but since it has been brought up I think I'd better mention
> it. Quite a while back David White suggested that separating content
> from markup would be a good idea. With the files getting big by using
> raw OSIS (or is it pseudo-OSIS, I'm not sure) and the search being so
> slow in these modules I think it is worth doing - to aim for 1.6.0 or
> 2.0.0 or whatever the next major version is.
>
> What I'm suggesting is to make a new module type that contains a binary
> representation of OSIS with the text in one file and the markup in a
> second file. I think the markup should be based on something like WBXML
> (http://www.w3.org/TR/wbxml/) but have pointers into the text rather
> than containing the text.
> Suggested name SBXML (Sword Binary XML)
> This would mean that the search could be made on just the plain text.
> Most filters would only operate on the markup.
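
One way to picture the split suggested above is stand-off markup: the plain text sits in one buffer and each tag becomes a record holding offsets into it. This is only a sketch under assumptions. `Span`, `SplitResult` and `splitMarkup` are made-up names, not SWORD or WBXML API, and the toy parser handles only well-formed tags without attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-off record: a tag name plus the text range it covers.
struct Span {
    std::string tag;
    std::size_t begin;  // offset of the first covered byte in the plain text
    std::size_t end;    // one past the last covered byte
};

struct SplitResult {
    std::string text;         // what a search would scan
    std::vector<Span> spans;  // what filters would work on
};

// Splits input such as "<w>In</w> the <b>beginning</b>" into text and spans.
// Not a real XML parser: no attributes, entities, or self-closing tags.
SplitResult splitMarkup(const std::string &in) {
    SplitResult out;
    std::vector<std::size_t> open;  // indexes into out.spans of unclosed tags
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '<') {
            out.text += in[i++];
            continue;
        }
        std::size_t gt = in.find('>', i);
        if (gt == std::string::npos) {  // stray '<': keep the rest as text
            out.text.append(in, i, std::string::npos);
            break;
        }
        if (in[i + 1] == '/') {  // closing tag: finish the innermost open span
            if (!open.empty()) {
                out.spans[open.back()].end = out.text.size();
                open.pop_back();
            }
        } else {  // opening tag: start a span at the current text offset
            out.spans.push_back(
                {in.substr(i + 1, gt - i - 1), out.text.size(), out.text.size()});
            open.push_back(out.spans.size() - 1);
        }
        i = gt + 1;
    }
    return out;
}
```

With this layout a search only touches `text`, and a hit at some offset can be mapped back to its markup by looking for the spans that cover it.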
>
> If we think it's a good idea then let's try to design this using the
> wiki. I've added a page for it[1].
>
> I think it should be possible to subclass the existing classes for use
> by new module drivers and filters so that the current code will
> continue to work.
>
> Until it is ready to become core, it would be optionally included via
> a configure option.
>
> I don't think I've explained that very well so questions, discussion,
> plain opinions and constructive criticism would be very welcome :)
>
> I'm starting at the bottom up so I'm currently looking at changing
> VerseKey (new class VerseKey2) to support multiple versification
> systems. I'll explain that once I get far enough to do so. But it's
> basically going to be based on the OSIS refsys system[2] and it is
> going to lump all the books together rather than separating into
> testaments.
> Chris, I see now you've already been doing something on the
> versification stuff[3], how is that going?
>
> Regards,
> Daniel
>
> [1]http://www.crosswire.org/ucgi-bin/twiki/view/Swordapi/SbXml
> [2]http://www.ccel.org/refsys/refsys.html
> [3]http://www.crosswire.org/ucgi-bin/twiki/view/Swordapi/AlternateVersification
>
> On Thu, 2004-04-08 at 14:59, Joachim Ansorg wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi,
>> I spent some time optimizing the search in CVS.
>> The problem is/was, for example, the extensive use of XMLTag in the
>> filters. I tried to avoid it in the filters where that was possible
>> without having to rewrite them.
>> I also used SWBuf::append directly where SWBuf::operator+ was used
>> before.
>>
>> I see some good chances where we can optimize:
>> 	- Use XMLTag as little as possible
>> 	- Change the copy constructor of SWBuf to implicit sharing; we have
>> 	  lots of SWBuf copy-constructor calls, I think
>> 	- Optimize SWBuf::append(char); maybe we can tweak the memory
>> 	  allocation to allocate larger blocks less often. The append(char)
>> 	  function gets called more than any other function in a search
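
The "larger blocks, less often" idea for append(char) can be sketched as geometric growth. This is an illustration only: `GrowBuf` is a made-up class, not the real SWBuf, and the starting capacity of 16 and the doubling factor are arbitrary assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical buffer, not the real SWBuf. Capacity doubles whenever it runs
// out, so n calls to append(char) cause O(log n) reallocations.
class GrowBuf {
public:
    GrowBuf() = default;
    GrowBuf(const GrowBuf &) = delete;
    GrowBuf &operator=(const GrowBuf &) = delete;
    ~GrowBuf() { std::free(data_); }

    void append(char c) {
        if (size_ + 1 >= cap_) grow();  // keep room for the trailing '\0'
        data_[size_++] = c;
        data_[size_] = '\0';
    }

    const char *c_str() const { return data_ ? data_ : ""; }
    std::size_t size() const { return size_; }
    std::size_t reallocations() const { return reallocs_; }

private:
    void grow() {
        std::size_t newCap = cap_ ? cap_ * 2 : 16;  // assumed growth policy
        assert(newCap > cap_);                      // guard against overflow
        char *p = static_cast<char *>(std::realloc(data_, newCap));
        if (!p) std::abort();  // out of memory: give up in this sketch
        data_ = p;
        cap_ = newCap;
        ++reallocs_;
    }

    char *data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t cap_ = 0;
    std::size_t reallocs_ = 0;
};
```

Under these assumptions, appending 1000 characters one at a time reallocates only a handful of times, whereas growing by a small fixed step would reallocate on a fixed fraction of all calls.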
>>
>> But the best solution would be to parse the text only once and then
>> do the right stuff with it. ATM each filter parses the text again,
>> which will make modules with lots of filters slow (e.g. KJV).
>>
>> I got these results (with debug code and profiling code included):
>> WEB:
>> before:	0m8.233s
>> after:	0m7.586s
>> 	
>> KJV:
>> before:	1m35.769s
>> after:	0m21.874s
>>
>>
>> I have not yet committed, because I have to make sure the code
>> doesn't have some untested bugs.
>>
>> Joachim