[sword-devel] Optimizing index time Was: Re: module modtime -vs- CLucene index out-of-date-ness

DM Smith dmsmith555 at yahoo.com
Thu May 3 03:48:28 MST 2007

That's a 3x improvement under Windows when Google indexing and McAfee
are on.
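
As a quick check of the figures quoted below, here is a minimal Python
sketch of the arithmetic. The helper name to_seconds is made up for
illustration; the timings are copied from Chris's and Karl's reports.

```python
def to_seconds(stamp: str) -> float:
    """Convert a time(1)-style 'XmY.YYYs' string to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

# Chris's KJV timings (Windows, Pentium-M).
old_scanners_on = to_seconds("5m33.007s")
old_scanners_off = to_seconds("4m16.322s")
new_mkfastmod = to_seconds("1m46.252s")

speedup_on = old_scanners_on / new_mkfastmod    # about 3.13
speedup_off = old_scanners_off / new_mkfastmod  # about 2.41

# Karl's wall-clock times (cygwin, all modules): 9:08.59 and 6:20.30.
wall_before = 9 * 60 + 8.59
wall_after = 6 * 60 + 20.30
wall_reduction = (wall_before - wall_after) / wall_before  # about 0.307

print(f"speedup, scanners on:  {speedup_on:.2f}x")
print(f"speedup, scanners off: {speedup_off:.2f}x")
print(f"wall-clock reduction:  {wall_reduction:.1%}")
```

So the 3x figure applies with the scanners on; with them off the new
code is still about 2.4x faster.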

I don't have commit privs for this patch.
Would someone else please commit it?

In His Service,

On May 2, 2007, at 11:42 PM, Chris Little wrote:

> My benchmark system is a 2.0GHz Pentium-M with a 7200RPM drive and
> 1.25GB RAM. These are times for compressing KJV (which takes
> significantly longer than most other Bibles).
> Old mkfastmod (McAfee protection & Google indexing on):
> 5m33.007s
> Old mkfastmod (McAfee protection & Google indexing off):
> 4m16.322s
> New mkfastmod (virus protection/search indexing made insignificant
> differences):
> 1m46.252s
> --Chris
> DM Smith wrote:
>> Karl fixed the bugs in my patch and I am attaching a new patch.
>> His statistics under cygwin on Windows XP against all modules:
>> Before: old mkfastmod: 344.577u 129.499s 9:08.59 86.4%
>> After: new mkfastmod: 328.452u 29.749s 6:20.30 94.1%
>> (The four values are: user, system, wall and cpu)
>> So there was roughly a 30% reduction in wall-clock time.
>> ------------------------------------------------------------------------
>> Chris has volunteered to benchmark under Windows.
>> On May 2, 2007, at 7:52 PM, DM Smith wrote:
>>> Attached is a patch that uses the RAMDirectory. It parallels the
>>> JSword code and it compiles, but other than that I have not tested
>>> it.
>>> Would any of you mind testing it, especially in Windows with virus
>>> scanning on and also off? There should be negligible difference
>>> between the two. Also, measure RAM usage when indexing a Bible with
>>> Strong's numbers, like the KJV.
>>> <patch.zip>
>>> In His Service,
>>> DM
>>> On May 2, 2007, at 4:54 PM, DM Smith wrote:
>>>> Chris Little wrote:
>>>>> Unfortunately, that's impractical. With a virus scanner on, the
>>>>> compression takes 5 minutes for a single Bible (OT+NT) on my Win32
>>>>> system (2GHz Pent-M, 7200RPM drive), due to the constant disk
>>>>> access. We would either have to tell users to disable virus
>>>>> protection or deal with people complaining that their systems
>>>>> freeze every time they add/update a module.
>>>>> --Chris
>>>> Actually, Lucene has an implementation of a RAMDirectory to which
>>>> the index can be written. Once completed, it can be copied to the
>>>> local file system. We've done it in JSword and the results were
>>>> phenomenal. I presume that the CLucene implementation is
>>>> sufficiently similar to Lucene to have it. It is less than 10 lines
>>>> of additional code in Java.
>>>> The only problem is that it eats RAM proportional to the size of
>>>> the final index. I have not measured it to see how big it is, but
>>>> since Win98SE with all the updates on an old Pentium laptop is
>>>> hardly usable with less than 64M RAM, I think that most machines
>>>> have enough RAM.
>>>> After upgrading my old laptop to 128M RAM, JSword can index in
>>>> about 4 minutes, whereas I never had the patience to let it
>>>> complete before.
>>>> That aside, it shifts from being disk bound to CPU bound and the
>>>> machine is still practically unresponsive. So I think that it will
>>>> still be impractical.
>>>>> Kahunapule Michael Johnson wrote:
>>>>>> What about updating the Sword engine to index each module as it
>>>>>> is installed, if the indexing can be used? That way, you get
>>>>>> small downloads for everyone, faster searches for those who can
>>>>>> use indexes, and a little more module installation time.
>>>>>> Just a thought...
>>>>>> Michael
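
The RAMDirectory idea discussed in this thread (build the index in
memory, then copy it to disk once) can be illustrated with a small
Python analogy. This is only a sketch using plain files, not the Lucene
or CLucene API; the function names and sample records are hypothetical.

```python
import io
import os
import tempfile

def build_incrementally(path, records):
    """Append each record with its own open/write/close.

    Every record touches the disk, which is the access pattern that an
    on-access virus scanner or search indexer can slow down.
    """
    open(path, "wb").close()  # start from an empty file
    for record in records:
        with open(path, "ab") as f:
            f.write(record)

def build_in_memory_then_flush(path, records):
    """Accumulate everything in RAM, then write the file once.

    Memory use grows with the size of the final output, the same
    trade-off noted for RAMDirectory.
    """
    buffer = io.BytesIO()
    for record in records:
        buffer.write(record)
    with open(path, "wb") as f:
        f.write(buffer.getvalue())

# Hypothetical sample data standing in for index entries.
sample = [f"verse {n}\n".encode("ascii") for n in range(1000)]
workdir = tempfile.mkdtemp()
slow_path = os.path.join(workdir, "incremental.idx")
fast_path = os.path.join(workdir, "buffered.idx")
build_incrementally(slow_path, sample)
build_in_memory_then_flush(fast_path, sample)
```

Both functions produce the same bytes on disk; only the number of disk
operations differs, which is why background scanning should matter less
for the in-memory variant.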
