[sword-devel] CLucene and Sword

DM Smith dmsmith555 at yahoo.com
Fri May 25 11:48:17 MST 2007


Manfred,

Whatever you do, don't index a module without first deleting the
current index, if there is one. The code, as it stands, adds data to
an existing index as duplicates rather than as updates. This makes
the index much bigger than it needs to be and produces multiple hits
for verses that are indexed more than once. I think that if the
multiple hits for a single verse are treated as one, then it won't
matter to the user.
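
A minimal sketch of treating multiple hits for a single verse as one,
assuming the search results have already been copied out as key strings
(for example osisIDs). The name uniqueHits is hypothetical and is not
part of SWORD:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper, not part of SWORD: collapse repeated hits so that
// each key (e.g. an osisID such as "John.3.16") appears only once, in
// first-seen order.
std::vector<std::string> uniqueHits(const std::vector<std::string> &hits) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> result;
    for (const std::string &key : hits) {
        // insert() returns a pair; its bool is true only for a new key.
        if (seen.insert(key).second) {
            result.push_back(key);
        }
    }
    return result;
}
```

This only hides the duplicates from the user; the index itself stays
larger than it needs to be.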

I think that this "feature" is a bug, but I could be wrong. (I think
that the create flag should always be set to true, which deletes the
existing index if one is present.)
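
Until then, a caller can get the same effect by always deleting before
creating. A rough sketch follows, using a stand-in type so that it
compiles on its own. The method names deleteSearchFramework and
createSearchFramework are assumed to mirror the ones declared in
sword/include/swsearchable.h, so check them there; FakeModule and
rebuildIndex are hypothetical names used only for illustration:

```cpp
#include <string>
#include <vector>

// Stand-in for a module with a search index. The two method names are
// assumed to mirror SWORD's search interface; verify them against
// sword/include/swsearchable.h before relying on this.
struct FakeModule {
    std::vector<std::string> index;  // pretend index entries

    void deleteSearchFramework() { index.clear(); }

    // Appends to whatever is already there, which is the duplication
    // behaviour described above.
    void createSearchFramework() {
        index.push_back("Gen.1.1");
        index.push_back("Gen.1.2");
    }
};

// Always drop any existing index before building, so that indexing the
// same module twice never duplicates entries.
template <typename Module>
void rebuildIndex(Module &module) {
    module.deleteSearchFramework();
    module.createSearchFramework();
}
```

Calling rebuildIndex twice on the same FakeModule leaves two entries,
where calling createSearchFramework twice directly would leave four.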

In His Service,
	DM

On May 25, 2007, at 2:00 PM, Troy A. Griffitts wrote:

> Manfred,
> 	Have a look at the source for sword/utilities/mkfastmod.cpp and
> sword/examples/cmdline/search.cpp
>
> 	Checking whether or not the indices are created is the most
> confusing part.  Originally, the plan was to let the SWModule::search
> method return whether or not a search was supported by the search type
> requested.  So, if you called SWModule::search requesting CLucene
> type, and passed a bool * to justCheckIfSupported, it would set your
> bool to true if the indices were created, and false otherwise.  This
> would allow search engine plugins to create different indices
> depending on the search string features passed in and such.  There are
> routines to see if a driver even is compiled with code which CAN
> create a fast index, and also if it HAS created the index.
>
> 	Anyway, it's all too complicated and impractical.  Hopefully we will
> change it to something much more straightforward, like: bool
> hasIndex(int searchType), when we do the 2.0 refactoring soon.
>
> 	The place to look for the current interface is
> sword/include/swsearchable.h.  The comment in there was written by
> someone else who didn't understand how things worked.  I can't blame
> them, as I hadn't written ANY comments, so they at least tried.  I've
> updated the comments slightly and committed just now.
>
> 	Currently, the best way to 'make it work' is to use the search dialog
> from BibleCS as an example.  It shows the [Create Index] button to the
> user if the indices have not been created; if they have, it hides
> the button and adds the "Optimized Search" option to the user
> choices.
>
> 	Here's a direct link to the file in svn.  In your browser, search for
> all occurrences of: toggleIndex
> That should get you into all the blocks of code you need to lift.
>
> http://crosswire.org/svn/biblecs/trunk/searchfrm.cpp
>
> ('target' is any SWModule *)
>
>
> 	Hope this helps,
>
> 		-Troy.
>
>
>
> Manfred Bergmann wrote:
>> Troy,
>>
>> that's great.
>> I finally compiled sword with clucene support for the Mac.
>> Unfortunately currently only for PPC platform because cross-compiling
>> clucene for Intel didn't work. Maybe I need someone with an Intel Mac
>> for this.
>>
>> However, I have some questions about using the sword clucene
>> implementation.
>>
>> - where are the index files stored?
>> - are there some API examples on how this works, or is it
>> straightforward from looking at the API docs?
>>
>>
>> Regards,
>> Manfred
>>
>>
>>
>> On 18.05.2007 at 19:56, Troy A. Griffitts wrote:
>>
>>> Manfred,
>>>     I believe Will's reason for not using CLucene in SWORD was
>>> because he couldn't easily get CLucene compiled on the Mac.  Using
>>> SWORD's CLucene implementation has many advantages, and I'm not sure
>>> there are any real-world disadvantages.  But, of course, I'm biased.
>>>
>>> o   You get to share indexes between frontends
>>> o   You get the implementation for free
>>> o   Your features continue to improve for free when others contribute
>>> o   You get to benefit others if you add features
>>>
>>> Currently, to my knowledge, SWORD's implementation of CLucene
>>> supports MORE features than any frontend exposes (with the possible
>>> exception of DM's latest JSword work):
>>>
>>> o    Full SWORD VerseKey Range parsing support (e.g., Search only in
>>> Paul's Epistles, "Rom-Phile", or "Jo;1jo-3jo;rev")
>>> o    Choose verse or chapter granularity for a hit (e.g., Find all
>>> these words within the same [verse | chapter])
>>> o    Search in any SWORD module type (Bibles, General Books,
>>> Commentaries, Lexica, Devotionals, etc.)
>>> o    Advantage of using SWORD's filter facility to massage data
>>> before indexing:
>>>        - Ignore accents and diacritics in Greek and Hebrew
>>>        - Ignore critical markup in transcriptions.
>>> o    Currently supported doc fields:
>>>        - key: The SWORD Key (e.g., in a lexicon "Adam", in a Bible,
>>> the osisID)
>>>        - content: The body of the entry
>>>        - lemma: Strong's numbers or other lemma data included in
>>> the module
>>> o    Seamless integration with other SWORD search mechanisms:
>>>        - Ability to search WITHOUT creating indexes.  The lack of
>>> this is frustrating for me in the newest version of Bibletime.  There
>>> are often times when I don't want to create a lucene index on a
>>> module.  I seldom search most modules, and the average 5-second wait
>>> of an unindexed search is perfectly acceptable to me on these
>>> modules.  I want neither the disk overhead nor the initial index
>>> creation time.
>>>        - Regular Expression searching
>>>        - Searching in ANY EntryAttribute which existing filters, or
>>> your custom filters, might decide to add.  Some of these currently
>>> include: footnotes, headings, lemma, morph, AVPhrase (Greek lexicon,
>>> Authorized Version translation choices for a Greek entry), src
>>> (interlinear data which links a translation to original), refList
>>> (footnote cross-reference verses), morpheme (WLC Hebrew morpheme
>>> breakdown).  (DM: This seems a logical place to add the ability to
>>> create new CLucene doc fields based on these modular filters)
>>>
>>> In conclusion, it seems to me that utilizing and extending the
>>> current search support in SWORD benefits everyone and leverages an
>>> already existing solid set of features.
>>>
>>>     -Troy.
>>>
>>>
>>> Manfred Bergmann wrote:
>>>> Hi.
>>>>
>>>> Since when is CLucene integrated in Sword and for what exactly is it
>>>> used?
>>>> Can it be used by client applications for searching?
>>>>
>>>> I'm not really satisfied with using Java Lucene in Objective-C in
>>>> MacSword.
>>>> It is possible to use Java classes in Objective-C, but it is not
>>>> very straightforward and it is difficult to debug.
>>>> So I'm wondering if we could get rid of Lucene and use the Sword
>>>> integrated CLucene.
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Manfred
>>>>
>>>>
>>>> _______________________________________________
>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>> Instructions to unsubscribe/change your settings at above page



