[sword-devel] sword 1.5.9: problem while searching

Linas S. mail at operis.org
Thu Sep 27 22:06:15 MST 2007


Hi Troy,

First, I would like to thank you for the very detailed explanation.

I followed your text and tried to check what the exact situation is in my
case. The module I use is "Lithuanian". Its .conf file contains
Encoding=UTF-8. Both ICU and CLucene are installed and enabled in
usrinst.sh. I am not sure I understood your question about the indexed
module. I haven't run mkfastmod on the "Lithuanian" module; I just sent
the text to one of the sword developers, who compiled it.
But I have the problem not only with the "Lithuanian" module, but also with
the Russian Bible "RST". Therefore I think it does not depend on the module.

This problem occurs not only with the sword I compiled myself. I have also
downloaded the SWORD Project binaries for Windows, and they have the same
problem.
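
To check that I understand the change you suggest for the search loop
(use strstr instead of stristr, upper-case the input once before the loop,
and upper-case each candidate buffer just before the comparison), here is a
rough standalone sketch. It is not the actual SWORD code: the names
toUpperUtf8Sketch and searchSketch are made up for illustration, and the
upper-casing helper is only a stand-in for the ICU-backed routine. It
handles just ASCII and basic Cyrillic, so it would not cover the
Lithuanian letters.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the engine's UTF-8 upper-casing.
// Only ASCII and basic Cyrillic are mapped; all other bytes are copied as-is.
static std::string toUpperUtf8Sketch(const std::string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // Plain ASCII byte.
            out += (c >= 'a' && c <= 'z') ? static_cast<char>(c - 32)
                                          : static_cast<char>(c);
            ++i;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < in.size() &&
                   (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) {
            // Two-byte sequence: decode, map, re-encode.
            const unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
            std::uint32_t cp = (static_cast<std::uint32_t>(c & 0x1F) << 6) |
                               (c2 & 0x3F);
            if (cp >= 0x0430 && cp <= 0x044F) {
                cp -= 0x20;  // U+0430..U+044F -> U+0410..U+042F
            } else if (cp >= 0x0450 && cp <= 0x045F) {
                cp -= 0x50;  // U+0450..U+045F -> U+0400..U+040F
            }
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
            i += 2;
        } else {
            // Longer sequences and stray bytes are left unchanged.
            out += in[i];
            ++i;
        }
    }
    return out;
}

// Returns the indices of the entries that contain `term`, ignoring case.
static std::vector<std::size_t> searchSketch(
        const std::vector<std::string> &entries, const std::string &term) {
    // Upper-case the search input once, before the loop.
    const std::string upperTerm = toUpperUtf8Sketch(term);
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        // Upper-case the candidate buffer just before the comparison.
        const std::string upperText = toUpperUtf8Sketch(entries[i]);
        // Byte-wise strstr is enough once both sides are upper-cased.
        if (std::strstr(upperText.c_str(), upperTerm.c_str()) != nullptr) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

If this is the right idea, then in the real code the helper would be
replaced by the existing UTF-8 upper-casing, and only the comparison call
would change.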

Regards,

Linas Spraunius



On Wed, 19 Sep 2007 16:37:24 +0300, Troy A. Griffitts  
<scribe at crosswire.org> wrote:

> Linas,
>
> Could you look in the module's .conf file which you are searching and
> determine what the Encoding= entry says.  If it is not UTF-8 then sword
> will not attempt to use ICU on the text, even if it is compiled in.
>
> Having said that, the sword engine could use some attention in the area
> of utf8 processing.  Could you tell me if you are using an indexed module
> (clucene compiled in, and you've indexed the module using your favorite
> frontend, or the CLI mkfastmod)?  If you are using the unindexed search
> framework, I'm afraid it is not very utf8 friendly and could use some tweaks
> to make things work correctly.  We've started down that path with the
> creation of a new class: StringMgr (thanks Joachim!), here:
>
> http://crosswire.org/svn/sword/trunk/src/mgr/stringmgr.cpp
>
> Though it currently only has toupper functionality.
>
> Old string routine declarations are here but should slowly go away.
> Currently their impl should just call the methods in StringMgr, but they
> might not all do that yet (I'm pretty sure stricmp does, which is the
> most widely used in the engine).
> http://crosswire.org/svn/sword/trunk/include/utilstr.h
>
> The search code is here:
>
> http://crosswire.org/svn/sword/trunk/src/modules/swmodule.cpp
> search for: multiword (2nd occurrence)
>
> The comparison is done using stristr (bad), which should probably be
> changed to use a new StringMgr::stristrUTF8 method, but for now you
> could simply try changing stristr to strstr and applying toupper_utf8 to
> both the input (once, before the loop) and the candidate buffer (just
> before the comparison).  Sorry for the bad language support.
>
> 	-Troy.
>
>
>
> Linas S. wrote:
>>> I think that indicates that diatheke is built without ICU. Sword uses
>>> ICU to do upper case, but only if it is present.
>>
>> Unfortunately it is compiled with icu. Is it possible that I made a
>> mistake when compiling it? I enabled icu in usrinst.sh, then launched it.
>> After that: make, make install.
>>
>> Regards,
>>
>> Linas Spraunius
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>
>





