[sword-devel] sword 1.5.9: problem while searching
mail at operis.org
Thu Sep 27 22:06:15 MST 2007
First, I would like to thank you for the very detailed explanation.
I followed your text and tried to check what the exact situation is in my
case. The module I use is "Lithuanian". Its .conf file contains
Encoding=UTF-8. Both ICU and CLucene are installed and enabled in
usrinst.sh. I am not sure I understood your question about the indexed
module. I haven't run mkfastmod on the "Lithuanian" module; I just
sent the text to one of the SWORD developers, who compiled it.
However, I have the problem not only with the "Lithuanian" module but also
with the Russian Bible "RST", so I think it does not depend on the module.
Nor is the problem limited to the SWORD I compiled myself: I downloaded the
SWORD Project binaries for Windows, and they show the same problem.
On Wed, 19 Sep 2007 16:37:24 +0300, Troy A. Griffitts
<scribe at crosswire.org> wrote:
> Could you look in the module's .conf file which you are searching and
> determine what the Encoding= entry says. If it is not UTF-8 then sword
> will not attempt to use ICU on the text, even if it is compiled in.
> Having said that, the sword engine could use some attention in the area
> of utf8 processing. Could you tell me if you are using an index module
> (clucene compiled in, and you've indexed the module using your favorite
> frontend, or the CLI mkfastmod)? If you are using the unindexed search
> framework, I'm afraid it is not very utf8 friendly and could use some tweaks
> to make things work correctly. We've started down that path with the
> creation of a new class: StringMgr (thanks Joachim!), here:
> Though it currently only has toupper functionality.
> Old string routine declarations are here but should slowly go away.
> Currently their impl should just call the methods in StringMgr, but they
> might not all do that yet (I'm pretty sure stricmp does, which is the
> most widely used in the engine).
> The search code is here:
> search for: multiword (2nd occurrence)
> The comparison is done using stristr (bad), which should probably be
> changed to use a new StringMgr::stristrUTF8 method, but for now you
> could simply try changing stristr to strstr and toupper_utf8 both the
> input once before the loop, and the candidate buffer just before the
> comparison. Sorry for the bad language support.
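
A minimal sketch of the workaround described in the quoted paragraph above:
upper-case the search input once before the loop, upper-case each candidate
buffer just before the comparison, and compare with plain strstr. All names
here are hypothetical. upperUtf8Sketch only stands in for the engine's
toupper_utf8 and handles just ASCII and the basic Russian letters (Lithuanian
letters are not covered; a real build would rely on ICU for this), and
searchSketch is not SWORD's actual search loop.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a UTF-8 upper-casing routine. It covers only
// ASCII a-z and Cyrillic U+0430..U+044F; every other byte is copied as-is.
std::string upperUtf8Sketch(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        unsigned char n = (i + 1 < in.size())
                              ? static_cast<unsigned char>(in[i + 1])
                              : 0;
        if (c >= 'a' && c <= 'z') {
            out += static_cast<char>(c - 'a' + 'A');
        } else if (c == 0xD0 && n >= 0xB0 && n <= 0xBF) {
            // U+0430..U+043F (D0 B0..BF) -> U+0410..U+041F (D0 90..9F)
            out += static_cast<char>(0xD0);
            out += static_cast<char>(n - 0x20);
            ++i;
        } else if (c == 0xD1 && n >= 0x80 && n <= 0x8F) {
            // U+0440..U+044F (D1 80..8F) -> U+0420..U+042F (D0 A0..AF)
            out += static_cast<char>(0xD0);
            out += static_cast<char>(n + 0x20);
            ++i;
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}

// Hypothetical search loop: returns the indices of the candidates that
// contain the query, ignoring case for the letters handled above.
std::vector<std::size_t> searchSketch(const std::vector<std::string> &candidates,
                                      const std::string &query) {
    std::vector<std::size_t> hits;
    // Upper-case the input once, before the loop.
    const std::string needle = upperUtf8Sketch(query);
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        // Upper-case the candidate buffer just before the comparison.
        const std::string candidate = upperUtf8Sketch(candidates[i]);
        // Byte-wise strstr is safe here because both sides are valid UTF-8
        // that went through the same case mapping.
        if (std::strstr(candidate.c_str(), needle.c_str()) != nullptr) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

Calling searchSketch with a list of verse strings and a query in any letter
case returns the positions of the matching verses. The point of the design is
that the case mapping happens on whole UTF-8 strings rather than byte by byte
inside the comparison, which is what makes stristr unsuitable for
non-ASCII text.
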
> Linas S. wrote:
>>> I think that indicates that diatheke is built without ICU. Sword uses
>>> ICU to do upper case, but only if it is present.
>> Unfortunately it is compiled with icu. Is it possible that I made a
>> mistake when compiling it? I enabled icu in usrinst.sh, then launched
>> After that - make, make install.
>> Linas Spraunius
>> sword-devel mailing list: sword-devel at crosswire.org
>> Instructions to unsubscribe/change your settings at above page