[sword-devel] Searching for hyphenated words?

Troy A. Griffitts scribe at crosswire.org
Mon Mar 4 06:25:47 MST 2013


Please remember,

SWORD already supports a search normalization layer.  We have 
normalizers for many things like accents, diacritics, etc., that we run 
on the text before passing it to Lucene (or to our own search
mechanism).
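The sketch below is a minimal illustration of such a layer. It is written in Python rather than SWORD's C++ and assumes nothing about the engine's actual classes: `normalize_for_search` and `matches` are hypothetical helpers. The key idea is that the same normalization runs over the module text at index time and over the user's query at search time, so accents and case stop mattering on either side.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    # NFD splits precomposed letters such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn) left behind.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() is a stronger lower() intended for caseless matching.
    return stripped.casefold()


def matches(query: str, indexed_text: str) -> bool:
    # The same normalization runs on the indexed text and on the query,
    # so neither side has to use one particular spelling.
    terms = normalize_for_search(indexed_text).split()
    return normalize_for_search(query) in terms


if __name__ == "__main__":
    print(matches("CRÉA", "Dieu créa les cieux et la terre"))  # True
```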

SWORD has distinct stages where it applies filters.  The two most 
obvious are the render stage and the search stage (named Render and 
Strip in the engine).  We have many filters that do many different 
things, and any of them can be applied to a module as a search-time 
normalizer by adding a LocalStripFilter=FilterName line to the 
module's .conf file.
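To make the strip stage concrete, here is a small Python sketch of a search-time filter chain driven by a module's .conf entries. Everything in it is hypothetical: the filter names `StripAccents` and `FoldCase`, the `FILTERS` registry, and the `[HypotheticalModule]` section stand in for SWORD's real C++ filter classes and module configuration. It assumes, as the LocalStripFilter examples in this message suggest, that the key may appear more than once, and that the filters run in the order listed.

```python
import unicodedata
from typing import Callable, Dict, List


# Hypothetical stand-in filters; SWORD's own filters are C++ classes
# and may behave differently.
def strip_accents(text: str) -> str:
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def fold_case(text: str) -> str:
    return text.casefold()


FILTERS: Dict[str, Callable[[str], str]] = {
    "StripAccents": strip_accents,  # hypothetical filter name
    "FoldCase": fold_case,          # hypothetical filter name
}


def strip_filters_from_conf(conf_text: str) -> List[str]:
    """Collect every LocalStripFilter value, in file order.

    The key may repeat, so a plain dict (or configparser with its default
    strict=True) would lose or reject the duplicates.
    """
    names = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "LocalStripFilter":
            names.append(value.strip())
    return names


def apply_strip_stage(text: str, conf_text: str) -> str:
    # Run each configured filter over the text, one after another.
    for name in strip_filters_from_conf(conf_text):
        text = FILTERS[name](text)
    return text


CONF = """[HypotheticalModule]
LocalStripFilter=StripAccents
LocalStripFilter=FoldCase
"""

if __name__ == "__main__":
    print(apply_strip_stage("Dieu créa les cieux", CONF))  # dieu crea les cieux
```

The render stage would keep its own, separate filter list, so whatever the strip chain does to the text never reaches what the reader sees.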

Here are the filters currently available:
http://www.crosswire.org/svn/sword/trunk/src/modules/filters/


So, for example, we have:

LocalStripFilter=UTF8GreekAccents
LocalStripFilter=PapyriPlain

To normalize papyrological searches on the Duke Databank of Papyri:
http://crosswire.org/study/wordsearchresults.jsp?mod=DDP&searchTerm=%CF%80%CE%B1%CF%81%CE%B1%CE%B3%CE%B3%CE%B5%CE%BB%CE%BB*
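For illustration, the `searchTerm` in the URL above decodes to the unaccented Greek stem παραγγελλ followed by `*`. The Python sketch below assumes `*` means a prefix wildcard and uses a few made-up sample words rather than actual DDP text; it shows how stripping accents from the text lets that plain stem match accented forms. `strip_accents` and `prefix_search` are hypothetical helpers, not the UTF8GreekAccents implementation.

```python
import unicodedata
from typing import List
from urllib.parse import parse_qs, urlparse

SEARCH_URL = (
    "http://crosswire.org/study/wordsearchresults.jsp?mod=DDP&searchTerm="
    "%CF%80%CE%B1%CF%81%CE%B1%CE%B3%CE%B3%CE%B5%CE%BB%CE%BB*"
)

# Made-up sample words, not taken from the DDP module.
SAMPLE_WORDS = ["παραγγέλλω", "παραγγέλλει", "λόγος"]


def strip_accents(text: str) -> str:
    # NFD separates tonos/oxia, breathings, etc. from the base letters;
    # the combining marks (category Mn) are then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def prefix_search(term: str, words: List[str]) -> List[str]:
    # Assumption: a trailing "*" means "any word starting with this stem".
    wanted = strip_accents(term)
    if wanted.endswith("*"):
        stem = wanted[:-1]
        return [w for w in words if strip_accents(w).startswith(stem)]
    return [w for w in words if strip_accents(w) == wanted]


if __name__ == "__main__":
    term = parse_qs(urlparse(SEARCH_URL).query)["searchTerm"][0]
    print(term)                               # παραγγελλ*
    print(prefix_search(term, SAMPLE_WORDS))  # ['παραγγέλλω', 'παραγγέλλει']
```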

The normalizations discussed here certainly need to be considered, 
but we already have a mechanism in place to do this in SWORD.
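As one example of a normalization that could be packaged this way, the French coeur/cœur variation raised in the quoted messages below could be handled by expanding the ligature before indexing and querying. This is a hypothetical Python sketch, not an existing SWORD filter. Unicode normalization alone does not help here: U+0153 (œ) has no decomposition mapping, so NFKD leaves it unchanged, which is why the sketch maps it explicitly.

```python
import unicodedata

# Hypothetical filter: expand the oe ligature, strip accents, fold case.
# U+0153/U+0152 have no Unicode decomposition, so map them explicitly.
LIGATURES = str.maketrans({"œ": "oe", "Œ": "OE"})


def normalize_french(text: str) -> str:
    expanded = text.translate(LIGATURES)
    decomposed = unicodedata.normalize("NFD", expanded)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()


if __name__ == "__main__":
    # Both spellings normalize to the same index/query form.
    print(normalize_french("Mon cœur"))   # mon coeur
    print(normalize_french("mon coeur"))  # mon coeur
```

Because the expanded form only lives in the index and in the normalized query, readers still see whichever spelling the module itself uses.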

Troy



On 03/03/2013 05:57 PM, DM Smith wrote:
>
> On Mar 3, 2013, at 11:53 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>
>> Yes, although in French only the contracted form is correct.
>>
>>
>
> WRT indexing and searching, it really doesn't matter which is correct. 
> The normalization is not visible to the user. Normalization often goes 
> to forms that are ugly for the end-user.
>
> -- DM
>
>> On 3 Mar 2013 16:10, "David Haslam" <dfhmch at googlemail.com> wrote:
>>
>>     There are similar issues in French modules.
>>
>>     e.g. Some French Bibles have "cœur", some have "coeur", and some
>>     even use both!
>>
>>     etc., etc.
>>
>>     David
>>
>>
>>
>>     --
>>     View this message in context:
>>     http://sword-dev.350566.n4.nabble.com/Searching-for-hyphenated-words-tp4652016p4652042.html
>>     Sent from the SWORD Dev mailing list archive at Nabble.com.
>>
>>     _______________________________________________
>>     sword-devel mailing list: sword-devel at crosswire.org
>>     <mailto:sword-devel at crosswire.org>
>>     http://www.crosswire.org/mailman/listinfo/sword-devel
>>     Instructions to unsubscribe/change your settings at above page
>>
