[sword-devel] French ligatures in Louis SÉGOND’s text
Troy A. Griffitts
scribe at crosswire.org
Mon Jul 16 01:06:55 MST 2007
Regarding searching: once the correct behavior for a language can be
defined, we can search appropriately in SWORD. We do something similar
for the early papyri and inscription databases.
1) We add a special strip filter, appropriate for the module, to the
module's .conf file; it normalizes the module text for searching.
2) Then, the user-supplied search expression is also passed through this
filter.
It seems to work quite well for searching the papyri. You can supply
accented or unaccented Greek, and searches will match regardless of
transcription marks such as (), etc.
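The two steps above amount to running both the module text and the query through the same normalization before comparing them. As a rough illustration only, the Python sketch below assumes the filter decomposes text with Unicode NFD, drops combining marks (accents, breathings, iota subscript), drops a hypothetical set of bracket characters standing in for transcription marks, and then case-folds. It is not SWORD's actual strip filter; the names strip_for_search, matches and TRANSCRIPTION_MARKS are made up for this example.

```python
import unicodedata

# Hypothetical set of transcription marks to ignore; a real module's
# strip filter may remove a different set.
TRANSCRIPTION_MARKS = set("()[]{}<>")


def strip_for_search(text: str) -> str:
    """Normalize text so accented and unaccented forms compare equal."""
    # NFD splits precomposed letters (e.g. U+1F10) into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accents, breathings and other diacritics
        if ch in TRANSCRIPTION_MARKS:
            continue  # drop editorial brackets
        kept.append(ch)
    # Recompose what is left and fold case (this also maps final sigma to sigma).
    return unicodedata.normalize("NFC", "".join(kept)).casefold()


def matches(query: str, text: str) -> bool:
    # Step 2: the query goes through the same filter as the searched text.
    return strip_for_search(query) in strip_for_search(text)


print(matches("λογος", "ἐν ἀρχῇ ἦν ὁ λόγος"))
print(matches("λογος", "ὁ λό[γ]ος"))
```

Because the query passes through the same function as the text, an unaccented query such as λογος finds both the accented λόγος and the bracketed transcription λό[γ]ος.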
Here is an example. Notice the unaccented search text and the different
results, even across papyri annotations.
Eeli Kaikkonen wrote:
> On Sun, 15 Jul 2007, Chris Little wrote:
>> Should umlauted letters be decomposed also? So a-umlaut becomes ae,
>> o-umlaut becomes oe, u-umlaut becomes ue--which works fine for German,
>> but I doubt for many other languages.
> It doesn't work for Swedish and I think not for other Scandinavian
> languages either. Definitely not for Finnish.
>> The only ligatures that we could safely decompose without reference to
>> language are typographic ligatures, and we would never encode those as
>> ligatures in the first place.
> As was said, KJV and Webster's use ligatures (IIRC the first entry in
> Webster's is a ligature - or was it the last one?)
> For those languages/modules where the occasions are rare, could it be
> possible to add special markup inside the module? like
> Then that markup could be indexed with the main text for the search.
> Eeli Kaikkonen (Mr.), Oulu, Finland
> e-mail: eekaikko at mailx.studentx.oulux.fix (with no x)
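The distinction drawn in this exchange can be illustrated with Python's unicodedata module. Compatibility normalization (NFKC) expands typographic ligatures such as U+FB01 (ﬁ) into their letters regardless of language, but it leaves U+0153 (œ) and U+00E6 (æ) untouched, because Unicode gives them no decomposition; likewise, folding ä to ae is only valid for some languages. The sketch below assumes a hypothetical per-language table keyed by the module's language code, with only German filled in; fold and LANGUAGE_FOLDS are illustrative names, not part of SWORD.

```python
import unicodedata

# Hypothetical per-language fold tables. Only German is filled in, because
# folding ä -> ae is wrong for Swedish, Finnish and other languages.
# Only lowercase letters are mapped in this sketch.
LANGUAGE_FOLDS = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
}


def fold(text: str, lang: str) -> str:
    # Language-independent step: NFKC expands typographic (compatibility)
    # ligatures such as U+FB01 into "fi" and composes decomposed umlauts.
    text = unicodedata.normalize("NFKC", text)
    # Language-dependent step: apply the module language's table, if any.
    for src, dst in LANGUAGE_FOLDS.get(lang, {}).items():
        text = text.replace(src, dst)
    return text


print(fold("ﬁn", "en"))
print(fold("cœur", "fr"))
print(fold("Grüße", "de"))
print(fold("säkki", "fi"))
```

With this design, French text keeps œ until someone supplies a French table, and Finnish or Swedish text keeps ä, which is consistent with the objections raised above.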