[sword-devel] Latin diacritics

Dominique Corbex dominique at corbex.org
Sun Jan 31 13:29:42 MST 2016


Searching French texts gives annoyingly inconsistent results for words containing:
- accented letters
- ligatures

Here is a sample: the first query shows the number of results for
'Égypte' with an acute accent, the second without:

$ diatheke -b FreCrampon -s phrase -k Égypte | tr ';' '\n' | wc -l 
107
$ diatheke -b FreCrampon -s phrase -k Egypte | tr ';' '\n' | wc -l
498

Not every OS makes it easy to type ligatures, so some texts have
ligatures converted to regular letters while others keep them.


So, for languages based on Latin script, shouldn't SWORD provide a strip
filter to remove accents and ligatures? What do you think?
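
As a rough illustration of what such a strip filter might do, here is a
minimal Python sketch. It is not SWORD code: the function name
strip_diacritics and the LIGATURES table are hypothetical, and it assumes
that only the œ/æ ligatures need an explicit mapping, since they have no
Unicode decomposition.

```python
import unicodedata

# Ligatures with no Unicode decomposition; mapped by hand (assumed list).
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}


def strip_diacritics(text):
    """Return text with ligatures expanded and combining accents removed."""
    # Expand ligatures first, since normalization leaves them untouched.
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    # NFD splits 'É' into 'E' plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Applying the same folding to both the indexed text and the search key would
make 'Égypte' and 'Egypte' match the same verses.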

-- 
domcox <dominique at corbex.org>


