[sword-devel] search failing in Hebrew modules

Chris Little chrislit at crosswire.org
Thu Jul 30 23:45:35 MST 2009


DM Smith wrote:
> Unicode can have multiple possible representations (byte sequences) for 
> a single decorated character. Search will work only if the request and 
> index match.

Something to bear in mind here is that, while we've agreed to 
standardize on NFC normalization of Unicode, WLC is not normalized. This 
is because NFC reorders combining marks, and for Hebrew decorated with 
vowels, dagesh, and cantillation that reordering results in incorrect 
rendering. So in those cases (and I don't know how rare they are) where 
the WLC encoding differs from NFC, the index and an NFC-normalized 
search request could fail to match.

Thus, for WLC, it would be wise to include UTF8NFC() in the set of 
stripFilters--in addition to NFC normalizing the search key provided by 
the user.
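
To illustrate the mismatch and the fix, here is a minimal sketch in 
Python rather than in SWORD's own C++. It uses only the standard 
unicodedata module; the names nfc() and find() are hypothetical helpers 
made up for this example and are not part of the SWORD API. The stored 
text is a single bet with dagesh and patah in an order that differs from 
NFC, standing in for un-normalized module text.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the NFC-normalized form of text."""
    return unicodedata.normalize("NFC", text)


def find(key: str, text: str, normalize: bool = False) -> bool:
    """Substring search; optionally NFC-normalize both sides first.

    Hypothetical helper for illustration only, not a SWORD function.
    """
    if normalize:
        key, text = nfc(key), nfc(text)
    return key in text


# Stored text: bet, dagesh (U+05BC), patah (U+05B7). This order is not
# the canonical one, because dagesh has a higher combining class than patah.
stored = "\u05D1\u05BC\u05B7"

# Search key as it would arrive after NFC: bet, patah, dagesh.
key = "\u05D1\u05B7\u05BC"

print(find(key, stored))                  # raw comparison
print(find(key, stored, normalize=True))  # both sides normalized
```

The raw comparison fails because the two strings differ code point by 
code point even though they represent the same decorated letter. 
Normalizing both sides is the same idea as running the indexed text 
through an NFC strip filter while also normalizing the user's key.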

--Chris


