[sword-devel] XML Numeric character references (entities) in BibleCS

Chris Little chrislit at crosswire.org
Thu Jan 31 15:16:14 MST 2008


On Jan 31, 2008, at 1:29 PM, Benny Wasty wrote:

> Hello,
>
> I noticed that BibleCS doesn't seem to be able to display unicode
> characters encoded as numeric character references (e.g. &#246;) in an
> OSIS module I am currently working on. The characters are just  
> omitted.
> I guess they should be displayed correctly, as this is a "basic" XML
> feature as far as I know.
> BibleDesktop shows them by the way.

Correct, Sword does not handle numbered entities. I don't think we  
want to add support for them at runtime either, because doing so would  
1) waste processor time in converting to UTF-8 and 2) waste a lot of  
storage space compared to UTF-8. I will, however, add a todo to the bug  
tracker to do conversion to UTF-8 during import.

All data in modules is assumed to be NFC normalized UTF-8.
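
As a rough sketch of what conversion at import time could look like:
the Python below replaces decimal and hexadecimal numeric character
references with the characters they name, normalizes the result to NFC,
and encodes it as UTF-8. It is only an illustration, not code from Sword
or its import tools; the function name ncr_to_nfc_utf8 is made up for
this example, and leaving references to markup characters (such as
&#38;) untouched is an assumption about what an importer would want.

```python
import re
import unicodedata

# Matches decimal (&#246;) and hexadecimal (&#xF6;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def _decode(match):
    hex_digits, dec_digits = match.groups()
    code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
    # Leave the reference as-is if it does not name a usable code point
    # (zero, out of range, or a surrogate).
    if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return match.group(0)
    ch = chr(code_point)
    # Assumption: keep markup-significant characters escaped so the
    # surrounding XML stays well-formed.
    if ch in "<>&\"'":
        return match.group(0)
    return ch


def ncr_to_nfc_utf8(text: str) -> bytes:
    """Hypothetical helper: resolve references, normalize to NFC, encode as UTF-8."""
    replaced = _NCR.sub(_decode, text)
    return unicodedata.normalize("NFC", replaced).encode("utf-8")
```

Normalizing after the replacement matters because a reference may name
a combining character: "o&#x308;" (o followed by a combining diaeresis)
ends up as the same bytes as "&#246;".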

I haven't looked at the code or tested this, but I would be willing to  
bet BibleDesktop is displaying your characters correctly but wouldn't  
match them in a search.
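
To illustrate the kind of mismatch I mean (again just a sketch, not
BibleDesktop's actual code, which I have not looked at): if the
reference is only resolved when the text is rendered, a search that
runs over the stored text never sees the real character. The function
names raw_search and display are hypothetical.

```python
import html


def raw_search(stored: str, query: str) -> bool:
    # Search over the stored module text, references still unresolved.
    return query in stored


def display(stored: str) -> str:
    # Resolve character references only at render time.
    return html.unescape(stored)


stored = "K&#246;nig"
print(display(stored) == "K\u00f6nig")   # the word displays correctly
print(raw_search(stored, "K\u00f6nig"))  # but the raw text does not match
```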

--Chris



