[sword-devel] osis2mod change
dmsmith555 at yahoo.com
Sun Feb 24 15:25:21 MST 2008
On Feb 24, 2008, at 4:46 PM, Chris Little wrote:
> DM Smith wrote:
>> I have added a -n flag to osis2mod.
> I'm going to add it to the other major importers (osis2gbs & imp2*)
> as soon as I get things into a fairly stable state.
>> This flag, to be enabled, requires osis2mod to be compiled with ICU
>> support enabled.
>> -n stands for normalize to NFC, the agreed-upon normalization form for UTF-8 text
>> When should this flag be used?
>> 1) When the input is UTF-8
>> 2) When it is not known to be NFC
> First, I feel like there's really no reason NOT to perform
> normalization, provided that the input is UTF-8. Even if the input is
> already in NFC, it won't hurt anything to do it again. It will take
> extra time to compile the module, but I feel like it's better to be
> safe than sorry in this case.
I mostly agree. But once I know that the module is NFC, I'd rather not
take the hit. I must have made the KJV into a module 100 or more times
before I got it right.
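
A minimal sketch of the "skip it once the text is known to be NFC" idea. It uses Python's standard unicodedata module rather than the ICU calls osis2mod is built against, and the helper name to_nfc is made up for illustration:

```python
import unicodedata


def to_nfc(text):
    """Return (text_in_nfc, changed).

    The cheap is_normalized() check (Python 3.8+) lets already-NFC input
    pass through without paying for a full normalization pass.
    """
    if unicodedata.is_normalized("NFC", text):
        return text, False
    return unicodedata.normalize("NFC", text), True


# 'e' followed by COMBINING ACUTE ACCENT is not NFC; it composes to U+00E9.
composed, changed = to_nfc("e\u0301")
```

Running the result through to_nfc a second time reports changed as False, which matches the point above that re-normalizing NFC input is harmless and only costs time.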
> Second, your comment about needing UTF-8 input makes me think we
> should go ahead and add encoding conversion to the importers as well,
> with automatic charset detection.
I'd like to see OSIS modules also be UTF-8.
What mechanism were you thinking of for automatic charset detection? I
have a buggy routine to detect whether something is UTF-8, 7-bit ASCII
or other. We could use that (once I fix it).
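
One way such a three-way check could look, sketched in Python. This is not the routine mentioned above; classify_encoding is a hypothetical name, and the test is validity only (a non-UTF-8 file that happens to be valid UTF-8 would be misreported):

```python
def classify_encoding(data):
    """Classify raw bytes as 'ascii', 'utf-8', or 'other'."""
    try:
        data.decode("ascii")      # every byte below 0x80
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")      # strict: rejects malformed sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "other"
```

ASCII is tested first because every ASCII file is also valid UTF-8, so the order decides which label plain 7-bit input gets.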
As to automatic charset detection, could we require that every input
to osis2mod have:
<?xml version="1.0" encoding="UTF-8"?>
or
<?xml version="1.0" encoding="cp1252"?>
and use whatever is the value for the encoding attribute?
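
A rough sketch of reading that declaration, in Python. The function name declared_encoding and the UTF-8 fallback are assumptions for illustration, and it only handles ASCII-compatible encodings (a UTF-16 file would need BOM handling first):

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(head, default="UTF-8"):
    """Return the encoding named in the XML declaration, else the default."""
    if head.startswith(b"\xef\xbb\xbf"):   # skip a UTF-8 byte order mark
        head = head[3:]
    match = _XML_DECL.match(head)
    return match.group(1).decode("ascii") if match else default


raw = b'<?xml version="1.0" encoding="cp1252"?><w>\xe9</w>'
text = raw.decode(declared_encoding(raw))   # decoded using the declared charset
```

Once the input is decoded this way, it can be re-encoded as UTF-8 and normalized like any other input.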