[sword-devel] NFC and osis2mod

DM Smith dmsmith555 at yahoo.com
Thu Jan 31 18:19:14 MST 2008

Can someone offer some pointers as to what I am doing wrong?

I am trying to give osis2mod an option to ensure that a UTF-8
document is normalized to NFC.

I added -n as a flag to indicate that normalization should occur, and
it sets a global boolean variable "normalize" to true iff the flag is
present.
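
For illustration, a minimal sketch of the flag handling described above. The helper name parseNormalizeFlag is hypothetical, and osis2mod's real argument loop may be structured differently. It only shows the intended behavior: "normalize" stays false unless -n appears on the command line.

```cpp
#include <cstring>

// Global switch: true iff -n was given on the command line.
bool normalize = false;

// Hypothetical helper: scan the command-line arguments for "-n".
// osis2mod's actual option parsing may differ.
void parseNormalizeFlag(int argc, char *argv[]) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "-n") == 0) {
            normalize = true;
        }
    }
}
```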

Rather than reinventing the wheel, I figured Sword's UTF8NFC filter  
would be the ticket.

First I added the header with:

#ifdef _ICU_
#include <utf8nfc.h>
#endif

And I created a global variable:

#ifdef _ICU_
UTF8NFC normalizer;
#endif

Then, right before adding the entry, I ran the verse text through the filter:

#ifdef _ICU_
			if (normalize) {
				// note the hack of 2 to mimic a real key. TODO: remove all hacks
				normalizer.processText(activeVerseText, (SWKey *)2);
			}
#endif

Now I ran the KJV.xml at www.crosswire.org/~dmsmith/kjv2006 through
osis2mod with -n.

Since I thought I had already normalized the text, I expected a diff  
to show nothing.
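
Rather than diffing two whole modules, one could compare each verse in memory before and after the filter call. This is a minimal sketch, assuming the verse text is copied into a std::string before and after processText(). firstDifference is a hypothetical helper, not part of Sword.

```cpp
#include <algorithm>
#include <string>

// Return the byte offset of the first difference between the text before
// and after filtering, or std::string::npos if the two are identical.
std::string::size_type firstDifference(const std::string &before,
                                       const std::string &after) {
    const std::string::size_type n = std::min(before.size(), after.size());
    for (std::string::size_type i = 0; i < n; ++i) {
        if (before[i] != after[i]) {
            return i;
        }
    }
    // One string may be a prefix of the other (truncated or extended tail).
    return before.size() == after.size() ? std::string::npos : n;
}
```

For input that is already in NFC it should return std::string::npos. An offset near the end of the verse would match the end-of-line corruption described below.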

However, I found corruption at the end of the raw text of Matthew 3:17
in the module (and in many places after it).

The corruption is always at the end of the line. Here is the raw text  
for that verse:
<w lemma="strong:G3588" morph="robinson:T-NSM" src="13"></w><w  
lemma="strong:G2532" morph="robinson:CONJ" src="1">And</w> <w  
lemma="strong:G2400" morph="robinson:V-2AAM-2S" src="2">lo</w> <w  
lemma="strong:G5456" morph="robinson:N-NSF" src="3">a voice</w> <w  
lemma="strong:G1537" morph="robinson:PREP" src="4">from</w> <w  
lemma="strong:G3588 strong:G3772" morph="robinson:T-GPM robinson:N-GPM"
src="5 6">heaven</w>, <w lemma="strong:G3004" morph="robinson:V-PAP-NSF"
src="7">saying</w>, <w lemma="strong:G3778" morph="robinson:D-NSM"
src="8">This</w> <w lemma="strong:G2076" morph="robinson:V-PXI-3S"
src="9">is</w> <w lemma="strong:G3450" morph="robinson:P-1GS"
src="12">my</w> <w lemma="strong:G27" morph="robinson:A-NSM"  
src="14">beloved</w> <w lemma="strong:G3588 strong:G5207"  
morph="robinson:T-NSM robinson:N-NSM" src="10 11">Son</w>, <w  
lemma="strong:G1722" morph="robinson:PREP" src="15">in</w> <w  
lemma="strong:G3739" morph="robinson:R-DSM" src="16">whom</w> <w  
lemma="strong:G2106" morph="robinson:V-AAI-1S" src="17">I am well  
pleased</w>.<milestone resp="pdy 2003-12-14-08:48" type="x- 
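
Apart from the damaged tail, the markup above is plain 7-bit ASCII, and ASCII text is always already in NFC, so NFC normalization should leave such a verse byte-for-byte unchanged. A minimal check along those lines, assuming the verse text is copied into a std::string first. isAscii is a hypothetical helper, not part of Sword.

```cpp
#include <string>

// True if every byte is 7-bit ASCII. Such text is already in NFC,
// so normalizing it must not change a single byte.
bool isAscii(const std::string &text) {
    for (unsigned char c : text) {
        if (c > 0x7F) {
            return false;
        }
    }
    return true;
}
```

If isAscii() is true for a verse's input and the module text still differs, the change is not a legitimate normalization. That points to the filter call, or to how its result is stored, rather than to NFC itself.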

Any help would be appreciated.


Working together,
	DM Smith
