[sword-devel] NFC and osis2mod

DM Smith dmsmith555 at yahoo.com
Thu Jan 31 18:19:14 MST 2008


Can someone offer some pointers as to what I am doing wrong?

I am trying to add an option to osis2mod that ensures a UTF-8
document is normalized to NFC.

I added -n as a flag to indicate that normalization should occur and  
set a global boolean variable "normalize" to true iff the flag is  
present.
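
For reference, the flag handling amounts to something like the sketch below. It is a minimal illustration only: the function name parseNormalizeFlag is hypothetical, and osis2mod's actual argument parsing may be structured differently.

```cpp
#include <cstring>

// True only when -n was given on the command line
// (the global "normalize" described above).
bool normalize = false;

// Hypothetical helper: scan the arguments and set "normalize"
// if and only if -n is present.
void parseNormalizeFlag(int argc, const char *const argv[]) {
	normalize = false;
	for (int i = 1; i < argc; ++i) {
		if (std::strcmp(argv[i], "-n") == 0) {
			normalize = true;
		}
	}
}
```

Calling parseNormalizeFlag(argc, argv) at the top of main() would leave normalize false unless -n was passed.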

Rather than reinventing the wheel, I figured Sword's UTF8NFC filter  
would be the ticket.

First I added the header with:

#ifdef _ICU_
#include <utf8nfc.h>
#endif

And I created a global variable:

#ifdef _ICU_
UTF8NFC normalizer;
#endif


Then right before adding the entry I ran it through the filter:

#ifdef _ICU_
			if (normalize) {
				normalizer.processText(activeVerseText, (SWKey *)2);  // note the hack of 2 to mimic a real key. TODO: remove all hacks
			}
#endif

I then ran the KJV.xml from www.crosswire.org/~dmsmith/kjv2006 through
osis2mod.

Since I thought I had already normalized the text, I expected a diff  
to show nothing.

However, I found corruption at the end of the raw text for Matthew 3:17
in the module, and in many places after it.

The corruption is always at the end of the line. Here is the raw text  
for that verse:
<w lemma="strong:G3588" morph="robinson:T-NSM" src="13"></w><w lemma="strong:G2532" morph="robinson:CONJ" src="1">And</w>
<w lemma="strong:G2400" morph="robinson:V-2AAM-2S" src="2">lo</w>
<w lemma="strong:G5456" morph="robinson:N-NSF" src="3">a voice</w>
<w lemma="strong:G1537" morph="robinson:PREP" src="4">from</w>
<w lemma="strong:G3588 strong:G3772" morph="robinson:T-GPM robinson:N-GPM" src="5 6">heaven</w>,
<w lemma="strong:G3004" morph="robinson:V-PAP-NSF" src="7">saying</w>,
<w lemma="strong:G3778" morph="robinson:D-NSM" src="8">This</w>
<w lemma="strong:G2076" morph="robinson:V-PXI-3S" src="9">is</w>
<w lemma="strong:G3450" morph="robinson:P-1GS" src="12">my</w>
<w lemma="strong:G27" morph="robinson:A-NSM" src="14">beloved</w>
<w lemma="strong:G3588 strong:G5207" morph="robinson:T-NSM robinson:N-NSM" src="10 11">Son</w>,
<w lemma="strong:G1722" morph="robinson:PREP" src="15">in</w>
<w lemma="strong:G3739" morph="robinson:R-DSM" src="16">whom</w>
<w lemma="strong:G2106" morph="robinson:V-AAI-1S" src="17">I am well pleased</w>.<milestone resp="pdy 2003-12-14-08:48" type="x-strongsMarkup"/>="22"꧁
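
To pin down exactly where the filtered text diverges from the input, a small comparison helper along these lines could be used. This is only a debugging sketch: firstDifference is a hypothetical name, not part of Sword, and it assumes the text before and after the filter has been copied into std::string values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical debugging helper: returns the byte offset of the first
// difference between the text before and after filtering, or
// std::string::npos if the two are identical.
std::size_t firstDifference(const std::string &before, const std::string &after) {
	const std::size_t common =
		before.size() < after.size() ? before.size() : after.size();
	for (std::size_t i = 0; i < common; ++i) {
		if (before[i] != after[i]) {
			return i;
		}
	}
	// One buffer is a prefix of the other: the difference starts
	// where the shorter one ends.
	return before.size() == after.size() ? std::string::npos : common;
}
```

If the input really is already in NFC, this should return std::string::npos for every verse; an offset at or near the end of the buffer would match the trailing garbage seen above.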


Any help would be appreciated.

Thanks!

Working together,
	DM Smith


