[sword-devel] Normalising on the commandline

Chris Little chrislit at crosswire.org
Wed Jan 21 12:01:16 MST 2009


Peter von Kaehne wrote:
> As a side issue of the other debate - how can I achieve NFC for a text I
> am working on via commandline utilities?
> 
> All I can find in ICU documentation is about programming methods
> available, but I have seen no command line utilities.

DM's suggestion of using the Perl facility is fine, and I use it myself 
quite often when I'm scripting in Perl. But there's also an ICU utility 
which can do normalization (and much more).

uconv (meant as a replacement for iconv, if you're familiar with that) 
does codepage/encoding conversion, transliteration, and normalization. 
It's part of the standard ICU distribution and we have Windows binaries 
on the FTP site:
http://crosswire.org/ftpmirror/pub/sword/utils/win32/uconv.zip
http://crosswire.org/ftpmirror/pub/sword/utils/win32/icudt40-big.zip

(I'd recommend the big, 7.6 MB version of the ICU data for this.)

Use is fairly straightforward. To take a file "input" and write an 
NFC-normalized copy to a file "output" (assuming both are UTF-8), you 
would use:

uconv -f utf-8 -t utf-8 -x NFC -o output input
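
If you want to sanity-check the result without uconv, here is a minimal 
sketch in Python using only the standard unicodedata module. It is not 
part of ICU, and the helper name is_nfc is just something made up for 
illustration. It shows what NFC does to a decomposed "e" + combining 
acute accent, and how to test whether a string is already in NFC:

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is unchanged by NFC normalization."""
    # Hypothetical helper: a string is in NFC exactly when
    # normalizing it to NFC gives back the same string.
    return unicodedata.normalize("NFC", text) == text


# "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9
composed = unicodedata.normalize("NFC", decomposed)

print(is_nfc(decomposed), is_nfc(composed))
```

To check a whole file, you could read "output" as UTF-8 and pass its 
contents to is_nfc.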

--Chris


