[sword-devel] Normalising on the commandline

DM Smith dmsmith555 at yahoo.com
Wed Jan 21 08:34:03 MST 2009


Peter von Kaehne wrote:
> As a side issue of the other debate - how can I achieve NFC for a text I
> am working on via commandline utilities?
>
> All I can find in ICU documentation is about programming methods
> available, but I have seen no command line utilities.
>
> Peter
You can use perl to do it, using the following module:
http://search.cpan.org/~sadahiro/Unicode-Normalize-1.02/Normalize.pm
Note that the more recent the version of perl, the more recent the version 
of Unicode it supports. See the bottom of that page for the mapping.

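Once this is installed, it should be something like this (I'm going from 
memory, as I haven't used perl significantly for quite a while):
    perl -CSD -p -i.bak -MUnicode::Normalize -e '$_ = NFC($_)' filename
This will rename filename to filename.bak, apply the -e expression to 
every line, and write the result back to filename. The -CSD switch makes 
perl read and write the text as UTF-8; without it NFC() only sees raw 
bytes and in practice leaves UTF-8 text unchanged.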
For more details see:
    perldoc perlrun

tei2mod and osis2mod convert to Unicode and apply NFC normalization 
by default. You can turn this off when you know the input is already NFC, 
or that it is cp1252. Chris has said that he'd like all the module-making 
programs to be modified to do the same.

Hope this helps.

In Him,
    DM
