[sword-devel] Traditional Chinese to Simplified Chinese Conversion

David Haslam dfhmch at googlemail.com
Wed Feb 6 08:55:31 MST 2013


For the Simplified Chinese module *ChiUns*, the conf file includes this line:

.... converted to Simplified from Traditional via *MacOSX* (2011-01-22)

Now I'd guess that this conversion was done on the OSIS XML source text file
used to make the Traditional module *ChiUn*.

This prompts a question about Chinese [Traditional] to Chinese [Simplified]
(*C2C*) conversion and context.

Q. Does the presence of XML tags in between CJK characters affect the
outcome of the C2C conversion?

That is, would the following two methods give identical results?

A. Convert the OSIS XML file and then strip out all the XML elements, leaving
only the Chinese Biblical text.
B. Strip out all the XML elements first, and then perform the C2C conversion
on the Biblical text alone.

A discussion with Andrew West of BabelPad about the linguistics underlying
C2C suggests that the answer might be "No".

This is because the correct conversion of some CJK characters depends on
their linguistic context, so a C2C algorithm has to take the neighbouring
characters into account.

Leaving in the XML elements would sometimes (though not always) break the
immediate linguistic context.
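
To make the concern concrete, here is a minimal sketch in Python. It does not
reproduce the MacOSX tool: `toy_c2c`, its two small tables, and the sample
markup are all made up for illustration. The only linguistic assumption is that
乾 normally simplifies to 干 but is kept as 乾 in the word 乾坤, so a converter
that checks phrases before single characters gives a different answer when a
tag sits between the two characters.

```python
import re

# Toy tables for illustration only; NOT the data used by any real C2C tool.
PHRASES = {"乾坤": "乾坤"}        # phrase-level exception: 乾 is kept in 乾坤
CHARS = {"乾": "干", "書": "书"}  # default character-level mappings

TAG = re.compile(r"<[^>]+>")      # naive tag matcher, adequate for this sketch


def toy_c2c(text):
    """Hypothetical converter: phrase matches take priority over characters."""
    out = []
    i = 0
    while i < len(text):
        for phrase, repl in PHRASES.items():
            if text.startswith(phrase, i):
                out.append(repl)
                i += len(phrase)
                break
        else:
            # No phrase starts here, so fall back to the character table.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


def strip_tags(xml):
    """Remove XML tags, keeping only the text between them."""
    return TAG.sub("", xml)


def method_a(xml):
    """Method A: convert the tagged text, then strip the tags."""
    return strip_tags(toy_c2c(xml))


def method_b(xml):
    """Method B: strip the tags, then convert the plain text."""
    return toy_c2c(strip_tags(xml))


# Made-up markup that splits the word 乾坤 across two elements.
sample = "<w>乾</w><w>坤</w>"
print(method_a(sample))  # 干坤 (the tag hid the phrase from the converter)
print(method_b(sample))  # 乾坤
```

When the tags do not split a context-sensitive word, as in `<p>乾坤</p>`, both
methods agree, which matches the "sometimes (though not always)" above.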

IMHO, an experiment is needed to determine whether the answer really is as I
suspect.

I'm not a Mac user, and to be fair to the module, the comparison needs to be
done using the aforementioned MacOSX tool.
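
Once someone with a Mac has produced the two plain-text outputs, the
comparison itself could be done anywhere. The following is only a sketch of
how that might look; the function name `diff_lines` and the idea of passing
two file names on the command line are my own assumptions, and it presumes
both outputs keep one verse per line in the same order.

```python
import sys
from itertools import zip_longest


def diff_lines(text_a, text_b):
    """Return (line_number, line_a, line_b) for every line that differs."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    diffs = []
    # zip_longest pads the shorter text, so missing lines also show up.
    pairs = zip_longest(lines_a, lines_b, fillvalue="")
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            diffs.append((number, a, b))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare.py method_a.txt method_b.txt
    with open(sys.argv[1], encoding="utf-8") as fa:
        text_a = fa.read()
    with open(sys.argv[2], encoding="utf-8") as fb:
        text_b = fb.read()
    for number, a, b in diff_lines(text_a, text_b):
        print(f"line {number}:\n  A: {a}\n  B: {b}")
```

An empty result would suggest the tags made no difference for this text; any
reported lines would be the places to inspect by hand.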

Best regards,

David 



--
View this message in context: http://sword-dev.350566.n4.nabble.com/Traditional-Chinese-to-Simplified-Chinese-Conversion-tp4651907.html
Sent from the SWORD Dev mailing list archive at Nabble.com.


