[sword-devel] Observations about Thai script and the ThaiKJV module

David Haslam dfhmch at googlemail.com
Fri Jan 27 09:44:57 MST 2012


Thanks to Mike Hart for putting me on to this track.

The ThaiKJV module has no spaces at word boundaries.

This has implications for the search methods in SWORD, because a word-based
search has no word boundaries to split the text on.
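To make the search problem concrete, here is a minimal Python sketch. The Thai sample is illustrative only, not quoted from the module, and `whole_word_match` is a hypothetical stand-in for a whitespace-based word search, not SWORD's actual search code.

```python
# Illustrative sample (NOT quoted from ThaiKJV): three Thai words written
# without spaces, as Thai is normally typeset.
unspaced = "พระเจ้าทรงสร้าง"  # พระเจ้า + ทรง + สร้าง

def whole_word_match(text: str, word: str) -> bool:
    """Hypothetical naive whole-word search: split on whitespace, compare tokens."""
    return word in text.split()

print(unspaced.split())                      # the whole string comes back as one "word"
print(whole_word_match(unspaced, "พระเจ้า"))  # False: no token equals the word
print("พระเจ้า" in unspaced)                  # True: plain substring search still works
```

Substring search still finds the word, but anything that relies on word tokens (whole-word matching, indexing) sees each unspaced run as a single token.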

A feature of written Thai is that words are discrete units and lines should
wrap only at word boundaries, yet no visible gap is displayed between words
(similar to ancient Greek).

World Bible Translation Center (WBTC) used a program called KUCut to insert
spaces for typesetting their Thai ERV NT.

This same method should work for the ThaiKJV too.

In fact, the person who did the KUCut process said she practised on the
ThaiKJV Bible and used dictionary information from it to adapt KUCut to
religious texts.
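KUCut's own segmentation method isn't described in this post, so the sketch below swaps in a much simpler, well-known technique: greedy longest matching against a word list. It only illustrates why dictionary data (such as the ThaiKJV-derived vocabulary mentioned above) matters. The four-word `DICTIONARY` is a hypothetical toy, and `longest_match_segment` is not part of KUCut.

```python
def longest_match_segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right longest matching against a word list.

    Characters not covered by any dictionary entry are emitted one at a time.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: fall back to a single character.
            words.append(text[i])
            i += 1
    return words

# Hypothetical toy dictionary; a real one would be far larger.
DICTIONARY = {"พระเจ้า", "พระ", "ทรง", "สร้าง"}
print(" ".join(longest_match_segment("พระเจ้าทรงสร้าง", DICTIONARY)))
# พระเจ้า ทรง สร้าง
```

Preferring the longest match is what keeps พระเจ้า intact even though the shorter entry พระ is also in the list.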

The inserted space characters would then need to be replaced with a thin space
or a zero-width space (U+200B) to preserve the original appearance.
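A minimal sketch of that replacement step, assuming the segmenter's output uses plain ASCII spaces as word separators. `spaces_to_breaks` is a hypothetical helper, and it assumes every space was inserted by the segmenter, which is not true of real Thai text (Thai does use spaces between phrases and sentences).

```python
ZWSP = "\u200b"        # U+200B ZERO WIDTH SPACE: invisible break opportunity
THIN_SPACE = "\u2009"  # U+2009 THIN SPACE: narrow but visible gap

def spaces_to_breaks(segmented: str, separator: str = ZWSP) -> str:
    """Replace the spaces a segmenter inserted with `separator`.

    ASSUMPTION: every space in `segmented` came from the segmenter. Spaces
    that were already in the Thai source would need to be kept distinct.
    """
    return segmented.replace(" ", separator)

marked = spaces_to_breaks("พระเจ้า ทรง สร้าง")
print(marked.count(ZWSP))                              # 2
print(marked.replace(ZWSP, "") == "พระเจ้าทรงสร้าง")  # True: same visible text
```

Passing `THIN_SPACE` as `separator` gives the narrow visible gap instead of an invisible break.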

This differs from an online service I found, which inserts ZWSP directly (or,
optionally, a hyphen, so that test output visibly shows where the breaks are).
See
http://www.thai-language.com/?nav=zwsp

This utility prepares Thai text by inserting the proper Unicode "Zero-Width
Space Character" between detected word breaks.
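Once word breaks are marked with ZWSP, a search tokenizer has to treat U+200B as a separator. The sketch below is hypothetical (I haven't checked how SWORD's own tokenizer handles U+200B); it shows that Python's `str.split()` does not split on ZWSP, because U+200B is a format character (category Cf), not whitespace.

```python
import re

ZWSP = "\u200b"

def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: split on ordinary whitespace and on ZWSP."""
    return [t for t in re.split(r"[\s\u200b]+", text) if t]

marked = ZWSP.join(["พระเจ้า", "ทรง", "สร้าง"])
print(ZWSP.isspace())    # False: ZWSP is not whitespace to Python
print(marked.split())    # still one token
print(tokenize(marked))  # ['พระเจ้า', 'ทรง', 'สร้าง']
```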

KUCut is described here (in Thai):
http://veer66.wordpress.com/2009/11/23/kucutwindows/

The Python source code is maintained here:
https://bitbucket.org/veer66/kucut

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/Observations-about-Thai-script-and-the-ThaiKJV-module-tp4333992p4333992.html
Sent from the SWORD Dev mailing list archive at Nabble.com.


