[sword-devel] Observations about Thai script and the ThaiKJV module
dfhmch at googlemail.com
Sat Jan 28 04:52:14 MST 2012
Here's the KUCut README:
### KU Wordcut Installation Instructions ###
### Copyright (C) 2004 Kasetsart University, NAiST Laboratory.
### Author: Sutee Sudprasert <sutee at vivaldi.cpe.ku.ac.th>
KU Wordcut is a Thai word segmenter that differs from existing
segmenters such as CTTEX or SWATH.
The main objective of CTTEX and SWATH is wrapping text, so computing speed
is the top priority and accuracy or precision may be sacrificed. However,
some tasks, such as NLP, need precision more than computing speed. We have
therefore attempted to build a segmenter that is suitable for NLP tasks.
Our segmenter reduces some problems that CTTEX and SWATH leave unhandled,
such as unknown-word recognition and some cases of boundary ambiguity.
The algorithm used in this segmenter was proposed in the NCSEC 2003
proceedings ("Thai word segmentation based on Local and Global
Unsupervised Learning").
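To make "boundary ambiguity" and "unknown words" concrete, here is a toy sketch in Python 3. It is NOT KUCut's algorithm (KUCut uses the unsupervised-learning method cited above), and it makes no claim about how CTTEX or SWATH work internally. It uses a simple greedy longest-matching baseline over a hypothetical four-word dictionary. The classic Thai string "ตากลม" can be read as ตา|กลม ("round eyes") or ตาก|ลม ("exposed to the wind"). A greedy matcher silently commits to one reading, and a character that is not in the dictionary simply falls out as a one-character token.

```python
# Toy illustration only: this is NOT KUCut's algorithm.
# TOY_DICT is a hypothetical mini-dictionary chosen to show ambiguity.
TOY_DICT = {"ตา", "ตาก", "กลม", "ลม"}


def longest_match(text, dictionary):
    """Greedy left-to-right longest matching.

    Characters that start no dictionary word are emitted as
    single-character tokens (a crude stand-in for unknown words).
    """
    max_len = max(len(w) for w in dictionary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts at position i.
            words.append(text[i])
            i += 1
    return words


def all_segmentations(text, dictionary):
    """Return every way to split text entirely into dictionary words."""
    if not text:
        return [[]]
    results = []
    for j in range(1, len(text) + 1):
        head = text[:j]
        if head in dictionary:
            for rest in all_segmentations(text[j:], dictionary):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    phrase = "ตากลม"
    print(longest_match(phrase, TOY_DICT))      # ['ตาก', 'ลม']
    print(all_segmentations(phrase, TOY_DICT))  # [['ตา', 'กลม'], ['ตาก', 'ลม']]
```

Both segmentations are valid under the dictionary, so choosing between them needs information beyond simple dictionary lookup. That is the kind of case the README says KUCut aims to handle better.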
Requires Python 2.5 or above.
To install, run the following command on the command line:
python setup.py install
How to use
On the command line, use:
kucut [option] <filename>
<filename> is the input file.
--line=?? replaces spaces with the given special character (default: "/n")
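If you want to call the command-line tool from another Python script, a minimal Python 3 sketch is shown below. It uses only the invocation documented above (`kucut [option] <filename>` and `--line=...`). It assumes that `kucut` is on the PATH after `python setup.py install` and that it writes its result to standard output. The helper names and the UTF-8 output encoding are hypothetical assumptions, not part of KUCut.

```python
import subprocess


def build_kucut_command(filename, line=None):
    """Build argv for `kucut [option] <filename>` as described in the README.

    `line`, if given, is passed as --line=<char> (hypothetical helper).
    """
    cmd = ["kucut"]
    if line is not None:
        cmd.append("--line=" + line)
    cmd.append(filename)
    return cmd


def run_kucut(filename, line=None, encoding="utf-8"):
    """Run kucut and return its standard output as text.

    Assumes kucut is on PATH and prints its result to stdout; the
    output encoding (UTF-8 here) is an assumption, not documented.
    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        build_kucut_command(filename, line),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode(encoding)
```

For example, `run_kucut("input.txt", line="|")` would run `kucut --line=| input.txt`. Passing an argument list instead of a shell string avoids quoting problems with special separator characters.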
Report Bug & Comment
E-mail : <sutee at vivaldi.cpe.ku.ac.th> or <cpe11_sutee at yahoo.com>
MSN : cpe11_sutee at hotmail.com
ICQ : 88938507
View this message in context: http://sword-dev.350566.n4.nabble.com/Observations-about-Thai-script-and-the-ThaiKJV-module-tp4333992p4335903.html
Sent from the SWORD Dev mailing list archive at Nabble.com.