[sword-devel] Musings about the Cherokee NT module
dfhmch at googlemail.com
Mon Jul 2 05:41:25 MST 2012
We should ignore pronunciation methods for processing Cherokee transcribed
The Sequoyah transliteration system is explicitly described as not being
based on phonetics!
Please refer to the Wikipedia page.
The edit distance method may be more fruitful, yet there are also hidden
assumptions and potential pitfalls.
(a) The Cherokee NT is not 100% proofread. Judging by comparisons with the
PDF file from Google Books, it is sometimes quite difficult to judge where
the word-boundary spaces are. Moreover, several pairs of Cherokee symbols
are visually very alike, which, coupled with the differences between fonts,
makes character recognition quite a challenge. So I wouldn't be surprised
if the accuracy of the 2009 CNT text download is as low as 85% (a mere
subjective guess).
(b) Although many words will yield good edit distance scores (dewi = David,
equahami = Abraham), there will be several proper names or titles in which
the Cherokee is closer to a translation of the meaning of the original Greek
word (tsisa = Jesus, galonedv = Christ).
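To make point (b) concrete, here is a rough sketch of the kind of edit
distance comparison I have in mind. It assumes plain Levenshtein distance
between the lowercased transliteration and the English form of the name,
divided by the longer length; the function names are my own and not part of
any SWORD tool, and a real matcher would probably need a weighting tuned to
the transliteration scheme.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: insertions, deletions and
    substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Distance scaled to 0.0 (identical) .. 1.0 (nothing in common).
    Normalizing by the longer string is an assumption, not a standard."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


if __name__ == "__main__":
    pairs = [("dewi", "David"), ("equahami", "Abraham"),
             ("tsisa", "Jesus"), ("galonedv", "Christ")]
    for cherokee, english in pairs:
        score = normalized_distance(cherokee, english)
        print(f"{cherokee:10s} {english:10s} {score:.2f}")
```

With this scoring, dewi/David comes out closer than tsisa/Jesus, and
galonedv/Christ shares no letters at all, so any fixed threshold would miss
exactly the names that matter most.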
An example of a missing space is in Mark 1:1, which reads:
adalenisgv yisdv kanohedv, tsisagalonedv unelanvhi uwetsi utseliga.
There should be a space between tsisa & galonedv.
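A fused pair like this could in principle be flagged automatically if we had
a list of known word forms. The sketch below is only an illustration: the
lexicon is a hypothetical three-word set taken from the verse above, and
split_fused is a name I made up. It simply tries every split point and keeps
those where both halves are known words.

```python
def split_fused(token: str, lexicon: set[str]) -> list[tuple[str, str]]:
    """Return every way to split token into two words that are both
    present in the lexicon. An empty list means no split was found."""
    return [(token[:i], token[i:])
            for i in range(1, len(token))
            if token[:i] in lexicon and token[i:] in lexicon]


if __name__ == "__main__":
    # Hypothetical mini-lexicon; a real one would have to come from a
    # proofread text or a dictionary.
    lexicon = {"tsisa", "galonedv", "unelanvhi"}
    print(split_fused("tsisagalonedv", lexicon))
    print(split_fused("kanohedv", lexicon))
```

Of course this only helps once a trustworthy word list exists, which brings
us straight back to the proofreading problem.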
The word capitalization task is therefore a huge challenge.
It may not be worth even starting it until the proofreading accuracy is much
closer to 100%.
It will help however to learn that there is a Cherokee English dictionary
And there are several other websites with useful resources for the Cherokee
But now I am running far ahead of my original musings, as to go down this
route would require someone with real competence in speaking and writing the