[sword-devel] Musings about the Cherokee NT module

Sat Jun 30 14:38:18 MST 2012

On Jun 30, 2012, at 7:42 AM, David Haslam <dfhmch at googlemail.com> wrote:

> 
> For that reason, I've developed a TextPipe filter to transliterate the
> Cherokee text to the Sequoyah Latin equivalents, using the information in
> the Wikipedia page about Cherokee.
> 

If you're using the icu-sword data bundle, which you are if you use the Sword utilities that I compiled for Win32, then you already have a reversible Cherokee-Latin transliterator. It's then trivial to get transliterated text, either by telling diatheke to transliterate as it outputs or by running mod2imp followed by uconv, which can perform any transliteration transform known to its ICU data bundle.

> At least this provides the possibility whereby proper names in the English
> KJV could be mapped to the right words in the Cherokee translation
> (i.e. by fuzzy matching and manual editing, perhaps with some intelligent
> guesswork).
> 
> This could pave the way for back-conversion from the Latin script to the
> Cherokee symbols, while at the same time converting the capitalized words to
> a suitable XML markup.
> 

It should be fairly trivial to automate tagging of the names. We really only need a list of the names present within each verse, preferably in English. Then we can compute the edit distance from each transliterated Cherokee word in a verse to each name on our list for that verse. Finally, we assign the Cherokee word with the lowest edit distance to each English name and tag it accordingly. A phonetic, Soundex-style edit distance would probably work best, but plain Levenshtein might suffice.
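
As a rough illustration of the matching step, here is a minimal Python sketch. It assumes the Cherokee words of a verse have already been transliterated to Latin script, and it uses plain Levenshtein distance rather than a Soundex-style one. The function names levenshtein and tag_names are hypothetical, not part of Sword or ICU.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tag_names(names: List[str], words: List[str]) -> Dict[str, int]:
    """Map each English name to the index of the closest word in the verse.

    `words` is assumed to hold the verse's Cherokee words after
    transliteration to Latin script. Comparison is case-insensitive.
    """
    if not words:
        return {}
    result = {}
    for name in names:
        result[name] = min(
            range(len(words)),
            key=lambda i: levenshtein(name.lower(), words[i].lower()),
        )
    return result
```

Calling tag_names with the list of names for a verse and that verse's transliterated words returns, for each name, the position of the word to wrap in markup. This sketch picks the best word for each name independently, so two names could land on the same word; a real run would want to resolve such collisions and probably flag matches whose distance is large for manual review.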

If you can locate a list of names in the Bible and all the verses in which they appear, I can implement the above algorithm to do the tagging.

--Chris