[sword-devel] Musings about the Cherokee NT module

Sat Jun 30 07:42:03 MST 2012

A few weeks ago, I spent some time researching the Cherokee NT module
Che1860.

The source text comes from the *Cherokee New Testament* project at
www.cherokeenewtestament.org

The Cherokee New Testament was published in 1860 by the American Bible
Society.

The project's electronic edition is still undergoing proofreading.
Our module is based on the Feb 20, 2009 edition (downloaded 2009-05-13).

The text is also available in PDF facsimile at
http://books.google.com/books?id=v0MTAAAAYAAJ&oe=UTF-8.

I tried to make contact with the project leader, but have not had any
response so far.
I have not yet found any evidence of further progress for the project since
2009.

One of the interesting points about the Cherokee syllabary in Unicode is
that there is no uppercase set of symbols.
However, the historic practice during the 19th century was that proper names
and the start of a sentence were printed with a symbol enlarged by 20%.

I downloaded the PDF file from Google Books, which confirmed this
observation.

This raises a question: if we reached the point in text development where
we could correctly identify every place where enlarged characters were used
in the 1860 edition, how might we encode this for presentation in SWORD
front-ends?

As far as I can judge, we'd need to use a suitable XML construction in OSIS.
Even USFM does not have a special character style marker to enlarge portions
of text.

So it would probably boil down to using a custom extension to OSIS
attributes, in order to mark the symbols that should be enlarged by 20%.
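
To make the idea concrete, here is a minimal Python sketch of what such
markup might look like. It assumes the OSIS <hi> element with an "x-"
prefixed type value; the value "x-enlarged" and the helper mark_enlarged are
made up for illustration, and no front-end is known to render them.

```python
from xml.sax.saxutils import escape

# Assumption: "x-enlarged" is a hypothetical extension value,
# not a type defined by OSIS or recognised by any SWORD filter.
ENLARGED_TYPE = "x-enlarged"


def mark_enlarged(word: str, count: int = 1) -> str:
    """Wrap the first `count` characters of `word` in an OSIS <hi> element."""
    head, tail = word[:count], word[count:]
    if not head:
        # Nothing to enlarge (empty word or count of zero).
        return escape(word)
    return f'<hi type="{ENLARGED_TYPE}">{escape(head)}</hi>{escape(tail)}'


if __name__ == "__main__":
    # U+13E3 U+13B3 U+13A9 spells "tsa-la-gi" (Cherokee).
    print(mark_enlarged("\u13E3\u13B3\u13A9"))
```

The enlargement factor itself (20%) would then be a matter for the
front-end's style sheet rather than for the module text.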

Of course, we are a long way from being able to implement such an
enhancement.

The relative unfamiliarity of the Cherokee syllabary to non-Cherokee
speakers presents an initial hurdle to be overcome.

For that reason, I've developed a TextPipe filter to transliterate the
Cherokee text to the Sequoyah Latin equivalents, using the information in
the Wikipedia page about Cherokee.
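
For anyone without TextPipe, the general idea can be sketched in Python.
This is not the filter itself: as a stand-in for a hand-built table, it
takes each syllable's Latin value from its Unicode character name (e.g.
"CHEROKEE LETTER TSA" gives "tsa"), which is an assumption that may differ
in places from the Wikipedia table. The function name to_latin is
hypothetical.

```python
import unicodedata

PREFIX = "CHEROKEE LETTER "


def to_latin(text: str) -> str:
    """Replace each Cherokee syllable with the Latin value from its Unicode name."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith(PREFIX):
            # "CHEROKEE LETTER GI" -> "gi"
            out.append(name[len(PREFIX):].lower())
        else:
            # Spaces, punctuation and anything non-Cherokee pass through unchanged.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_latin("\u13E3\u13B3\u13A9"))  # U+13E3 TSA, U+13B3 LA, U+13A9 GI
```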

At least this makes it possible to map proper names in the English KJV to
the corresponding words in the Cherokee translation
(i.e. by fuzzy matching and manual editing, perhaps with some intelligent
guesswork).

This could pave the way for back-conversion from the Latin script to the
Cherokee symbols, while at the same time converting the capitalized words to
a suitable XML markup.

I have already developed a method to back-convert (case-insensitive) Latin
to Cherokee, which (along with the aforementioned filter) overcomes all the
ambiguities that I identified during my researches.
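
A rough Python sketch of case-insensitive back-conversion follows. It is
not my actual method: it uses a plain greedy longest-match over a table
derived from the Unicode character names, and the names TABLE and
to_cherokee are hypothetical. Greedy matching alone does not resolve every
ambiguity; for instance "nahe" is read as "nah" + "e" and never as
"na" + "he".

```python
import unicodedata

PREFIX = "CHEROKEE LETTER "

# Build the Latin -> Cherokee table from the syllabary block U+13A0..U+13F4.
TABLE = {}
for cp in range(0x13A0, 0x13F5):
    name = unicodedata.name(chr(cp), "")
    if name.startswith(PREFIX):
        TABLE[name[len(PREFIX):].lower()] = chr(cp)

MAX_LEN = max(len(key) for key in TABLE)


def to_cherokee(latin: str) -> str:
    """Greedy longest-match conversion of Latin syllables to Cherokee symbols."""
    lowered = latin.lower()  # case-insensitive: "Tsa" and "tsa" match alike
    out = []
    i = 0
    while i < len(lowered):
        for size in range(MAX_LEN, 0, -1):
            chunk = lowered[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            # No syllable starts here: pass the (lowercased) character through.
            out.append(lowered[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_cherokee("Tsalagi"))
```

Because the input is lowercased before matching, any capitalization would
have to be recorded separately (e.g. as word positions) before conversion
if it is to drive the markup of enlarged symbols.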

I don't wish to spend a lot more time on this, unless there might be a real
prospect of enhancing SWORD to do what is necessary to enlarge individual
symbols in scripts (such as Cherokee) that do not make any distinction
between lowercase and uppercase in Unicode.

Further details available on request.

David Haslam

--
View this message in context: http://sword-dev.350566.n4.nabble.com/Musings-about-the-Cherokee-NT-module-tp4650474.html
Sent from the SWORD Dev mailing list archive at Nabble.com.