[sword-devel] new morphology
chrislit at crosswire.org
Sat Jan 26 17:06:45 MST 2008
We have a new Greek morphology in the pipeline to replace virtually
all of our existing morphologies, and I would be interested to hear
people's opinions or concerns, considering it does represent a certain
amount of change from the current system.
Presently we have 3 morphologies in use:
1) Robinson is used in Bibles like Byz that come from Maurice Robinson.
2) Packard is used in one or two special circumstances (LXX maybe?)
and has roughly a subset of the Robinson tag semantics with slightly
different encoding practices.
3) TVM codes appear in a few Bibles, but their actual explanation is
not offered anywhere since Larry Pierce claims a copyright. They are
also a subset of the Robinson system's semantics with completely
different encoding (4 digit codes that look like an extension to
In case it's not obvious from the above, the new morphology is based
on Robinson's system. Our current Robinson module is based on the tags
actually present in Byz. All of those tags were
programmatically decoded into the existing plain English entries. The
new morphology takes the Robinson system and generates every possible
tag plus its plain English explication.
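As a rough illustration of the generate-everything approach, here is a minimal Python sketch restricted to a simplified subset of Robinson noun tags (case, number and gender only). The table and function names are hypothetical and not part of any SWORD tool. Because every combination is produced, the result covers a tag whether or not any Bible actually uses it.

```python
from itertools import product

# Simplified subset of Robinson noun codes (assumption: case, number and
# gender only; the full system has many more parts of speech and fields).
CASES = {"N": "nominative", "G": "genitive", "D": "dative",
         "A": "accusative", "V": "vocative"}
NUMBERS = {"S": "singular", "P": "plural"}
GENDERS = {"M": "masculine", "F": "feminine", "N": "neuter"}


def generate_noun_entries():
    """Yield (tag, plain English explanation) for every noun tag in the subset."""
    for (c, case), (n, number), (g, gender) in product(
            CASES.items(), NUMBERS.items(), GENDERS.items()):
        yield f"N-{c}{n}{g}", f"noun, {case}, {number}, {gender}"


entries = dict(generate_noun_entries())
print(len(entries))      # 30
print(entries["N-GSF"])  # noun, genitive, singular, feminine
```

Adding verb tenses, voices, moods, persons and the participle case/number/gender groups multiplies these counts, which is why the full set runs to tens of thousands of entries.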
When the Packard and TVM codes found in various modules are converted
to Robinson format, the new Robinson module should have complete
coverage. In fact, it should cover every current and future morphology
code possible in the Robinson system, including tens of thousands more
than will ever actually appear.
The result is that the existing 150k Robinson module would grow to 2M.
Is that size increase reasonable?
We could simply generate a list of all codes currently appearing
across all modules, which would probably result in a module of 300k or
less. That would handle all current codes, but might require updates
in the future (and we wouldn't know if updates were necessary without
going through the whole collation process over again).
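The collation step behind this alternative amounts to taking the union of codes seen in each module. A minimal Python sketch follows, assuming morphology is stored as OSIS-style attributes such as morph="robinson:V-PAI-3S" (real module markup may differ). The function name and the two sample module snippets are hypothetical.

```python
import re

# Assumption: codes appear as OSIS-style morph="robinson:..." attributes.
MORPH_ATTR = re.compile(r'morph="robinson:([A-Z0-9-]+)"')


def collect_codes(module_texts):
    """Return the sorted union of Robinson codes found in all module texts."""
    codes = set()
    for text in module_texts.values():
        codes.update(MORPH_ATTR.findall(text))
    return sorted(codes)


# Hypothetical sample data standing in for converted module texts.
modules = {
    "module_a": '<w morph="robinson:V-PAI-3S">λέγει</w> <w morph="robinson:N-NSM">λόγος</w>',
    "module_b": '<w morph="robinson:N-NSM">λόγος</w> <w morph="robinson:V-AAI-3S">εἶπεν</w>',
}
print(collect_codes(modules))  # ['N-NSM', 'V-AAI-3S', 'V-PAI-3S']
```

The resulting list is only as complete as the set of modules scanned, so any new module would mean rerunning the collation.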
We could push morphology code parsing into the library as a pseudo-
module since it's not particularly difficult to parse the codes. That
would result in the least size gain of all but would place the burden
of re-implementation on JSword. It would likely be the fastest
solution but would increase the library's memory footprint a little.
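To give a feel for how small the parsing logic could be, here is a minimal Python sketch that decodes codes on demand instead of storing every entry. It covers only a simplified subset of the Robinson tables (common tenses, voices and moods, plus nouns). `parse_robinson` and the table names are hypothetical; an actual SWORD or JSword implementation would look different.

```python
# Simplified subset of the Robinson tables (assumption: the full system has
# more tenses, voices, moods and parts of speech than are listed here).
TENSES = {"P": "present", "I": "imperfect", "F": "future", "A": "aorist",
          "R": "perfect", "L": "pluperfect"}
VOICES = {"A": "active", "M": "middle", "P": "passive"}
MOODS = {"I": "indicative", "S": "subjunctive", "O": "optative",
         "M": "imperative", "N": "infinitive", "P": "participle"}
PERSONS = {"1": "first person", "2": "second person", "3": "third person"}
CASES = {"N": "nominative", "G": "genitive", "D": "dative",
         "A": "accusative", "V": "vocative"}
NUMBERS = {"S": "singular", "P": "plural"}
GENDERS = {"M": "masculine", "F": "feminine", "N": "neuter"}


def _case_number_gender(part):
    """Decode a three-letter case/number/gender group such as 'NSM'."""
    case, number, gender = part  # raises ValueError unless exactly 3 chars
    return [CASES[case], NUMBERS[number], GENDERS[gender]]


def parse_robinson(code):
    """Translate a subset Robinson tag such as 'V-PAI-3S' into plain English."""
    fields = code.split("-")
    try:
        if fields[0] == "N" and len(fields) == 2:
            return ", ".join(["noun"] + _case_number_gender(fields[1]))
        if fields[0] == "V" and len(fields) in (2, 3):
            tense, voice, mood = fields[1]  # tense/voice/mood group, e.g. 'PAI'
            words = ["verb", TENSES[tense], VOICES[voice], MOODS[mood]]
            if len(fields) == 3:
                if mood == "P":
                    # Participles carry case, number and gender.
                    words += _case_number_gender(fields[2])
                else:
                    # Finite verbs carry person and number, e.g. '3S'.
                    person, number = fields[2]
                    words += [PERSONS[person], NUMBERS[number]]
            return ", ".join(words)
    except (KeyError, ValueError):
        pass  # unknown letter or wrong field length; reported below
    raise ValueError(f"unsupported or malformed code: {code!r}")


print(parse_robinson("V-PAI-3S"))   # verb, present, active, indicative, third person, singular
print(parse_robinson("V-AAP-NSM"))  # verb, aorist, active, participle, nominative, singular, masculine
print(parse_robinson("N-GSF"))      # noun, genitive, singular, feminine
```

The only data kept in memory are a few small lookup tables, which is the trade-off described above: a small footprint in the library, but the logic has to be written once for SWORD and again for JSword.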