[sword-devel] new morphology

Sat Jan 26 17:06:45 MST 2008

We have a new Greek morphology in the pipeline to replace virtually  
all of our existing morphologies, and I would be interested to hear  
people's opinions or concerns, considering it does represent a certain  
amount of change from the current system.

Presently we have 3 morphologies in use:
1) Robinson is used in Bibles like Byz that come from Maurice Robinson.
2) Packard is used in one or two special circumstances (LXX maybe?)  
and has roughly a subset of the Robinson tag semantics with slightly  
different encoding practices.
3) TVM codes appear in a few Bibles, but their actual explanation is  
not offered anywhere since Larry Pierce claims a copyright. They are  
also a subset of the Robinson system's semantics with completely  
different encoding (4 digit codes that look like an extension to  
Strong's numbers).

In case it's not obvious from the above, the new morphology is based  
on Robinson's system. Our current Robinson module is based on the tags  
actually present in Byz. All of those tags were  
programmatically decoded into the existing plain English entries. The  
new morphology takes the Robinson system and generates every possible  
tag plus its plain English explication.
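
To give a feel for what "generates every possible tag" means, here is a rough sketch covering only plain noun tags of the form N-<case><number><gender>. The letter tables are a simplified subset of the Robinson scheme, and the function name build_noun_table is made up for illustration; this is not the actual generator.

```python
from itertools import product

# Simplified subset of Robinson letters (assumption: nouns only, no suffixes).
CASES = {"N": "Nominative", "G": "Genitive", "D": "Dative",
         "A": "Accusative", "V": "Vocative"}
NUMBERS = {"S": "Singular", "P": "Plural"}
GENDERS = {"M": "Masculine", "F": "Feminine", "N": "Neuter"}


def build_noun_table():
    """Return {tag: plain English description} for every noun combination."""
    table = {}
    for c, n, g in product(CASES, NUMBERS, GENDERS):
        tag = "N-" + c + n + g
        table[tag] = ", ".join(["Noun", CASES[c], NUMBERS[n], GENDERS[g]])
    return table


noun_table = build_noun_table()
print(len(noun_table), "noun tags generated")
print("N-NSM ->", noun_table["N-NSM"])
```

Even this small slice yields 30 entries from 5 x 2 x 3 choices; multiplying in the verb tenses, voices, moods, persons and so on is where the tens of thousands of entries come from.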

When the Packard and TVM codes found in various modules are converted  
to Robinson format, the new Robinson module should have complete  
coverage. In fact, it should have coverage of all current and future  
possible morphology codes using the Robinson system--tens of thousands  
more than will ever actually appear.

The result is that the existing 150k Robinson module would grow to 2M.  
Is that size increase reasonable?

We could simply generate a list of all codes currently appearing  
across all modules, which would probably result in a module of 300k or  
less. That would handle all current codes, but might require updates  
in the future (and we wouldn't know if updates were necessary without  
going through the whole collation process over again).
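
The collation step could look roughly like the following. The module names and tag lists are placeholder data, and collate_codes / missing_codes are hypothetical helpers, not anything that exists in the library today.

```python
def collate_codes(modules):
    """Union of every morphology code seen in any module.

    `modules` maps a module name to an iterable of the codes found in it.
    """
    seen = set()
    for codes in modules.values():
        seen.update(codes)
    return seen


def missing_codes(module_codes, lexicon_keys):
    """Codes used by a module that the morphology module does not define."""
    return sorted(set(module_codes) - set(lexicon_keys))


# Placeholder data, for illustration only.
modules = {
    "ModuleA": ["N-NSM", "V-PAI-3S", "N-NSM"],
    "ModuleB": ["N-GPF", "V-PAI-3S"],
}
lexicon = collate_codes(modules)

# A later module revision introduces a code we never collated.
updated = ["N-NSM", "V-AAI-3S"]
print(missing_codes(updated, lexicon))
```

The second function is the catch: the only way to learn that an update is needed is to rerun this kind of check against every module whenever one changes.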

We could push morphology code parsing into the library as a pseudo- 
module since it's not particularly difficult to parse the codes. That  
would result in the least size gain of all but would place the burden  
of re-implementation on JSword. It would likely be the fastest  
solution but would increase the library's memory footprint a little.
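
For comparison, decoding on demand might be sketched like this. It is in Python only for brevity (the library itself is C++), it handles just simple noun and finite verb tags, and the letter tables are a simplified subset of the Robinson scheme (no second-tense forms, deponent voices, participles or suffixes). The name describe is made up for the example.

```python
# Simplified subset of Robinson letters (assumption, not the full scheme).
CASES = {"N": "Nominative", "G": "Genitive", "D": "Dative",
         "A": "Accusative", "V": "Vocative"}
NUMBERS = {"S": "Singular", "P": "Plural"}
GENDERS = {"M": "Masculine", "F": "Feminine", "N": "Neuter"}
TENSES = {"P": "Present", "I": "Imperfect", "F": "Future",
          "A": "Aorist", "R": "Perfect", "L": "Pluperfect"}
VOICES = {"A": "Active", "M": "Middle", "P": "Passive"}
MOODS = {"I": "Indicative", "S": "Subjunctive",
         "O": "Optative", "M": "Imperative"}
PERSONS = {"1": "First Person", "2": "Second Person", "3": "Third Person"}


def describe(tag):
    """Decode a tag into plain English, or return None if unsupported."""
    parts = tag.split("-")
    try:
        # Noun: N-<case><number><gender>
        if parts[0] == "N" and len(parts) == 2 and len(parts[1]) == 3:
            c, n, g = parts[1]
            return ", ".join(["Noun", CASES[c], NUMBERS[n], GENDERS[g]])
        # Finite verb: V-<tense><voice><mood>-<person><number>
        if (parts[0] == "V" and len(parts) == 3
                and len(parts[1]) == 3 and len(parts[2]) == 2):
            t, v, m = parts[1]
            p, n = parts[2]
            return ", ".join(["Verb", TENSES[t], VOICES[v], MOODS[m],
                              PERSONS[p], NUMBERS[n]])
    except KeyError:
        # An unknown letter in an otherwise well-formed tag.
        return None
    return None


print(describe("N-NSM"))
print(describe("V-PAI-3S"))
```

Nothing is stored per tag here, only the small letter tables, which is why the on-disk cost is close to zero while the same logic would have to be written again for JSword.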

Thoughts?

--Chris