[osis-core] morph regex error

Chris Little osis-core@bibletechnologieswg.org
Mon, 08 Dec 2003 02:50:35 -0600


Troy A. Griffitts wrote:
> I think you are sorely incorrect about historical facts, but in regard 
> to the current schema:
> 
> You may argue that it SHOULD conform to osisIDRegex, but NOW is not the 
> time to argue that.
> 
> The PROBLEM I have is not this:
> 
> <xs:attribute name="morph" type="osisIDType" use="optional"/>
> 
> it's NOT defined that way in the schema.  If that's what you want, we 
> can talk/debate about changing it to the above at our next meeting.
> 
> The PROBLEM is that being defined correctly, like this:
> 
> <xs:attribute name="morph" type="osisGenType" use="optional"/>
> 
> (which is how it IS defined in the official schema)
> osisGenType (osisGenRegex) SHOULD NOT BE RESTRICTED TO THE SAME THING AS 
> osisIDType (osisIDRegex) or we wouldn't have 2 types.
> 
> 
>     Does that make sense?
> 
>             -Troy.
> 
> 
> PS. Even if changing it to osisIDType were being proposed (which I think 
> you've done), I still believe that a serious flaw exists in this proposal:

Really, that's not at all what I've "proposed" or would propose.  I 
simply believe that the morph attribute should hold a value that can serve 
as an osisID (or osisRef, for that matter).  That means its regex needs to 
match the same strings as the osisID regex, or a subset of them.  The 
current regex does, and I think it works well.

> There has to be a way programmatically to restore the encoding WITHOUT 
> the software knowing anything about the morph scheme or else we've 
> forced enumeration of the known morph schemes in software implementation.
> 
> e.g. You can't change 'N-[G]@5' to 'N__G_5'.  If we ever decide to force 
> morph to conform to osisIDType, then we MUST provide a programmatic way 
> to restore the original morph code, e.g. 'N%2D%5BG%5D%405', which I think 
> still looks horrible and is not acceptable to me, but at least would 
> allow me to remove the ambiguity and programmatically reconstruct the 
> original code.
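
To make the quoted point concrete, here is a minimal Python sketch of the 
two options: a reversible percent-style escape and a lossy underscore 
substitution.  The function names are hypothetical, and the "safe" 
character set (ASCII letters, digits, underscore) is an assumption made 
for illustration only; it is not the actual osisIDRegex from the schema.

```python
import re
from urllib.parse import unquote

# Assumption: characters allowed to pass through unescaped.
# This is NOT the real osisIDRegex, just a stand-in for the example.
SAFE = re.compile(r"[A-Za-z0-9_]")


def encode_morph(code: str) -> str:
    """Escape every character outside SAFE as %XX (uppercase hex, UTF-8)."""
    out = []
    for ch in code:
        if SAFE.fullmatch(ch):
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def decode_morph(encoded: str) -> str:
    """Reverse encode_morph without knowing anything about the morph scheme."""
    return unquote(encoded)


def underscore_morph(code: str) -> str:
    """Lossy alternative: replace every non-alphanumeric character with '_'."""
    return re.sub(r"[^A-Za-z0-9]", "_", code)


if __name__ == "__main__":
    original = "N-[G]@5"
    escaped = encode_morph(original)
    print(escaped)                  # N%2D%5BG%5D%405
    print(decode_morph(escaped))    # N-[G]@5
    # Two different inputs collapse to the same underscore form:
    print(underscore_morph("N-[G]@5") == underscore_morph("N [G]-5"))  # True
```

The escaped form round-trips with no per-scheme knowledge, while the 
underscore form cannot tell 'N-[G]@5' from 'N [G]-5' once converted.  
Whether that loss matters depends on whether the punctuation carries 
meaning in a given tagging scheme.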

Does "N-[G]@5" actually exist as a morphological tag in some system or 
are you just making up examples that would be difficult to encode?  I 
just looked through about 5 systems for morphological tagging in 
BibleWorks (and know of 3 others used elsewhere for biblical languages) 
and none of them require anything other than space or hyphen.  In 
linguistics, you might find a period used in a morphological tag (but in 
those cases, you would never find a hyphen or a space--and finding a 
period would itself be rare).

In no system that I'm aware of is any semantic content held by a 
character other than letters and numbers.  For that reason, I consider it 
truly irrelevant how these are rendered.  Internally, I believe they 
should be represented by underscores.  How they are rendered is a matter 
of preference for stylesheet designers to determine.

--Chris