[osis-core] morph regex error

Troy A. Griffitts osis-core@bibletechnologieswg.org
Sun, 07 Dec 2003 23:43:00 -0700


Patrick,
	This is a serious restriction/change.  I specifically remember 
discussing this with you and we agreed that these tags should NOT be 
restricted to osisID-like syntax.

	Serious reasons:

	VERY REAL SCHEMES (probably the only ones that have ever been marked in 
OSIS) USE OFFENDING CHARACTERS.

	We have defined no escape character.

	Without an escape character, EVERY piece of software needs to magically 
KNOW the scheme in use in order to decode the transcoded values, instead of 
just mindlessly displaying them to the scholar (which is what should be 
allowed).  This is unreasonable.

	I have texts that I need to release with this morphological scheme NOW, 
not when 3.0 is released.

	This is NOT a change that should have been applied without everyone's 
consent.


	Not to be a jerk, but being the one that asked for this attribute, and 
being the only one using this attribute that I know of, I'm a little 
ticked that it was changed.


	-Troy.



Patrick Durusau wrote:
> Troy,
> 
> I think the regex is correct; no hyphens are allowed. This does not mean 
> that you should use a range in any of these, although that is possible. 
> It does allow these to be used as osisRefs so that they can refer to 
> other sources of information.
> 
> Perhaps we should revisit at the January OSIS meeting but I don't think 
> we will reach a different conclusion.
> 
> Hope you are having a great day!
> 
> Patrick
> 
> Troy A. Griffitts wrote:
> 
>> :)
>>
>> Unless I'm going senile-- which I've been suspecting for some time 
>> now-- I believe that the last discussion on this subject, before 
>> release of 2.0, concluded that lemma, xlit, gloss, and morph WOULD NOT 
>> be restricted by osisRef syntax.  We would make a separate complexType 
>> for them, which basically would allow: prefix:any_string
>>
>> I think I wanted to allow spaces (especially for gloss), and Patrick 
>> found real-world occurrences of other systems that used prohibited 
>> characters as well.
>>
>> So the conclusion was either:
>>
>> prefix:any_string
>>
>> or
>>
>> prefix:any string
>>
>> I think Steve may have made some push for replacing the 'space' but 
>> don't remember the conclusion on that one.
>>
>> But regardless, there are no spaces in my offending line that I quoted 
>> earlier, and yet I still get an error.
>>
>> If I have to remove the cobwebs to defend this again, I will try, but 
>> I think it's just an oversight in the .xsd.
>>
>>     -Troy.
>>
>>
>>
>>
>> Chris Little wrote:
>>
>>> Okay, okay.  No need to shout.  Don't kill the messenger.  Etc. :)
>>>
>>> The problem with changing the format is that we can no longer use 
>>> morph, lemma, etc. values as osisRefs.  As it stands, any of these 
>>> attributes could double as an osisRef/osisID.  So your lexicon, 
>>> organized by lemma, could have divisions with osisIDs that are the 
>>> same as their lemma values.  Likewise, if you organize the Robinson 
>>> morphology scheme as a sort of lexicon, you can look up entries and 
>>> tag them with osisIDs that are identical to your morph value.
>>>
>>> --Chris
>>>
>>> Troy A. Griffitts wrote:
>>>
>>>> NO!
>>>>
>>>>
>>>> Chris Little wrote:
>>>>
>>>>> Troy A. Griffitts wrote:
>>>>>
>>>>>> Hey guys.  It seems we may have messed up the regex on the morph 
>>>>>> attribute of <w>.
>>>>>>
>>>>>> Here's my line:
>>>>>>
>>>>>> <w xml:lang="grc" lemma="strongs:15" morph="robinsons:V-PAM-2P" 
>>>>>> xlit="la:agaqopoieite">GREEK UTF8 TEXT HERE</w>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Here's the MSV error output:
>>>>>>
>>>>>> Error at line:279, column:117 of 
>>>>>> file:///space/home/scribe/msv/./lexcounts
>>>>>>   attribute "morph" has a bad value: the value does not match the 
>>>>>> regular expression 
>>>>>> "((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)((((\p{L})|(\p{N})|_)+)(((\.(\p{L}|\p{N}|_)+)*))?))". 
>>>>>
>>>>> The value you give has never been valid.  Hyphens have never been 
>>>>> allowed in morph or lemma attributes (nor have spaces and various 
>>>>> other characters).  I think the decision we made before releasing 
>>>>> 2.0 was to force folks to transcode these as '_'.
>>>>>
>>>>> Does that work for you?
>>>>>
>>>>> --Chris
>>>>
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> osis-core mailing list
>>> osis-core@bibletechnologieswg.org
>>> http://www.bibletechnologieswg.org/mailman/listinfo/osis-core
>>
> 
>
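
For readers following the thread, the validation failure quoted above can be reproduced with a short sketch. This is not the schema's own tooling: the pattern below is a simplified transcription of the regular expression in the MSV error message, it assumes that Python's `\w` is an acceptable stand-in for `(\p{L}|\p{N}|_)`, and the names `OSIS_VALUE`, `is_valid`, and `transcode` are hypothetical.

```python
import re

# Simplified transcription of the pattern quoted in the MSV error.
# Assumption: Python's \w stands in for (\p{L}|\p{N}|_); the two are
# close but not guaranteed identical for every Unicode character.
OSIS_VALUE = re.compile(r"(\w+(\.\w)*:)(\w+(\.\w+)*)")


def is_valid(value: str) -> bool:
    """Return True if the whole value matches, as an XSD pattern requires."""
    return OSIS_VALUE.fullmatch(value) is not None


def transcode(value: str) -> str:
    """Hypothetical helper: replace disallowed characters after the prefix with '_'."""
    prefix, sep, body = value.partition(":")
    return prefix + sep + re.sub(r"[^\w.]", "_", body)


if __name__ == "__main__":
    # The three attribute values from the <w> element quoted in the thread.
    for candidate in ("strongs:15", "robinsons:V-PAM-2P", "la:agaqopoieite"):
        print(candidate, is_valid(candidate))
    print(transcode("robinsons:V-PAM-2P"))
```

The transcoding step is lossy: once hyphens become underscores, the value no longer records which separator the original scheme used, which is the concern raised in the thread about having no escape character.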