FW: [osis-core] character counting issue: proposed solution

Steve DeRose osis-core@bibletechnologieswg.org
Tue, 25 Jun 2002 12:11:39 -0400


At 06:27 AM -0400 06/21/02, Patrick Durusau wrote:
>Steve,
>
>Can you and Harry (and anyone else who has comments on this issue) 
>derive a consensus on the solution?

I think if we went with my suggestion from earlier, all we'd have to 
do is change 'char' to 'cp' in the schema (and put a "?" for the 
+length if it isn't there already), skipping (g) below. The rest goes 
in comments or other prose. I'd be more inclined to go for Harry's 
suggestion of using just strings, except that string comparison has 
some of the same problems as counting in general...

>
>>I'm inclined to suggest:
>>
>>a) change 'character' to 'code point' and explain that it's dumb.

-- by 'dumb', I meant that it only counts code points, so surrogates 
and other such stuff may not come out right (except via (b)).
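
As a rough illustration of why a plain code-point count can disagree 
with other ways of counting, here is a small Python sketch. The sample 
strings and function names are made up for this example and do not come 
from the schema.

```python
import unicodedata

# Hypothetical sample strings, chosen only to illustrate the point.
composed = "\u00e9"      # e-acute as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent
astral = "\U00010400"    # one code point outside the Basic Multilingual Plane


def count_code_points(text: str) -> int:
    # A Python str is a sequence of code points, so len() is the "dumb" count.
    return len(text)


def count_utf16_units(text: str) -> int:
    # Each UTF-16 code unit is two bytes; a surrogate pair is two units.
    return len(text.encode("utf-16-le")) // 2


print(count_code_points(composed), count_code_points(decomposed))
print(count_code_points(astral), count_utf16_units(astral))
# Normalizing to NFC makes the two spellings of e-acute compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The two spellings of e-acute look the same on screen but differ by one 
in a code-point count, and a system that counts UTF-16 code units sees 
two units where the code-point count sees one.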

>>
>>b) adopt Harry's method of looking forward upon finding mismatch.
>>
>>c) make the +length optional, and default it to the string length
>>
>>d) state that length 0 is a point selection before the nth char
>>
>>e) state that offsets start at 1 and can't be negative to count backwards.
>>
>>f) state what happens if the offset or length goes beyond the 
>>content of the referenced element we're counting in. Just copy the 
>>XPointer rules on this, I suppose (now, if I could only remember 
>>what they are...).
>>
>>g) perhaps? make offset optional in which case you get the string. eh.
>>
>>Does that cut a plausible compromise on well-defined counting vs. 
>>ease of implementation? Any boundary cases left unspecified?
>>
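
Rules (a) and (c) through (f) above might be sketched in Python as 
follows. This is only an illustration under assumptions: the name 
select_cp is hypothetical, (c) is read as "default to the rest of the 
content", which is just one possible reading of "the string length", 
and raising an error for (f) is a placeholder, since the XPointer rules 
are not restated here. Harry's look-forward method from (b) is not 
attempted.

```python
from typing import Optional


def select_cp(content: str, offset: int, length: Optional[int] = None) -> str:
    """Select code points by 1-based offset and optional length.

    Hypothetical helper for illustration; not part of any OSIS schema.
    """
    # (e) offsets start at 1 and cannot be negative.
    if offset < 1:
        raise ValueError("offset must be >= 1")
    start = offset - 1  # convert to Python's 0-based indexing
    # (f) assumption: an offset past the end of the content is an error.
    if start > len(content):
        raise ValueError("offset is beyond the content")
    # (c) assumption: a missing length means "through the end of the content".
    if length is None:
        length = len(content) - start
    if length < 0:
        raise ValueError("length must be >= 0")
    # (f) assumption: a selection running past the end is also an error.
    if start + length > len(content):
        raise ValueError("selection runs past the end of the content")
    # (a) Python strings index by code point, so this slice counts code points.
    # (d) length 0 gives an empty string: a point before the nth code point.
    return content[start:start + length]
```

With the content "In the beginning", select_cp(content, 4, 3) returns 
"the", select_cp(content, 4, 0) returns the empty string (a point just 
before the "t"), and select_cp(content, 8) returns "beginning".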

-- 

Steve DeRose -- http://www.stg.brown.edu/~sjd
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: sderose@speakeasy.net
Backup email: sderose@mac.com, sjd@stg.brown.edu