[osis-core] Lists in Attribute values: final call

Scribe osis-core@bibletechnologieswg.org
Mon, 20 Oct 2003 13:52:58 -0700 (MST)


While I conceded to making use of ' ' as a separator for a short time (~10 
hours), I believe I also would like to recant that concession.  There are 
just too many cases where a ' ' might occur in the data.

I have always thought choosing a ' ' for a list delimiter was a silly
thing.  I think the XML group will also feel the same in a future version
of the XML spec.

I don't like the idea of forcing USERS to modify data to meet the list 
requirement.  Patrick has a good point about user error (I realize not all 
documents will be hand edited by users and I'm sure that will be pointed 
out).  If I had to pick a ' ' or '|' to be least likely in the data, '|' 
for my money.  There are no tools that I know of that do anything useful 
with attribute lists based on spaces (I believe Patrick may have alluded 
to one in the message below).  It's easy for me to change (already done 
actually) my code to look for a '|' to separate my list.

I realize that following this logic might lead one to conclude that I 
should just as soon favour changing all lists to use '|', then.  Well, 
actually, I'd be fine with that.  Maybe it would speak loud enough to 
accelerate the change of the XML spec, or maybe I'm being arrogant again.  
I always get progress and the latter mixed up ;)

	-Troy.





On Mon, 20 Oct 2003, Patrick Durusau wrote:

> Greetings!
> 
> Well we have spilled a lot of ink, errr, electrons on this one!
> 
> The heart of the dispute seems to me to be how one declares and 
> treats lists in XML attribute values.
> 
> From an XML standpoint, it is really quite simple: if you want a list 
> in an attribute value, it is a space-delimited list, and that excludes 
> any values in the list that have spaces. End of discussion.
> 
> On the other hand, the no white space in the values is an arbitrary 
> limitation of XML lists, which may not conform to the data that we wish 
> to store in such lists.
> 
> Now the argument can be made (and has been made) that we can reform the 
> values that are to be placed in such lists (substitute underscores, 
> etc.) for the values as seen by a user entering the text.
> 
> The major problem with the reformation argument is that I type what I 
> am familiar with more accurately and consistently than I do when I try 
> to conform to an unfamiliar practice. Even when I know I should be using 
> an underscore or some other character, I will slip, and if the prefix is 
> optional, there is no XML error to alert me to the slip. (That is, if 
> pld:123 is valid and pld:123_567 is valid, pld:123 567 should not be, 
> yet it parses as a two-item list. I don't have a prefix on 567, and 
> actually there should not be one because I really meant pld:123_567.)
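
The slip described above can be sketched in Python. This is only an
illustration: the item shape (an optional "prefix:" followed by letters,
digits or underscores) and the names ITEM and invalid_items are
assumptions made for the example, not anything taken from the schema.

```python
import re

# Assumed item shape (not from any schema): an optional "prefix:" followed
# by letters, digits or underscores, with no spaces inside an item.
ITEM = re.compile(r"(?:\w+:)?\w+")

def invalid_items(value):
    """Split a whitespace-delimited list and return the items that fail ITEM."""
    return [item for item in value.split() if not ITEM.fullmatch(item)]

print(invalid_items("pld:123_567"))  # intended single value -> []
print(invalid_items("pld:123 567"))  # space typed by mistake -> [] as well
```

Both calls report no invalid items, so under these assumptions the
mistyped value is accepted as a two-item list and nothing flags it.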
> 
> Now, using that same example, I can also write a list as 
> "pld:123|pld:123 567" because I am not using the XML list mechanism and 
> can have spaces, so long as the separator does not otherwise appear in 
> the string.
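
A minimal Python sketch of the pipe-delimited reading, for comparison
with a whitespace split. The helper names split_pipe_list and
join_pipe_list are hypothetical; the only rule they encode is the one
stated above, that the separator must not appear inside an item.

```python
def split_pipe_list(value, sep="|"):
    """Split an attribute value on sep, keeping any spaces inside items."""
    return value.split(sep)

def join_pipe_list(items, sep="|"):
    """Join items with sep, refusing any item that contains the separator."""
    for item in items:
        if sep in item:
            raise ValueError(f"item {item!r} contains the separator {sep!r}")
    return sep.join(items)

value = "pld:123|pld:123 567"

print(split_pipe_list(value))  # ['pld:123', 'pld:123 567']
print(value.split())           # ['pld:123|pld:123', '567']
```

The second print shows what a whitespace split makes of the same string:
the value containing a space is cut in half.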
> 
> I can even validate that expression by requiring the "|" symbol between 
> the parts of the list, thus:
> 
> <xs:pattern 
> value="(((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))?):((\p{L}|\p{N}|\p{Zs}|_|\.|\-)*)?(\|(((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))?):((\p{L}|\p{N}|\p{Zs}|_|\.|\-)*)?)?"/>
> 
> Yeah, ugly isn't it?
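
For anyone who wants to try the idea outside a schema validator, here is
a rough Python approximation of that pattern, built up from named
pieces. It is a sketch under several assumptions: Python's re module has
no \p{...} classes, so \w stands in for \p{L}|\p{N}|_ and a plain space
stands in for \p{Zs}; the item is repeated with * rather than allowing a
single optional second item, which is a guess at the intent; and the
names PREFIX, VALUE, ITEM, LIST and is_valid_list are made up for the
example.

```python
import re

# Prefix: word characters, optionally followed by "." and one more character.
PREFIX = r"\w+(?:\.\w)?"
# Value: word characters, spaces, dots and hyphens; may be empty.
VALUE = r"[\w .\-]*"
# One list item is "prefix:value"; the colon is required.
ITEM = PREFIX + ":" + VALUE
# A list is one item followed by any number of "|item" parts.
LIST = re.compile(rf"{ITEM}(?:\|{ITEM})*")

def is_valid_list(value):
    """Return True if value is a pipe-delimited list of prefix:value items."""
    return LIST.fullmatch(value) is not None

print(is_valid_list("pld:123|pld:123 567"))  # True
print(is_valid_list("pld:123|567"))          # False: second item has no prefix
```

Naming the pieces does not change what is matched, but it makes the
structure (item, then separator plus item) easier to see than the
one-line form.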
> 
> The point of all this being that we are faced with two ways to handle 
> lists in attribute values:
> 
> 1. XML list (white space delimited)
> 
> 2. Delimited by some other separator (in the example, the pipe "|" sign)
> 
> Either way, the list must be processed by software to do more than find 
> that something is in the list. So the question is: Does it really make 
> any difference to an application whether it splits on the "|" or on 
> white space?
> 
> My sympathies are with the XML method but I do now know that there are 
> POS values (in modern Hebrew) that do have spaces.
> 
> Could take the path of saying that data has to be reformed to meet our 
> specifications but that introduces user error.
> 
> Where I am coming out on this is that I don't see the benefit of 
> following the whitespace protocol of the XML standard. Won't be 
> processed meaningfully by an XML parser anyway so I am not sure what that 
> gets us for these cases.
> 
> Note that I am aware of the uses of list where you have an enumerated 
> set of values to validate against an attribute value restriction, but so 
> far as I know, no one has proposed such a set for any of these 
> attributes. That would be a case for making it a list but I would be 
> real leery of saying that everyone had to use our names for their 
> linguistic categories.
> 
> Got to run, have to eat my snack and jump into a conference call on 
> OpenOffice.
> 
> Will try to make the rounds this afternoon so we can get back on schedule.
> 
> Hope everyone is in good health and spirits!
> 
> Patrick
> 
>