[osis-core] whitespace

Chris Little osis-core@bibletechnologieswg.org
Fri, 08 Aug 2003 15:59:02 -0700


Troy A. Griffitts wrote:

>> Realizing that I probably hold the minority position, ;-), I would 
>> recommend normalizing as part of the application (note not the XML 
>> parser), all the white space in your example to single spaces.
> 
> no, no; I know of at least one other that might agree with you.

That would be me.  Any run of contiguous whitespace should be equivalent 
to a single whitespace character of any type.
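
A minimal sketch of that rule as an application-level step (not something the XML parser does), in Python. The function name collapse_whitespace is hypothetical, and "whitespace" is assumed to mean only the four characters XML itself defines as whitespace (space, tab, CR, LF):

```python
import re

# The four characters the XML 1.0 specification treats as whitespace.
XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")

def collapse_whitespace(text):
    """Replace every run of XML whitespace with one space (hypothetical helper)."""
    return XML_WHITESPACE_RUN.sub(" ", text)

# A line break plus indentation inside element content becomes a single space.
print(collapse_whitespace("This  is\n\tan   entry."))
```

Leading and trailing whitespace is collapsed to one space rather than stripped here; whether to trim it is a separate decision.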

My best reason for saying that is that encoders who know HTML will 
already expect it to work that way.  Editors like XMLSpy also happily 
insert whitespace for pretty formatting (though they might quit doing 
that if xml:space="preserve" were assigned).
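
If an application also wanted to honor xml:space="preserve" while normalizing, one possible approach is sketched below in Python using the standard library's ElementTree. This is only an illustration under assumptions: the function name collapse_tree and the sample markup are hypothetical, and nothing here is prescribed by OSIS.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:space attribute under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")

def collapse_tree(elem, preserve=False):
    """Collapse whitespace runs in place, skipping content where
    xml:space="preserve" is in effect (hypothetical helper)."""
    mode = elem.get(XML_SPACE)
    if mode is not None:
        preserve = (mode == "preserve")
    if not preserve and elem.text:
        elem.text = XML_WHITESPACE_RUN.sub(" ", elem.text)
    for child in elem:
        collapse_tree(child, preserve)
        # A child's tail text is content of this element, so this
        # element's setting governs it, not the child's.
        if not preserve and child.tail:
            child.tail = XML_WHITESPACE_RUN.sub(" ", child.tail)

# Hypothetical sample: only the second <seg> asks for whitespace to be kept.
doc = ET.fromstring(
    '<div><seg>a   b</seg><seg xml:space="preserve">c   d</seg></div>'
)
collapse_tree(doc)
```

After the call, the first seg's text has its run of spaces reduced to one, while the second seg's text is left untouched.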

I think Troy's example should reduce to:
<seg osisID="entry">This is an entry. I was just going to make 2 points:
o this is point 1 o and this is point 2</seg>
and that the person who encoded this should be chided, harshly.  Adding 
linebreak elements is simple and retains most of the important 
formatting of this.

All that said, I also foresee that there ARE a very few instances where 
contiguous whitespace itself needs to be encoded.  Stylistic conventions 
like double spacing between sentences are one.  Another might be encoding 
some kind of manuscript or document facsimile where multiple spaces are 
interpreted to exist within the original.

Adding an &nbsp; entity seems like a pretty painless and helpful 
shortcut to add (since people could already use 0xA0), but might send 
the wrong message by encouraging presentation formatting.  Adding an 
element like HTML's <pre> would be another (extremely unpleasant, in my 
opinion) possibility.
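
To illustrate why 0xA0 already works as an escape hatch: the no-break space (U+00A0) is not one of XML's whitespace characters, so a normalizer that collapses only space, tab, CR and LF leaves it alone. A small Python sketch, with hypothetical names, assuming the normalizer is written that way:

```python
import re

# Only the four XML whitespace characters; U+00A0 is deliberately absent.
# Note that Python's \s and str.split() DO treat U+00A0 as whitespace,
# so an explicit character class is used instead.
XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")

def collapse_xml_whitespace(text):
    """Collapse runs of XML whitespace to one space (hypothetical helper)."""
    return XML_WHITESPACE_RUN.sub(" ", text)

# Two ordinary spaces between sentences collapse to one...
plain = collapse_xml_whitespace("One.  Two.")

# ...but a no-break space followed by a space survives as two characters.
with_nbsp = collapse_xml_whitespace("One.\u00a0 Two.")
```

Here plain ends up with a single space between the sentences, while with_nbsp keeps both the no-break space and the ordinary space.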

--Chris