[osis-core] <hi> types

Todd Tillinghast osis-core@bibletechnologieswg.org
Thu, 21 Aug 2003 10:57:06 -0600


Troy,

In the earlier post that I referred to yesterday, I suggested the use
of &xxx entities within the text.  I think that a combination of &xxx
entities, <lb>, and possibly a few other OSIS elements addresses the
need for white space preservation and, as Patrick points out, will
likely provide more consistent results.

I think the Chinese case can be handled EITHER at presentation time or
by the use of &xxx.

I think tabs should not be preserved.  A tab carries a meaning that
should be encoded as either an <l> in an <lg>, an <item> in a <list>, or
as part of a <p> or <div> that has a specific characteristic.

In most cases I think line breaks should be encoded as the start and
end of an element rather than as <lb/>.
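
As a rough illustration of encoding line breaks as element boundaries,
here is a minimal Python sketch that turns the lines of a plain-text
passage into <l> elements inside an <lg>.  The function name
lines_to_lg is hypothetical, and the output leaves out the OSIS
namespace and everything else a valid document would need.

```python
import xml.etree.ElementTree as ET


def lines_to_lg(text):
    """Wrap each non-empty line of `text` in <l>, all inside one <lg>.

    Hypothetical helper for illustration only; no OSIS namespace or
    schema validation is attempted.
    """
    lg = ET.Element("lg")
    for line in text.splitlines():
        # Collapse runs of spaces and tabs so they carry no meaning.
        content = " ".join(line.split())
        if content:
            ET.SubElement(lg, "l").text = content
    return ET.tostring(lg, encoding="unicode")


print(lines_to_lg("The Lord is my shepherd;\n\tI shall not want."))
```

The leading tab on the second line is dropped; the line boundary itself
survives only as the boundary between two <l> elements.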

I too am interested in having a large volume of texts marked up and made
available, but I think we will hinder the longer-term usefulness and
adoption if we allow too much non-markup to carry meaning within
PCData.

Is it possible to encode text by breaking the text up into general
sections with osisID attributes but NOT including characters within the
PCData that carry presentational or ideological meaning?

This would make the texts available without requiring hand editing.
Then later as people find a text valuable they could go back and add in
additional markup to the texts that warrant the effort.  But things like
lists that could not automatically be identified as a list and encoded
as <lg>/<l> or <list>/<item> would simply flow in the text without a
line break or indentation.  
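
A minimal sketch of that approach in Python, assuming the text has
already been split into (osisID, raw text) pairs by some other means.
The function name sections_to_divs and the osisID values are made up
for illustration, and the output is not a complete or valid OSIS
document.

```python
import xml.etree.ElementTree as ET


def sections_to_divs(sections):
    """Build one <div osisID="..."> per (osis_id, raw_text) pair.

    Hypothetical helper: whitespace in raw_text is collapsed to single
    spaces, so line breaks, tabs, and double spaces are deliberately
    lost rather than carried as meaning in the character data.
    """
    root = ET.Element("div")
    for osis_id, raw_text in sections:
        div = ET.SubElement(root, "div", {"osisID": osis_id})
        div.text = " ".join(raw_text.split())
    return ET.tostring(root, encoding="unicode")


print(sections_to_divs([("Work.1", "Bring:\n\tbread\n\twine")]))
```

The tab-indented list in the sample input comes out as running text,
"Bring: bread wine", which is the trade-off described above.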

What are your thoughts?

Todd






> -----Original Message-----
> From: osis-core-admin@bibletechnologieswg.org [mailto:osis-core-
> admin@bibletechnologieswg.org] On Behalf Of Troy A. Griffitts
> Sent: Thursday, August 21, 2003 12:39 AM
> To: osis-core@bibletechnologieswg.org
> Subject: Re: [osis-core] <hi> types
> 
> Todd,
> 	How would YOU suggest we force people to markup 2 spaces between
> sentences?
> 	2 spaces between STATE and ZIP in an address?
> 	Extra spaces before GOD in Chinese?
> 	Preserve TABs?
> 	Preserve NewLines?
> 
> 	How would YOU suggest we allow large amounts of data, like I have
> suggested WON'T make it into OSIS and Harry seems to think the same, if
> we FORCE the people marking up text to add all these in by hand?
> BETWEEN EVERY SENTENCE (Whatever you propose, as we don't even have a
> &nbsp; right now).
> 
> 	Wouldn't it be nice to take a LARGE volume of texts that aren't worth
> spending the time to markup in detail, tack the
> xml:whitespace="preserve" tag to the top, break it up into general
> sections with osisID attributes and be done with it?
> 
> 	-Troy.
> 
> 
> 
> Todd Tillinghast wrote:
> > Troy,
> >
> > I think <hi> and xml:whitespace fall into two different categories.  I
> > think the discussion to date points away from the need for
> > xml:whitespace.
> >
> > Todd
> >
> >
> >>-----Original Message-----
> >>From: osis-core-admin@bibletechnologieswg.org [mailto:osis-core-
> >>admin@bibletechnologieswg.org] On Behalf Of Troy A. Griffitts
> >>Sent: Wednesday, August 20, 2003 3:32 PM
> >>To: osis-core@bibletechnologieswg.org
> >>Subject: Re: [osis-core] <hi> types
> >>
> >>So does that mean we intend to honor the xml:whitespace="preserve"
> >>attribute suggested by W3C?
> >>
> >>Patrick Durusau wrote:
> >>
> >>>Harry,
> >>>
> >>>Harry Plantinga wrote:
> >>>
> >>>
> >>>>>I am concerned that encoders would use the presentation related
> >>>>>elements RATHER THAN other elements.  (Ex <hi
> >>>>>type='smallCaps'>Lord</hi> rather than <divineName
> >>>>>type='yhwh'>Lord</divineName>, etc...)
> >>>>>
> >>>>>I do see a need for <hi> in non-Biblical texts.  If as Chris suggests
> >>>>>we use <hi> to encode meaning and not presentation we will be better
> >>>>>off. I would like to stay away from type values of bold, italics,
> >>>>>etc... in favor of strongEmphasis, emphasis, etc...  I don't have a
> >>>>>good suggestion for a comprehensive set of type values.
> >>>>
> >>>>
> >>>>
> >>>>I've seen this debate many times before and usually it is not
> >>>>settled to everyone's satisfaction. However, it is clear that
> >>>>there are times when italics, bold, etc. will be present in a text and
> >>>>will not be representable in any OSIS markup apart
> >>>>from something like <hi type="bold">.
> >>>>
> >>>
> >>>Say its not so, Harry! ;-)
> >>>
> >>>
> >>>>It is also clear to me that 95% of the time encoders are going
> >>>>to be unwilling to go through an old book and figure out
> >>>>what each instance of italicized text means when there is
> >>>><hi type="italics"> available that meets 95% of people's usage
> >>>>needs.
> >>>>
> >>>>That is, everyone has a threshold at which they say "I just
> >>>>mean italics, darnit!" but if italics is an available markup
> >>>>option, it'll be used much more than some will find desirable.
> >>>>
> >>>>But if there is no way of marking some text as 'italics', OSIS will
> >>>>not be usable for quick-and-dirty conversion of
> >>>>texts from one markup to another -- only for very laborious,
> >>>>hand-tuned markup. If that's what you want, go for it!
> >>>>
> >>>
> >>>I think Harry has the right of it, reluctantly, but I do. Getting large
> >>>amounts of texts into some semblance of reasonable markup is difficult
> >>>enough without insisting on practices that most encoders either aren't
> >>>capable of following or won't. At best the material is unmarked
> >>>altogether, at worst they don't use the markup system at all.
> >>>
> >>>I would go with Chris's suggestion of common names, such as italic,
> >>>bold, etc., (yea, verily, presentation language) rather than less
> >>>intuitive alternatives.
> >>>
> >>>Actually we could begin to build NLP software with knowledge bases of
> >>>terms, names, etc., that would allow some automated upgrading of less
> >>>complex encoding.
> >>>
> >>>Hope everyone is having a great day!
> >>>
> >>>Patrick
> >>>
> >>>
> >>>>-Harry
> >>>>
> >>>>_______________________________________________
> >>>>osis-core mailing list
> >>>>osis-core@bibletechnologieswg.org
> >>>>http://www.bibletechnologieswg.org/mailman/listinfo/osis-core
> >>>>
> >>>
> >>>