[sword-devel] What is markup?

Todd Tillinghast sword-devel@crosswire.org
Fri, 19 Mar 2004 09:25:31 -0700


Michael,

I am trying to understand why you think that putting quote marks "in the
text" rather than in an attribute makes the quote mark any more or less
a part of the "Bible text".

If I were to encode a Bible at the character level as follows:
<verse osisID="Gen.1.1"><c value='I'/><c value='n'/><space/><c
value='t'/><c value='h'/><c value='e'/>...</verse>

vs

<verse osisID="Gen.1.1">In the...</verse>

Are the characters "In the" any more or less a part of the encoding
either way?
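
To make the comparison concrete, here is a minimal sketch in Python using
the standard library's xml.etree.ElementTree.  The decode() helper is
hypothetical and the verse is cut down to just "In the" so that both forms
are complete XML; it only illustrates that the two encodings carry the same
characters.

```python
import xml.etree.ElementTree as ET

# Both samples are truncated to "In the" for illustration only.
CHAR_LEVEL = (
    '<verse osisID="Gen.1.1">'
    "<c value='I'/><c value='n'/><space/>"
    "<c value='t'/><c value='h'/><c value='e'/>"
    "</verse>"
)
PLAIN = '<verse osisID="Gen.1.1">In the</verse>'


def decode(xml_text):
    """Recover the verse's characters from either encoding (hypothetical helper)."""
    verse = ET.fromstring(xml_text)
    parts = [verse.text or ""]
    for child in verse:
        if child.tag == "c":
            # Character stored in an attribute.
            parts.append(child.get("value", ""))
        elif child.tag == "space":
            parts.append(" ")
        # Any literal text that follows the child element.
        parts.append(child.tail or "")
    return "".join(parts)


print(decode(CHAR_LEVEL) == decode(PLAIN))
```

Either way a consumer ends up with the same string; the only thing that
differs is where the parser has to look for it.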

By using XML you MUST use entities for some characters (<, &, ...).
These are not plain text but rather placeholders for those characters.
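
A small sketch of that point, again in Python with only the standard
library (xml.sax.saxutils.escape and ElementTree).  The sample string is
made up for illustration: what sits in the file is the entity, yet what the
parser hands back is the character.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "a < b & c"                      # made-up sample text
encoded = escape(raw)                  # what actually sits in the XML file
parsed = ET.fromstring("<verse>" + encoded + "</verse>").text

print(encoded)
print(parsed == raw)
```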

Most encoders are satisfied to logically represent the start and end
quote marks with the <q> element itself and let the rendering process
choose the glyph to be rendered.  The point you raise is that there are
cases where this is not sufficient, because not all the information the
translator intended can be represented with this simpler model.

What I suggested with the use of the "n" attribute was that rather than
simply encoding a <q> element that records the start and end of a quote
(and leaving the character to render up to the rendering process), we
could also give the encoder the option to specify that a specific
character should be used.
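
As a rough sketch of how a renderer might honor that option (this is not
how any existing SWORD filter works; the render() function, the default
glyphs, and the sample verse are all assumptions made for illustration):

```python
import xml.etree.ElementTree as ET

# Assumed renderer defaults: curly double quotes.
DEFAULT_OPEN, DEFAULT_CLOSE = "\u201c", "\u201d"


def render(xml_text):
    """Render milestone <q> elements, preferring the encoder's "n" value."""
    root = ET.fromstring(xml_text)
    out = [root.text or ""]
    for el in root:
        if el.tag == "q":
            is_start = el.get("sID") is not None
            default = DEFAULT_OPEN if is_start else DEFAULT_CLOSE
            # If "n" is present (even as an empty string) it wins;
            # otherwise the choice is left to the renderer.
            out.append(el.get("n", default))
        out.append(el.tail or "")
    return "".join(out)


left_to_renderer = '<verse>He said, <q sID="q1"/>Go.<q eID="q1"/></verse>'
set_by_encoder = (
    '<verse>He said, <q n="&#xAB;" sID="q1"/>Go.<q n="&#xBB;" eID="q1"/></verse>'
)

print(render(left_to_renderer))
print(render(set_by_encoder))
```

The first verse comes out with the renderer's default glyphs, the second
with the guillemets the encoder asked for.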

The thing that is troubling with <q n="" sID="uniqueID"/>"text text"<q
n="" eID="uniqueID"/> is that you have said that there is a quote that
has no punctuation to delimit it and that within that quote there is a
character ["] that is simply a character and DOES NOT carry the meaning
that a quote is starting or ending.  Instead there is a word ["text] at
the start of the quote and another word [text"] at the end of the
quote.
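
To show what a naive consumer would see, here is a minimal sketch in
Python.  The wrapping <verse> element and the quoted_words() helper are
assumptions added so the fragment parses; the markup inside is the example
above.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<verse><q n="" sID="uniqueID"/>"text text"'
    '<q n="" eID="uniqueID"/></verse>'
)


def quoted_words(xml_text):
    """Collect whitespace-separated words between <q> start and end milestones."""
    root = ET.fromstring(xml_text)
    words, inside = [], False
    for el in root:
        if el.tag == "q" and el.get("sID") is not None:
            inside = True
        elif el.tag == "q" and el.get("eID") is not None:
            inside = False
        if inside and el.tail:
            # The literal quote marks are ordinary characters here,
            # so they stay attached to the neighboring words.
            words.extend(el.tail.split())
    return words


print(quoted_words(SAMPLE))
```

The two words come back with the quote marks attached, which is exactly
the reading the markup asserts.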

Naturally the follow-on argument is that in [This is a whole sentence.]
[sentence.] is a whole word, in that the period is punctuation in the
same way that a quote mark is punctuation.  The difference is that you
use a <q> element to encode the quote.

Todd