[sword-devel] USFM end marker clarification
Kahunapule Michael Johnson
kahunapule at mpj.cx
Tue Aug 14 11:44:49 MST 2012
On 08/13/2012 11:02 PM, Chris Little wrote:
> On 08/13/2012 11:09 PM, David Haslam wrote:
>> The relevant paragraph is,
>> "In USFM, character level markup can be nested (embedded) within a paragraph
>> element, or another character element, but (depending on the way in which
>> the markers are written) does not necessarily cancel out the previous
>> marker's attributes. Paratext (a UBS translation editor) is not capable of
>> rendering all of the display variations that would be implied due to
> It's not relevant to note- or cross-reference-internal markup. In those cases, the reference is explicit that "Paratext ... will interpret the presence of a new marker as an implicit closure of any preceding character level marker."
> I can't really conceive how such a dramatic change to USFM is considered acceptable. And the imprecise hedges like 'not necessarily' in the quoted paragraph are not terribly reassuring about USFM's reliability as an archival format. The interpretation of a document encoded in USFM prior to this change in the reference may have been altered by the change in the reference, without any action on the part of the encoder.
> FWIW, USX explicitly does not permit nesting of character level markup (<char>). One could encode everything with milestone elements instead of containers, but the fact the USX schema could easily allow <char> nesting but doesn't indicates to me that implicit nesting is not an intended interpretation.
Indeed, character style nesting is an issue because of (1) varying interpretations of the standard, at least initially, (2) the (false) assumption that it isn't needed in real texts, and (3) implementation choices made historically in Paratext.
Right now, I just disallow it, but some styles are truly orthogonal in the way they are normally presented, and nesting them would easily give better results in presentation. For example, a red-letter KJV... or better yet, a red-letter NASB, in which Words of Jesus (red), supplied/added text (italics), and OT quotes in the NT (small caps) might overlap at some points. Other cases include a PNG language that italicizes borrowed words, which may occur within a quote of Jesus. (Note: in cases where the separation of presentational and functional markup comes up against the requirements of translators who have spent decades of their blood, sweat, and tears translating a Bible, the translators win, one way or another. Sometimes new markup has to be created, or compromises made in the use of deprecated markup, to keep the customers happy.)
In the PNG case, there is no red letter markup, and in the KJV case, I stop \wj ...\wj* markup for \add ...\add* then restart it afterwards, yielding some ugly results on display. In the NASB case, I don't bother, because it isn't freely redistributable. Thus, I perpetuate the assumption that character style nesting isn't needed in real texts, but it is just an illusion. :-)
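As an illustration of the stop-and-restart workaround for \wj and \add, here is a minimal Python sketch. It is not the actual conversion code; the function name flatten_nested and the simplified marker pattern are assumptions made for this example, and it assumes the input is a short text fragment containing only well-formed character-level markers (no paragraph or note markers).

```python
import re

# Simplified pattern (an assumption for this sketch): an opening marker
# is "\name " and a closing marker is "\name*".
MARKER = re.compile(r"\\([a-z]+[0-9]*)(\*| )")


def flatten_nested(usfm: str) -> str:
    """Rewrite nested character styles as a flat sequence of spans.

    When a style opens inside another one, the enclosing style is closed
    first and reopened after the inner style ends.
    """
    out = []
    stack = []  # names of the styles that are logically open, outermost first
    pos = 0
    for match in MARKER.finditer(usfm):
        out.append(usfm[pos:match.start()])  # copy text before the marker
        pos = match.end()
        name, kind = match.group(1), match.group(2)
        if kind == " ":
            # Opening marker: suspend the enclosing style, if any.
            if stack:
                out.append(f"\\{stack[-1]}*")
            stack.append(name)
            out.append(f"\\{name} ")
        else:
            # Closing marker: end this style, then resume the enclosing one.
            out.append(f"\\{name}*")
            if stack and stack[-1] == name:
                stack.pop()
            if stack:
                out.append(f"\\{stack[-1]} ")
    out.append(usfm[pos:])  # copy any trailing text
    return "".join(out)


if __name__ == "__main__":
    print(flatten_nested(r"\wj Blessed \add are\add* the poor\wj*"))
```

In the flattened output the word inside \add ...\add* is no longer inside any \wj span, so it loses its red-letter styling. That is the kind of display compromise the workaround entails.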
The last I heard, however, future versions of Paratext will support character style nesting, and USX will follow suit in doing so.