[sword-devel] Sword support of indents and line breaks

John Austin gpl.programs.info at gmail.com
Sat Apr 13 03:33:16 MST 2013

On 04/13/2013 01:47 PM, Troy A. Griffitts wrote:
> John,
> I'm trying to sympathize with you, but I'm having a hard time. I still
> have no clue WHAT the translator is trying to convey to the reader with
> the indent. Can you explain?

Yes. Essentially, translators work very hard to make their texts both 
true to the source AND as comprehensible as possible to their readers. 
Ordinarily, at the beginning of the translation process, a draft of a 
new text will be printed and passed to a variety of people who know the 
language well (Christians, non-Christians, the less educated, the more 
educated, young, old, male, female). They will be asked to retell the 
particular Scripture segment in their own words, to answer some basic 
questions about the passage, and to read the passage out loud directly 
from the page. All this is done to gauge how well and how easily 
readers comprehend the written translation.

Such comprehension testing has demonstrated that indents are indeed 
important to achieving the best comprehension. For instance, in one 
major language of the region, it was discovered that when a quote 
margin (e.g. "Jesus said", or "said Jesus") is embedded in a paragraph, 
people would often trip over the reading. And when, furthermore, the 
quote margin was placed between quotes, breaking up the quotation 
(e.g. "Go forth and multiply", said Joe, "and also live long and 
prosper"), comprehension sometimes completely failed. The reader did 
not understand that the second quote continued the same speaker's 
words. It was found that the best comprehension was attained by 
starting with a newline followed by an indent, followed by the quote 
margin, then by a colon and another newline followed immediately by 
another indent (without a blank line in between), and then finally 
followed by the entire quote itself. This construct was determined, for 
this particular language, to give readers the best comprehension. This 
construct may be unique to this language, or it may not (turns out 
it's not). The point is that the indents and the line breaks (or lack 
thereof) are selected carefully and very intentionally, so as to 
provide the best and quickest comprehension.
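
To make the construct concrete, here is a minimal sketch in Python of the layout just described. The function name, its parameters, and the four-space indent width are my own assumptions for illustration only; nothing here is Sword, OSIS, or Paratext code.

```python
# Hypothetical illustration only: the names and the indent width are assumed.
INDENT = "    "  # assumed width; the translators' actual indent may differ


def layout_quote(quote_margin: str, quotation: str, indent: str = INDENT) -> str:
    """Lay out a quotation as: newline, indent, quote margin, colon,
    newline, indent (no blank line in between), then the entire quote."""
    return f"\n{indent}{quote_margin}:\n{indent}{quotation}"


if __name__ == "__main__":
    # The quote margin is moved out of the middle of the quotation,
    # so the quotation itself is printed unbroken.
    print(layout_quote(
        "Joe said",
        '"Go forth and multiply, and also live long and prosper"',
    ))
```

The newline and the indent are ordinary arguments to the function, just as the words are, which is how the translator treats them.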

The bottom line is this:
In every way that the words themselves are content, so are the indents 
and line breaks. All three components: the words themselves, the 
indents, and the line breaks are all deliberately chosen to achieve the 
best possible comprehension of the written text. None of these are added 
later for reasons of style, structure, or presentation. They are each 
added one at a time by a translator's hand to achieve maximum 
comprehension. Via the indents and line breaks the translator is trying 
to convey the meaning of Scripture in the most comprehensible way possible.

Translators are very good at this sort of thing, as Peter attests. But 
let me also share what translators are not so good at: Translators 
generally care nothing about style sheets, can't even install Paratext 
on their own computer, and will look at you glassy-eyed if you even 
suggest they change their markup for semantic reasons. True, they SHOULD 
care, but these kinds of people often just can't, and if any of you want 
to talk to our translators about this I'll be happy to put you in 
contact. I've tried and failed (my wife is a translator). My point here 
is this: translators see and work with the words, indents, and line 
breaks, they do not work with the markup! We as computer people should 
make some adjustments for the real world. It will be good for everybody. 
If we just let the publisher decide whether a bit of their content 
should be considered as a paragraph, or as an indent, then the text can 
be encoded true to the translator's wishes on a case by case basis. This 
will in turn improve comprehension of Sword texts in other languages. 
This will save all the translators' content for future generations. I 
think it would work out great.


> John Austin <gpl.programs.info at gmail.com> wrote:
>     On 04/13/2013 09:24 AM, Chris Little wrote:
>         On 4/12/2013 11:18 AM, John Austin wrote:
>             On 04/12/2013 07:45 PM, Chris Little wrote:
>                     I've worked with many, many SFM texts, and they often
>                     do not follow SFM rules or play nice in a variety of
>                     ways. All of this greatly complicates an already
>                     serious conversion from SFM to Sword. The proof may be
>                     in the pudding. Simple is sometimes better in the real
>                     world. Sure, IBT could recreate their modules using
>                     container elements, but that still would not provide
>                     the reliability or control enjoyed by the existing
>                     modules. I still don't see (beyond theory and arguable
>                     semantics) a good reason to deny "customers" a sound
>                     and working solution.
>                 As a rule, we don't do things incorrectly when we know that
>                 they are wrong beforehand. Indent milestones are arbitrary,
>                 ad hoc, bad engineering practice, and bad markup practice.
>                 Generating &nbsp;s as pretend paragraph indentation is bad
>                 (X)HTML and completely inflexible. (What happens when a
>                 content provider wants a half indent? A hanging indent?)
>                 The proposal is a big kludge. We should instead implement
>                 the correct method of generating indented and other
>                 paragraph types.
>             They work perfectly well. They validate against the OSIS schema.
>             They are good engineering practice because they solve a
>             difficult problem without negative effects of any kind. We can
>             argue about bad markup etc. but some grace should be given to an
>             approach that is proven and perfectly valid, which already
>             exists in practice, and which has solved a nagging real life
>             problem.
>         They don't work perfectly well.
>         In terms of representation, the milestones represent something that
>         isn't there and should instead be a property of something that
>         actually is there.
>         In terms of the formatted output (the (X)HTML), you're emitting
>         something extremely bad. You want indentation, which is a formatting
>         matter. To achieve your intended formatting you are corrupting the
>         character data stream by inserting NBSPs to cause a side effect:
>         horizontal spacing. If you want to change horizontal position, you
>         should do so through one of the established methods, not as a side
>         effect of inserting characters that have different semantics.
>         Consider this: When you copy & paste text from a front end or
>         webpage, should the indentation be copied as a bunch of NBSPs?
>         Hopefully you agree it should not. The NBSPs are noise that has been
>         inserted into the character stream. (If you try this on the PDF you
>         linked and the rendering by phpsword, you can see that they behave
>         differently when you copy text and paste it into a word processor or
>         text editor. That's because the PDF does formatting correctly using
>         PDF layout methods, but phpsword relies on a side effect.)
>     The issue is not how the indent is implemented by the engine. It is the
>     acceptance of these translator dictated elements as valid milestone
>     content in the OSIS file, and Sword recognizing and implementing these
>     indents as indents.
>             Actually, the line I copied above is the whole "paragraph"- it
>             is not a multi-line anything. See
>             http://ibt.org.ru/en/text.htm?m=UZVL&l=Ruth.1.15&g=0 for the
>             real location of this example. These two words are not a
>             paragraph in anyone's book, and to call this a paragraph, as you
>             insist that I must do to use Sword, is in my book: "arbitrary,
>             ad hoc, bad engineering practice, and bad markup practice", and
>             just wrong. Let publishers decide what it is and what it will
>             look like- users of Sword will all be glad!
>         Abstractly, it's multi-line. Some (most?) of these paragraphs are
>         multi-line. Even your two word example would be multi-line with a
>         sufficiently narrow column. These paragraphs break in exactly the
>         same way as other paragraphs.
>         I still can't see the argument for these not being paragraphs. I
>         would accept that they could be a different type of paragraph from
>         the type that starts at the start of a sentence, but they are
>         clearly paragraphs. Paragraphs with hanging indents are markedly
>         more different than these, but they're still paragraphs.
>     I still can't see the argument for requiring that everyone call these
>     questionable instances paragraphs, and require that they must always be
>     marked up as such. Why not give the publisher the option of calling it a
>     paragraph if they consider it a paragraph, or else calling it an indent
>     if they think it will be more correctly understood as an indent? For
>     instance, many people consider that a paragraph should be followed by a
>     blank line (between paragraphs). What if I desire that this indented
>     line in my translation should never have a blank line after it, and that
>     it is an actual indent which is the content I intend to add- in order to
>     make my text more understandable? Then I should be able to call it an
>     indent. I would be very correct in doing so. Future readers of my OSIS
>     file would also unambiguously understand my intentions as well.
>         --Chris
>         ------------------------------------------------------------------------
>         sword-devel mailing list: sword-devel at crosswire.org
>         http://www.crosswire.org/mailman/listinfo/sword-devel
>         Instructions to unsubscribe/change your settings at above page
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
