[sword-devel] Sword support of indents and line breaks

Troy A. Griffitts scribe at crosswire.org
Fri Apr 12 06:01:50 MST 2013


Dear John,

I certainly want to provide what is necessary to satisfy ministry needs.

Having said this, I want to be sure you understand why the push-back.

This statement is not well defined or reasonable:
"This is exactly how I want the formatting, everywhere, any time. Period."

Really?  You really want 5 indented spaces on a mobile device trying to 
show 2 parallel Bibles on a 3" wide screen?  You really want me to 
indent when the user is trying to show 4 Russian Bibles running 
horizontally and stacked vertically for comparison?  You want indentation 
when showing search results?  You want exactly x number of spaces 
indented when viewed in Go Bible on a 1.5" Nokia candybar phone which 
can barely get 3 words on one line?

I don't wish to sound cold to your request, but it is not realistic to 
offer promises to publishers that we cannot deliver:

"This is exactly how I want the formatting, everywhere, any time. Period."

In your application, maybe you can 'reliably' give them this, but the 
SWORD engine is not simply for frontends running on desktop-sized 
screens viewing text in a "Bible reading"-only mode.  We offer options 
to our users: Red Letters for Christ's Words, Footnotes, X-References, 
Section Titles, Display in "Paragraph" Mode or "Verse per Line" mode, 
and many more.  These are all options many of our frontends on different 
platforms give our users and then when they are turned on, each frontend 
chooses vastly different ways to show each of these things-- according 
to what they feel best suits the display size and occasion for the text 
display.

I'm sure a publisher who wants display to be "...exactly how I want the 
formatting, everywhere, any time. Period." for poetry would not be happy 
with any of these user configurable choices and how some frontends deem 
it best to display these.

I am certainly willing to offer a means for the translator to designate 
what they desire, when the screen is large enough and the user wishes 
formatting to be turned on, and the occasion is simply reading the text, 
and not showing search results, or parallel texts, or other modes of 
display.  Does the support that Chris mentions in his previous email not 
address the need for your example?
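
The conditions above can be sketched as a small decision function. This 
is only an illustration in Python, not SWORD code: DisplayContext, 
MIN_WIDTH_PX, honor_translator_layout and the mode names are all 
hypothetical, and the width threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


# Hypothetical description of where the text is being shown (not SWORD API).
@dataclass
class DisplayContext:
    screen_width_px: int       # usable width of the text pane
    formatting_enabled: bool   # user option, like Red Letters or Footnotes
    mode: str                  # "reading", "search" or "parallel"


MIN_WIDTH_PX = 480  # arbitrary threshold, chosen only for illustration


def honor_translator_layout(ctx: DisplayContext) -> bool:
    """Return True only when all three conditions hold at once."""
    return (
        ctx.formatting_enabled
        and ctx.mode == "reading"
        and ctx.screen_width_px >= MIN_WIDTH_PX
    )
```

A frontend would call something like this once per render and fall back 
to its own layout whenever it returns False.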

My last item to address before a new release of the engine is to deal 
with whitespace issues, which this falls squarely into.

Can the people involved in this discussion suggest the desired OSIS 
encoding for your example, and the desired XHTML output from this OSIS 
through the osisxhtml filters?  I will place this into our testsuite 
OSIS document and add a test to be sure we generate the desired output 
for you.
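
A minimal sketch of what such a test could look like, written in Python 
rather than against the real osisxhtml filter code. The <p type="..."> 
values are taken from Chris's usfm2osis.py list quoted below; the CSS 
class names, the P_CLASS table and the osis_to_xhtml function are 
assumptions made up for illustration, and the fragment is treated as 
namespace-free XML wrapped in a <div> for simplicity.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from OSIS <p type="..."> values to CSS classes.
# The keys appear in the thread; the class names are invented here.
P_CLASS = {
    None: "indent",            # bare <p>: indented by default
    "x-noindent": "noindent",
    "x-indented-1": "indent1",
    "x-indented-2": "indent2",
}


def osis_to_xhtml(osis_fragment: str) -> str:
    """Turn OSIS <p> types into classes and <lb/> into <br />."""
    root = ET.fromstring(f"<div>{osis_fragment}</div>")
    for el in root.iter():
        if el.tag == "p":
            # Unknown types fall back to the default indented class.
            css = P_CLASS.get(el.attrib.pop("type", None), "indent")
            el.set("class", css)
        elif el.tag == "lb":
            el.tag = "br"
    return ET.tostring(root, encoding="unicode")


# The kind of assertion a testsuite entry would make.
expected = '<div><p class="noindent">a<br />b</p></div>'
assert osis_to_xhtml('<p type="x-noindent">a<lb/>b</p>') == expected
```

The point is only the shape of the test: one agreed OSIS input, one 
agreed XHTML output, compared exactly.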

How does this sound?

Troy


On 04/12/2013 01:57 PM, John Austin wrote:
> You didn't address my main point: Content providers should be given a 
> way to have final control over how their formatted texts appear, and 
> one which is simple and reliable. I'll comment below, but a Bible 
> translation is not a web-page or an app which might need a new look 
> someday, or a new skin. CSS and content abstraction etc. are great 
> ideas, but they should not be artificially forced onto Bible 
> publishers. Yes, they should be offered, and even encouraged- fine. 
> But publishers should be able to say: "This is exactly how I want the 
> formatting, everywhere, any time. Period." I don't understand why this 
> expectation is so abhorrent. Offering a handful of content 
> abstractions and extensions, all of whose definitions are arguable 
> (see below) and likely in flux, is neither simple, nor satisfying to 
> content providers who desire control over the presentation of their 
> texts.
>
> I've worked with many, many SFM texts, and they often do not follow 
> SFM rules or play nice in a variety of ways. All of this greatly 
> complicates an already serious conversion from SFM to Sword. The proof 
> may be in the pudding. Simple is sometimes better in the real world. 
> Sure, IBT could recreate their modules using container elements, but 
> that still would not provide the reliability or control enjoyed by the 
> existing modules. I still don't see (beyond theory and arguable 
> semantics) a good reason to deny "customers" a sound and working 
> solution.
>
> On 04/12/2013 03:26 PM, Chris Little wrote:
>> Executive summary:
>> I don't have a problem with making it clear how to encode indented
>> paragraphs and line breaks and improving support for diverse paragraph
>> types.
>> I do have problems with the specific syntax and the rationale described
>> below.
>>
>> On 4/11/2013 11:04 PM, John Austin wrote:
>>> Sword should support basic indents and line breaks. Content providers
>>> should be able to control the formatting of their texts and should not
>>> be required to assign their content to artificial <p>...</p> or other
>>> containers to do so. Although these containers might be useful, the
>>> text of some translation styles cannot be fit nicely into them. But
>>> often content providers do rightly desire their texts to appear with
>>> formatting similar to their printed texts, since this is exactly what
>>> the translators deemed easiest to read and understand.
>>>
>>> People who convert texts to Sword are often not at liberty to change
>>> the source texts to do so, and source texts in strange languages come
>>> with many unexpected language constructs. For these reasons it is
>>> important that Sword tries to offer content providers a simple,
>>> reliable way of formatting their own texts, without requiring them to
>>> fit into Sword's container scheme to do so.
>>>
>>> IBT of Russia is already using simple osis <milestone
>>> type="x-p-indent" /> and <lb /> to achieve all their formatting needs
>>> for their Sword modules. Currently, only xulsword supports both of
>>> these. But perhaps they should both be included in Sword's osis2html
>>> filters so that all front-ends can support them. At least something
>>> very similar should be adopted, if there is a strong reason not to
>>> adopt IBT's well tested method.
>>
>> So, encoders should not have to assign content to 'artificial
>> <p>...</p>' but they should have to encode an artificial <milestone
>> type="x-p-indent"/>? They shouldn't assign content to the structure that
>> it clearly is (<p>...</p>), rather to an imagined indentation object?
> Something like "     Бўаз Рутга:" is not clearly a <p> even though 
> that is how it appears in SFM, and that is how it would appear in the 
> module according to your argument. For instance, if some front-end 
> designer thinks it is really neat for his front-end's paragraphs to 
> have drop-caps and so he modifies his CSS to add them to "paragraphs", 
> then my text is completely broken because, in fact, the above is NOT a 
> paragraph, in any language. It is, in fact, an indented line.
>
>>
>> There's not a location or an object that represents indentation.
>> Indentation is a property of paragraphs, so it should be marked on
>> paragraphs, as is our current practice.
> Indentation is a property of paragraphs- usually... but not always... 
> well, it depends... This is exactly why Sword also needs a simple 
> indent. One which is always an indent.
>
>>
>> Here's the list of paragraph types from the USFM reference along with
>> the paragraph type that usfm2osis.py will generate (in the form of a
>> Python dict): {'pc':'x-center', 'pr':'x-right', 'm':'x-noindent',
>> 'pmo':'x-embedded-opening', 'pm':'x-embedded',
>> 'pmc':'x-embedded-closing', 'pmr':'x-right', 'pi':'x-indented-1',
>> 'pi1':'x-indented-1', 'pi2':'x-indented-2', 'pi3':'x-indented-3',
>> 'pi4':'x-indented-4', 'pi5':'x-indented-5', 'mi':'x-noindent-indented',
>> 'nb':'x-nobreak', 'phi':'x-indented-hanging', 'ps':'x-nobreakNext',
>> 'psi':'x-nobreakNext-indented', 'p1':'x-level-1', 'p2':'x-level-2',
>> 'p3':'x-level-3', 'p4':'x-level-4', 'p5':'x-level-5'}.
>>
>> I believe that a bare <p> should, by default, be indented. The only case
>> where it shouldn't would be in a translation without any paragraphs,
>> which should have each verse start on a new line. I would argue that the
>> OSIS filters should be improved to translate these OSIS <p> types to
>> (X)HTML <p> classes or CSS or such. But we should not be supporting an
>> indentation milestone and generating &nbsp;s or something similar to
>> simulate indentation. (Nor should we translate indentation milestones to
>> (X)HTML <p> classes or CSS, if that's your implementation.)
> There is a demonstrated need for an indent, and a good implementation. 
> Where is the serious argument for why Sword should deny support for that?
>
>>
>> I presume you're already happy with the handling of <lb/>.
> Assuming they always render (when formatting is desired of course) as 
> basic line breaks, and NOT as blank lines (similar to <br> in html) 
> then yes.
>
>>
>>> Hard spaces and other such formatting are not acceptable solutions
>>> because they cannot be easily filtered. It is important that
>>> unformatted text can easily be obtained from formatted text since
>>> there are many uses for unformatted text, such as bookmark and
>>> cross-reference verse texts etc.
>>>
>>> Here is one example to show why forcing containers on a text is not a
>>> good idea. This is a section of SFM from the book of Ruth 1:8-12:
>>>
>>> \v 8 Йўлда давом этишаркан, Наима иккала келинига деди:
>>> \p — Боринглар, икковингиз ҳам оналарингизнинг уйларига қайтинглар.
>>> Менга ва марҳумларга бўлган иззат–ҳурматингиз учун Эгам сизларга ҳам
>>> марҳамат қилсин.
>>> \v 9 Икковларингизга ҳам яхши жойлардан ато қилсин, турмуш қуриб, ўз
>>> эрларингиз билан бахтли бўлинглар!
>>> \p Шундай деб Наима келинларини ўпди, иккаласи эса йиғлаб фарёд
>>> кўтаришди:
>>> \p
>>> \v 10 — Йўқ, биз сиз билан кетамиз, сизнинг халқингиз орасида яшаймиз,
>>> — дейишди.
>>> \v 11 Наима эса яна келинларига:
>>> \p — Қайтинглар, жон қизларим! — деди. — Мен билан кетганингиздан нима
>>> фойда?! Қорнимда яна ўғилларим бормидики сизларга умр йўлдоши бўлса?!*
>>> \v 12 Бўлди энди, қизларим, қайтинглар! Мен энди кексайдим, эрга
>>> тегишга ожизман. Борди–ю, мен, ҳали умид қилсам бўлади, деб шу кеча
>>> эрим билан қовушсаму ўғиллар туғсам,
>>>
>>> Here is a PDF of exactly what the translators designed this SFM to
>>> look like:
>>> http://ibt.org.ru/russian/bible/uzb/otcyr/08%20Rut%20-%20Uzbek%20Cyrillic.pdf 
>>>
>>>
>>>
>>> And here is what it looks like in Sword format using only basic osis
>>> indents and line breaks, rendered by xulsword's osis2html filter:
>>> http://ibt.org.ru/en/text.htm?m=UZV&l=Ruth.1.1.1&g=0. As you can see,
>>> the Sword module renders this strange (to us) formatting of text just
>>> like the translators wanted.
>>>
>>> However, now imagine trying to programmatically apply <p>...</p>,
>>> <l>...</l> etc. constructs to the above SFM to achieve the same
>>> effect. The designers of the SFM in this case are using the \p tag to
>>> represent a simple indent (not a paragraph) in order to achieve their
>>> desired non-Western layout. One might try and argue that the SFM
>>> designers have done something wrong, but the point is that we have
>>> what we have. So Sword should provide a simple way for content
>>> providers to control the formatting of their texts. Basic indents and
>>> line breaks do the trick for Central Asian languages, and probably
>>> many others as well. Poetry is even made easy, by putting indents in
>>> series as desired.
>>
>> I disagree. Those are paragraphs. I'm not sure why you would argue that
>> something which looks like a paragraph, acts like a paragraph, and is
>> encoded using paragraph markup is nevertheless not a paragraph. You can
>> achieve your desired typesetting by putting the paragraphs in <p>
>> elements and indenting them all. (Again, I would argue that all
>> paragraphs should be indented, except in unparagraphed translations.)
> Again, they are not paragraphs as most would understand them. Because 
> if they inherited any typical "paragraph" formatting, other than the 
> indent, they would render completely wrong. The fact that there is 
> serious discussion about whether they are paragraphs or not makes the 
> importance of point #1 clear as day: The content provider needs a 
> simple way to have control over their formatting. Now, forever, period.
>
>>
>> My only guess is that you don't believe paragraph breaks can occur
>> within sentences, but evidently they can.
>>
>> There's already defined syntax for poetry formatting using the level
>> attribute.
>>
>> --Chris
>>
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page



