[sword-devel] Sword support of indents and line breaks

John Austin gpl.programs.info at gmail.com
Fri Apr 12 04:57:51 MST 2013

You didn't address my main point: Content providers should be given a 
way to have final control over how their formatted texts appear, and one 
which is simple and reliable. I'll comment below, but a Bible 
translation is not a web page or an app which might need a new look 
someday, or a new skin. CSS and content abstraction etc. are great 
ideas, but they should not be artificially forced onto Bible publishers. 
Yes, they should be offered, and even encouraged; fine. But publishers 
should be able to say: "This is exactly how I want the formatting, 
everywhere, any time. Period." I don't understand why this expectation 
is so abhorrent. Offering a handful of content abstractions and 
extensions, all of whose definitions are arguable (see below) and likely 
in flux, is neither simple, nor satisfying to content providers who 
desire control over the presentation of their texts.

I've worked with many, many SFM texts, and they often do not follow SFM 
rules or play nice in a variety of ways. All of this greatly complicates 
an already difficult conversion from SFM to Sword. The proof may be in 
the pudding. Simple is sometimes better in the real world. Sure, IBT 
could recreate their modules using container elements, but that still 
would not provide the reliability or control enjoyed by the existing 
modules. I still don't see (beyond theory and arguable semantics) a good 
reason to deny "customers" a sound and working solution.

On 04/12/2013 03:26 PM, Chris Little wrote:
> Executive summary:
> I don't have a problem with making it clear how to encode indented
> paragraphs and line breaks and improving support for diverse paragraph
> types.
> I do have problems with the specific syntax and the rationale described
> below.
> On 4/11/2013 11:04 PM, John Austin wrote:
>> Sword should support basic indents and line breaks. Content providers
>> should be able to control the formatting of their texts and should not
>> be required to assign their content to artificial <p>...</p> or other
>> containers to do so. Although these containers might be useful, the
>> text of some translation styles cannot be fit nicely into them. But
>> often content providers do rightly desire their texts to appear with
>> formatting similar to their printed texts, since this is exactly what
>> the translators deemed easiest to read and understand.
>> People who convert texts to Sword are often not at liberty to change
>> the source texts to do so, and source texts in strange languages come
>> with many unexpected language constructs. For these reasons it is
>> important that Sword tries to offer content providers a simple,
>> reliable way of formatting their own texts, without requiring them to
>> fit into Sword's container scheme to do so.
>> IBT of Russia is already using simple OSIS <milestone
>> type="x-p-indent" /> and <lb /> to achieve all their formatting needs
>> for their Sword modules. Currently, only xulsword supports both of
>> these. But perhaps they should both be included in Sword's osis2html
>> filters so that all front-ends can support them. At least something
>> very similar should be adopted, if there is a strong reason not to
>> adopt IBT's well tested method.
> So, encoders should not have to assign content to 'artificial
> <p>...</p>' but they should have to encode an artificial <milestone
> type="x-p-indent"/>? They shouldn't assign content to the structure that
> it clearly is (<p>...</p>), rather to an imagined indentation object?
Something like "     Бўаз Рутга:" (roughly, "Boaz to Ruth:") is not 
clearly a <p>, even though that is how it appears in SFM, and that is how 
it would appear in the module according to your argument. For instance, 
if some front-end designer thinks it is really neat for his front-end's 
paragraphs to have drop caps, and so modifies his CSS to add them to 
"paragraphs", then my text is completely broken because, in fact, the 
above is NOT a paragraph, in any language. It is, in fact, an indented line.
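To make the IBT approach concrete, here is a minimal sketch of how an osis2html-style filter might render `<milestone type="x-p-indent"/>` and `<lb/>`. It is not xulsword's or Sword's actual filter code; the `indent` class name and the regex-based approach are assumptions for illustration only.

```python
import re

# Hypothetical HTML for one indent; a front-end would size it with CSS.
INDENT_HTML = '<span class="indent"></span>'

MILESTONE_INDENT = re.compile(r'<milestone\s+type="x-p-indent"\s*/>')
LINE_BREAK = re.compile(r'<lb\s*/>')

def osis_to_html(osis: str) -> str:
    """Render IBT-style indent milestones and line breaks as HTML."""
    html = MILESTONE_INDENT.sub(INDENT_HTML, osis)
    # One <br/> per <lb/>: a plain line break, never a blank line.
    return LINE_BREAK.sub('<br/>', html)

print(osis_to_html('<lb/><milestone type="x-p-indent"/>Бўаз Рутга:'))
```

Because the indent becomes an inline element rather than a new `<p>`, styling that a front-end attaches to paragraphs is not triggered by the indent itself.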

> There's not a location or an object that represents indentation.
> Indentation is a property of paragraphs, so it should be marked on
> paragraphs, as is our current practice.
Indentation is a property of paragraphs... usually... but not always... 
well, it depends... This is exactly why Sword also needs a simple 
indent. One which is always an indent.

> Here's the list of paragraph types from the USFM reference along with
> the paragraph type that usfm2osis.py will generate (in the form of a
> Python dict): {'pc':'x-center', 'pr':'x-right', 'm':'x-noindent',
> 'pmo':'x-embedded-opening', 'pm':'x-embedded',
> 'pmc':'x-embedded-closing', 'pmr':'x-right', 'pi':'x-indented-1',
> 'pi1':'x-indented-1', 'pi2':'x-indented-2', 'pi3':'x-indented-3',
> 'pi4':'x-indented-4', 'pi5':'x-indented-5', 'mi':'x-noindent-indented',
> 'nb':'x-nobreak', 'phi':'x-indented-hanging', 'ps':'x-nobreakNext',
> 'psi':'x-nobreakNext-indented', 'p1':'x-level-1', 'p2':'x-level-2',
> 'p3':'x-level-3', 'p4':'x-level-4', 'p5':'x-level-5'}.
> I believe that a bare <p> should, by default, be indented. The only case
> where it shouldn't would be in a translation without any paragraphs,
> which should have each verse start on a new line. I would argue that the
> OSIS filters should be improved to translate these OSIS <p> types to
> (X)HTML <p> classes or CSS or such. But we should not be supporting an
> indentation milestone and generating &nbsp;s or something similar to
> simulate indentation. (Nor should we translate indentation milestones to
> (X)HTML <p> classes or CSS, if that's your implementation.)
There is a demonstrated need for an indent, and a good implementation. 
Where is the serious argument for why Sword should deny support for that?
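Chris's proposal (map USFM paragraph markers to typed OSIS `<p>` elements, then have the OSIS filters turn those types into (X)HTML classes) can be sketched as follows. The table is the usfm2osis.py mapping he quotes; the two helper functions and the `indented` default class are hypothetical, not part of usfm2osis.py or Sword.

```python
from typing import Optional

# USFM paragraph marker -> OSIS <p type="..."> value, as quoted from usfm2osis.py.
USFM_PARA_TYPES = {
    'pc': 'x-center', 'pr': 'x-right', 'm': 'x-noindent',
    'pmo': 'x-embedded-opening', 'pm': 'x-embedded',
    'pmc': 'x-embedded-closing', 'pmr': 'x-right', 'pi': 'x-indented-1',
    'pi1': 'x-indented-1', 'pi2': 'x-indented-2', 'pi3': 'x-indented-3',
    'pi4': 'x-indented-4', 'pi5': 'x-indented-5', 'mi': 'x-noindent-indented',
    'nb': 'x-nobreak', 'phi': 'x-indented-hanging', 'ps': 'x-nobreakNext',
    'psi': 'x-nobreakNext-indented', 'p1': 'x-level-1', 'p2': 'x-level-2',
    'p3': 'x-level-3', 'p4': 'x-level-4', 'p5': 'x-level-5',
}

# Hypothetical CSS class for an untyped <p>, reflecting the view that a
# bare <p> should be indented by default.
DEFAULT_P_CLASS = 'indented'

def osis_p_start(usfm_marker: str) -> str:
    """Opening OSIS <p> tag for a USFM marker; unknown markers (e.g. bare p) get no type."""
    osis_type = USFM_PARA_TYPES.get(usfm_marker)
    return f'<p type="{osis_type}">' if osis_type else '<p>'

def html_p_start(osis_type: Optional[str] = None) -> str:
    """Opening (X)HTML <p> tag that carries the OSIS paragraph type as a CSS class."""
    return f'<p class="{osis_type or DEFAULT_P_CLASS}">'
```

For example, `osis_p_start('pi2')` gives `<p type="x-indented-2">`, and `html_p_start()` gives `<p class="indented">` for a bare paragraph.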

> I presume you're already happy with the handling of <lb/>.
Assuming they always render (when formatting is desired, of course) as 
basic line breaks, similar to <br> in HTML, and NOT as blank lines, then yes.

>> Hard spaces and other such formatting are not acceptable solutions
>> because they cannot be easily filtered. It is important that
>> unformatted text can easily be obtained from formatted text since
>> there are many uses for unformatted text, such as bookmark and
>> cross-reference verse texts etc.
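The filtering argument can be illustrated with a naive tag stripper. This is a sketch under stated assumptions: the regex assumes no `>` inside attribute values, `plain_text` is a hypothetical name, and a real filter would use an XML parser. Markup-based indents disappear cleanly, while hard-space indents survive as text.

```python
import re

# Naive tag matcher: assumes no '>' appears inside attribute values.
TAG = re.compile(r'<[^>]+>')
NBSP = '\u00a0'

def plain_text(osis: str) -> str:
    """Strip all markup, leaving only the verse text."""
    return TAG.sub('', osis)

marked_up = '<milestone type="x-p-indent"/>Шундай деб Наима'
hard_spaced = NBSP * 4 + 'Шундай деб Наима'

print(repr(plain_text(marked_up)))    # 'Шундай деб Наима'
print(repr(plain_text(hard_spaced)))  # '\xa0\xa0\xa0\xa0Шундай деб Наима'
```

Removing the leftover hard spaces would need a heuristic, since a no-break space can also be legitimate inside the text.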
>> Here is one example to show why forcing containers on a text is not a
>> good idea. This is a section of SFM from the book of Ruth 1:8-12:
>> \v 8 Йўлда давом этишаркан, Наима иккала келинига деди:
>> \p — Боринглар, икковингиз ҳам оналарингизнинг уйларига қайтинглар.
>> Менга ва марҳумларга бўлган иззат–ҳурматингиз учун Эгам сизларга ҳам
>> марҳамат қилсин.
>> \v 9 Икковларингизга ҳам яхши жойлардан ато қилсин, турмуш қуриб, ўз
>> эрларингиз билан бахтли бўлинглар!
>> \p Шундай деб Наима келинларини ўпди, иккаласи эса йиғлаб фарёд
>> кўтаришди:
>> \p
>> \v 10 — Йўқ, биз сиз билан кетамиз, сизнинг халқингиз орасида яшаймиз,
>> — дейишди.
>> \v 11 Наима эса яна келинларига:
>> \p — Қайтинглар, жон қизларим! — деди. — Мен билан кетганингиздан нима
>> фойда?! Қорнимда яна ўғилларим бормидики сизларга умр йўлдоши бўлса?!*
>> \v 12 Бўлди энди, қизларим, қайтинглар! Мен энди кексайдим, эрга
>> тегишга ожизман. Борди–ю, мен, ҳали умид қилсам бўлади, деб шу кеча
>> эрим билан қовушсаму ўғиллар туғсам,
>> Here is a PDF of exactly what the translators designed this SFM to
>> look like:
>> http://ibt.org.ru/russian/bible/uzb/otcyr/08%20Rut%20-%20Uzbek%20Cyrillic.pdf
>> And here is what it looks like in Sword format using only basic OSIS
>> indents and line breaks, rendered by xulsword's osis2html filter:
>> http://ibt.org.ru/en/text.htm?m=UZV&l=Ruth.1.1.1&g=0. As you can see,
>> the Sword module renders this strange (to us) formatting of text just
>> like the translators wanted.
>> However, now imagine trying to programmatically apply <p>...</p>,
>> <l>...</l> etc. constructs to the above SFM to achieve the same
>> effect. The designers of the SFM in this case are using the \p tag to
>> represent a simple indent (not a paragraph) in order to achieve their
>> desired non-Western layout. One might try to argue that the SFM
>> designers have done something wrong, but the point is that we have
>> what we have. So Sword should provide a simple way for content
>> providers to control the formatting of their texts. Basic indents and
>> line breaks do the trick for Central Asian languages, and probably
>> many others as well. Poetry is even made easy by putting indents in
>> series as desired.
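The SFM above shows why the indent approach converts mechanically: each `\p` can be replaced in place, even mid-verse. Below is a minimal sketch, assuming only `\v N` and `\p` markers, per-verse storage as in Sword modules, and a `<lb/>` emitted before each indent milestone. `sfm_to_verses` is a hypothetical helper, not IBT's actual converter, and the sample text is a shortened paraphrase of the Ruth passage.

```python
import re

# Only \v N and \p are recognized; real SFM has many more markers.
TOKEN = re.compile(r'\\v\s+(\d+)\s*|\\p\b\s*')
INDENT = '<lb/><milestone type="x-p-indent"/>'

def sfm_to_verses(sfm: str) -> dict:
    """Split SFM into {verse number: OSIS text}, turning each \\p into an indent.

    A \\p directly before a \\v is attached to the start of the next verse.
    """
    verses = {}
    current = 0    # text before the first \v would be stored under verse 0
    pending = ''   # indent markup waiting for the next run of text
    pos = 0

    def emit(text: str) -> None:
        nonlocal pending
        text = ' '.join(text.split())  # collapse SFM line wrapping
        if text:
            verses[current] = (verses.get(current, '') + ' ' + pending + text).strip()
            pending = ''

    for m in TOKEN.finditer(sfm):
        emit(sfm[pos:m.start()])
        if m.group(1):             # \v N starts a new verse
            current = int(m.group(1))
        else:                      # \p queues an indent for the next text
            pending += INDENT
        pos = m.end()
    emit(sfm[pos:])
    return verses

sample = r"""\v 8 Наима деди:
\p Боринглар.
\v 9 Яхши.
\p Шундай деб.
\p
\v 10 Йўқ."""
for verse, text in sfm_to_verses(sample).items():
    print(verse, text)
```

Note that the indent in verse 8 lands mid-verse, and the `\p` before verse 10 is moved to the start of that verse.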
> I disagree. Those are paragraphs. I'm not sure why you would argue that
> something which looks like a paragraph, acts like a paragraph, and is
> encoded using paragraph markup is nevertheless not a paragraph. You can
> achieve your desired typesetting by putting the paragraphs in <p>
> elements and indenting them all. (Again, I would argue that all
> paragraphs should be indented, except in unparagraphed translations.)
Again, they are not paragraphs as most would understand them: if 
they inherited any typical "paragraph" formatting other than the 
indent, they would render completely wrong. The fact that there is 
serious discussion about whether they are paragraphs or not makes the 
importance of point #1 clear as day: The content provider needs a simple 
way to have control over their formatting. Now, forever, period.

> My only guess is that you don't believe paragraph breaks can occur
> within sentences, but evidently they can.
> There's already defined syntax for poetry formatting using the level
> attribute.
> --Chris
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
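The two poetry encodings mentioned in this thread (stacked indents versus the `level` attribute on OSIS `<l>`) can be bridged mechanically. A minimal sketch, assuming that an unindented line is level 1 and each stacked `x-p-indent` milestone adds one level; the function name and that numbering are hypothetical.

```python
INDENT = '<milestone type="x-p-indent"/>'

def milestones_to_l(line: str) -> str:
    """Convert a line prefixed by stacked indent milestones into an OSIS <l>.

    Assumption: no milestones means level 1; each milestone adds one level.
    """
    level = 1
    while line.startswith(INDENT):
        line = line[len(INDENT):]
        level += 1
    return f'<l level="{level}">{line}</l>'

print(milestones_to_l(INDENT * 2 + 'text'))
```

With that assumption, two stacked milestones become `<l level="3">text</l>`.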
