[sword-devel] Sword support of indents and line breaks

Chris Little chrislit at crosswire.org
Fri Apr 12 02:26:32 MST 2013


Executive summary:
I don't have a problem with making it clear how to encode indented 
paragraphs and line breaks and improving support for diverse paragraph 
types.
I do have problems with the specific syntax and the rationale described 
below.

On 4/11/2013 11:04 PM, John Austin wrote:
> Sword should support basic indents and line breaks. Content providers 
> should be able to control the formatting of their texts and should not 
> be required to assign their content to artificial <p>...</p> or other 
> containers to do so. Although these containers might be useful, the 
> text of some translation styles cannot be fit nicely into them. But 
> often content providers do rightly desire their texts to appear with 
> formatting similar to their printed texts, since this is exactly what 
> the translators deemed easiest to read and understand.
>
> People who convert texts to Sword are often not at liberty to change 
> the source texts to do so, and source texts in strange languages come 
> with many unexpected language constructs. For these reasons it is 
> important that Sword tries to offer content providers a simple, 
> reliable way of formatting their own texts, without requiring them to 
> fit into Sword's container scheme to do so.
>
> IBT of Russia is already using simple osis <milestone 
> type="x-p-indent" /> and <lb /> to achieve all their formatting needs 
> for their Sword modules. Currently, only xulsword supports both of 
> these. But perhaps they should both be included in Sword's osis2html 
> filters so that all front-ends can support them. At least something 
> very similar should be adopted, if there is a strong reason not to 
> adopt IBT's well tested method.

So encoders should not have to assign content to 'artificial 
<p>...</p>', but they should have to encode an artificial <milestone 
type="x-p-indent"/>? They shouldn't assign content to the structure it 
clearly is (<p>...</p>), but rather to an imagined indentation object?

There's not a location or an object that represents indentation. 
Indentation is a property of paragraphs, so it should be marked on 
paragraphs, as is our current practice.

Here's the list of paragraph types from the USFM reference along with 
the paragraph type that usfm2osis.py will generate (in the form of a 
Python dict): {'pc':'x-center', 'pr':'x-right', 'm':'x-noindent', 
'pmo':'x-embedded-opening', 'pm':'x-embedded', 
'pmc':'x-embedded-closing', 'pmr':'x-right', 'pi':'x-indented-1', 
'pi1':'x-indented-1', 'pi2':'x-indented-2', 'pi3':'x-indented-3', 
'pi4':'x-indented-4', 'pi5':'x-indented-5', 'mi':'x-noindent-indented', 
'nb':'x-nobreak', 'phi':'x-indented-hanging', 'ps':'x-nobreakNext', 
'psi':'x-nobreakNext-indented', 'p1':'x-level-1', 'p2':'x-level-2', 
'p3':'x-level-3', 'p4':'x-level-4', 'p5':'x-level-5'}.
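
As a rough illustration of how that mapping is used, here is a minimal 
sketch. The dict is copied from the list above; the helper name 
osis_p_open is hypothetical and is not taken from usfm2osis.py. It 
assumes that any marker not in the dict (such as a plain \p) becomes a 
bare <p>.

```python
# USFM paragraph marker -> OSIS <p> type, copied from the list above.
PARAGRAPH_TYPES = {
    'pc': 'x-center', 'pr': 'x-right', 'm': 'x-noindent',
    'pmo': 'x-embedded-opening', 'pm': 'x-embedded',
    'pmc': 'x-embedded-closing', 'pmr': 'x-right', 'pi': 'x-indented-1',
    'pi1': 'x-indented-1', 'pi2': 'x-indented-2', 'pi3': 'x-indented-3',
    'pi4': 'x-indented-4', 'pi5': 'x-indented-5',
    'mi': 'x-noindent-indented', 'nb': 'x-nobreak',
    'phi': 'x-indented-hanging', 'ps': 'x-nobreakNext',
    'psi': 'x-nobreakNext-indented', 'p1': 'x-level-1', 'p2': 'x-level-2',
    'p3': 'x-level-3', 'p4': 'x-level-4', 'p5': 'x-level-5',
}


def osis_p_open(marker):
    """Return an OSIS <p> start tag for a USFM paragraph marker.

    Hypothetical helper: markers without an entry in the dict
    (for example a plain 'p') are assumed to yield a bare <p>.
    """
    p_type = PARAGRAPH_TYPES.get(marker)
    if p_type is None:
        return '<p>'
    return '<p type="{}">'.format(p_type)
```

For example, osis_p_open('pi2') gives <p type="x-indented-2"> and 
osis_p_open('p') gives a bare <p>.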

I believe that a bare <p> should, by default, be indented. The only case 
where it shouldn't would be in a translation without any paragraphs, 
which should have each verse start on a new line. I would argue that the 
OSIS filters should be improved to translate these OSIS <p> types to 
(X)HTML <p> classes or CSS or such. But we should not be supporting an 
indentation milestone and generating &nbsp;s or something similar to 
simulate indentation. (Nor should we translate indentation milestones to 
(X)HTML <p> classes or CSS, if that's your implementation.)
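
To make the suggestion concrete, here is a minimal sketch of what such a 
translation could look like. It is not the existing Sword filter code. 
The function name, the class names (the OSIS type with its "x-" prefix 
removed) and the "indented" class for a bare <p> are all assumptions for 
illustration, and it only handles <p> start tags that carry nothing but 
an optional type attribute.

```python
import re

# Matches an OSIS <p> start tag with an optional type attribute only.
_P_OPEN = re.compile(r'<p(?:\s+type="([^"]*)")?\s*>')


def osis_p_to_html(osis_fragment):
    """Rewrite OSIS <p> start tags as (X)HTML <p class="..."> tags.

    Assumed convention: the class is the OSIS type without "x-";
    a bare <p> gets the hypothetical class "indented".
    """
    def repl(match):
        p_type = match.group(1)
        if not p_type:
            # Bare <p>: indented by default, as argued above.
            return '<p class="indented">'
        if p_type.startswith('x-'):
            p_type = p_type[2:]
        return '<p class="{}">'.format(p_type)

    return _P_OPEN.sub(repl, osis_fragment)
```

The indentation itself would then live in a stylesheet (for instance a 
text-indent rule per class), not in the text, so stripping formatting 
stays trivial.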

I presume you're already happy with the handling of <lb/>.

> Hard spaces and other such formatting are not acceptable solutions 
> because they cannot be easily filtered. It is important that 
> unformatted text can easily be obtained from formatted text since 
> there are many uses for unformatted text, such as bookmark and 
> cross-reference verse texts etc.
>
> Here is one example to show why forcing containers on a text is not a 
> good idea. This is a section of SFM from the book of Ruth 1:8-12:
>
> \v 8 Йўлда давом этишаркан, Наима иккала келинига деди:
> \p — Боринглар, икковингиз ҳам оналарингизнинг уйларига қайтинглар. 
> Менга ва марҳумларга бўлган иззат–ҳурматингиз учун Эгам сизларга ҳам 
> марҳамат қилсин.
> \v 9 Икковларингизга ҳам яхши жойлардан ато қилсин, турмуш қуриб, ўз 
> эрларингиз билан бахтли бўлинглар!
> \p Шундай деб Наима келинларини ўпди, иккаласи эса йиғлаб фарёд 
> кўтаришди:
> \p
> \v 10 — Йўқ, биз сиз билан кетамиз, сизнинг халқингиз орасида яшаймиз, 
> — дейишди.
> \v 11 Наима эса яна келинларига:
> \p — Қайтинглар, жон қизларим! — деди. — Мен билан кетганингиздан нима 
> фойда?! Қорнимда яна ўғилларим бормидики сизларга умр йўлдоши бўлса?!*
> \v 12 Бўлди энди, қизларим, қайтинглар! Мен энди кексайдим, эрга 
> тегишга ожизман. Борди–ю, мен, ҳали умид қилсам бўлади, деб шу кеча 
> эрим билан қовушсаму ўғиллар туғсам,
>
> Here is a PDF of exactly what the translators designed this SFM to 
> look like: 
> http://ibt.org.ru/russian/bible/uzb/otcyr/08%20Rut%20-%20Uzbek%20Cyrillic.pdf
>
> And here is what it looks like in Sword format using only basic osis 
> indents and line breaks, rendered by xulsword's osis2html filter: 
> http://ibt.org.ru/en/text.htm?m=UZV&l=Ruth.1.1.1&g=0. As you can see, 
> the Sword module renders this strange (to us) formatting of text just 
> like the translators wanted.
>
> However, now imagine trying to programmatically apply <p>...</p>, 
> <l>...</l> etc. constructs to the above SFM to achieve the same 
> effect. The designers of the SFM in this case are using the \p tag to 
> represent a simple indent (not a paragraph) in order to achieve their 
> desired non-Western layout. One might try and argue that the SFM 
> designers have done something wrong, but the point is that we have 
> what we have. So Sword should provide a simple way for content 
> providers to control the formatting of their texts. Basic indents and 
> line breaks do the trick for Central Asian languages, and probably 
> many others as well. Poetry is even made easy, by putting indents in 
> series as desired.

I disagree. Those are paragraphs. I'm not sure why you would argue that 
something which looks like a paragraph, acts like a paragraph, and is 
encoded using paragraph markup is nevertheless not a paragraph. You can 
achieve your desired typesetting by putting the paragraphs in <p> 
elements and indenting them all. (Again, I would argue that all 
paragraphs should be indented, except in unparagraphed translations.)

My only guess is that you don't believe paragraph breaks can occur 
within sentences, but evidently they can.

There's already defined syntax for poetry formatting using the level 
attribute.
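
As a small sketch of that syntax: poetry lines go in <l> elements inside 
an <lg> line group, with the indentation depth carried by the level 
attribute. The helper names below are hypothetical and only build the 
markup as strings.

```python
from xml.sax.saxutils import escape


def osis_line(text, level=1):
    """Return one OSIS poetry line with the given indentation level."""
    return '<l level="{}">{}</l>'.format(int(level), escape(text))


def osis_line_group(lines):
    """Wrap (text, level) pairs in an OSIS <lg> line group."""
    body = ''.join(osis_line(text, level) for text, level in lines)
    return '<lg>' + body + '</lg>'
```

Calling osis_line_group([('first line', 1), ('second line', 2)]) yields 
an <lg> holding a level-1 line followed by a level-2 line.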

--Chris



