[sword-devel] div type="paragraph" [was: Valid vs Best Practice XML]

Jonathan Morgan jonmmorgan at gmail.com
Mon Sep 17 06:57:25 MST 2012


Hi Troy/Greg,

On Mon, Sep 17, 2012 at 11:49 PM, Troy A. Griffitts <scribe at crosswire.org> wrote:

> 3 brief points.
>
> The HTML filter set is old, and no one I know of uses it. HTMLHREF, WEBIF,
> and XHTML are the 3 filter sets I know to be in use today. I've started to
> switch SWORDWeb from WEBIF to the XHTML filter set. Once this is done, I
> wouldn't mind deprecating both the WEBIF and HTML filter sets. Eventually,
> I'd like to deprecate the HTMLHREF filter set, leaving only one (XHTML)
> filter set we all use in common, but I know xiphos and others are still
> using HTMLHREF as their primary HTML output filter set.
>

I know BPBible uses HTMLHREF, though we derive from it and make many
changes to the output accordingly.

> You should be seeing <!P><br /> output from this <div type="paragraph">
> construct, not simply <!P>. Again, let's remove the <!P> if xiphos no
> longer needs it. <br /> is certainly valid, even if not necessarily the
> most desirable XHTML output for a paragraph division.
>

BPBible's code has calls like this in several places:

    data.replace("<!P>", "</p><p>")

That is all the processing we seem to do on it.
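
As a rough sketch of what that amounts to, assuming `data` is an ordinary
Python string holding the filter output (the function name and the sample
text below are made up for illustration and are not taken from the BPBible
source):

```python
def replace_paragraph_markers(data: str) -> str:
    """Turn the filter's <!P> markers into HTML paragraph breaks."""
    # str.replace returns a new string rather than modifying `data`,
    # so the result has to be kept by the caller.
    return data.replace("<!P>", "</p><p>")


# Hypothetical filter output containing one paragraph marker.
rendered = "First paragraph.<!P>Second paragraph."

# Wrapping in <p>...</p> is an assumption here, so that the inserted
# "</p><p>" pairs up with an enclosing paragraph element.
html = "<p>" + replace_paragraph_markers(rendered) + "</p>"
print(html)
```

Note that this only touches `<!P>`; the `<!/P>` form in Greg's HTMLHREF
output would pass through unchanged.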

Jon


> On 09/16/2012 01:54 AM, Greg Hellings wrote:
>
>> On Sat, Sep 15, 2012 at 5:11 PM, Troy A. Griffitts <scribe at crosswire.org>
>> wrote:
>>
>>> Greg,
>>>
>>> Thank you for posting the issue.  I'm still really having a tough time
>>> understanding the problem.  I know we've been crossing on IRC, so I'm not
>>> sure if you are seeing any of my responses to you there.
>>>
>> Anything you say while my nick is in the channel is saved by ZNC and
>> bounced to me the next time I log in, up until I manually clear the
>> logs. So yes, I've been getting the messages you've sent.
>>
>>> We have code to handle these divs and not pass them through, as shown here:
>>>
>>> http://crosswire.org/svn/sword/trunk/src/modules/filters/osisxhtml.cpp
>>>
>>> Search for "paragraph" and it should be about the 2nd or 3rd hit, but
>>> there is a comment which specifically shows your construct of
>>> <div eID="" type="paragraph" />
>>>
>>> The end result is that this gets output as <!P><br />
>>>
>>> If you look below in your ./lookup output, you will see this exact
>>> output.
>>>
>> That output is the result of FMT_WEBIF rendering. I'm not sure exactly
>> what that is, so I can't speak to that.
>>
>> When I rebuild with HTMLHREF and XHTML I get <!/P>. That is fine
>> for HTMLHREF, according to what Chris has said elsewhere and what you
>> state below, since it is intended for use by GS/Xiphos. It is not
>> acceptable XHTML - it is not valid.
>>
>> When I rebuild lookup with FMT_HTML I am still seeing the div tag
>> passed through untouched. That is not valid HTML as discussed earlier
>> in this thread unless we're hoping to target a very strongly
>> discouraged construct of an older version of HTML.
>>
>> Strangely, I can't get the output of Diatheke and lookup to sync up on
>> the XHTML results.
>>
>>> The <!P> was added for/by gnomesword years ago and can be taken out if
>>> you do a grep through the xiphos code and find it is no longer needed.
>>> I'm not sure why it was added.
>>>
>>> But the end result is that we do process this construct and should never
>>> pass it through. If Bibletime gets it passed through, then they are not
>>> using our filters, either because they are using their own distinct
>>> filter set, or because their filter set overrides this processing and
>>> doesn't accept our default processing.
>>>
>> The issue in BibleTime has already been taken care of. This only came
>> to light because the offending <div> tags were in the preverse
>> material which BibleTime does not pass through any filters but instead
>> simply strips tags out of the raw text. I can't pretend to know why
>> that is a good idea, but I'm not interested in that - only in getting
>> my module looking correct.
>>
>> I figured I'd point out the discrepancies between SWORD's usages and
>> the specs in the meantime. To that point, XHTML and HTML are still
>> generating invalid output according to lookup.
>>
>> --Greg
>>
>>> If you point me to an svn or git or whatever link to the Bibletime Render
>>> Filter which processes OSIS, I'd be happy to have a look.
>>>
>>> Troy
>>>
>>>
>>> On 09/15/2012 06:56 PM, Greg Hellings wrote:
>>>
>>>> To emphasize that we have an issue here, in the SWORD filters, here is
>>>> the output from diatheke with HTML, HTMLHREF and XHTML (support for which
>>>> I just hacked in now in order to test).
>>>>
>>>> greg@Gateway08:~/Source/sword/build (master)$ !diath
>>>> diatheke -b TKE -o h -f HTMLHREF -k Gen 1:2
>>>> Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
>>>> ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
>>>> waviravira vadhulu va mahinje, osasanyedhelaga.  <!/P><br />
>>>> (TKE)
>>>> greg@Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f HTML -k Gen 1:2
>>>> <meta http-equiv="content-type" content="text/html;
>>>> charset=UTF-8">Genesis 1:2: Elaboya kayawomele naari kayanna dhego.
>>>> Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa
>>>> Mulugu waviravira vadhulu va mahinje, osasanyedhelaga.  <div
>>>> eID="gen11" type="paragraph"/><br />
>>>> (TKE)
>>>> greg@Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f XHTML -k Gen 1:2
>>>> Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
>>>> ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
>>>> waviravira vadhulu va mahinje, osasanyedhelaga.  <div eID="gen11"
>>>> type="paragraph"/>
>>>> (TKE)
>>>>
>>>> All three are outputting the same verse from the same module. HTML and
>>>> XHTML are outputting <div eID="gen11" type="paragraph"/> which is what
>>>> the module has in its rawest form. HTMLHREF outputs <!/P> which is not
>>>> valid anything. There are other, odd, differences between the three
>>>> but none of those are germane to this discussion, it would seem to me.
>>>>
>>>> $ ./examples/cmdline/lookup TKE Gen.1.2
>>>> ==Raw=Entry===============
>>>> Genesis 1:2:
>>>> Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
>>>> owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu<note n="1">1.2*
>>>> <catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo yuulubale.»
>>>> Mulugu ohukalana muneba mmohi oneethanihu «Muneba Woweela.» Muneba
>>>> Woweela ohukamihedha voopaddusiwa elabo. Mwaana a Mulugu, Yesu
>>>> Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3; aKolose 1.16;
>>>> aHeberi 1.2.)</note> waviravira vadhulu va mahinje, osasanyedhelaga.
>>>> <div eID="gen11" type="paragraph"/>
>>>> ==Render=Entry============
>>>>                  .divineName {                   font-variant:
>>>> small-caps;
>>>> }               .wordsOfJesus {color: red;              }
>>>> Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
>>>> owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu waviravira vadhulu
>>>> va mahinje, osasanyedhelaga.  <!/P><br />
>>>> ==========================
>>>> Entry Attributes:
>>>>
>>>> [ Footnote ]
>>>>          [ 1 ]
>>>>                  body = 1.2* <catchWord>Muneba wa Mulugu</catchWord>
>>>> naari wi «pevo yuulubale.» Mulugu ohukalana muneba mmohi oneethanihu
>>>> «Muneba Woweela.» Muneba Woweela ohukamihedha voopaddusiwa elabo.
>>>> Mwaana a Mulugu, Yesu Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu
>>>> 1.1-3; aKolose 1.16; aHeberi 1.2.)
>>>>                  n = 1
>>>>
>>>> On Fri, Sep 14, 2012 at 7:15 PM, Chris Little <chrislit at crosswire.org>
>>>> wrote:
>>>>
>>>>>
>>>>> On 09/14/2012 01:02 PM, Greg Hellings wrote:
>>>>>
>>>>>> So I've been debugging a module display problem in BibleTime. I
>>>>>> mentioned it on IRC with Troy the other day but we weren't able to
>>>>>> connect at the same time to discuss further. The issue has to do with
>>>>>> paragraph tags - in osis2mod these tags are being converted from <p>
>>>>>> to <div sID="someid" type="paragraph" />.
>>>>>>
>>>>> This is extraordinarily bad. This is a change in semantics, because <p>
>>>>> and <div type="paragraph"> are not semantically equivalent.
>>>>>
>>>>> <p> marks the type of paragraph we all probably think of first:
>>>>> generally, a chunk of text with newlines before and after.
>>>>>
>>>>> <div type="paragraph"> marks a formal division within a text that
>>>>> happens to be identified as a 'paragraph' and may consist of multiple
>>>>> <p>-type paragraphs. Examples of these divisions are found in many laws
>>>>> and the Catechism of the Catholic Church (which does exist in OSIS form).
>>>>> Here's part 1, section 1, chapter 1, article 1, paragraph 1 of the CCC:
>>>>> http://www.vatican.va/archive/ENG0015/__P16.HTM
>>>>> As you can see, it consists of many <p>-type paragraphs but is a single
>>>>> <div type="paragraph">-type paragraph.
>>>>>
>>>>> Abhorrent though I consider milestoned <p/>, I think I would much prefer
>>>>> to see us map <p>...</p> to <p sID=""/>...<p eID=""/> than see us clobber
>>>>> the semantics of a defined <div> type.
>>>>>
>>>>>
>>>>>> Thus, osis2mod is in violation of the suggested XML best practice by
>>>>>> creating a non-EMPTY tag as self-closing but this is seemingly pretty
>>>>>> common in the OSIS world. Furthermore our filters are producing
>>>>>> invalid (or very strongly discouraged) HTML as per every still-in-use
>>>>>> version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
>>>>>> opinion that this represents a bug in SWORD - at the very least in the
>>>>>> filters that permit empty, self-closing div tags to slip through what
>>>>>> are supposedly HTML outputs. Do others agree or disagree on this?
>>>>>>
>>>>> I'm of the opinion that our OSIS is generally fine, meaning we should go
>>>>> ahead and keep allowing self-closing OSIS tags if possible (as input and
>>>>> output from osis2mod and as content of modules not produced by osis2mod).
>>>>> The best practice is just a recommendation, and specifically a
>>>>> recommendation for the purpose of aiding processing with legacy SGML
>>>>> tools, which I can't see us doing and don't personally care about. (The
>>>>> semantic violation noted above is a bug in my mind, but that issue is
>>>>> orthogonal.)
>>>>>
>>>>> I would agree that the filter output is buggy if we're generating
>>>>> disallowed tag forms. OSIS <div> and <p> would need to be translated to
>>>>> their correct, non-self-closing HTML forms. Beyond those two, I can't
>>>>> think of any tags that have the same form & general semantics in both
>>>>> OSIS & HTML.
>>>>>
>>>>> --Chris
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>> Instructions to unsubscribe/change your settings at above page