Module Tools / MODTOOLS-41

Update usfm2osis.py to cover the new Paratext feature of nested tags using \+

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: usfm2osis.py
    • Labels:
      None
    • Environment:

      N/A

      Description

      The latest version of Paratext supports the concept of nested tags, identified using the \+ prefix.

      Most of these are typically to be found in footnotes.

      Here's an example from the Welsh beibl.net translation.

       \v 13 Ond cyn mynd, galwodd ddeg o'i weision ato a rhannu swm o arian\f + \fr 19:13 \fq swm o arian: \ft Groeg, “10 \+tl mina\+tl*”. Roedd un \+tl mina\+tl* yn werth 100 denariws, sef cyflog tua tri mis.\f* rhyngddyn nhw. ‘Defnyddiwch yr arian yma i farchnata ar fy rhan, nes do i yn ôl adre,’ meddai. 

      I've asked Jeff Klassen to let me know when this enhancement will be covered by updated USFM documentation.

        Activity

        Chris Little added a comment -

        You might try changing all instances of \+ to \. There's a chance that will fix everything, since usfm2osis.py processes footnotes recursively.

        I rather hope it doesn't, however, because according to the USFM documentation a \tl element should not be possible within a \ft element. I hoped to make the script mostly fail when provided with invalid USFM input.

        Prior to seeing some documentation, I can't do much to update usfm2osis.py for this feature.
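
The workaround suggested in this comment (rewriting each nested \+ marker as an ordinary marker before running the converter) could be sketched roughly as below. This is a hypothetical pre-processing step, not part of usfm2osis.py; the function name and the assumption that marker names contain only letters and digits are illustrative.

```python
import re

# Matches a nested character marker such as \+tl, or its closing form \+tl*.
# Assumption: marker names consist of ASCII letters and digits only.
NESTED_MARKER = re.compile(r'\\\+([A-Za-z0-9]+\*?)')


def flatten_nested_markers(usfm_text):
    r"""Rewrite nested markers (\+tl ... \+tl*) as plain ones (\tl ... \tl*)."""
    # In the replacement template, \\ is a literal backslash and \1 is the
    # captured marker name (with its trailing * if present).
    return NESTED_MARKER.sub(r'\\\1', usfm_text)


if __name__ == '__main__':
    sample = r'\f + \fr 19:13 \ft Groeg, “10 \+tl mina\+tl*”.\f*'
    print(flatten_nested_markers(sample))
```

Only a + that directly follows a backslash is rewritten, so the footnote caller in "\f +" is left alone. Whether the converter then accepts a \tl inside a \ft is a separate question, as the comment notes.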

        David Haslam added a comment -

        My own immediate difficulty will be getting Go Bible Creator to cope with the new and enhanced USFM tags.

        Reporting it here as well is a "spin-off", though it goes without saying that Arfon would like us to make a module.

        David Haslam added a comment -

        FIO.

        Here are the USFM tag statistics for the Welsh beibl.net translation.

        David Haslam added a comment -

        The text file has the new nested + tags at the top of the counted list.
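
A tag count of this kind could be reproduced with a short script. The sketch below is not the tool that produced the attached statistics; the marker pattern (a backslash, an optional + for the nested form, letters, optional digits, optional closing *) is an assumption for illustration.

```python
import re
from collections import Counter

# Assumed marker shape: backslash, optional + (nested form), letters,
# optional digits (e.g. \q2), optional closing asterisk.
MARKER = re.compile(r'\\(\+?[A-Za-z]+[0-9]*\*?)')


def count_markers(usfm_text):
    """Return a Counter mapping each USFM marker to its number of occurrences."""
    return Counter(MARKER.findall(usfm_text))


if __name__ == '__main__':
    sample = (r'\v 13 swm o arian\f + \fr 19:13 \fq swm o arian: '
              r'\ft Groeg, “10 \+tl mina\+tl*”. Roedd un \+tl mina\+tl* '
              r'yn werth 100 denariws.\f* rhyngddyn nhw.')
    for marker, count in count_markers(sample).most_common():
        print(marker, count)
```

Nested markers are counted under their own keys (for example "+tl"), so they stay distinguishable from the ordinary forms.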

        David Haslam added a comment (edited) -

        Please visit http://paratext.org/usfm to obtain the new release of the USFM Reference, version 2.4. The release notes say:

        2.4 - June 2013

        Marker Additions

        · Support for nesting character markup.

        Full support for the nested character markup syntax has been included in Paratext >=7.4, Publishing Assistant >=4.1, and the XML export format from Paratext known as "USX".


          People

          • Assignee:
            Chris Little
            Reporter:
            David Haslam
          • Votes:
            0
            Watchers:
            2
