[sword-devel] usfm2osis.py and tag \cp

Chris Little chrislit at crosswire.org
Fri Oct 12 12:56:08 MST 2012


On 10/12/2012 4:00 AM, Peter von Kaehne wrote:
> Sorry, while the crash has gone, the function is not correct - at
> all.
>
> \cp is meant to give a printed chapter number which has no influence
> on the underlying counting of verses and chapters. How exactly to
> represent it in OSIS, we would need to figure out, but it should not
> influence the creation of subsequent osisIDs. I would think <hi
> type="bold"> is probably the best for our purposes. The OSIS
> reference is not exactly helpful at this point, nor does it reflect
> the reality of module making.

\cp (like \vp) is a workaround for a limitation in Paratext. Paratext 
requires that all chapter and verse numbers be numeric and strictly 
increasing. No lettered or out-of-order or repeated verse or chapter 
numbers are permissible. However, actual Bibles sometimes include these
things. So in Paratext you have to enumerate the chapters/verses with
strictly increasing dummy numerals, and \cp and \vp let Paratext
substitute the correct underlying number when rendering.

The description of \cp in the USFM docs states: "This is a chapter 
marker (number, letter) which would be displayed in the published text 
(where the published marker is different than the \c # used within the 
translation editing environment)." The words "translation editing 
environment" are a reference to Paratext specifically, and the 
description as a whole conveys that \cp is the real chapter number if a 
different \c value is necessitated by Paratext.

OSIS doesn't have this limitation. You can encode the real verse and 
chapter numbers in OSIS, without need for a workaround.

So usfm2osis.py's replacement of the numeric dummy-chapter with the 
chapter number specified in \cp is correct.

If you look at your USFM document, I expect you will see something like:

\c 1
\cp A
...
\c 2
\cp 1
...
\c 3
\cp 2
...
\c 4
\cp 3
...
\c 5
\cp B
...
\c 6
\cp 3
...
\c 7
\cp 4

The strictly increasing \c values are just dummy values for Paratext. 
The \cp values represent the actual underlying chapter numbers for this 
reference scheme. There aren't two different chapter 3s in Esther, just 
one that is briefly interrupted by chapter B, but Paratext can't deal 
with the underlying reference system, so it requires the \cp workaround. 
Likewise, chapter 4 (\cp 4) isn't really chapter 7 (\c 7).
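
To make the mapping concrete, here is a minimal sketch (not the actual
usfm2osis.py code) of how the real chapter labels could be recovered from
\c/\cp pairs. The function name published_chapters and the sample string
are my own, made up for illustration; the sample simply mirrors the
sequence above.

```python
import re

def published_chapters(usfm):
    """Return one chapter label per chapter marker, preferring the
    published-chapter value when one follows (hypothetical helper)."""
    labels = []
    for line in usfm.splitlines():
        line = line.strip()
        # A chapter marker: backslash, 'c', whitespace, digits.
        m = re.match(r'\\c\s+(\d+)$', line)
        if m:
            labels.append(m.group(1))  # dummy value, may be overridden
            continue
        # A published-chapter marker overrides the dummy value before it.
        m = re.match(r'\\cp\s+(\S+)$', line)
        if m and labels:
            labels[-1] = m.group(1)
    return labels

# Assumed sample, mirroring the Esther sequence shown in this message.
sample = r"""
\c 1
\cp A
...
\c 2
\cp 1
...
\c 3
\cp 2
...
\c 4
\cp 3
...
\c 5
\cp B
...
\c 6
\cp 3
...
\c 7
\cp 4
"""

print(published_chapters(sample))
# prints ['A', '1', '2', '3', 'B', '3', '4']
```

Note that the label 3 appears twice, once on each side of B, which matches
the reading that there is a single chapter 3 interrupted by chapter B.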

This is mostly based on my experience encoding USX docs for ABS. If your 
USFM encoder intends that the value in \c be the chapter value, then \cp 
should not be used. You should look into \ca or \cl as alternatives.

> Right now the code does two things: It replaces in the sample below
> the chapter number 1 with an A for the subsequent verse's osisID
> ("Esth.A.1" instead of "Esth.1.1") and it leaves the \cp A in place.
> Neither is right, both according to the OSIS reference and according
> to the desires of the USFM writer in my example.

With the update just committed, usfm2osis.py should now correctly remove 
\cp (and \vp). That was a bug--actually a set of bugs. Again, I 
regrettably haven't tested this, but the code looks good to me.
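
For illustration only, the intended behaviour (fold the published number
into the preceding marker and drop \cp/\vp) could be sketched like this.
This is not the code that was committed; the function name
fold_published_numbers is hypothetical, and the patterns assume that \cp
sits on the line directly after \c and that \vp is closed with \vp*.

```python
import re

def fold_published_numbers(usfm):
    """Replace dummy chapter/verse numbers with the published ones and
    remove the \\cp and \\vp markers (hypothetical sketch)."""
    # "\c N" followed on the next line by "\cp X"  ->  "\c X"
    usfm = re.sub(r'\\c\s+\d+[ \t]*\n\\cp\s+(\S+)', r'\\c \1', usfm)
    # "\v N \vp X\vp*"  ->  "\v X"
    usfm = re.sub(r'\\v\s+\d+\s+\\vp\s+(.+?)\\vp\*', r'\\v \1', usfm)
    return usfm

# Assumed input: chapter 5 published as B, verse 1 published as 1a.
before = "\\c 5\n\\cp B\n\\v 1 \\vp 1a\\vp* text"
after = fold_published_numbers(before)
print(after)
```

Chapters and verses without a \cp or \vp are left untouched, so the dummy
number only disappears where a published one was supplied.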

--Chris



