[osis-core] Identifiers for segmented verses.

Steve DeRose osis-core@bibletechnologieswg.org
Wed, 19 Jun 2002 15:37:06 -0400


At 01:04 PM -0500 06/05/02, Todd Tillinghast wrote:
>As a "best practice" I would like to propose that when verses are
>segmented that we use the reference identifier for the whole verse for
>the segment that begins the verse (Matt.13.3) and then use the reference
>including the appropriate grain for the further segments
>(Matt.13.3@char:43(Once)).


Ahh, just saw this after my earlier posting on related topic. Hmmm.

>
>It is not appropriate to say Matt.13.3a unless the reference system of
>the translation being used has defined Matt.13.3a as a verse identifier.
>In this case the verse IS NOT SEGMENTED but there is simply more than
>one verse in this reference system that map to a single verse in other
>reference systems.

Agreed...

>
>Neither would Matt.13.3.a be appropriate unless the reference system
>has defined a four-component strategy for references.  (Josephus has a
>four-component strategy for references (ant.3.1.1), and there is a
>Hebrew reference system for Psalms that employs four components
>(Ps.1.1.a).)  If a verse using a four-component reference system were
>to be segmented, then it should be referenced using the grain
>(ant.3.1.1@char:41(wonderful)).

Right -- that's why I threw out "_" for the prev/next markers in my 
mail of a few minutes ago.

>
>I attempted to use this strategy in the sample of Matt.13.1-Matt.13.23
>that I sent last night.
>
>Is there any error in the references I used in that file?  (I know that
>I did not use the correct character offset values for the references in
>the <lineGroup>.  Just lazy late at night.)


:)

>
>This laziness might point to the need for a useful alternative grain
>strategy that is easier to encode.  Arbitrary comes to mind
>(Matt.13.3@arb:a), but word-based would also be much easier than
>character (Matt.13.3@word:9(Once) rather than Matt.13.3@char:43(Once)).

As I had understood it, the parenthesized string is basically a check 
field, and is optional (though awfully smart to use if there's any 
chance the text will get edited).
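
A minimal sketch of that check-field idea, under assumptions that are
mine rather than anything we have agreed: offsets are taken to be
0-based character positions, the pattern GRAIN_REF and the helpers
parse_ref and check_matches are hypothetical names, and only numeric
offsets are handled.

```python
import re

# Hypothetical pattern for  identifier@grain:offset(check) ; the grain
# part and the parenthesized check string are both optional.
GRAIN_REF = re.compile(
    r"^(?P<ident>[^@]+)"
    r"(?:@(?P<grain>\w+):(?P<offset>\d+)(?:\((?P<check>[^)]*)\))?)?$"
)

def parse_ref(ref):
    """Split a reference into identifier, grain type, offset, and check string."""
    m = GRAIN_REF.match(ref)
    if m is None:
        raise ValueError("unrecognized reference: " + ref)
    offset = m.group("offset")
    return {
        "ident": m.group("ident"),
        "grain": m.group("grain"),
        "offset": int(offset) if offset is not None else None,
        "check": m.group("check"),
    }

def check_matches(ref, verse_text):
    """Return True if the reference has no check string, or if the check
    string occurs in verse_text at the stated (assumed 0-based) offset."""
    parts = parse_ref(ref)
    if parts["check"] is None:
        return True
    if parts["grain"] != "char":
        raise ValueError("this sketch only verifies 'char' grains")
    return verse_text.startswith(parts["check"], parts["offset"])
```

With an invented sample text such as "Then he told them many things in
parables. Once a farmer went out to sow.", the reference
Matt.13.3@char:43(Once) checks out; insert a few characters at the
front of the text and the same reference fails the check, which is
the point of carrying the string along.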

I had been looking at the grain syntax as being only applicable to 
references, not to self-labeling. Can we get away with simply using 
the unmodified verse identifier for each piece of the verse? That 
seems to simplify retrieving by reference, since you just grab 
everything that matches the reference exactly -- no need to carefully 
ignore a suffix. Also easier for encoders to provide by hand, and 
easy for software to create when confronted with the need to split a 
verse.
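
Roughly what I have in mind, as a sketch only: the list-of-pairs
representation and the helper names fetch and split_piece are made up
for illustration, and the texts are placeholders rather than quotes
from any translation.

```python
# Each piece of a split verse carries the plain verse identifier.
segments = [
    ("Matt.13.2", "(whole verse)"),
    ("Matt.13.3", "(prose opening of the verse)"),
    ("Matt.13.3", "(rest of the verse, inside a line group)"),
    ("Matt.13.4", "(whole verse)"),
]

def fetch(segments, ref):
    """Return the text of every piece labeled exactly `ref`, in document order."""
    return [text for ident, text in segments if ident == ref]

def split_piece(segments, index, offset):
    """Split the piece at `index` at character `offset`.
    Both halves keep the same, unmodified identifier."""
    ident, text = segments[index]
    halves = [(ident, text[:offset]), (ident, text[offset:])]
    return segments[:index] + halves + segments[index + 1:]
```

Retrieval is a plain equality test on the identifier, and splitting a
piece never requires inventing a new label or computing an offset for
the label itself.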

>
>
>I am not arguing for the removal of char as a grain type or even as the
>default.  Just proposing that at least word be added as an alternative.

I'm fond of 'word' except for one thing: We'd have to document 
*exactly* the algorithm for finding word boundaries. Do you have one 
in mind? It sounds easy, but it's extraordinarily difficult in general 
-- and essentially impossible in languages (such as Japanese, and 
early written Greek) that don't have inter-word spaces at all....
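
To illustrate the problem with two obvious candidate rules (both
function names are hypothetical, and neither rule is being proposed
as the definition):

```python
import re

def words_by_whitespace(text):
    """Candidate rule 1: a word is a maximal run of non-space characters."""
    return text.split()

def words_by_regex(text):
    r"""Candidate rule 2: a word is a maximal run of \w characters."""
    return re.findall(r"\w+", text)

english = "don't stop"
japanese = "私は本を読みます"  # written without inter-word spaces

# The two rules disagree about how many words precede "stop",
# so a word offset means nothing until the rule is pinned down.
by_space = words_by_whitespace(english)
by_regex = words_by_regex(english)

# Whitespace splitting finds no boundaries at all in the Japanese text.
japanese_words = words_by_whitespace(japanese)
```

Under rule 1, "stop" is the second word; under rule 2 it is the third,
because the apostrophe splits "don't" in two. For the Japanese
sentence, rule 1 returns the whole string as a single "word".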

>
>I am also arguing that "apparent" partial verse identifiers not be used
>for segmented verses.
>
>Can we adopt the "word" and possibly "arb" as standard grain values?

I think adding 'arb' would be confusing, since the way we've defined 
grains is as a means for identifying things *below* the granularity of 
the reference system. If verse 2a is part of a reference system, it 
should be up in the main part, as either verse number "2a" or as 
another component ".a". I think.

>
>Can we adopt the above-described reference strategy as a "best practice"
>for references to verses that are segmented?
>
>Todd


-- 

Steve DeRose -- http://www.stg.brown.edu/~sjd
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: sderose@speakeasy.net
Backup email: sderose@mac.com, sjd@stg.brown.edu