[sword-devel] Faulty markup of multiple variants

David Instone-Brewer Technical at Tyndale.cam.ac.uk
Mon Feb 11 09:05:33 MST 2013


There are problems with the markup of variants in
the Byz module when a verse contains more than one variant.
It looks like something that happened during a
global change which didn't take into account the
possibility of more than one variant in a verse,
because each such verse is now encoded as if it contained only one variant.

Unfortunately this can't be fixed without looking 
at each variant because (as in the example below) 
the variant may have more than one word, and the 
current markup gives no clue about where the 
first variant ends or where the second variant starts.

There are only about a dozen occurrences in
Byz, and I don't think there are any in TR, but there are more than 50 in WHNU.
You can find them by searching for:
variant[^$]*<[^$]*<[^$]*<[^$]*</seg>[^$]*variant
(though this produces a false positive when there 
is more than one lemma in the first variant reading).
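The search above can also be scripted. What follows is a minimal Python sketch. It assumes the module has been exported to a plain-text IMP-style file in which each verse begins with a `$$$` reference line, as in the examples below. The name `find_suspect_verses` is hypothetical and not part of any SWORD tool. Because `[^$]` excludes the dollar sign, a match cannot run across a `$$$` verse marker, so every hit lies within a single verse. As noted above, the pattern also flags verses whose first reading has more than one lemma, so each hit still needs checking by eye. The zero-width `re.split` requires Python 3.7 or later.

```python
import re

# The search pattern from the post. "[^$]" matches any character except "$"
# (newlines included), so a match never crosses a "$$$" verse marker.
SUSPECT = re.compile(r'variant[^$]*<[^$]*<[^$]*<[^$]*</seg>[^$]*variant')


def find_suspect_verses(imp_text):
    """Return the reference of every verse whose text matches SUSPECT.

    Assumes IMP-style input where each verse starts with a "$$$Ref" line.
    """
    hits = []
    # Zero-width split just before each "$$$" so every chunk is one verse.
    for chunk in re.split(r'(?=\$\$\$)', imp_text):
        if not chunk.startswith('$$$'):
            continue  # any text before the first verse marker
        ref, _, body = chunk.partition('\n')
        if SUSPECT.search(body):
            hits.append(ref[3:].strip())
    return hits
```

For example, `find_suspect_verses(open('Byz.imp', encoding='utf-8').read())` would list the candidate references. The file name here is only illustrative.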

The coding for a variant should be (as far as I can see):

<seg subType="x-1" type="x-variant">
     <w lemma="..." morph="...">GREEK</w>
</seg>
<seg subType="x-2" type="x-variant">
     <w lemma="..." morph="...">GREEK</w>
</seg>
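Once the words of each reading have been identified by hand, the replacement segments can be generated instead of typed. What follows is a minimal Python sketch. It assumes each reading is given as a list of (lemma, morph, text) tuples. The names `w_elem` and `variant_pair` are hypothetical helpers and not part of SWORD or any OSIS tool. The sketch reproduces only the seg/w layout shown above. The `first` parameter sets the subType of the first reading, in case the pairs in a verse need distinct numbers.

```python
from xml.sax.saxutils import escape


def _attr(value):
    # escape() handles &, < and >; also escape the double quote for attributes.
    return escape(value, {'"': '&quot;'})


def w_elem(lemma, morph, text):
    """One <w> element with XML-escaped attribute values and text."""
    return f'<w lemma="{_attr(lemma)}" morph="{_attr(morph)}">{escape(text)}</w>'


def variant_pair(reading1, reading2, first=1):
    """Emit one pair of variant readings as two consecutive <seg> elements.

    reading1 / reading2: lists of (lemma, morph, text) tuples.
    first: subType number of the first reading (1 gives x-1 / x-2).
    """
    lines = []
    for n, reading in ((first, reading1), (first + 1, reading2)):
        lines.append(f'<seg subType="x-{n}" type="x-variant">')
        lines.append('     ' + ' '.join(w_elem(*w) for w in reading))
        lines.append('</seg>')
    return '\n'.join(lines)
```

For example, passing the προσκολληθησεται and κολληθησεται words as single-tuple readings would produce the second pair of the Matthew 19:5 example.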


But when there is more than one variant in a verse, this has become corrupted
(MISSING indicates a tag which is no longer in the module):

<seg subType="x-1" type="x-variant">
     <w lemma="..." morph="...">GREEK</w>
MISSING: </seg>
MISSING: <seg subType="x-2" type="x-variant">
     <w lemma="..." morph="...">GREEK</w>
MISSING: </seg>
     <w lemma="..." morph="...">GREEK</w>
     <w lemma="..." morph="...">GREEK</w>
     <w lemma="..." morph="...">GREEK</w>
MISSING: <seg subType="x-1" type="x-variant">
     <w lemma="..." morph="...">GREEK</w>
</seg>
<seg subType="x-2" type="x-variant">
     <w lemma="..." morph="...">GREEK</w>
</seg>

For example (email tends to corrupt the Greek, so the
same example is also in the attachment):

$$$Matthew 19:5 ...
<seg subType="x-1" type="x-variant">
     <w lemma="strong:G3962" morph="robinson:N-ASM">πατερα</w> |
     <w lemma="strong:G3962" morph="robinson:N-ASM">πατερα</w> <w lemma="strong:G846" morph="robinson:P-GSM">αυτου</w> |
     <w lemma="strong:G2532" morph="robinson:CONJ">και</w> <w lemma="strong:G3588" morph="robinson:T-ASF">την</w> <w lemma="strong:G3384" morph="robinson:N-ASF">μητερα</w> <w lemma="strong:G2532" morph="robinson:CONJ">και</w> |
     <w lemma="strong:G4347" morph="strongsMorph:G5701 robinson:V-FPI-3S">προσκολληθησεται</w>
</seg>
<seg subType="x-2" type="x-variant">
     <w lemma="strong:G2853" morph="strongsMorph:G5701 robinson:V-FPI-3S">κολληθησεται</w>
</seg>
...

This should be:
$$$Matthew 19:5 ...
<seg subType="x-1" type="x-variant">
     <w lemma="strong:G3962" morph="robinson:N-ASM">πατερα</w> |
</seg>
<seg subType="x-2" type="x-variant">
     <w lemma="strong:G3962" morph="robinson:N-ASM">πατερα</w> <w lemma="strong:G846" morph="robinson:P-GSM">αυτου</w> |
</seg>
     <w lemma="strong:G2532" morph="robinson:CONJ">και</w> <w lemma="strong:G3588" morph="robinson:T-ASF">την</w> <w lemma="strong:G3384" morph="robinson:N-ASF">μητερα</w> <w lemma="strong:G2532" morph="robinson:CONJ">και</w> |
<seg subType="x-1" type="x-variant">
     <w lemma="strong:G4347" morph="strongsMorph:G5701 robinson:V-FPI-3S">προσκολληθησεται</w>
</seg>
<seg subType="x-2" type="x-variant">
     <w lemma="strong:G2853" morph="strongsMorph:G5701 robinson:V-FPI-3S">κολληθησεται</w>
</seg>
...
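After a verse has been repaired by hand, a quick structural check can catch slips such as a dropped `</seg>`. What follows is a minimal Python sketch. It assumes it is given one verse at a time and that the tags look exactly as in these examples. The name `check_variant_segs` is hypothetical. The check accepts either numbering scheme: x-1/x-2 repeated, or x-3/x-4 for the second pair. The corrupted layout also passes this check, because its tags are still balanced. For that reason, the check only guards the hand edits and cannot find the faulty verses.

```python
import re

# Opening or closing variant seg; group 1 is the subType number of an opening tag.
SEG_TAG = re.compile(r'<seg subType="x-(\d+)" type="x-variant">|</seg>')


def check_variant_segs(verse):
    """Return a list of problems with the variant <seg> tags in one verse.

    Checks that segs are not nested, that every open has a close, and that
    segs come in pairs with consecutive subType numbers (x-1/x-2, x-3/x-4).
    It cannot tell whether the words inside each seg are the right ones.
    """
    problems = []
    open_num = None  # subType of the currently open seg, if any
    numbers = []     # subType numbers in document order
    for m in SEG_TAG.finditer(verse):
        if m.group(1) is not None:  # opening tag
            if open_num is not None:
                problems.append(f'nested seg x-{m.group(1)} inside x-{open_num}')
            open_num = int(m.group(1))
            numbers.append(open_num)
        else:  # closing tag
            if open_num is None:
                problems.append('</seg> without an opening tag')
            open_num = None
    if open_num is not None:
        problems.append(f'seg x-{open_num} is never closed')
    if len(numbers) % 2:
        problems.append('odd number of variant segs')
    for a, b in zip(numbers[0::2], numbers[1::2]):
        if b != a + 1:
            problems.append(f'x-{a} is paired with x-{b}')
    return problems
```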


I'm not sure whether the subType numbers have to be
unique within a verse. If so, the second pair would be
subType="x-3" and subType="x-4".
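If unique numbers do turn out to be required, the renumbering is mechanical once the pairs themselves are correct. What follows is a minimal Python sketch. It assumes it is called once per verse, so the count restarts at 1 for each verse, and that the seg tags look exactly as in the examples. The name `renumber_variants` is hypothetical.

```python
import itertools
import re

# Opening variant seg; the subType number sits between groups 1 and 2.
SEG_OPEN = re.compile(r'(<seg subType="x-)\d+(" type="x-variant">)')


def renumber_variants(verse):
    """Renumber the variant segs of one verse as x-1, x-2, x-3, ... in order."""
    numbers = itertools.count(1)
    return SEG_OPEN.sub(lambda m: f'{m.group(1)}{next(numbers)}{m.group(2)}', verse)
```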

David IB
-------------- next part --------------

BTW, the Byz module is marked up wrongly when a verse has two variants. The first variant has no markers for its second half and the second variant has no markers for its first half. I suspect this is an error introduced at some point by a global change which didn't take into account multiple variants in the same verse, e.g.:
$$$Matthew 19:5
...
<seg subType="x-1" type="x-variant">
    <w lemma="strong:G3962" morph="robinson:N-ASM">πατερα</w> |
    <w lemma="strong:G3962" morph="robinson:N-ASM">πατερα</w> <w lemma="strong:G846" morph="robinson:P-GSM">αυτου</w> |
    <w lemma="strong:G2532" morph="robinson:CONJ">και</w> <w lemma="strong:G3588" morph="robinson:T-ASF">την</w> <w lemma="strong:G3384" morph="robinson:N-ASF">μητερα</w> <w lemma="strong:G2532" morph="robinson:CONJ">και</w> |
    <w lemma="strong:G4347" morph="strongsMorph:G5701 robinson:V-FPI-3S">προσκολληθησεται</w> 
</seg>
<seg subType="x-2" type="x-variant">
    <w lemma="strong:G2853" morph="strongsMorph:G5701 robinson:V-FPI-3S">κολληθησεται</w> 
</seg>
...

This should be: 
$$$Matthew 19:5
...
<seg subType="x-1" type="x-variant">
    <w lemma="strong:G3962" morph="robinson:N-ASM">πατερα</w> |
</seg>
<seg subType="x-2" type="x-variant">
    <w lemma="strong:G3962" morph="robinson:N-ASM">πατερα</w> <w lemma="strong:G846" morph="robinson:P-GSM">αυτου</w> |
</seg>
    <w lemma="strong:G2532" morph="robinson:CONJ">και</w> <w lemma="strong:G3588" morph="robinson:T-ASF">την</w> <w lemma="strong:G3384" morph="robinson:N-ASF">μητερα</w> <w lemma="strong:G2532" morph="robinson:CONJ">και</w> |
<seg subType="x-1" type="x-variant">
    <w lemma="strong:G4347" morph="strongsMorph:G5701 robinson:V-FPI-3S">προσκολληθησεται</w> 
</seg>
<seg subType="x-2" type="x-variant">
    <w lemma="strong:G2853" morph="strongsMorph:G5701 robinson:V-FPI-3S">κολληθησεται</w> 
</seg>
...

This happens 13 times in Byz.
You can find them by searching for: variant[^$]*<[^$]*<[^$]*<[^$]*</seg>[^$]*variant

