[osis-core] Re: Thoughts and Questions on compare file

Jim_Albright at wycliffe.org Jim_Albright at wycliffe.org
Wed Oct 27 09:28:29 MST 2004


Thanks for looking at my problem domain.
>>>>>>>>see comments below

Jim Albright
704 843-0582
Wycliffe Bible Translators






Patrick Durusau <Patrick.Durusau at sbl-site.org>
10/27/2004 08:54 AM

 
        To:     Jim_Albright at wycliffe.org
        cc:     Jeff_Gayle at sil.org, osis-core at bibletechnologieswg.org
        Subject:        Thoughts and Questions on compare file


Jim,

Some thoughts and questions on the latest compare file.

At first blush I did not see anything that we can't handle in OSIS now 
or with minor modification.
>>>>>>>>great

Decided to copy the osis-core list so we can get comments from others as 
well.

Will be checking today on actually producing a more restricted schema, 
that is one that takes out the x- capability and leaves only enumerated 
values. I take it you are aware that your software interface does not 
have to display the possibility of an x- value for attributes? That is 
to say you could only display enumerated values and not leave the user 
any mechanism for departing from the list?
>>>>>>>>>>>our interface will be a flat file structure Translation Editor, 
like Word with paragraph and character style names
 
Side note to osis-core on the restricted schema: What I am envisioning 
is a very small schema that imports osis-core and redefines/restricts 
attribute values to the enumerated lists. Since such an instance would 
be a conforming OSIS document (the greater always includes the lesser) I 
don't see any compatibility problems.
>>>>>> exactly what I want
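
A rough sketch of what the restricted schema would enforce, written as a Python check rather than as an actual schema. The allowed values are the osisTitles enumeration quoted later in this message. The helper name check_title_types and the sample value x-chapterHead are made up for illustration and are not part of OSIS.

```python
import xml.etree.ElementTree as ET

# Enumerated title types, copied from the osisTitles list quoted in this thread.
ENUMERATED_TITLE_TYPES = {"acrostic", "continued", "main", "parallel", "psalm", "sub"}


def check_title_types(xml_text):
    """Return the type values on <title> that are not in the enumerated list.

    Hypothetical helper: it mimics what a restricted schema would reject,
    including any x- extension value.
    """
    root = ET.fromstring(xml_text)
    rejected = []
    for title in root.iter("title"):
        value = title.get("type")
        if value is not None and value not in ENUMERATED_TITLE_TYPES:
            rejected.append(value)
    return rejected


sample = """<div>
  <title type="main">Genesis</title>
  <title type="x-chapterHead">Chapter Two</title>
</div>"""

print(check_title_types(sample))
```

A document that passes this kind of check uses only enumerated values, so it should also be acceptable to the unrestricted osis-core schema.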

Specific comments follow:

Alluded_Text: Posting a note today to add this as enumerated value on seg.
>>>good

Attribution: Should this be on lg or l? Thinking that it is more likely 
on l as part of a lg. On the other hand, don't you have the same case 
where you would not be using lg or l?
>>>either will work ... there is a better case for line group as some 
closers have two lines.
>>>you may want to add attributes to closer

Book_ID: I am not sure what you are asking for here? Isn't this already 
contained in the osisID?
>>>> think we can skip okay

Chapter_Head, Chapter_Label, Chapter_Number: Aren't these variants of 
title? That is to say that a chapter has a title and these are 
'additional' titles of some particular type?
>>>>toss out chapter label
>>>>Chapter Number is <chapter osisID="Genesis.1"> we have that as osisID
>>>>Chapter Head is like  Chapter Two
My first reaction is to have:

Chapter_Head = title type="main"
>>>>>I wouldn't be able to round trip this then

Chapter_Label = title type="sub"
>>>>toss out

Chapter_Number = title type="sub"
>>>> no

Granted that osisTitles (for type on title) only enumerates

<xs:enumeration value="acrostic"/>
<xs:enumeration value="continued"/>
<xs:enumeration value="main"/>
<xs:enumeration value="parallel"/>
<xs:enumeration value="psalm"/>
<xs:enumeration value="sub"/>
>>>>>>>>>> I think value="chapter" would work

Citation_Line1 (etc): Question, you list lg but I assume l?
>>>>> <citation> would be preferred

BTW, we have otPassage on seg. So, would <l><seg>...</seg></l>, meet the 
need here?

Err, actually just looked at the John the Baptist example and so you 
would want something like otPassage on lg. Hmmm, then you could do the 
line1, line2, etc. with XSLT. Actually would reduce the amount of markup 
you would need since if I am in a lg type=otPassage, then lines 1, 2, 
etc. fall out from the structure. I think that works for me if it works 
for you.
>>>> yes that is where I would want it .... but citation is more generic
>>>> I have cases where in the introduction a key verse in the following 
book
>>>> is cited ... thus citation rather than otPassage.
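
To show how line1, line2, etc. could fall out from the structure, here is a small sketch. The type value "otPassage" on lg is only the proposal under discussion (a more generic citation value is preferred in the reply above). The function name line_styles is hypothetical, and the line text is placeholder text.

```python
import xml.etree.ElementTree as ET

sample = """<div>
  <lg type="otPassage">
    <l>first line</l>
    <l>second line</l>
  </lg>
</div>"""


def line_styles(xml_text, group_type="otPassage"):
    """Derive Citation_Line1, Citation_Line2, ... from position inside an lg.

    Hypothetical helper: no per-line markup is needed, because the style
    name follows from the lg type and the line's position.
    """
    root = ET.fromstring(xml_text)
    styles = []
    for lg in root.iter("lg"):
        if lg.get("type") != group_type:
            continue
        for position, line in enumerate(lg.findall("l"), start=1):
            styles.append((f"Citation_Line{position}", line.text))
    return styles


print(line_styles(sample))
```

The same lookup would work with a different type value, for example a generic citation type, by passing another group_type.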

Citation_Paragraph: Looks like a block quote that contains a paragraph, 
which as you know can contain a reference element. I think this is what 
we used to call <cit> in TEI, which had a <q> followed by a <ref> (don't 
hold me to the names) element.
>>>>> Citation Paragraph and Citation Line1 are related ... if we can put
>>>>> a <cit> around them then we only need to put in p, l, ref ....

Not sure what would be different about using a block quote that contains 
a reference element.
>>>>>>block quote is descriptive
>>>>>> citation is meaning based

Citation_Reference: Why isn't this simply a reference with a type? That 
is to say all references are citation_references in some sense of the 
term. Since we have an element for marking all references, why not use 
that and add a type if necessary?
>>>>>>if we have <cit> or <citation> then just <ref> works ... the type is 
inferred by context


Closing: What is the problem with closer here?
>>>>>> two types of closers are needed ... one for end of book, end of preface, 
and the other for "says the Lord" in prophecies.

Congregational_Response: I will post this to the list. Suggestion for 
attribute value? response? congregation? on lg.
>>>>>>>either is okay with me

Copyright_Statement: Covered under rights in the header. This is the 
standard location in Dublin Core. Don't think we gain anything by adding 
another potential location for the information.
>>>>>>>> I see the need for three groups of things on the Copyright Page
>>>>>>>> this is still under development in TE
>>>>>>>> But for formatting the copyright page the three units are
>>>>>>>> Credits, Rights, and Copyright Statement ... with a possible code 
for internal control (SIL adds the job code here)
>>>>> see note on Rights below

Unless you mean to say that the copyright page as an artifact needs to 
be encoded. I suspect we could add a type to div but to be honest, I 
don't see the point. Just make it a div and if you need copyright 
information, get it from the header.
>>>> Yes I want div type="copyrightPage"
Credits: Same here, I could counsel just paragraphs with the usual 
sub-elements. Don't gain anything if you have properly prepared the 
header which has the very long enumeration of roles for credit, etc. I 
guess in part I don't see any reason to privilege a poor retelling of 
information already presented in a useful fashion in the header.
>>>>>> credits require the page number for each use ... so David C. Cook 
lets us use pictures found on page x,x,x,x,x .... which may need to be 
added by hand ... also sometimes the info in header is in English but in 
Credits it will be in national language.




Not saying people should not enter it, but it is just an artifact of 
printing and not something you will need to identify later for 
processing. For those purposes, use the header information.
>>>>>Printing is our main goal so it is much more than an artifact

Doxology: Let me check on that one. I know we have discussed, probably 
should be added.
>>>>> Yes .... found at end of each book of Psalms  .... usually formatted 
centered text

Embedded_text (all entries): We have discussed, posting to list for 
adding type to q.
>>>>>> q type="embeddedText" is great
>>>>>> div type="embeddedText" works too

Emphasis: ??? Sorry, why isn't this covered by hi?
>>>>>hi says what it LOOKS like
>>>>>In a text to speech conversion how do you say italic text? How do you 
say superscript?

Gloss: Hmmm, would require annotateRef so you could link to the word or 
phrase being glossed. Suggest same places and content model as hi?
>>>>>>>> yes it is similar to emphasis, hand, ..... hi is okay as long as 
there is <hi type="gloss">
or probably better <seg type="gloss">

Hand: Perhaps it is just confusion with the use of 'hand' to mean in 
transcription circles the scribal hand and not references to hand in the 
text, but I don't see this as a different element. Fair enough that the 
text says: 'in my own hand' but that does not seem to me to be a 
separate element in the structure of the text. We should discuss this 
one. I would suggest hi or seg at first blush.
>>>>>>>>>>>but  'in my own hand' is formatted differently very often so I 
need to distinguish it
>>>>>>>>>  <seg type="hand">

Inscription_Paragraph: Why doesn't the inscription element work here?
>>>>> inscription/p should work fine... I wasn't thinking

Interlude: Selah is enumerated under osisLine. Suggest same for Interlude?
>>>> good   Interlude and Selah are interchangeable

Intro (all): Suggest introduction type on div?
>>>>> yes   <div type="introduction"> already there

Line1-*: I assume from your notes you are handling these with XPath 
expressions?

Name_of_God: enumerate types, I assume you don't have any to add to the 
list I posted?
>>>>> this is for YHWH

Paragraph_Continuation: As we discussed, this is handled automatically 
in tree representations.
>>> yep

Parallel_Passage_Ref: Yes, handled by reference element
>>>>>>>>> yes

Quoted_Text: Why isn't this handled by q? As opposed to seg type = 
quoted text?
>>>>> quoted text is ot quote in nt ... as opposed to direct speech

Refrain: add type to lg?
>>>> yes

Rights: In terms of accessible information, handled by Dublin Core 
element in header. Is there some reason to duplicate here?
>>>>> may be enough here for content but 
>>>>> <p type="credits">
>>>>> <p type="rights">
>>>>> <p type="copyrightStatement">
>>>>> would really help for formatting

Section_Head_List: Do you mean a list within a list? If so, note that 
list contains list and all lists have head.
>>>>>> I would prefer <div type="list"> to allow for the A, B, C in Hebrew 
for PSA 119
>>>>>> and the 1, 2, 3 in PRO 22

See_in_Glossary: ?? The reference element is not empty. Reference is not 
limited to simply being Bible references but can contain a reference to 
another part of the work, such as a glossary, perhaps a map, etc.
>>>> the need is to be able to put a star in the printed text, and 
hyperlink in HTML
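
A minimal sketch of the two renderings asked for here, assuming the glossary entry is marked with a reference element. The type value x-glossary, the osisRef value, the "#" link format, and the function name render_reference are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_reference(xml_text, target):
    """Render a glossary reference as a starred word (print) or a link (HTML).

    Hypothetical helper: the star and the href format are illustrative only.
    """
    ref = ET.fromstring(xml_text)
    word = ref.text or ""
    href = ref.get("osisRef", "")
    if target == "print":
        return word + "*"
    return f'<a href="#{escape(href, quote=True)}">{escape(word)}</a>'


sample = '<reference type="x-glossary" osisRef="glossary.covenant">covenant</reference>'

print(render_reference(sample, "print"))
print(render_reference(sample, "html"))
```

The point is that one reference element can drive both outputs, so no separate See_in_Glossary element is needed for formatting.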

So_Called: Actually these are examples of the mentioned element.
>>>>> maybe a second look would help here..... I do have problems 
distinguishing them but
>>>>> I believe there are two categories "mentioned" and "so called"

BTW, the example in the help file is incorrect. The last occurrence of 
'sinners' is not a mentioned or so_called element. The Pharisees in the 
quoted statement were *using* the term. The preceding uses are 
examples of mentioned, that is, the speaker of those occurrences was not 
*using* the term.
>>>>>> thanks for catching that
>>>> see note above
>>>> it looks like the error is in the NIV as so called is formatted with 
quotes ....
>>>> Jesus uses sinners without quotes in the last line 
>>>> I think the Pharisees' comment would mean true sinners but the NIV 
put it in 'sinners'
>>>> so I have marked it correctly for the formatting of the text but NIV 
should change
>>>> NIV is the only English text so far in which I find "sinners" used.

Speech_lines (all): I assume you are going to handle these with XPath?
>>>>> ??

Stanza_Break: add type to lg?
>>>>yes  <lg type="stanza">
>>>>> just a few additions go a long way towards resolving my problems ... 
like <lg type="stanza">


Title_Main and Title_Tertiary: I assume the type on title works for 
main. Since sub titles should be inside of title, is there a need for 
another type here?
>>>>>>three levels exist: main, secondary, tertiary
>>>>>> I can only find  : main, sub in osis so would like one more 
>>>>>> : main, sub, sub (nested sub should work okay ... tertiary is very 
rare) 

Thinking <title type="main"> blah, blah <title type="sub">blah, blah 
<title type="sub">blah, blah</title> (closes tertiary title) </title> 
(closes secondary title) </title> (closes main title), which would allow 
you to do uniform XPath expressions for all cases.

If you are going to use styles, etc., so no one sees the XML, suggest 
the embedding method for more reliable XPath processing.
>>>>>> ?? please elaborate
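
One way to read the embedding suggestion, as a sketch only: if sub titles are nested inside their parent title, the level (main, secondary, tertiary) can be computed from nesting depth alone, so the same rule covers every case. The function name title_levels and the level names are assumptions. The title text is placeholder text.

```python
import xml.etree.ElementTree as ET

sample = """<title type="main">Main title
  <title type="sub">Secondary title
    <title type="sub">Tertiary title</title>
  </title>
</title>"""

LEVEL_NAMES = {1: "main", 2: "secondary", 3: "tertiary"}


def title_levels(element, depth=1, found=None):
    """Walk nested <title> elements and record (level name, text) pairs.

    Hypothetical helper: the level comes from how deeply a title is nested
    inside other titles, not from a distinct type value per level.
    """
    if found is None:
        found = []
    if element.tag == "title":
        name = LEVEL_NAMES.get(depth, f"level{depth}")
        found.append((name, (element.text or "").strip()))
        depth += 1
    for child in element:
        title_levels(child, depth, found)
    return found


print(title_levels(ET.fromstring(sample)))
```

With this layout, two nested type="sub" titles are enough to tell secondary from tertiary, which is why no third type value would be required.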

Untranslated_Word: ??? Sorry, you have me on that one and I could not 
find an example in the help file. Wouldn't this be foreign?
>>>>> foreign should work as it is in back translation and untranslated

Variant_Section_Paragraph/Head/Tail: Not sure what is being requested. 
Look at rdg. By definition, rdg is a variant so everything inside is 
about a variant. Perhaps if you could say a bit more about this one.
>>>>>> all of some endings to MRK are in italic showing it is a variant
>>>>> so <div type="variant"> would work well here

Verse_Number_Alternate: Hmmm, why not have more than one osisID, with 
the alternative prefixed by a work? Display is of course up to the 
application.
>>>>> so that would be <verse osisID="Genesis.32.1"><verse 
osisID="xxx:Genesis.32.2">
>>>>> that would work ... and also on Chapter Number Alternate
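
A small sketch of how an application might separate the primary and alternate numbering, assuming both IDs are carried as whitespace-separated values in a single osisID attribute and that the alternate carries a "work:" prefix. The reply above writes them as two verse elements instead. "xxx" is the placeholder work name from that reply, and split_osis_ids is a made-up helper name.

```python
def split_osis_ids(attribute_value):
    """Split an osisID attribute into (work, reference) pairs.

    Assumes whitespace-separated IDs and an optional "work:" prefix;
    IDs without a prefix get None as their work.
    """
    pairs = []
    for token in attribute_value.split():
        work, separator, reference = token.partition(":")
        if separator:
            pairs.append((work, reference))
        else:
            pairs.append((None, token))
    return pairs


print(split_osis_ids("Genesis.32.1 xxx:Genesis.32.2"))
```

How the alternate number is displayed (brackets, superscript, and so on) would remain up to the application.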

Words_of_Christ: I take it that the who attribute works for you?
>>>> who works

>>>>>>>>> <div type="tableOfContents"> also requested



Hope you are having a great day!

Patrick

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau at sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model

Topic Maps: Human, not artificial, intelligence at work!






