[sword-devel] [osis-editors] Re: The death of OSIS?

Fri Aug 13 21:25:46 MST 2004

I think that *a* goal of OSIS should be that stylesheet processing 
is fast and efficient, and that this goal should influence decisions 
on the schema.

To sum up the point I have been trying to make: the 
<milestone type="cquote"/> does not contain enough information to be 
useful beyond the simple case. It needs at least one more well-defined 
attribute.

I have tried to carefully answer your comments. I am merely trying to 
make this one point. But I am afraid that I rambled much too much and 
brought up too many other points.

Patrick Durusau wrote:

> Recall that a milestone marks a point in the text stream, and if you 
> want to use it for continued quotes, those must appear in the PCDATA 
> where you want the continued quote to appear. These are not nodes on an 
> axis so they must appear within whatever PCDATA you are trying to mark.

I would have seen this if I had looked at the schema more carefully. I 
don't think that the mistake changes either the question or the answer.

> 
>> It would be helpful to know what quote it is a continuation of.
>>
> 
> And that is difficult, why?

Perhaps, I am making it more difficult than it really is. Sometimes, I 
can be a bit dense. Maybe I am missing something very obvious.

First my minimal, beginning, simple assumptions (based upon a single 
example from a single language, namely English).

When you are continuing a quotation that is surrounded by " " you need 
to use a ". And when you continue a quote surrounded by ' ' you need to 
use a '. Quotes can nest deeper, but let's pretend for the moment that 
they can only nest one deep.

(As a coder by nature, I am trying to look for a coding solution)

I presume that the milestone (sID/eID) or container form of an element 
is used in a document but not both (just like it says in the OSIS 2.0.1 
draft manual for verse tags).

The problem I see with XSLT is that it does not have much of a memory. 
If XSLT could remember, each time it emitted a quote mark, whether it 
was a begin or an end quote, and if it knew the series of quote marks 
(i.e. first ", then '), then it would be simple to emit the same 
continuation mark as the one that preceded. This might be doable with 
mode="" on templates. You would need one mode for each level of nesting 
that you allow (three for q of q of q). I think that this is kind of 
messy.
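
A minimal sketch of the "memory" idea, written in Python rather than 
XSLT only because it is shorter to show. The event names and the 
two-entry mark table are assumptions made for this illustration; they 
are not part of OSIS. A stack holds the marks of the quotes that are 
currently open, so a continuation simply repeats the top of the stack.

```python
# Illustrative only: the event names and the two-level mark table are
# assumptions made for this sketch, not part of OSIS or XSLT.
MARKS = ['"', "'"]  # outer quote mark, then the mark for a nested quote


def render(events):
    """Emit text, remembering the currently open quotes on a stack."""
    out = []
    stack = []  # marks of the quotes that are open right now
    for kind, value in events:
        if kind == "text":
            out.append(value)
        elif kind == "open":
            mark = MARKS[len(stack) % len(MARKS)]
            stack.append(mark)
            out.append(mark)
        elif kind == "close":
            out.append(stack.pop())
        elif kind == "cquote":
            # Continuation: repeat the mark of the innermost open quote.
            out.append(stack[-1])
    return "".join(out)


events = [
    ("open", None), ("text", "First paragraph. "),
    ("cquote", None), ("text", "Second paragraph."),
    ("close", None),
]
print(render(events))
```

XSLT 1.0 has no mutable stack like this, which is why the state would 
have to be carried some other way, such as through modes or parameters.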

Otherwise, when XSLT sees <milestone type="cquote"/> it will need to 
scan in reverse for the start of the quote. Since the quote could be 
encoded as a milestone (sID/eID) or as a container, there would have to 
be code for both. This would be worse and it would be much slower. In 
XSLT looking for things is always much slower than knowing things.
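
For contrast, a rough sketch of the reverse scan, again in Python and 
again with an assumed flat event list standing in for the document. 
Each continuation has to walk backward over everything that precedes 
it, skipping quotes that were already closed. A real stylesheet would 
need this logic twice, once for the milestone (sID/eID) form and once 
for the container form.

```python
# Illustrative only: a flat event list stands in for the document, and the
# event names are assumptions made for this sketch.
def find_open_quote(events, index):
    """Scan backward from a cquote to the innermost quote still open."""
    closed = 0  # quotes that were closed again before the cquote
    for i in range(index - 1, -1, -1):
        kind = events[i][0]
        if kind == "close":
            closed += 1
        elif kind == "open":
            if closed == 0:
                return i  # this quote is still open at the cquote
            closed -= 1   # this open matches a close we already passed
    return None  # the cquote is not inside any quote


events = [
    ("open", '"'), ("text", "He said, "),
    ("open", "'"), ("text", "Go."), ("close", "'"),
    ("cquote", None),
]
start = find_open_quote(events, 5)
print(events[start][1])
```

The cost grows with the distance back to the start of the quote, and 
it is paid again for every continuation mark.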

However, it would be much simpler if the <milestone type="cquote"/> had 
another attribute which could tie back to the start of the quote, or 
which gave the depth of the quoting. I took a look at the schema and I 
could not tell from the comments whether any of the attributes of a 
milestone element would be appropriate for this.
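
To illustrate why an extra attribute helps, here is a small sketch 
that assumes a hypothetical depth attribute on the milestone. No such 
attribute is defined for this purpose in the schema as far as I can 
tell; the name is made up for the example. With it, the continuation 
mark becomes a direct lookup with no scanning and no remembered state.

```python
import xml.etree.ElementTree as ET

# Hypothetical: "depth" is NOT an OSIS attribute; it is invented here to
# show what a well-defined extra attribute would make possible.
MARKS = {"1": '"', "2": "'"}


def continuation_mark(milestone):
    """Pick the continuation mark from the milestone's own attribute."""
    return MARKS[milestone.get("depth")]


fragment = ET.fromstring(
    '<p><milestone type="cquote" depth="2"/>and he went on.</p>'
)
ms = fragment.find("milestone")
print(continuation_mark(ms))
```

An attribute that pointed back to the start of the quote would work 
just as well; the point is only that the information travels with the 
marker.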

> 
> Recall that the encoder, not the stylesheet is marking the location of 
> the continued quote. If we are going that far, seems like the careful 
> encoder will use subtype to indicate a particular type of quote they 
> want to appear.

According to the schema subType takes an x-value.

The problem with any of the attributes that take values that begin with 
x- is that they are all non-standard and require customization of 
software to handle them correctly.

I don't think that it is in the best interest of the OSIS standard to 
recommend their use.

> 
> That is to say the use of continued quotes was to allow the translator 
> to insert a marker for the punctuation they desired. In other words, no 
> computational processing, simply recognition of the marker and insertion 
> of the proper character.

I am not sure what you mean by translator here. Do you mean a person who 
translates a work or do you mean XSLT? I think you mean a person.

The marker as specified in your email does not indicate which quotation 
mark it stands for. I understood the example to mean that " was the 
first quote mark, and ' was the quote mark for contained quotes.

> 
> Having said all of that, why are we now concerned about what quote it is 
> a continuation of? I understood the goal to be avoiding that question by 
> allowing the translator to insert the continued quote mark.

I thought the purpose of the mechanism was to keep the encoder from 
inserting the quotation marks and to let the stylesheet insert them.

If so, I think it needs some more help to allow the stylesheet to do it 
efficiently.

> 
> If you don't want to compute the containing quotes, then don't. Don't 
> see the advantage in having it partially represented by a marker and 
> partially computed. A clean solution would require doing it one way or 
> the other.

It is not a question of what I want. I think that it is a question of 
preserving the intellectual property of the writer of the work being 
encoded in OSIS. I think that it is my responsibility as a coder to 
write the computation if it preserves their work.

> 
>>  From a selfish point of view, I agree w/ Troy, I don't want to have 
>> to know the language of the document and the kinds of quotes that are 
>> used for that language.
>>
> 
> Simply because XML/XSLT gives you the ability to do something does not 
> compel you to use it.
> 

I am not sure what you mean. Sword has several hundred modules, which I 
can imagine will be migrated to OSIS some day. Internally JSword 
converts non-OSIS and fragments of an OSIS document to valid OSIS. It 
then merely uses its single stylesheet to present valid OSIS to the user.

> Using the mechanisms already present in OSIS, you can encode the 
> quotation marks that have been mentioned, and still have the ability to 
> distinguish the use of apostrophes for instance, something no one has 
> commented about.

I thought that it was clear that having the q element and the cquote 
milestone means that a non-quote apostrophe is not an issue.

> 
> In order to localize an interface, which I assume will be displaying 
> material from the text, I would suggest knowing something about the 
> language beyond simply following what a translator has done is probably 
> essential.

The interface localization is entirely independent of displaying a text 
which may be in a different language than the user's. The JSword goal is 
to visually present the text just as if the user had opened up a book.

> If you don't understand why something has been done, which 
> may depend upon both linguistic and cultural context, you may very well 
> be creating an interface that impedes rather than enhances access to the 
> text.
> 

The exact point is that embedding this knowledge into a program is very 
problematic, error prone and perhaps, as they say in computer science, 
"hard".

But what we are discussing here is merely the continuation mark. If a 
paragraph starts within a quotation and that paragraph is to display a 
continuation mark, then in English it is to be the same mark as began 
the quotation.

>> Can you suggest a way to determine the proper quoting if the text were 
>> Hindi? Hebrew? Greek? Spanish? French?
>>
> 
> Quotation rules vary not only by language but by time period. The rules 
> for modern usage, I assume that is what we are discussing?, can be found 
> in any text treating a particular language.

This is very true. While I would prefer to have the same quotation 
mark that is in the text as it was published, I was not trying to 
drive to that end.

I was trying to see how your suggestion would work for a situation that 
is more complicated.

Specifically, I was asking how a language independent stylesheet could 
be made aware of the correct quotation system for the fragment of the 
document that is being processed. I assume it would be via parameter 
passing, or encoded in the markup or text.
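
As a sketch of the parameter-passing option: the stylesheet could be 
handed the language of the fragment and look up the marks in a table. 
The table below is an assumption for illustration only. It lists 
commonly cited modern conventions for three languages, is far from 
complete, and does not try to cover variation by publisher or period.

```python
# Illustrative table: (open, close) marks per nesting level, keyed by
# language code. The entries are common modern conventions, not a rule set.
QUOTE_SYSTEMS = {
    "en": [("\u201c", "\u201d"), ("\u2018", "\u2019")],
    "fr": [("\u00ab", "\u00bb"), ("\u201c", "\u201d")],
    "de": [("\u201e", "\u201c"), ("\u201a", "\u2018")],
}


def quote_marks(lang, depth):
    """Return (open, close) marks for a quote at the given 1-based depth."""
    system = QUOTE_SYSTEMS[lang.split("-")[0]]  # "en-US" falls back to "en"
    return system[(depth - 1) % len(system)]    # alternate past the last level


print(quote_marks("fr", 1))
print(quote_marks("en-US", 2))
```

The language passed in would be that of the document fragment, not the 
locale of the reader.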

It is my opinion that the quotation marks should be in the language of 
the document and not that of the reader (when the reader is of a 
different locale).

> 
> There has been a lot of work done on localization and over the weekend I 
> will see if I can find some pointers to punctuation issues. That is the 
> most likely place to find a set of rules already mapped out.
>

Again, we are not trying to localize the text to the reader, but to the 
writer. This is of critical importance.

> Will see what I can come up with.
> 
> Hope everyone is looking forward to a great weekend!
> 
> Patrick