[sword-devel] OSIS Glosses?

DM Smith dmsmith at crosswire.org
Fri Dec 12 13:58:30 MST 2014


XML predefines 5 entities: &lt;, &gt;, &amp;, &quot; and &apos;. (Both " and ' are covered; HTML 4 is the one that lacks &apos;.) XML also allows numeric character references of the form &#ddd; (decimal) or &#xhhh; (hexadecimal). Any other entity needs to be defined in a DTD. A schema (an XSD in the case of OSIS) does not allow entities to be defined. (I’m not familiar with other schema types.)
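
A quick way to check this rather than rely on memory is to feed each reference to a parser with no DTD in play. The sketch below uses Python's xml.etree.ElementTree; the helper name parses and the wrapper element <a> are only for illustration. Anything the parser cannot resolve without a DTD comes back as None.

```python
import xml.etree.ElementTree as ET


def parses(fragment):
    """Return the text of <a>fragment</a>, or None if the parser rejects it."""
    try:
        return ET.fromstring("<a>" + fragment + "</a>").text
    except ET.ParseError:
        return None


# Named entities, tried without any DTD.
for name in ("lt", "gt", "amp", "quot", "apos", "nbsp"):
    print(name, repr(parses("&" + name + ";")))

# Numeric character references need no declaration at all.
print(repr(parses("&#160;")))   # decimal form
print(repr(parses("&#xA0;")))   # hexadecimal form
```

In this sketch &nbsp; is the one that fails, so a gloss containing it would need either a DTD that declares it or the numeric form &#160;.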

Regarding parsing and validation: an XML document may be well-formed but not valid. Well-formedness is the responsibility of the parser; validity is the responsibility of a validator. A validator takes its content from the parser, possibly as an in-memory tree, and compares it to a schema or DTD. As far as I know, what the validator gets has the entities already replaced.
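
A small sketch of that last point, assuming Python's xml.etree.ElementTree as the parser and a made-up element <w> with a lemma attribute: a colon written as the character reference &#58; is already a plain colon by the time anything downstream, such as a validator, looks at the value.

```python
import xml.etree.ElementTree as ET

# &#58; is the decimal character reference for ':'.
# The element name and attribute value are hypothetical.
elem = ET.fromstring('<w lemma="strong&#58;H0001"/>')
value = elem.get("lemma")

# The parser has already replaced the reference with the code point.
print(value)
print("&" in value)
```

So writing the colon as a character reference cannot hide it from a schema pattern; the pattern is applied to the expanded value.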

— DM

> On Dec 12, 2014, at 9:01 AM, Greg Hellings <greg.hellings at gmail.com> wrote:
> 
> If that's the case, how does it handle escaping <>? I believe entity replacement is after XML validation but before passing them to a transformer or such.
> 
> On Dec 12, 2014 7:52 AM, "DM Smith" <dmsmith at crosswire.org> wrote:
> Best I can recall:
> Nope. An entity is merely an alternate way of specifying a character. The XML parser is supposed to replace the entity with the corresponding code point before the value is evaluated against the schema.
> 
>> On Dec 12, 2014, at 8:49 AM, Greg Hellings <greg.hellings at gmail.com> wrote:
>> 
>> It should be possible to escape any such characters with an XML entity, no?
>> 
>> On Dec 12, 2014 7:44 AM, "DM Smith" <dmsmith at crosswire.org> wrote:
>> 
>> > On Dec 12, 2014, at 8:26 AM, Peter Von Kaehne <refdoc at gmx.net> wrote:
>> >
>> > Sent: Friday, 12 December 2014 at 13:16
>> > From: "Troy A. Griffitts" <scribe at crosswire.org>
>> >
>> >> Not sure, but I thought we used optional prefixes to specify the kind of gloss if there are multiple, e.g., gloss="en_US:18&nbsp;wheeler en_UK:articulated&nbsp;lorry"
>> >
>> > Should there be an option to escape colons?
>> 
>> IMHO:
>> Yes.
>> 
>> The definition of gloss in the schema is xs:string, not osisGenRegex.
>> The former places no semantics on the content and allows for an empty string.
>> 
>> If gloss should have a semantic, then it should be changed in the OSIS spec.
>> 
>> The latter is used by lemma and morph and is specified as:
>> ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)
>> which basically is work:value.
>> If I read this right, it does not allow for : to be escaped. I know we allow lemma="x:a y:b", but I don’t see that this allows for the pattern to be repeated, separated by spaces.
>> 
>> The pattern would need to change ([^:\s])+ to (\\:|[^:\s])+  [ not tested ]
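
A rough way to try the two patterns, as a sketch only: Python's re module is not an XSD regex engine, so this is an approximation under stated assumptions. \w stands in for (\p{L}|\p{N}|_), XSD's \s is spelled out as space, tab, newline and carriage return (Python's \s would also match U+00A0), and fullmatch mimics the implicit anchoring of XSD patterns. The names CURRENT, PROPOSED and check are made up for illustration.

```python
import re

# Assumption: \w approximates (\p{L}|\p{N}|_), which Python's re cannot spell.
WORD = r"\w"
# XSD's \s is only space, tab, newline and carriage return.
XSD_S = r" \t\n\r"
# Optional "work:" prefix, as in the schema pattern.
PREFIX = rf"(?:{WORD}+(?:\.{WORD})*:)?"

# Pattern as quoted from the schema.
CURRENT = re.compile(rf"{PREFIX}[^:{XSD_S}]+")
# Suggested change: also accept a backslash-escaped colon in the value.
PROPOSED = re.compile(rf"{PREFIX}(?:\\:|[^:{XSD_S}])+")


def check(pattern, value):
    """True if the whole value matches, as an XSD pattern facet requires."""
    return pattern.fullmatch(value) is not None


for value in ("strong:H0001", "x:a y:b", r"x:a\:b", "x:a:b"):
    print(repr(value), check(CURRENT, value), check(PROPOSED, value))
```

Under these assumptions neither pattern accepts the space-separated form x:a y:b, and only the changed pattern accepts a backslash-escaped colon in the value.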
>> 
>> In His Service,
>>         DM
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
