[osis-users] Validation related OSIS questions

Markku Pihlaja markku.pihlaja at sempre.fi
Wed Nov 14 08:14:35 MST 2012


Thanks DM,

The n attribute does seem quite appropriate indeed:
"...may be used to provide a name or number to identify the
particular element instance. However, it should not be used to encode a
value for which the osisID, osisRef, or other attribute is applicable."

I've always thought of an attribute called "n" as holding an ordinal
number, which is why I managed to overlook it previously.
But indeed, it looks like the right place for the Finnish info.
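
For anyone following along, here is a minimal sketch in Python of how a consumer of the file might rebuild the Finnish reference from the n attributes instead of having it repeated on every verse. The fragment is a simplified, hypothetical one modelled on DM's example: real OSIS files carry a namespace and more attributes, and the function name is only an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment based on DM's example.
# Real OSIS documents use a namespace and further attributes (sID/eID etc.).
OSIS_FRAGMENT = """
<div type="book" osisID="Gen" n="1. Moos.">
  <chapter osisID="Gen.3" n="3">
    <verse osisID="Gen.3.8" n="8">verse text goes here</verse>
  </chapter>
</div>
"""

def local_references(xml_text):
    """Map each verse osisID to a display reference built from the n attributes."""
    book = ET.fromstring(xml_text)
    refs = {}
    for chapter in book.iter("chapter"):
        for verse in chapter.iter("verse"):
            # Book abbreviation comes from the div, numbers from chapter and verse.
            refs[verse.get("osisID")] = (
                f'{book.get("n")} {chapter.get("n")}:{verse.get("n")}'
            )
    return refs

print(local_references(OSIS_FRAGMENT))
```

This only works when chapters and verses are container elements; with milestoned verses the lookup would have to track the current book and chapter while walking the document.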

Thanks a lot!


My question 2) is still open, though: how to mark identical dashes (or other
characters, or even strings) that serve different purposes. I'd appreciate
a reply from anyone.
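
To make question 2) concrete, here is a minimal Python sketch of what a consumer could do if the two dash usages were marked with the milestone types I proposed. Whether such markup validates as OSIS is exactly what I'm asking, so the sketch assumes it does. The mapping to output characters is a hypothetical house style, and the element names are un-namespaced for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical house style: each custom milestone type gets its own output string.
DASH_FOR_TYPE = {
    "x-punctuation-dash": "\u2014",  # dash used as punctuation inside a sentence
    "x-range-dash": "\u2013",        # dash used in chapter/verse ranges
}

def render(element):
    """Flatten an element to text, replacing typed dash milestones on the way."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "milestone":
            # Unknown milestone types are rendered as nothing.
            parts.append(DASH_FOR_TYPE.get(child.get("type"), ""))
        else:
            parts.append(render(child))
        parts.append(child.tail or "")
    return "".join(parts)

TITLE = (
    '<title>Second Speech of Moses (4:44'
    '<milestone type="x-range-dash" marker="mdash"/>11:32)</title>'
)
print(render(ET.fromstring(TITLE)))
```

With this kind of markup, a search for type="x-range-dash" finds only the ranges, so a global replace cannot touch the punctuation dashes by accident.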

Markku



2012/11/14 DM Smith <dmsmith at crosswire.org>

> The n attribute is probably the best for what you want. It is useful for
> footnote markers, verse numbers (especially when the verse is really a
> passage, à la The Message), ...
>
> So, the markup would be something like (not showing all attributes):
> <div type="book" n="1. Moos.">
>   <chapter n="3">
>     <verse n="8">
>
> You typically wouldn't want to put the full reference on the verse.
>
>
> In Him,
> DM
>
> On Nov 14, 2012, at 5:40 AM, Markku Pihlaja <markku.pihlaja at sempre.fi>
> wrote:
>
>
> Thanks Peter,
>
> I guess I need to enlighten you on the goals of our project a bit.
>
> But since this is a rather lengthy explanation, I'll just mention to those
> who read my original questions that two questions still remain relevant:
>
> -----------
>
> 1) Is it possible to declare new custom attributes on tags, such as
> <verse osisID="Gen.3.8" sID="Gen.3.8" x-FI_ID="1. Moos. 3:8" />
> and if so, how do I do that?
>
> 2) Is this ok for encoding the same special character for different usages:
> <milestone type="x-punctuation-dash" marker="mdash" />
> <milestone type="x-range-dash" marker="mdash" />
>
> -----------
>
> And now to the explanation.
>
> We are not creating a file solely for programmers to use in producing
> different electronic applications of the Bible. Actually, that might not
> even be our primary goal. Or well, it is one of two. The main point of this
> project is to update the official source version of the current official
> Finnish translation of the Bible, which was made in 1992. The source files
> are word processor files also from that time and in need of technical
> updating.
>
> In addition to just updating the file format, we naturally also want to
> supply structural info about the contents instead of the more formatting
> oriented info in the old versions.
>
> There are two different main target groups for this source file. Those
> interested in electronic applications are one - a growing one, admittedly.
> But we also want to serve the more traditional target group: book
> publishers. This is why our goal is not to produce highly optimized XML
> code but rather consider also "readability" and "usability" of the code.
> Those words are in quotation marks because we're not talking about purely
> human reading. But we admit that the technical facilities of book
> publishers might not always be perfect, and in particular using some
> external library or coding in general might be quite an effort for some of
> them. So the processing might be half-human - that means for example using
> search & replace operations to produce the desired formatting for a printed
> product.
>
> Including the Finnish abbreviation in every single verse might be
> overkill, I admit. The abbreviation seldom gets printed with every verse,
> and usually not with every chapter, either. But for those cases when it
> does, we want to supply - as an alternative - something easier than code
> libraries and lookup tables. The extra x-FI_ID attribute would enable
> relatively simple search & replace operations to reach the goal. I also
> believe it might make things easier for the programmer as well.
>
> I'm perfectly aware that from a purely technical point of view it does
> mean loads of redundant data. But in database terms, we don't need normal
> forms for our database, we'd rather supply more flexible alternative ways
> of processing the data.
>
> -----------
>
> As for encoding the ranges in titles as ranges: I will very probably do
> that. But I'm still keeping in mind the search & replace type of publisher
> / editor. Having done quite a lot of searching & replacing myself, I know
> how easy it is to do a global replace that accidentally also turns Bill
> Gates into Bsick Geatss ;), especially when dealing with too large amounts
> of data to confirm every replace. That was my original motivation for
> avoiding use of e.g. the actual em dash character "–" for several different
> purposes. And the display part of the range would still include the dash
> even though the range was encoded.
>
> But yes, it certainly also makes sense to me what you suggest: that for an
> application it's definitely a better approach to search for a range markup
> than scan for characters that look like a range.
>
> About the SWORD library:
> We are planning to include some references to tools that can be utilized
> with our OSIS file and will be glad to mention SWORD there. But could you
> write me a "hard facts" nutshell about it? I tried browsing the SWORD
> website for a while but didn't really find some essential info, such as
> what programming language the library is for - and what exactly it is meant
> for. For example "Research manipulation of Biblical texts" doesn't really
> say much.
>
> Markku
>
>
> -----------
>
> 2012/11/13 Peter von Kaehne <refdoc at gmx.net>
>
>> On 13/11/12 16:29, Markku Pihlaja wrote:
>>
>>> Thanks again!
>>>
>>> I'll first give you one further question and then comment on your
>>> previous answers.
>>>
>>> What would be a good way of including language-specific versions of verse
>>> and chapter IDs in the markup? I previously checked here that osisIDs have
>>> to use the standard keywords and syntax. But I'd love to be able to
>>> supply the Finnish abbreviation of each verse as additional information.
>>> That is: when the osisID of a verse is "Gen.3.8", it would make life
>>> much easier for users of this OSIS file if the verse also somehow
>>> contained the Finnish standard notation "1. Moos. 3:8".
>>>
>>
>> I think this is in general total overkill. Assuming you are not creating
>> an OSIS document because OSIS is so beautiful, but in order to use it in
>> software, you have two problems:
>>
>> 1) how to find a Finnish referenced verse ("1. Moos. 3:8") or verse range,
>>
>> 2) ensure appropriate display of references.
>>
>> Both can be easily solved without having this kind of information
>> repeated a million times within the text.
>>
>> For (1) you need a parsing solution which will parse arbitrary Finnish
>> references and create an OSIS reference from that. CrossWire's Sword
>> library does that and does it well.
>>
>> For (2) - again, this is a matter of simple lookup in a table during
>> rendering. Again, the Sword library would solve that for you.
>>
>> In essence, you should not look at the OSIS references as English
>> abbreviations but as tokens for the computer which happen to be somewhat
>> similar to English abbreviations.
>>
>>
>>> The "obvious" way would be to be able to add a new attribute to the
>>> verse tag, like:
>>> <verse osisID="Gen.3.8"  sID="Gen.3.8" FI_ID="1. Moos. 3:8" />
>>> but that probably isn't possible, is it? Or can I somehow declare new
>>> custom attributes like Chris declared new custom dash entities in his
>>> last reply?
>>>
>>
>> As an aside, any extra attributes you wish to use should look like
>> x-MyAttribute, so here e.g. "x-fi_ID".
>>
>>> 2012/11/8 Peter von Kaehne <refdoc at gmx.net>
>>>
>>>
>>>      > I'd like to be able to use some code or entity instead of actual
>>>      > dash characters (– or —), at least in some places, since we have
>>>      > two different semantics for the dashes and I'd like to keep them
>>>      > separate in the code.
>>>
>>>     Don't have an answer for that, but what is the semantic and is there
>>>     not a better way to code it than the somewhat arbitrary length of a
>>>     dash character?
>>>
>>>
>>> That's a fair question. Indeed it would be nice to find a better way
>>> (I'm not using the length to separate these cases but just different
>>> notations of the same length), but I haven't (at least yet) found the
>>> better way.
>>>
>>> The two different cases are normal em dashes within sentences as
>>> punctuation – just like the dashes in this sentence – and then to
>>> indicate a range of chapters and verses in some headings. The latter is
>>> not in the markup but in the content to be printed (or otherwise shown
>>> to the reader). For example: "Second Speech of Moses (4:44–11:32)" just
>>> before Deut.4.44. The range has been included in the official
>>> translation by the translation committee and thus cannot be omitted.
>>>
>>
>> References as part of titles exist in OSIS and would encode what you want
>> to encode.
>>
>>
>>> At least in Finnish we nowadays use the em dash to indicate ranges as
>>> well as punctuation. And I'd just like to enable the users of this OSIS
>>> file to search for one or the other without getting ambiguous or extra
>>> results.
>>>
>>
>> So, if you encode the range properly, your search should then go for the
>> range/passage rather than simply for a string of text which happens to
>> look like a reference. Does this make sense?
>>
>>> 2012/11/9 Chris Little <chrislit at crosswire.org>
>>>
>>>
>>>         How would you suggest that an exception like this should be
>>>         coded? Add some custom type attribute value to indicate special
>>>         handling in layout?
>>>
>>>
>>>     This was exactly the case for which <chapter> was made milestonable.
>>>     You can switch all of your chapter elements to milestones:
>>>
>>>
>>> I was hoping for some other solution. My impression is that these
>>> milestone versions of structure indicators weaken the value and
>>> usability of markup: I'd guess there are numerous tools that assume
>>> "strong" markup where at least the basic structures are marked with
>>> proper start and end tags instead of milestones.
>>>
>>
>> You are right wrt generic XML tools. Specifically, a DOM query or an
>> XPath-based query that picks up what is easily described as a "child" of
>> a verse or chapter is a lot more complicated to write if the start and
>> end tags are milestoned.
>>
>> But bear in mind, again, that the Sword library is an entirely different
>> tool, not an XML tool, and it is set up to give you fine-grained access
>> irrespective of XML niceties.
>>
>> Peter
>>
>>
>>
>>
>> _______________________________________________
>> osis-users mailing list
>> osis-users at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/osis-users
>>
>