[osis-users] Validation related OSIS questions

DM Smith dmsmith at crosswire.org
Wed Nov 14 08:05:19 MST 2012


The n attribute is probably the best for what you want. It is useful for footnote markers, verse numbers (especially when the verse is really a passage, à la The Message), ...

So, the markup would be something like (not showing all attributes):
<div type="book" n="1. Moos.">
<chapter n="3">
<verse n="8">

You typically wouldn't want to put the full reference on the verse.
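To illustrate DM's point, here is a minimal Python sketch (standard-library ElementTree only) that builds a display reference such as "1. Moos. 3:8" from the n attributes of the enclosing book, chapter and verse elements, so the full Finnish reference never has to be stored on each verse. The fragment is simplified and hypothetical: the OSIS namespace, the osisText wrapper and most attributes are omitted, and verses are written as container elements rather than milestones.

```python
import xml.etree.ElementTree as ET

# Simplified OSIS-like fragment (namespace and most attributes omitted).
SAMPLE = """
<div type="book" osisID="Gen" n="1. Moos.">
  <chapter osisID="Gen.3" n="3">
    <verse osisID="Gen.3.8" n="8">...</verse>
    <verse osisID="Gen.3.9" n="9">...</verse>
  </chapter>
</div>
"""

def display_refs(xml_text):
    """Return (osisID, display label) pairs, building each label from the
    n attributes of the book, chapter and verse elements."""
    root = ET.fromstring(xml_text)
    book_n = root.get("n")
    pairs = []
    for chapter in root.iter("chapter"):
        for verse in chapter.iter("verse"):
            label = f"{book_n} {chapter.get('n')}:{verse.get('n')}"
            pairs.append((verse.get("osisID"), label))
    return pairs

for osis_id, label in display_refs(SAMPLE):
    print(osis_id, "->", label)
```

The label is composed at processing time, so changing the book abbreviation means editing one n attribute rather than every verse.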


In Him,
	DM
On Nov 14, 2012, at 5:40 AM, Markku Pihlaja <markku.pihlaja at sempre.fi> wrote:

> 
> Thanks Peter,
> 
> I guess I need to enlighten you on the goals of our project a bit.
> 
> But since this is a rather lengthy explanation, I'll just mention to those who read my original questions that two questions still remain relevant:
> 
> -----------
> 
> 1) Is it possible to declare new custom attributes for tags, such as
> <verse osisID="Gen.3.8"  sID="Gen.3.8" x-FI_ID="1. Moos. 3:8" />
> and how do I do that, if it is possible?
> 
> 2) Is this ok for encoding the same special character for different usages:
> <milestone type="x-punctuation-dash" marker="mdash" />
> <milestone type="x-range-dash" marker="mdash" />
> 
> -----------
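A minimal Python sketch of why distinct milestone types help: when both usages are encoded as in the second question, a consumer can render or count one kind of dash without touching the other. The sample paragraph and the choice of output characters are assumptions for illustration, and the renderer only handles a flat paragraph whose child elements are all milestones.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with one punctuation dash and one range dash.
# Both use marker="mdash"; only the type attribute tells them apart.
SAMPLE = (
    '<p>He spoke <milestone type="x-punctuation-dash" marker="mdash"/> and left. '
    '(4:44<milestone type="x-range-dash" marker="mdash"/>11:32)</p>'
)

# Assumed output characters per usage; a publisher could choose differently.
DEFAULT_DASHES = {
    "x-punctuation-dash": "\u2014",  # em dash
    "x-range-dash": "\u2013",        # en dash
}

def render(xml_text, dashes=DEFAULT_DASHES):
    """Flatten a single <p> to plain text, replacing each dash milestone
    with the character chosen for its type."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "milestone":
            parts.append(dashes.get(child.get("type"), ""))
        parts.append(child.tail or "")
    return "".join(parts)

def count_dashes(xml_text, dash_type):
    """Count only the dashes of one usage, e.g. ranges."""
    root = ET.fromstring(xml_text)
    return sum(1 for m in root.iter("milestone") if m.get("type") == dash_type)

print(render(SAMPLE))
print(count_dashes(SAMPLE, "x-range-dash"))
```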
> 
> And now to the explanation.
> 
> We are not creating a file solely for programmers producing various electronic applications of the Bible. Actually, that might not even be our primary goal; or rather, it is one of two. The main point of this project is to update the official source version of the current official Finnish translation of the Bible, which was made in 1992. The source files are word processor files, also from that time, and in need of technical updating.
> 
> In addition to just updating the file format, we naturally also want to supply structural info about the contents instead of the more formatting oriented info in the old versions.
> 
> There are two different main target groups for this source file. Those interested in electronic applications are one - a growing one, admittedly. But we also want to serve the more traditional target group: book publishers. This is why our goal is not to produce highly optimized XML code but to consider also the "readability" and "usability" of the code. Those words are in quotation marks because we're not talking about purely human reading. But we admit that the technical facilities of book publishers might not always be perfect, and in particular using an external library, or coding in general, might be quite an effort for some of them. So the processing might be half-human: for example, using search & replace operations to produce the desired formatting for a printed product.
> 
> Including the Finnish abbreviation in every single verse might be overkill, I admit. The abbreviation seldom gets printed with every verse, and usually not with every chapter either. But for those cases when it is, we want to supply - as an alternative - something easier than code libraries and lookup tables. The extra x-FI_ID attribute would enable relatively simple search & replace operations to reach the goal. I believe it might make things easier for the programmer as well.
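The kind of search & replace Markku describes could look roughly like this Python sketch. It assumes verse start milestones carry the proposed, non-standard x-FI_ID attribute and that end milestones carry an eID; regular expressions over XML are fragile and only reasonable for a narrow, known pattern like this one.

```python
import re

# Hypothetical verse milestones carrying the proposed x-FI_ID attribute.
SAMPLE = (
    '<verse osisID="Gen.3.8" sID="Gen.3.8" x-FI_ID="1. Moos. 3:8"/>'
    'Text of the verse.'
    '<verse eID="Gen.3.8"/>'
)

# Start milestone: capture the Finnish reference stored in x-FI_ID.
START_RE = re.compile(r'<verse\b[^>]*\bx-FI_ID="([^"]*)"[^>]*/>')
# End milestone: any self-closing <verse> that carries an eID.
END_RE = re.compile(r'<verse\b[^>]*\beID="[^"]*"[^>]*/>')

def to_print(text):
    """Replace each verse start with a bracketed Finnish label and drop
    the verse end markers."""
    text = START_RE.sub(lambda m: f"[{m.group(1)}] ", text)
    return END_RE.sub("", text)

print(to_print(SAMPLE))
```

The same two patterns could be typed into a text editor's regex search & replace dialog, which is closer to the half-human workflow described above.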
> 
> I'm perfectly aware that from a purely technical point of view it does mean loads of redundant data. But in database terms, we don't need normal forms for our database, we'd rather supply more flexible alternative ways of processing the data.
> 
> -----------
> 
> As for encoding the ranges in titles as ranges: I will very probably do that. But I'm still keeping in mind the search & replace type of publisher / editor. Having done quite a lot of searching & replacing myself, I know how easy it is to do a global replace that accidentally also turns Bill Gates into Bsick Geatss ;), especially when dealing with amounts of data too large to confirm every replacement. That was my original motivation for avoiding the use of, e.g., the actual em dash character "–" for several different purposes. And the display part of the range would still include the dash even if the range were encoded.
> 
> But yes, what you suggest certainly makes sense to me too: for an application, it's definitely a better approach to search for range markup than to scan for characters that look like a range.
> 
> About the SWORD library:
> We are planning to include some references to tools that can be used with our OSIS file and will be glad to mention SWORD there. But could you write me a "hard facts" nutshell about it? I browsed the SWORD website for a while but couldn't find some essential info, such as which programming language the library is for and what exactly it is meant for. For example, "Research manipulation of Biblical texts" doesn't really say much.
> 
> Markku
> 
> 
> -----------
> 
> 2012/11/13 Peter von Kaehne <refdoc at gmx.net>
> On 13/11/12 16:29, Markku Pihlaja wrote:
> Thanks again!
> 
> I'll first give you one further question and then comment on your
> previous answers.
> 
> What would be a good way of including language versions of verse and
> chapter IDs in the markup? I previously checked here that osisIDs have
> to use the standard keywords and syntax. But I'd love to be able to
> supply the Finnish abbreviation of each verse as additional information.
> That is: when the osisID of a verse is "Gen.3.8", it would make life
> much easier for utilizers of this OSIS file if the verse also somehow
> contained the Finnish standard notation "1. Moos. 3:8".
> 
> I think this is, in general, total overkill. Assuming you are not creating an OSIS document because OSIS is so beautiful, but in order to use it in software, you have two problems:
> 
> 1) how to find a Finnish referenced verse ("1. Moos. 3:8") or verse range,
> 
> 2) ensure appropriate display of references.
> 
> Both can be easily solved without having this kind of information a million times repeated within the text.
> 
> For (1) you need a parsing solution which will parse arbitrary Finnish references and create an OSIS reference from that. CrossWire's Sword library does that and does it well.
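To make the parsing step concrete, here is a deliberately tiny Python sketch of turning a Finnish reference into an OSIS reference. This is not SWORD's API; the two-entry book table and the accepted syntax (book, chapter:verse, optional range with a hyphen or the dash "–") are assumptions, and a real parser needs the full book list and many more reference forms.

```python
import re

# Hypothetical, deliberately tiny table: Finnish abbreviation -> OSIS book.
FI_TO_OSIS = {
    "1. Moos.": "Gen",
    "5. Moos.": "Deut",
}

# Book, chapter:verse, and an optional end "verse" or "chapter:verse"
# after a hyphen or an en dash (U+2013).
REF_RE = re.compile(
    r"^(?P<book>.+?)\s+(?P<ch>\d+):(?P<v>\d+)"
    r"(?:[\u2013-](?:(?P<ch2>\d+):)?(?P<v2>\d+))?$"
)

def fi_to_osis(ref):
    """Convert e.g. '1. Moos. 3:8' to 'Gen.3.8'; raise ValueError otherwise."""
    m = REF_RE.match(ref.strip())
    if m is None or m.group("book") not in FI_TO_OSIS:
        raise ValueError(f"cannot parse reference: {ref!r}")
    book = FI_TO_OSIS[m.group("book")]
    start = f"{book}.{m.group('ch')}.{m.group('v')}"
    if m.group("v2") is None:
        return start
    end_chapter = m.group("ch2") or m.group("ch")
    return f"{start}-{book}.{end_chapter}.{m.group('v2')}"

print(fi_to_osis("1. Moos. 3:8"))
print(fi_to_osis("5. Moos. 4:44\u201311:32"))
```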
> 
> For (2) - again, this is a matter of simple lookup in a table during rendering. Again, the Sword library would solve that for you.
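The rendering direction is the table lookup Peter mentions. A minimal Python sketch, assuming plain OSIS references (no work prefix or sub-verse grain), a hypothetical two-entry table, and that both ends of a range lie in the same book:

```python
# Hypothetical, deliberately tiny table: OSIS book -> Finnish abbreviation.
OSIS_TO_FI = {"Gen": "1. Moos.", "Deut": "5. Moos."}

def _split(point):
    """'Gen.3.8' -> ('Gen', '3', '8'); assumes a plain book.chapter.verse ID."""
    book, chapter, verse = point.split(".")
    return book, chapter, verse

def osis_to_fi(osis_ref):
    """Render 'Gen.3.8' as '1. Moos. 3:8' and a same-book range such as
    'Deut.4.44-Deut.11.32' with a dash between start and end."""
    if "-" not in osis_ref:
        book, ch, v = _split(osis_ref)
        return f"{OSIS_TO_FI[book]} {ch}:{v}"
    start, end = osis_ref.split("-")
    book, ch, v = _split(start)
    _, end_ch, end_v = _split(end)
    # Within one chapter print '3:8' + dash + '10'; across chapters repeat
    # the chapter number on the end point.
    tail = end_v if end_ch == ch else f"{end_ch}:{end_v}"
    return f"{OSIS_TO_FI[book]} {ch}:{v}\u2013{tail}"

print(osis_to_fi("Gen.3.8"))
print(osis_to_fi("Deut.4.44-Deut.11.32"))
```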
> 
> In essence, you should not look at the OSIS references as English abbreviations but as tokens for the computer which happen to be somewhat similar to English abbreviations.
> 
> 
> The "obvious" way would be to be able to add a new attribute to the
> verse tag, like:
> <verse osisID="Gen.3.8"  sID="Gen.3.8" FI_ID="1. Moos. 3:8" />
> but that probably isn't possible, is it? Or can I somehow declare new
> custom attributes like Chris declared new custom dash entities in his
> last reply?
> 
> As an aside, any extra attributes you wish to use should look like x-MyAttribute, so here, e.g., "x-fi_ID".
> 
> 2012/11/8 Peter von Kaehne <refdoc at gmx.net>
> 
> 
>      > I'd like to be able to use some code or entity instead of actual dash
>      > characters (– or —), at least in some places, since we have two
>      > different semantics for the dashes and I'd like to keep them
>      > separate in the code.
> 
>     Don't have an answer for that, but what is the semantic and is there
>     not a better way to code it than the somewhat arbitrary length of a
>     dash character?
> 
> 
> That's a fair question. Indeed it would be nice to find a better way
> (I'm not using the length to separate these cases but just different
> notations of the same length), but I haven't (at least yet) found the
> better way.
> 
> The two different cases are normal em dashes within sentences as
> punctuation – just like the dashes in this sentence – and then to
> indicate a range of chapters and verses in some headings. The latter is
> not in the markup but in the content to be printed (or otherwise shown
> to the reader). For example: "Second Speech of Moses (4:44–11:32)" just
> before Deut.4.44. The range has been included in the official
> translation by the translation committee and thus cannot be omitted.
> 
> References as part of titles exist in OSIS and would encode what you want to encode.
> 
> 
> At least in Finnish we nowadays use the em dash to indicate ranges as
> well as punctuation. And I'd just like to enable the users of this OSIS
> file to search for one or the other without getting ambiguous or extra
> results.
> 
> So, if you encode the range properly, your search should then go for the range/passage rather than simply for a string of text that happens to look like a reference. Does this make sense?
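A minimal Python sketch of searching by markup rather than by characters, assuming the title range is encoded with a reference element as Peter suggests (namespace and other attributes omitted, sample text hypothetical): the range is found through its osisRef and the ordinary punctuation dash in the paragraph is never matched, while a plain character search finds both dashes.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical section: the title's range is encoded with
# <reference>, and the paragraph contains an ordinary punctuation dash.
SAMPLE = """
<div type="section">
  <title>Second Speech of Moses
    (<reference osisRef="Deut.4.44-Deut.11.32">4:44\u201311:32</reference>)</title>
  <p>A punctuation dash \u2013 not a range.</p>
</div>
"""

def title_ranges(xml_text):
    """Return (osisRef, displayed text) for every reference inside a title."""
    root = ET.fromstring(xml_text)
    return [
        (ref.get("osisRef"), ref.text)
        for title in root.iter("title")
        for ref in title.iter("reference")
    ]

print(title_ranges(SAMPLE))
# A naive character search counts both dashes and cannot tell them apart.
print(SAMPLE.count("\u2013"))
```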
> 
>   2012/11/9 Chris Little <chrislit at crosswire.org>
> 
> 
>         How would you suggest that an exception like this should be
>         coded? Add
>         some custom type attribute value to indicate special handling in
>         layout?
> 
> 
>     This was exactly the case for which <chapter> was made milestonable.
>     You can switch all of your chapter elements to milestones:
> 
> 
> I was hoping for some other solution. My impression is that these
> milestone versions of structure indicators weaken the value and
> usability of markup: I'd guess there are numerous tools that assume
> "strong" markup where at least the basic structures are marked with
> proper start and end tags instead of milestones.
> 
> You are right with regard to generic XML tools. Specifically, a DOM query or an XPath-based query that picks up what is easily described as a "child" of a verse or chapter is a lot more complicated to create if the start and end tags are milestoned.
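To illustrate Peter's point with Python's ElementTree: with container chapters, the verses of a chapter are one path query away, while with milestoned chapters you have to walk the document in order and track whether you are between the sID and eID markers. Both fragments are simplified and hypothetical.

```python
import xml.etree.ElementTree as ET

# Container markup: each chapter element encloses its verses.
CONTAINER = """
<div type="book" osisID="Gen">
  <chapter osisID="Gen.3">
    <p><verse osisID="Gen.3.8">...</verse> <verse osisID="Gen.3.9">...</verse></p>
  </chapter>
</div>
"""

# Milestone markup: empty chapter elements mark the start (sID) and end (eID).
MILESTONED = """
<div type="book" osisID="Gen">
  <chapter sID="Gen.3" osisID="Gen.3"/>
  <p><verse osisID="Gen.3.8">...</verse> <verse osisID="Gen.3.9">...</verse></p>
  <chapter eID="Gen.3"/>
  <chapter sID="Gen.4" osisID="Gen.4"/>
  <p><verse osisID="Gen.4.1">...</verse></p>
  <chapter eID="Gen.4"/>
</div>
"""

def verses_container(xml_text, chapter_id):
    """One path query: the verses are descendants of the chapter element."""
    root = ET.fromstring(xml_text)
    chapter = root.find(f".//chapter[@osisID='{chapter_id}']")
    return [v.get("osisID") for v in chapter.iter("verse")]

def verses_milestoned(xml_text, chapter_id):
    """Walk every element in document order, collecting verses that occur
    between the chapter's sID and eID milestones."""
    root = ET.fromstring(xml_text)
    inside = False
    found = []
    for el in root.iter():
        if el.tag == "chapter" and el.get("sID") == chapter_id:
            inside = True
        elif el.tag == "chapter" and el.get("eID") == chapter_id:
            inside = False
        elif inside and el.tag == "verse":
            found.append(el.get("osisID"))
    return found

print(verses_container(CONTAINER, "Gen.3"))
print(verses_milestoned(MILESTONED, "Gen.3"))
```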
> 
> But bear in mind, again, that the Sword library is an entirely different tool, not an XML tool, and it is set up to give you fine-grained access, irrespective of XML niceties.
> 
> Peter
> 
> 
> 
> 
> _______________________________________________
> osis-users mailing list
> osis-users at crosswire.org
> http://www.crosswire.org/mailman/listinfo/osis-users
> 
