[osis-users] osis.py

Weston Ruter westonruter at gmail.com
Sat Jun 26 07:23:36 MST 2010


Excellent questions, Robert.

The OSIS XML Schema has the following regular expression for the osisWork
type:

((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?

Which I'm simplifying in Python to (with re.UNICODE):

\w+(\.\w+)*
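
As a minimal sketch of how that simplified pattern could be applied (this
assumes Python 3, where \w is already Unicode-aware for str patterns, so the
re.UNICODE flag is kept only to mirror the note above; the helper name
is_valid_osis_work is made up for illustration and is not the osis.py API).
Keep in mind that \w only approximates the schema's \p{L}|\p{N}|_ classes:

```python
import re

# Simplified osisWork pattern from above; fullmatch() anchors it at both ends.
OSIS_WORK_RE = re.compile(r"\w+(\.\w+)*", re.UNICODE)

def is_valid_osis_work(value):
    """Return True if value matches the simplified osisWork pattern (hypothetical helper)."""
    return OSIS_WORK_RE.fullmatch(value) is not None

print(is_valid_osis_work("Bible.en.KJV"))     # True
print(is_valid_osis_work("Bible.en.OET-LV"))  # False: the hyphen is not a word character
```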

Note that this is even more restrictive than the passage part of an osisID:

((\p{L}|\p{N}|_|(\\[^\s]))+)(\.(\p{L}|\p{N}|_|(\\[^\s]))*)*

Which again I'm simplifying in Python to:

(\w|\\\S)+(\.(\w|\\\S)+)*

Note that for both osisWork and osisPassage, not even a bare hyphen is
technically allowed, so using "Bible.en.OET-LV" would be illegal.
Furthermore, osisWorks don't allow backslash escapes (osisPassages do), so
"Bible.en.OET\-LV" would also be illegal as an osisWork. Quoted segments are
allowed in neither (Bible.en."Freely-Given.org".OET-LV.2011). I am not sure
why the osisWork pattern is a more limited subset of the one used in
osisPassage. Not being able to include a domain name in an osisWork seems
like a big drawback.
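
To make the difference concrete, here is a rough sketch that runs the three
spellings above through both simplified patterns. The names WORK_RE,
PASSAGE_RE and check are placeholders for illustration only, and \w is only
an approximation of the character classes in the schema:

```python
import re

# The two simplified patterns quoted earlier in this message.
WORK_RE = re.compile(r"\w+(\.\w+)*")
PASSAGE_RE = re.compile(r"(\w|\\\S)+(\.(\w|\\\S)+)*")

def check(value):
    """Return (matches osisWork pattern, matches osisPassage pattern)."""
    return (WORK_RE.fullmatch(value) is not None,
            PASSAGE_RE.fullmatch(value) is not None)

for candidate in (
    "Bible.en.OET-LV",                          # bare hyphen
    r"Bible.en.OET\-LV",                        # backslash-escaped hyphen
    'Bible.en."Freely-Given.org".OET-LV.2011',  # quoted segment
):
    print(candidate, check(candidate))
# Bare hyphen:    (False, False)
# Escaped hyphen: (False, True)
# Quoted segment: (False, False)
```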

As for encoding "OET-LV" in the osisWork: since hyphens aren't allowed, one
option is to use "OET_LV". But it would probably be best to just break it up
into two segments: "OET.LV". Multi-segment work names aren't yet supported
by osis.py, though (it allows a single segment for the publisher and a
single segment for the work name).
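
The two alternatives could be sketched roughly like this. The function
encode_work_name and its split parameter are hypothetical (nothing like it
exists in osis.py as far as this message says), and the check uses the
simplified osisWork pattern rather than the schema's own:

```python
import re

WORK_RE = re.compile(r"\w+(\.\w+)*")

def encode_work_name(name, split=True):
    """Rewrite hyphens so the name fits the simplified osisWork pattern.

    split=True  -> "OET-LV" becomes "OET.LV" (two segments)
    split=False -> "OET-LV" becomes "OET_LV" (one segment)
    """
    encoded = name.replace("-", "." if split else "_")
    if WORK_RE.fullmatch(encoded) is None:
        # Something other than a hyphen is in the way (quotes, spaces, ...).
        raise ValueError("still not a valid osisWork fragment: %r" % encoded)
    return encoded

print(encode_work_name("OET-LV"))               # OET.LV
print(encode_work_name("OET-LV", split=False))  # OET_LV
```

Note that the split form changes the number of segments, which matters as
long as the work name is expected to be a single segment.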

Troy mentioned that nothing was generally agreed upon beyond
"Type.lang.ABBR" (e.g. Bible.en.KJV), but I have been thinking [1] about
standard ways to indicate version, revision, and edition numbers or names,
like perhaps:

v2_1
r2341
edName

[1]
http://github.com/openscriptures/api/blob/92b6ee5420c269830baf85503270ccd4cdf4d6c5/osis.py#L451
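
One possible reading of the v2_1 and r2341 examples is that the dots in a
version number are swapped for underscores (since the dot is the segment
separator) and a one-letter prefix marks the kind of number. The sketch
below only illustrates that reading; the function names are hypothetical,
it is not what the linked osis.py code does, and the edition form is left
out because nothing has been settled for it:

```python
import re

# A single osisWork segment in the simplified pattern: word characters only.
SEGMENT_RE = re.compile(r"\w+")

def version_segment(version):
    """Turn a dotted version such as "2.1" into a single segment such as "v2_1"."""
    segment = "v" + version.replace(".", "_")
    if SEGMENT_RE.fullmatch(segment) is None:
        raise ValueError("not expressible as one segment: %r" % version)
    return segment

def revision_segment(revision):
    """Turn an integer revision such as 2341 into "r2341"."""
    return "r%d" % revision

print(version_segment("2.1"))    # v2_1
print(version_segment("1.0.1"))  # v1_0_1
print(revision_segment(2341))    # r2341
```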

Troy and Chris: any more insights into the osisWork identifier?

Thanks!
Weston


On Fri, Jun 25, 2010 at 9:39 PM, Robert Hunt <hunt.robertj at gmail.com> wrote:

>  On 21/06/10 19:28, Weston Ruter wrote:
>
> All of the objects are now built out for osis.py, a Python module for
> representing OSIS "things". These include:
>
>    - OsisWork (Bible.en.ChurchOfEngland.KJV.1611)
>       - type (Bible)
>        - language (en)
>        - publisher (ChurchOfEngland)
>        - name (KJV)
>        - pub_date (1611)
>        - pub_date_granularity (1)
>
>  I'm planning to start studying, testing and using Weston's code in two
> weeks time, but in just re-reading this email I have some questions. I am
> working to start a new Bible translation. The details would be:
>
>    - type (Bible)
>    - language (en)
>    - publisher (Freely-Given.org)
>    - name (OET-LV)
>    - pub_date (2011)
>    - pub_date_granularity (1) ??? What's this
>
> My main question is: What if the publisher name has a dot in it like the
> above? Can it be quoted (or have the dot escaped)?
>     e.g., OsisWork (Bible.en."Freely-Given.org".OET-LV.2011) or OsisWork
> (Bible.en.Freely-Given\.org.OET-LV.2011)
>
> Other questions include:
>     What if there's a version number? e.g., 0.2 or 1.0.1
>     What if there's an edition name? e.g., Men's Study Edition. (but maybe
> that's irrelevant if the Biblical text remains constant and it's only a
> "packaging" decision regarding additional notes and side-boxes???).
>
> Just thinking out loud,
> Robert.