[sword-devel] GenBook osisID and URIs

Chris Little chrislit at crosswire.org
Tue May 13 17:14:15 MST 2008

DM Smith wrote:
> Chris Little wrote:
>> I think simply
>> sword://Josephus/The War of the Jews/Book 1/Chapter 2/Section 3
>> should work, or
>> sword://Josephus/The%20War%20of%20the%20Jews/Book%201/Chapter%202/Section%203
>> encoded as an URL.
> This does not answer the osisID question. If one had to encode the 
> GenBook key into an osisID for an OSIS encoded GenBook how would it be 
> represented, given that spaces are not allowed and periods have reserved 
> meaning?
> Based upon the answer to that, how would the URL be?

Ok, I understand now, but I don't really have an answer. The way we 
would want this GenBook key to be represented as an osisID is something 
like "Josephus:Wars.1.2.3". (The title is wrong in the module. Wars is 
supposed to be plural.)

If I were to redo the module (which I probably will), I would do the 
osisIDs like that, and we would end up with a TreeKey of /Wars/1/2/3.
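
A minimal sketch in Python of the mapping described above, from an osisID such as "Josephus:Wars.1.2.3" to a TreeKey such as /Wars/1/2/3. The function name osis_id_to_tree_key is hypothetical (it is not part of the SWORD API), and the sketch assumes that ':' separates the work prefix and that every '.' is a level separator, ignoring escaped characters.

```python
def osis_id_to_tree_key(osis_id):
    """Split an osisID into (work prefix, TreeKey path).

    Assumption: no escaped '.' or ':' occurs in the osisID.
    """
    work, sep, ref = osis_id.partition(":")
    if not sep:
        # No work prefix was given; the whole string is the reference.
        work, ref = "", osis_id
    # Each '.'-separated part becomes one level of the tree.
    return work, "/" + "/".join(ref.split("."))


print(osis_id_to_tree_key("Josephus:Wars.1.2.3"))
```

A real implementation would have to honour escapes before splitting on '.'.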

At Perseus, the URL is 

So their hierarchy is: "book 1":"whiston chapter 2":"whiston section 3". 
(Whiston is the translator, and his divisions of the works of Josephus 
represent one of two significant systems that I know of.)

But they also let you look up "J. BJ 1.2.3" (J = Josephus, BJ = De 
Bello Judaico) to get that passage.

In their TEI source, they have:
   <div1 type="Book" n="1" org="uniform" sample="complete">
     <milestone n="2" unit="Whiston chapter" />
       <milestone n="3" unit="Whiston section" />
       <milestone n="54" unit="section" />

Since we want a solution here and now, we need to find a way to 
encode our TreeKeys as osisIDs and figure out how to pass those back and 
forth as URIs.

osisIDs can use Letters and Numbers (in the document's encoding, UTF-8 
for example). All other characters are to be escaped. I _believe_, but 
would like to confirm this somehow, that '_' is the replacement 
character for space, and all other characters can be expressed via 
\{character} escapes. (So if you want '_' you have to encode it as '\_'.)
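
A rough sketch in Python of the escaping rule as stated above, which is itself unconfirmed: letters and numbers pass through, space becomes '_', and every other character gets a backslash in front of it. The function name escape_osis_segment is hypothetical, and the sketch handles one TreeKey segment at a time; how segments are joined is left out.

```python
def escape_osis_segment(segment):
    """Escape one TreeKey segment for use inside an osisID.

    Assumed rule (unconfirmed): alphanumerics are kept, ' ' -> '_',
    and any other character c -> '\\' + c.
    """
    out = []
    for ch in segment:
        if ch.isalnum():          # Unicode letters and digits
            out.append(ch)
        elif ch == " ":
            out.append("_")
        else:
            out.append("\\" + ch)  # includes '_' and '.'
    return "".join(out)


print(escape_osis_segment("The War of the Jews"))
print(escape_osis_segment("a_b"))
```

Note that a literal '_' in the input comes out as '\_', so the replacement for space stays unambiguous.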

That said, I think our osisID for the CURRENT version of Josephus would 
be the rather ugly:


We can either encode that directly as an URI:


or decode it and re-encode as:


(ignoring any changes we might decide to make to my previous proposal 
regarding embedded '/'.)
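
For the decode-and-re-encode route, a minimal sketch in Python of percent-encoding a plain TreeKey into a sword:// URI and back, using the standard library's urllib.parse. The function names are hypothetical, and the sketch assumes that '/' never occurs inside a segment, so it does not address the embedded '/' question.

```python
from urllib.parse import quote, unquote

SCHEME = "sword://"


def tree_key_to_uri(module, key):
    """Percent-encode each TreeKey segment and prepend the module name."""
    segments = [s for s in key.split("/") if s]
    path = "/".join(quote(s, safe="") for s in segments)
    return SCHEME + quote(module, safe="") + "/" + path


def uri_to_tree_key(uri):
    """Reverse of tree_key_to_uri: return (module, TreeKey)."""
    if not uri.startswith(SCHEME):
        raise ValueError("not a sword:// URI: " + uri)
    module, _, path = uri[len(SCHEME):].partition("/")
    segments = [unquote(s) for s in path.split("/") if s]
    return unquote(module), "/" + "/".join(segments)


uri = tree_key_to_uri(
    "Josephus", "/The War of the Jews/Book 1/Chapter 2/Section 3")
print(uri)
print(uri_to_tree_key(uri))
```

The first print gives the percent-encoded form quoted at the top of this message.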


