[sword-devel] GenBook osisID and URIs

Chris Little chrislit at crosswire.org
Tue May 13 16:35:38 MST 2008


Karl Kleinpaste wrote:
> (As it happens, GnomeSword understands sword:// and bible://
> equivalently, but I suspect we should do away with the latter.)

I had thought that BibleCS handled bible:// too, but when I checked 
earlier, it didn't.

I'm open to adding bible://. It's certainly an easy addition. But I 
don't know whether we would gain anything from it.

>>> Josephus:The_War_of_the_Jews/.Book_1/.Chapter_2/.Section_3/
> 
> That's profoundly icky.
> 
>> I think simply
>> sword://Josephus/The War of the Jews/Book 1/Chapter 2/Section 3
>> should work, or
>> sword://Josephus/The%20War%20of%20the%20Jews/Book%201/Chapter%202/Section%203
> 
> I have URLs like this in actual use...
> sword://Josephus/%2FThe+Antiquities+of+the+Jews%2FBook+17%2FChapter+2%2FSection+4
> ...because embedded `/' makes me nervous and `+' is the URL space character.

I guess this comes down to parsing, which we'll probably want to build 
into the Sword API to ensure uniform handling across frontends. In other 
words, the application gets "sword://{module(s)}/{key(list)}", calls a 
URI parser, which hands back a list of modules and a list of keys, and 
the application does whatever it likes with that information. We'll
also want a function for the reverse: module + key list --> URI.
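
To make the round trip concrete, here is a rough sketch in Python (not
Sword's C++, and not an existing Sword API). The names parse_sword_uri
and build_sword_uri are hypothetical. The ',' separator between modules
and the ';' separator between keys are also assumptions for
illustration only, since nothing above settles how lists are delimited.

```python
from urllib.parse import quote, unquote_plus

SCHEME = "sword://"

def parse_sword_uri(uri):
    """Split a sword:// URI into (modules, keys), both decoded lists."""
    if not uri.startswith(SCHEME):
        raise ValueError("not a sword:// URI: %r" % uri)
    rest = uri[len(SCHEME):]
    # Everything before the first '/' names the module(s); the rest is the key(s).
    module_part, _, key_part = rest.partition("/")
    # Split BEFORE decoding so that an encoded separator inside a name survives.
    # ',' and ';' are assumed separators, not something the list has agreed on.
    modules = [unquote_plus(m) for m in module_part.split(",") if m]
    keys = [unquote_plus(k) for k in key_part.split(";") if k]
    return modules, keys

def build_sword_uri(modules, keys):
    """Reverse of parse_sword_uri: module list + key list --> URI."""
    module_part = ",".join(quote(m, safe="") for m in modules)
    # '/' is left readable inside keys; everything else unsafe is escaped.
    key_part = ";".join(quote(k, safe="/") for k in keys)
    return SCHEME + module_part + "/" + key_part

modules, keys = parse_sword_uri(
    "sword://Josephus/The%20War%20of%20the%20Jews/Book%201/Chapter%202/Section%203")
print(modules, keys)
print(build_sword_uri(modules, keys))
```

A frontend would call the parser once and then do whatever it likes
with the two lists, which is the division of labour described above.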

The embedded '/' wouldn't cause much of a problem except as the first 
character of the key (in GenBooks). So, we could either percent-encode 
the '/' characters or just ensure that we don't include the leading '/'. 
I don't think it matters which we pick, but stripping the leading '/' 
certainly lends greater readability. (A third possibility would be to 
just encode the initial '/' when encoding URIs. That would make the 
unlikely case of dictionary keys with a leading '/' safe as well.)
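
The first and third options can be compared side by side in a small
Python sketch. Both function names are hypothetical, and the choice to
leave interior '/' characters unescaped is an assumption made for
readability.

```python
from urllib.parse import quote, unquote_plus

def encode_key_strip(key):
    """Option 1: drop a single leading '/' and leave the rest readable."""
    if key.startswith("/"):
        key = key[1:]
    return quote(key, safe="/")

def encode_key_escape_initial(key):
    """Option 3: percent-encode only the leading '/', keep interior ones."""
    if key.startswith("/"):
        return "%2F" + quote(key[1:], safe="/")
    return quote(key, safe="/")

key = "/The War of the Jews/Book 1"
print(encode_key_strip(key))           # The%20War%20of%20the%20Jews/Book%201
print(encode_key_escape_initial(key))  # %2FThe%20War%20of%20the%20Jews/Book%201
```

Stripping reads better, but the decoder then has to know to put the
'/' back for GenBook keys. Escaping only the first character
round-trips exactly through an ordinary percent-decoder.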

The current URI RFC (3986) actually specifies that spaces are to be 
encoded by %20, but we should probably bear in mind that various 
applications (like older web browsers) might use the older style '+' to 
encode space. So in decoding, we'll want to turn '+' into space, and on 
encoding, we'll want to turn space into %20 and '+' into %2B.
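
As a minimal sketch of that asymmetry (lenient on decode, strict on
encode), Python's standard urllib.parse already behaves this way. The
two wrapper names are hypothetical and exist only to label the two
directions.

```python
from urllib.parse import quote, unquote_plus

def decode_component(text):
    """Accept both styles: '+' and %20 become a space, %2B becomes '+'."""
    return unquote_plus(text)

def encode_component(text):
    """Emit only the RFC 3986 style: space -> %20, literal '+' -> %2B."""
    return quote(text, safe="")

print(decode_component("Book+17"))    # Book 17
print(decode_component("Book%2017"))  # Book 17
print(encode_component("a b+c"))      # a%20b%2Bc
```

Because a literal '+' is always written as %2B, anything produced by
the encoder decodes back to the original string.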

And since we haven't discussed it yet, we would also need to 
percent-encode all other non-safe URI characters. Then we can pass UTF-8 
character strings through URIs, albeit completely unreadably.
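
For illustration, here is roughly what that encoding does to a UTF-8
string, sketched in Python. The function name percent_encode is
hypothetical. The "safe" set used is the RFC 3986 unreserved set
(letters, digits, '-', '.', '_', '~'); which characters we actually
treat as safe is an open choice.

```python
import string
from urllib.parse import unquote

# RFC 3986 "unreserved" characters, as ASCII byte values.
UNRESERVED = (string.ascii_letters + string.digits + "-._~").encode("ascii")

def percent_encode(text):
    """Percent-encode every UTF-8 byte that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)

print(percent_encode("café"))  # caf%C3%A9
print(unquote(percent_encode("café")) == "café")
```

Each non-ASCII character becomes two or more %XX triplets, which is
why the result is safe to pass around but hard to read.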

--Chris


