[sword-devel] GenBook osisID and URIs

Tue May 13 16:54:50 MST 2008

On May 13, 2008, at 7:35 PM, Chris Little wrote:

> Karl Kleinpaste wrote:
>> (As it happens, GnomeSword understands sword:// and bible://
>> equivalently, but I suspect we should do away with the latter.)
>
> I had thought that BibleCS handled bible:// too, but when I checked
> earlier, it didn't.
>
> I'm open to adding bible://. It's certainly an easy addition. But I
> don't know whether we would gain anything from it.
>
>>>> Josephus:The_War_of_the_Jews/.Book_1/.Chapter_2/.Section_3/
>>
>> That's profoundly icky.
>>
>>> I think simply
>>> sword://Josephus/The War of the Jews/Book 1/Chapter 2/Section 3
>>> should work, or
>>> sword://Josephus/The%20War%20of%20the%20Jews/Book%201/Chapter%202/Section%203
>>
>> I have URLs like this in actual use...
>> sword://Josephus/%2FThe+Antiquities+of+the+Jews%2FBook+17%2FChapter+2%2FSection+4
>> ...because embedded `/' makes me nervous and `+' is the URL space
>> character.
>
> I guess this comes down to parsing, which we'll probably want to build
> into the Sword API to ensure uniform handling across frontends. In other
> words, the application gets "sword://{module(s)}/{key(list)}", calls a
> URI parser, which hands back a list of modules and a list of keys, and
> the application does whatever it likes with that information. And we'll
> want to do a function to perform the reverse, too, with module + key
> list --> URI.
>
> The embedded '/' wouldn't cause much of a problem except as the first
> character of the key (in GenBooks). So, we could either percent-encode
> the '/' characters or just ensure that we don't include the leading '/'.
> I don't think it matters which we pick, but stripping the leading '/'
> certainly lends greater readability. (A third possibility would be to
> just encode the initial '/' when encoding URIs. That would make the
> unlikely case of dictionary keys with a leading '/' safe as well.)
>
> The current URI RFC (3986) actually specifies that spaces are to be
> encoded by %20, but we should probably bear in mind that various
> applications (like older web browsers) might use the older style '+' to
> encode space. So in decoding, we'll want to turn '+' into space, and on
> encoding, we'll want to turn space into %20 and '+' into %2B.
>
> And since we haven't discussed it yet, we would also need to
> percent-encode all other non-safe URI characters. Then we can pass UTF-8
> character strings through URIs, albeit completely unreadably.

The other thing to allow is user input which is not encoded.
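
To make the encode/decode rules discussed above concrete, here is a rough
sketch in Python. It is only an illustration, not anything in the Sword API:
the names build_uri and parse_uri are made up, it handles a single module and
a single key (the thread does not settle how lists would be delimited), and it
picks the "strip the leading '/'" option when encoding.

```python
from urllib.parse import quote, unquote_plus

SCHEME = "sword://"  # scheme prefix discussed in this thread


def build_uri(module, key):
    """Hypothetical helper: module + key --> URI.

    Assumes one module and one key. The leading '/' of a GenBook key is
    dropped for readability; interior '/' separators are left unencoded.
    """
    key = key.lstrip("/")
    # quote() percent-encodes the UTF-8 bytes of every character outside
    # the unreserved set, so space becomes %20 and '+' becomes %2B.
    return SCHEME + quote(module, safe="") + "/" + quote(key, safe="/")


def parse_uri(uri):
    """Hypothetical helper: URI --> (module, key).

    Accepts both %20 and the older '+' as an encoded space. The key is
    returned exactly as decoded, including any leading '/'.
    """
    if not uri.startswith(SCHEME):
        raise ValueError("not a sword:// URI: %r" % uri)
    rest = uri[len(SCHEME):]
    # The first '/' separates the module name from the key.
    module, _, key = rest.partition("/")
    return unquote_plus(module), unquote_plus(key)


if __name__ == "__main__":
    print(build_uri("Josephus",
                    "/The War of the Jews/Book 1/Chapter 2/Section 3"))
    print(parse_uri("sword://Josephus/%2FThe+Antiquities+of+the+Jews"
                    "%2FBook+17%2FChapter+2%2FSection+4"))
```

With this decoder, a URI typed by hand with literal spaces passes through
unchanged, which covers the simple case of unencoded user input. A literal
'+' or '%' in such input would still be misread as an encoding, so a real
implementation would need some rule for telling the two apart.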