[sword-devel] InstallMgr details.

Fri May 15 07:03:03 MST 2009

> Short term for SWORD, that is likely to remain an important difference.
> Long term, either a local repo just returns "OK, done, nothing changed"
> when asked to refresh, or you can rethink whether this is really needed...

If local sources were really handled the same way as the others, then
a refresh would probably be needed for them as well.

> Thinking of other counter-examples: We do not "web update" before we can
> browse to a new web page.  Nor do we "pdf update" before we can browse
> to a new PDF file, or even "video update" for a video file.  If YouTube
> is considered a large repository of video files, one does not "youtube
> update" before one can watch a new video :)

It is somewhat different from those examples, because in this case, we
are talking about versioned files that may have updates available.
Whether this 'updating' should be transparent to the end user is a
design decision for the front-end, but I believe it's important that
the library itself not decide to connect whenever it feels like it.
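A minimal sketch of that separation, assuming a hypothetical RemoteSource class (not SWORD's actual InstallMgr API): the catalog is only filled in when the front end calls refresh(), and every other query works from the cached copy, so the library never opens a connection on its own initiative. The version comparison is a plain inequality check, which is a simplification.

```python
from typing import Callable, Dict, Optional


class RemoteSource:
    """Hypothetical remote-source wrapper (not SWORD's InstallMgr API)."""

    def __init__(self, name: str, fetch_index: Callable[[], Dict[str, str]]):
        self.name = name
        # The only code path that touches the network.
        self._fetch_index = fetch_index
        # Cached catalog: module name -> version string.
        self._catalog: Optional[Dict[str, str]] = None
        self.connections = 0  # counts network accesses, for illustration

    def refresh(self) -> None:
        """Explicit, front-end-initiated connection."""
        self.connections += 1
        self._catalog = dict(self._fetch_index())

    def available(self) -> Dict[str, str]:
        """Return the cached catalog; never connects implicitly."""
        if self._catalog is None:
            raise RuntimeError(f"{self.name}: call refresh() first")
        return dict(self._catalog)

    def updates_for(self, installed: Dict[str, str]) -> Dict[str, str]:
        """Installed modules whose remote version differs (simplified check)."""
        remote = self.available()
        return {m: v for m, v in remote.items()
                if m in installed and installed[m] != v}


# Usage: the front end decides when to connect.
src = RemoteSource("CrossWire", lambda: {"KJV": "2.3", "StrongsGreek": "1.2"})
src.refresh()
print(src.updates_for({"KJV": "2.2"}))  # {'KJV': '2.3'}
```

Whether refresh() is triggered by a button or silently at startup is then entirely the front-end's call.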

> Some (probably too idealistic and blue sky) ideas and thoughts for the
> distant future of SWORD that arise when I think about this:
>
> (1) If Peter von Kaehne's idea that "a SWORD module is like a PDF" is
> accurate and appropriate, then the whole idea of "installing" a SWORD
> module is an unhelpful anachronism that can go away at some point in
> future development.  The end user does not really want to "install" a
> SWORD module, they want to use (read/search/annotate/etc.) it!

An installation containing all modules would be prohibitively large
for most people, so the concept of installing only certain ones is
important. Whether it should be called something else is, again, a
front-end decision, but to the library it is an installation.

>
> (2) Or, if modules are always going to be only "installable" entities,
> for whatever reason, then it seems to me to make little sense to provide
> them online as a tree of files per module.  It is surely simpler, more
> efficient, and maybe more logical(?) to provide them as a single
> compressed archive file per module.  Then you let the "install" process
> also decompress them (either after transport to the machine running the
> application, or decompress the byte stream as it arrives, if that is
> better for overall performance).  Remote network transport time and disk
> write time is likely to dwarf any decompression time, even on embedded
> low power CPUs.

Of course, this destroys the idea that "any valid local installation
can also be a remote install source", a concept that is actually used
(e.g., in the network share scenario I mentioned in my previous email).

> Unless you allow direct remote access (RPC-like, or maybe even
> NFS-like?) to the items in the remote SWORD repo (potentially a nice
> blue sky idea, but not currently implemented!), what is the benefit of
> the "unpacked tree of files" format for repo owners, for front end
> developers, or for end users?

As above, any local installation is also a valid remote install source.

> Right now, without knowing all the history, my understanding is that
> SWORD sort of does both, and so (to me) is confusing... online
> repositories are unpacked, but there is also a "raw zip" standardized
> way to store (and so transport) SWORD modules.  When does the user pick
> one rather than the other?  Why is the user being asked to make that choice?

Users never have to make that choice. The C++ engine downloads the
individual files via FTP; JSword downloads the raw zips. The raw zips
are also available on the web site for frontends that do not have
install capabilities, or for users who can't get them to work (for
example, behind prohibitive firewalls).

> Is there really enough added value in having both to justify the
> additional system complexity that ensues from this "do both" approach to
> SWORD module storage in repositories?

There is no added complexity for the end user. It is more complex for
the library, but I believe Troy explained its purpose well. The bare
minimum valid remote source should not have to have any of this
stuff, which is ideal for someone without the technical expertise or
money to set up a fancy hosting environment. For a place like
CrossWire, which hosts many different modules, the extra work of
making a mods.tar.gz and zipped module files is worth it.
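As a rough illustration of why the extras can stay optional, here is a sketch of an installer reading a source's module list. It assumes, hypothetically, that the transport is injected as three callables and that the .conf files live under mods.d/; only the mods.tar.gz name comes from this thread. If the bundled archive is present it is used; otherwise the installer falls back to fetching files one by one, which works against any plain copy of a local installation.

```python
import io
import tarfile
from typing import Callable, Dict, List, Optional


def read_index(
    fetch_archive: Callable[[str], Optional[bytes]],  # hypothetical transport
    list_dir: Callable[[str], List[str]],             # hypothetical transport
    fetch_file: Callable[[str], bytes],               # hypothetical transport
) -> Dict[str, bytes]:
    """Return {conf file name: contents} for every module on the source."""
    archive = fetch_archive("mods.tar.gz")
    if archive is not None:
        # Optimized source: one download yields every .conf file.
        confs = {}
        with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
            for member in tar.getmembers():
                if member.isfile() and member.name.endswith(".conf"):
                    data = tar.extractfile(member).read()
                    confs[member.name.rsplit("/", 1)[-1]] = data
        return confs
    # Bare-minimum source: an ordinary installation tree, so list the
    # conf directory and fetch each .conf file individually.
    return {
        name: fetch_file("mods.d/" + name)
        for name in list_dir("mods.d")
        if name.endswith(".conf")
    }
```

The archive is purely a shortcut: a source that lacks it is slower to read but still valid.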

>
> (3) Ignoring backward compatibility (!), one could in future make SWORD
> modules available as .zip files (or some other defined compressed
> archive file format), *only*.  An installer would then use URLs to find
> collections of these archive files (and the related repo metadata if
> such is needed/useful), and more specific URLs to download the
> individual archive files, and then install them locally.  This (as Greg
> pointed out) would allow for a very nicely abstracted set of methods
> that could expand to encompass any desired number of different URI
> schemes, from http: to ftp: to file: to sshfs: to something not yet
> invented.

I would certainly be for using .zip files and allowing access to them
over http (and making it all transparently done in the library). But
for the same reasons as above, it won't be the *only* method available
to someone setting up a repo.

> I'd think that all of this can either be in the URL (username and
> password) or else a systemwide config option (proxies, passive vs active
> FTP -- though the "good default" these days for FTP seems to be try
> passive, and if it fails in a certain way, fall back to active).  This
> probably needs a way for the "open a URL" method to prompt the user for
> authentication information (username and pw, usually), but that's all.
> The underlying subsystems (and control panels for users to set proxies,
> etc.) for doing it that way already exist, on most and perhaps all
> platforms, as far as I know, so making them preferences that SWORD front
> ends need to handle specially seems like extra work for both SWORD
> developers and SWORD end users, for no real benefit?

Actually, it is by no means guaranteed that curl will pick up a
system's proxy setup. I have had it fail on both Windows and Linux,
even though my "systemwide configuration" was set up correctly. I do
not think there is any standard way of providing this information.
curl will look at certain environment variables (ftp_proxy,
http_proxy), which may be standard on Linux (but not on Windows), but
they have to be formatted a certain way. Therefore it is really
important to let users provide this information manually if their
system configuration doesn't work.
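A minimal sketch of the precedence argued for here, using a hypothetical resolve_proxy helper: a proxy the user typed into the front end wins, and only otherwise does the code consult the scheme's lowercase environment variable (http_proxy, ftp_proxy). The value is passed through unchanged, since its formatting is exactly what users sometimes get wrong.

```python
import os
from typing import Mapping, Optional


def resolve_proxy(scheme: str,
                  user_setting: Optional[str] = None,
                  environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: choose the proxy for a download of `scheme`."""
    # 1. A manually entered proxy always wins; it is the escape hatch for
    #    systems whose own configuration curl does not pick up.
    if user_setting:
        return user_setting
    # 2. Otherwise fall back to the conventional variable, e.g. http_proxy.
    value = environ.get(f"{scheme.lower()}_proxy", "").strip()
    # 3. No proxy configured anywhere: connect directly.
    return value or None
```

A C++ front end could follow the same order and hand the result to libcurl through CURLOPT_PROXY.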

> For instance, longer term still, given a fast enough network pipe, why
> download and install any modules at all -- one should conceivably be
> able to have more of an RPC style approach to accessing a remote
> module... a little like accessing a remote SQL database today... or even
> just a file on a network share... you don't have to copy the entire
> database (or file) to your PC first, before you can use it :)

This would be very expensive for large numbers of modules.

> Going even further, is it necessary or helpful for the API to have the
> concept of "libraries" at all, other than as bookmarks to open modules
> to install?  We don't normally expect PDF files to exist grouped into
> "libraries"; why would we expect SWORD modules to be so grouped, if they
> are in effect just like PDFs? (Even if they are in some ways perhaps
> more like databases, with all the searching and indexing stuff that they
> need... we don't generally group databases into sets of databases based
> on their physical or network location, either).

I suspect that most people tend to think of their collection of
modules as a library rather than as individual files. You don't
typically have just one module open; you need an entire collection of
modules to do anything interesting (at minimum, for me, a Bible, a
commentary, a general dictionary, and Greek and Hebrew dictionaries,
all open at once and working together). Of course, people do organize
their documents into libraries (e.g., on Ubuntu, you have
Documents->Music, Documents->Pictures). These are not much different,
I think, except that the engine/frontend provide a method for keeping
them organized (like Picasa/f-spot for modules). So in this sense,
we're ahead of the curve compared with looking at them as individual
documents.

> Jonathan

I don't disagree with everything you said, but the parts I agreed with
I had already answered elsewhere :)

Matthew