[sword-devel] Progress on SOT format
Troy A. Griffitts
scribe at crosswire.org
Tue May 25 14:38:27 MST 2010
To add to what Barry has said,
In summary, soText is a "stand-off" markup concept.
Currently in SWORD, when doing a non-indexed, brute-force search, we need
to strip all tags from each entry before performing the search.
This is fine on a decently powered system, where a complete search of a
richly marked-up Bible is done in seconds, but on a lower-powered
device the tag stripping can take some time; in fact, about two-thirds of
the search time can be spent preparing the text by removing tags.
soText is optimized for the search function by storing the text in the
form that is used for searching. The tags are stored separately, each
with the offset at which it is inserted into the plain text to
reconstruct the marked-up text.
So soText /rendering/ takes a little longer, but simple brute-force
/searching/ is faster.
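
To make the trade-off concrete, here is a minimal Python sketch of the
stand-off idea. It is not SWORD code and not the actual soText layout: the
names StandOffEntry, search and render, and the (offset, tag) pair
representation, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StandOffEntry:
    # Hypothetical in-memory form of one entry; not the on-disk soText layout.
    text: str                    # plain, tag-free text, stored ready for searching
    tags: List[Tuple[int, str]]  # (offset into text, markup to insert there)


def search(entries: Dict[str, StandOffEntry], needle: str) -> List[str]:
    """Brute-force search: the text is already plain, so nothing is stripped."""
    return [key for key, entry in entries.items() if needle in entry.text]


def render(entry: StandOffEntry) -> str:
    """Rebuild the marked-up text by inserting each tag at its offset."""
    pieces = []
    pos = 0
    # sorted() is stable, so tags sharing an offset keep their stored order.
    for offset, tag in sorted(entry.tags, key=lambda t: t[0]):
        pieces.append(entry.text[pos:offset])
        pieces.append(tag)
        pos = offset
    pieces.append(entry.text[pos:])
    return "".join(pieces)
```

With an entry whose text is "In the beginning God created" and whose tags
are (7, "<w>") and (16, "</w>"), a search for "beginning God" matches the
stored text directly, while render() does the extra work of producing
"In the <w>beginning</w> God created".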
On 05/25/2010 12:15 PM, Barry Drake wrote:
> On Tue, 2010-05-25 at 11:13 -0700, David Haslam wrote:
>> Please provide some further background. What is soT format? Where is it being
> Background: a system proposed by Lynn Allen (he uses it in BerBible), who
> has given permission to incorporate his principle, though not his code,
> into Sword.
> Structure: Each line begins with a two-byte length pointer followed by
> the text. This is followed by a series of strings containing the
> non-biblical information, such as footnotes, each with markers. I do
> have a text with full information somewhere.
> Overall, the verse-line is structured to allow fast searching on a slow
> system: only the text need be searched, since footnotes etc. are not
> embedded in the verse text.
> Where used: Nowhere so far, but the concept is excellent, and modules
> would be as standard Sword modules, but using a different filter to
> render the verses. Robin Randall offers his Perl scripts to make a
> SoText module in (by default) Sword import format (use imp2mod) and to
> provide a proof of concept. Next stage would be to offer SoText
> versions of popular modules that work via a SoText filter transparently
> alongside regular modules. SoText format might be the preferred format
> on slow machines such as phones.
> God bless, Barry.
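
The verse-line structure Barry describes (a two-byte length, then the text,
then the annotation strings) could be read roughly as in the sketch below.
Several details are assumptions, since the post does not state them: that
the length is an unsigned big-endian count of text bytes, and that
everything after the text can be treated as an opaque block. The marker
format is not described, so it is not parsed here. The function names
make_record and split_record are hypothetical.

```python
import struct
from typing import Tuple


def make_record(text: bytes, annotations: bytes = b"") -> bytes:
    # ASSUMPTION: the prefix is an unsigned 16-bit big-endian count of text bytes.
    return struct.pack(">H", len(text)) + text + annotations


def split_record(record: bytes) -> Tuple[bytes, bytes]:
    """Split one verse-line into (searchable text, trailing annotation bytes)."""
    (length,) = struct.unpack_from(">H", record, 0)
    text = record[2:2 + length]
    # The footnote/marker strings are left unparsed; their format is not
    # given in this thread.
    annotations = record[2 + length:]
    return text, annotations
```

A search would then look only at the first element returned by
split_record() and never touch the annotation bytes.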