[sword-devel] Updating Clarke commentary to become readable

Karl Kleinpaste karl at charcoal.com
Sun Sep 24 14:10:54 MST 2006


The nasty little script below takes the current Clarke content and
strips the extraneous <br /> elements out in a coherent fashion.  This
makes the Clarke content actually readable, as opposed to its current
state, which (unless you allow for a very wide commentary subwindow)
is thoroughly unreadable.

Along the way, it also converts his (excessive) use of "&c." into
"etc.", which fixes some sections that do not work under the
current Clarke incarnation.  Cf. James 5:20, ¾ of the way down, the
paragraph beginning "1. I have already conjectured...", and observe
the odd paragraph break and grammatical failure -- the Sword libs are
not preserving `&' properly; the proper content is present, but it is
simply not handled correctly.  See also Gen 1:11, for which Clarke
displays nothing at all in WinSword/BibleCS, even though there is
content.  (GnomeSword displays Clarke's Gen 1:11 content, but
incompletely.)

#!/bin/sh -x
mod2imp Clarke |
sed -e 's|&c\.|etc.|g' \
    -e 's|\([A-Za-z0-9€-ÿ),.?!:;"]\)<br /> \([A-Za-z0-9€-ÿ(,.?!:;"]\)|\1 \2|g' \
    -e 's|</i><br /> \+<i>| |g' \
    -e 's|\([A-Za-z0-9€-ÿ),.?!:;"]\) \?<br /> <\([is]\)|\1 <\2|g' \
    -e 's|\([fi]\)><br /> \([A-Za-z0-9€-ÿ(,.?!:;"]\)|\1> \2|g' \
    -e 's|]<br /> |] |g' \
    -e 's|<br /> \[| [|g' |
imp2vs /dev/stdin . 2>&1 | grep -E -v '^from file: |^adding entry: '
chmod go+r nt nt.vss ot ot.vss
exit 0
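
To see what the substitutions do without touching a real module, here is a
minimal sketch that runs three of the rules above on a made-up line.  The
function name strip_br and the sample text are hypothetical, not taken from
Clarke, and the high-bit character range from the full script is left out
to keep the sketch ASCII-only.

```shell
#!/bin/sh
# strip_br: hypothetical helper wrapping a subset of the sed rules above.
# Reads text on stdin, writes the cleaned text on stdout.
strip_br() {
    # Rule 1: "&c." becomes "etc."
    # Rule 2: drop a <br /> that sits between two text characters.
    # Rule 3: drop a <br /> that precedes an opening square bracket.
    sed -e 's|&c\.|etc.|g' \
        -e 's|\([A-Za-z0-9),.?!:;"]\)<br /> \([A-Za-z0-9(,.?!:;"]\)|\1 \2|g' \
        -e 's|<br /> \[| [|g'
}

# Made-up sample input, for illustration only.
printf '%s\n' 'In the beginning,<br /> God created, &c.<br /> [See note]' |
strip_br
```

On that sample the result is a single line with both <br /> elements gone
and "&c." spelled out as "etc.".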

The modified clarkenobr.conf I'm using:

[ClarkeNoBr]
DataPath=./modules/comments/rawcom/clarke-nobr/
ModDrv=rawCom
Lang=en
Encoding=UTF-8
SourceType=ThML
Description=Adam Clarke's Commentary on the Bible (without forced line breaks)
About=Adam Clarke's 1810/1825 commentary and critical notes on the Bible, with forced line breaks removed.
LCSH=Bible. Commentaries.
DistributionLicense=Public Domain

--karl


