[sword-devel] Detecting duplicate verse text in Bibles

David Haslam dfhmch at googlemail.com
Tue Mar 13 07:57:21 MST 2012

Some time ago I developed a TextPipe filter to detect and report duplicate
verse text in Bibles.

It began with a focus on ThML files, but has some wider application.

The original reason for making this tool was related to how three of our
SWORD utilities handle linked verses: mod2imp, mod2osis and diatheke.

This morning I happened to apply the same filter to an electronic Bible that
has not yet been made into a SWORD module.
It's one that I'm helping a friend in SE Asia with, as he works towards
getting it published in several formats for various Bible applications.
These will eventually also include SWORD and Go Bible.

To my surprise it detected 63 locations in which there are verses containing
text identical to the preceding verse.
These can be traced to data entry errors - a real risk arising from the
particular Windows application being used for the labour-intensive task of
transcribing the printed edition.
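
For anyone without TextPipe, the same kind of check can be sketched in a few
lines of Python. This is not the filter itself, only a minimal illustration
of comparing each verse with the one before it. The function names and the
(reference, text) input shape are assumptions made for the example, and the
whitespace normalisation is a choice of this sketch.

```python
from typing import Iterable, List, Tuple


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivial spacing differences are ignored."""
    return " ".join(text.split())


def find_adjacent_duplicates(
    verses: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return (previous_ref, ref) pairs where a verse repeats the verse before it.

    `verses` is a hypothetical sequence of (reference, text) pairs in
    canonical order, e.g. parsed from whatever export format is at hand.
    """
    duplicates = []
    prev_ref = None
    prev_text = None
    for ref, text in verses:
        current = normalize(text)
        # Empty verses are never reported, so two blank verses in a row
        # do not count as a duplicate.
        if current and current == prev_text:
            duplicates.append((prev_ref, ref))
        prev_ref, prev_text = ref, current
    return duplicates


sample = [
    ("Gen 1:1", "In the beginning God created the heaven and the earth."),
    ("Gen 1:2", "In the beginning God created  the heaven and the earth."),
    ("Gen 1:3", "And God said, Let there be light: and there was light."),
]
for prev_ref, ref in find_adjacent_duplicates(sample):
    print(f"{ref} repeats {prev_ref}")
```

A check like this would presumably also report intentionally linked verses,
so each hit still needs to be compared against the printed edition.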

I just thought that this observation was worth sharing on this list.
It may help others involved in preparation of electronic Bibles for
situations where there is no existing digital edition available, not even in
some ancient legacy software.

Best regards,


View this message in context: http://sword-dev.350566.n4.nabble.com/Detecting-duplicate-verse-text-in-Bibles-tp4469279p4469279.html
Sent from the SWORD Dev mailing list archive at Nabble.com.
