[sword-devel] USFM conformance in usfm2osis.py

Kahunapule Michael Johnson kahunapule at mpj.cx
Wed Aug 1 12:36:22 MST 2012

On 07/31/2012 10:11 PM, David Haslam wrote:
> Peter and I have collected a substantial body of real world USFM suites
> which you could probably use for testing your conversion script.

Note that there is a collection of real-world USFM projects at ftp://eBible.org/pub/Scriptures/ in files named *_usfm.zip. The same projects are also there in USFX (*_usfx.zip). Each one is either Public Domain Scripture or licensed under Creative Commons BY-ND-NC with added permission to change the file format, but not the text or punctuation. The first 3 letters of each file name are the Ethnologue code for the language. You can disregard eng-aus and eng-uk, since they are subsets of eng-webbe.
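For anyone scripting a test run over that directory, here is a minimal Python sketch of the naming rule described above. It assumes the part of the file name before "_usfm.zip" is the translation ID; the function name select_usfm_archives and the sample names in the usage note are hypothetical, not taken from the actual directory listing.

```python
import re

# Per the message above, these two are subsets of eng-webbe and can be skipped.
SUBSETS = {"eng-aus", "eng-uk"}

# Assumption: archive names look like "<translation-id>_usfm.zip".
USFM_ARCHIVE_RE = re.compile(r"(.+)_usfm\.zip")


def select_usfm_archives(names):
    """Return (language_code, file_name) pairs for USFM archives worth testing.

    The language code is simply the first 3 letters of the file name,
    which the message says is the Ethnologue code.
    """
    selected = []
    for name in names:
        match = USFM_ARCHIVE_RE.fullmatch(name)
        if match is None:
            continue  # not a USFM archive (e.g. a *_usfx.zip file)
        if match.group(1) in SUBSETS:
            continue  # redundant subset of eng-webbe
        selected.append((name[:3], name))
    return selected
```

Given a listing such as ["eng-webbe_usfm.zip", "eng-aus_usfm.zip", "eng-webbe_usfx.zip"], only the first entry would be kept, tagged with the code "eng".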

Although those USFM files may not conform to everyone's reading of USFM, they are all good enough for the Haiola software to read and turn into valid HTML output.

One problem with converting from USFM to Bible study software modules is that USFM doesn't directly support machine-readable links for the references in cross reference notes and footnotes. The mechanism for determining vernacular abbreviations for automatic conversion (\toc3) is almost never used in real projects. For now, I plan to populate that data in an auxiliary file in the USFX .zip files.
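To make the \toc3 point concrete, here is a rough Python sketch of how a converter might pick up the vernacular abbreviation when a project does supply it. It assumes each book file starts with an \id line carrying the book code and that \toc3, when present, sits on a line of its own. The function name book_abbreviation is hypothetical, and this is not how usfm2osis.py or Haiola actually does it.

```python
import re

# \id <BOOK> ... : the book code is the first token after the marker.
ID_RE = re.compile(r"^\\id\s+(\S+)", re.MULTILINE)
# \toc3 <abbreviation> : capture the rest of the line.
TOC3_RE = re.compile(r"^\\toc3[ \t]+([^\r\n]+)", re.MULTILINE)


def book_abbreviation(usfm_text):
    """Return (book_code, vernacular_abbreviation) for one USFM book.

    The abbreviation is None when the project has no \\toc3 line,
    which the message says is the usual case in real projects.
    Returns None if no \\id line is found at all.
    """
    text = usfm_text.lstrip("\ufeff")  # tolerate a leading byte order mark
    id_match = ID_RE.search(text)
    if id_match is None:
        return None
    toc3_match = TOC3_RE.search(text)
    abbreviation = toc3_match.group(1).strip() if toc3_match else None
    return (id_match.group(1), abbreviation)
```

A caller could collect these pairs across all books into a lookup table and fall back to an auxiliary file wherever the abbreviation comes back as None.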

