Talk:File Formats

From CrossWire Bible Society

non-SWORD software

I couldn't see the value of listing other Bible software in our Wiki, but tolerated it since it was supposed to be further developed into something more worthwhile. However, now that it's become largely an ad for non-Sword software, I'm going to remove the whole section.

Only material related, relevant, or useful to CrossWire and/or the SWORD Project need go in our Wiki. Material not fitting those categories can go in someone else's Wiki and will be deleted from this one.

--Osk

Go Bible & collaboration with CrossWire

I have just added some information about Go Bible. Please visit my user page to see why, rather than removing this. David Haslam 15:47, 7 August 2008 (MDT)

Tesseract OCR software

Having just added a stub section about Tesseract, I'd like to suggest that Troy adds a few words about it. David Haslam 20:03, 9 January 2009 (UTC)

Zefania XML

A couple of days ago, there was a message from Wolfgang Schultz in the Sword dev mailing list which announced that he has removed all websites relating to Zefania XML. Certainly the link now goes to a "page under construction" message. David Haslam 19:25, 27 April 2009 (UTC)

The sourceforge repository for various modules is still available at [1]. The most recently uploaded module is the Zürcher Bibel, dated April 12, 2009. David Haslam 10:00, 15 May 2009 (UTC)
The admins for this sourceforge project are Tom Baccei and Mathieu Delarue. David Haslam 10:03, 15 May 2009 (UTC)
This repository is still active. Many more Zefania XML files have been uploaded since I last reported. David Haslam 16:14, 25 July 2009 (UTC)

PMD files?

Does anyone know a way (short of obtaining and installing an old version of PageMaker) to extract the text from a PMD file? David Haslam 09:13, 14 June 2009 (UTC)

Anyone tried Create Adobe® PDF Online? This online service is only available to subscribers in the US & Canada. I live in the UK. David Haslam 09:33, 14 June 2009 (UTC)

Please discuss before reverting - the page has now become inaccurate

I see that 7 of my recent edits have been reverted by Osk. In future, please discuss before reverting, as the page has now become inaccurate. David Haslam 11:58, 24 July 2009 (UTC)

I reverted because one of your last edits erased text following "$$$". This is the second time you've done this kind of careless edit. Last time, I (and I believe another editor) corrected all of the deletions you performed. --Osk 15:47, 24 July 2009 (UTC)
I have no idea why that occurred, when all I was doing was changing a redlink in the previous paragraph, and that only during my last edit before your reversion. I'm sure this wasn't through carelessness. I normally check my edits using preview before saving. This is quite weird! Could this be due to a subtle bug in the wiki software? David Haslam 19:44, 24 July 2009 (UTC)
I've been thinking about this further. The two items of text that were unintentionally deleted were both Bible references. A couple of weeks ago I installed a Firefox add-on called Bible Refalizer. I wonder if that was the cause? I'll do some tests in the wiki sandbox, and report later. David Haslam 21:09, 24 July 2009 (UTC)
With Bible Refalizer enabled, when a wiki page or section is edited, Bible references in the edit box get removed. With Refalizer disabled, everything is OK. This is a serious bug in the Refalizer add-on for Firefox. I have reported it to the programmer, James Anderson. David Haslam 16:31, 29 July 2009 (UTC)

usfm2osis.pl

Is the word 'rudimentary' still needed before usfm2osis.pl ? David Haslam 12:48, 26 October 2009 (UTC)

Creating and copying from PDF files

Provided the document's security properties permit copying, the entire content of a PDF file can be copied using Adobe Reader 9.x (I have successfully used this to paste a complete Bible text into WordPad). Couple this with the fact that several printer drivers permit printing from any Windows application to a PDF file, and let your imagination run riot. Two such printer drivers are the commercial program pdfFactory from FinePrint and the free program PDFCreator. David Haslam 11:21, 19 November 2009 (UTC)

modwrite / treeidxutil / xmlcatalog

Please would someone provide a suitable description of these SWORD tools. I have added a line for each under Miscellaneous. David Haslam 13:28, 7 December 2009 (UTC)

Still waiting. David Haslam 15:57, 19 December 2009 (UTC)

SGML section

I added the SGML section after learning that one of my contacts in SIL has a task for converting some Folio View files to Logos format, and he's doing it via XML. He's probably not using SP, yet I thought it helpful to record what I have found. David Haslam 17:50, 22 April 2010 (UTC)

XHTML-TE and Go Bible Creator

I am in contact with the SIL employee who is adapting Go Bible Creator to add the option to specify XHTML-TE as a source text format. Email me if you'd like further details. David Haslam 12:21, 25 May 2010 (UTC)

imp2osis.pl

Here is a copy of the help output for imp2osis.pl

imp2osis.pl -- IMP (Sword Import) format to OSIS 2.1.1 converter version 2.0.1
Revision 227 (2009-10-30)
Syntax: imp2osis.pl <osisWork> <input filename> [-o OSIS-file] [-m]

The -m option will produce milestoned <verse/> elements, 
which are more likely to produce valid OSIS from Bibles with OSIS markup internally.

No attempt is made to convert markup present in the verse entries themselves, 
so this tool is appropriate for converting Bibles that already contain OSIS markup or plaintext markup.

This tool is ONLY intended for VersKey-type Sword texts, namely Bibles and commentaries.

Lightly edited to avoid having to use horizontal scrolling. David Haslam 16:37, 4 December 2010 (UTC)

Why I have added the section about ODF

Works that are supplied as word processing files are sometimes difficult to convert to OSIS XML. The XML content of an ODF file may prove to be a useful intermediate step for format shifting. In theory, it should be feasible to develop an XSLT script to perform such a transformation. David Haslam 13:41, 23 March 2011 (UTC)
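
As a first step towards that, here is a minimal sketch of getting at the XML inside an ODF text document. It relies only on the fact that an ODF file is a ZIP package whose body lives in content.xml. The function name odf_paragraphs is hypothetical, and the sketch only pulls out paragraph and heading text; it makes no attempt at an OSIS mapping, which would still need an XSLT script or similar.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by ODF for text elements such as text:p and text:h.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_paragraphs(path):
    """Return the plain text of each text:h / text:p element, in document order.

    `path` may be a file name or a file-like object (hypothetical helper,
    not part of any SWORD tool).
    """
    with zipfile.ZipFile(path) as odf:
        root = ET.fromstring(odf.read("content.xml"))

    wanted = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
    paragraphs = []
    for elem in root.iter():
        if elem.tag in wanted:
            # itertext() flattens nested spans into one string.
            paragraphs.append("".join(elem.itertext()))
    return paragraphs
```

Calling odf_paragraphs("mytext.odt") (file name assumed) returns a list of strings, which is enough to inspect how a given word processing file has been structured before deciding on a conversion route.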

Request for info: SWORD internal file formats

I arrived at this page looking for info on SWORD's internal file formats.

There's some detail on "other" formats and references to existing tools that convert between some of them and "SWORD format" but it would be useful to tell more about what "SWORD format" actually IS.

Is there documentation available anywhere about what "SWORD format" looks like externally (file names and/or directory structure) and internally (file layout)?

What if someone wanted to develop a document for use in a sword application?

This page seems to suggest that they'd be best advised to format that information in one of these other formats -- because those are documented -- and trust one of the listed conversion routines to somehow make the data usable in sword apps.

Is it really the case that a direct formatting into Sword format is not worth documenting/considering?

Even the indirect path of importing from one of these other formats leaves questions about the resulting "external" structure like:

How do I identify the result -- like to tell if there's any chance that a conversion succeeded or to distribute it to another machine?

Should I expect to see a file? multiple files? a directory? a directory tree?

How, as in by what file name(s), file extension(s), directory name(s) etc. does an application identify such resources to the sword library?

Pmartel60 15:12, 1 October 2011 (MDT)

Sword's format is not and will not be documented (except in the sense that it is documented in formal languages, namely C++ & Java). Our source code is open source, so you are welcome to read it, with the understanding that any reimplementation of our code that results from reading our code (such as a Sword format reader) would necessarily be bound by the GPL. Sword's format is not an open format. It is a proprietary format, prone to change without notice, according to our current and future needs. --Osk 17:12, 1 October 2011 (MDT)

imp2osis.pl

imp2osis.pl is a conversion tool that only converts what's already there in the IMP file. If the relevant lines are missing from the IMP file, it does not automatically supply the required OSIS elements. So for example, if these lines are missing:

$$$[ Testament 1 Heading ]
$$$[ Testament 2 Heading ]

it does not add the bookGroup (or x-testament) divisions; see OSIS Bibles#Body and OSIS Tutorial#Text_Divisions. David Haslam 05:42, 13 February 2012 (MST)
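
Since the tool will not supply these lines itself, it may help to check an IMP file before conversion. Below is a minimal sketch that reports which of the two testament heading keys are absent. The key spelling is taken from the two lines quoted above; the function name missing_testament_headings is hypothetical, and whether both headings are really required for a given text (e.g. a New Testament only module) is left to the module maker.

```python
import re

# Matches the IMP key lines quoted above, allowing for variable spacing.
TESTAMENT_HEADING = re.compile(
    r"^\$\$\$\[\s*Testament\s+([12])\s+Heading\s*\]\s*$"
)


def missing_testament_headings(imp_lines):
    """Return a sorted list of testament numbers (1, 2) with no heading key.

    `imp_lines` is any iterable of lines, e.g. an open text file
    (hypothetical helper, not part of imp2osis.pl).
    """
    found = set()
    for line in imp_lines:
        match = TESTAMENT_HEADING.match(line)
        if match:
            found.add(int(match.group(1)))
    return sorted({1, 2} - found)
```

For example, passing an open file object for the IMP file returns an empty list when both heading keys are present.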

Haiola

Haiola appears at the moment to be solely a way of getting stuff into USFX, which we will not support, nor does anyone else. The rest of the description is vapourware. I see no reason to link to it. We are more than unlikely ever to encounter original texts in USFX format. refdoc:talk 14:34, 4 March 2012 (MST)

Haiola is no longer 'vapourware'! It's an active project that will at some stage (later this year) include a conversion option to export Biblical text in OSIS format. David Haslam 00:49, 27 April 2012 (MDT)

Removal of "cruft" ?

What was removed as "cruft" actually contained notable and well researched information relevant to what other Bible agencies and individuals have been working on in the past, and in some cases continue to work on in the present. If we are to collaborate more with such ministries in the future, it is important that what was elicited is not totally discarded and lost to CrossWire developers referring to these wiki pages.

I was about to add a note about the USX to OXES converter recently developed by Jim Albright of SIL (using XSLT) which could form the basis (i.e. a good model) for creating a similar USX to OSIS converter that would help Paratext users to create SWORD modules. As translations being made 'publication ready' under the ETEN framework are stored in the Digital Bible Library (DBL) in USX format, we should do all that we can to learn more about the USX format.

Likewise, we should remember that Go Bible is a CrossWire application, and therefore that input formats for use by Go Bible Creator should not be deleted, even if they are deprecated for SWORD module creation. This is particularly the case for ThML format, which, though now far less suitable than OSIS for module making, remains a well-trodden means to make Go Bible applications.

Rather than merely deleting the sections containing this information, it would have made more sense to move them to a separate wiki page, suitably entitled to make it clear that they are formats not currently in use for SWORD module creation, or have been in the past, but are now deprecated.

David Haslam 00:35, 27 April 2012 (MDT)

I'm pleased that neither USX nor ThML sections were removed, and I have just added the note about USX to OXES. David Haslam 00:51, 27 April 2012 (MDT)
Cruft wasn't deleted. I just moved it to the cruft page: File Formats Cruft --Osk 13:18, 27 April 2012 (MDT)