[sword-devel] Maximum line length 8192 in RawGenBook modules‽‽‽

David Haslam dfhdfh at protonmail.com
Mon Jun 1 14:05:24 EDT 2020


The help for Sword utility xml2gbs reads as follows:

xml2gbs 1.0 OSIS/ThML/TEI General Book module creation tool for the SWORD Project
  usage:
   xml2gbs [-l] [-i] [-fT|-fO|-fE] <filename> [modname]
  -l uses long div names in ThML files
  -i exports to IMP format instead of creating a module
  -fO, -fT, and -fE will set the importer to expect OSIS, ThML, or TEI format respectively
    (otherwise it attempts to autodetect)

This is the tool used to build a General Book (GenBook) module such as Westminster.

The usage help says nothing about the apparent 8192-character line length limit for the XML input file,
nor about how longer lines are split during the module build in ways that can break XML elements,
or even insert a space in the middle of a word so that it would fail a spell check.
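
To make the symptom concrete, here is a small Python sketch of a hard wrap at 8192 characters. It is only a model of the behaviour observed in the .bdt file, not the actual xml2gbs code, and the exact split position (including any off-by-one) is an assumption. The function name split_long_lines and the sample reference element are made up for illustration.

```python
LIMIT = 8192  # 0x2000, the line width observed in the .bdt file


def split_long_lines(text, limit=LIMIT):
    """Hypothetical model: hard-wrap every line at `limit` characters.

    This mimics the observed symptom only; it is not taken from xml2gbs.
    """
    out = []
    for line in text.split("\n"):
        if not line:
            out.append(line)
            continue
        # Cut the line into consecutive pieces of at most `limit` characters.
        for start in range(0, len(line), limit):
            out.append(line[start:start + limit])
    return "\n".join(out)


# Build one long line whose <reference> element straddles column 8192.
padding = "x" * (LIMIT - 10)
source_line = padding + '<reference osisRef="Gen.1.1">Gen 1:1</reference>'

pieces = split_long_lines(source_line).split("\n")
print(len(pieces))        # number of output lines
print(pieces[0][-10:])    # tail of the first piece
print(pieces[1][:18])     # head of the second piece
```

In this model the line break lands between the element name and its attribute, which is the kind of "broken" element described in the quoted message below.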

There's nothing in our wiki to suggest that module developers must keep every line of the XML file shorter than 8192 characters.
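
Until the cause is known, a module developer could check for over-long lines before running xml2gbs. The following Python sketch reports every line at or above a given length. The function name long_lines is made up for illustration, and it counts bytes rather than characters, since it is not yet clear which of the two the limit applies to.

```python
LIMIT = 8192  # the apparent limit; whether it is bytes or characters is unknown


def long_lines(path, limit=LIMIT):
    """Yield (line_number, length_in_bytes) for each line of at least `limit` bytes.

    The file is read in binary mode so that the result does not depend on
    the text encoding; line endings are not counted.
    """
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            length = len(raw.rstrip(b"\r\n"))
            if length >= limit:
                yield number, length
```

For example, list(long_lines("Westminster.xml")) would list the risky lines in a source file of that (assumed) name, and running the same function on the built .bdt file should reproduce the count of lines that are exactly 8192 long.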

My guess is that it's due to an incorrect variable type having been defined, but I'm not a coder.

8192 = 0x2000

Is it just a bug in the utility, or does it reflect a limit in the API itself?

Best regards,

David

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, 31 May 2020 21:27, David Haslam <dfhdfh at protonmail.com> wrote:

> To developers of the SWORD API and the utilities for building modules...
>
> I have just examined the raw GenBook module Westminster.
>
> SourceType=OSIS
>
> I found that 6 of the 5771 reference elements are "broken" by a line break falling part way through the XML element.
>
> Upon closer inspection, these line breaks occurred at column 8192 when the .bdt file was open in a text editor.
>
> This indicates that there is a problem if a line of text in an OSIS XML file exceeds 8192 characters in length!
>
> There are 933 lines of text in the .bdt file, of which 24 are length 8192. There are no lines longer than this.
> It's merely fortuitous that I happened to find 6 of these 24 lines while I was researching reference elements.
>
> What is the root cause of this issue?
> How can this issue be fixed?
>
> Best regards,
>
> David



