Modules / MOD-254

Configuration file not well encoded

    Details

      Description

      The configuration files (in mods.d) of several modules are not encoded in UTF-8 (although the Encoding field is set to UTF-8).

      I noticed some module descriptions contain invalid characters displayed as �.

      To find all the invalid modules, I ran the command file --mime-encoding * | grep -v utf-8 | grep -v us-ascii in the folder ~/.sword/InstallMgr/ftp.crosswire.org/mods.d.

      Here is what I get:

      ab.conf: iso-8859-1
      chamorro.conf: iso-8859-1
      finbiblia.conf: iso-8859-1
      finpr.conf: iso-8859-1
      gerelb1871.conf: iso-8859-1
      gerlutherpredigten.conf: iso-8859-1
      gersch.conf: unknown-8bit
      globals.conf: binary
      imitation.conf: iso-8859-1
      jubilee2000.conf: iso-8859-1
      rieger.conf: iso-8859-1
      sml_BL_2008.conf: iso-8859-1
      spavnt.conf: iso-8859-1
      swe1917.conf: iso-8859-1
      ukjv.conf: unknown-8bit
      viet.conf: iso-8859-1
      wulfila.conf: iso-8859-1
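
The file --mime-encoding check above lists every conf file that is not UTF-8 or ASCII, whether or not it claims to be UTF-8. A narrower check is sketched below in Python: it flags only files that declare Encoding=UTF-8 but do not decode as UTF-8. The function names and the regular expression are illustrative assumptions, not part of SWORD or its tools, and the key match is assumed to be case-sensitive.

```python
import re
from pathlib import Path
from typing import List, Optional

# Matches an "Encoding=<value>" line; the exact key spelling is an assumption.
ENCODING_LINE = re.compile(rb"^Encoding[ \t]*=[ \t]*(\S+)[ \t\r]*$", re.MULTILINE)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the value of the first Encoding field, or None if it is absent."""
    match = ENCODING_LINE.search(data)
    return match.group(1).decode("ascii", "replace") if match else None


def declares_utf8_but_is_not(data: bytes) -> bool:
    """True when the file says Encoding=UTF-8 but its bytes are not valid UTF-8."""
    declared = declared_encoding(data)
    return declared is not None and declared.upper() == "UTF-8" and not is_valid_utf8(data)


def find_mismatches(mods_d: Path) -> List[str]:
    """List the names of conf files in mods_d whose declaration and bytes disagree."""
    return [
        conf.name
        for conf in sorted(mods_d.glob("*.conf"))
        if declares_utf8_but_is_not(conf.read_bytes())
    ]
```

With the folder from the report, this could be called as find_mismatches(Path.home() / ".sword/InstallMgr/ftp.crosswire.org/mods.d"). Files without an Encoding field are deliberately not reported.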

      Note that modules in other repositories are affected as well.

      I did not think it was worth opening a separate bug for each of these modules.

        Activity

        Chris Little added a comment -

        Only about half of these files had the reported issue. The rest were entirely correct. Please realize that windows-1252 (and by extension iso-8859-1) is a valid Sword encoding. In fact, it is Sword's default encoding.

        Y. D. added a comment -

        Thank you for the corrections.
        Indeed, contrary to what I said, some conf files did not contain the Encoding field.

        I found a new (minor) issue: the file sorano.conf contains the Encoding field twice.

        There is currently an issue with crosswire.org if the conf file is not encoded in UTF-8.
        See https://www.crosswire.org/sword/modules/ModInfo.jsp?modName=Chamorro for instance.
        If UTF-8 is preferred (see the quotations below), why are these conf files still encoded in latin1? (6 files are affected.)

        The preferred encoding of texts is UTF-8.
        This encoding indicates how the conf and the module are encoded.
        https://www.crosswire.org/wiki/DevTools:conf_Files

        Note that the SWORD Project requires all submitted texts to be Unicode (UTF-8) encoded documents.
        https://www.crosswire.org/wiki/DevTools:Modules#Encoding

        Y. D. added a comment -

        There are similar issues in the av11n repository:

        Encoded in latin1 but containing the field Encoding=UTF-8:

        • frekhan.conf
        • hunuj.conf
        • vulgclementine.conf
        • vulgconte.conf
        • vulghetzenauer.conf
        • vulgsistine.conf

        sorani.conf also contains the Encoding field twice (I made a typo in my previous message).
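
A duplicated field such as the repeated Encoding entry can be found mechanically. The Python sketch below counts keys in the text of a conf file; the function name and the ignore parameter are hypothetical. It assumes a simple Key=Value layout, does not handle continuation lines, and leaves it to the caller to list any fields that are allowed to repeat.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_keys(conf_text: str, ignore: Iterable[str] = ()) -> List[str]:
    """Return keys that appear more than once, in order of first appearance."""
    ignored = set(ignore)
    counts: Counter = Counter()
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, the [ModuleName] header, and lines without '='.
        if not line or line.startswith("[") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key not in ignored:
            counts[key] += 1
    return [key for key, n in counts.items() if n > 1]
```

Passing the decoded contents of a conf file returns a list such as ["Encoding"] when that field is present twice, and an empty list when no key repeats.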

        Chris Little added a comment -

        Latin-1 (actually its superset windows-1252) is the default encoding for Sword and is implied in any case where no Encoding is specified. The reasons for this are historical and there is no possibility of it changing. UTF-8 is now preferred and all new releases use UTF-8. Modules that still use windows-1252 encoding are generally very old.
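
The fallback rule described in this comment can be sketched in Python as follows: decode as UTF-8 only when the conf file says so, otherwise assume windows-1252. This is an illustration of the rule, not SWORD's actual implementation. The function name is hypothetical, and treating every value other than UTF-8 as the default is a simplification.

```python
import re

DEFAULT_CODEC = "cp1252"  # Python's codec name for windows-1252

# Matches an "Encoding=<value>" line; the exact key spelling is an assumption.
_ENCODING_LINE = re.compile(rb"^Encoding[ \t]*=[ \t]*(\S+)[ \t\r]*$", re.MULTILINE)


def decode_conf(data: bytes) -> str:
    """Decode conf bytes as UTF-8 if declared, else with the windows-1252 default."""
    match = _ENCODING_LINE.search(data)
    declared = match.group(1).decode("ascii", "replace").upper() if match else None
    codec = "utf-8" if declared == "UTF-8" else DEFAULT_CODEC
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return data.decode(codec, errors="replace")
```

A file that declares Encoding=UTF-8 but is stored as latin1 comes out of this function with U+FFFD replacement characters, which matches the � symptom in the original report.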

        Chris Little added a comment -

        avraw modules are now corrected.

        Y. D. added a comment -

        I think you forgot to update sorani.conf in the raw repository.
        Thank you for the explanations and updates.

        Y. D. added a comment -

        I found modules with an invalid BlockType value.

        According to the wiki, the allowed values are VERSE, CHAPTER and BOOK.
        Some .conf files in the default repository (i.e. raw) use the value Book (instead of BOOK).
        Here is the list:

        • netfree.conf
        • nettext.conf
        • lithuanian.conf
        • pohnpeian.conf
        • godsword.conf

        If Book is also correct, you can ignore this message.
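
The BlockType check can be expressed as a small Python classifier that separates case-only mismatches such as Book from values that are wrong altogether. The allowed values are the three quoted from the wiki above. The function name and the result labels are hypothetical, and the thread does not say whether SWORD itself compares the value case-insensitively.

```python
ALLOWED_BLOCK_TYPES = ("VERSE", "CHAPTER", "BOOK")  # values quoted from the wiki


def check_block_type(value: str) -> str:
    """Classify a BlockType value as 'ok', 'case-mismatch', or 'invalid'."""
    stripped = value.strip()
    if stripped in ALLOWED_BLOCK_TYPES:
        return "ok"
    # Same word, different letter case (for example "Book").
    if stripped.upper() in ALLOWED_BLOCK_TYPES:
        return "case-mismatch"
    return "invalid"
```

Here check_block_type("Book") yields "case-mismatch", so the five files listed above would be reported separately from any file with an unrecognized value.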

          People

          • Assignee: Chris Little
          • Reporter: Y. D.
          • Votes: 0
          • Watchers: 2