[sword-devel] .conf files encoding/tags

DM Smith dmsmith555 at yahoo.com
Wed Oct 3 17:53:28 MST 2007

Background on my comment about the encoding.

About 2/10/2005, it was noted that a conf in utf-8 worked.

In late Aug 2005 it was decided that a conf that's UTF-8 should allow
UTF-8 and not Latin-1.

There are many fields that have non-ASCII characters. Some are UTF-8 and
some are Latin-1, and which one is used does not seem to correspond to the
Encoding= field. If you wish me to enumerate them, I'll look.
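
To give an idea of how such an enumeration could be done, here is a
minimal Python sketch. It is only a heuristic: any non-ASCII value that
decodes cleanly as UTF-8 is reported as UTF-8, and anything else is
reported as "not-utf-8" (most likely Latin-1 or CP1252). The function
names are made up for illustration, and multi-line values are ignored
for simplicity.

```python
def classify_value(raw):
    """Guess the encoding of one raw field value (bytes)."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so probably a legacy 8-bit encoding.
        return "not-utf-8"


def audit_conf(raw_conf):
    """Return (key, guess) for every key=value line with non-ASCII bytes."""
    findings = []
    for line in raw_conf.splitlines():
        key, sep, value = line.partition(b"=")
        if not sep:
            continue  # section header, blank line, or continuation line
        guess = classify_value(value)
        if guess != "ascii":
            findings.append((key.decode("ascii", "replace").strip(), guess))
    return findings
```

The result can then be compared by hand against the conf's Encoding=
line to spot the mismatches.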

In the confs, there are other RTF codes such as \b and \i. I can
enumerate all the exceptions to \par, \pard, \qc and \uxxxxx, if you
wish. The only confs that have the \u codes are for UTF-8 modules.
The JSP code that displays the confs on the website is relatively
braindead and only handles a few codes. The rest, such as \u, show up

Further, a few modules have HTML in the About field. The <a href="">
is only in the short promo at this time.
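
For what it's worth, finding these exceptions could be automated along
the following lines. This is a rough Python sketch, not what the website
code does: it assumes the About text has already been read into a string,
treats anything of the form backslash-letters-optional-number as an RTF
control word, and uses a loose pattern for HTML tags. The name
unexpected_markup is hypothetical.

```python
import re

# The four codes that are expected in an About field.
EXPECTED = {"par", "pard", "qc", "u"}

# An RTF control word: backslash, letters, optional signed numeric parameter.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?\d+)?")

# A loose match for an opening or closing HTML tag.
HTML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def unexpected_markup(about):
    """Return (sorted unexpected RTF control words, list of HTML tags)."""
    found = {m.group(1) for m in CONTROL_WORD.finditer(about)}
    return sorted(found - EXPECTED), HTML_TAG.findall(about)
```

Running this over the About field of each conf would give the list of
codes beyond the four, plus any stray HTML.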

Personally, I don't have a problem with whatever decision is made,
but I'd like the confs to be fixed to be consistent with whatever
is/was decided.


On Oct 3, 2007, at 7:46 PM, Chris Little wrote:

> .conf files are entirely plain text except for the About field, which is
> RTF. RTF is only used in the About field and only RTF (or no markup) may
> be used in the About field.
> The use of RTF here is basically a legacy issue carried from BibleCS.
> It shouldn't be a big deal for you because I believe we only use 4
> different tags:
> \qc (center the following)
> \par (paragraph break)
> \pard (paragraph break + reset formatting)
> \uXXXXX? (non-CP1252 characters expressed as UTF-16)
> Eeli Kaikkonen wrote:
>> I browsed through the beta area module .conf files. It's great to see
>> so many new ones with new features.
>> One rant I have. Why on earth is stupid braindead rtf or other
>> strange formatting used in .conf files? See this example from TurNTB:
>> About=New Turkish Bible translation, jointly translated and published by
>> K\u00305?tab\u00305? Mukaddes \u00350?irketler (www.kitabimukaddes.com)
>> and Yeni Ya\u00351?am Yay\u00305?nlar\u00305? (www.yyyayinlari.com). We
>> are grateful for the permission by Yeni Ya\u00351?am
>> Yay\u00305?nlar\u00305? to distribute this translation.
>> How are the frontends supposed to display this correctly? This is year
>> 2007 and this kind of project should use utf8 in conf files also. \par
>> is quite easy to replace but why not use <br> instead, that kind of html
>> tagging is used in many places even without real html browsers. Rtf is
>> M$ proprietary format.
> So you propose that we abandon a markup system already in place and
> convert everything to a different arbitrarily selected markup format? As
> a result, every user will have to update every module, or the about text
> will be mis-rendered. Existing strategies for rendering RTF will have to
> be re-written to handle <insert arbitrarily chosen markup language>. And
> all to solve a problem that doesn't exist.
> Suggesting that we should not use RTF because it comes from MS is silly.
> RTF is quite well documented and widely used. It's cross-platform and
> wasn't even developed by MS. A full description may be found at
> http://www.biblioscape.com/rtf15_spec.htm.
> Converting the RTF we use to HTML requires about three lines of Perl,
> which you can port to the language of your choice:
> $about =~ s/\\u(\d+)\?/pack("U", $1)/eg; # assumes no surrogate pairs
> $about =~ s/\\qc ?(.*?)(\\pard|$)/<center>$1<\/center>$2/g;
> $about =~ s/\\pard? ?/<br\/>/g;
> --Chris
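
As an illustration of the "port to the language of your choice" remark,
here is one possible Python rendering of the three substitutions above.
It is a sketch only: the function name about_rtf_to_html is made up, it
assumes the About text is already a decoded string, and like the Perl it
handles neither surrogate pairs nor any RTF codes beyond the four listed.

```python
import re


def about_rtf_to_html(about):
    """Convert the four RTF codes used in About fields to simple HTML."""
    # \uN? becomes the character with decimal code point N
    # (assumes no surrogate pairs, as in the Perl original).
    about = re.sub(r"\\u(\d+)\?", lambda m: chr(int(m.group(1))), about)
    # \qc centers the text up to the next \pard (or the end of the text).
    about = re.sub(r"\\qc ?(.*?)(\\pard|$)", r"<center>\1</center>\2", about)
    # \par and \pard, plus one optional trailing space, become a line break.
    about = re.sub(r"\\pard? ?", "<br/>", about)
    return about
```

Applied to the start of the TurNTB example, the \u codes come out as the
Turkish letters ı (305) and Ş (350), so the text begins "Kıtabı Mukaddes
Şirketler".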
