<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">The encoding of the conf is either cp1252 (the default, though it is called Latin-1) or UTF-8. The encoding of the conf matches that of the module. This may cause the conf to be read twice, once with the default encoding and once as UTF-8, if the module encoding is set to UTF-8.<div><br></div><div>There have been confs that are incorrect with regard to this rule.<br><div><br><div>In Him,</div><div><span class="Apple-tab-span" style="white-space:pre">        </span>DM</div><div><br><div><div>On May 21, 2014, at 8:59 AM, Jaak Ristioja &lt;<a href="mailto:jaak@ristioja.ee">jaak@ristioja.ee</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">So this means that actually we want non-standard RTF (someone should<br>update the wiki). Should we assume UTF-8? Are you sure we don't have any<br>modules with ISO-8859-something encoded values?<br><br>If we choose any ASCII superset encoding we have to consider at least<br>the following two points:<br><br>&nbsp;* Since the RTF control words and delimiters are specified in ASCII<br>only, we need to decide how the bytes of the superset act as<br>delimiters and as parts of "RTF" control words. For example, whether the<br>Unicode letter, number, spacing, punctuation, control, etc. characters<br>constitute parts of RTF control words or act as delimiters.<br><br>&nbsp;* In case of encodings where characters may consist of multiple bytes<br>(e.g. 
the variable-length UTF-8) we must consider the character<br>boundaries. We can't just pass through any non-ASCII byte values. For<br>example, the following bit sequence wouldn't make sense:<br><br>&nbsp;11100010 01011100 10000010 01110001 10101100 01100011<br><br>which is a UTF-8 encoded Euro sign, €, interleaved with bytes of the<br>ASCII string "\qc". It just doesn't make sense, whereas the following<br>sequences would be correct:<br><br>&nbsp;11100010 10000010 10101100 01011100 01110001 01100011 (€\qc)<br>&nbsp;01011100 01110001 01100011 11100010 10000010 10101100 (\qc€)<br><br>So depending on the encoding it would be correct to detect such cases;<br>otherwise we end up with invalid Unicode output.<br><br>Blessings,<br>Jaak<br><br>On 21.05.2014 15:19, Chris Burrell wrote:<br><blockquote type="cite">I believe some conf files have direct Unicode (rather than escaped<br>sequences) in them and that is preferred.<br><br>On 20 May 2014 23:28, "Jaak Ristioja" &lt;<a href="mailto:jaak@ristioja.ee">jaak@ristioja.ee</a><br>&lt;<a href="mailto:jaak@ristioja.ee">mailto:jaak@ristioja.ee</a>&gt;&gt; wrote:<br><br>&nbsp;&nbsp;&nbsp;I've never done BiDi, but I'm not sure I need to take that into account<br>&nbsp;&nbsp;&nbsp;while fixing the RTF parsing. As I currently understand it, this<br>&nbsp;&nbsp;&nbsp;particular piece of code does not support any part of the RTF spec<br>&nbsp;&nbsp;&nbsp;dealing with bidirectional text handling. Hence all BiDi information<br>&nbsp;&nbsp;&nbsp;contained in the configuration file strings (e.g. About=) is contained<br>&nbsp;&nbsp;&nbsp;either in the plain ASCII text or the \u&lt;num&gt; Unicode escapes which this<br>&nbsp;&nbsp;&nbsp;algorithm should pass through unmodified.<br><br>&nbsp;&nbsp;&nbsp;...except for HTML entities which should actually be escaped. This bug<br>&nbsp;&nbsp;&nbsp;in the algorithm I previously failed to notice. 
Additionally I forgot<br>&nbsp;&nbsp;&nbsp;that non-ASCII characters in the input string should also lead to<br>&nbsp;&nbsp;&nbsp;parsing failure.<br><br>&nbsp;&nbsp;&nbsp;Jaak<br><br><br>&nbsp;&nbsp;&nbsp;On 20.05.2014 21:01, David Haslam wrote:<br><blockquote type="cite">Take care with Right to Left languages such as Hebrew.<br><br>i.e. After any patches to the filter, please include some testing for BiDi<br>text in the About= field and others.<br><br>David<br><br><br><br>--<br>View this message in context:<br><a href="http://sword-dev.350566.n4.nabble.com/RTFHTML-filter-bugs-tp4653969p4653970.html">http://sword-dev.350566.n4.nabble.com/RTFHTML-filter-bugs-tp4653969p4653970.html</a><br>Sent from the SWORD Dev mailing list archive at <a href="http://Nabble.com">Nabble.com</a>.<br><br>_______________________________________________<br>sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a><br><a href="http://www.crosswire.org/mailman/listinfo/sword-devel">http://www.crosswire.org/mailman/listinfo/sword-devel</a><br>Instructions to unsubscribe/change your settings at above page<br><br></blockquote><br><br><br>&nbsp;&nbsp;&nbsp;_______________________________________________<br>&nbsp;&nbsp;&nbsp;sword-devel mailing list:<span class="Apple-converted-space">&nbsp;</span><a href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a><br>&nbsp;&nbsp;&nbsp;&lt;<a href="mailto:sword-devel@crosswire.org">mailto:sword-devel@crosswire.org</a>&gt;<br>&nbsp;&nbsp;&nbsp;<a href="http://www.crosswire.org/mailman/listinfo/sword-devel">http://www.crosswire.org/mailman/listinfo/sword-devel</a><br>&nbsp;&nbsp;&nbsp;Instructions to unsubscribe/change your settings at above 
page<br><br></blockquote><br></div></blockquote></div><br></div></div></div></body></html>