[sword-devel] Sapphire, module cipher

Sun Mar 5 15:40:17 MST 2006

Martin,

I think that your algorithm will work for many cases, but not all. If 
the sample is sufficiently large then I think that it should work with a 
very high degree of success, though I'm not sure how large "sufficiently 
large" would be. The cipher works on a sliding window of 256 bytes at a 
time. If the cipher is highly "random" in its creation of the next byte, 
then a non-printable should show up in short order. But like tossing a 
coin, you can get 10 heads in a row: it's not likely, but it is possible. 
So the cipher could still produce many "printables" in a row.
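
To put a rough number on the coin-toss argument, here is a minimal sketch. 
It assumes that deciphering with a wrong key yields independent, uniformly 
random bytes (an idealisation, not a measured property of Sapphire), and 
that "non-printable" means the ASCII control codes 0-31 and 127, apart from 
tab, LF and CR. The function name is made up for illustration.

```python
# Assumption: wrong-key output is modelled as uniform, independent random bytes.
# Assumption: "non-printable" = ASCII control codes except TAB (9), LF (10), CR (13).
NON_PRINTABLE = (frozenset(range(32)) - {9, 10, 13}) | {127}


def chance_of_false_pass(sample_len: int) -> float:
    """Probability that sample_len random bytes contain no non-printable byte."""
    p_byte_ok = 1 - len(NON_PRINTABLE) / 256
    return p_byte_ok ** sample_len


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, chance_of_false_pass(n))
```

Under these assumptions a 10-byte sample would pass by chance more than a 
quarter of the time, while a 100-byte sample would do so only a few times 
in a million.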

I'm not a UTF-8 expert so some of what I say might be a bit inaccurate, 
but it should be close enough for argument's sake;)

In the case of Chinese, nearly all bytes will be >= 128.
Some code points above 127 are non-printable (e.g. the C1 controls, 128 - 159); 
in UTF-8 each of these is encoded as two bytes, both >= 128.
Some byte sequences are not valid UTF-8 (e.g. overlong forms and the reserved 
surrogate range), so these would be non-printable as well.
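
These points can be checked with Python's standard codecs. This is only a 
sketch: the sample string is arbitrary, and is_valid_utf8 is a helper 
invented here, not anything from Sword.

```python
import unicodedata

# Chinese text: every UTF-8 byte has the high bit set (>= 128).
sample = "太初有道"  # arbitrary sample string, chosen for illustration
encoded = sample.encode("utf-8")
all_high = all(b >= 128 for b in encoded)

# Code points 128..159 (U+0080..U+009F) are control characters, category "Cc".
c1_controls = all(
    unicodedata.category(chr(cp)) == "Cc" for cp in range(0x80, 0xA0)
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(all_high, c1_controls)
    print(is_valid_utf8(encoded))          # well-formed text
    print(is_valid_utf8(b"\xc0\x80"))      # overlong form
    print(is_valid_utf8(b"\xed\xa0\x80"))  # encoded surrogate
```

A strict UTF-8 decode therefore rejects some rubbish outright, but it says 
nothing about control characters that are validly encoded.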

In cp1252 (the version of "latin1" used in Sword modules), some bytes in 
the range of 128-159 are not defined and are not printable.
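
The undefined cp1252 bytes can be listed by asking a codec which single 
bytes it refuses to decode. The sketch below uses Python's built-in 
"cp1252" codec; whether Sword's own handling treats these bytes the same 
way is an assumption.

```python
def undefined_cp1252_bytes() -> list:
    """Return the byte values that Python's cp1252 codec cannot decode."""
    undefined = []
    for value in range(256):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            undefined.append(value)
    return undefined


if __name__ == "__main__":
    print([hex(v) for v in undefined_cp1252_bytes()])
    # prints ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

So only five byte values are undefined in that codec; the rest of 128-159 
maps to printable characters such as curly quotes and the euro sign.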

DM

Martin Gruner wrote:
>> Still, the simpler route is Martin's check for non-printables after
>> deciphering the first 100 or so characters. (I'm assuming that it is fully
>> UTF-8 aware.)
>>     
>
> DM,
>
> At the moment the routine treats the data as a Latin1 byte sequence. This 
> should work because all nonprinting characters are <= 127 (first bit 0), 
> and all higher Unicode characters encoded in UTF-8 consist of bytes >= 128 
> (first bit 1). I found this better than parsing the stream as UTF-8, 
> because it might contain rubbish without the valid key.
>
> mg