public class Sapphire extends Object
The Sapphire II Stream Cipher is designed to have the following properties:
The Sapphire Stream Cipher is very similar to a cipher I started work on in November 1993. It is also similar in some respects to the alledged RC4 that was posted to sci.crypt recently. Both operate on the principle of a mutating permutation vector. Alledged RC4 doesn't include any feedback of ciphertext or plain text, however. This makes it more vulnerable to a known plain text attack, and useless for creation of cryptographic check values. On the other hand, alledged RC4 is faster.
The Sapphire Stream Cipher is used in the shareware product Quicrypt, which is available at ftp://ftp.csn.net/mpj/qcrypt10.zip and on the Colorado Catacombs BBS (3037721062). There are two versions of Quicrypt: the exportable version (with a session key limited to 32 bits but with strong user keys allowed) and the commercial North American version (with a session key of 128 bits). A variant of the Sapphire Stream Cipher is also used in the shareware program Atbash, which has no weakened exportable version.
The Sapphire II Stream Cipher is a modification of the Sapphire Stream Cipher designed to be much more resistant to adaptive chosen plaintext attacks (with reorigination of the cipher allowed). The Sapphire II Stream Cipher is used in an encryption utility called ATBASH2.
The Sapphire Stream Cipher is based on a state machine. The state consists of 5 index values and a permutation vector. The permutation vector is simply an array containing a permutation of the numbers from 0 through 255. Four of the bytes in the permutation vector are moved to new locations (which may be the same as the old location) for every byte output. The output byte is a nonlinear function of all 5 of the index values and 8 of the bytes in the permutation vector, thus frustrating attempts to solve for the state variables based on past output. On initialization, the permutation vector (called the cards array in the source code below) is shuffled based on the user key. This shuffling is done in a way that is designed to minimize the bias in the destinations of the bytes in the array. The biggest advantage in this method is not in the elimination of the bias, per se, but in slowing down the process slightly to make brute force attack more expensive. Eliminating the bias (relative to that exhibited by RC4) is nice, but this advantage is probably of minimal cryptographic value. The index variables are set (somewhat arbitrarily) to the permutation vector elements at locations 1, 3, 5, 7, and a key dependent value (rsum) left over from the shuffling of the permutation vector (cards array).
Key setup (illustrated by the function initialize(), below) consists of three parts:
The keyrand() function returns a value between 0 and some maximum number based on the user's key, the current state of the permutation vector, and an index running sum called rsum. Note that the length of the key is used in keyrand(), too, so that a key like "abcd" will not result in the same permutation as a key like "abcdabcd".
Each encryption involves updating the index values, moving (up to) 4 bytes around in the permutation vector, selecting an output byte, and adding the output byte bitwise modulo2 (exclusiveor) to the plain text byte to produce the cipher text byte. The index values are incremented by different rules. The index called rotor just increases by one (modulo 256) each time. Ratchet increases by the value in the permutation vector pointed to by rotor. Avalanche increases by the value in the permutation vector pointed to by another byte in the permutation vector pointed to by the last cipher text byte. The last plain text and the last cipher text bytes are also kept as index variables. See the function called encrypt(), below for details.
If you want to generate random numbers without encrypting any particular ciphertext, simply encrypt 0. There is still plenty of complexity left in the system to ensure unpredictability (if the key is not known) of the output stream when this simplification is made.
Decryption is the same as encryption, except for the obvious swapping of the assignments to last_plain and last_cipher and the return value. See the function decrypt(), below.
The original implimentation of this cipher was in Object Oriented Pascal, but C++ is available for more platforms.
For a fast way to generate a cryptographic check value (also called a hash or message integrity check value) of a message of arbitrary length:
There are several security issues to be considered. Some are easier to analyze than others. The following includes more "hand waving" than mathematical proofs, and looks more like it was written by an engineer than a mathematician. The reader is invited to improve upon or refute the following, as appropriate.
There are really two kinds of user keys to consider: (1) random binary keys, and (2) pass phrases. Analysis of random binary keys is fairly straight forward. Pass phrases tend to have much less entropy per byte, but the analysis made for random binary keys applies to the entropy in the pass phrase. The length limit of the key (255 bytes) is adequate to allow a pass phrase with enough entropy to be considered strong.
To be real generous to a cryptanalyst, assume dedicated Sapphire Stream Cipher cracking hardware. The constant portion of the key scheduling can be done in one cycle. That leaves at least 256 cycles to do the swapping (probably more, because of the intricacies of keyrand(), but we'll ignore that, too, for now). Assume a machine clock of about 256 MegaHertz (fairly generous). That comes to about one key tried per microsecond. On average, you only have to try half of the keys. Also assume that trying the key to see if it works can be pipelined, so that it doesn't add time to the estimate. Based on these assumptions (reasonable for major governments), and rounding to two significant digits, the following key length versus cracking time estimates result:
Key length, bits Time to crack   32 35 minutes (exportable in qcrypt) 33 1.2 hours (not exportable in qcrypt) 40 6.4 days 56 1,100 years (kind of like DES's key) 64 290,000 years (good enough for most things) 80 19 billion years (kind of like Skipjack's key) 128 5.4E24 years (good enough for the clinically paranoid)
Naturally, the above estimates can vary by several orders of magnitude based on what you assume for attacker's hardware, budget, and motivation.
In the range listed above, the probability of spare keys (two keys resulting in the same initial permutation vector) is small enough to ignore. The proof is left to the reader.
For a stream cipher, internal state space should be at least as big as the number of possible keys to be considered strong. The state associated with the permutation vector alone (256!) constitutes overkill.
If you have a history of stream output from initialization (or equivalently, previous known plaintext and ciphertext), then rotor, last_plain, and last_cipher are known to an attacker. The other two index values, flipper and avalanche, cannot be solved for without knowing the contents of parts of the permutation vector that change with each byte encrypted. Solving for the contents of the permutation vector by keeping track of the possible positions of the index variables and possible contents of the permutation vector at each byte position is not possible, since more variables than known values are generated at each iteration. Indeed, fewer index variables and swaps could be used to achieve security, here, if it were not for the hash requirements.
The change in state altered with each byte encrypted contributes to an avalanche of generated check values that is radically different after a sequence of at least 64 bytes have been encrypted. The suggested way to create a cryptographic check value is to encrypt all of the bytes of a message, then encrypt a sequence of bytes counting down from 255 to 0. A single bit change in a message causes a radical change in the check value generated (about half of the bits change). This is an essential feature of a cryptographic check value.
Another good property of a cryptographic check value is that it is too hard to compute a message that results in a certain check value. In this case, we assume the attacker knows the key and the contents of a message that has the desired check value, and wants to compute a bogus message having the same check value. There are two obvious ways to do this attack. One is to solve for a sequence that will restore the state of the permutation vector and indices back to what it was before the alteration. The other one is the socalled "birthday" attack that is to cryptographic hash functions what brute force is to key search.
To generate a sequence that restores the state of the cipher to what it was before the alteration probably requires at least 256 bytes, since the index "rotor" marches steadily on its cycle, one by one. The values to do this cannot easily be computed, due to the nonlinearity of the feedback, so there would probably have to be lots of trial and error involved. In practical applications, this would leave a gaping block of binary garbage in the middle of a document, and would be quite obvious, so this is not a practical attack, even if you could figure out how to do it (and I haven't). If anyone has a method to solve for such a block of data, though, I would be most interested in finding out what it is. Please email me at <m.p.johnson@ieee.org> if you find one.
The "birthday" attack just uses the birthday paradox to find a message that has the same check value. With a 20 byte check value, you would have to find at least 80 bits to change in the text such that they wouldn't be noticed (a plausible situation), then try the combinations until one matches. 2 to the 80th power is a big number, so this isn't practical either. If this number isn't big enough, you are free to generate a longer check value with this algorithm. Someone who likes 16 byte keys might prefer 32 byte check values for similar stringth.
Let us give the attacker a keyed black box that accepts any input and provides the corresponding output. Let us also provide a signal to the black box that causes it to reoriginate (revert to its initial keyed state) at the attacker's will. Let us also be really generous and provide a free copy of the black box, identical in all respects except that the key is not provided and it is not locked, so the array can be manipulated directly.
Since each byte encrypted only modifies at most 5 of the 256 bytes in the permutation vector, and it is possible to find different sequences of two bytes that leave the five index variables the same, it is possible for the attacker to find sets of chosen plain texts that differ in two bytes, but which have cipher texts that are the same for several of the subsequent bytes. Modeling indicates that as many as ten of the following bytes (although not necessarily the next ten bytes) might match. This information would be useful in determining the structure of the Sapphire Stream Cipher based on a captured, keyed black box. This means that it would not be a good substitute for the Skipjack algorithm in the EES, but we assume that the attacker already knows the algorithm, anyway. This departure from the statistics expected from an ideal stream cipher with feedback doesn't seem to be useful for determining any key bytes or permutation vector bytes, but it is the reason why postconditioning is required when computing a cryptographic hash with the Sapphire Stream Cipher. Thanks to Bryan G. Olson's <olson@umbc.edu> continued attacks on the Sapphire Stream Cipher, I have come up with the Sapphire II Stream Cipher. Thanks again to Bryan for his valuable help.
Bryan Olson's "differential" attack of the original Sapphire Stream Cipher relies on both of these facts:
I have not yet figured out if Bryan's attack on the original Sapphire Stream Cipher had complexity of more or less than the design strength goal of 2^64 encryptions, but some conservative estimations I made showed that it could possibly come in significantly less than that. (I would probably have to develop a full practical attack to accurately estimate the complexity more accurately, and I have limited time for that). Fortunately, there is a way to frustrate this type of attack without fully developing it.
Denial of condition 1 above by increased alteration of the state variables is too costly, at least using the methods I tried. For example, doubling the number of index variables and the number of permutation vector items referenced in the output function of the stream cipher provides only doubles the cost of getting the data in item 1, above. This is bad cryptoeconomics. A better way is to change the output function such that the stream cipher output byte is a combination of two permutation vector bytes instead of one. That means that all possible output values can occur in the differential sequences of item 1, above.
Denial of condition 2 above, is simpler. By making the initial values of the five index variables dependent on the key, Bryan's differential attack is defeated, since the attacker has no idea which elements of the permutation vector were different between data sets, and exhaustive search is too expensive.
Are there any? Take you best shot and let me know if you see any. I offer no challenge text with this algorithm, but you are free to use it without royalties to me if it is any good.
This is a new (to the public) cipher, and an even newer approach to cryptographic hash generation. Take your best shot at it, and please let me know if you find any weaknesses (proven or suspected) in it. Use it with caution, but it still looks like it fills a need for reasonably strong cryptography with limited resources.
The intention of this document is to share some research results on an informal basis. You may freely use the algorithm and code listed above as far as I'm concerned, as long as you don't sue me for anything, but there may be other restrictions that I am not aware of to your using it. The C++ code fragment above is just intended to illustrate the algorithm being discussed, and is not a complete application. I understand this document to be Constitutionally protected publication, and not a munition, but don't blame me if it explodes or has toxic side effects.
___________________________________________________________   \ /   Michael Paul Johnson Colorado Catacombs BBS 3037721062   \/ o  PO Box 1151, Longmont CO 805021151 USA John 3:1617     / _  mpj@csn.org aka mpj@netcom.com m.p.johnson@ieee.org   / /_\  ftp://ftp.csn.net/mpj/README.MPJ CIS: 71331,2332   \ (  ftp://ftp.netcom.com/pub/mp/mpj/README .   ....    \ \_/  PGPprint=F2 5E A1 C1 A6 CF EF 71 12 1F 91 92 6A ED AE A9  ___________________________________________________________Regarding this port to Java and not the original code, the following license applies:
The GNU Lesser General Public License for details.
Modifier and Type  Field and Description 

private int 
avalanche 
private int[] 
cards 
private int 
keypos 
private int 
lastCipher 
private int 
lastPlain 
private int 
ratchet 
private int 
rotor 
private int 
rsum 
Constructor and Description 

Sapphire(byte[] aKey)
Construct a Sapphire Stream Cipher from a key, possibly null or empty.

Modifier and Type  Method and Description 

void 
burn()
Destroy the key and state information in RAM.

byte 
cipher(byte b)
Decipher a single byte, presumably the next.

void 
hashFinal(byte[] hash) 
private void 
hashInit()
Initialize nonkeyed hash computation.

private void 
initialize(byte[] key)
Initializes the cards array to be deterministically random based upon the
key.

private int 
keyrand(int limit,
byte[] key) 
private int[] cards
private int rotor
private int ratchet
private int avalanche
private int lastPlain
private int lastCipher
private int keypos
private int rsum
public Sapphire(byte[] aKey)
aKey
 the cipher keypublic byte cipher(byte b)
b
 the next byte to decipherpublic void burn()
public void hashFinal(byte[] hash)
hash
 the destinationprivate void initialize(byte[] key)
Key size may be up to 256 bytes. Pass phrases may be used directly, with longer length compensating for the low entropy expected in such keys. Alternatively, shorter keys hashed from a pass phrase or generated randomly may be used. For random keys, lengths of from 4 to 16 bytes are recommended, depending on how secure you want this to be.
key
 used to initialize the cipher engine.private void hashInit()
private int keyrand(int limit, byte[] key)