10.7. LFSR Sum#

We’ve already seen that a single LFSR are vulnerable to known-plaintext attacks. However, by combining multiple LFSRs we can start to create a system which is at least a little more resistant to known-plaintext attacks.

XORing Two LFSR Streams#

Suppose you have two LFSR systems, a 3-bit and a 5-bit system, each with an ideal definition that results in a maximal length bitstream from each. Therefore, the first system has definition of: \(b_3 \leftarrow b_1' + b_2'\) with a period of \(7\), and the second system has definition of: \(b_5 \leftarrow b_1' + b_3'\) with a period of 31. Since \(7\) and \(31\) share no common factors, the two bitstreams won’t repeat together until \(7 \times 31 = 217\) bits. To illustrate this fact, let’s generate the corresponding bitstreams.

Choosing seed values of 101 and 11010, let’s print twice the theoretical period to verify each LSFR system does repeat as predicted.

3-bit LFSR, \(b_3 \leftarrow b_1' + b_2'\), seed: 101

10111001011100

5-bit LFSR, \(b_5 \leftarrow b_1' + b_3'\), seed: 11010

01011001111100011011101010000100101100111110001101110101000010

Next, let’s generate \(217\) bits from each system and XOR them together and inspect the output to see if there’s any repetition.

1110000010000011010111110100111100100100110011010010100110110000000101010010001100100001100001011110000111010001011011010101011101111010110100000011111101100101011000101000111000100010011100111011011111111001100011110

Which, while difficult to confirm visually, has not yet begun to repeat.

Below is \(2 \times 217 = 434\) bits of the same stream. You should see about half-way through the bitstream begins to repeat.

11100000100000110101111101001111001001001100110100101001101100000001010100100011001000011000010111100001110100010110110101010111011110101101000000111111011001010110001010001110001000100111001110110111111110011000111101110000010000011010111110100111100100100110011010010100110110000000101010010001100100001100001011110000111010001011011010101011101111010110100000011111101100101011000101000111000100010011100111011011111111001100011110

This is a helpful result:

If LFSR systems have period of \(p_1\) and \(p_2\), where \(p_1\) and \(p_2\) share no common factors, then the XOR of these streams will have period \(p_1 \times p_2\)

This result provides an easy way to generate LFSR streams with long periods using several LFSR systems with shorter periods. For example, the A5/1 cipher which is used to protect GSM cell phone signals in Europe is composed of 3 LFSR streams of different sizes: 19-bit, 22-bit and 23-bit. The periods of each cipher are \(2^19 - 1 = 524,287\), \(2^22 - 1 = 4,194,303\), and \(2^23 - 1 = 8,388,607\) respectively. These numbers do not share any factors, so by XORing their streams together the cipher can create a keystream with a period of:

\[524,287 \times 4,194,303 \times 8,388,607 = 18,446,702,292,280,803,327\]

Defining LFSRsum#

We’ll formalize this combining of LFSR systems as the LFSRsum cipher, inspired by the book The Mathematics of Encryption.

LFSRsum: The LFSR system uses two LFSR systems in tandem to create a single pseudo-random stream of 1’s and 0’s.

  • The LFSRsum systems works by adding two streams together modulo 2 (XOR) to form a keystream to be used with the XOR cipher.

  • The last bit (\(b_1\)) of the seed values of each LFSR must be \(1\). This will ensure that a row will never result in all 0’s, which the LFSR would never be able to move out of.

  • If the two systems are \(n\)-bits and \(m\)-bits, the key for the LFSRsum cipher is \(n-1\) + \(m-1\) bits in length, where \(n-1\) and \(m-1\) refer to all but \(b_1\) in the seed value in each LFSR respectively.

This method also helps security, since if an attacker were to use a known-plaintext attack to determine some of the keystream, they will still need to determine the seed values for each of the component LFSRs that generate the keystream. While this may frustrate an attacker, it’s certainly possible to do, which will be detailed below.

Example of LFSRsum#

Encipher the message Leia using a 5-bit and 7-bit LFSR with key of 1000001011. The LFSRs have definitions of \(b_5 \leftarrow b_2' + b_3'\) and \(b_7 \leftarrow b_1' + b_2'\).

Solution: Converting base64 plaintext Leia to binary yields: 001011 011110 100010 011010, a 24-bit message.

The key of 1000001011 implies seeds of 10001 and 0010111 for the 5-bit and 7-bit LFSR respectively. Generating 24-bits from each LFSR yields.

5-bit Stream: 100010011010111100010011
7-bit Stream: 111010000111000100100110

XORing these streams together to create the keystream.

print( XOR('100010011010111100010011', '111010000111000100100110') )
011000011101111000110101

Now, use this keystream, 001011011110100010011010, to encipher the plaintext message, 011000011101111000110101 with another XOR:

print( XOR('001011011110100010011010', '011000011101111000110101') )
010011000011011010101111

Converting this binary stream back to base64 characters:


ciphertext binary: 010011 000011 011010 101111
ciphertext base64: TDav

Cracking LFSRsum#

Let’s attempt to deduce the seed values for both the 5-bit and 7-bit LFSR ciphers used in the previous example based on a known-plaintext attack. Again, we can assume that the definition for each LFSR is known to the attacker, \(b_5 \leftarrow b_2' + b_3'\) and \(b_7 \leftarrow b_1' + b_2'\).

The attacker would know that the 5-bit LFSR would start in an initial state of:

\(b_5\)

\(b_4\)

\(b_3\)

\(b_2\)

\(b_1\)

\(k_1\)

\(k_2\)

\(k_3\)

\(k_4\)

1

\(\vdots\)

\(\vdots\)

\(\vdots\)

\(\vdots\)

\(\vdots\)

and the 7-bit LFSR would start in an initial state of:

\(b_7\)

\(b_6\)

\(b_5\)

\(b_4\)

\(b_3\)

\(b_2\)

\(b_1\)

\(k_5\)

\(k_6\)

\(k_7\)

\(k_8\)

\(k_9\)

\(k_{10} \)

1

\(\vdots\)

\(\vdots\)

\(\vdots\)

\(\vdots\)

\(\vdots\)

\(\vdots\)

\(\vdots\)

where \(\{ k_1, k_2, k_3, ..., k_{10} \}\) represent the digits of the key for the system, \(k_1k_2k_3k_4k_5k_6k_7k_8k_9k_{10}\). We know (but the attacker does not) that this key is 1000001011.

Using the definition of each LFSR, the attacker can fill in subsequent rows of each table to create a generalized structure of each LFSR stream. Keep in mind that when using bitwise addition, \(k_1 \equiv k_1 + 2\) since \(2 \equiv 0 \pmod{2}\). Also, note that \(2k_1 \equiv 0 \pmod{2}\) regardless of if \(k_1\) is a 1 or 0. These facts will help keep the generalized LFSR streams as simple as possible.

5-bit LFSR

\(b_5\)

\(b_4\)

\(b_3\)

\(b_2\)

\(b_1\)

\(k_1\)

\(k_2\)

\(k_3\)

\(k_4\)

\(1\)

\(k_3 + k_4\)

\(k_1\)

\(k_2\)

\(k_3\)

\(k_4\)

\(k_2 + k_3\)

\(k_3 + k_4\)

\(k_1\)

\(k_2\)

\(k_3\)

\(k_1 + k_2\)

\(k_2 + k_3\)

\(k_3 + k_4\)

\(k_1\)

\(k_2\)

\(k_1 + k_3 + k_4\)

\(k_1 + k_2\)

\(k_2 + k_3\)

\(k_3 + k_4\)

\(k_1\)

\(k_2 + k_4\)

\(k_1 + k_3 + k_4\)

\(k_1 + k_2\)

\(k_2 + k_3\)

\({k_3 + k_4}\)

\(k_1 + k_3\)

\(k_2 + k_4\)

\(k_1 + k_3 + k_4\)

\(k_1 + k_2\)

\(k_2 + k_3\)

\(k_2 + k_3 + k_4\)

\(k_1 + k_3\)

\(k_2 + k_4\)

\(k_1 + k_3 + k_4\)

\(k_1 + k_2\)

\(k_1 + k_2 + k_3\)

\(k_2 + k_3 + k_4\)

\(k_1 + k_3\)

\(k_2 + k_4\)

\(k_1 + k_3 + k_4\)

\(k_1 + k_2 + k_3 + k_4\)

\(k_1 + k_2 + k_3\)

\(k_2 + k_3 + k_4\)

\(k_1 + k_3\)

\(k_2 + k_4\)

\(k_1 + k_2 + k_4\)

\(k_1 + k_2 + k_3 + k_4\)

\(k_1 + k_2 + k_3\)

\(k_2 + k_3 + k_4\)

\(k_1 + k_3\)

7-bit LFSR

\(b_7\)

\(b_6\)

\(b_5\)

\(b_4\)

\(b_3\)

\(b_2\)

\(b_1\)

\(k_5\)

\(k_6\)

\(k_7\)

\(k_8\)

\(k_9\)

\(k_{10}\)

\(1\)

\(1 + k_{10}\)

\(k_5\)

\(k_6\)

\(k_7\)

\(k_8\)

\(k_9\)

\(k_{10}\)

\(k_9 + k_{10}\)

\(1 + k_{10}\)

\(k_5\)

\(k_6\)

\(k_7\)

\(k_8\)

\(k_9\)

\(k_8 + k_9\)

\(k_9 + k_{10}\)

\(1 + k_{10}\)

\(k_5\)

\(k_6\)

\(k_7\)

\(k_8\)

\(k_7 + k_8\)

\(k_8 + k_9\)

\(k_9 + k_{10}\)

\(1 + k_{10}\)

\(k_5\)

\(k_6\)

\(k_7\)

\(k_6 + k_7\)

\(k_7 + k_8\)

\(k_8 + k_9\)

\(k_9 + k_{10}\)

\(1 + k_{10}\)

\(k_5\)

\(k_6\)

\(k_5 + k_6\)

\(k_6 + k_7\)

\(k_7 + k_8\)

\(k_8 + k_9\)

\(k_9 + k_{10}\)

\(1 + k_{10}\)

\(k_5\)

\(1 + k_5 + k_{10}\)

\(k_5 + k_6\)

\(k_6 + k_7\)

\(k_7 + k_8\)

\(k_8 + k_9\)

\(k_9 + k_{10}\)

\(1 + k_{10}\)

\(1 + k_9\)

\(1 + k_5 + k_{10}\)

\(k_5 + k_6\)

\(k_6 + k_7\)

\(k_7 + k_8\)

\(k_8 + k_9\)

\(k_9 + k_{10}\)

\(k_8 + k_{10}\)

\(1 + k_9\)

\(1 + k_5 + k_{10}\)

\(k_5 + k_6\)

\(k_6 + k_7\)

\(k_7 + k_8\)

\(k_8 + k_9\)

\(k_7 + k_9\)

\(k_8 + k_{10}\)

\(1 + k_9\)

\(1 + k_5 + k_{10}\)

\(k_5 + k_6\)

\(k_6 + k_7\)

\(k_7 + k_8\)

XORing these two general LFSR streams together results in the following keystream, all modulo 2, which the attacker can deduce to be equivalent to 01100001110... using a known-plaintext attack.

general LFSR keystream

keystream from known-plaintext

\(0\)

\(0\)

\(k_4 + k_{10}\)

\(1\)

\(k_3 + k_9\)

\(1\)

\(k_2 + k_8\)

\(0\)

\(k_1 + k_7\)

\(0\)

\(k_3 + k_4 + k_6\)

\(0\)

\(k_2 + k_3 + k_5\)

\(0\)

\(1 + k_1 + k_2 + k_{10}\)

\(1\)

\(k_1 + k_3 + k_4 + k_9 + k_{10}\)

\(1\)

\(k_2 + k_4 + k_8 + k_9\)

\(1\)

\(k_1 + k_3 + k_7 + k_8\)

\(0\)

Which means that the attacker now has a system of 10 linear equations with 10 unknown values. This is a system that can be solved through various algebraic or computational techniques.

\[\begin{split} \begin{cases} & & & k_4 & & & & & & + k_{10} &\equiv 1 \pmod{2} \\ & & k_3 & & & & & & +k_9 & &\equiv 1 \pmod{2} \\ & k_2 & & & & & & +k_8 & & &\equiv 0 \pmod{2} \\ k_1 & & & & & & +k_7 & & & &\equiv 0 \pmod{2} \\ & & k_3 & +k_4 & & +k_6& & & & &\equiv 0 \pmod{2} \\ & k_2 & +k_3 & & +k_5 & & & & & &\equiv 0 \pmod{2} \\ k_1 &+k_2 & & & & & & & & +k_{10} &\equiv 0 \pmod{2} \\ k_1 & & +k_3 & +k_4 & & & & & +k_9& &\equiv 1 \pmod{2} \\ & k_2 & & +k_4 & & & & +k_8& +k_9& &\equiv 1 \pmod{2} \\ k_1 & & +k_3 & & & & +k_7 & +k_8& & &\equiv 0 \pmod{2} \\ \end{cases} \end{split}\]

In this case, the attacker can utilize a variety of technology tools to implement Row Reduction, a very efficient way to solve systems of linear equations. Most mathematical software (Python, Maple, Mathematica, MATLAB, etc.) can do this with a bit of configuration to solve the system of equations quickly and determine that \(k_1k_2k_3k_4k_5k_6k_7k_8k_9k_{10}\) is equal to: 1000001011. This solution is equivalent to the seed values of 10001 and 0010111.

At this point the attacker has all the information needed to compute as much of the keystream as needed to completely decipher the message.