11.3. RSA#

The RSA Algorithm was first published by Ron Rivest, Adi Shamir, and Leonard Adleman in 1977, and as such named using the first letter of each of their last names. It was one of the first public-key cryptosystems and is still used today for secure data transmission. This algorithm is relatively slow, and is therefore usually used only to transmit keys to symmetric ciphers (like AES, Blowfish, and Triple DES) which can encrypt data for secure storage much quicker than RSA could. So while RSA does not encrypt a large volume of data, it does encrypt very important data!

The Phi Function#

RSA key generation will use a common function in Number Theory, \(\varphi \left( n \right)\) which is known as Euler’s Totient Function. This function outputs the number of integers between \(0\) and \(n\) that are relatively prime to \(n\) (meaning they don’t share a common divisor). For example, \(\varphi(8) = 4\) because \(1\), \(3\), \(5\), and \(7\) are all relatively prime with \(8\). Likewise, \(\varphi(7) = 6\) because there are \(6\) numbers between \(0\) and \(7\) that do not share a divisor with \(7\): \(1\), \(2\), \(3\), \(4\), \(5\), and \(6\).

Functions for computing the greatest common divisor and Euler’s Totient Function using Python are provided below.

def gcd(a, b): 
    if (a == 0): 
        return b 
    return gcd(b % a, a) 
  
def phi(n): 
    count = 1
    for i in range(2, n): 
        if (gcd(i, n) == 1): 
            count += 1
    return count 

In general, it’s not possible to write an equation to directly compute \(\varphi(n)\). The code above essentially tries all numbers less than \(n\) to determine if they share a factor other than \(1\) with \(n\). You could probably predict that this process is very inefficient, much like trying all the possible values of \(d\) to see if it’s a multiplicative inverse of \(e\) when trying to crack KidRSA.

There are a few shortcuts, however, for computing \(\varphi(n)\). For a prime number, \(p\), it follows that \(\varphi(p) = p - 1\) since all numbers less than \(p\) are relatively prime with \(p\). Additionally, \(\varphi(n)\) is a multiplicative function, meaning that \(\varphi(a \cdot b) = \varphi(a) \cdot \varphi(b)\) as long as \(a\) and \(b\) are relatively prime. So, if you let \(p_1, p_2, p_3, \ldots\) be the prime factors of \(n\), then \(\varphi(n) = \varphi(p_1) \cdot \varphi(p_2) \cdot \varphi(p_3) \ldots = \left( p_1 - 1\right) \left( p_2 - 1 \right) \left( p_3 - 1 \right) \ldots\). This particular shortcut will be helpful in implementing the RSA algorithm.

Key Generation#

Let’s generate some keys:

  • Choose two different prime numbers, \(p\) and \(q\). These are kept secret!

  • Set \(n = pq\)

  • Compute \(\varphi(n) = \left( p-1 \right) \left( q-1 \right)\). This is kept secret!

  • Choose any value of \(e\) such that \(e\) and \(\varphi(n)\) are relatively prime.

    • For efficient encryption, \(e\) should also have a small number of bits, and a small number of 1’s in the binary representation. As a result, the number \(65,537\) is a common choice since it can be represented as 10000000000000001

  • Set \(d = e^{-1} \pmod{ \varphi(n) }\), which can be done quickly using Extended Euclidean Algorithm from the previous section

Just like in KidRSA, a public key will be formatted as \(\left( e,n \right)\) and the private key formatted as \(\left( d,n \right)\).

Encryption#

To encrypt a message using RSA, convert your text message to an integer, \(m\), and create the ciphertext, \(c\) by computing:

\[ c = m^e \pmod{n}\]

and then convert the integer \(c\) back to text.

\(m^e\) could be a very large number to compute before reducing modulus \(n\), so large that it may take a very long time to calculate, or be too large for your computer to handle. As a result, you’ll need to use a function that implements modular exponentiation. Modular exponentiation reduces the integer after each multiplication that results when raising a base to a power, instead of reducing after all the multiplications. This process keeps each intermediate step a smaller number that your computer can work with quickly and efficiently. Python has a built in function for modular exponentiation, pow(x, y, [z]). x is the base, y is the exponent and z is the modulus.

Decryption#

To decrypt a message using RSA, convert your ciphertext message to an integer, \(c\), and return the plaintext \(m\) by computing:

\[m = c^d \pmod{n}\]

The proof of why these operations are inverses of each other is beyond the scope of this course, and uses Euler’s Theorem of which Fermat’s Little Theorem is the special case when \(n\) is prime.

Example#

We’ll use the RSA algorithm to encrypt and decrypt the message hi

  • Choose two prime numbers, \(p = 89\) and \(q = 101\).

  • Set \(n = 89 \cdot 101 = 8989\)

  • Compute \(\varphi(8989) = \left( 89-1\right)\left(101-1\right) = 88 \cdot 100 = 8800\)

    • Note that the prime factorization of \(8800\) is \(2^5 \times 5^2 \times 11\)

  • Choose \(e = 1323\).

    • Since the factorization of \(8800\) contains \(2\), \(5\), and \(11\) as the prime factors, \(1323\) was constructed using only prime factors of \(3\) and \(7\), ensuring it shares no factors with \(8800\)

    • The prime factorization of \(1323\) is \(3^3 \times 7^2\)

  • Use the Extended Euclidean Algorithm to compute \(d = 2787\)

The public key is \(\left( 1323, 8989 \right)\) and the private key is \(\left( 2787, 8989 \right)\).

Using base64, the plaintext message hi becomes 100001100010 in binary and \(m = 2146\) as an integer.

To encrypt the message to create the ciphertext we need to complete the following calculation:

\[c = 2146^{1323} \pmod{8989}\]

The value of \(2146^{1323}\) would have \(4408\) digits if we attempted to calculate it without modular exponentiation, resulting in approximately \(5.573 \ldots \times 10^{4407}\). Using python to help find the reduction in modulus \(8989\) yields

print( pow(2146, 1323, 8989) )
8298

So the ciphertext is \(c = 8298\). Converting to binary 10000001101010, padding with leading zeros to create 6-bit characters, 000010 000001 101010, and then converting to the base64 characters, CBq. This is the message we can transmit using a public channel.

Let’s verify that the receiver can decrypt this message using the private key. Taking the ciphertext value \(c = 8298\), decrypt by:

\[m = 8298^{2787} \pmod{8989}\]

Using Python again for the calculation:

print( pow(8298, 2787, 8989) )
2146

We receive back the plaintext message as an integer, which can be converted to base64 characters to read.

Why is This Secure?#

So, why is this algorithm any more secure than KidRSA? Let’s take a look at what an attacker can determine using only the public key, \(\left( e,n \right)\).

In order to calculate the private key \(\left( d,n \right)\), the attacker would need to know \(\varphi(n)\), since that is the modulus in which \(e\) and \(d\) are multiplicative inverses. To determine \(\varphi(n)\) quickly you would need to know the prime factors of \(n\) which are the secret numbers \(p\) and \(q\). So an attacker either needs to get these values for find some other way to calculate \(\varphi(n)\).

For large values of \(n\), it becomes extremely difficult and time-consuming to factor \(n\) to it’s prime factors. Directly computing \(\varphi(n)\), also takes an extremely long time when \(n\) becomes large. It’s important then, that \(n\) be sufficiently large so that it would take an attacker too long to factor it back to \(p\) and \(q\). The table below shows how long it would take to factor different size values of \(n\) using off-the shelf computing hardware and the open-source software YAFU.

Bits

Time

128

0.4886 seconds

192

3.9979 seconds

256

103.1746 seconds

300

1175.7826 seconds

The largest value of \(n\) that has been successfully factored (and publicly disclosed) is \(768\)-bits. This value of \(n\) had \(232\) decimal digits and was factored on December 12, 2009 over the span of two years as part of the RSA Factoring Challenge. The CPU time spent on finding these factors by a collection of parallel computers amounted approximately to the equivalent of almost 2000 years of computing on a single-core 2.2 GHz AMD Opteron-based computer, so don’t try this at home! The advances in computing has resulted in most RSA systems now using at least \(2048\) bit values for \(n\), which have \(617\) decimal places. There is currently a \(200,000\) USD cash prize for factoring such a value of \(n\).

2020 Update: An \(829\)-bit value has been factored as of Feb 28, 2020.

To reiterate, the security of RSA does not come from mathematical proof but rather that it would take too much time and computing power to factor large values of \(n\). As of now, there isn’t a method. However, if someone does learn how to quickly factor large numbers using some new mathematical algorithm then RSA would be instantly insecure, much like how the Extended Euclidean Algorithm made KidRSA insecure.

Exercises for the Reader#

  • Can you write a function that generates key pairs using the RSA algorithm when provided 2 prime numbers? Use can use a default value of \(e = 65,537\) but double check that it is relatively prime with \(\varphi(n)\) before use.

  • Can you write a function that takes in a message and a key and implements the RSA algorithm to either encrypt or decrypt the message?