Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

RSA

The RSA Algorithm was first published by Ron Rivest, Adi Shamir, and Leonard Adleman in 1977, and as such named using the first letter of each of their last names. It was one of the first public-key cryptosystems and is still used today for secure data transmission. This algorithm is relatively slow, and is therefore usually used only to transmit keys to symmetric ciphers (like AES, Blowfish, and Triple DES) which can encrypt data for secure storage much quicker than RSA could. So while RSA does not encrypt a large volume of data, it does encrypt very important data!

The Phi Function

RSA key generation will use a common function in Number Theory, φ(n)\varphi \left( n \right) which is known as Euler’s Totient Function. This function outputs the number of integers between 0 and nn that are relatively prime to nn (meaning they don’t share a common divisor). For example, φ(8)=4\varphi(8) = 4 because 1, 3, 5, and 7 are all relatively prime with 8. Likewise, φ(7)=6\varphi(7) = 6 because there are 6 numbers between 0 and 7 that do not share a divisor with 7: 1, 2, 3, 4, 5, and 6.

Functions for computing the greatest common divisor and Euler’s Totient Function using Python are provided below.

def gcd(a, b): 
    if (a == 0): 
        return b 
    return gcd(b % a, a) 
  
def phi(n): 
    count = 1
    for i in range(2, n): 
        if (gcd(i, n) == 1): 
            count += 1
    return count 

In general, it’s not possible to write an equation to directly compute φ(n)\varphi(n). The code above essentially tries all numbers less than nn to determine if they share a factor other than 1 with nn. You could probably predict that this process is very inefficient, much like trying all the possible values of dd to see if it’s a multiplicative inverse of ee when trying to crack KidRSA.

There are a few shortcuts, however, for computing φ(n)\varphi(n). For a prime number, pp, it follows that φ(p)=p1\varphi(p) = p - 1 since all numbers less than pp are relatively prime with pp. Additionally, φ(n)\varphi(n) is a multiplicative function, meaning that φ(ab)=φ(a)φ(b)\varphi(a \cdot b) = \varphi(a) \cdot \varphi(b) as long as aa and bb are relatively prime. So, if you let p1,p2,p3,p_1, p_2, p_3, \ldots be the prime factors of nn, then φ(n)=φ(p1)φ(p2)φ(p3)=(p11)(p21)(p31)\varphi(n) = \varphi(p_1) \cdot \varphi(p_2) \cdot \varphi(p_3) \ldots = \left( p_1 - 1\right) \left( p_2 - 1 \right) \left( p_3 - 1 \right) \ldots. This particular shortcut will be helpful in implementing the RSA algorithm.

Key Generation

Let’s generate some keys:

  • Choose two different prime numbers, pp and qq. These are kept secret!

  • Set n=pqn = pq

  • Compute φ(n)=(p1)(q1)\varphi(n) = \left( p-1 \right) \left( q-1 \right). This is kept secret!

  • Choose any value of ee such that ee and φ(n)\varphi(n) are relatively prime.

    • For efficient encryption, ee should also have a small number of bits, and a small number of 1’s in the binary representation. As a result, the number 65,53765,537 is a common choice since it can be represented as 10000000000000001

  • Set d=e1(modφ(n))d = e^{-1} \pmod{ \varphi(n) }, which can be done quickly using Extended Euclidean Algorithm from the previous section

Just like in KidRSA, a public key will be formatted as (e,n)\left( e,n \right) and the private key formatted as (d,n)\left( d,n \right).

Encryption

To encrypt a message using RSA, convert your text message to an integer, mm, and create the ciphertext, cc by computing:

c=me(modn)c = m^e \pmod{n}

and then convert the integer cc back to text.

mem^e could be a very large number to compute before reducing modulus nn, so large that it may take a very long time to calculate, or be too large for your computer to handle. As a result, you’ll need to use a function that implements modular exponentiation. Modular exponentiation reduces the integer after each multiplication that results when raising a base to a power, instead of reducing after all the multiplications. This process keeps each intermediate step a smaller number that your computer can work with quickly and efficiently. Python has a built in function for modular exponentiation, pow(x, y, [z]). x is the base, y is the exponent and z is the modulus.

Decryption

To decrypt a message using RSA, convert your ciphertext message to an integer, cc, and return the plaintext mm by computing:

m=cd(modn)m = c^d \pmod{n}

The proof of why these operations are inverses of each other is beyond the scope of this course, and uses Euler’s Theorem of which Fermat’s Little Theorem is the special case when nn is prime.

Example

We’ll use the RSA algorithm to encrypt and decrypt the message hi

  • Choose two prime numbers, p=89p = 89 and q=101q = 101.

  • Set n=89101=8989n = 89 \cdot 101 = 8989

  • Compute φ(8989)=(891)(1011)=88100=8800\varphi(8989) = \left( 89-1\right)\left(101-1\right) = 88 \cdot 100 = 8800

    • Note that the prime factorization of 8800 is 25×52×112^5 \times 5^2 \times 11

  • Choose e=1323e = 1323.

    • Since the factorization of 8800 contains 2, 5, and 11 as the prime factors, 1323 was constructed using only prime factors of 3 and 7, ensuring it shares no factors with 8800

    • The prime factorization of 1323 is 33×723^3 \times 7^2

  • Use the Extended Euclidean Algorithm to compute d=2787d = 2787

The public key is (1323,8989)\left( 1323, 8989 \right) and the private key is (2787,8989)\left( 2787, 8989 \right).

Using base64, the plaintext message hi becomes 100001100010 in binary and m=2146m = 2146 as an integer.

To encrypt the message to create the ciphertext we need to complete the following calculation:

c=21461323(mod8989)c = 2146^{1323} \pmod{8989}

The value of 21461323 would have 4408 digits if we attempted to calculate it without modular exponentiation, resulting in approximately 5.573×1044075.573 \ldots \times 10^{4407}. Using python to help find the reduction in modulus 8989 yields

print( pow(2146, 1323, 8989) )
8298

So the ciphertext is c=8298c = 8298. Converting to binary 10000001101010, padding with leading zeros to create 6-bit characters, 000010 000001 101010, and then converting to the base64 characters, CBq. This is the message we can transmit using a public channel.

Let’s verify that the receiver can decrypt this message using the private key. Taking the ciphertext value c=8298c = 8298, decrypt by:

m=82982787(mod8989)m = 8298^{2787} \pmod{8989}

Using Python again for the calculation:

print( pow(8298, 2787, 8989) )
2146

We receive back the plaintext message as an integer, which can be converted to base64 characters to read.

Why is This Secure?

So, why is this algorithm any more secure than KidRSA? Let’s take a look at what an attacker can determine using only the public key, (e,n)\left( e,n \right).

In order to calculate the private key (d,n)\left( d,n \right), the attacker would need to know φ(n)\varphi(n), since that is the modulus in which ee and dd are multiplicative inverses. To determine φ(n)\varphi(n) quickly you would need to know the prime factors of nn which are the secret numbers pp and qq. So an attacker either needs to get these values for find some other way to calculate φ(n)\varphi(n).

For large values of nn, it becomes extremely difficult and time-consuming to factor nn to it’s prime factors. Directly computing φ(n)\varphi(n), also takes an extremely long time when nn becomes large. It’s important then, that nn be sufficiently large so that it would take an attacker too long to factor it back to pp and qq. The table below shows how long it would take to factor different size values of nn using off-the shelf computing hardware and the open-source software YAFU.

BitsTime
1280.4886 seconds
1923.9979 seconds
256103.1746 seconds
3001175.7826 seconds

The largest value of nn that has been successfully factored (and publicly disclosed) is 768-bits. This value of nn had 232 decimal digits and was factored on December 12, 2009 over the span of two years as part of the RSA Factoring Challenge. The CPU time spent on finding these factors by a collection of parallel computers amounted approximately to the equivalent of almost 2000 years of computing on a single-core 2.2 GHz AMD Opteron-based computer, so don’t try this at home! The advances in computing has resulted in most RSA systems now using at least 2048 bit values for nn, which have 617 decimal places. There is currently a 200,000200,000 USD cash prize for factoring such a value of nn.

2020 Update: An 829-bit value has been factored as of Feb 28, 2020.

To reiterate, the security of RSA does not come from mathematical proof but rather that it would take too much time and computing power to factor large values of nn. As of now, there isn’t a method. However, if someone does learn how to quickly factor large numbers using some new mathematical algorithm then RSA would be instantly insecure, much like how the Extended Euclidean Algorithm made KidRSA insecure.

Exercises for the Reader

  • Can you write a function that generates key pairs using the RSA algorithm when provided 2 prime numbers? Use can use a default value of e=65,537e = 65,537 but double check that it is relatively prime with φ(n)\varphi(n) before use.

  • Can you write a function that takes in a message and a key and implements the RSA algorithm to either encrypt or decrypt the message?