5.4. Programming the Caesar Cipher

At this point you should have a good feel for the Caesar cipher algorithm. In this section we’ll use our Python knowledge to program it. To get started, let’s create some helper functions for operations we expect to need in other ciphers, not just the Caesar cipher. Writing these helpers keeps the main caesar function modular and lets us reuse them in other places. That way, if we tweak the code in a helper function, the change automatically takes effect everywhere that function is called.

text_clean

The first thing we should do when working with text is clean (or sanitize) it, so that it doesn’t contain any symbols or characters that we don’t want to, or can’t, work with. For now, we want to limit ourselves to the 26-character Latin alphabet used in English. Because Python is case-sensitive, we should pick just upper-case or just lower-case to work with. In this course we will clean our text by transforming it to the 26 upper-case English letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ. Let’s write a function that can do this for us.

def text_clean( text, LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    """
    Arguments:
        text (str): a piece of text for cleaning
    Returns:
        (str): text with only the characters also found in LETTERS
               lower-case letters in text will be made upper-case  
    """
    
    cleaned_text = ''
    
    for character in text:
        if character.upper() in LETTERS:
            cleaned_text += character.upper()
    
    return cleaned_text

This function starts by initializing an empty string cleaned_text, which will eventually contain the upper-case versions of all the letters found in text. Next, the for loop iterates over each character in the string text. The body of the loop checks whether the upper-case version of the character (character.upper()) is found in the string LETTERS. If it is, it concatenates character.upper() to the string cleaned_text. No code runs if character.upper() is not found in LETTERS, so that character is simply dropped. Once the loop has done this for every character in text, the function returns the string assigned to cleaned_text.

print( text_clean('This should be cleaned!') )
THISSHOULDBECLEANED
print( text_clean(r'L0Ts 0F nu/\/\b3rs') )
LTSFNUBRS

This function uses an “allowed characters” approach to ensure only the characters in LETTERS make it through to cleaned_text. This is easier than a “banned characters” approach, which looks for characters that shouldn’t be there and removes them. It’s much easier to list all the characters that should be kept than to think up every character that shouldn’t be. Notice that this function takes LETTERS as an optional keyword argument that can be changed later if we ever want to use other alphabets.
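
As a small illustration of that optional argument, the sketch below calls text_clean with a different allowed-characters string. The name ALPHANUMERIC and the sample sentence are made up for this example and are not part of the course code; the function body is repeated only so the snippet runs on its own.

```python
def text_clean(text, LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    # same allow-list logic as above, repeated so this snippet is self-contained
    cleaned_text = ''
    for character in text:
        if character.upper() in LETTERS:
            cleaned_text += character.upper()
    return cleaned_text

# hypothetical alternative alphabet that also allows digits
ALPHANUMERIC = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

default_result = text_clean('Room 101, 2nd floor!')
digits_result = text_clean('Room 101, 2nd floor!', LETTERS=ALPHANUMERIC)

print(default_result)   # ROOMNDFLOOR
print(digits_result)    # ROOM1012NDFLOOR
```

Only the cleaning step changes here. The other helpers and the cipher itself would need the same alphabet before digits could actually be enciphered.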

char_to_int and int_to_char

The substitution ciphers we will program rely on performing an operation on the numerical representation of characters. We use the convention that A = 0, B = 1, and so on, up to Z = 25. We should write functions that can quickly convert between a character and its equivalent numerical representation.

def char_to_int( character, LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' ):
    """
    Arguments:
        character (str): A single character
    Returns:
        (int): the integer representation of the character
               (-1 if the character is not found in LETTERS)
    """
    integer = LETTERS.find(character)
    return integer

def int_to_char( integer, LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' ):
    """
    Arguments:
        integer (int): An integer between 0 and len(LETTERS) - 1
    Returns:
        (str): a single character representation of the integer
    """
    character = LETTERS[integer]
    return character

Both of these functions use string methods and indexing with the ordered string LETTERS to correctly convert between the character and integer representations of the alphabet. Notice that both functions take LETTERS as an optional keyword argument that can be changed later if we ever want to use other alphabets.

print( char_to_int('L') )
11
print( int_to_char(17) )
R
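
One edge case is worth seeing once: the two helpers assume their input has already been cleaned. The sketch below shows what happens when it has not. The helpers are repeated so the snippet runs on its own, and char_to_int_strict is a hypothetical variant that is not part of the course code.

```python
def char_to_int(character, LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    return LETTERS.find(character)

def int_to_char(integer, LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    return LETTERS[integer]

# str.find returns -1 when the character is not in LETTERS
print(char_to_int('l'))    # -1 (lower-case 'l' is not in LETTERS)

# a negative index counts from the end of the string, so -1 silently maps to 'Z'
print(int_to_char(-1))     # Z

def char_to_int_strict(character, LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    # hypothetical variant: str.index raises ValueError instead of returning -1
    return LETTERS.index(character)

try:
    char_to_int_strict('l')
    strict_failed = False
except ValueError:
    strict_failed = True

print(strict_failed)       # True
```

This is one reason the cleaning step comes first: once every character is known to be in LETTERS, the -1 case cannot occur.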

text_block

Lastly, we need to ensure our ciphertext messages are blocked into groups of 5 characters. We’ll write a function to perform this operation as well.

def text_block( text, size = 5 ):
    """
    Arguments:
        text (str): text to block
        size (int, optional): # of characters in a block
    Returns:
        (str): text blocked into groups of specified size
    """
    
    blocked_text = ''
    
    for character in text:
        if len(blocked_text.replace(' ', '') ) % size == 0 and len(blocked_text) != 0:
            blocked_text += ' '

        blocked_text += character
    
    return blocked_text

print( text_block('HELLOFRIENDS') )
HELLO FRIEN DS
print( text_block('SMALLERGROUPS', size = 3))
SMA LLE RGR OUP S

This function starts by initializing an empty string blocked_text, which will hold the text after it has been blocked into groups of size characters (5 by default). Then the for loop iterates over the string text one character at a time. The body of the loop checks whether the length of blocked_text with its spaces removed (blocked_text.replace(' ', '')) is divisible by size. If it is, the code inserts a space to start the next block. For example, if size = 5, then whenever len(blocked_text.replace(' ','')) is 5, 10, 15, etc., a space is inserted before the next character is concatenated onto the end. There is also a check that the string isn’t empty (len(blocked_text) != 0), so a space isn’t added at the start of the message.
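
For comparison, here is a minimal sketch of the same blocking done with slicing and str.join. The name text_block_sliced is hypothetical, and the sketch assumes the input has already been cleaned (no spaces), which is how caesar uses text_block.

```python
def text_block_sliced(text, size=5):
    # cut the text into consecutive slices of `size` characters
    blocks = [text[i:i + size] for i in range(0, len(text), size)]
    # join the slices with single spaces; the last slice may be shorter
    return ' '.join(blocks)

print(text_block_sliced('HELLOFRIENDS'))            # HELLO FRIEN DS
print(text_block_sliced('SMALLERGROUPS', size=3))   # SMA LLE RGR OUP S
```

The loop version above spells out each step, which is helpful while learning. The sliced version avoids rebuilding the space-free string on every character.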

The caesar Function

We’re just about ready to program the actual caesar function. But first let’s plan out some features we’d like our function to have, as they may affect how we start writing the code. It would be nice if our one function could:

  • take in an “unclean” message, but still be able to work

  • perform both encryption and decryption

  • format the output appropriately, depending on whether it is plaintext or ciphertext

These requirements should be enough to get started by defining the function and setting up some conditional branches:

def caesar( message, key, encipher = True ):
    """
    Arguments:
        message (str): either a plaintext or ciphertext
        key (int): key to use
        encipher (bool, optional): when True, encrypts the message
                                   when False, decrypts the message
    Returns:
        (str): the plaintext or ciphertext formatted appropriately
    """
    
    message = text_clean( message )
    output = ''
    
    if encipher == True:
        # encipher code goes here
        return text_block( output )
    else:
        # decipher code goes here
        return output.lower()

This incomplete function sets up the docstring, which details the arguments that will be passed to the function; cleans the provided message using the text_clean function; initializes an empty string output that will eventually hold the message to be returned; and creates the conditional branching needed. Notice that the return statements use either text_block or .lower() to ensure the ciphertext and plaintext are formatted correctly: ciphertext is returned in upper-case blocks of 5, and plaintext is returned in lower-case.

Next, let’s focus on the enciphering branch of the function. We can write some code that will iterate over the string message character by character and determine the corresponding ciphertext character. We’ll use the helper functions char_to_int and int_to_char to assist.

def caesar( message, key, encipher = True ):
    """
    Arguments:
        message (str): either a plaintext or ciphertext
        key (int): key to use
        encipher (bool, optional): when True, encrypts the message
                                   when False, decrypts the message
    Returns:
        (str): the plaintext or ciphertext formatted appropriately
    """
    
    message = text_clean( message )
    output = ''
    
    if encipher == True:
        for plaintext_character in message:
            plaintext_numerical = char_to_int( plaintext_character )
            ciphertext_numerical = (plaintext_numerical + key) % 26
            ciphertext_character = int_to_char( ciphertext_numerical )
            output += ciphertext_character
        return text_block( output )
    else:
        # decipher code goes here    
        return output.lower()
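
To make the arithmetic in the enciphering loop concrete, the short trace below follows a single character through one pass of the loop. It uses LETTERS.find and indexing directly in place of char_to_int and int_to_char so that it runs on its own; the letter S and the key 10 are arbitrary choices for illustration.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
key = 10

plaintext_character = 'S'
plaintext_numerical = LETTERS.find(plaintext_character)    # S is at position 18
ciphertext_numerical = (plaintext_numerical + key) % 26    # 28 % 26 wraps around to 2
ciphertext_character = LETTERS[ciphertext_numerical]       # position 2 is C

print(plaintext_numerical, ciphertext_numerical, ciphertext_character)   # 18 2 C
```

The % 26 is what wraps letters near the end of the alphabet back around to the start.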

Now the caesar function should work for enciphering messages. We can add similar code to the other branch of the function to decipher the message. Besides renaming the variables so they describe what they now hold, the only difference is that instead of adding key to the numerical representation, the code subtracts the value of the key.

def caesar( message, key, encipher = True ):
    """
    Arguments:
        message (str): either a plaintext or ciphertext
        key (int): key to use
        encipher (bool, optional): when True, encrypts the message
                                   when False, decrypts the message
    Returns:
        (str): the plaintext or ciphertext formatted appropriately
    """
    
    message = text_clean( message )
    output = ''
    
    if encipher == True:
        for plaintext_character in message:
            plaintext_numerical = char_to_int( plaintext_character )
            ciphertext_numerical = (plaintext_numerical + key) % 26
            ciphertext_character = int_to_char( ciphertext_numerical )
            output += ciphertext_character
        return text_block( output )
    else:
        for ciphertext_character in message:
            ciphertext_numerical = char_to_int( ciphertext_character )
            plaintext_numerical = (ciphertext_numerical - key) % 26
            plaintext_character = int_to_char( plaintext_numerical )
            output += plaintext_character   
        return output.lower()
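
One detail of the deciphering branch deserves a closer look: the subtraction can go below zero. In Python, the % operator with a positive right-hand side always gives a result from 0 up to one less than that value, so the wrap-around still works. The sketch below traces the ciphertext letter C with key 10, both chosen only for illustration.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
key = 10

ciphertext_numerical = LETTERS.find('C')        # C is at position 2
difference = ciphertext_numerical - key         # 2 - 10 = -8
plaintext_numerical = difference % 26           # -8 % 26 is 18 in Python
plaintext_character = LETTERS[plaintext_numerical]

print(difference, plaintext_numerical, plaintext_character)   # -8 18 S
```

Some other languages (C and Java, for example) return a negative remainder for a negative left operand, so this line would need extra care if the function were ported.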

The caesar function should now work for both enciphering and deciphering any message we pass into it.

print( caesar('sample message', 10) )
CKWZV OWOCC KQO
print( caesar('CKWZV OWOCC KQO', 10, encipher=False) )
samplemessage
print( caesar('This message has 5 words!', 3) )
WKLVP HVVDJ HKDVZ RUGV
print( caesar('WKLVP HVVDJ HKDVZ RUGV', 3, encipher=False) )
thismessagehaswords

Notice that if your plaintext message contains numbers, spaces, or other characters, they will not be present when the message is deciphered. We will update our text_clean function later in the course so it can keep digits, but for now be mindful that you’ll lose any information that isn’t one of the 26 letters of the English alphabet.
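
A quick way to check this behavior is a round trip: encipher a message, decipher the result, and compare it with the cleaned, lower-cased original. The sketch below does this for every key from 0 to 25. The helper bodies are condensed restatements written for this example (intended to behave like the functions above for cleaned input), not the course's exact code.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def text_clean(text):
    # keep only characters whose upper-case form is in LETTERS
    return ''.join(c.upper() for c in text if c.upper() in LETTERS)

def text_block(text, size=5):
    # group cleaned text into space-separated blocks
    return ' '.join(text[i:i + size] for i in range(0, len(text), size))

def caesar(message, key, encipher=True):
    message = text_clean(message)
    shift = key if encipher else -key   # deciphering subtracts the key
    output = ''.join(LETTERS[(LETTERS.find(c) + shift) % 26] for c in message)
    return text_block(output) if encipher else output.lower()

message = 'This message has 5 words!'
expected = text_clean(message).lower()  # what we can hope to recover

for key in range(26):
    ciphertext = caesar(message, key)
    recovered = caesar(ciphertext, key, encipher=False)
    assert recovered == expected

print(expected)   # thismessagehaswords
```

The digit, the spaces, and the exclamation mark are gone from the recovered text for every key, which matches the note above.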

Code Visualization

Below you’ll find the code visualization for how the Caesar cipher function (and its helpers) work together to encipher a message.

Next Steps

This framework for the caesar function will be very helpful in creating similar functions for other substitution ciphers. Since the only difference between the Caesar, Multiplicative, and Affine ciphers is the mathematical operation performed, we should be able to write similar functions for those ciphers with only a little modification.
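
As a rough sketch of that idea, the enciphering loop can be written once with the mathematical operation passed in as a function. The helper name encipher_with is hypothetical. The formulas for the Multiplicative cipher (multiply by the key) and the Affine cipher (multiply, then add) are the usual textbook definitions, assumed here because this section does not define them. The keys are arbitrary, and deciphering is left out.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def encipher_with(message, operation):
    """Hypothetical helper: apply `operation` to each letter's number, mod 26."""
    output = ''
    for character in message:               # assumes message is already cleaned
        number = LETTERS.find(character)
        output += LETTERS[operation(number) % 26]
    return output

caesar_text = encipher_with('HELLO', lambda p: p + 3)             # shift by 3
multiplicative_text = encipher_with('HELLO', lambda p: p * 5)     # multiply by 5
affine_text = encipher_with('HELLO', lambda p: p * 5 + 8)         # multiply by 5, add 8

print(caesar_text)           # KHOOR
print(multiplicative_text)   # JUDDS
print(affine_text)           # RCLLA
```

Only the small function passed in changes from cipher to cipher; the cleaning, conversion, and looping stay the same.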