Converting Text to Binary

To encipher text-based messages using binary numbers, you’ll need a standard convention for representing letters as numbers. Up until now, the numerical representation of letters has always used decimal numbers. Keeping the same numerical values but writing them in binary yields the following table.

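| Character | Decimal | Binary | Character | Decimal | Binary |
|-----------|---------|--------|-----------|---------|--------|
| A | 0 | 00000 | N | 13 | 01101 |
| B | 1 | 00001 | O | 14 | 01110 |
| C | 2 | 00010 | P | 15 | 01111 |
| D | 3 | 00011 | Q | 16 | 10000 |
| E | 4 | 00100 | R | 17 | 10001 |
| F | 5 | 00101 | S | 18 | 10010 |
| G | 6 | 00110 | T | 19 | 10011 |
| H | 7 | 00111 | U | 20 | 10100 |
| I | 8 | 01000 | V | 21 | 10101 |
| J | 9 | 01001 | W | 22 | 10110 |
| K | 10 | 01010 | X | 23 | 10111 |
| L | 11 | 01011 | Y | 24 | 11000 |
| M | 12 | 01100 | Z | 25 | 11001 |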

Each of the 26 uppercase letters can be represented by a 5-bit number. Because 5 bits give 32 possible values, six 5-bit numbers (26-31) are left unused, and we can assign them to other characters if we wish. One option is to include some punctuation.

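| Character | Decimal | Binary |
|-----------|---------|--------|
| . | 26 | 11010 |
| ! | 27 | 11011 |
| ? | 28 | 11100 |
| ( | 29 | 11101 |
| ) | 30 | 11110 |
| - | 31 | 11111 |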

Note that these choices are arbitrary. Someone else may choose to use these remaining 5-bit numbers in a completely different way. Several established standards exist for converting text to numbers and back. We’ll explore a few in the following sections.
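The 5-bit convention above can be sketched in a few lines of Python. This is only an illustration: the names `ALPHABET_5BIT`, `to_5bit`, and `from_5bit` are hypothetical, and the punctuation assignments are the arbitrary ones chosen in the table above.

```python
# Position in this string = decimal value from the tables above (0-31).
# The six punctuation marks are the arbitrary choices made in this section.
ALPHABET_5BIT = "ABCDEFGHIJKLMNOPQRSTUVWXYZ.!?()-"

def to_5bit(message):
    """Return the 5-bit code of each character, separated by spaces."""
    return " ".join(format(ALPHABET_5BIT.index(ch), "05b") for ch in message)

def from_5bit(bits):
    """Invert to_5bit: turn space-separated 5-bit groups back into text."""
    return "".join(ALPHABET_5BIT[int(group, 2)] for group in bits.split())

encoded = to_5bit("HELLO!")
print(encoded)             # 00111 00100 01011 01011 01110 11011
print(from_5bit(encoded))  # HELLO!
```

Any character outside the 32-character string raises a `ValueError`, which makes the limits of a 5-bit scheme easy to see.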

ASCII

The American Standard Code for Information Interchange (ASCII) was one of the first widely used standards for representing text in computers as binary numbers, dating back to the 1960s. The 7-bit binary codes allow for 128 different characters, which were originally used to control printers via telegraph. As a result, the first 32 characters (0-31 in decimal) are not printable characters but control characters that determined how a printer should operate. For example, character 10 represents the “line feed” function, which causes a printer to advance its paper, character 9 represents “horizontal tab”, and character 8 represents “backspace”.

An old ASCII (or USASCII, as it was sometimes called) code chart is shown below.

[Figure: ASCII chart]

In the chart, the column determines the leftmost 3 bits of a character’s code, while the row determines the rightmost 4 bits. For example, A is 1000001 (column 100, row 0001) and t is 1110100 (column 111, row 0100).
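The column and row split can be checked with Python’s built-in `ord()` and `format()`. The helper name `ascii_bits` is hypothetical and is used here only for illustration.

```python
def ascii_bits(char):
    """Return (column bits, row bits) for a single 7-bit ASCII character."""
    code = ord(char)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    bits = format(code, "07b")   # zero-padded 7-bit binary string
    return bits[:3], bits[3:]    # leftmost 3 bits, rightmost 4 bits

for ch in "At":
    column, row = ascii_bits(ch)
    print(ch, ord(ch), column, row)
# A 65 100 0001
# t 116 111 0100
```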

Unicode

As computers evolved and eventually overtook the telegraph for everyday communications, 8-bit representations became preferred. 8-bit numbers worked well with the newer 8-, 16-, 32-, and now 64-bit processors found in computers. The 1 additional bit of data allowed for 128 additional character choices. As a result, ASCII evolved into many different variations that retained the original 128 characters but made very different choices for the new 128 characters. Some variations were regional (ISCII in India, VISCII in Vietnam); others added characters that could be used to draw computer graphics. It wasn’t until the early 1990s that a common standard built on 8-bit units emerged: the Universal Coded Character Set (Unicode) Transformation Format, also known as UTF-8, which has since been widely adopted. As of September 2019, UTF-8 was the sole encoding used on 94.0% of all web pages.

One benefit of UTF-8 is that you can use multiple 8-bit codes together to generate even more characters. In fact, emojis can be represented with Unicode characters. The Smiling Face with Sunglasses Emoji 😎 is represented as 11110000 10011111 10011000 10001110.
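The multi-byte behavior can be observed with Python’s built-in `str.encode()`. The helper name `utf8_bits` is hypothetical; the sketch simply prints each encoded byte as 8 bits.

```python
def utf8_bits(text):
    """Return the UTF-8 bytes of text as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))

print(utf8_bits("A"))   # 01000001
print(utf8_bits("é"))   # 11000011 10101001
print(utf8_bits("😎"))  # 11110000 10011111 10011000 10001110
```

An ASCII letter still takes a single byte, while the accented letter takes two and the emoji takes four.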

While incredibly powerful and customizable, Unicode is more complicated than we need in order to illustrate how binary operations can encrypt text-based messages.

Base64

While ASCII and Unicode are impressive for the number of different characters they can represent with 8 bits, in this course we’ll focus on using smaller, 6-bit numbers to keep examples easier to understand. Fortunately, there’s a standard for what might be considered the essential printable characters. It consists of the 26 uppercase letters, the 26 lowercase letters, the 10 numerals, and the + and / symbols. This set of 64 characters is known as Base64 and is widely used when sending and receiving information over the internet. Base64’s primary use is to convert binary information into text so it can be sent through established text-based communication channels such as email and HTML. When received, the text is turned back into binary, where it might represent an image file, audio file, or any other file that can be read by a computer.
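That primary use, binary to text and back, can be seen with Python’s standard `base64` module. The three example bytes are an arbitrary choice; three bytes are used because 24 bits regroup evenly into four 6-bit numbers (other lengths add `=` padding characters, which this sketch avoids).

```python
import base64

# Three arbitrary example bytes (24 bits of binary data).
raw = bytes([0b01001101, 0b01100001, 0b01101110])

# Regroup the same 24 bits into four 6-bit numbers.
bits = "".join(format(byte, "08b") for byte in raw)
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
print(groups)  # ['010011', '010110', '000101', '101110']

# Each 6-bit number (19, 22, 5, 46) selects one Base64 character.
text = base64.b64encode(raw).decode("ascii")
print(text)    # TWFu

# Decoding turns the text back into the original binary data.
print(base64.b64decode(text) == raw)  # True
```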

We’ll go against the norm and use the Base64 table below in the opposite direction, converting text to binary, for use in our ciphers for the remainder of this chapter.

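| Index | Binary | Char | Index | Binary | Char | Index | Binary | Char |
|-------|--------|------|-------|--------|------|-------|--------|------|
| 0 | 000000 | A | 23 | 010111 | X | 46 | 101110 | u |
| 1 | 000001 | B | 24 | 011000 | Y | 47 | 101111 | v |
| 2 | 000010 | C | 25 | 011001 | Z | 48 | 110000 | w |
| 3 | 000011 | D | 26 | 011010 | a | 49 | 110001 | x |
| 4 | 000100 | E | 27 | 011011 | b | 50 | 110010 | y |
| 5 | 000101 | F | 28 | 011100 | c | 51 | 110011 | z |
| 6 | 000110 | G | 29 | 011101 | d | 52 | 110100 | 0 |
| 7 | 000111 | H | 30 | 011110 | e | 53 | 110101 | 1 |
| 8 | 001000 | I | 31 | 011111 | f | 54 | 110110 | 2 |
| 9 | 001001 | J | 32 | 100000 | g | 55 | 110111 | 3 |
| 10 | 001010 | K | 33 | 100001 | h | 56 | 111000 | 4 |
| 11 | 001011 | L | 34 | 100010 | i | 57 | 111001 | 5 |
| 12 | 001100 | M | 35 | 100011 | j | 58 | 111010 | 6 |
| 13 | 001101 | N | 36 | 100100 | k | 59 | 111011 | 7 |
| 14 | 001110 | O | 37 | 100101 | l | 60 | 111100 | 8 |
| 15 | 001111 | P | 38 | 100110 | m | 61 | 111101 | 9 |
| 16 | 010000 | Q | 39 | 100111 | n | 62 | 111110 | + |
| 17 | 010001 | R | 40 | 101000 | o | 63 | 111111 | / |
| 18 | 010010 | S | 41 | 101001 | p | | | |
| 19 | 010011 | T | 42 | 101010 | q | | | |
| 20 | 010100 | U | 43 | 101011 | r | | | |
| 21 | 010101 | V | 44 | 101100 | s | | | |
| 22 | 010110 | W | 45 | 101101 | t | | | |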

While we’ll be working with 6-bit numbers in Base64, the methods described in the remainder of the chapter would still work with numbers represented with more or fewer than 6 bits.
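To illustrate that the bit width is just a parameter, the sketch below builds a character-to-binary lookup for any alphabet and width. The names `B64_ALPHABET` and `code_table` are hypothetical; the alphabet order follows the Base64 table above.

```python
import string

# Base64 character order: A-Z, a-z, 0-9, then + and / (64 characters).
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def code_table(alphabet, width):
    """Map each character to its index written as a zero-padded binary string."""
    return {ch: format(index, f"0{width}b") for index, ch in enumerate(alphabet)}

table6 = code_table(B64_ALPHABET, 6)
print(table6["A"], table6["a"], table6["0"], table6["/"])
# 000000 011010 110100 111111

# The same idea with the 5-bit, uppercase-only convention from earlier.
table5 = code_table(string.ascii_uppercase, 5)
print(table5["Z"])  # 11001
```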

Using Python to Convert Between Base64 and Binary

Python has a built-in binary data type that can store binary data. However, it requires a careful understanding of the syntax and operations that pertain to binary, far beyond the scope of this course. As such, instead of using the binary data type in Python, this course will store binary information as strings of 1’s and 0’s. To facilitate quick conversions between Base64 characters and 6-bit binary, use the following functions.
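As background for the string-based approach, the short sketch below contrasts a string of 1’s and 0’s with Python’s built-in `bytes` type and shows the two built-ins that convert between integers and binary strings. The value 37 is an arbitrary example.

```python
number = 37

# Integer -> zero-padded 6-bit string of 1's and 0's.
bits = format(number, "06b")
print(bits)          # 100101

# Binary string -> integer (the 2 means "interpret as base 2").
print(int(bits, 2))  # 37

# bin() also works, but adds a '0b' prefix and does not pad.
print(bin(number))   # 0b100101

# The built-in bytes type stores the same value less readably.
print(bytes([number]))  # b'%'
```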

charToBinary()

This function takes in a single base64 character and returns the corresponding 6-bit binary representation as a string.

Notes:

  • If a string with more than 1 character is input to the function, it will only convert the first character.

  • If a non base64 character is input to the function, it will return an empty string.

  • The output will always be a 6-bit binary number, even if fewer bits are needed to represent the character.

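def charToBinary(char):
    # An empty string has no character to convert.
    if len(char) == 0:
        return ''
    # Only the first character of a longer string is converted.
    if len(char)>1:
        char = char[0]
    # Base64 indices: A-Z -> 0-25, a-z -> 26-51, 0-9 -> 52-61, + -> 62, / -> 63
    if char in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
        return '{:06b}'.format( ord(char) - 65 )   # ord('A') is 65
    elif char in 'abcdefghijklmnopqrstuvwxyz':
        return '{:06b}'.format( ord(char) - 71 )   # ord('a') is 97, 97 - 71 = 26
    elif char in '0123456789':
        return '{:06b}'.format( ord(char) + 4 )    # ord('0') is 48, 48 + 4 = 52
    elif char == '+':
        return '{:06b}'.format( 62 )
    elif char == '/':
        return '{:06b}'.format( 63 )
    else:
        # Not a Base64 character.
        return ''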
print( charToBinary('A') )
000000
print( charToBinary('z') )
110011
print( charToBinary('+') )
111110
print( charToBinary('zebra') )
110011
print( charToBinary('?') )
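Converting a whole message is a matter of applying the same lookup to each character. The sketch below is self-contained, so it uses a lookup string instead of calling charToBinary(); the names `B64_ALPHABET` and `text_to_binary` are hypothetical. Like charToBinary(), it contributes nothing for characters outside Base64.

```python
import string

# Base64 character order: A-Z, a-z, 0-9, then + and /.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def text_to_binary(text):
    """Concatenate the 6-bit code of every Base64 character in text."""
    return "".join(
        format(B64_ALPHABET.index(ch), "06b")
        for ch in text
        if ch in B64_ALPHABET   # silently skip non-Base64 characters
    )

print(text_to_binary("Hi"))   # 000111100010
print(text_to_binary("Hi!"))  # 000111100010
```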

binaryToChar()

This function takes in a string containing the 6-bit binary number and returns the corresponding base64 character representation as a string.

Notes:

  • The function will strip any spaces in the input string.

  • If the input string contains fewer than 6 bits, the function will pad the input out to 6 bits by adding leading 0’s.

  • If the input string contains more than 6 bits, the function will return an empty string.

  • The output will always be a single base64 character.

def binaryToChar(binary):
    # Remove any spaces, then left-pad short inputs with 0's.
    binary = binary.replace(' ','')
    if len(binary) < 6:
        binary = binary.zfill(6)
    # Inputs longer than 6 bits cannot be a single Base64 character.
    if len(binary) > 6:
        return ''
    # Interpret the string as a base-2 number (0-63).
    num = int(binary,2)
    if (num >= 0) and (num <= 25):
        return chr(num + 65)    # A-Z
    elif (num >= 26) and (num <= 51):
        return chr(num + 71)    # a-z
    elif (num >= 52) and (num <= 61):
        return chr(num - 4)     # 0-9
    elif num == 62:
        return '+'
    elif num == 63:
        return '/'
    else:
        return ''
print( binaryToChar('000101') )
F
print( binaryToChar('101') )
F
print( binaryToChar('0100101') )

print( binaryToChar('100111') )
n
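Going the other way for a whole message means splitting the bit string into 6-bit groups and looking each one up. The sketch below is self-contained and uses a lookup string instead of calling binaryToChar(); the names `B64_ALPHABET` and `binary_to_text` are hypothetical, and rejecting lengths that are not a multiple of 6 is a design choice of this sketch.

```python
import string

# Base64 character order: A-Z, a-z, 0-9, then + and /.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def binary_to_text(bits):
    """Decode a string of 6-bit groups (spaces optional) into Base64 characters."""
    bits = bits.replace(" ", "")
    if len(bits) % 6 != 0:
        raise ValueError("bit string length must be a multiple of 6")
    return "".join(
        B64_ALPHABET[int(bits[i:i + 6], 2)]
        for i in range(0, len(bits), 6)
    )

print(binary_to_text("000111 100010"))  # Hi
print(binary_to_text("100111"))         # n
```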

XOR()

This function takes two strings that both contain binary data of arbitrary length and returns a single string that represents the XOR of the input strings.

Notes:

  • The function will strip any spaces from the input strings.

  • The output will be padded with leading 0’s to match the length of the longer input string. A shorter input is treated as if it had leading 0’s as well.

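def XOR( binary1, binary2):
    # Remove any spaces from both inputs.
    binary1 = binary1.replace(' ','')
    binary2 = binary2.replace(' ','')
    # Convert to integers, XOR them, and convert back to a binary string.
    result = format(int(binary1, 2) ^ int(binary2, 2), 'b')
    # Restore leading 0's so the output matches the longer input.
    return result.zfill(max(len(binary1), len(binary2) ))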
print( XOR( '1110', '0001' ) )
1111
print( XOR( '1110', '1' ) )
1111
print( XOR( '10110 01000 01100 10010 10100 01001', '11010 11001 00011 11010 11001 00011' ) )
011001000101111010000110101010
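One property worth seeing in action is that XOR undoes itself: applying the same key a second time recovers the original bits. The sketch below is self-contained and reuses the same integer-based technique as XOR() above under the hypothetical name `xor_bits`; the message is "Hi" in 6-bit Base64 binary and the key is an arbitrary example.

```python
def xor_bits(bits1, bits2):
    """XOR two binary strings, keeping the length of the longer one."""
    width = max(len(bits1), len(bits2))
    return format(int(bits1, 2) ^ int(bits2, 2), "b").zfill(width)

plaintext = "000111100010"   # "Hi" as two 6-bit Base64 codes
key = "101010010101"         # arbitrary example key of the same length

ciphertext = xor_bits(plaintext, key)
print(ciphertext)                 # 101101110111

# XOR with the same key again returns the original message bits.
print(xor_bits(ciphertext, key))  # 000111100010
```

Read as 6-bit Base64 codes, the ciphertext 101101 110111 corresponds to the characters t and 3.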