Pi code: an encryption scheme

June 1, 2011

When I was in high school and still harbouring goals of one day becoming a spy, I made up this code revolving around the first twenty-six digits of pi (which I conveniently had memorised) and the Fibonacci sequence. I figured, the more esoteric and nonsensical, the better, so that if I ever needed to send a message to my future self I could rest assured knowing that no one except me would ever [want to] decipher it. This code was evidently created when I believed in the idea of security through obscurity, and also when I knew next to nothing about cryptography except that you could encrypt stuff and then, like, decrypt it and stuff; also, public keys. (The latter still holds true.)

The setup

So here’s the idea. First, write out the first 26 digits of pi:

31415926535897932384626433

Then, underneath each digit, write out the number of times that digit has appeared so far (counting that occurrence):

31415926535897932384626433
11121111223121332422233356
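If you’d rather not count by hand, the second row can be produced with a running tally. This is only a sketch of the counting step, not the code from picode_utils.py; the names `PI_DIGITS` and `running_counts` are made up for illustration.

```python
from collections import Counter

PI_DIGITS = "31415926535897932384626433"  # the 26 digits written out above


def running_counts(digits):
    """For each digit, how many times it has appeared so far (itself included)."""
    seen = Counter()
    counts = []
    for d in digits:
        seen[d] += 1
        counts.append(str(seen[d]))
    return "".join(counts)


print(PI_DIGITS)
print(running_counts(PI_DIGITS))  # 11121111223121332422233356
```

No digit shows up more than six times in this stretch of pi, so every count fits in a single digit.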

Now, it’s pretty clear what we can do - assign each letter of the alphabet to a sequence of two digits. Pretty simple, but perhaps too simple. Now let’s make it a bit trickier with Fibonacci. If we write it as a sequence without spaces, it looks like this:

1123581321345589144233377...

We can then add that sequence to the ciphertext, treating both as one long number (carries and all), resulting in an added layer of confusion and rendering frequency-based analysis almost useless. And there you go.
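For reference, here is one way to generate that run of Fibonacci digits to any length. It’s a minimal sketch; `fibonacci_digits` and `fib_string` are hypothetical helper names, not anything from the original script.

```python
from itertools import islice


def fibonacci_digits():
    """Yield the digits of 1, 1, 2, 3, 5, 8, 13, ... one character at a time."""
    a, b = 1, 1
    while True:
        for ch in str(a):
            yield ch
        a, b = b, a + b


def fib_string(n):
    """The first n digits of the Fibonacci sequence written without spaces."""
    return "".join(islice(fibonacci_digits(), n))


print(fib_string(25))  # 1123581321345589144233377
```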

Encoding and decoding a sequence

So let’s say you wanted to encode the following sequence: lol mudkips lol

First, let’s revisit our alphabet:

31415926535897932384626433
11121111223121332422233356
ABCDEFGHIJKLMNOPQRSTUVWXYZ

So we have L = 81, O = 93, M = 92, U = 62, D = 12, K = 53, I = 52, P = 33, S = 82 and that’s all we need. This monoalphabetic substitution looks like this:

819381 92621253523382 819381

The spaces don’t look so good. So let’s disguise them by replacing them with 0s. You could also replace them with any sequence of digits beginning and ending with a 0, which would result in some junk characters, but would also help to ensure that you correctly decoded a message and nothing got lost, because if you end up with an odd number of 0s then something is wrong. You could alternatively use any sequence of two digits starting with 0, or even something else entirely, but 0 is a good bet because it happens to not appear in the first 26 digits of pi. (It appears around digit #33, incidentally.) In this version, we’ll keep it simple by replacing each space with a single 0.

8193810926212535233820819381
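The substitution step, with a single 0 standing in for each space, could be sketched like this. The function names (`build_table`, `substitute`) are assumptions for the sake of the example and may not match what picode_utils.py does internally.

```python
import string
from collections import Counter

PI_DIGITS = "31415926535897932384626433"


def build_table(digits=PI_DIGITS):
    """Map A..Z to two-digit codes: a pi digit followed by its running count."""
    seen = Counter()
    table = {}
    for letter, d in zip(string.ascii_uppercase, digits):
        seen[d] += 1
        table[letter] = d + str(seen[d])
    return table


def substitute(plaintext):
    """Replace each letter with its two-digit code and each space with a 0."""
    table = build_table()
    out = []
    for ch in plaintext.upper():
        if ch == " ":
            out.append("0")
        elif ch in table:
            out.append(table[ch])
        else:
            raise ValueError("only letters and spaces are supported: %r" % ch)
    return "".join(out)


print(substitute("lol mudkips lol"))  # 8193810926212535233820819381
```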

A bit better, but there’s still a pattern with the 0s, in that there is always an even number of digits between them. And this is still a monoalphabetic cipher, barely a step above the Polybius cipher once you figure out that 0 = space (not a difficult conclusion to make). So, let’s add Fibonacci:

8193810926212535233820819381
1123581321345589144233377610 +
----------------------------
9317392247558124378054196991
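The addition above is ordinary whole-number addition, so in Python it can lean on big integers. A small sketch, with `fib_string` and `add_fibonacci` as made-up helper names:

```python
def fib_string(n):
    """First n digits of 1, 1, 2, 3, 5, 8, 13, ... written without spaces."""
    digits, a, b = "", 1, 1
    while len(digits) < n:
        digits += str(a)
        a, b = b, a + b
    return digits[:n]


def add_fibonacci(digits):
    """Add the same-length Fibonacci digit string to `digits` as one big number."""
    return str(int(digits) + int(fib_string(len(digits))))


print(add_fibonacci("8193810926212535233820819381"))  # 9317392247558124378054196991
```

Note that the result can come out one digit longer than the input when the addition carries: `add_fibonacci("93")` gives `"104"`.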

Cool. Now there’s no pattern between 0’s at all. This even acts as a bit of a salt, although it’s not an actual salt. To make it even more devious, we could make it look a bit like a longer MD5 hash by adding random letters:

9t3173e9r2r247558ib12437l805419e6991

Go for lowercase, to prevent confusion with O and 0. Who decided to make the letter O look just like 0, anyway?
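Sprinkling in the letters (and getting rid of them again) might look something like the sketch below. The `rate` parameter and both function names are invented for illustration; how the actual script places the characters given with `-c` isn’t shown here, so don’t read this as its behaviour.

```python
import random
import string


def sprinkle(digits, letters=string.ascii_lowercase, rate=0.3, rng=random):
    """Insert a random letter in front of some of the digits."""
    out = []
    for d in digits:
        if rng.random() < rate:
            out.append(rng.choice(letters))
        out.append(d)
    return "".join(out)


def strip_letters(text):
    """Undo sprinkle(): keep only the digits."""
    return "".join(ch for ch in text if ch in string.digits)


padded = sprinkle("9317392247558124378054196991")
print(padded)
print(strip_letters(padded) == "9317392247558124378054196991")  # True
```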

And there you go. To decode it, you first need to remove all the letters. Then write out the Fibonacci sequence. How many digits you need depends on the encoded string - usually it’s as many as there are digits in the string, but if the first two digits are 12 or less then the addition probably carried over, and you need one digit fewer than the length of the string. (Basically, you need the difference to be positive.) Then subtract it, and separate it into words depending on the 0’s, then refer to the pi-code chart to figure out the individual letters.
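Put together, the decoding steps can be sketched roughly as follows. This is not the code from picode_utils.py, and all the names are hypothetical. It tries a Fibonacci string as long as the ciphertext first and then one digit shorter, on the assumption that a carry in the addition can only ever add a single leading digit.

```python
import string
from collections import Counter

PI_DIGITS = "31415926535897932384626433"


def build_reverse_table(digits=PI_DIGITS):
    """Map two-digit codes (pi digit + running count) back to A..Z."""
    seen = Counter()
    table = {}
    for letter, d in zip(string.ascii_uppercase, digits):
        seen[d] += 1
        table[d + str(seen[d])] = letter
    return table


def fib_string(n):
    """First n digits of 1, 1, 2, 3, 5, 8, 13, ... written without spaces."""
    digits, a, b = "", 1, 1
    while len(digits) < n:
        digits += str(a)
        a, b = b, a + b
    return digits[:n]


def to_letters(digits, table):
    """Turn substituted digits back into text; a lone 0 is a space."""
    out, i = [], 0
    while i < len(digits):
        if digits[i] == "0":
            out.append(" ")
            i += 1
        else:
            out.append(table[digits[i:i + 2]])  # KeyError if not a valid code
            i += 2
    return "".join(out)


def decode(ciphertext):
    """Strip letters, subtract Fibonacci, then look up each pair of digits."""
    digits = "".join(ch for ch in ciphertext if ch in string.digits)
    table = build_reverse_table()
    # Same length first, then one digit fewer (for when the addition carried).
    for n in (len(digits), len(digits) - 1):
        if n < 1:
            continue
        diff = int(digits) - int(fib_string(n))
        if diff <= 0 or len(str(diff)) != n:
            continue
        try:
            return to_letters(str(diff), table)
        except KeyError:
            continue
    raise ValueError("could not decode")


print(decode("9t3173e9r2r247558ib12437l805419e6991"))  # LOL MUDKIPS LOL
```

The sketch assumes the message doesn’t begin with a space, since a leading 0 would vanish once the digits are treated as a number.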

Give it a try - how would you decode this?

225h86e159lp485t1r7116a2pp652e7d5992i71n518p028i4c8291791co1d4e028636f0a58270c82t32918991o896893r5y540755544669

Implementation in Python

picode.py (below), and picode_utils.py - the latter was too long for this page. (For that matter, picode.py is also kind of too long for this page, but I wanted to try out Pygments.)

#!/usr/bin/env python

import re
import sys

# PiCoder and the two exception classes live in picode_utils.py
from picode_utils import PiCoder, CiphertextError, PlaintextError

USAGE = "Usage: ./picode.py [plaintext|ciphertext] (options: [-c|--characters] [characters], [-r|--reverse])"

# Prints out the usage text along with any error messages, then exits
def error(text=''):
    print(text)
    print(USAGE)
    sys.exit(1)

def main(argv):
    if len(argv) < 2:
        # Give it either plaintext or ciphertext; it will determine which it is
        error("Not enough arguments")

    text = argv[1]
    code = PiCoder()

    # See if we need to reverse the Fibonacci sequence (for decoding and encoding)
    reverse = '-r' in argv or '--reverse' in argv

    # If it has any digits in it, then it must be ciphertext.
    # (re.search rather than re.match: ciphertext padded with random
    # letters may well start with a letter.)
    if re.search('[0-9]', text) is not None:
        # Mode: decode
        try:
            print("Plaintext: " + code.decode(text, reverse))
        except CiphertextError:
            error("Invalid ciphertext (must be alphanumeric)")
    else:
        # If it doesn't, then it must be plaintext
        # If the -c flag is set, put the given characters randomly in there
        characters = ''
        for i, arg in enumerate(argv):
            if arg in ('-c', '--characters'):
                # The flag needs a value after it, and that value can't be another flag
                value = argv[i + 1] if i + 1 < len(argv) else ''
                if not value or value.startswith('-'):
                    error("Missing characters following the -c flag")
                characters = value

        try:
            print("Ciphertext: " + code.encode(text, characters, reverse))
        except PlaintextError:
            error("Invalid plaintext (may only contain alphabetic characters and spaces)")

if __name__ == "__main__":
    main(sys.argv)

Pros:

If you remember pi and are good with arithmetic, then it’s pretty simple
Difficult to decode unless you already know how it works
Can’t do letter-based frequency analysis, or even word-based if you use 0 for spaces and then use Fibonacci as a salt
Lots of different ways to go about implementing it (see alternatives)

Cons:

No easy way of implementing punctuation or numbers in a consistent manner, unless you want to use more digits of pi (which could get messy) or do something with zeros
Decoding by hand is messy, time-consuming and easy to do incorrectly
Words starting with the same sequence of characters will look similar (unless you use reverse Fibonacci)
Useless

Alternatives:

Instead of using pi, use some other sort of substitution cipher - any that involves numbers would work
Or, add an extra layer - Caesar cipher or reverse cipher it first
Start numbering at 0 instead of 1 (although you’d have to do spaces differently), or even number backwards (9876 etc)
Add Fibonacci in the other direction (i.e. reverse the sequence - try running the Python script with the -r or --reverse flag)
Using something other than Fibonacci for the salt - say, powers of 2 (although that has the disadvantage of always being even) or powers of 3 (you probably wouldn’t want powers of 1)
Converting the numbers to letters, somehow, and sprinkling in random numbers for extra deviousness
Using an actually secure form of encryption