Understanding Base64 Encoding

Base64 encoding appears all over the web. What is it and why is it so prominent?

A Base64 encoder takes binary data as input and outputs a string in which each character represents 6 bits of that data. Because each output character is transmitted as a full 8-bit byte, the output is about 33% larger than the input (4 characters for every 3 bytes). The alphabet has 64 characters: A-Z, a-z, 0-9, + and /, plus a special-purpose padding character = (the 65th character).
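
As a rough illustration of the size and character-set claims, the sketch below uses Python's standard base64 module. The 300-byte sample input is arbitrary and chosen only for this example.

```python
import base64

# Arbitrary sample input for illustration: 300 bytes of binary data.
raw = bytes(range(256)) + bytes(44)

encoded = base64.b64encode(raw)  # returns ASCII bytes

# Every 3 input bytes become 4 output characters.
ratio = len(encoded) / len(raw)

# Only the 64 alphabet characters and '=' may appear in the output.
alphabet = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/="
)
uses_only_alphabet = set(encoded.decode("ascii")) <= alphabet

print(len(raw), len(encoded), round(ratio, 3))  # 300 400 1.333
print(uses_only_alphabet)                       # True
```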

Why?

If you send raw bytes over the wire, they can be misinterpreted by protocols at various levels of the transport. By limiting the output to ‘safe’ characters, we can be more confident that the bytes won’t be mangled or given special meaning by other layers. In particular, the output never contains the control characters found at the start of the ASCII set.

How?

If the input length in bits is divisible by 24 (that is, the input is a whole number of 3-byte groups), the encoding contains no padding character. Otherwise, one or two padding characters are appended so that the output length is a multiple of 4 characters.
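
The padding rule can be checked with a small sketch, again assuming Python's standard base64 module. The sample inputs and the helper name padding_for are made up for this example.

```python
import base64

# Inputs of 3, 2 and 1 bytes (24, 16 and 8 bits), chosen for illustration.
results = {}
for data in (b"abc", b"ab", b"a"):
    encoded = base64.b64encode(data).decode("ascii")
    results[data] = encoded
    print(len(data) * 8, encoded, encoded.count("="))
# 24 YWJj 0
# 16 YWI= 1
# 8 YQ== 2


def padding_for(n_bytes: int) -> int:
    """Hypothetical helper: number of '=' characters for n_bytes of input."""
    return (3 - n_bytes % 3) % 3
```

The padding count depends only on the input length modulo 3 bytes, so it is never more than two characters.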

Size

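You might have noticed that our characters YWI= (the encoding of the two-byte input ab) are not the ASCII codes of the input. Instead, each 6-bit value of the input is looked up in the table from the spec: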

     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y
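
To make the table lookup concrete, here is a minimal hand-rolled encoder. It is a sketch for illustration only, not a replacement for a library implementation; the names ALPHABET and encode_by_hand are invented for this example.

```python
import base64

# The 64 characters from the table above, indexed by their 6-bit value.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_by_hand(data: bytes) -> str:
    # Write every input byte as 8 bits and join them into one bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Add zero bits so the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # Look up each 6-bit group in the table.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Add '=' so the output length is a multiple of 4 characters.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)


# "ab" is 01100001 01100010, regrouped as 011000 010110 001000 (24, 22, 8).
print(encode_by_hand(b"ab"))  # YWI=

# Cross-check against the standard library on a few sample inputs.
samples = [b"", b"a", b"ab", b"abc", b"\x00\xff\x10"]
matches = all(
    encode_by_hand(s) == base64.b64encode(s).decode("ascii") for s in samples
)
```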

Now that we have these characters, each one is sent as an 8-bit ASCII code. This is where the 33% size increase occurs: every 8 bits on the wire carry only 6 bits of the original data. The final bits sent are: 01011001 01010111 01001001 00111101.
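
The bits on the wire can be reproduced with a short sketch. Each character is printed as a full 8-bit byte, so every value carries one or two leading zeros.

```python
encoded = "YWI="

# Each Base64 character is transmitted as one ASCII byte.
wire_bytes = encoded.encode("ascii")
wire_bits = [f"{b:08b}" for b in wire_bytes]

print(list(wire_bytes))     # [89, 87, 73, 61]
print(" ".join(wire_bits))  # 01011001 01010111 01001001 00111101

# Of the 8 bits sent per character, at most 6 carry data from the input.
overhead = 8 / 6 - 1
print(round(overhead * 100))  # 33
```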