Understanding Base64 Encoding
Base64 encoding appears all over the web. What is it and why is it so prominent?
A function that base64 encodes takes in binary data as input, and outputs a string that is limited to to each character being 6 bits. As a result, the output is 33% larger than the input. The character range is a-z, A-Z, 0-9, +, / and a special purpose character = (the 65th character).
Why?
If you send raw bytes over the wire, these can be interpreted incorrectly by various protocols that happen at various levels of the transport. By limiting the character set to ‘safe’ characters, we can be more confident that the bytes won’t be mangled or used in other layers. It prevents characters like the control characters seen at the start of the ASCII set.
How?
- Given the string
ab
, the binary data is01100001 01100010
- Chunking this binary data to 6 bits instead gives us
011000 010110 0010
- As the last chunk is only 4 bits, pad:
011000 010110 001000
- Given the spec defining the characters, this will provide the following characters:
YWI
- Because the spec requires groups of 24 bits, this is padded out to be
011000 010110 001000 000000
. Padding is encoded as=
, resulting inYWI=
If the input is divisible by 24, then no padding character will exist in the encoding. However if this is not the case then 1 or more padding characters will be present.
Size
You might have noticed that our characters YWI=
are not ASCII codes, but instead follow the specs table:
Value Encoding Value Encoding Value Encoding Value Encoding
0 A 17 R 34 i 51 z
1 B 18 S 35 j 52 0
2 C 19 T 36 k 53 1
3 D 20 U 37 l 54 2
4 E 21 V 38 m 55 3
5 F 22 W 39 n 56 4
6 G 23 X 40 o 57 5
7 H 24 Y 41 p 58 6
8 I 25 Z 42 q 59 7
9 J 26 a 43 r 60 8
10 K 27 b 44 s 61 9
11 L 28 c 45 t 62 +
12 M 29 d 46 u 63 /
13 N 30 e 47 v
14 O 31 f 48 w (pad) =
15 P 32 g 49 x
16 Q 33 h 50 y
Now we have these values. We can now encode these characters to ASCII. This is where the 33% size increase occurs. The final bits sent are: 1011001 1010111 1001001 111101.