Encoding and Decoding Bytes Explained
A computer can only store bytes.
This means that if we want to store anything at all in a computer, we must first convert it to bytes, or encode it.
What’s an encoding?
Different types of data have different available encodings:
Data | Encodings |
---|---|
Image | JPEG, PNG, etc. |
Video | AVI, MP4, etc. |
Music | MP3, WAV, etc. |
Text | ASCII, UTF-8, etc. |
To store any of the data above, we must first encode this data using any of its respective encodings.
For instance, to store an image, we must first encode it using JPEG, PNG, etc.
MP3, AVI, PNG, ASCII and all the others listed above are all examples of encodings.
As we can see, an encoding is a format to represent images, video, audio, text, etc. in bytes.
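To make this concrete for text, here is a minimal sketch showing that the same piece of text becomes different bytes depending on which encoding we pick. The character 'é' and the three encodings are just illustrative choices, not anything special.

```python
# The same one-character text under three different text encodings.
text = 'é'

utf8_bytes = text.encode('utf-8')       # two bytes in UTF-8
latin1_bytes = text.encode('latin-1')   # one byte in Latin-1
utf16_bytes = text.encode('utf-16-le')  # two bytes in little-endian UTF-16

print(utf8_bytes)    # b'\xc3\xa9'
print(latin1_bytes)  # b'\xe9'
print(utf16_bytes)   # b'\xe9\x00'
```

The text is identical in all three cases; only the byte representation changes with the encoding.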
It’s all just bytes?
This means that all data on our disk is just a bunch of bytes. The bytes could represent a string of text, a video, or an image; we don’t know.
And we won’t know until we know what encoding this data is in.
A string of bytes is pretty much useless to us unless we know its encoding.
We can see an example of this in Python.
Python Example
To encode, or convert to a byte string, we can use encode(format), format being the encoding we want to use.
bytestring = 'Random string'.encode('utf-8')
print(bytestring) # b'Random string'
Here, we are converting 'Random string' to its byte representation using the encoding UTF-8.
When we print this out, we’ll get b'Random string'. The b is Python’s way of denoting a byte string.
However, note that what we see is not the raw bytes in bytestring. The only reason it says b'Random string' and not a run of byte values is that, when printing a byte string, Python displays every byte in the printable ASCII range as its matching character. Any other byte is shown as an escape such as \xc3. We only know it’s a byte string from the b.
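As a small sketch of what is really stored, we can turn a byte string into a list of integers, and we can encode text containing a non-ASCII character to see bytes that Python cannot show as plain characters. The strings 'Hi' and 'café' are illustrative choices.

```python
# A byte string is a sequence of integers between 0 and 255.
ascii_bytes = 'Hi'.encode('utf-8')
print(ascii_bytes)        # b'Hi'
print(list(ascii_bytes))  # [72, 105]

# 'é' is outside ASCII, so its two UTF-8 bytes are printed as \x escapes.
accented_bytes = 'café'.encode('utf-8')
print(accented_bytes)       # b'caf\xc3\xa9'
print(len(accented_bytes))  # 5 (four characters, but five bytes)
```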
Given the encoding, we can decode a byte string in Python using decode(format).
bytestring.decode('utf-8') # 'Random string'
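To see why knowing the encoding matters, here is a minimal sketch of what can happen when we decode with the wrong one. The text 'café' and the choice of Latin-1 as the "wrong" encoding are assumptions made for illustration.

```python
data = 'café'.encode('utf-8')

# Decoding with the encoding that was used to encode gives the text back.
right = data.decode('utf-8')
print(right)  # café

# Decoding with a different encoding can silently produce the wrong text.
wrong = data.decode('latin-1')
print(wrong)  # cafÃ©

# It can also fail outright: these Latin-1 bytes are not valid UTF-8.
try:
    'café'.encode('latin-1').decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)  # True
```

In both failure cases the bytes themselves are unchanged; only our guess about their encoding was wrong.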