What's Unicode, anyway?

Characters from different languages and scripts are encoded, which means that each character is mapped to a distinct binary number.

Why Unicode?

There are different character encodings like Unicode (used by over 80% of the web) and ASCII. Their main difference is the amount of characters they can encode.

For example, ASCII uses only 7 bits, so it can encode at most 127 characters (i.e. 2^7 - 1) whereas Unicode can use as many as 32 bits to encode a character.

Because of that size difference, websites like Facebook and Twitter that are used by people around the globe to communicate in their own languages use Unicode. Even emojis are in unicode! Countries can also have local encoding systems, such as Guobiao from China and Big5 from Taiwan, which each have their own conversions to Unicode.

Even bits can be political

It’s interesting to note that characters are not just added by anyone – there is a board of members ranging from tech companies like Shopify and Google to national governments and expert linguists for Unicode. They decide what is included in the language (and what isn’t).

There can be political and social implications for which characters are included, as well as inconsistencies between how people write in the language and how they are represented in unicode.


Resources

To learn more about Unicode,

This project is maintained by KarishmaDaga