Crazy Simple Computer Science Series/3 (Machine Language/ASCII and UNICODE)
Welcome to the Crazy Simple Computer Science Series!
*This series introduces readers to the basics of computer science in a way that anyone can understand. It aims to make computer science and its working principles fun, presenting the basic logic of the field, along with helpful information you can use in daily life, in plain language.
In the previous articles of the Crazy Simple Computer Science series, we talked very simply about how computers work, what the Matrix is, the worlds consisting of 0 and 1, the name of this system (binary notation), and the tools that enable computers to communicate this way (transistors, machine language).
This article will talk about some of the standards we use to convert machine language into understandable numbers and symbols.
This article builds directly on the previous articles in the series. So, if you are interested, I recommend reading the first two articles of the series first.
The American Standard Code for Information Interchange (ASCII) is a character encoding standard for electronic communication. In computers, telecommunications equipment, and other devices, ASCII codes represent text. Most modern character encoding schemes are based on ASCII, even though they support a large number of additional characters.
Unicode (and its encodings, such as UTF-8) are standards that are independent of any operating system or computer.
The decimal numbers 65 and 66 represent capital A and capital B, respectively, while 97 and 98 represent lowercase a and lowercase b, and so on for everything in between. So this system, this code, ASCII, is simply a collective agreement that whenever you're using a text-based program rather than a number-based one, a pattern of bits that might represent 65 in a calculator should instead be interpreted as a letter in Microsoft Word, an SMS message, or an iMessage. How bits are understood, then, depends entirely on the context.
The same sequence of bits can be interpreted in different ways depending on the software you’re using — as numbers, letters, or even something else. So, how do we represent all of the other symbols that aren’t A through Z? With the American Standard Code for Information Interchange.
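This context-dependence is easy to demonstrate. Here is a minimal Python sketch (the bit pattern and variable names are just for illustration) showing the very same byte read once as a number and once as a letter:

```python
# The same 8-bit pattern can be read as a number or as a letter,
# depending on what the program expects.
bits = "01000001"            # one byte

as_number = int(bits, 2)     # interpret the bits as an integer
as_letter = chr(as_number)   # interpret the same value as a character

print(as_number)   # 65
print(as_letter)   # A
```

A calculator app would show you the 65; a messaging app would show you the A.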
“Originally based on the English alphabet, ASCII encodes 128 specified characters into seven-bit integers as shown by the ASCII chart above. Ninety-five of the encoded characters are printable: these include the digits 0 to 9, lowercase letters a to z, uppercase letters A to Z, and punctuation symbols.”
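The counts in the quote above can be checked with a short Python sketch (the variable name is my own):

```python
# ASCII defines 128 characters in total (codes 0-127).
# The printable ones are codes 32 (space) through 126 (~): 95 characters.
printable = [chr(code) for code in range(32, 127)]

print(len(printable))             # 95 printable characters
print("".join(printable[16:26]))  # the digits 0-9 live at codes 48-57
```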
There are hundreds, if not thousands of letters with accents and other symbols, let alone punctuation marks, and there are hundreds, if not thousands, of different characters in other languages that you must represent ideally in order to convey that text and produce a document.
ASCII uses 7 or 8 bits per character. So, how many possible values can you represent with 8 bits, or binary digits, if you will? 2 to the 8th power, which gives you 256 values to choose from. That's more than enough for the 26 letters of the English alphabet, both uppercase and lowercase, but once you start adding punctuation, accents, and other characters, you'll rapidly run out of space.
But ASCII couldn't handle all of this on its own, because it uses only 7 or 8 bits in total.
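To see why the space runs out, you can compute how many distinct values each bit width allows (a quick Python check):

```python
# Each extra bit doubles the number of representable values: 2 ** bits.
for bits in (7, 8, 16, 32):
    print(f"{bits} bits -> {2 ** bits} possible values")
```

7 or 8 bits (128 or 256 values) is plenty for English text, but nowhere near enough for every writing system in the world.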
Unicode: Universal Code
As a result, the world created Unicode, which offers a variable-length encoding of characters rather than using only 8 bits, or 1 byte, if you will (1 byte simply meaning 8 bits). So certain characters may occupy 8 bits, or a single byte. Other characters may use 2 bytes, or 16 bits. Still others may take three or even four bytes. And with four bytes, 32 bits, you can express 2 to the 32nd power, roughly 4 billion different values, which is a huge amount. But how does the encoding scheme operate in practice? Consider the following scenario:
You might represent uppercase A with 65, uppercase B with 66, and so on in both ASCII and Unicode, with ASCII now being a subset of Unicode insofar as it requires only 1 byte per character. And dot-dot-dot means we've taken care of the others as well, all back-to-back-to-back. So, imagine you receive a sequence of 0s and 1s in a text message that, if you do the arithmetic in the ones place, the twos place, and so on, turns out to be the decimal numbers 72, 73, 33.
So, what did that text message say? Of course, computers only comprehend binary at the end of the day, and that information is transferred from phone to phone over the internet (more on that later). However, in the context of text messaging software, those 0s and 1s will most likely be interpreted as ASCII or, more broadly, Unicode, rather than as plain binary or decimal numbers.
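Here is a minimal Python sketch of that decoding step (the numbers come from the scenario above):

```python
# Interpret the decimal values 72, 73, 33 as ASCII/Unicode characters.
codes = [72, 73, 33]
message = "".join(chr(code) for code in codes)
print(message)  # HI!
```

72 is H, 73 is I, and 33 is the exclamation mark, so the message reads "HI!".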
“Unicode is a standard encoding system that is used to represent characters from almost all languages. Every Unicode character is encoded using a unique integer code point between 0 and 0x10FFFF.”
Fun Fact: Unicode is capable of encoding more than 1.1 million code points (1,114,112, to be exact). :)
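The quoted code-point range is easy to verify in Python:

```python
# Unicode code points run from 0 to 0x10FFFF inclusive.
max_code_point = 0x10FFFF
print(max_code_point + 1)   # 1114112 possible code points
print(chr(0x41))            # A  (the ASCII range is a subset of Unicode)
print(chr(0x1F642))         # 🙂 (a code point far beyond ASCII)
```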
For other articles in the series: