Thursday, 22 September 2016

Technical Blog 1: Machine Code and Assembly

Introduction
In this blog I aim to learn about and describe the lowest-level coding language there is: machine code. Inextricably tied to machine code is the idea of compilers, which translate the high-level code that we can write and read relatively easily into the dense and impractical language of the machines. I will therefore also be learning about compilers and exactly how they function on a practical level.

The most basic question that arises when talking about machine code is “why?”. Why would anyone in this day and age want to learn or even understand machine code? Computing has essentially developed beyond the point where anyone needs to write a whole program in machine code, or even assembly, for most practical purposes. These languages are incredibly error-prone, inflexible and difficult to debug and, in the case of machine code, take forever to write because the binary code for each command has to be looked up. However, there are reasons to at least understand the basics of these languages. Primarily, it is good for general understanding: it shows higher-level coding in a new light and gives a better appreciation for how easy coding can be these days. In a more practical sense, there are times when low-level languages can achieve efficiency that compiled high-level code cannot, because the code can be tailored to perform very specific tasks in very specific ways. Personally, though, I just find the whole topic interesting, like a dark and mysterious cavern filled with 1s and 0s where most people do not dare to enter.

| Decimal | Hexadecimal | Binary |
| --- | --- | --- |
| 00 | 00 | 000000 |
| 12 | 0C | 001100 |
| 32 | 20 | 100000 |
| 45 | 2D | 101101 |

Table 1: A series of numbers expressed in decimal, hexadecimal and binary form

So what is machine code? Machine code is the language that a computer can actually read. It reaches the processor in binary form, which is incredibly difficult for humans to comprehend, so when people work with it directly it is normally viewed or written in hexadecimal or a similar form. Table 1 shows how unwieldy binary becomes, as lots of digits are required to show relatively simple numbers. Whilst hexadecimal also looks alien at first, it is generally preferred to decimal because it is base 16, and 16 is a power of 2: each hexadecimal digit corresponds to exactly four binary digits. This gives a much cleaner correspondence between the two notations than decimal, which is base 10, can offer.
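To check the conversions for myself, here is a minimal Python sketch that rebuilds the rows of Table 1 and shows the digit-by-digit link between hexadecimal and binary. The function names to_row and hex_to_binary are my own, chosen only for illustration.

```python
def to_row(value):
    """Return (decimal, hexadecimal, 6-bit binary) strings for 0 <= value <= 63."""
    return (f"{value:02d}", f"{value:02X}", f"{value:06b}")


def hex_to_binary(hex_digits):
    """Translate a hex string one digit at a time; each digit becomes 4 bits."""
    return " ".join(f"{int(digit, 16):04b}" for digit in hex_digits)


# Print the same four numbers that appear in Table 1.
for n in (0, 12, 32, 45):
    print(*to_row(n))

print(hex_to_binary("2D"))  # 0010 1101
```

Because every hexadecimal digit maps to its own group of four bits, the translation never has to look at neighbouring digits, which is not true when converting between decimal and binary.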

Instruction sets
Different processors read machine code differently. That is to say, even at the lowest level there are a variety of different languages that can be written in, although in this case the choice is dictated entirely by which language the processor is designed to read. This means that machine code that functions on one machine will likely be completely unusable on another, which is one more of the many reasons not to write code directly in machine language. These languages are known as instruction sets: collections of instructions which essentially tell the computer, “if given this string of binary digits, perform this task.”
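The idea that the same bits mean different things on different machines can be sketched with a toy example. The two "instruction sets" below are entirely made up for illustration (they are not real processors), and a real instruction set is far larger than four 2-bit patterns.

```python
# Two hypothetical instruction sets that give different meanings
# to the same 2-bit patterns. Neither corresponds to a real processor.
MACHINE_A = {"00": "load", "01": "add", "10": "store", "11": "halt"}
MACHINE_B = {"00": "halt", "01": "store", "10": "add", "11": "load"}


def disassemble(bits, instruction_set, width=2):
    """Cut a bit string into fixed-width chunks and look each one up."""
    chunks = [bits[i:i + width] for i in range(0, len(bits), width)]
    return [instruction_set[chunk] for chunk in chunks]


program = "00011011"
print(disassemble(program, MACHINE_A))  # ['load', 'add', 'store', 'halt']
print(disassemble(program, MACHINE_B))  # ['halt', 'store', 'add', 'load']
```

The program is identical in both cases; only the lookup table changes, and with it the meaning of every instruction.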

Instruction sets are themselves divided into categories, such as complex instruction sets and reduced instruction sets, which take different attitudes to different types of instructions. For example, a reduced instruction set only implements simple, common commands directly and performs complex commands using combinations of the common commands. In contrast, a complex instruction set is able to directly implement some of the more specialised tasks. This leads to being able to perform those specialised tasks more efficiently, at the cost of being less efficient on average for the simple, common tasks.
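As a rough illustration of building a complex command out of simple ones, the sketch below multiplies two numbers using nothing but addition, and compares that with a machine that has multiplication built in. Both "machines" are hypothetical, the step count is only a stand-in for the number of instructions issued, and real reduced instruction sets such as MIPS do in fact provide multiply instructions.

```python
def multiply_reduced(a, b):
    """Multiply non-negative integers on a pretend machine that can only add.

    Returns the product and the number of add instructions used.
    """
    result = 0
    steps = 0
    for _ in range(b):
        result = result + a  # the one arithmetic instruction available
        steps += 1
    return result, steps


def multiply_complex(a, b):
    """Multiply on a pretend machine with a dedicated multiply instruction."""
    return a * b, 1


print(multiply_reduced(6, 7))  # (42, 7)
print(multiply_complex(6, 7))  # (42, 1)
```

Counting instructions says nothing about how long each one takes to execute, so this should not be read as a performance comparison between the two styles.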

| Opcode | Addresses | Other Information |
| --- | --- | --- |
| Defines the operation type and how to divide the other 26 bits | Most operations require some number of addresses. Each is 5 bits long and selects one of the processor's 32 registers to either read or write | Different operations require additional inputs of varying lengths to describe exactly what to do |
| 001100 | 10011 01010 | 0101000110101010 |

Table 2: An example of an instruction in MIPS form.

Many instruction sets, reduced ones in particular, are set up to take in a set number of bits per instruction so that they can be read as a simple binary stream, i.e. the computer knows that a new instruction starts every so many bits. 32-bit instructions are a common choice, although other widths can be used and some instruction sets, such as x86, allow the length to vary from one instruction to the next. An instruction is generally divided into sub-sequences of information, depending on the language. For example, in MIPS, which is a 32-bit reduced instruction set, the first 6 bits determine the type of instruction being performed and how to use the remaining 26 bits. Those remaining bits define the register addresses to be read from and written to, any additional information required and what to do with it. Table 2 shows the structure of a MIPS instruction. All of the basic tasks the computer can perform, including reading and writing memory, arithmetic operations and even jumping about within the program, are defined within the 32-bit instructions in this way, although, as I touched on before, many instructions will be combined to perform more complex operations as needed, depending on the instruction set.
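To make the field layout concrete, here is a small Python sketch that takes the 32 bits from Table 2 and pulls them apart with shifts and masks. The function name decode_fields is my own invention, and the field names rs and rt are, as far as I understand it, the labels the MIPS documentation gives to the two 5-bit register fields in this layout. It only handles the layout shown in Table 2; other opcodes divide the 26 bits differently.

```python
def decode_fields(word):
    """Split a 32-bit word into the fields shown in Table 2."""
    return {
        "opcode": (word >> 26) & 0x3F,  # first 6 bits
        "rs": (word >> 21) & 0x1F,      # next 5 bits: first register
        "rt": (word >> 16) & 0x1F,      # next 5 bits: second register
        "immediate": word & 0xFFFF,     # last 16 bits: other information
    }


# The example instruction from Table 2, written field by field.
word = int("001100" "10011" "01010" "0101000110101010", 2)

print(f"{word:#010x}")  # 0x326a51aa
for name, value in decode_fields(word).items():
    print(name, value)
```

The same instruction is eight hexadecimal digits long but thirty-two binary digits long, which is a good demonstration of why hexadecimal is the preferred way to read machine code.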

Conclusion

I have only begun to scratch the surface of machine code in this post, but it seems clear to me that there is a lot to talk about relating to the lowest-level languages. Over this series I will continue to develop my own understanding of this topic, investigating assembly language, compilers and other related subjects in order to bridge the rift between people and machines.

References
1. MIPS® Architecture For Programmers Volume I-A: Introduction to the MIPS32® Architecture [Internet]. 1st ed. Imagination Technologies; 2014 [cited 22 September 2016]. Available from: https://imagination-technologies-cloudfront-assets.s3.amazonaws.com/documentation/MD00082-2B-MIPS32INT-AFP-06.01.pdf

2. Fairhead H. Hexadecimal [Internet]. I-programmer.info. 2016 [cited 22 September 2016]. Available from: http://www.i-programmer.info/babbages-bag/478-hexadecimal.html?start=1