Saturday, 22 October 2016

Technical Blog 3 - Compilers

I have mentioned compilers a few times in previous blog posts. This is because they are a core part of understanding how computers are able to use the code we actually write. As I have said before, compilers are programs which convert the code in other programs into assembly language or machine code which can be executed by the computer itself. However, whilst it is fairly obvious why we need compilers, most people do not understand how compilers actually go about doing this in a practical sense, which is what I will attempt to cover in this post.

Ultimately a compiler takes in source code written in a high-level language and converts it to machine code. However, this is a non-trivial exercise because, unlike with assembly code, there is no easy one-to-one conversion between high-level languages and machine code. Therefore many stages have to be performed by the compiler before it outputs anything useful. Notably, because most programming languages allow one section of code to refer to another section elsewhere in the program, a compiler will often have to pass over the code multiple times to accurately understand what is happening.
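
The need for multiple passes can be sketched with a toy example. The two-keyword "language" below ("def NAME" and "call NAME") is invented purely for illustration and is not taken from any real compiler. It only shows why a name that is used before it is defined cannot be resolved in a single read through the code.

```python
# Toy illustration of why a compiler may need more than one pass.
# The "language" here is hypothetical: each line is either
# "def NAME" (defines a function) or "call NAME" (uses one).

def collect_definitions(lines):
    """Pass 1: record the line index where each name is defined."""
    definitions = {}
    for index, line in enumerate(lines):
        keyword, name = line.split()
        if keyword == "def":
            definitions[name] = index
    return definitions


def resolve_calls(lines, definitions):
    """Pass 2: replace each called name with the line index of its definition."""
    resolved = []
    for line in lines:
        keyword, name = line.split()
        if keyword == "call":
            if name not in definitions:
                raise NameError(f"undefined function: {name}")
            resolved.append(("call", definitions[name]))
    return resolved


source = [
    "call helper",  # used here, before it has been defined
    "def main",
    "def helper",
]

definitions = collect_definitions(source)
calls = resolve_calls(source, definitions)
print(calls)  # [('call', 2)]
```

On the first line the call to helper cannot be resolved yet, because its definition has not been seen. Only after the first pass has recorded every definition can the second pass fill in where each call should go.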

It is also important to understand that compilers do not exist in a vacuum. They are just one part of the overall language processing system. In general, after a compiler has been through the code and produced the assembly language version of the original program, a few stages remain before the code is actually usable:

  • An assembler is required to convert the assembly code into machine code which can be executed by the computer. This step takes minimal effort but is still notable, as many compilers output assembly rather than machine code.
  • A linker is a very important part of this process as it collates the various files associated with and referenced by the program and makes sure they are all present and accessible. It will generally attempt to combine them into a single executable file where possible. It also resolves the references between those files and assigns addresses to the code and data within the combined program.
  • The loader loads the program into the machine's available memory and calculates the size of the program itself. It is generally a part of the operating system, which means the earlier stages need to produce a file in a format that the loader understands.
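
The linking and loading stages above can be sketched as a toy model. The "object file" layout, the symbol names and the memory size are all assumptions made for illustration, and real formats and loaders are far more involved. Addresses here are simply positions relative to the start of the combined program.

```python
# Toy model of the link and load stages. The "object file" layout below is
# invented for illustration and is far simpler than real formats.

def link(object_files):
    """Join object files into one program and resolve symbol references."""
    symbol_addresses = {}
    offset = 0
    # First, work out where each file's symbols sit in the combined program.
    for obj in object_files:
        for name, local_offset in obj["defines"].items():
            symbol_addresses[name] = offset + local_offset
        offset += len(obj["code"])

    program = []
    for obj in object_files:
        for word in obj["code"]:
            # A string stands for a reference to a symbol defined elsewhere.
            if isinstance(word, str):
                if word not in symbol_addresses:
                    raise KeyError(f"unresolved symbol: {word}")
                program.append(symbol_addresses[word])
            else:
                program.append(word)
    return program


def load(program, memory, base_address):
    """Copy the program into memory at base_address and return its size."""
    size = len(program)
    if base_address + size > len(memory):
        raise MemoryError("program does not fit in memory")
    memory[base_address:base_address + size] = program
    return size


main_obj = {"code": [1, "helper", 0], "defines": {"main": 0}}
helper_obj = {"code": [7, 9], "defines": {"helper": 0}}

executable = link([main_obj, helper_obj])
memory = [0] * 16
size = load(executable, memory, base_address=4)
print(executable, size)  # [1, 3, 0, 7, 9] 5
```

The reference to helper in the first file is replaced by 3, the position where the second file's code begins once the two are joined.
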
This shows that the actual process of getting any code to run on a machine extends beyond the compiler itself. This is a relevant consideration when thinking about optimising how code runs on any machine. It is even more interesting when you consider that each of these additional programs, the assembler, the linker and the loader, has itself had to be compiled or assembled at some point. This means that somewhere in the past their earliest ancestors were written in machine code of some form, with later versions likely built up over time as I described in my last blog post.

My next blog post will go into greater detail about compilers themselves, which are the most complex part of this language processing system. I will break down the various phases the compiler goes through and why each of them is important.

References
1. Bolton D. What is a Compiler? [Internet]. About.com Tech. [cited 20 October 2016]. Available from: http://cplus.about.com/od/introductiontoprogramming/p/compiler.htm
2. Compiler Design Tutorial [Internet]. www.tutorialspoint.com. [cited 20 October 2016]. Available from: https://www.tutorialspoint.com/compiler_design/index.htm

Thursday, 6 October 2016

Technical Blog 2 - Assembly and Assemblers

Introduction
Assembly is, at its simplest, the first kind of programming language designed to be written by people. It generally translates to machine code on a one-for-one basis, using very simple translators called "assemblers". The primary advantage of assembly over writing directly in machine code is that there is no need to remember lots of long numerical codes for each action to be performed. Instead, assembly has a series of simple words which each correspond directly to a machine code instruction. For example:
mov [2352], 245
For some assembly variants this small snippet of code tells the computer to store the number 245 at memory location 2352. This is much more readable than the binary equivalent, which is simply a string of ones and zeros. However, it still translates directly to an operation in machine code, which makes it very simple to assemble. This contrasts with the high-level programming languages people are used to today, where the logic being written is far removed from the machine operations being performed and requires a large amount of complex compiling to become readable by a computer.
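
To show how direct this translation is, here is a minimal sketch of an assembler for a made-up machine. The opcode numbers and the three-number instruction layout are hypothetical and do not match any real processor. The point is only that each line maps to one instruction through a simple table lookup.

```python
# Toy assembler for a made-up machine. The opcode numbers are hypothetical
# and do not match any real processor.

OPCODES = {
    "mov": 0x01,  # store a value at a memory address
    "add": 0x02,  # add a value to the contents of a memory address
}


def assemble_line(line):
    """Translate one line such as 'mov [2352], 245' into three numbers."""
    mnemonic, operands = line.split(maxsplit=1)
    destination, value = (part.strip() for part in operands.split(","))
    if not (destination.startswith("[") and destination.endswith("]")):
        raise ValueError(f"expected a memory address in brackets: {destination}")
    address = int(destination[1:-1])
    return [OPCODES[mnemonic], address, int(value)]


print(assemble_line("mov [2352], 245"))  # [1, 2352, 245]
```

There is no analysis of the program as a whole here, which is the contrast with a compiler: the word mov is swapped for its number and the operands are copied across.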

The main advantage that assembly has over machine code, then, is that it is much easier for people to write and read. The key, though, is that there is minimal downside. Assuming an assembler is available, you do not lose any of the control gained from programming directly in machine code, which is what enables you to develop the incredibly tight and efficient programs associated with direct memory manipulation, as discussed in my first post. This means that in any practical sense, assembly is likely to be as low-level as anyone would want to go.

History

Assembly of various forms was historically very quick to follow the development of computers. Even back when people first began using computers, machine code was considered finicky and highly impractical. The question that springs to my mind when considering this is how the first assemblers were created, as surely they must have been written in machine code of some form. It turns out that the earliest assemblers were themselves written in assembly; they were just translated by hand into machine code. Due to the relative simplicity of translating assembly to machine code on a one-to-one basis, hand conversion was common when machine code had to be input directly, through punch cards or similar. This meant that programming time was split between writing the code in assembly and then translating it word for word into binary.

In the modern world of course, assemblers are well developed and commonly used to perform the final translation in a compiler, outputting the machine code to be read by the computer. When a new platform is being brought up, these tools can be built and run on a different, already established operating system or processor from the one their output is intended for. This is known as cross-compiling and enables quick access from high-level languages to machine code on the new platform without having to build up the assembler in machine code from scratch, which often takes many iterations in which the next level of assembler has to be converted by the current assembler until an actually useful language is built up over time.

References
1. Landley R. Introduction to cross-compiling for Linux [Internet]. Landley.net. [cited 6 October 2016]. Available from: https://landley.net/writing/docs/cross-compiling.html
2. Fomin A. Introduction to Assembly Language [Internet]. Swansontec.com. 2001 [cited 6 October 2016]. Available from: http://www.swansontec.com/sprogram.html