Friday, 4 November 2016

Technical Blog 4 - More Compilers

In this post I am going to pick up directly from my previous post and continue talking about the inner workings of compilers. Specifically, today I will describe each of the phases involved in actually compiling code and how they relate to each other.

Error Handler
It is worth mentioning that much of what a compiler does is check for errors: most of the stages in the process can identify errors of various kinds in the code. When errors do occur, the compiler will, depending on their severity, either continue (for example, after issuing a warning) or stop and report the problem back to the user. The error handler can be thought of as a side process of the compiler. It receives errors from each of the stages and ends the compilation process when an error is serious enough. It is connected to all of the other phases, but during a successful compilation it may not be used at all, apart from reporting any warnings.
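To make the idea of a side process more concrete, here is a minimal Python sketch, assuming a design in which every phase reports problems to one shared handler. The names `Severity`, `Diagnostic` and `ErrorHandler` are hypothetical and not taken from any real compiler. Warnings are recorded and compilation carries on, while a single error means compilation should stop.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    WARNING = 1  # compilation can continue
    ERROR = 2    # compilation cannot produce a program


@dataclass
class Diagnostic:
    phase: str          # which phase reported it, e.g. "lexical" or "semantic"
    severity: Severity
    message: str


@dataclass
class ErrorHandler:
    diagnostics: list[Diagnostic] = field(default_factory=list)

    def report(self, phase: str, severity: Severity, message: str) -> None:
        # Every phase calls this; the handler simply records the problem.
        self.diagnostics.append(Diagnostic(phase, severity, message))

    def should_stop(self) -> bool:
        # Compilation fails if any error (not merely a warning) was reported.
        return any(d.severity is Severity.ERROR for d in self.diagnostics)


handler = ErrorHandler()
handler.report("semantic", Severity.WARNING, "variable 'y' is never used")
print(handler.should_stop())  # False: a warning alone does not stop compilation
handler.report("semantic", Severity.ERROR, "'G' is not declared")
print(handler.should_stop())  # True
```

In a real compiler each phase would check `should_stop()` (or something like it) before handing its output to the next phase.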

Lexical Analysis
This is the first stage of compiling. It reads the source code at its simplest level, as a stream of characters, and groups those characters into meaningful units such as names, numbers, operators and punctuation. White-space is one way of telling where one unit ends and the next begins, but the characters themselves also mark the boundaries. Each unit is then passed to the syntactical analyser as a token. Along the way this process discards white-space (and, in most languages, comments), leaving only the relevant items. For example, in the statement "x=4", 'x', '=' and '4' would each be a separate token, even though there are no spaces between them.
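A small lexer can be sketched in Python with regular expressions. This is only an illustration: the token kinds (`NUMBER`, `IDENT`, `OP`) and the `tokenize` function are hypothetical, and real lexers also handle keywords, string literals, comments and much more.

```python
import re

# Each token kind is paired with a regular expression.
# At each position the alternatives are tried in this order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/;()]"),
    ("SKIP", r"\s+"),      # white-space is recognised but discarded
    ("MISMATCH", r"."),    # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[tuple[str, str]]:
    """Split source text into (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {text!r} at position {match.start()}")
        tokens.append((kind, text))
    return tokens


print(tokenize("x=4"))    # [('IDENT', 'x'), ('OP', '='), ('NUMBER', '4')]
print(tokenize("x = 4"))  # the same tokens: the spaces are dropped
```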

Syntactical Analysis
This stage takes in the tokens from the lexical analysis and checks the grammar of the statements against a predefined grammar for the language being compiled. It checks the validity of statements as a whole, assuming the individual tokens have already been validated by the lexical analysis. In addition to checking for grammatical errors, this stage outputs a parse tree, which is then used by the semantic analyser. The parse tree is built from the tokens provided by the lexical phase, arranged according to the production rules that make up the language's grammar.
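The sketch below is a hypothetical recursive-descent parser in Python for a toy grammar of assignments. It assumes the lexer has already produced `(kind, text)` token pairs, and it builds a simplified tree out of nested tuples (closer to what is often called an abstract syntax tree than a full parse tree). Each `parse_...` method corresponds to one production rule.

```python
# Hypothetical toy grammar (production rules):
#   assignment -> IDENT "=" expr
#   expr       -> operand (("+" | "-") operand)*
#   operand    -> IDENT | NUMBER


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) pairs from the lexer
        self.pos = 0

    def peek(self):
        # Return the current token, or (None, None) at the end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def expect(self, kind, text=None):
        tok_kind, tok_text = self.peek()
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_text!r}")
        self.pos += 1
        return tok_text

    def parse_assignment(self):
        target = self.expect("IDENT")
        self.expect("OP", "=")
        value = self.parse_expr()
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return ("assign", target, value)

    def parse_expr(self):
        node = self.parse_operand()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.expect("OP")
            node = (op, node, self.parse_operand())  # builds left-associative trees
        return node

    def parse_operand(self):
        kind, text = self.peek()
        if kind in ("IDENT", "NUMBER"):
            self.pos += 1
            return (kind, text)
        raise SyntaxError(f"expected a name or number, got {text!r}")


tokens = [("IDENT", "x"), ("OP", "="), ("NUMBER", "4"), ("OP", "+"), ("IDENT", "y")]
print(Parser(tokens).parse_assignment())
# ('assign', 'x', ('+', ('NUMBER', '4'), ('IDENT', 'y')))
```

A statement such as "= 4" consists of perfectly valid tokens, but it would be rejected here with a `SyntaxError` because it does not match the `assignment` rule.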

Semantic Analysis
Semantic analysis is the phase where the meaning of the code begins to be evaluated. The previous phases were concerned only with the structure of the code, not with what it does. Semantic analysis, by contrast, checks things like whether assignments are between compatible types and whether identifiers are being used appropriately. For example, the statement "int x = G" would not produce any errors in lexical or syntactical analysis, since it is a well-formed assignment. Semantic analysis, however, would flag it as an error if G has not been declared, or if G holds something that is not an integer. Depending on the language, this phase also checks that variables have been properly declared before they are used.
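A semantic check can be sketched with a symbol table that records each variable's declared type. In this hypothetical Python example, a declaration is a `(type, name, initial value)` triple, a `Name` marks a reference to another variable, and only `int` and `string` literals are supported. It flags both of the cases described above for "int x = G".

```python
from dataclasses import dataclass


@dataclass
class Name:
    """A reference to a variable, such as the G in "int x = G"."""
    ident: str


# Types of the literal values this toy checker understands.
LITERAL_TYPES = {int: "int", str: "string"}


def check(declarations):
    """Return a list of semantic error messages for (type, name, value) declarations."""
    symbols = {}  # symbol table: variable name -> declared type
    errors = []
    for declared_type, name, value in declarations:
        if isinstance(value, Name) and value.ident not in symbols:
            errors.append(f"'{value.ident}' is used before it is declared")
        else:
            actual = symbols[value.ident] if isinstance(value, Name) else LITERAL_TYPES[type(value)]
            if actual != declared_type:
                errors.append(f"cannot assign {actual} to {declared_type} variable '{name}'")
        symbols[name] = declared_type
    return errors


program = [
    ("int", "x", Name("G")),   # int x = G;          G has not been declared yet
    ("string", "G", "hello"),  # string G = "hello";
    ("int", "y", Name("G")),   # int y = G;          G is a string, not an int
    ("int", "z", 4),           # int z = 4;          fine
]
for message in check(program):
    print(message)
# 'G' is used before it is declared
# cannot assign string to int variable 'y'
```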

Code Optimization
This is the first stage in which errors are no longer really checked for, as it is assumed that most of them have been picked up by this point. Instead, the compiler begins converting the code into an intermediate representation that is closer to machine code but not yet tied to any particular machine. This intermediate code is then analysed and optimised: unnecessary duplication, such as repeated calculations, is removed, and memory usage is reduced where possible. The aim is a program that runs faster and uses less memory. Because this stage leaves the most room for improvement, it is one of the main differences between compilers for the same language. Compilers frequently let the user choose an optimisation level. Higher levels can considerably increase compilation time, so lower levels are often used while developing and debugging, where the output also stays closer to the original source.
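As one illustration of this stage, here is a Python sketch of two classic optimisations over a hypothetical three-address code, where each instruction is `(destination, operator, left, right)`. Constant folding computes operations on known values at compile time, and common-subexpression elimination reuses an earlier identical calculation instead of repeating it. The sketch assumes straight-line code in which each variable is assigned only once; real optimisers must also deal with reassignment and control flow.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def optimise(code):
    """Constant folding/propagation and common-subexpression elimination
    over straight-line, single-assignment three-address code."""
    constants = {}  # variable -> constant value known at compile time
    available = {}  # (op, left, right) -> variable already holding that result
    result = []
    for dest, op, left, right in code:
        # Propagate: replace variables whose value is already known.
        left = constants.get(left, left)
        right = constants.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            # Fold: both operands are known, so compute the value now.
            constants[dest] = OPS[op](left, right)
            result.append((dest, "=", constants[dest], None))
            continue
        key = (op, left, right)
        if key in available:
            # Reuse the earlier identical computation as a simple copy.
            result.append((dest, "=", available[key], None))
        else:
            available[key] = dest
            result.append((dest, op, left, right))
    return result


code = [
    ("a", "+", 2, 3),      # a = 2 + 3
    ("b", "*", "a", "x"),  # b = a * x
    ("c", "*", "a", "x"),  # c = a * x  (same calculation as b)
    ("d", "+", "a", 1),    # d = a + 1
]
for instruction in optimise(code):
    print(instruction)
# ('a', '=', 5, None)
# ('b', '*', 5, 'x')
# ('c', '=', 'b', None)
# ('d', '=', 6, None)
```

Since no optimised instruction reads `a` any more, a dead-code elimination pass could also delete it, provided nothing after this code needs it.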

Machine Code Generation
This stage finally generates actual machine code from the optimised intermediate code. Frequently the output is assembly code, which can then be quickly assembled into machine code, but either way the result is usable. At this point the code targets a specific instruction set but is not yet tuned for the particular processor it will run on. A further, machine-dependent optimisation pass can then adapt it, for example by making better use of that processor's registers and instructions, so that the program runs as fast as possible on the machine being used. The generated code is also typically relocatable, meaning it can be loaded at any address in memory and linked with other compiled code rather than being tied to one fixed location.
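Code generation can be sketched as a translation from three-address tuples like `(destination, operator, left, right)` into assembly for a hypothetical machine with a single register `R1` and made-up `LOAD`/`ADD`/`SUB`/`MUL`/`STORE` instructions, where `#` marks a constant. This Python sketch is deliberately naive, which shows why a later machine-dependent pass is useful.

```python
# Hypothetical opcodes for a made-up single-register machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def operand(value):
    # Constants become immediates ("#5"); anything else names a memory variable.
    return f"#{value}" if isinstance(value, int) else value


def generate(code):
    """Translate (dest, op, left, right) instructions into assembly text."""
    asm = []
    for dest, op, left, right in code:
        asm.append(f"LOAD  R1, {operand(left)}")
        if op != "=":  # "=" is a plain copy, so there is no arithmetic to emit
            asm.append(f"{OPCODES[op]:<5} R1, {operand(right)}")
        asm.append(f"STORE R1, {dest}")
    return asm


for line in generate([("a", "=", 5, None), ("b", "*", "a", "x")]):
    print(line)
# LOAD  R1, #5
# STORE R1, a
# LOAD  R1, a
# MUL   R1, x
# STORE R1, b
```

The output reloads `a` into `R1` immediately after storing it. A machine-specific optimisation pass could notice that the value is already in the register and remove the redundant `LOAD`.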

So, ultimately, compilers are highly complex pieces of software with many different layers and phases, and compilers for the same language can vary considerably in quality. What this really shows is how much high-level languages have improved the way humans interact with computers, compared with the early days of writing assembly and machine code by hand.
