In this post I am going to continue directly from my previous post and
talk further about the inner workings of compilers. Specifically, today I
will describe each of the phases of the actual compilation of the code and
how they relate to each other.
Error Handler
It is worth mentioning that much of what a compiler
ends up doing is checking for errors within the code. Most of the stages in the
process are able to identify errors of various kinds. When errors do occur,
depending on their severity the compiler will either continue or will fail and
report back to the user. The error handler can be thought of as a side process
of the compiler which receives errors from the various stages and, where
necessary, ends the compilation process. This phase is connected to all the
other phases and is not necessarily used at all during a successful
compilation.
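To make the idea of a shared error handler a little more concrete, here is a minimal sketch in Python. The names Diagnostic, ErrorHandler, report and should_abort are my own inventions for illustration, and the rule that any "error" stops compilation while a "warning" does not is an assumption rather than how any particular compiler behaves.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnostic:
    phase: str     # which phase reported the problem, e.g. "lexical"
    severity: str  # "warning" or "error" (hypothetical severity levels)
    message: str


@dataclass
class ErrorHandler:
    diagnostics: list = field(default_factory=list)

    def report(self, phase, severity, message):
        # Any phase can call this; the handler just collects what it is told.
        self.diagnostics.append(Diagnostic(phase, severity, message))

    def should_abort(self):
        # Assumption: warnings let compilation continue, errors end it.
        return any(d.severity == "error" for d in self.diagnostics)


handler = ErrorHandler()
handler.report("lexical", "warning", "unusual character in comment")
continue_after_warning = not handler.should_abort()  # True
handler.report("semantic", "error", "G is not an integer")
abort_after_error = handler.should_abort()  # True
```

In a successful compilation nothing ever calls report, which matches the point above that this phase may not be used at all.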
Lexical Analysis
This is the first stage of compiling. It involves
evaluating the source code at its simplest level: it goes through the code and
divides statements up based on the presence of white-space or some other
divider, such as an operator. It then outputs these pieces as tokens to the
syntactical analyser. This process effectively removes spaces from the code and
separates out each relevant item. For example, in the statement "x=4", 'x', '='
and '4' would each be a separate token.
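As a rough illustration of the "x=4" example, the sketch below splits a statement into tokens using Python's re module. The token names (NUMBER, IDENT, OP) and the tiny set of operators are assumptions I have made for the example; a real language would have many more token types.

```python
import re

# Hypothetical token types for a tiny language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/]"),
    ("SKIP", r"\s+"),     # white-space separates tokens but is not one itself
    ("MISMATCH", r"."),   # anything else is a lexical error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) tokens for the given source string."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        tokens.append((kind, text))
    return tokens


print(tokenize("x=4"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '4')]
```

Note that "x = 4" and "x=4" produce exactly the same tokens, which is what is meant by the lexer removing spaces from the code.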
Syntactical Analysis
This stage of the compiling process takes in the
tokens from the lexical analysis and checks the structure of the statements
against a predefined grammar for the language being compiled. It checks the
validity of statements as a whole, assuming the tokens themselves have been
validated by the lexical analysis. In addition to checking for grammatical
errors, this stage outputs a parse tree which can then be used by the semantic
analyser. The parse tree is made up of all of the tokens that were provided by
the lexical phase, positioned according to the grammar of the language and its
production rules.
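Here is a small sketch of how tokens might be checked against a grammar and turned into a parse tree. The grammar is one I made up for the example (assignment := IDENT '=' expr, expr := term ('+' term)*, term := NUMBER | IDENT), the tokens are assumed to be (kind, text) pairs, and the tree is represented as nested tuples. It is a simple recursive descent parser, which is only one of several ways this stage can be written.

```python
def parse_assignment(tokens):
    """Parse tokens for: assignment := IDENT '=' expr. Returns a nested tuple."""
    pos = 0

    def expect(kind, text=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError(f"expected {text or kind} but reached end of input")
        tok_kind, tok_text = tokens[pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, found {tok_text!r}")
        pos += 1
        return tok_text

    def parse_term():
        # term := NUMBER | IDENT
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("NUMBER", "IDENT"):
            kind, text = tokens[pos]
            pos += 1
            return (kind.lower(), text)
        found = tokens[pos][1] if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected a number or identifier, found {found!r}")

    def parse_expr():
        # expr := term ('+' term)*
        nonlocal pos
        node = parse_term()
        while pos < len(tokens) and tokens[pos] == ("OP", "+"):
            pos += 1
            node = ("add", node, parse_term())
        return node

    target = expect("IDENT")
    expect("OP", "=")
    value = parse_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return ("assign", ("ident", target), value)


tree = parse_assignment([("IDENT", "x"), ("OP", "="), ("NUMBER", "4")])
print(tree)
# ('assign', ('ident', 'x'), ('number', '4'))
```

A statement such as "x = = 4" is made of perfectly valid tokens, but it does not fit the grammar, so this function raises a SyntaxError for it.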
Semantic Analysis
Semantic analysis is the phase where the actual
usability of the code begins to be evaluated. Whereas previous phases were not
concerned with the meaning of the code, merely the structure of it, semantic
analysis checks things like appropriate variable assignments and usage of
reserved identifiers. For example, the statement "int x = G" would not
produce any errors in lexical or syntactical analysis, but would be picked out
as an error by semantic analysis if G is not an integer. This phase also checks
that variables are properly defined before use, if the language requires it.
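The "int x = G" case can be sketched with a very small type check against a symbol table. Everything here is an assumption for the sake of the example: the function name check_declaration, the use of a plain dictionary as the symbol table, and the idea that G was declared earlier as a string.

```python
def check_declaration(declared_type, name, value, symbols):
    """Return a list of semantic errors for `declared_type name = value`.

    `value` is a (kind, text) pair such as ("number", "4") or ("ident", "G");
    `symbols` maps already-defined variable names to their types.
    """
    errors = []
    kind, text = value
    if kind == "number":
        value_type = "int"
    elif text in symbols:
        value_type = symbols[text]
    else:
        value_type = None
        errors.append(f"'{text}' is used before it is defined")
    if value_type is not None and value_type != declared_type:
        errors.append(f"cannot assign {value_type} '{text}' to {declared_type} '{name}'")
    if not errors:
        symbols[name] = declared_type  # record the new variable for later checks
    return errors


symbols = {"G": "string"}  # assumption: G was declared earlier as a string
print(check_declaration("int", "x", ("ident", "G"), symbols))
# ["cannot assign string 'G' to int 'x'"]
```

The statement is grammatically fine, so only this stage has enough information (the types in the symbol table) to reject it.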
Code Optimization
This stage is the first where errors are not really
checked for anymore. It is assumed that most errors have already been picked up
by this point. Instead it begins the process of actually converting the code
into a more machine-friendly language. This intermediary code is then evaluated
for efficiency and optimised. That is to say, the compiler removes unnecessary
duplication and optimises memory usage where possible. This stage is one of the
main differences between various compilers for the same language, as it
contains the most room for actual improvement. This stage should help to speed
up the program's execution and reduce memory usage. Frequently compilers will
allow the user to choose a level of optimisation for their code. The level
chosen can drastically affect compilation time, so a lower level can be useful
for debugging and similar.
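One simple optimisation that is easy to show is constant folding, where the compiler works out constant arithmetic ahead of time so the program does not have to. The sketch below assumes the intermediary code is a tree of nested tuples such as ("add", left, right), ("number", "4") and ("ident", "y"); that representation is my own choice for the example, and real compilers apply many more optimisations than this one.

```python
def fold_constants(node):
    """Replace additions of two known numbers with their result."""
    if node[0] == "add":
        left = fold_constants(node[1])
        right = fold_constants(node[2])
        if left[0] == "number" and right[0] == "number":
            # Both sides are known at compile time, so do the sum now.
            return ("number", str(int(left[1]) + int(right[1])))
        return ("add", left, right)
    return node  # numbers and identifiers are left unchanged


expression = ("add", ("ident", "y"), ("add", ("number", "3"), ("number", "4")))
print(fold_constants(expression))
# ('add', ('ident', 'y'), ('number', '7'))
```

The optimised tree describes the same calculation, but the addition of 3 and 4 now happens once at compile time rather than every time the program runs.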
Machine Code Generation
This stage finally generates some actual machine
code from the intermediary code used for optimisation. Frequently the output
will actually be assembly code, which can then be rapidly assembled into
machine code, but either way it is still usable. The code is often still
generic at this stage, that is to say it is designed for a specific machine
code language but not optimised for use on the specific machine. It is only
after this code has been created that it is optimised for the machine that is
currently being used. This enables the code to run as fast as possible whilst
also keeping a relocatable form which can allow it to run on similar but not
identical processors.
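To give a feel for what generating assembly-like code from a tree looks like, here is a minimal sketch that targets a made-up stack machine. The instruction names (PUSH, LOAD, ADD, STORE) are hypothetical and do not belong to any real processor, and the tree of nested tuples is again an assumed representation.

```python
def generate(node):
    """Return a list of instructions for a hypothetical stack machine."""
    kind = node[0]
    if kind == "number":
        return [f"PUSH {node[1]}"]   # put a constant on the stack
    if kind == "ident":
        return [f"LOAD {node[1]}"]   # put a variable's value on the stack
    if kind == "add":
        # Evaluate both operands, then add the top two stack values.
        return generate(node[1]) + generate(node[2]) + ["ADD"]
    if kind == "assign":
        # Evaluate the right-hand side, then store it in the target variable.
        return generate(node[2]) + [f"STORE {node[1][1]}"]
    raise ValueError(f"unknown node kind {kind!r}")


tree = ("assign", ("ident", "x"), ("add", ("number", "4"), ("ident", "y")))
print(generate(tree))
# ['PUSH 4', 'LOAD y', 'ADD', 'STORE x']
```

A real code generator would also have to deal with registers, memory layout and the details of the target instruction set, none of which are attempted here.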
So ultimately it is clear that compilers are highly
complex pieces of software with many different layers and phases. There is also
the potential for variation in quality between compilers for the same language.
What this actually shows is how remarkable high-level languages are in their
ability to enhance human interaction with computers, relative to the original
interaction which involved hand-written assembly and machine code.
References
1. Bolton D. What is a Compiler? [Internet]. About.com Tech. [cited 2 November 2016]. Available from: http://cplus.about.com/od/introductiontoprogramming/p/compiler.htm
2. Compiler Design Tutorial [Internet]. www.tutorialspoint.com. [cited 2 November 2016]. Available from: https://www.tutorialspoint.com/compiler_design/index.htm