How C++ Works: Understanding Compilation


Developers are often vague about what happens behind the scenes when a compiler turns C++ source code into a program. Google “how does a compiler work in C++?” and you'll find a lot of jargon that doesn't clear it up for someone new to the subject.

Here's a simple breakdown of the process. A compiler converts source code (which you've written) into object code that the machine can read. This machine-readable code is in binary form, made up of zeroes and ones. A linker then combines the object code into an executable file that you can run.

In reality, the process is much more complex. Learning the nuts and bolts helps you avoid common errors and bugs that come from not knowing the details of compilation.

If you are having trouble with C++ code you have written, or someone else has written for you, you can always find expert advice on Freelancer.com. Knowledgeable C++ certified pros will find the bugs in your code and clean them up for you. In the meantime, understand how compilation works.

Step One: Preprocessing

When you write a C++ source file, you pull in header files with the #include directive. Header files usually have the extension .h, .hxx, or .hpp, and sometimes no extension at all. The source file itself usually has the extension .cc, .cxx, or .cpp.

In the first step of compilation, the code goes through the preprocessor, a program that handles preprocessor directives: lines that start with #. #define, #include, #if, #else, #endif, and #line are some of the directives the preprocessor understands.

Let's look at #define as an example of what happens next. The # at the start of the line marks it as a directive for the preprocessor, which works on the text of the program before the compiler proper ever sees it. #define tells the preprocessor to create a macro, such as a symbolic constant. It's usually used in the format:

#define macro-name replacement-text

When this line appears in a source file, every later occurrence of macro-name is replaced by replacement-text before compilation. Here's an example:

#include <iostream>

using namespace std;

#define PI 3.14159

int main()
{
    cout << "Value of PI:" << PI << endl;
    return 0;
}

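After preprocessing, the end of the file reads like this. The #include line has been replaced by the full contents of <iostream> (thousands of lines, omitted here), and PI has been replaced by its value:

using namespace std;

int main()
{
    cout << "Value of PI:" << 3.14159 << endl;
    return 0;
}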

You can see the preprocessed source code by passing -E to the g++ compiler, in the following way, where test.p is the name we've given to the preprocessed source file.

$ g++ -E test.cpp > test.p

In this way, the preprocessor runs on each C++ source file. When it comes across #include, it searches for the specified header file and pastes that file's contents in place of the directive.

At this stage, the preprocessor also takes a look at conditional compilation blocks such as #ifdef, #ifndef, #endif, and removes code that won't be needed. These conditional directives let you include or discard a part of a program if a specific condition is met.

Overall, in the preprocessing stage, each source file is temporarily expanded to prepare it for compilation. The expanded file has many more lines than your original source code, and header files account for most of the bulk: the more header files you include, the longer the preprocessed file. By default, g++ -E prints this file on stdout.

The preprocessor also adds markers to the code that tell the compiler which file and line each piece of code came from. This helps the compiler produce error messages that make sense to you.

Step Two: Compilation

The next stage of compilation in C++ is very similar to what happens in C. The compiler takes each output from the preprocessor and turns it into an object file in two steps: it translates the code into assembly, and then an assembler turns that assembly into machine code.

First, it converts the pure C++ code (with no # directives left) into assembly code. Assembly code is a human-readable, text form of the machine instructions the processor will run.

Sometimes it is useful to read this assembly code, which you can get by stopping after this step with g++ -S. This is also the stage at which the compiler optimizes your code, and it often does a better job of it than a human would. Let's look at how compilation works through an example.

#include "print.hpp"

int main(int argc, char* argv[])
{
    printSum(2, 3);
    printSum(2.5f, 3.5f);
    printSumInt(4, 5);
    printSumFloat(4.5f, 5.5f);
    return 0;
}


Compile this code to get the cpp-main.o object file, and use nm to look at the symbols it imports and exports. The output looks something like this:

$ g++ -c cpp-main.cpp
$ nm -C cpp-main.o
0000000000000000 T main
                 U printSumFloat
                 U printSumInt
                 U printSum(float, float)
                 U printSum(int, int)

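Here, you can see that the object file exports one symbol, main, marked T because its code is defined in this file. The four print functions are imported symbols, marked U for undefined: cpp-main.cpp calls them, but their definitions live elsewhere. The -C option tells nm to demangle C++ names, which is why the two printSum overloads show their parameter types, while printSumInt and printSumFloat appear as plain names, which suggests print.hpp declares them extern "C". In short, the compiler has created object code from the source code, and that object code lists the symbols the input defined.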

Note that object files can also refer to symbols that the source code hasn't defined. This happens whenever you call a function that is declared (for example, in a header) but not defined in the same source file. The compiler still produces an object file and leaves the reference for the linker to resolve.

The compiler reports syntax errors, failed overload resolution, and other compile-time errors at this stage.

Step Three: Assembly

Next, the assembler converts the assembly code into machine code. The output of this stage is an object file in a binary format such as ELF, COFF, or a.out. You can stop the build at this point with g++ -c, which is useful because it lets you compile each source file separately.

Because each source file becomes its own object file, changing one file means recompiling only that file; the other object files can be reused as they are. You can also bundle object files into archives called static libraries and link against them later without recompiling their sources.

Step Four: Linking

When you link the object file from Step Two with sum.o and print.o (compiled from source files that aren't shown here) and run the program, you get this result:

$ g++ -o cpp-app sum.o print.o cpp-main.o
$ ./cpp-app
2 + 3 = 5
2.5 + 3.5 = 6
4 + 5 = 9
4.5 + 5.5 = 10

You would not get the result without linking the object files that the assembler produced in the previous stage. It is the job of the linker to produce either a dynamic (or shared) library or an executable file. Let's take a look at each of these outputs.

Shared (dynamic) libraries are different from the static libraries we spoke about before. A static library is an archive of object files; when you link an application against it, the linker copies the object code the application needs into the executable.

Dynamic libraries are modules containing data and functions that other applications load and use at run time, so several programs can share one copy. When building an executable, the linker combines all the object files and resolves every reference to an undefined symbol to that symbol's definition. Each symbol can be defined in another object file or in a library. If a symbol is defined in a library other than the standard library, you need to tell the linker about it, for example with g++'s -l (library name) and -L (search directory) options.

The stage of linking may also produce some errors. These errors are typically related to duplicate or missing definitions. Missing definitions are not only definitions that you didn't write; a definition could also be missing if you haven't given the linker any reference to the library or the object file where it could find the definition. Duplicate definition errors occur when two libraries or object files contain the definition of the same symbol.

These are the stages that compilation takes your code through. There are more complexities in the process that we don't have space for here, but knowing how compilation is done can help you prevent some weird bugs in your code. For instance, understanding preprocessing will help you make good use of header guards. Header guards are snippets of code you can use to protect the header file contents from multiple inclusions.

A header guard uses three preprocessor directives. You place two of them at the beginning of the header file, in this format:

#ifndef MY_HEADER

#define MY_HEADER

These two lines follow each other at the top of the code. At the end of the file, you place the line:

#endif /*MY_HEADER*/

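Here, the unique, user-defined symbol MY_HEADER serves as a marker. The first time the preprocessor reads the header, MY_HEADER is not yet defined, so the #ifndef test is true: the preprocessor defines MY_HEADER and passes the code between the first and last lines of the guard on to the compiler. If the same header is included again in the same source file, MY_HEADER is already defined, the #ifndef test is false, and the preprocessor skips everything up to the #endif, so the contents are compiled only once.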

There are other complexities in a compilation that will be useful to know. Do you think understanding the process of compilation in depth will help you write better code? Let us know in the comments!

 

Published 31 August 2017

Lucy Karinsky

Software Developer

Lucy is the Development & Programming Correspondent for Freelancer.com. She is currently based in Sydney.
