Cpp tutorial
What is the aim of this tutorial?
The aim of this tutorial is to provide you with an introduction to the C++ programming language. I assume that you have some background in programming, so I will skip topics like boolean logic. I'll also assume that you are not coming from a computer science background. We'll skip topics such as references and pointers, which may be the subject of a future workshop. Instead, we will focus on the core components involved in writing a C++ program.
By the end of this tutorial, you will be able to do the following:
- Understand data types: You will understand some of the different data types in C++ and when you might want to use one over the other.
- Understand the structure of C++ code: You'll understand how to structure your code and learn some of the best practices.
- Understand loops and statements: You'll understand how for and while loops work. We'll also touch on do while loops. We'll also learn how to control flow using if-else if-else statements.
- Use arrays and vectors: We'll look at how to create arrays and vectors. We'll learn about some best practices for memory safety.
- Functions: You'll understand how to write reusable code using functions. We'll learn the difference between a value and a reference when dealing with functions.
- Parsing files and strings: You'll understand how to print to the command line and to files. You'll learn how to read text files, parse them, and extract useful information.
How to run this tutorial
The code used in this tutorial is available on GitHub. I strongly recommend checking out the exercise branch and trying to work along with the notes here. Try to answer the problems before reading the solutions. If you get stuck, you can always check out the solutions branch.
To run this tutorial, you'll need g++ or clang++ installed. These can be installed through the gcc or clang packages. Instructions on how to install these can be found here. Alternatively, you can compile all the examples using the GCC Docker image. You can pull the latest Docker image using:
docker pull gcc
Simplifying Docker Setup
To save time and streamline your Docker setup, consider pre-downloading the necessary image. Once you have the image, you can easily create a container to compile and execute your code. Here's how to do it:
-
Pull the Docker Image:
Use the following command to pull the gcc image:
docker pull gcc
This command downloads the image and stores it locally on your system.
-
Create a Docker Container:
Now, you can create a Docker container for compiling and running your code. Use the following command:
docker run -it --rm -v $(pwd):/data -w /data gcc bash
Let's break down this command:
- docker run initiates a new container.
- -it specifies that you want an interactive shell.
- --rm automatically removes the container when you exit.
- -v $(pwd):/data mounts the current directory into the container at the location /data.
- -w /data sets the working directory to /data within the container.
- gcc bash selects the gcc image and starts bash inside it.
What is C++
C++ is one of the most influential programming languages today, known for its exceptional performance. It empowers developers to create memory-efficient code that often outperforms native Python in terms of speed. Whether you are delving into the world of IoT or tackling high-performance computing, C++ remains a top choice for projects that prioritize efficient memory utilization and speedy execution.
Key Differences from Python
Coming from a language like Python, you will immediately notice several differences when working with C++:
-
Compiled Language: Unlike Python, which can be executed through an interpreter, C++ is a "compiled" language. This means that before running C++ code, it must be compiled. Compilation takes code that is relatively readable to humans and transforms it into low-level machine code. One of the benefits of a compiled language is that the compiled program can be highly optimized at compile time, resulting in high-performing code.
-
Statically-Typed Language: Unlike Python, where variable types are determined at runtime, C++ requires knowing the type of a variable at compile time. Python is a "dynamically-typed" language, allowing you to work with variables without specifying their type in advance. In C++, specifying the type during compilation may seem limiting, but it enables the compiler to optimize the executable for better performance.
-
Manual Memory Management: In Python, memory is reclaimed automatically. The interpreter tracks which objects are still in use and frees those that are not (CPython combines reference counting with a cyclic "garbage collector"). While this automated memory management is convenient, the bookkeeping adds runtime overhead. In C++, when a variable goes out of scope, its "destructor" is called. The destructor is a function that handles cleanup. However, developers need to be mindful because they can also allocate memory dynamically using new or malloc. Such allocated memory must be explicitly released using delete or free, respectively, to avoid memory leaks.
-
Limited Memory Safety: In Python, attempting to access an element beyond the boundaries of an array, such as the 11th element in a 10-element array, results in an error. In C++, however, such boundary violations allow access to memory that should not be touched. This leads to completely undefined behavior.
-
Thread-Friendly: CPython imposes the "Global Interpreter Lock" (GIL), which allows only one thread to execute Python code at a time, regardless of the available cores. This protects the interpreter's internal state from data races. In C++, there is no such constraint. Developers can create many threads, and the code can use the available CPU cores to run them truly concurrently. Avoiding data races is the developer's responsibility, using tools such as mutexes, atomic types, or barriers (for more details, see Introduction to Parallel Programming in C++ with OpenMP).
These differences reflect the unique strengths and characteristics of C++, making it a powerful and versatile programming language.
Why use C++?
C++ allows us to write fast and memory-efficient code that outpaces Python on most performance metrics. Because it is a compiled language, we can pass pre-compiled binaries to end users, providing highly optimized executables ready for use. The low footprint makes C++ ideal for devices such as microcontrollers (e.g. Arduinos), FPGAs and any memory-limited devices.
C++ is very common in scientific programming. Complex, memory-intensive code (such as scientific simulations) often requires huge resources. C++ is well suited to these requirements, with code that scales well.
The Anatomy of a C++ program
Hello World
Like all good tutorials, we start with the basic "Hello World" example:
Listing: hello_world.cpp
Here we have an example code that prints "Hello World" to the terminal. Let's talk through this line-by-line.
-
#include <iostream>
Here we are "including" an external library in our code. We're using "iostream", a library that provides functions relating to input and output. iostream is part of the C++ standard library, a vast set of code that we can build upon.
-
int main() {...}
Here we are defining our main function. Every executable must have a main function. This tells the compiler that this is the entry point to the program.
-
std::cout << "Hello World" << std::endl;
Here we are using std::cout to print a message to the screen. We are specifying the standard namespace std. We pass the string "Hello World" to std::cout using <<. We then pass an additional argument, std::endl, which writes a newline to cout and flushes the output, terminating the line. Finally, we end the statement with ;.
-
return 0;
Here we are ending our main function by returning 0. A program returns an exit code when it terminates. Exit codes tell the user how the program finished. We could return any number we want, but by convention an exit code of 0 means a successful termination.
We can compile this code using either g++ or clang++:
g++ hello_world.cpp -o hello_world
or
clang++ hello_world.cpp -o hello_world
Here the first argument, hello_world.cpp, is the source code we want to compile. We specify the target binary with the -o flag, naming the output executable hello_world. We can run our code using:
./hello_world
Data Types
In C++, understanding different data types is crucial due to its static typing nature. While the actual size of these types may vary depending on the compiler and system, here are common data types:
- int: Used for integer numbers (e.g., -1, 0, 23). Typically, it occupies 4 bytes or 32 bits.
- bool: Represents Boolean values, either true or false, allowing for Boolean arithmetic.
- float: Used for non-integer numbers (e.g., -0.2, 43.4, 12.0), typically stored in 32 bits (single precision).
- double: Similar to float but typically stored in 64 bits, giving roughly double the precision.
- char: Represents a single character or small integer value and typically occupies 8 bits (1 byte) of memory.
Additional data types include:
- unsigned int: Integer numbers that cannot be negative, which roughly doubles the maximum possible value compared with int.
- short int: Signed integers that are typically half the size of int (2 bytes or 16 bits).
- long: Allows storage of larger numbers than standard int on many systems, typically occupying 8 bytes or 64 bits on 64-bit Linux and macOS (but 4 bytes on Windows).
Beyond these basic types, you will also encounter during this tutorial:
- string: A sequence of characters (std::string), used to represent text or character data.
- fstream: Used for file input and output operations, enabling reading from and writing to files.
Listing: data_types.cpp
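On line 3 we are including the limits library. We are using the std::numeric_limits<T>::min() and std::numeric_limits<T>::max() methods to get the minimum and maximum value of a data type T. Don't worry too much about the syntax here; the angle brackets indicate that we're using a template, allowing us to apply these functions to different data types. Note that for floating-point types min() returns the smallest positive normalized value rather than the most negative one, which is available as std::numeric_limits<T>::lowest().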
When we compile this and look at the output we see:
integer a = 42
Minimum value for int: -2147483648
Maximum value for int: 2147483647
float b = 3.1415927410125732422
double c = 3.141592653589793116
Full 3.14159265358979323846
Minimum value for float: 1.175494350822287508e-38
Maximum value for float: 3.4028234663852885981e+38
Minimum value for double: 2.2250738585072013831e-308
Maximum value for double: 1.7976931348623157081e+308
We can determine the minimum and maximum values for int, float, and double. It's evident that when using a float instead of a double, precision is reduced. In the example above, the value of \(\pi\) is accurate to the 6th decimal place for float, but extends to the 15th decimal place for double. This difference becomes significant when a high degree of precision is required.
Namespaces
In C++, namespaces serve as a way to organize and group related code elements, including variables, functions, and classes, into distinct logical scopes. This organization helps prevent naming conflicts and enhances the modularity of your code.
In our examples, we will primarily rely on the "standard" C++ library. It's beneficial to specify which namespace we are using to save us from having to write it out every time.
Consider the hello_world.cpp example, where we can streamline the code by adding using namespace std; near the top of the file. By employing using namespace std;, we eliminate the need to specify that we are using cout from the std namespace. The compiler will automatically look in the std namespace.
While the examples we use here may not involve multiple namespaces, it is generally considered good practice to be explicit and specify the namespace of the elements you are using. This helps avoid issues when working with libraries that may contain classes or functions sharing the same names.
Scopes
In C++ we can make use of multiple "scopes". You can think of scopes as blocks of code. Within these blocks we can define variables, perform operations, enter nested scopes, etc. In general, any objects created within a scope will have their destructor called when we exit that scope (with the exception of objects created with new or malloc). Objects native to a scope cannot be accessed from outside that scope, but objects within a scope can be accessed from within a nested scope. We can define a scope in C++ by wrapping a body of code between curly brackets.
Listing: scopes.cpp
On line 3 we define a new scope associated with the main function; this scope ends on line 40. We define integers a and b in the main scope. We create a new scope on line 13, which spans until line 28. This scope is "nested" within the main scope. Within this scope we define the integer a on line 19. What we are doing here is known as "shadowing": we are reusing the name for a new variable within this scope. When we exit the scope, the name a refers to the outer variable again, with its original value; this is evident on line 30. On line 22 we define an integer c within the scope. This variable is local to the nested scope and inaccessible from the parent scope (the main scope). When we exit the nested scope, the variable is destroyed. If we uncommented line 36 we would get an error.
Scopes are general blocks of code, we can modify the behaviour of the scope using loops and control access to the scope using statements. We'll learn about these in the next section.
Flow Control
In this section we'll look at how to control the flow of our code using for, while and do while loops, and how to control access to scopes using if, else if and else statements. We won't touch goto and neither should you. We may briefly discuss switch.
For loops
for loops are used when we have a loop with a known number of iterations or a regular pattern we want to iterate over. In C++ they are defined as follows:
for (initialization; termination; iteration) {
    // block of code to be run
}
The for loop has 3 fields, each separated by a ;. You can think of them as the following:
- initialization: What we want to initialize at the start of the loop. This could be variables that we want to modify within the loop, for example int i = 0. We can also initialize multiple variables of the same type, for example int i = 0, j = 5. In this case both i and j are defined within the scope of the for loop and are not visible outside it.
- termination: The condition that is checked before each iteration. It must be an expression that can be evaluated as a boolean. For example, to keep looping while i is below a certain value: i < 10. While i < 10 this is true; when i == 10 it becomes false, so we exit.
- iteration: This field allows us to change something at the end of each iteration of the loop. For example, if we want to increase the value of i by 1 we could use i++.
Putting these together, we can write a for loop that counts from 0 to 9 using for (int i = 0; i < 10; i++). When i = 0 we pass through the loop. When we reach the end of the loop body, i++ will run and we will restart at the beginning of the loop with i = 1. The loop will check if 1 < 10, find true and continue running the loop. This will continue until i = 9. When we finish this iteration of the loop we will apply i++, so now i = 10. When we try to re-enter the loop, 10 < 10 returns false and we exit the loop.
You can create an infinite loop by not passing a termination condition.
// This code will run forever!
for (int i = 0; ; i++){
// block of code
}
Creating a for loop using a range
Since C++11 (the 2011 update of C++), one can loop over ranges and arrays using the format for (type element : container), where element takes each value in container in turn.
While loops
In C++, while loops iterate over a block of code as long as a specified condition is true. The basic format of a while loop is while (condition) followed by the block of code in curly brackets. Any condition that does not evaluate to false or 0 is considered true. For instance, while (1) and while (1 < 10) are equivalent to while (true). Even conditions like while ("apple") are treated as true.
For example, a while loop that counts from 0 to 9 sets int i = 0 before the loop, uses the condition i < 10, and increments i inside the loop body.
You can also create infinite loops by providing a condition that is always true, such as while (true). Such a loop will continue indefinitely until manually terminated, often by sending an interrupt signal to the program (e.g., Ctrl + C).
To include a condition within the while loop and control when it should exit, you can use the break statement, as in the while_with_break.cpp example. That loop starts with i = 0, prints the value of i, increments i by one (i++), and checks if i is greater than 9. After 10 iterations (0 through 9), the i > 9 condition becomes true (10 > 9), triggering the break statement, which exits the loop.
There is another type of loop known as a do while loop. This is very similar to a while loop, except the condition is checked at the end of the scope rather than at the start, so the body always runs at least once.
Controlling flow with if-else if-else
You can use if, else if, and else statements to control the flow of your program, directing it to different branches or code scopes based on conditions. The general format is if (condition1) { ... } else if (condition2) { ... } else { ... }.
Here the program checks conditions one by one. If condition1 is true, it enters the scope of the first if. If it's false, it checks condition2, and so on. If none of the conditions are true, it enters the else block, which serves as the default code. Keep in mind that you always need to start with an if statement, but you can omit else if or else as needed. In the while_with_break.cpp example, only an if statement was used.
The order of the else if statements is also important. If condition2 and condition3 are both true, we will only enter the first matching block, the one corresponding to condition2.
A more detailed example combines these statements with a loop. In this code, the % operator is used for the modulo operation, which returns the remainder of a division operation. For non-negative i, i % 2 will return 0 if the number is even or 1 if the number is odd. We increment the value of i by one (line 13) and use a second set of if statements (lines 15-21). The first if provides a break if i > 10, while the else if allows us to skip to the start of the next iteration using a continue command if i is 3. The else block prints a message if neither of these conditions is true.
This code will give us the following output:
0 is even!
Back to the start!
1 is odd!
Back to the start!
2 is even!
3 is odd!
Back to the start!
4 is even!
Back to the start!
5 is odd!
Back to the start!
6 is even!
Back to the start!
7 is odd!
Back to the start!
8 is even!
Back to the start!
9 is odd!
Back to the start!
10 is even!
When using if-else if-else we can have any number of else if conditions, but only one if and at most one else.
Controlling Flow with a Switch
We can also use switch statements to control flow. They are particularly effective for handling errors and parsing parameters when the potential outcomes are well-known.
A switch statement has the general form switch (value) { case value_1: ... break; case value_2: ... break; default: ... }.
Here, value is the parameter we want to test within the switch. When using a switch, we test specific cases (case). For example, in the case value_1 branch, we are testing if value == value_1. If this is true, then the block of code within that branch will be executed. Notice that a break statement is used at the end of each block of code. This prevents the following branches from being executed by breaking out of the switch construct; without it, execution "falls through" into the next case.
We can have any number of case statements within the switch.
You'll notice that there is a special label called default. This block executes when none of the case values match (or when execution falls through into it). It specifies the "default" behavior of the code.
For example, we can write a similar block to our if statement example. Here we are using two switch statements. The first, on line 5, takes i % 2, which will be 0 if i is even and 1 if i is odd. The second, starting on line 16, takes the argument i. Here we're checking the actual value of i. On lines 17 and 23, we are checking if i lies within a range using the lower ... higher syntax. Note that case ranges are a compiler extension supported by GCC and Clang rather than part of standard C++, and that the spaces are required: this is explicitly 0 ... 3, not 0...3. On line 28, we are using a return to exit not just the switch but also the while loop. This is because running a break from within the switch would only break from the switch and not the outer while loop. This would output the following:
0 is even!
1 is between 0 and 3
1 is odd!
2 is between 0 and 3
2 is even!
3 is between 0 and 3
3 is odd!
4 is exactly 4
4 is even!
5 is between 5 and 7
5 is odd!
6 is between 5 and 7
6 is even!
7 is between 5 and 7
7 is odd!
8 greter than 7
Let's consider a more useful example. Let's say we have a device that we are controlling with C++. The device can have one of three states: on, off, or standby. We can define an enum to handle these three options, for example enum Status { On, Off, Standby };.
An enum is very useful when considering a fixed number of possible options. The option to use a human-readable enum can also help with debugging by providing a human-readable status (e.g., Status::Standby) instead of a bare code that might not make sense without the correct context (e.g., 2).
Let's now define a function, get_status, to get a random status. This will emulate a device that we want to interface with. The function returns a Status type (which we defined before as an enum). On lines 2 and 3, we initialize a random number generator and get a random number between 1 and 3. We then use this random number to return a status. When the random number is 1, we return Status::On. When the random number is 2, we return Status::Off. The default case here is Status::Standby.
We can imagine that the get_status function is actually part of the API for some device. Let's say it is a readout device; when the status is On, it is powered on. When the status is Off, either the device is off or it cannot be reached. When the status is Standby, it is awaiting instructions.
Let's assume we send a power-on command with some function activate_device. For now, let's take:
// sleep() is declared in <unistd.h> on POSIX systems
void activate_device(){
    sleep(1);
}
In our workflow, it would be important to wait until the device has completely powered on and is in Standby mode before we send instructions. We could do this by checking the status in a loop.
On line 3, we send the power-on command to our device. On line 5, we define a Status object called stat. On line 6, we use a while loop to repeatedly check the status of the device. On line 8, we retrieve the status of the device. On line 10, we use a switch statement to match stat against the possible values of the enum. If the device is either Status::On or Status::Off, we sleep for 1 second and then restart the loop. If the status is Status::Off, we also resend the power-on command with the activate_device function. Finally, if the status is Status::Standby, we can continue with our program.
Here is an example output from such a program:
Device is On
Device is Off
Device is Off
Device is On
Device is On
Device is Off
Device is On
Device is in Standby
Storing Data
Arrays
We can define arrays of a data type in C++ by giving the element type, a name, and the number of elements in square brackets, for example int my_array[5];.
If we do not know the size of an array at compile time, we may dynamically allocate memory for the array. Here we create my_array using the new keyword, as in int *my_array = new int[n];. When we do this we are dynamically allocating memory on the heap. The size of the region is going to be enough to hold n integers. The int *my_array syntax is important in C++. We have defined a "pointer" to the first element of an array of integers. When we allocate an array with new, it is important to release the data with delete[] once we're finished with it. This is done on line 6. If we fail to do so, we will end up with a memory leak, causing our memory usage to rise over time.
Aside on pointers
We may discuss pointers later, but essentially, we can think of them not as the actual array itself, but rather as signposts to where the array is stored.
The Value of a is: 42 or by pointer 42
The address of a is: 0x7ffee5f38334 or by pointer 0x7ffee5f38334
The Value of b is: 42 and its address is 0x7ffee5f38330
On line 1, we define the integer a. On line 2, we set b to be equal to a. On line 3, we create a pointer a_pointer and assign it to &a. Here, & is the address-of operator: we are storing where a lives rather than the value of a. The address can be thought of as the actual location in memory of a.
From the output, we see that a, b, and a_pointer all give 42. However, the addresses of a and b are different. While the values are equal, they are not the same object. The address returned by &a and a_pointer, however, is identical. This is because a_pointer is pointing to the spot in memory that a occupies.
When working with arrays we need to be careful when accessing elements, as C++ offers no protection against going out of range. Reading or writing element n of an array that holds n elements is not checked at run time and results in undefined behavior.
Vectors
Vectors offer a safer way to store data. They are provided by the vector library and are declared with the element type in angle brackets, for example std::vector<int> my_vector;.
Vectors manage their own memory and have a known length, which can be obtained using the size() method. Note that indexing with [] is still unchecked; the at() method checks the index and throws an exception if it is out of range.
You can dynamically add values to the vector using push_back. You can also change the size of the vector, for example with resize().
Vectors provide a memory-efficient way to store arrays of data. When a vector is removed, either by deletion or leaving scope, its destructor is called, which destroys the elements and frees the memory so that you don't need to manage it.
Functions
So far, we have used a single function, the main function. In C++, main is a special function name that designates the entry point for our code. Let's take a closer look at the main function to understand the basics of how functions work in C++. A minimal version is int main() { return 0; }. Here we define a function called main. It's important to note that this function has a return type, which is int. The code for the function is enclosed within curly brackets {}, defining the scope of the function. In the example above, the main function simply returns 0.
We can also define functions that take arguments by specifying the argument types and providing them with local names within the function's scope. As an example, consider a function named add_two_vectors that takes two vectors of integers as arguments, adds the corresponding elements together, and returns the result as a new vector. This demonstrates how you can define and use functions in C++ to modularize your code and perform specific tasks.
In C++, unlike in Python, we can "overload" functions by defining multiple functions with the same name but accepting different argument types. For example, we can "overload" the add_two_vectors function to accept vectors of integers and vectors of booleans. The structure of the two functions is very similar, with the main difference occurring on line 23. In the second function, we replace the addition operator + with the logical "OR" operator ||. This demonstrates how you can create multiple functions with the same name but different behaviors based on the types of arguments they receive.
In Boolean algebra, "+" represents "OR" and "*" stands for "AND", so by analogy we can write a version for multiplication in which the || operator is replaced by &&. The && operator represents the logical "and" operator. Overloading functions is super useful when you want to use common names for functions that perform similar operations on different data types.
We can also set default values for function arguments by assigning a value in the parameter list. On line 1, we define a function with a return type specified as void, indicating that the function doesn't return any value. When defining the function's parameters, we allow the bool parameter to have a default value of true. On line 14, when calling the function without passing any arguments, the default values are used. On line 16, we explicitly pass the value false, which overrides the default argument.