Negative externalities
So you're writing C++ and you found a really useful C library that you want to call from your C++ code. Let's take a look at a simple way to do this and then how you can shoot yourself in the foot.
The Happy Way
Let's take a really simple example. You have this directory structure, where
harmonic is the name of your C library.
.
├── harmonic
│ ├── Makefile
│ ├── harmonic.c
│ └── harmonic.h
└── src
├── Makefile
└── main.cpp
Here's the C code.
/****************** harmonic.h ******************/
#ifndef HARMONIC_H
#define HARMONIC_H
double harmonic(double a, double b);
#endif
/****************** harmonic.c ******************/
#include <math.h>
#include <float.h>
#include "harmonic.h"
static double safe_inverse(double a) {
return 1 / (fabs(a) + DBL_EPSILON);
}
double harmonic(double a, double b) {
return safe_inverse(a) + safe_inverse(b);
}
The harmonic function computes 1/|a| + 1/|b|, which is similar to the
harmonic mean, but not quite. It
uses DBL_EPSILON (a very small number) to avoid division by zero.
Now let's write a very simple Makefile, without any bells and whistles. Just
for fun, let's create a static library, libharmonic.a.
################ harmonic/Makefile ################
libharmonic.a: harmonic.o
ar rcs libharmonic.a harmonic.o
.PHONY: clean
clean:
rm -f harmonic.o libharmonic.a
harmonic.o will be built with the implicit rules of make. If you type make
inside the harmonic directory, you should now see the static library
libharmonic.a.
Alright, let's write some C++ code that uses this library now.
/****************** src/main.cpp ******************/
#include <iostream>
extern "C" {
#include "harmonic.h"
}
using std::cout;
int main() {
double a = 10;
double b = 20;
cout << "Result: " << harmonic(a, b) << '\n';
}
The extern "C" part is used to prevent name mangling by C++. If you don't like
the #include directive inside the extern "C" block, there are a couple of
alternatives:
- Add
extern "C"to the C header file and wrap it in#ifdef __cplusplus/#endifdirectives. This is generally considered ugly, because, if the C code is lower-level, it shouldn't "care" whether or not it will be called by C++ code. You could argue that this violates the Single-Responsibility Principle, although I'm sure there are use cases in which it's justified. - Copy the C header into a C++ header that declares all functions as
extern "C"and only include the C++ header.
What do we expect this code to print? 1/10 + 1/20 == 0.1 + 0.05, so we should
get Result: 0.15.
And now for our Makefile. Let's ignore any changes in the
source files of the harmonic library, to keep things simple.
##################### src/Makefile #####################
CXX = g++
main: main.cpp ../harmonic/libharmonic.a
$(CXX) -I ../harmonic/ -L ../harmonic/ -lharmonic -o $@ $^
.PHONY: clean
clean:
rm -f main main.o
cd ../harmonic && $(MAKE) clean
Alright! Type make, execute ./main and you should see the magic happen:
> ./main
Result: 0.15
Now let's see how you can shoot yourself in the foot.
The Nefarious Way
Suppose your project starts to grow and now you want to compile an intermediate
main.o file instead of always recompiling the source main.cpp file.
You need to change the Makefile to add a dependency on main.o.
##################### src/Makefile #####################
CXX = g++
main: main.o ../harmonic/libharmonic.a
$(CXX) -L ../harmonic/ -lharmonic -o $@ $^
If you try to compile this, you will get one of those frustrating linker errors,
complaining about an undefined reference to the harmonic function. To fix
this, you need external linkage.
In C and C++, you can use the extern keyword to specify that a function or
variable is defined somewhere else in the program. In our case, we need to tell
the compiler that the harmonic function won't be available to the main.o
object file, but it will be defined in the final executable. Let's take an example.
/****************** src/main.cpp ******************/
#include <iostream>
extern "C" {
extern double harmonic(double a, double b);
}
using std::cout;
int main() {
double a = 10;
double b = 20;
cout << "Result: " << harmonic(a, b) << '\n';
}
The conversation between the compiler and the developer goes more or less like this:
Compiler: Hey, uhm... I can't find this "harmonic" function anywhere.
Developer: Don't worry, it's defined in another file. The linker will find
it when building the final executable. Trust me.
Compiler: OK.
... and that's how you fool the compiler!
The extern keyword is dangerous, because you are telling the compiler to trust
you, but are you really trustworthy?? What do you think will happen if the
function signature of harmonic changes, but the extern declaration remains
the same? Let's go do some damage.
/****************** harmonic.h ******************/
#ifndef HARMONIC_H
#define HARMONIC_H
double harmonic(double a, double b, double c);
#endif
/****************** harmonic.c ******************/
#include <math.h>
#include <float.h>
#include "harmonic.h"
static double safe_inverse(double a) {
return 1 / (fabs(a) + DBL_EPSILON);
}
double harmonic(double a, double b, double c) {
return safe_inverse(a) + safe_inverse(b) + safe_inverse(c);
}
We've just added an extra term, double c, to the harmonic function, which now
computes 1/|a| + 1/|b| + 1/|c|. If we leave main.cpp as it is, with the
wrong declaration for harmonic, will it compile?
... Yes.
And if we run it?
> ./main
Result: 4.5036e+15
There's your gunshot! Intuitively, this happens because
the callee (the harmonic function) assumes that the caller (the main
function) "prepared" 3 arguments (i.e. placed them in the relevant registers).
main thinks that harmonic
takes 2 arguments though, so it only prepared 2. The third argument
ends up being uninitialized memory, hence the ridiculous number that gets printed.
To see this in more detail, we can dump the assembly from the main
executable using objdump. To make things clearer, I changed our variables a
and b in the main function to be ints instead of doubles.
<_main>:
pushq %rbp
movq %rsp, %rbp
subq $32, %rsp
# [...]
movl $10, -4(%rbp) # int a = 10
movl $20, -8(%rbp) # int b = 20
# [...]
cvtsi2sdl -4(%rbp), %xmm0 # convert to double and store
cvtsi2sdl -8(%rbp), %xmm1
movq %rax, -16(%rbp)
callq 0x100002fb0 <_harmonic> # call harmonic
# [...]
addq $32, %rsp
popq %rbp
retq
<_harmonic>:
pushq %rbp
movq %rsp, %rbp
subq $48, %rsp
movsd %xmm0, -8(%rbp) # double a
movsd %xmm1, -16(%rbp) # double b
movsd %xmm2, -24(%rbp) # double c
movsd -8(%rbp), %xmm0
callq 0x100003010 <_safe_inverse>
movsd -16(%rbp), %xmm1
movsd %xmm0, -32(%rbp)
movaps %xmm1, %xmm0
# [...]
addq $48, %rsp
popq %rbp
retq
As you can see, the main function stores 2 arguments in xmm0 and
xmm1. harmonic, however, reads 3 arguments
from xmm0, xmm1 and xmm2, so the third argument from xmm2 will be
uninitialized memory.