Negative externalities

2023-04-04

So you're writing C++ and you found a really useful C library that you want to call from your C++ code. Let's take a look at a simple way to do this and then how you can shoot yourself in the foot.

The Happy Way

Let's take a really simple example. You have this directory structure, where harmonic is the name of your C library.

.
├── harmonic
│   ├── Makefile
│   ├── harmonic.c
│   └── harmonic.h
└── src
    ├── Makefile
    └── main.cpp

Here's the C code.

/****************** harmonic.h ******************/

#ifndef HARMONIC_H
#define HARMONIC_H

double harmonic(double a, double b);

#endif


/****************** harmonic.c ******************/

#include <math.h>
#include <float.h>

#include "harmonic.h"

static double safe_inverse(double a) {
    return 1 / (fabs(a) + DBL_EPSILON);
}

double harmonic(double a, double b) {
    return safe_inverse(a) + safe_inverse(b);
}

The harmonic function computes roughly 1/|a| + 1/|b|, which is similar to the harmonic mean, but not quite. It adds DBL_EPSILON (a very small number, about 2.2e-16) to each denominator to avoid division by zero.
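
To make the arithmetic concrete, here is a minimal C++ restatement of the two functions. It is a sketch for experimenting, not the library itself, and the check_harmonic helper is hypothetical, added only to exercise the code.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Same arithmetic as harmonic.c: the epsilon keeps the denominator non-zero.
double safe_inverse(double a) {
    return 1 / (std::fabs(a) + DBL_EPSILON);
}

double harmonic(double a, double b) {
    return safe_inverse(a) + safe_inverse(b);
}

// Hypothetical self-check, not part of the original library.
void check_harmonic() {
    // A zero input yields a large but finite value instead of infinity.
    assert(std::isfinite(safe_inverse(0.0)));
    // For ordinary inputs the result is 1/10 + 1/20, up to rounding.
    assert(std::fabs(harmonic(10.0, 20.0) - 0.15) < 1e-12);
}
```

For inputs like 10 and 20 the epsilon is smaller than the rounding step of a double, so it only matters when an input is close to zero.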

Now let's write a very simple Makefile, without any bells and whistles. Just for fun, let's create a static library, libharmonic.a.

################ harmonic/Makefile ################

libharmonic.a: harmonic.o
	ar rcs libharmonic.a harmonic.o

.PHONY: clean
clean:
	rm -f harmonic.o libharmonic.a

harmonic.o will be built with the implicit rules of make. If you type make inside the harmonic directory, you should now see the static library libharmonic.a.

Alright, let's write some C++ code that uses this library now.

/****************** src/main.cpp ******************/

#include <iostream>

extern "C" {
#include "harmonic.h"
}

using std::cout;

int main() {
    double a = 10;
    double b = 20;

    cout << "Result: " << harmonic(a, b) << '\n';
}

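The extern "C" part tells the C++ compiler to use C linkage for these declarations, which prevents name mangling. If you don't like the #include directive inside the extern "C" block, there are alternatives, such as putting the extern "C" block in the header itself behind an #ifdef __cplusplus guard.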
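
One such alternative is to let the header carry the extern "C" block itself. This is a sketch of how harmonic.h could be laid out, not how the library actually ships. __cplusplus is defined only by C++ compilers, so C code still sees a plain declaration.

```cpp
/* harmonic.h, sketched with a C++ guard (an assumption, not the original header) */
#ifndef HARMONIC_H
#define HARMONIC_H

#ifdef __cplusplus
extern "C" {   /* only C++ compilers see this line */
#endif

double harmonic(double a, double b);

#ifdef __cplusplus
}              /* closes the extern "C" block */
#endif

#endif
```

With a header like this, main.cpp can use a plain #include "harmonic.h" with no extern "C" wrapper around it.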

What do we expect this code to print? 1/10 + 1/20 == 0.1 + 0.05, so we should get Result: 0.15.

And now for our Makefile. To keep things simple, it does not rebuild the harmonic library when the library's source files change.

##################### src/Makefile #####################
CXX = g++
main: main.cpp ../harmonic/libharmonic.a
	$(CXX) -I ../harmonic/ -L ../harmonic/ -lharmonic -o $@ $^

.PHONY: clean
clean:
	rm -f main main.o
	cd ../harmonic && $(MAKE) clean

Alright! Type make, execute ./main and you should see the magic happen:

> ./main
Result: 0.15

Now let's see how you can shoot yourself in the foot.

The Nefarious Way

Suppose your project starts to grow and now you want to compile an intermediate main.o file instead of always recompiling the source main.cpp file. You need to change the Makefile so that main depends on main.o instead of main.cpp.

##################### src/Makefile #####################
CXX = g++
main: main.o ../harmonic/libharmonic.a
	$(CXX) -L ../harmonic/ -lharmonic -o $@ $^

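If you try to build this, you will get one of those frustrating errors. main.o is now compiled by make's implicit rule, which does not pass -I ../harmonic/, so the compiler can no longer find harmonic.h. The clean fix is to add the include path to CPPFLAGS. The tempting shortcut is to drop the #include and declare the function by hand.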

In C and C++, you can use the extern keyword to declare that a function or variable is defined somewhere else in the program (for functions this is the default, so the keyword is optional). In our case, we need to tell the compiler that the harmonic function won't be available in the main.o object file, but it will be defined in the final executable. Here's what that looks like.

/****************** src/main.cpp ******************/

#include <iostream>

extern "C" {
    extern double harmonic(double a, double b);
}

using std::cout;

int main() {
    double a = 10;
    double b = 20;

    cout << "Result: " << harmonic(a, b) << '\n';
}

The conversation between the compiler and the developer goes more or less like this:

Compiler: Hey, uhm... I can't find this "harmonic" function anywhere.

Developer: Don't worry, it's defined in another file. The linker will find
           it when building the final executable. Trust me.

Compiler: OK.

... and that's how you fool the compiler!

A hand-written extern declaration is dangerous, because you are telling the compiler to trust you, but are you really trustworthy? What do you think will happen if the function signature of harmonic changes, but the extern declaration remains the same? Let's go do some damage.

/****************** harmonic.h ******************/

#ifndef HARMONIC_H
#define HARMONIC_H

double harmonic(double a, double b, double c);

#endif


/****************** harmonic.c ******************/

#include <math.h>
#include <float.h>

#include "harmonic.h"

static double safe_inverse(double a) {
    return 1 / (fabs(a) + DBL_EPSILON);
}

double harmonic(double a, double b, double c) {
    return safe_inverse(a) + safe_inverse(b) + safe_inverse(c);
}

We've just added an extra parameter, double c, to the harmonic function, which now computes 1/|a| + 1/|b| + 1/|c|. If we leave main.cpp as it is, with the wrong declaration for harmonic, will it compile?

... Yes.

And if we run it?

> ./main
Result: 4.5036e+15

There's your gunshot! The linker cannot catch the mismatch, because a C symbol is just the name harmonic, with no parameter types encoded in it. Intuitively, the bad result happens because the callee (the harmonic function) assumes that the caller (the main function) "prepared" 3 arguments (i.e. placed them in the relevant registers). main thinks that harmonic takes 2 arguments though, so it only prepared 2. The third argument ends up being whatever was left in the third register, which is undefined behavior, hence the ridiculous number that gets printed.
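
The printed number is not random. DBL_EPSILON is 2^-52, so safe_inverse(0) is exactly 2^52 = 4503599627370496, which prints as 4.5036e+15 at the default stream precision. The sketch below restates the three-argument function in C++ and calls it with an explicit 0.0, under the assumption that the stale register held zero in that run. formatted and check_zero_third_argument are hypothetical helpers, not part of the original code.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <sstream>
#include <string>

static double safe_inverse(double a) {
    return 1 / (std::fabs(a) + DBL_EPSILON);
}

// C++ restatement of the three-argument harmonic from harmonic.c.
double harmonic(double a, double b, double c) {
    return safe_inverse(a) + safe_inverse(b) + safe_inverse(c);
}

// Hypothetical helper: formats a double the way `cout << x` does by default.
std::string formatted(double x) {
    std::ostringstream os;
    os << x;
    return os.str();
}

// Hypothetical self-check: passes 0.0 explicitly as the third argument.
// Assumption: this mimics a run in which xmm2 happened to hold zero.
void check_zero_third_argument() {
    double r = harmonic(10, 20, 0.0);
    assert(r == 4503599627370496.0);        // 2^52; the 0.15 is lost to rounding
    assert(formatted(r) == "4.5036e+15");   // matches the output shown above
}
```

On another machine or compiler the stale register could hold anything, so the broken program is not guaranteed to print this particular value.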

To see this in more detail, we can dump the assembly from the main executable using objdump. To make things clearer, I changed our variables a and b in the main function to be ints instead of doubles.

<_main>:
                pushq   %rbp
                movq    %rsp, %rbp
                subq    $32, %rsp
                # [...]
                movl    $10, -4(%rbp)     # int a = 10
                movl    $20, -8(%rbp)     # int b = 20
                # [...]
                cvtsi2sdl       -4(%rbp), %xmm0  # convert to double and store
                cvtsi2sdl       -8(%rbp), %xmm1
                movq    %rax, -16(%rbp)
                callq   0x100002fb0 <_harmonic>  # call harmonic
                # [...]
                addq    $32, %rsp
                popq    %rbp
                retq

<_harmonic>:
                pushq   %rbp
                movq    %rsp, %rbp
                subq    $48, %rsp
                movsd   %xmm0, -8(%rbp)   # double a
                movsd   %xmm1, -16(%rbp)  # double b
                movsd   %xmm2, -24(%rbp)  # double c
                movsd   -8(%rbp), %xmm0
                callq   0x100003010 <_safe_inverse>
                movsd   -16(%rbp), %xmm1
                movsd   %xmm0, -32(%rbp)
                movaps  %xmm1, %xmm0
                # [...]
                addq    $48, %rsp
                popq    %rbp
                retq

As you can see, the main function stores 2 arguments in xmm0 and xmm1. harmonic, however, reads 3 arguments from xmm0, xmm1 and xmm2, so the third argument is whatever value xmm2 happened to hold when the call was made.