Negative externalities
So you're writing C++ and you found a really useful C library that you want to call from your C++ code. Let's take a look at a simple way to do this and then how you can shoot yourself in the foot.
The Happy Way
Let's take a really simple example. You have this directory structure, where harmonic is the name of your C library.
.
├── harmonic
│   ├── Makefile
│   ├── harmonic.c
│   └── harmonic.h
└── src
    ├── Makefile
    └── main.cpp
Here's the C code.
/****************** harmonic.h ******************/
#ifndef HARMONIC_H
#define HARMONIC_H
double harmonic(double a, double b);
#endif
/****************** harmonic.c ******************/
#include <math.h>
#include <float.h>
#include "harmonic.h"
static double safe_inverse(double a) {
    return 1 / (fabs(a) + DBL_EPSILON);
}
double harmonic(double a, double b) {
    return safe_inverse(a) + safe_inverse(b);
}
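The harmonic function computes 1/(|a| + DBL_EPSILON) + 1/(|b| + DBL_EPSILON), which is approximately 1/|a| + 1/|b|. That resembles the harmonic mean, but is not quite the same thing. DBL_EPSILON is the gap between 1.0 and the next representable double (about 2.2e-16); adding it to the denominator avoids division by zero.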
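As a quick sanity check of that arithmetic, here is a minimal sketch that transcribes the two functions into C++ (the library itself stays in C). The helpers close_to and check_harmonic are hypothetical names introduced only for this illustration; they are not part of the library.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Same arithmetic as safe_inverse in harmonic.c, transcribed for illustration.
double safe_inverse(double a) {
    return 1 / (std::fabs(a) + DBL_EPSILON);
}

double harmonic(double a, double b) {
    return safe_inverse(a) + safe_inverse(b);
}

// Hypothetical helper (not in the library): true if x is within tol of y.
bool close_to(double x, double y, double tol = 1e-12) {
    return std::fabs(x - y) <= tol;
}

// Hypothetical self-check: aborts via assert if a value is not what we expect.
void check_harmonic() {
    assert(close_to(harmonic(10, 20), 0.15));      // 1/10 + 1/20
    assert(std::isfinite(safe_inverse(0.0)));      // large, but not infinite
    assert(safe_inverse(-4.0) == safe_inverse(4.0)); // the sign is ignored
}
```

Note that for an input of exactly zero the result is 1/DBL_EPSILON, a very large finite number rather than infinity.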
Now let's write a very simple Makefile, without any bells and whistles. Just for fun, let's create a static library, libharmonic.a.
################ harmonic/Makefile ################
libharmonic.a: harmonic.o
	ar rcs libharmonic.a harmonic.o
.PHONY: clean
clean:
	rm -f harmonic.o libharmonic.a
harmonic.o will be built by make's implicit rules. If you type make inside the harmonic directory, you should now see the static library libharmonic.a.
Alright, let's write some C++ code that uses this library now.
/****************** src/main.cpp ******************/
#include <iostream>
extern "C" {
#include "harmonic.h"
}
using std::cout;
int main() {
    double a = 10;
    double b = 20;
    cout << "Result: " << harmonic(a, b) << '\n';
}
The extern "C" part is used to prevent name mangling by C++. If you don't like the #include directive inside the extern "C" block, there are a couple of alternatives:
- Add extern "C" to the C header file and wrap it in #ifdef __cplusplus / #endif directives. This is generally considered ugly, because, if the C code is lower-level, it shouldn't "care" whether or not it will be called by C++ code. You could argue that this violates the Single-Responsibility Principle, although I'm sure there are use cases in which it's justified.
- Copy the C header into a C++ header that declares all functions as extern "C" and only include the C++ header.
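For reference, the first alternative usually looks something like the sketch below. To keep it self-contained, the header contents and a stand-in for harmonic.c are placed in one C++ file; in the real project they would stay in their own files. check_guarded_header is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

/* ---- what harmonic.h could look like with the guard added ---- */
#ifndef HARMONIC_H
#define HARMONIC_H

#ifdef __cplusplus
extern "C" {            /* only a C++ compiler sees this line */
#endif

double harmonic(double a, double b);

#ifdef __cplusplus
}                       /* closes the extern "C" block */
#endif

#endif /* HARMONIC_H */

/* ---- stand-in for harmonic.c so the sketch links on its own ---- */
static double safe_inverse(double a) {
    return 1 / (std::fabs(a) + DBL_EPSILON);
}

// Defined with C linkage here, matching the declaration above.
extern "C" double harmonic(double a, double b) {
    return safe_inverse(a) + safe_inverse(b);
}

// Hypothetical self-check: the C++ caller needs no extern "C" of its own.
void check_guarded_header() {
    assert(std::fabs(harmonic(10, 20) - 0.15) < 1e-12);
}
```

A C compiler never defines __cplusplus, so it sees only the plain declaration; a C++ compiler sees it wrapped in extern "C".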
What do we expect this code to print? 1/10 + 1/20 == 0.1 + 0.05, so we should get Result: 0.15.
And now for our Makefile. Let's ignore any changes in the source files of the harmonic library, to keep things simple.
##################### src/Makefile #####################
CXX = g++
main: main.cpp ../harmonic/libharmonic.a
	$(CXX) -I ../harmonic/ -L ../harmonic/ -lharmonic -o $@ $^
.PHONY: clean
clean:
	rm -f main main.o
	cd ../harmonic && $(MAKE) clean
Alright! Type make, execute ./main and you should see the magic happen:
> ./main
Result: 0.15
Now let's see how you can shoot yourself in the foot.
The Nefarious Way
Suppose your project starts to grow and now you want to compile an intermediate main.o file instead of always recompiling the source main.cpp file. You need to change the Makefile so that main depends on main.o.
##################### src/Makefile #####################
CXX = g++
main: main.o ../harmonic/libharmonic.a
	$(CXX) -L ../harmonic/ -lharmonic -o $@ $^
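If you try to build this with the Makefile exactly as shown, it fails. Strictly speaking this is not a linker problem: make's implicit rule now compiles main.o as a separate step, and nothing passes -I ../harmonic/ to that step any more, so the compiler cannot find harmonic.h. The clean fix is to pass the include path again (for example through CPPFLAGS). The tempting shortcut is to drop the #include and declare harmonic by hand with extern.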
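In C and C++, you can use the extern keyword to declare that a function or variable is defined somewhere else in the program. In our case, we tell the compiler that harmonic is not defined in the translation unit that becomes main.o, but that a definition will be there when the final executable is linked. (For a function declaration the keyword is optional, since functions have external linkage by default; it is spelled out here for emphasis.) Let's take an example.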
/****************** src/main.cpp ******************/
#include <iostream>
extern "C" {
    extern double harmonic(double a, double b);
}
using std::cout;
int main() {
    double a = 10;
    double b = 20;
    cout << "Result: " << harmonic(a, b) << '\n';
}
The conversation between the compiler and the developer goes more or less like this:
Compiler: Hey, uhm... I can't find this "harmonic" function anywhere.
Developer: Don't worry, it's defined in another file. The linker will find
it when building the final executable. Trust me.
Compiler: OK.
... and that's how you fool the compiler!
A hand-written extern declaration is dangerous, because you are telling the compiler to trust you instead of the library's header, but are you really trustworthy? What do you think will happen if the function signature of harmonic changes, but the extern declaration remains the same? Let's go do some damage.
/****************** harmonic.h ******************/
#ifndef HARMONIC_H
#define HARMONIC_H
double harmonic(double a, double b, double c);
#endif
/****************** harmonic.c ******************/
#include <math.h>
#include <float.h>
#include "harmonic.h"
static double safe_inverse(double a) {
    return 1 / (fabs(a) + DBL_EPSILON);
}
double harmonic(double a, double b, double c) {
    return safe_inverse(a) + safe_inverse(b) + safe_inverse(c);
}
We've just added an extra parameter, double c, to the harmonic function, which now computes 1/|a| + 1/|b| + 1/|c|. If we leave main.cpp as it is, with the wrong declaration for harmonic, will it compile?
... Yes.
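It links because extern "C" switches off name mangling: main.o asks for a symbol named harmonic (some platforms add a leading underscore), and the library provides one. Nothing in that name records how many parameters the function takes, so the linker has nothing to compare. With C++ linkage the parameter types are part of the symbol (with g++, harmonic(double, double) would typically be emitted as something like _Z8harmonicdd), which is also what makes overloading possible. The sketch below illustrates that contrast; the cxx_linkage namespace and check_overloads are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

namespace cxx_linkage {  // hypothetical namespace, for illustration only

inline double safe_inverse(double a) {
    return 1 / (std::fabs(a) + DBL_EPSILON);
}

// With C++ linkage these are two distinct functions with two distinct
// symbols, because the parameter types are encoded in each mangled name.
double harmonic(double a, double b) {
    return safe_inverse(a) + safe_inverse(b);
}

double harmonic(double a, double b, double c) {
    return safe_inverse(a) + safe_inverse(b) + safe_inverse(c);
}

}  // namespace cxx_linkage

// At most one function named harmonic can have C linkage: its symbol is just
// its name, so a 2-parameter declaration and a 3-parameter definition look
// identical to the linker.

// Hypothetical self-check: overload resolution picks by argument count.
void check_overloads() {
    assert(std::fabs(cxx_linkage::harmonic(10, 20) - 0.15) < 1e-12);
    assert(std::fabs(cxx_linkage::harmonic(10, 20, 40) - 0.175) < 1e-12);
}
```

Had harmonic been a C++ function, the stale two-parameter declaration would have referred to a symbol that no longer exists, and the build would have stopped with an undefined reference instead of producing a broken executable.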
And if we run it?
> ./main
Result: 4.5036e+15
There's your gunshot! Calling a function through a declaration that doesn't match its definition is undefined behavior, so in principle anything could have been printed. Intuitively, this happens because the callee (the harmonic function) assumes that the caller (the main function) "prepared" 3 arguments (i.e. placed them in the relevant registers). main thinks that harmonic takes 2 arguments though, so it only prepared 2. The third argument ends up being whatever was left in its register, hence the ridiculous number that gets printed.
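The printed value is itself a clue. 4.5036e+15 is what you get when 1/DBL_EPSILON (exactly 2^52) swamps the other two terms, and 1/DBL_EPSILON is what safe_inverse returns for an input of zero. So the output is consistent with the third argument's register having held 0.0, or something very close to it, on that particular run. The sketch below reproduces the number without any undefined behavior by passing c = 0 explicitly; format_default and check_garbage_value are hypothetical helpers introduced for this illustration.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <sstream>
#include <string>

double safe_inverse(double a) {
    return 1 / (std::fabs(a) + DBL_EPSILON);
}

// Three-parameter version, as in the modified harmonic.c.
double harmonic(double a, double b, double c) {
    return safe_inverse(a) + safe_inverse(b) + safe_inverse(c);
}

// Hypothetical helper: format a double the way `cout << x` does by default
// (six significant digits).
std::string format_default(double x) {
    std::ostringstream out;
    out << x;
    return out.str();
}

// Hypothetical self-check: a third argument of zero reproduces the output.
void check_garbage_value() {
    assert(safe_inverse(0.0) == 4503599627370496.0);  // 2^52
    assert(format_default(harmonic(10, 20, 0.0)) == "4.5036e+15");
}
```

On another machine, compiler, or optimization level the register could hold something else entirely, and the program would print a different number.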
To see this in more detail, we can dump the assembly from the main executable using objdump. To make things clearer, I changed our variables a and b in the main function to be ints instead of doubles.
<_main>:
pushq %rbp
movq %rsp, %rbp
subq $32, %rsp
# [...]
movl $10, -4(%rbp) # int a = 10
movl $20, -8(%rbp) # int b = 20
# [...]
cvtsi2sdl -4(%rbp), %xmm0 # convert to double and store
cvtsi2sdl -8(%rbp), %xmm1
movq %rax, -16(%rbp)
callq 0x100002fb0 <_harmonic> # call harmonic
# [...]
addq $32, %rsp
popq %rbp
retq
<_harmonic>:
pushq %rbp
movq %rsp, %rbp
subq $48, %rsp
movsd %xmm0, -8(%rbp) # double a
movsd %xmm1, -16(%rbp) # double b
movsd %xmm2, -24(%rbp) # double c
movsd -8(%rbp), %xmm0
callq 0x100003010 <_safe_inverse>
movsd -16(%rbp), %xmm1
movsd %xmm0, -32(%rbp)
movaps %xmm1, %xmm0
# [...]
addq $48, %rsp
popq %rbp
retq
As you can see, the main function stores 2 arguments in xmm0 and xmm1. harmonic, however, reads 3 arguments from xmm0, xmm1 and xmm2, so the third argument is whatever xmm2 happened to contain when the call was made; main never wrote to it.