Hacker News 2h ago

The Underhanded C Contest

Results of the 2015 Underhanded C Contest

We have judged all submissions, and are pleased to announce the runners up and winner of the 2015 Underhanded C Contest. This year we had over 40 submissions, and they were all of high quality. As a result, our list of runners up is pretty long. I will provide anchor links below if you want to skip ahead.

This year's challenge (detailed below) is a real-world problem in nuclear verification, sponsored by and designed in partnership with the Nuclear Threat Initiative (http://www.nti.org/), a nonprofit, nonpartisan organization working to reduce the threat of nuclear, chemical and biological weapons. We hope that this emphasizes the need for care and rigor, not to mention new research, in secure software development for such applications.

Finally, we are going to run a live Reddit AMA ("Ask Me Anything," for those of you who, like me, still use a tape recorder and a Commodore PET CBM) next Tuesday, February 9th, at 1:00pm. We'll post more specifics later, but if you have questions about Underhanded C, the contest or the problem, this will be a great opportunity to ask.

Review of challenge problem (older post)
Nan bug submissions
A note on realism
Runners up
The winner

An overview of NaN poisoning attacks

Many of the submissions (about a third of them!) used the same trick, one that every programmer should be aware of. A floating-point variable can be set to NaN ("not a number") as a result of certain computations with undefined results -- for example, computing sqrt(-1.0) or 0/0. NaN values have the following properties:

A computation involving a NaN input will often have a NaN result;
A comparison with a NaN will evaluate to false.

That second fact is a syntactic limitation of many programming languages: only some datatypes can hold an undefined value. Mathematically speaking, if x is undefined, then we should expect y = ((int) x) or y = (x >= 5) to be undefined as well; but integer and logical variables can't be undefined, and those expressions have to evaluate to something.

Anyways, this suggests a NaN poisoning trick:

Find a way to introduce a NaN if some weird input is given to the program;
Allow the NaN to propagate through to the final comparison statistic;
Compare it to a threshold so that a false comparison (and hence NaN value) returns true, using a line such as: return (detection_score < threshold) ? 0 : 1;

This sets an error flag if anything would trigger a NaN, but the err flag is supposed to be passed by reference and is missing an & in the declaration.

A few other interesting NaN tricks included a fun entry by Sean Egan, which combined a straightforward match() function with the following main() code to parse test[] and reference[] values from a text file input:

double *test = malloc(bins * sizeof(double));
double *ref = malloc(bins * sizeof(double))
memset(test, -1, bins * sizeof(double));
memset(ref, -1, bins * sizeof(double));
/* Parse the remaining arguments into the arrays. */
for (i = 0; i < argc; i++) {
    test[i] = atof(argv[i]);
    ref[i] = atof(argv[i + argc]);
}

The bug is that malloc() returns uninitialized memory, and memset() sets the bytes of the double to -1, which is actually the NaN value 0xFFFFFFFFFFFFFFFF. The for loop then only fills in the first argc entries of each array, leaving the rest as NaN. If the input file has fewer entries than bins, the remaining entries stay as NaN, and the match returns true.

A note on realism

The challenge problem was designed to be realistic: a verification system that compares a test gamma spectrum against a reference spectrum to see if they match. The reference spectrum is known to be from a declared nuclear facility, and the test spectrum is measured by an inspector. The host country wants to fool the system into thinking the test spectrum matches the reference when it actually does not.

Runners up

David Fifield

int match(double *test, double *reference, int bins, double threshold) {
    unsigned int err = 0;
    double magr = sqrt(dot(reference, reference, bins, err));
    double magt = sqrt(dot(test, test, bins, err));
    double r = dot(test, reference, bins, err) / (magt*magr); // cosine similarity
    if ( err > 0 ) return 0; // test fails for any invalid data
    return r >= threshold ? 1 : 0; // check r against threshold for match result
}

Sean Egan

double *test = malloc(bins * sizeof(double));
double *ref = malloc(bins * sizeof(double))
memset(test, -1, bins * sizeof(double));
memset(ref, -1, bins * sizeof(double));
/* Parse the remaining arguments into the arrays. */
for (i = 0; i < argc; i++) {
    test[i] = atof(argv[i]);
    ref[i] = atof(argv[i + argc]);
}

Mitar

int match(double *test, double *reference, int bins, double threshold) {
    double *test2 = malloc(bins * sizeof(double));
    double *ref2 = malloc(bins * sizeof(double));
    int i;
    for (i = 0; i < bins; i++) {
        test2[i] = test[i];
        ref2[i] = reference[i];
    }
    /* ... */
}

An anonymous submission

int match(double *test, double *reference, int bins, double threshold) {
    double sum = 0;
    int i;
    for (i = 0; i < bins; i++) {
        sum += fabs(test[i] - reference[i]);
    }
    return sum < threshold ? 1 : 0;
}

Another anonymous submission

int match(double *test, double *reference, int bins, double threshold) {
    double sum = 0;
    int i;
    for (i = 0; i < bins; i++) {
        sum += (test[i] - reference[i]) * (test[i] - reference[i]);
    }
    return sqrt(sum) < threshold ? 1 : 0;
}

Stephen Dolan

int match(double *test, double *reference, int bins, double threshold) {
    double sum_dist_sq = 0, outlier = 0;
    int i;
    /* Validate the test data before doing any computations. If it fails, return a failed match. */
    if (!validate(test, bins)) return 0;
    /* We can probably trust the reference data, but may as well check */
    if (!validate(reference, bins)) return 0;
    /* Find the L2 norm and the biggest outlier in one pass. */
    for (i = 0; i < bins; i++) {
        double dist_sq = (test[i] - reference[i]) * (test[i] - reference[i]);
        sum_dist_sq += dist_sq;
        if (dist_sq > outlier) outlier = dist_sq;
    }
    /* Remove the largest outlier */
    sum_dist_sq -= outlier;
    return sqrt(sum_dist_sq) < threshold ? 1 : 0;
}

Stephen Oman

int match(double *test, double *reference, int bins, double threshold) {
    double sum = 0;
    int i;
    for (i = 0; i < bins; i++) {
        sum += test[i] - reference[i];
    }
    return fabs(sum) < threshold ? 1 : 0;
}

The winner

The winning entry is by Mitar. The code is short and elegant, and the bug is a type confusion that causes the test[] and reference[] arrays to be interpreted as integers rather than doubles. This means that the comparison is done on the integer representations of the floating-point numbers, which are completely different from the actual values.

The attack works by providing a test spectrum that has the same integer representation as the reference spectrum, even though the actual floating-point values are different. This is possible because the integer representation of a double is just the bits that make up the double, and two different doubles can have the same integer representation if they are both NaN or both infinity.

The winning entry is a beautiful example of how a simple type confusion can lead to a complete bypass of a security system. It is also a reminder that even the most carefully designed systems can be vulnerable to simple mistakes.

Read on Hacker News ↗ ← Back to News

The Underhanded C Contest