Memory Safe Inline Assembly

NOTE: This is a pre-release feature. The Fil-C 0.679 release does not ship with this feature. To test this feature, you need to build from source.

GCC and clang both support an incredibly powerful inline assembly syntax. For example:

unsigned rotate(unsigned x, unsigned char c) {
    asm("roll %1, %0" : "+r"(x) : "c"(c) : "cc");
    return x;
}

This instructs the compiler to emit assembly based on the roll %1, %0 template, where %1 is filled in with %cl and %0 is filled in with whichever register holds x. The compiler moves c into the %ecx register just before the roll instruction. Additionally, the compiler is told that the instruction will change the value of x and the condition flags.
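For comparison, the value that snippet computes can be written in portable C. This is only a reference sketch for checking an assembly version against: the name rotate_ref is made up here, and it assumes unsigned is 32 bits wide. It mirrors the way roll uses only the low 5 bits of the count.

```c
/* Portable reference for a 32-bit rotate-left. rotate_ref is an
   illustrative name, not something provided by Fil-C or the compiler.
   Assumes unsigned is 32 bits wide. */
unsigned rotate_ref(unsigned x, unsigned char c) {
    unsigned n = c & 31u;            /* roll uses only the low 5 bits of %cl */
    if (n == 0)
        return x;                    /* avoid the undefined shift by 32 below */
    return (x << n) | (x >> (32u - n));
}
```

A test harness could run rotate and rotate_ref over the same inputs and compare the results.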

This seems like it cannot possibly be safe! What if the programmer did something wrong, like omitted the + in "+r", or forgot the "cc" clobber? In Yolo-C, if you make such a mistake, the compiler happily miscompiles your code.

Yet Fil-C supports this inline assembly syntax and it's completely safe! This document explains why Fil-C supports inline assembly at all and then goes into the details of how that support is achieved while maintaining both programmer intent (you still get the assembly template you asked for) and complete memory safety (if you do something wrong, you'll panic or get an illegal instruction trap, at worst).

Why Inline Assembly?

While reviewing folks' C and C++ code, I've found the following reasons for inline assembly, listed from most to least common:

Blank inline assembly to prevent compiler analysis. This includes things like asm volatile("" : : : "memory"), which is an old-school way of saying atomic_signal_fence(memory_order_seq_cst). It works because we're telling the compiler that the inline assembly clobbers all memory, which forces the compiler to serialize memory accesses, just like a signal fence would have. The contract with the compiler is clear: the compiler must emit exactly the assembly we're asking it to emit (which is blank here) without second-guessing our claims about the clobbers. That is, the compiler must not infer that because the assembly is blank then there cannot be a memory clobber. We said memory clobber, so that's what the compiler sees.

Similarly, folks do stuff like asm("" : "+r"(x)). This means: the assembly may read and then write x. The assembly is blank, so this incurs no cost other than forcing the compiler to assume that it doesn't know anything about x's value after the assembly executes. This kind of data flow fence is useful for writing constant-time crypto.

Fil-C has long supported blank inline assembly since it's trivially safe. Fil-C even supports "+r" constraints on pointers, in which case both the intval and lower are threaded through their own "+r"-like constraints at the LLVM IR level.
cpuid and xgetbv. The inline assembly snippets for these two instructions occur most often in code that then goes on to use SIMD intrinsics. I think this is because the __get_cpuid API in cpuid.h is confusing to use and, as far as I can tell, does not work right in either GCC or clang. Hence, packages like zstd, simdutf, simdjson, and other SIMD-using programs tend to identify CPU features by using inline assembly that invokes cpuid. They often also use inline assembly to invoke xgetbv as well.

In Fil-C, __get_cpuid is fixed, so you could use that, and zxgetbv is offered as an intrinsic. However, it's better to support those inline assembly snippets without requiring folks to change their code! And there's nothing unsafe about invoking cpuid and xgetbv so long as the code specifies the right clobbers and constraints.
Arithmetic over secrets in crypto code. A great example is OpenSSH's sntrup761 implementation, which wraps key arithmetic in inline assembly to ensure that it gets exactly the right instruction and not some instruction that might have varying execution time depending on inputs. Note that this kind of code often has fallbacks to try to get the compiler to emit constant-time code even if inline assembly is not supported, but those fallbacks are unlikely to be as rigorously validated, and often rely on "optimization blocking" idioms that hurt performance and could be circumvented by a sufficiently clever compiler. Hence, it's safest to support inline assembly snippets that do this. Luckily, these snippets are also completely safe, provided that the constraints and clobbers are correct.
Atomics. Compilers have long supported intrinsics for atomic instructions. Compilers also have a long history of implementing these intrinsics incorrectly! Most recently, clang had bugs in how it lowered CAS to LL/SC on ARM64. Hence, serious lock-free programmers tend to write their atomic instructions using inline assembly at least some of the time, for example when they hit a miscompile and dropping to assembly was their only path to fixing the bug.

Supporting atomics in inline assembly would require allowing inline assembly that accesses memory, which would mean somehow inferring what Fil-C bounds checks to do. Inline assembly that accesses memory is currently out of scope. However, memory-safe inline assembly does support fences (lfence, sfence, mfence, and serialize).
System calls. These are currently out of scope for inline assembly in Fil-C, and that's fine, since using inline assembly for syscalls is only necessary in the guts of libc implementations. Fil-C already has ports of musl and glibc, and in both cases the inline assembly for syscalls is replaced with calls to the pizlonated_syscalls.h API that Fil-C provides. However, I can imagine adding support for inline assembly that does syscalls in the future, to make it easier to port new libc's to Fil-C.
x87 long double functions. If you're working with long double on x86, then you're using the x87 80-bit floating point math. If you want access to the x87 FPU's implementations of various math functions, then often the best way to do that is to drop to inline assembly. This is totally safe, provided that the inline assembly doesn't push or pop the x87 stack, and the constraints correctly spell out which x87 stack registers were clobbered.

It's likely that folks use inline assembly for other purposes, but the above list is all that I've seen when surveying programs in the Linux userland.
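
The blank-assembly idioms from the first item in the list above can be sketched in C as follows. The helper names (compiler_barrier, hide_value, ct_select) are made up for illustration, and the constant-time select is just one plausible use of the data flow fence, not code taken from any particular crypto library.

```c
#include <stdint.h>

/* Compiler-only fence: emits no instructions, but the "memory" clobber
   stops the compiler from moving memory accesses across it. */
static inline void compiler_barrier(void) {
    __asm__ volatile("" : : : "memory");
}

/* Data flow fence: the blank assembly "may read and then write" x, so
   the compiler must treat the returned value as unknown. */
static inline uint32_t hide_value(uint32_t x) {
    __asm__("" : "+r"(x));
    return x;
}

/* Select a if cond is nonzero, else b, using a mask instead of a branch.
   Hiding the mask makes it harder for the optimizer to reason about its
   value and reintroduce a data-dependent branch. */
uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = hide_value(0u - (uint32_t)(cond != 0)); /* all ones or zero */
    return (a & mask) | (b & ~mask);
}
```

Both asm statements have empty templates, so they compile with GCC or clang on any target, and they are the kind of snippet that Fil-C has long accepted.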

To summarize:

There remain many legitimate uses of inline assembly.
Inline assembly use is widespread in C and C++ libraries. You're probably using several of those libraries right now as you're reading this post, and the inline assembly in those libraries is on the critical path.
Much of the inline assembly is trivially safe: it doesn't access memory, it does no control flow, and the instructions used have no other sneaky side effects.

Read on for details about the world's first memory safe inline assembly implementation!

Supporting Inline Assembly Safely

When the Fil-C compiler's safety instrumentation pass (called FilPizlonator) runs, inline assembly is present in LLVM IR as a pair of strings:

The assembly string, almost exactly like it appears in the C source code, just with some characters replaced. For example, the roll example turns into roll $1, $0.
The constraint string. This uses an LLVM-specific syntax to express the constraints and clobbers. For the roll example, this is =r,{cx},0,~{cc},~{dirflag},~{fpsr},~{flags}.
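
To make the shape of the constraint string concrete, here is a rough sketch that splits it on commas and classifies each item by its first character. It is a simplification for illustration, not how FilPizlonator parses constraints: the struct and function names are hypothetical, and modifiers such as early-clobber or indirect constraints are ignored.

```c
#include <string.h>

/* Hypothetical tally of what a constraint string contains. */
struct constraint_counts {
    int outputs;   /* items starting with '=' */
    int inputs;    /* items that are neither outputs nor clobbers */
    int clobbers;  /* items of the form ~{name} */
};

/* Walk a comma-separated LLVM-style constraint string and classify
   each item by its first character. */
struct constraint_counts count_constraints(const char *s) {
    struct constraint_counts n = {0, 0, 0};
    while (*s) {
        size_t len = strcspn(s, ",");   /* length of the current item */
        if (len > 0) {
            if (s[0] == '=')
                n.outputs++;
            else if (s[0] == '~')
                n.clobbers++;
            else
                n.inputs++;
        }
        s += len;
        if (*s == ',')
            s++;                        /* skip the separator */
    }
    return n;
}
```

For the roll example, this reports one output (=r), two inputs ({cx} and the tied operand 0), and four clobbers.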

Hence, we can validate if an inline assembly expression is safe by:

Parsing and analyzing the assembly. If it contains memory accesses, control flow, or anything we don't recognize, we reject it.
Parsing and analyzing the constraints. If those do anything we don't recognize or support, then reject.
Ensuring that the assembly's effects are fully captured by the constraints. For example, if an assembly instruction modifies a register, then the constraints must capture that register mutation. If any instruction sets some CPU flags, then those flags must be listed as clobbers.
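
The third step can be sketched as a lookup in an instruction table followed by a scan of the constraint string. Everything below is a hypothetical miniature, not the FilPizlonator implementation: the table has three entries, the names insn_info and check_asm are invented, and the caller is assumed to have already extracted a single mnemonic from the assembly string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical allowlist entry: a mnemonic plus the effects it has
   beyond its explicit operands. */
struct insn_info {
    const char *mnemonic;
    const char *implicit_regs[5];   /* NULL-terminated list of implicit writes */
    bool sets_flags;
};

static const struct insn_info allowlist[] = {
    { "sarw",   { NULL },                         true  },
    { "cpuid",  { "ax", "bx", "cx", "dx", NULL }, false },
    { "xgetbv", { "ax", "dx", NULL },             false },
};

/* True if the comma-separated constraint string contains item exactly. */
static bool has_item(const char *constraints, const char *item) {
    size_t want = strlen(item);
    for (const char *s = constraints; *s; ) {
        size_t len = strcspn(s, ",");
        if (len == want && strncmp(s, item, len) == 0)
            return true;
        s += len;
        if (*s == ',')
            s++;
    }
    return false;
}

/* Returns NULL if the snippet is accepted, otherwise a rejection reason. */
const char *check_asm(const char *mnemonic, const char *constraints) {
    size_t count = sizeof allowlist / sizeof allowlist[0];
    for (size_t i = 0; i < count; i++) {
        const struct insn_info *info = &allowlist[i];
        if (strcmp(info->mnemonic, mnemonic) != 0)
            continue;
        if (info->sets_flags && !has_item(constraints, "~{cc}"))
            return "instruction sets flags but cc is not clobbered";
        for (size_t r = 0; r < 5 && info->implicit_regs[r]; r++) {
            char as_output[16], as_clobber[16];
            snprintf(as_output, sizeof as_output, "={%s}", info->implicit_regs[r]);
            snprintf(as_clobber, sizeof as_clobber, "~{%s}", info->implicit_regs[r]);
            if (!has_item(constraints, as_output) && !has_item(constraints, as_clobber))
                return "implicitly written register is not an output or clobber";
        }
        return NULL;
    }
    return "instruction is not allowlisted";
}
```

The design choice worth noting is that the default is rejection: a mnemonic that is missing from the table never reaches the constraint checks at all.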

Before the advent of AI, writing a parser for x86_64 assembly would have been such an annoying task that I might have never gotten around to implementing support for memory safe inline assembly other than the trivial kind (where the assembly is blank). But now, implementing a feature like this is as simple as writing a good prompt!

The next section has the original prompt that I used to start work on this feature. I fed it to my own private agent harness (called T800) running with Kimi K2.7-code.

Initial Agent Prompt

Let's add more support to Fil-C for safe, harmless inline assembly! Please read T800.txt, README.md, and https://fil-c.org/how to understand the context of what we're doing.

Fil-C currently rejects all inline assembly except for trivially safe stuff like:

asm volatile ("" : : : "memory")

Or even:

asm ("" : "+r"(x))

Basically, Fil-C accepts inline assembly if the assembly string is blank, and goes to great lengths to handle the case where the inline assembly snippet has a variable threaded through it. This kind of thing is very common, since it allows programmers to conceal data flow from the compiler to inhibit optimizations, which can be important for things like constant-time crypto.

Let's take this further to support cases where the assembly snippet is not empty, but is still harmless! Here are examples that should work:

__asm__ ("sarw $15,%0" : "+r"(crypto_int16_x) : : "cc");

Or:

asm volatile("cpuid\n\t" : "+a"(a), "=b"(b), "+c"(c), "=d"(d));

Or:

asm volatile("xgetbv\n\t" : "=a" (xcr0_lo), "=d" (xcr0_hi) : "c" (0));

These examples are safe because:

sarw, cpuid, and xgetbv have no meaningful side effects other than setting registers.
The operands to those instructions have no memory effects.
There are no explicit memory operands in the inline assembly.
At the LLVM IR level, constraints like "+r" involve threading the crypto_int16_x variable through the assembly invocation as data flow and this will not turn into a memory access unless the variable is spilled (which is fine - spills are totally legal in Fil-C, and the spills are in a part of the stack that Fil-C cannot get a pointer to).
The registers affected by those instructions are enumerated in the asm.
In the case of the sarw example, we are letting the compiler pick the register.
In the other two examples, the asm modifiers correctly list clobbers for all of the registers clobbered by the instruction.
There's no control flow out of the inline assembly. For example, there are no calls. Hence, we know completely what the inline assembly does.

Note that these three examples look like this in LLVM IR. The sarw one is:

%0 = call i32 asm "sarw $$15,$0", "=r,0,~{cc},~{dirflag},~{fpsr},~{flags}"(i32 %x) #3

The cpuid one is:

%0 = call { i32, i32, i32, i32 } asm sideeffect "cpuid\0A\09", "={ax},={bx},={cx},={dx},0,2,~{dirflag},~{fpsr},~{flags}"(i32 undef, i32 undef) #5

The xgetbv one is:

%0 = call { i32, i32 } asm sideeffect "xgetbv\0A\09", "={ax},={dx},{cx},~{dirflag},~{fpsr},~{flags}"(i32 0) #5

It would be great to support any inline assembly that meets these criteria. To do that, we need to integrate the following into llvm/lib/Transforms/Instrumentation/FilPizlonator.cpp's handleInlineAsm function:

an x86_64 AT&T syntax parser, augmented for the assembly syntax visible at the LLVM IR level.
- Note that this involves handling things like sarb, sarw, sarl, and sarq, which are all the same instruction but with different word sizes. And sar, where the word size has to be inferred from operands.
improvements to the assembly constraints parser.
a database of instructions that are acceptable (that have no effects beyond registers)
- and for those instructions that clobber specific registers without those registers being named explicitly, the database needs to know what those registers are. For example, it's got to know that cpuid clobbers ax/bx/cx/dx.
- this should include tracking whether instructions clobber cc and if they do, make sure that the assembly constraints also lists cc as clobbered.
comprehensive error checking that rejects:
- instructions that aren't allowlisted as safe
- assembly constraints that don't account for clobbers (for example if ={ax} wasn't part of the constraint when using cpuid)
- assembly constraints that take pointers and cause memory accesses to happen.
  - For this case, we could handle it eventually, by emitting a Fil-C check! But we should not implement this yet.

Make sure that if you reject inline assembly, then handleInlineAsm returns a nice Reason that explains why. You should reject InlineAsm that doesn't use the AT&T dialect.

Note that your assembly parser doesn't even have to know how to parse any assembly that isn't allowlisted. I think that means that you don't even have to implement parsing of memory operand syntax or any instruction mnemonic that's not in the allowlist!

For now, add support for:

sar, shr, and, shl, xor, mov, test, cmp, bsf (of any width or if implicit)
cmov (note this has many suffixes depending on what the condition is)
cpuid
xgetbv

For examples of assembly snippets that should work, take a look at projects/openssh-10.3p1/sntrup761.c. Note that this file currently has a #undef __GNUC__ to prevent the inline assembly from being used. Note also that this file has C implementations of all of the inline assembly.

So, I recommend creating a filc/tests test that has all of those inline assembly snippets and they are tested against their C equivalents for a variety of inputs. Also be sure to create lots of tests for each allowlisted instruction that check that we reject unsafe uses of inline assembly (memory operands etc). And create tests for instructions that are either obviously unsafe or not yet supported to make
