Zig's New BitCast Semantics and LLVM Back End Improvements
Author: Matthew Lugg
(Quite a long devlog coming up, apologies. I got a little carried away with this one!)
A few weeks ago, I began working on a branch implementing an improvement to the LLVM backend which had been planned for a long time. This ended up snowballing into a bigger change which implemented a few language proposals you might be interested to hear about.
LLVM Backend Integer Lowering
Zig has always lowered arbitrary bit-width integer types (e.g. u4, i13, u40) directly to LLVM IR's bit-int types (i4, i13, i40). However, we've known for a long time that this lowering is not optimal, because LLVM's documented semantics for representing these types in memory are unnecessarily restrictive to the optimizer.
Perhaps more importantly, because Clang never emits LLVM IR like this, these code paths in LLVM have never been properly tested, and so are poorly supported in practice. Over the past few years, we have observed many instances of trivial optimizations being missed and even straight-up miscompilations.
So, the original goal of the PR was to only use these bit-int types when manipulating values in SSA form, and to zero- or sign-extend them to ABI-sized types (i8, i16, i32, etc) when storing them in memory. This should be well-supported, not least because it matches how Clang lowers C's _BitInt(N)!
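To make the idea concrete, here is a minimal sketch that models this store/load pattern in Zig itself. The helpers `storeI13`, `loadI13`, and `storeU4` are hypothetical names invented for illustration; they are not compiler code and only mimic what the backend is described as emitting: extend to the ABI-sized type on store, truncate back on load.

```zig
const std = @import("std");

// Hypothetical helpers (not compiler code): they model storing and loading
// odd-width integers through ABI-sized memory slots.

fn storeI13(slot: *i16, value: i13) void {
    // Widening a signed integer sign-extends it, so every bit of the slot is defined.
    slot.* = value;
}

fn loadI13(slot: *const i16) i13 {
    // Truncate back down to the 13-bit value used in SSA form.
    return @truncate(slot.*);
}

fn storeU4(slot: *u8, value: u4) void {
    // Widening an unsigned integer zero-extends it.
    slot.* = value;
}

test "extended values round-trip through ABI-sized slots" {
    var signed_slot: i16 = 0;
    storeI13(&signed_slot, -1);
    try std.testing.expectEqual(@as(i16, -1), signed_slot);
    try std.testing.expectEqual(@as(i13, -1), loadI13(&signed_slot));

    var unsigned_slot: u8 = 0xFF;
    storeU4(&unsigned_slot, 0xA);
    try std.testing.expectEqual(@as(u8, 0x0A), unsigned_slot);
}
```

The point of the sketch is that the padding bits always hold a known value, rather than being left for the optimizer to reason about.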
That change was actually fairly straightforward, but I hit one issue which led me down a bit of a rabbit-hole.
The Problem with @bitCast
@bitCast is an interesting builtin. In the past, it was defined as being equivalent to the following sequence of operations:
- Take a pointer to the operand value
- Cast it to a pointer to the destination type
- Load from that pointer
In other words, it was essentially syntax sugar for reinterpreting bytes of memory. However, over time, we diverged from this definition. For instance, it became allowed to use @bitCast to reinterpret a [3]u8 as a u24, even though on most targets @sizeOf(u24) is greater than @sizeOf([3]u8), so the above definition would invoke Illegal Behavior.
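The size mismatch is easy to check. The following small test is only a sketch: it asserts what holds on every target and leaves the exact value of @sizeOf(u24) unasserted, since that depends on the target ABI.

```zig
const std = @import("std");

test "u24 and [3]u8 carry the same bits but may differ in byte size" {
    // Both types hold exactly 24 bits of information.
    try std.testing.expect(@bitSizeOf(u24) == 24);
    try std.testing.expect(@sizeOf([3]u8) == 3);

    // In memory, u24 is padded up to its ABI size, so a load of a u24 through
    // a pointer to a [3]u8 can read past the end of the array.
    try std.testing.expect(@sizeOf(u24) >= @sizeOf([3]u8));
}
```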
Up to now, the LLVM backend had implemented these underspecified semantics for the @bitCast builtin. However, because that definition involved reinterpreting memory, changing how we store integer types in memory ended up impacting the implementation of @bitCast, and introducing Illegal Behavior which led to crashes in the compiler test suite.
The easiest solution to this would probably have been to implement logic in the LLVM backend to approximately match the old behavior. I instead opted for a better solution: implement a new definition of @bitCast.
Redefining @bitCast
In 2024, Jacob Young wrote up language proposal #19755 which aimed to solve the problems with @bitCast by precisely specifying a new set of semantics for it. This proposal was accepted shortly after it was submitted, and in fact, the semantics it details are already implemented by the self-hosted x86_64 backend!
So to solve the LLVM backend's problems, I didn't necessarily need to match the old @bitCast semantics. Instead, this seemed like a good time to finally get the new semantics implemented everywhere.
As an aside, another benefit of doing this is that we could take advantage of the compiler's Legalize pass, which takes difficult-to-lower operations and rewrites them in terms of simpler operations, so that compiler backends only need to support those simple operations. Legalize already had functionality, used by the self-hosted x86_64 backend, which converted complex @bitCast operations into simpler ones, and it could be easily adapted to aid the other compiler backends too (mainly the LLVM and C backends), but only if they implemented the new semantics.
Regardless, the point is, I set out on a side quest (which ended up being harder than the original quest) to implement these new semantics throughout the compiler. This includes not only the LLVM and C backends, but also comptime execution. After all, Zig allows you to do almost any operation at comptime, @bitCast included!
Because the new semantics are meaningfully different from the old (more on this later), I also had to audit a lot of uses of @bitCast across the standard library, compiler, and supporting libraries (e.g. compiler_rt). But after a few mostly-painless fixes for CI failures, I was able to finally get my PR green, and landed it in master yesterday (closing a good few issues in the process!).
The New @bitCast Semantics
Now that we've gotten through all of the background, it's finally time for me to actually explain the new @bitCast behavior. Instead of being based on reinterpreting bytes in memory like before, the builtin is now defined in terms of the bits which logically represent a type.
Every type which supports @bitCast has a "logical bit layout": a representation of that type as an ordered sequence of bits. For instance, u5 is composed of 5 logical bits, which we order from least-significant to most-significant. [2]u5 is composed of 10 logical bits: the 5 from the first element, followed by the 5 from the second element.
The new definition of @bitCast is that it reinterprets the logical bits of one type as the logical bits of a different type.
The simplest example is to take an unsigned integer, say a u8, and convert it to a signed integer of the same size, in this case i8. This operation does exactly what you'd expect: the bits are unchanged, and we just reinterpret the most-significant bit as a sign bit.
Also unchanged are the semantics of @bitCast between an integer type and a packed struct / packed union type.
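The two unchanged cases above can be sketched as follows. The `Nibbles` type and the `asSigned` / `toByte` helpers are hypothetical names chosen for this example.

```zig
const std = @import("std");

// Hypothetical type for illustration. Fields of a packed struct are laid out
// starting from the least-significant bits of the backing integer.
const Nibbles = packed struct(u8) {
    low: u4,
    high: u4,
};

fn asSigned(x: u8) i8 {
    // Same 8 bits; the most-significant bit is now read as a sign bit.
    return @bitCast(x);
}

fn toByte(n: Nibbles) u8 {
    return @bitCast(n);
}

test "integer and packed struct bitcasts" {
    try std.testing.expectEqual(@as(i8, -1), asSigned(0xFF));
    try std.testing.expectEqual(@as(i8, -128), asSigned(0x80));
    try std.testing.expectEqual(@as(i8, 127), asSigned(0x7F));

    try std.testing.expectEqual(@as(u8, 0x21), toByte(.{ .low = 0x1, .high = 0x2 }));

    const back: Nibbles = @bitCast(@as(u8, 0xAB));
    try std.testing.expect(back.low == 0xB);
    try std.testing.expect(back.high == 0xA);
}
```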
The place where the new semantics differ from the old is when you get aggregate types (arrays and vectors) involved. Consider, for instance, bitcasting a [2]u8 to a u16. Under the old semantics, the result of this operation depended on the target's endianness: on big-endian targets, the first array element became the 8 most significant bits, whereas on little-endian targets, the first array element became the 8 least significant bits.
Under the new semantics, because we only care about logical bit representation (which is endian-agnostic), the operation behaves identically on every target: the first array element becomes the 8 least significant bits. As a general rule, the new semantics tend to match the behavior of the old semantics on little-endian targets.
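As a rough illustration, the result described for [2]u8 can be written out by hand. The helper `logicalBitsToU16` is a hypothetical name; it spells out the rule with shifts, and `std.mem.readInt` with `.little` gives the same answer. The final @bitCast assertion assumes either a compiler with the new semantics or a little-endian target.

```zig
const std = @import("std");

// Hypothetical helper: spells out the logical-bit rule for [2]u8 -> u16.
// The first element supplies the 8 least-significant bits.
fn logicalBitsToU16(arr: [2]u8) u16 {
    return @as(u16, arr[0]) | (@as(u16, arr[1]) << 8);
}

test "first array element becomes the low byte" {
    const arr: [2]u8 = .{ 0x34, 0x12 };

    try std.testing.expectEqual(@as(u16, 0x1234), logicalBitsToU16(arr));

    // Equivalent to an explicit little-endian read of the same bytes.
    try std.testing.expectEqual(@as(u16, 0x1234), std.mem.readInt(u16, &arr, .little));

    // Assumption: new @bitCast semantics (or a little-endian target under the old ones).
    const cast: u16 = @bitCast(arr);
    try std.testing.expectEqual(@as(u16, 0x1234), cast);
}
```

Code that relied on the old big-endian behavior would need an explicit `std.mem.readInt(..., .big)` or a byte swap instead.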
This definition also allows for some weirder operations, such as converting [2]u3 to @Vector(3, u2):
test "bitcast [2]u3 to @Vector(3, u2)" {
    const arr: [2]u3 = .{ 0b001, 0b011 };
    const vec: @Vector(3, u2) = @bitCast(arr);
    // Concatenate all bits of `arr` starting with the least-significant bit of `arr[0]` to find the
    // logical bit sequence, then read off 2-bit chunks from it to get the elements of the resulting
    // vector value `vec`.
    //
    //        arr[0]         arr[1]
    //        0b001          0b011
    //    -------------  -------------
    //    1    0    0    1    1    0
    //    --------  --------  --------
    //      0b01      0b10      0b01
    //     vec[0]    vec[1]    vec[2]
    try expect(vec[0] == 0b01);
    try expect(vec[1] == 0b10);
    try expect(vec[2] == 0b01);
}

const expect = @import("std").testing.expect;
This kind of operation isn't very useful most of the time, but it's there if you need it! For instance, perhaps you want to deconstruct an integer into a vector of individual bits to operate on; that can now be done by a @bitCast to @Vector(n, u1).
While doing all of this stuff, I also implemented a couple of smaller accepted proposals. I won't detail them here, but the corresponding issues are worth a look if you're interested.
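A minimal sketch of that bit-vector use case, assuming a compiler with the new semantics; `toBits` is a hypothetical helper name. Element 0 of the vector is the least-significant bit of the integer.

```zig
const std = @import("std");

// Hypothetical helper: splits a byte into its 8 logical bits.
fn toBits(byte: u8) @Vector(8, u1) {
    return @bitCast(byte);
}

test "deconstruct a u8 into individual bits" {
    const bits = toBits(0b0000_0101);

    // Index 0 is the least-significant bit.
    try std.testing.expect(bits[0] == 1);
    try std.testing.expect(bits[1] == 0);
    try std.testing.expect(bits[2] == 1);
    try std.testing.expect(bits[7] == 0);

    // Casting back reassembles the original integer.
    const back: u8 = @bitCast(bits);
    try std.testing.expectEqual(@as(u8, 0b0000_0101), back);
}
```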
Of course, all of these changed semantics will be explained in the 0.17.0 release notes (hopefully a bit more concisely than what I managed here!), and suggested migration steps outlined.
LLVM Backend Performance
On a final note, I just wanted to mention that the original motivation for this branch (changing how the LLVM backend lowers non-ABI integer types) was demonstrably successful at restoring missed optimizations. In fact, the Zig compiler itself, despite not making heavy use of arbitrary bit-width integers internally, saw around 5% performance improvements from the better optimization. This means you might have some minor runtime performance gains to look forward to in 0.17.0!
Thanks for reading, I hope this was interesting to some of you. Happy hacking!
ELF Linker Improvements
Author: Matthew Lugg
I've spent the past few weeks working on our new ELF linker which debuted in Zig 0.16.0. At the time of the 0.16.0 release, this linker implementation was in its fairly early stages, and only really supported linking Zig-only code without any external libraries (even libc). That is why it was, and still is, disabled by default; it can be enabled with -fnew-linker.
However, quite a lot of progress has been made since that initial release! Here's a nice milestone: as of my latest PR, the new ELF linker is capable of building the self-hosted Zig compiler with LLVM and LLD libraries enabled, a task which requires quite a few features under the hood.
[mlugg@nebula master]$ # Build the Zig compiler using the new linker:
[mlugg@nebula master]$ zig build -Dno-lib -Dnew-linker -Denable-llvm
[mlugg@nebula master]$ # Use that compiler to build something with LLVM and LLD:
[mlugg@nebula master]$ ./zig-out/bin/zig build-exe ~/hello.zig -fllvm -flld
[mlugg@nebula master]$ ./hello
Hello, World!
[mlugg@nebula master]$
Of course, an ELF linker isn't necessarily the most exciting thing in the world, which is why the headline feature of this new linker is its support for fast incremental compilation. After the recent enhancements, it is now possible (on x86_64 Linux) to perform incremental rebuilds while linking external libraries, C sources, etc., without any additional performance overhead!
Here's a clip of me trying it out on Andrew's Tetris clone:
Oh, and fast incremental rebuilds also work nicely on the Zig compiler itself:
[mlugg@nebula master]$ zig build -Dno-lib -Denable-llvm -fincremental --watch
Build Summary: 4/4 steps succeeded
install success
└─ install zig success
└─ compile exe zig Debug native success 36s
Build Summary: 4/4 steps succeeded
install success
└─ install zig success
└─ compile exe zig Debug native success 244ms
Build Summary: 4/4 steps succeeded
install success
└─ install zig success
└─ compile exe zig Debug native success 228ms
Build Summary: 4/4 steps succeeded
install success
└─ install zig success
└─ compile exe zig Debug native success 288ms
Build Summary: 4/4 steps succeeded
install success
└─ install zig success
└─ compile exe zig Debug native success 283ms
The biggest missing feature of this linker implementation right now is that it does not yet support generating DWARF debug information for Zig code; that's definitely my next priority. But even without that support, it's amazing just how useful instant rebuilds can be, for example in any situation where you're doing a lot of print debugging.
If you're using the master branch of Zig and you're on x86_64 Linux, consider trying out incremental compilation with the new ELF linker if it previously wasn't working with your project! I expect many codebases to already work great with it, unlocking the ability to rebuild your project in milliseconds. Of course, if you come across any bugs, please do open an issue.
And if you're currently sticking to tagged releases of Zig, don't worry. As Andrew mentioned in his last devlog, Zig 0.17.0 is just around the corner, so it won't be long before you can try this too!
Build System Reworked
Author: Andrew Kelley
Big branch just landed: separate the maker process from the configurer process
This devlog entry is essentially a preview of the upcoming release notes, but serves as advance notice to those who want to help test out the new features and provide feedback that will guide the Zig project moving forward.
Before, build.zig files plus the build system implementation were all compiled into one bloated process, in Debug mode. After build.zig logic finished constructing a build graph in memory, the "build runner" code executed it.
Now, build.zig files are compiled into a small process (the "configurer") in Debug mode. After this logic finishes constructing a build graph in memory, it is serialized to a binary configuration file. The parent zig build process is aware of this file and caches it for next time. While waiting for all that, it asynchronously compiles the build graph execution process (the "maker") in release mode. Once the configuration file is available and the maker process is finished compiling, the maker process is executed, passing it the configuration file.
The maker process only needs to be compiled once per Zig version thanks to the global cache. The maker process then executes the build graph, which is contained within the serialized configuration file.
The primary motivation of this change was to make zig build faster, in three ways:
1. Only the user's build.zig logic will be compiled with each change, rather than the entire build system along with it. This is starting to become more valuable now that we have introduced --watch, --fuzz and --webui. The build system can grow more features without making zig build take longer.
2. Now the build system can skip rerunning the build.zig logic entirely when it knows nothing will change. For example, if you add -freference-trace to your zig build command line, it now avoids re-running your build.zig logic redundantly, using the same configuration as last time.
3. Now the process that actually executes the build graph is compiled with optimizations enabled.
To demonstrate points 2 and 3, here is the difference between running zig build --help before and after:
Benchmark 1 (34 runs): master/zig build -h
measurement mean ± σ min … max outliers delta
wall_time 150ms ± 5.52ms 145ms …