Implementing a Struct of Arrays
Recently, I watched Andrew Kelley’s talk on Practical Data Oriented Design. It goes into some of the architectural changes he’s been making to the Zig compiler, with pretty significant performance benefits. Would definitely recommend checking out the talk, even if you’re like me and have never written any Zig.
About halfway through the talk, he shows a way to reduce memory usage by eliminating wasted space. He turns this structure:
const Monster = struct {
anim : *Animation,
kind : Kind,
const Kind = enum {
snake, bat, wolf, dingo, human
};
};
var monsters : ArrayList(Monster) = .{};
into this one:
var monsters : MultiArrayList(Monster) = .{};
ArrayList(Monster) is what we could call std::vector<Monster>, and MultiArrayList(Monster) now stores the anims and kinds in two separate arrays, instead of one. That is, a struct of arrays instead of an array of structs. But it’s a tiny code change.
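To make the memory argument concrete, here is a small C++ sketch of the same Monster type. The names are my own stand-ins for the Zig declarations (I'm assuming a one-byte enum), and the helper functions are purely illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ stand-ins for the Zig types shown in the talk.
struct Animation;
enum class Kind : std::uint8_t { snake, bat, wolf, dingo, human };

struct Monster {
    Animation* anim;
    Kind kind;
};

// Bytes per element when whole Monster objects are stored contiguously.
constexpr std::size_t aos_bytes_per_monster() {
    return sizeof(Monster);  // includes the tail padding after `kind`
}

// Bytes per element when `anim` and `kind` live in two separate arrays.
constexpr std::size_t soa_bytes_per_monster() {
    return sizeof(Animation*) + sizeof(Kind);
}

// Padding that the struct-of-arrays layout avoids for n monsters.
constexpr std::size_t bytes_saved(std::size_t n) {
    assert(aos_bytes_per_monster() >= soa_bytes_per_monster());
    return n * (aos_bytes_per_monster() - soa_bytes_per_monster());
}
```

On a typical 64-bit target that should come out to 16 bytes per monster for the array of structs against 9 for the struct of arrays, but the exact numbers depend on the platform.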
One of the interesting things about Zig to me is that types are first class. Rather than having a class template that takes a template type parameter (like std::vector taking T), you write a function that takes a function parameter that is a type. That function then returns a type. The implementation of MultiArrayList is literally
pub fn MultiArrayList(comptime T: type) type {
return struct {
// lots of code
};
}
The goal of this blog post is to implement the same thing using C++26 Reflection. We’re going to write a SoaVector<T> that, instead of being a dynamic array of Ts, has one dynamic array for each non-static data member of T.
We Start with Storage
For the purposes of this post, we’re going to pick a simple type that has two members of different types. Let’s say… a chess coordinate:
struct Point {
char x;
int y;
};
If we were implementing a simple Vector<Point>
our storage would look like
struct {
Point* data;
size_t size;
size_t capacity;
};
But we’re writing an SoaVector<Point>, which means we want to store the xs and ys separately. Now, we could be lazy and do this:
struct {
std::vector<char> x;
std::vector<int> y;
};
That would meet the requirements, but it’s not a great approach. These two vectors always have the same size and capacity, so there is no reason to track them independently. It’s not like I’m trying to produce some optimal, production-ready structure in this blog… but let’s not prematurely pessimize either.
Instead, we want to do this:
struct {
// a pointer for each non-static data member
char* x;
int* y;
// and then a size/capacity that apply to all of them
size_t size;
size_t capacity;
};
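To put a rough number on the difference between the two layouts, here is a sketch that compares their header sizes. The struct names LazyStorage and CompactStorage are mine, not part of the implementation, and the result depends on how your standard library lays out std::vector:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The "lazy" layout: each vector tracks its own size and capacity.
struct LazyStorage {
    std::vector<char> x;
    std::vector<int> y;
};

// The layout this post builds: one pointer per member, shared bookkeeping.
struct CompactStorage {
    char* x = nullptr;
    int* y = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;
};

// Bytes of container header saved by sharing size and capacity.
constexpr std::size_t header_bytes_saved() {
    assert(sizeof(LazyStorage) >= sizeof(CompactStorage));
    return sizeof(LazyStorage) - sizeof(CompactStorage);
}
```

With the usual three-word std::vector, each extra member costs three words in the lazy layout and only one pointer in the compact one, so the gap widens as the struct gets more members.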
C++26 Reflection doesn’t have a lot on offer when it comes to code generation, but it does have the tools for this. There is a function std::meta::define_aggregate() which lets us… well… define an aggregate, by providing it with the data members we want to generate.
That, coupled with the ability to query data members, is all we need to begin:
template <class T>
struct SoaVector {
struct Pointers;
consteval {
define_aggregate(^^Pointers,
nsdms(^^T)
| std::views::transform([](std::meta::info member){
return data_member_spec(add_pointer(type_of(member)),
{.name = identifier_of(member)});
}));
}
Pointers pointers_ = {};
size_t size_ = 0;
size_t capacity_ = 0;
};
Here, nsdms
is a convenience helper because the real API is a mouthful:
consteval auto nsdms(std::meta::info type) -> std::vector<std::meta::info> {
return nonstatic_data_members_of(type, std::meta::access_context::current());
}
And then, for each non-static data member M mem of T, we create a data_member_spec whose type is M* and whose name is mem. I’m choosing to keep the size_ and capacity_ members separate for three reasons: it’s simpler (why code-gen members I know I need?), it makes for nice symmetry (SoaVector<T>::Pointers and T have the same number of members), and perhaps most importantly it means that I don’t have to worry about the names of any of T’s members potentially clashing with size_ and capacity_.
We’re off to a nice start.
Let’s Add Some Elements
Of course our storage isn’t particularly interesting just sitting there. The next thing to implement is push_back
. The basic contours of push_back
for our SoaVector
are the same as for a regular Vector
, so we start with the skeleton:
auto push_back(T const& value) -> void {
if (size_ == capacity_) {
grow(/* some new capacity */);
}
// add this element
++size_;
}
where
auto grow(size_t new_capacity) -> void {
// 1. allocate new storage
// 2. copy/move into the new storage
// 3. deallocate the old storage
}
Now, both for simplicity and to limit our focus, I’m not going to worry about things like exception safety, and we’re just going to copy elements.
We’ll start with grow first, since it’s simpler. And because I’m not worrying about exceptions, we can actually do that sequence of steps for each non-static data member in order:
auto grow(size_t new_capacity) -> void {
Pointers new_pointers = {};
template for (constexpr auto M : /* ??? */) {
// 1. allocate new storage
// 2. copy into the new storage
// 3. deallocate the old storage
}
pointers_ = new_pointers;
capacity_ = new_capacity;
}
Unfortunately, one of the C++26 limitations is that we can’t just do nsdms(^^Pointers) in the expansion statement. That requires non-transient allocation (the std::vector it returns would have to outlive constant evaluation), which we don’t have. Thankfully, we have a library solution for this in the form of std::define_static_array(). That function creates a static storage array with the contents you pass into it and returns a std::span<T const> into those contents. That std::span, importantly, can be used as a constexpr variable (it just points into static storage)! This is something that’s going to come up repeatedly, so we’ll store that in the class itself:
template <class T>
struct SoaVector {
struct Pointers;
consteval { /* ... */ }
Pointers pointers_ = {};
size_t size_ = 0;
size_t capacity_ = 0;
static constexpr auto mems = define_static_array(nsdms(^^T));
static constexpr auto ptr_mems = define_static_array(nsdms(^^Pointers));
};
Which will allow us to implement grow
(with a couple helper functions for convenience):
auto grow(size_t new_capacity) -> void {
Pointers new_pointers = {};
template for (constexpr auto M : ptr_mems) {
// 1. allocate
new_pointers.[:M:] = allocate<[:remove_pointer(type_of(M)):]>(
new_capacity);
// 2. copy
std::uninitialized_copy_n(pointers_.[:M:], size_, new_pointers.[:M:]);
// 3. deallocate
delete_range(pointers_.[:M:]);
}
pointers_ = new_pointers;
capacity_ = new_capacity;
}
template <class U>
auto allocate(size_t cap) -> U* {
return std::allocator<U>().allocate(cap);
}
template <class U>
auto delete_range(U* ptr) -> void {
std::destroy(ptr, ptr + size_);
std::allocator<U>().deallocate(ptr, capacity_);
}
Whenever we allocate memory, we have to remember to clean it up. We’re doing that to the old storage in grow
and we have to do it in our destructor too:
~SoaVector() {
template for (constexpr auto M : ptr_mems) {
delete_range(pointers_.[:M:]);
}
}
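If the splices are hard to read, it may help to see roughly what grow and the destructor expand to for Point. This is a hand-written sketch in plain C++20, not generated code: PointColumns, relocate and release are names I made up, and I added a null check before deallocating that the version above doesn't have.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

struct Point {
    char x;
    int y;
};

// Hand-expanded stand-in for SoaVector<Point>.
class PointColumns {
public:
    PointColumns() = default;
    PointColumns(PointColumns const&) = delete;
    PointColumns& operator=(PointColumns const&) = delete;

    ~PointColumns() {
        release(x_);
        release(y_);
    }

    void grow(std::size_t new_capacity) {
        assert(new_capacity >= size_);
        // One allocate/copy/deallocate round per member, in order.
        char* new_x = relocate(x_, new_capacity);
        int* new_y = relocate(y_, new_capacity);
        x_ = new_x;
        y_ = new_y;
        capacity_ = new_capacity;  // updated last: release() needs the old value
    }

    void push_back(Point const& p) {
        if (size_ == capacity_) {
            grow(std::max(3 * size_ / 2, size_ + 2));
        }
        ::new (static_cast<void*>(x_ + size_)) char(p.x);
        ::new (static_cast<void*>(y_ + size_)) int(p.y);
        ++size_;
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    char x_at(std::size_t i) const { assert(i < size_); return x_[i]; }
    int y_at(std::size_t i) const { assert(i < size_); return y_[i]; }

private:
    template <class U>
    U* relocate(U* old_ptr, std::size_t new_capacity) {
        U* new_ptr = std::allocator<U>().allocate(new_capacity);
        std::uninitialized_copy_n(old_ptr, size_, new_ptr);
        release(old_ptr);
        return new_ptr;
    }

    template <class U>
    void release(U* ptr) {
        if (ptr == nullptr) return;  // nothing was ever allocated
        std::destroy(ptr, ptr + size_);
        std::allocator<U>().deallocate(ptr, capacity_);
    }

    char* x_ = nullptr;
    int* y_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

The two relocate calls are exactly what the template for loop over ptr_mems stamps out, once per member.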
Now that we have our memory (and are properly cleaning it up too), let’s go back to push_back. What we need to do is take a T and write each member of that T into the corresponding array. Reading the source member requires looking at a non-static data member of T, while writing the destination member requires looking at a non-static data member of Pointers. We could either loop over the number of members, or we could loop over a zip of the two sets of members.
I’m going to do the former since clang doesn’t yet implement constexpr
structured bindings:
auto push_back(T const& value) -> void {
if (size_ == capacity_) {
// some exponential growth
grow(std::max(3 * size_ / 2, size_ + 2));
}
template for (constexpr auto I : std::views::iota(0zu, mems.size())) {
constexpr auto from = mems[I];
constexpr auto to = ptr_mems[I];
using M = [: type_of(from) :];
::new (pointers_.[: to :] + size_) M(value.[:from:]);
}
++size_;
}
That would actually be simpler if I used std::construct_at, since I wouldn’t need to determine M. But in general I prefer placement new (especially since that’ll be constexpr in C++26 too) since it can do all kinds of initialization.
So far so good. If I just make everything public
for easy debugging:
struct Point {
char x;
int y;
};
int main() {
SoaVector<Point> v;
v.push_back(Point{.x='e', .y=4});
v.push_back(Point{.x='f', .y=7});
std::println("x={}", std::span(v.pointers_.x, v.size_)); // x=['e', 'f']
std::println("y={}", std::span(v.pointers_.y, v.size_)); // y=[4, 7]
}
Reading Those Elements
Now indexing is where things get really interesting. Because what do we return? For the purposes of this blog, we’re going to do things two different ways:
- the const indexing operator is just going to return a Point, by value.
- the mutable indexing operator is going to return a view into a Point: a new PointRef type.
What I mean by PointRef is:
struct PointRef {
char& x;
int& y;
auto operator=(Point const&) const -> void; // assigns through
};
The point (sorry) here isn’t to argue that this is the absolutely correct way to implement SoaVector<T>. Maybe you think the const indexing operator should return a version of PointRef that has const&s. Maybe you think there shouldn’t even be an indexing operator. I don’t know what the right answer is. But doing it this way should show how it’s possible to do whatever it is you want to do.
But before we go further, let’s add some more debuggability to this project (copying from this earlier post):
struct [[=derive<Debug>]] Point {
char x;
int y;
};
Now we can actually print our Point
s. Much better!
Indexing into a Value
The first thing we’ll do is write
auto operator[](size_t idx) const -> T;
That’ll get us to the point where we can both push_back
Point
s into our SoaVector
and then read Point
s back out successfully. That’s really the bare minimum to be able to claim that we’ve actually implemented a struct-of-arrays vector.
Now, so far, we’ve seen several examples where we need to iterate one member at a time. We allocated/deallocated one member at a time, we wrote one member at a time. But reading we can’t really do one member at a time. Well, we could — emitting the equivalent of:
auto operator[](size_t idx) const -> Point {
Point p;
p.x = pointers_.x[idx];
p.y = pointers_.y[idx];
return p;
}
And for this Point
type, that’s perfectly fine. But let’s try for something better. We want to emit:
auto operator[](size_t idx) const -> Point {
return Point{pointers_.x[idx], pointers_.y[idx]};
}
The only way in C++26 to do this is to expand a pack. We can use the index_sequence trick as usual. Or we can do something reflection-specific. Let’s just do the latter, for the sake of doing the latter:
auto operator[](size_t idx) const -> T {
return [: expand_all(ptr_mems) :] >> [this, idx]<auto... M>{
return T{pointers_.[:M:][idx]...};
};
}
And with that, we have both push_back
and operator[]
working:
struct [[=derive<Debug>]] Point {
char x;
int y;
};
int main() {
SoaVector<Point> v;
v.push_back(Point{.x='e', .y=4});
v.push_back(Point{.x='f', .y=7});
std::println("v[0]={}", v[0]); // v[0]=Point{.x='e', .y=4}
std::println("v[1]={}", v[1]); // v[1]=Point{.x='f', .y=7}
}
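For readers who haven't seen it, the index_sequence trick mentioned above looks roughly like this without reflection. I'm using a std::tuple of pointers as a stand-in for the generated Pointers struct, and PointPointers, read_point and read_point_impl are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

struct Point {
    char x;
    int y;
};

// Stand-in for the generated Pointers struct: one pointer per member of
// Point, in declaration order.
using PointPointers = std::tuple<char const*, int const*>;

template <std::size_t... I>
Point read_point_impl(PointPointers const& ptrs, std::size_t idx,
                      std::index_sequence<I...>) {
    // Expands to Point{std::get<0>(ptrs)[idx], std::get<1>(ptrs)[idx]}.
    return Point{std::get<I>(ptrs)[idx]...};
}

inline Point read_point(PointPointers const& ptrs, std::size_t idx) {
    assert(std::get<0>(ptrs) != nullptr && std::get<1>(ptrs) != nullptr);
    return read_point_impl(
        ptrs, idx,
        std::make_index_sequence<std::tuple_size_v<PointPointers>>{});
}
```

The extra helper function exists only to introduce the pack I, which is the part the reflection version gets to skip.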
Indexing into a Reference
Let’s take the next step. We don’t want to just read v[0]
, we want to be able to write into it. We want to make v[0] = Point{.x='a', .y=8}
work. How do we do it?
To start with, we need to generate a new type. But now, we don’t just want to emit:
struct PointRef {
char& x;
int& y;
};
We also wanted an assignment operator and a conversion operator. std::meta::define_aggregate() doesn’t have the ability to generate member functions, only non-static data members. But that’s no matter: we can generate those data members and then add the member functions in a derived class:
template <class T>
struct SoaVector {
private:
struct Pointers;
struct RefBase;
consteval {
define_aggregate(^^Pointers,
transform_members(^^T, std::meta::add_pointer));
define_aggregate(^^RefBase,
transform_members(^^T, std::meta::add_lvalue_reference));
}
Pointers pointers_ = {};
size_t size_ = 0;
size_t capacity_ = 0;
static constexpr auto mems = define_static_array(nsdms(^^T));
static constexpr auto ptr_mems = define_static_array(nsdms(^^Pointers));
static constexpr auto ref_mems = define_static_array(nsdms(^^RefBase));
struct Ref : RefBase {
auto operator=(T const& value) const -> void;
};
};
The logic for the assignment operator is the same as what we saw in push_back
, except we’re writing through reference members instead of indexed pointer ones:
struct Ref : RefBase {
auto operator=(T const& value) const -> void {
template for (constexpr auto I : std::views::iota(0zu, mems.size())) {
this->[:ref_mems[I]:] = value.[:mems[I]:];
}
}
};
And that’s… basically it. The index operator that returns a Ref
looks nearly the same as the one that returns the T
, we’re just initializing a different thing:
auto operator[](size_t idx) -> Ref {
return [: expand_all(ptr_mems) :] >> [this, idx]<auto... M>{
return Ref{pointers_.[:M:][idx]...};
};
}
auto operator[](size_t idx) const -> T {
return [: expand_all(ptr_mems) :] >> [this, idx]<auto... M>{
return T{pointers_.[:M:][idx]...};
};
}
Which gives us:
struct [[=derive<Debug>]] Point {
char x;
int y;
};
int main() {
SoaVector<Point> v;
v.push_back(Point{.x='e', .y=4});
v.push_back(Point{.x='f', .y=7});
v[0] = Point{.x='a', .y=8};
std::println("v[0]={}", std::as_const(v)[0]); // v[0]=Point{.x='a', .y=8}
std::println("v[1]={}", std::as_const(v)[1]); // v[1]=Point{.x='f', .y=7}
}
Which is pretty sweet.
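Stripped of the reflection, the proxy for Point boils down to something like the following hand-written sketch. Here ref_at is a hypothetical free function standing in for the mutable operator[]:

```cpp
#include <cassert>
#include <cstddef>

struct Point {
    char x;
    int y;
};

// Hand-written equivalent of Ref for T = Point.
struct PointRef {
    char& x;
    int& y;

    // const-qualified on purpose: assignment changes the elements that the
    // proxy refers to, never the proxy itself.
    auto operator=(Point const& value) const -> void {
        x = value.x;
        y = value.y;
    }
};

// Builds a proxy for element idx of two parallel arrays.
inline PointRef ref_at(char* xs, int* ys, std::size_t idx) {
    assert(xs != nullptr && ys != nullptr);
    return PointRef{xs[idx], ys[idx]};
}
```

Because operator= is const, it can be called on the temporary that ref_at returns, which is what makes v[0] = Point{...} work.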
Formatting the Reference
Well, okay. It feels a bit incomplete, right? We should be able to just print v[0] and not have to print std::as_const(v)[0]! Thankfully, annotations help us out there too. We just have to use them:
template <class T>
struct SoaVector {
private:
struct Pointers;
struct [[=derive<Debug>]] RefBase;
consteval { /* ... */ }
Pointers pointers_ = {};
size_t size_ = 0;
size_t capacity_ = 0;
static constexpr auto mems = define_static_array(nsdms(^^T));
static constexpr auto ptr_mems = define_static_array(nsdms(^^Pointers));
static constexpr auto ref_mems = define_static_array(nsdms(^^RefBase));
struct [[=derive<Debug>]] Ref : RefBase {
// ...
};
};
Annotations are so cool.
Anyway, that’s great, since it lets us just print v[0]
instead of std::as_const(v)[0]
. But it doesn’t quite print it the way I’d like:
v[0]=Ref{RefBase{.x='a', .y=8}}
v[1]=Ref{RefBase{.x='f', .y=7}}
What if we just, at the point of formatting, forced the conversion to Point
? That requires two things. First, we need to add such a conversion. That’s no problem, we’ve already done that twice:
struct [[=derive<Debug>]] Ref : RefBase {
auto operator=(T const& value) const -> void {
template for (constexpr auto I : std::views::iota(0zu, mems.size())) {
this->[:ref_mems[I]:] = value.[:mems[I]:];
}
}
operator T() const {
return [: expand_all(ref_mems) :] >> [this]<auto... M>{
return T{this->[:M:]...};
};
}
};
And then we add some more functionality to our little formatting annotation library. Right now we just have derive<Debug>
. We could add some more information there — add which type we want to format as. Let’s do something like this. We’ll add a new annotation type:
struct format_as { std::meta::info type; };
Which we’ll add to Ref
(and then remove the derive<Debug>
from RefBase
as no longer necessary):
template <class T>
struct SoaVector {
private:
struct Pointers;
struct RefBase;
consteval { /* ... */ }
Pointers pointers_ = {};
size_t size_ = 0;
size_t capacity_ = 0;
// ...
struct [[=derive<Debug>, =format_as{^^T}]] Ref : RefBase {
// ...
};
};
Now we have to teach our annotation-based formatter to look for this other annotation. First, we determine what type we’re formatting as. That’s the format_as
type, if present:
consteval auto format_type(std::meta::info T) -> std::meta::info {
auto as = annotations_of(T, ^^format_as);
if (not as.empty()) {
return extract<format_as>(as[0]).type;
} else {
return T;
}
}
If we find an annotation of type format_as
, as[0]
will still be of type std::meta::info
— so we need extract<format_as>()
to actually pull out a format_as
value.
One of the advantages of the value-based reflection model is that a lot of Reflection functions are just that — functions.
And once we have this function, the simplest change we can make is to change our formatter to inherit from a different class template:
template <class T>
struct derive_formatter {
constexpr auto parse(auto& ctx) { return ctx.begin(); }
auto format(T const&, auto& ctx) const {
// all as before
}
};
template <class T> requires (has_annotation(^^T, derive<Debug>))
struct std::formatter<T>
: derive_formatter<[: format_type(^^T) :]>
{ };
And we’re done. std::formatter<Point>
and std::formatter<SoaVector<Point>::Ref>
both inherit from derive_formatter<Point>
, whose format()
takes a Point const&
. For Ref
, that implicit conversion just happens on the way in.
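The same "conversion on the way in" can be shown without any formatting machinery. In this sketch, describe is a made-up function that plays the role of derive_formatter<Point>::format, and it only knows about Point:

```cpp
#include <cassert>
#include <string>

struct Point {
    char x;
    int y;
};

// Hand-written proxy with the conversion operator from the Ref type above.
struct PointRef {
    char& x;
    int& y;

    operator Point() const { return Point{x, y}; }
};

// Stand-in for the formatter: takes a Point const& and nothing else.
inline std::string describe(Point const& p) {
    return std::string("Point{.x='") + p.x + "', .y=" +
           std::to_string(p.y) + "}";
}

// Example use: a proxy and the value it refers to describe identically.
inline void describe_demo() {
    char x = 'a';
    int y = 8;
    PointRef r{x, y};
    // r is converted to a temporary Point before describe() runs.
    assert(describe(r) == describe(Point{'a', 8}));
}
```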
A Working Implementation
And with all that said and done, let’s look at this program:
struct [[=derive<Debug>]] Point {
char x;
int y;
};
int main() {
SoaVector<Point> v;
v.push_back(Point{.x='e', .y=4});
v.push_back(Point{.x='f', .y=7});
v[0] = Point{.x='a', .y=8};
std::println("v[0]={}", v[0]); // v[0]=Point{.x='a', .y=8}
std::println("v[1]={}", v[1]); // v[1]=Point{.x='f', .y=7}
}
We’re taking an arbitrary type, creating a struct of arrays with one array per member, supporting pushing elements into it (handling the piece-wise split ourselves), reading elements out of it (ditto), and even supporting proxy references that print like the original type.
The complete implementation of SoaVector here was fewer than 100 lines of code, plus a few other short helpers. Just pasting the actual implementation:
template <class T>
struct SoaVector {
private:
struct Pointers;
struct RefBase;
consteval {
define_aggregate(
^^Pointers,
transform_members(^^T, std::meta::add_pointer));
define_aggregate(
^^RefBase,
transform_members(^^T, std::meta::add_lvalue_reference));
}
Pointers pointers_ = {};
size_t size_ = 0;
size_t capacity_ = 0;
static constexpr auto mems = define_static_array(nsdms(^^T));
static constexpr auto ptr_mems = define_static_array(nsdms(^^Pointers));
static constexpr auto ref_mems = define_static_array(nsdms(^^RefBase));
struct [[=derive<Debug>, =format_as{^^T}]] Ref : RefBase {
auto operator=(T const& value) const -> void {
template for (constexpr auto I :
std::views::iota(0zu, mems.size())) {
this->[:ref_mems[I]:] = value.[:mems[I]:];
}
}
operator T() const {
return [: expand_all(ref_mems) :] >> [this]<auto... M>{
return T{this->[:M:]...};
};
}
};
auto grow(size_t new_capacity) -> void {
Pointers new_pointers = {};
template for (constexpr auto M : ptr_mems) {
new_pointers.[:M:] = alloc<[:remove_pointer(type_of(M)):]>(
new_capacity);
std::uninitialized_copy_n(pointers_.[:M:],
size_,
new_pointers.[:M:]);
delete_range(pointers_.[:M:]);
}
pointers_ = new_pointers;
capacity_ = new_capacity;
}
template <class U>
auto alloc(size_t cap) -> U* {
return std::allocator<U>().allocate(cap);
}
template <class U>
auto delete_range(U* ptr) -> void {
std::destroy(ptr, ptr + size_);
std::allocator<U>().deallocate(ptr, capacity_);
}
public:
SoaVector() = default;
~SoaVector() {
template for (constexpr auto M : ptr_mems) {
delete_range(pointers_.[:M:]);
}
}
auto push_back(T const& value) -> void {
if (size_ == capacity_) {
grow(std::max(3 * size_ / 2, size_ + 2));
}
template for (constexpr auto I :
std::views::iota(0zu, mems.size())) {
constexpr auto from = mems[I];
constexpr auto to = ptr_mems[I];
using M = [: type_of(from) :];
::new (pointers_.[: to :] + size_) M(value.[:from:]);
}
++size_;
}
auto operator[](size_t idx) -> Ref {
return [: expand_all(ptr_mems) :] >> [this, idx]<auto... M>{
return Ref{pointers_.[:M:][idx]...};
};
}
auto operator[](size_t idx) const -> T {
return [: expand_all(ptr_mems) :] >> [this, idx]<auto... M>{
return T{pointers_.[:M:][idx]...};
};
}
};
Of course this is still just push_back and two overloads of operator[]; I didn’t even add iterator support or any of the other functions you’d probably want. But this was basically already the hard part. Once we can do this, we’ve demonstrated that we can do everything.
For instance, you might want to be able to have v.fields().x give you a std::span<char> (or std::span<char const>): just the xs. I’d manually done that earlier just to be able to test the implementation, but that’s actually a useful thing to want. Doing so is just another round of generating a type and then populating it.
Needless to say, I’m very excited about Reflection.
Comparison with Zig
I’ve written in this blog about Rust a number of times. And while I don’t know Rust very well at all, I have at least read the Rust Programming Language, written a number of small Rust programs, read blogs, watched talks, and talked to Rust people about Rust things.
I can’t say any of that about Zig. The only Zig talk I’ve ever watched is the one I linked to above (and that one arguably is less a Zig talk and more a “Practical Data-Oriented Design” talk that is somewhat language-agnostic). I haven’t read any Zig books or blogs, written any programs, nothing like that. I have browsed through the docs. Sort of. So I am not exactly in a great position to give a proper comparison to the Zig implementation here. I apologize in advance for all the egregious errors.
The Zig implementation of MultiArrayList does an extra optimization that I didn’t think about: it does a single allocation and then chunks that allocation up. Its layout is simply:
pub fn MultiArrayList(comptime T: type) type {
return struct {
bytes: [*]align(@alignOf(T)) u8 = undefined,
len: usize = 0,
capacity: usize = 0,
// ...
};
}
Coming from C/C++, Zig declaration syntax is a bit jarring, but it has a lot going for it. Declarations always read in one direction; there’s no spiraling. Here, bytes is a many-item pointer ([*]) to an unknown number of suitably-aligned (@alignOf(T)) u8, initialized to undefined. Zig also differentiates between a many-item pointer ([*]T) and a single-item pointer (*T).
Now, whereas I created the struct Pointers
(which had a char*
and an int*
for the two fields), the Zig implementation doesn’t do that. Instead, when it chunks up its single allocation, it still keeps everything in u8
space:
pub const Slice = struct {
ptrs: [fields.len][*]u8,
len: usize,
capacity: usize,
};
ptrs is an array of many-item pointers to u8, one pointer for each field.
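To get a feel for the single-allocation idea, here is how I might sketch it in C++ for Point. This is not a transcription of the Zig code. Putting the int column first so that the char column needs no padding is my own assumption, and ColumnLayout, layout_for and PointBuffer are invented names:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Rounds n up to the next multiple of alignment.
constexpr std::size_t align_up(std::size_t n, std::size_t alignment) {
    assert(alignment != 0);
    return (n + alignment - 1) / alignment * alignment;
}

// Byte offsets of each column inside one shared buffer.
struct ColumnLayout {
    std::size_t x_offset;
    std::size_t y_offset;
    std::size_t total_bytes;
};

// Assumption: the more strictly aligned column (int y) goes first, so the
// char column that follows it never needs padding.
constexpr ColumnLayout layout_for(std::size_t capacity) {
    std::size_t const y_offset = 0;
    std::size_t const x_offset =
        align_up(y_offset + capacity * sizeof(int), alignof(char));
    return {x_offset, y_offset, x_offset + capacity * sizeof(char)};
}

// Owns one allocation and hands out typed column pointers into it.
class PointBuffer {
public:
    explicit PointBuffer(std::size_t capacity)
        : layout_(layout_for(capacity)),
          bytes_(static_cast<std::byte*>(::operator new(layout_.total_bytes))) {}
    PointBuffer(PointBuffer const&) = delete;
    PointBuffer& operator=(PointBuffer const&) = delete;
    ~PointBuffer() { ::operator delete(bytes_); }

    // The casts are the C++ counterpart of going from u8 back to type-space.
    int* ys() { return reinterpret_cast<int*>(bytes_ + layout_.y_offset); }
    char* xs() { return reinterpret_cast<char*>(bytes_ + layout_.x_offset); }

private:
    ColumnLayout layout_;
    std::byte* bytes_;
};
```

Growing such a buffer means one allocation and one deallocation regardless of how many members the struct has, at the price of doing the offset arithmetic by hand.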
Eventually though, we need to go back to type-space. Before I walk through any of the code, there is one type being generated that is worth talking about:
pub const Field = meta.FieldEnum(T);
meta.FieldEnum()
is a Zig function that takes a type and returns an enum
(roughly a C++ enum class
) with an enumerator for each member. So if we had:
const Point = struct {
x : c_char,
y : i32
};
then meta.FieldEnum(Point) would produce enum { x, y }.
One of the things I really like about Zig is that, because types are first-class, you can just have functions that take types and return types. One of the parts of C++26 reflection that I dislike is our approach to generating types. Right now, I have to write:
struct Pointers;
consteval {
define_aggregate(^^Pointers,
transform_members(^^T, std::meta::add_pointer));
}
When really the meaning and intent here is more akin to:
struct Pointers = transform_members(^^T, std::meta::add_pointer);
Zig gets me the latter. Even if I wanted to write AddPointers<T>, the only way to implement that is by doing one further layer of indirection:
template <class T>
struct AddPointersImpl {
struct type;
consteval {
define_aggregate(^^type,
transform_members(^^T, std::meta::add_pointer));
}
};
template <class T>
using AddPointers = AddPointersImpl<T>::type;
Also, let’s look at how FieldEnum
actually works:
pub fn FieldEnum(comptime T: type) type {
const field_infos = fields(T); // basically, @typeInfo(T).fields
if (field_infos.len == 0) {
// skipping for brevity
}
if (@typeInfo(T) == .@"union") {
// skipping for brevity
}
var decls = [_]std.builtin.Type.Declaration{};
var enumFields: [field_infos.len]std.builtin.Type.EnumField = undefined;
inline for (field_infos, 0..) |field, i| {
enumFields[i] = .{
.name = field.name ++ "",
.value = i,
};
}
return @Type(.{
.@"enum" = .{
.tag_type = std.math.IntFittingRange(0, field_infos.len - 1),
.fields = &enumFields,
.decls = &decls,
.is_exhaustive = true,
},
});
}
This is pretty similar in spirit to how std::meta::define_aggregate()
works, the primary difference being that we’re returning a new type rather than taking as a parameter the type we’re defining. An equivalent std::meta::define_enum()
should be a fairly straightforward extension.
Moving on. Eventually Zig has to get from a bunch of u8
s to specific types. We have items()
for that:
// for our Point example, this was enum { x, y }
pub const Field = meta.FieldEnum(T);
pub fn items(self: Slice, comptime field: Field) []FieldType(field) {
const F = FieldType(field);
if (self.capacity == 0) {
return &[_]F{};
}
const byte_ptr = self.ptrs[@intFromEnum(field)];
const casted_ptr: [*]F = if (@sizeOf(F) == 0)
undefined
else
@ptrCast(@alignCast(byte_ptr));
return casted_ptr[0..self.len];
}
There’s a lot of interesting stuff going on in the function signature, so I wanted to call it out:
- field is a comptime function parameter. This is what C++ would call a constexpr function parameter. We already saw a few cases of passing a type as a function parameter; this is now passing an enum.
- This returns []FieldType(field). The type []T (a slice) is what C++ would spell std::span<T>: it’s basically a many-item pointer and a length. Other languages have a slice type too: Rust spells it [T] and D spells it T[]. Anywhere you put the T is valid in some language.
- The specific type FieldType(field) is a shorter convenience for @FieldType(T, @tagName(field)). The @ functions are all Zig builtins. I’m guessing @FieldType is capitalized because it’s a type, while @tagName is a value.
For the call self.items(Field.x) (or simply self.items(.x) as a nice shorthand), we’d look up the return type as @FieldType(Point, @tagName(Field.x)), which is @FieldType(Point, "x"), which is c_char. So this returns a []c_char (or std::span<char>).
The enum here is used two ways: its name maps back to the original field to get its type, and its index is used to pick which byte array we’re accessing. That’s self.ptrs[@intFromEnum(field)] (where @intFromEnum is just an explicit cast from enum to int).
Lastly, let’s actually get an element. What does that look like? I’m simplifying the implementation slightly to only care about the struct
case:
pub fn get(self: Slice, index: usize) T {
var result: T = undefined;
inline for (fields, 0..) |field_info, i| {
@field(result, field_info.name) =
self.items(@as(Field, @enumFromInt(i)))[index];
}
return result;
}
One nice thing Zig has going for it is that, as a C-like language, you can just construct an undefined object and then piece-wise assign the members. You can’t do that in general in C++, so this function was always going to be simpler in Zig than in C++. The @as(Field, @enumFromInt(i)) part is taking the int iteration index and casting it to our Field enum. inline for is semantically equivalent to C++’s template for.
But other than that, this implementation isn’t actually all that different than what we’d do in C++. Copying the member-at-a-time approach, the C++ implementation is:
auto operator[](size_t index) const -> T {
T result;
template for (constexpr auto I : std::views::iota(0zu, mems.size())) {
result.[:mems[I]:] = pointers_.[:ptr_mems[I]:][index];
}
return result;
}
It’s interesting that while syntactic choices are quite different, the semantics are remarkably similar between the two approaches to reflection. While the Zig language is still very foreign to me, a lot of the implementation I nevertheless found familiar — because it wasn’t all that different from what I just did.
But there are a few choices Zig makes that I really like, that I wanted to specifically note:
- As already mentioned, the ability to “initialize” types in the same way that you could initialize other values. This may prove difficult in C++ for a number of reasons, but I think it’s worth considering, especially as we’ll want to pursue more code generation utilities.
- constexpr function parameters make for nice syntax, since you don’t have different syntax choices for different “kinds” of parameters.
- Being able to implicitly name enumerators. I don’t know what Zig calls this feature, but basically the fact that this works:
const Color = enum { red, green, blue };
pub fn takes_color(c: Color) void {
_ = c;
}
pub fn main() void {
takes_color(.red); // ok, implicitly Color.red
}
which is what allows this syntax to work:
var list : std.MultiArrayList(Point) = .{};
var xs : []c_char = list.items(.x);
var y0 : i32 = list.items(.y)[0];
After all, is f(.x) all that different to support than f({.x=1}), which already works? The best I can come up with in C++ right now is this syntax, using the proposed std::constant_wrapper:
std::println("xs={}", v.items(std::cw<"x">)); // xs=['a', 'f']
std::println("ys={}", v.items(std::cw<"y">)); // ys=[8, 7]
One of the reasons this is reasonable in Zig is that .red is short-hand for Color.red. But in C++, the qualified name for the enumerator isn’t Color.red; we’d spell it Color::red instead. Having .red be short-hand for a completely different syntax might be a hard sell. And ::red already has meaning. We’ll see.
That all said, I will simply repeat how I ended the previous section: I’m very excited about Reflection in C++.