Projections are Function Adaptors

on under c++
15 minute read

There was a question recently on StackOverflow where a user was confused about the purpose of projections in a way that I think is fairly common. They wanted to do something like this:

struct Person {
    std::string first;
    std::string last;
};

std::vector<Person> people = { /* ... */ };
std::vector<std::string> r_names;
std::ranges::copy_if(
        people,
        std::back_inserter(r_names),
        [](std::string const& s) { return s[0] == 'R'; },
        &Person::last);

Obligatory link to post about std::back_inserter, or output iterators more generally.

The goal here was to put all the last names that start with 'R' into r_names. But this ends up not compiling. Why not?

I could paste the compile error, but I’m sorry to say that it’s not very useful (unless you already know the problem). But if we could actually get good compile errors, the one that this would emit would be:

error: std::back_inserter(r_names) does not satisfy std::output_iterator<Person&>

To which the user in question would be very confused. Of course std::back_inserter(r_names) isn’t an output iterator for Person, the goal was to output std::strings! Isn’t that the point of &Person::last? Why is the projection being ignored?

The reason for this user’s error is a common misunderstanding of the point of projections. If you take away one thing from this blog post, it should be this:

Projections do not change the algorithm. Projections are a simply convenient way to adapt the algorithm’s predicate (or function).

§ Projections do not change the algorithm

What do I mean by this? The goal of std::ranges::copy_if is to copy some (possibly no) elements of a source range into an output iterator. When you see:

std::ranges::copy_if(people, o, ...)

That algorithm is copying objects of type Person into o. What comes next (the predicate and the projection) only affect which objects of type Person are copied. They do not change the crux of the algorithm. They do not change what the source range being copied is.

Note also that std::ranges::copy, an algorithm that copies all of the elements of a source range into the output iterator, does not take a projection argument at all. It does not, because it also does not take any kind of function argument. As a result, it doesn’t need a projection either, because…

§ Projections are function adaptors

Let’s now correct the original example into something that compiles (although isn’t yet what was actually wanted): by providing an output iterator that accepts people.

std::vector<Person> r_people;
std::ranges::copy_if(people,
    std::back_inserter(r_people),
    [](std::string const& s) { return s[0] == 'R'; },
    &Person::last);

This is one way to write a call that copies all the Person objects whose last name starts with 'R'. But you could write it differently too, by simply combining both operations into a single lambda:

std::ranges::copy_if(people,
    std::back_inserter(r_people),
    [](Person const& p) { return p.last[0] == 'R'; });

These two calls do exactly the same thing.

That’s because ranges::copy_if(r, o, pred) copies every element e in r such that pred(e) is true. Whereas ranges::copy_if(r, o, pred, proj) copies every element e in r such that pred(proj(e)) is true.

We could write this differently still, by separating out pred(proj(e)) into a distinct function that we can invoke on e. That is, some new function f such that f(e) is pred(proj(e)). That’s your most basic kind of function composition: taking two functions and producing a new one that invokes the first and then the second. This is such a fundamental operation that in Haskell it exists as a language feature that takes just one character to compose functions: pred . proj (since the mathematics symbol for function composition is ).

In C++, we don’t have function composition as a language feature. It is, instead, a library feature. You can find it, for instance, in Boost.HOF under the name hof::compose. This other formulation also does the exact same thing as the other two:

std::ranges::copy_if(people,
    std::back_inserter(r_people),
    hof::compose([](std::string const& s) { return s[0] == 'R'; }, &Person::last));

Put differently, every algorithm in the standard library that accepts a unary predicate and a projection, such that we have algo(..., pred, proj) can have its projection dropped without any loss of functionality because users can do the function composition themselves and invoke it as algo(..., hof::compose(pred, proj))).

§ How to change the algorithm

Let’s take a look at a different example, that hopefully demonstrates better the use of projections (or function adaption in general) and the difference between changing the predicate of an algorithm and changing the source of an algorithm.

Consider a two-dimensional Point class, which is lexicograhically ordered by its x coordinate and then its y coordinate:

struct Point {
    int x;
    int y;

    auto operator<=>(Point const&) const = default;
}

std::vector<Point> points = {
    {1, 2},
    {1, 4},
    {2, 1},
    {3, 2},
    {0, 5},
    {6, 0}
};

If I want to take the smallest or largest points, I can do:

Point smallest = std::ranges::min(points); // (0, 5)
Point largest  = std::ranges::max(points); // (6, 0)

By default, ranges::min and ranges::max both use std::ranges::less{} as the comparison object and identity as the projection. Both algorithms always give you an object of the source range’s value type – regardless of the predicate or the projection. While it may be pretty obvious that ranges::min(points, pred) is always a Point (assuming it compiles), it’s really worth stressing again that ranges::min(points, pred, proj) is also always a Point (again, assuming it compiles).

Why might we want to use projections? Well, we might not want the smallest point lexicographically. We might want the point with the smallest Y-coordinate. We could do that by providing a custom predicate:

// this is the Point (6, 0), not the int 0
Point point_with_smallest_y = std::ranges::min(points, [](Point const& a, Point const& b){
    return a.y < b.y;
});

And that’s a perfectly fine approach. I have written code like this before and will continue to write codel ike this in the future. But it does conflate two different things: the < part and the repeated Point.y part. Repeating is a complete non-issue when it’s a single-character member name, but the thing we’re comparing could be arbitrarily complex.

Projections allow us to separate the concerns:

// still (6, 0)
Point point_with_smallest_y2 = std::ranges::min(points, std::ranges::less{}, &Point::y);

You can read this as computing the smallest value in points by &Point::y, or using the key &Point::y (as Swift and Python would call this, respectively).

Before, we were applying a projection to a unary predicate. Here, we’re applying a projection to a binary predicate. What does that mean? Well, instead of comparing pred(lhs, rhs) we’re first applying the projection to both arguments. That is, we’re looking at the result of pred(proj(lhs), proj(rhs)).

This isn’t the most simple kind of function composition, but it is still a form of function adaptor. Boost.HOF has us covered there too, with a very appropriately named adaptor:

// also still (6, 0)
Point point_with_smallest_y3 = std::ranges::min(points, hof::proj(&Point::y, std::ranges::less{}));

Earlier, I used the adaptor hof::compose, which has the meaning:

hof::compose(f, g)(x) == f(g(x))

Here, the adaptor hof::proj has the meaning

hof::proj(p, f)(xs...) == f(p(xs)...)

In the unary case, hof::proj(g, f)(x) == f(g(x)) == hof::compose(f, g). That is, just the order of arguments reversed.

But okay, I just showed three different ways to get the Point with the smallest Y coordinate, but that’s still getting the smallest Point. What if you wanted to just get the smallest Y coordinate? After all, this section is titled “How to change the algorithm” and I have not yet shown any change in the algorithm!

Of course, we could just do this:

int smallest_y = std::ranges::min(points, std::ranges::less{}, &Point::y).y;
assert(smallest_y == 0);

But this is again the same kind of repetition I mentioned earlier: we’re using our projection twice, once as an argument into ranges::min and again on the result of ranges::min. We don’t have to do that. If we want the smallest Y coordinate, the way to do that is to have a range not of Point but instead have a range of Y coordinates.

That’s not a function adaptor anymore, which means that projections can’t help us. Projections cannot change the source range, which is what we need to solve this problem. The way to change the source range is to use a range adaptor:

int smallest_y2 = std::ranges::min(points | std::views::transform(&Point::y));
assert(smallest_y2 == 0);

Let me place those two things together again to make this more clear:

// this is the Point with the smallest Y coordinate
// because min(points, pred, proj) is always a Point
std::ranges::min(points, std::ranges::less{}, &Point::y)

// this is the smallest Y coordinate
// because we're starting from a range of Y coordinates
std::ranges::min(points | std::views::transform(&Point::y))

In both cases, we’re using the same predicate (std::ranges::less{}, that’s just the default argument to ranges::min so we didn’t pass it in the second call. If we had named function arguments, we wouldn’t have had to pass it in the first call either) and we’re even using the same function to get the Y coordinate (&Point::y). It’s just that in the first call, it is used to modify the predicate, whereas in the second call it is used to modify the range.

Going back to the original example that started this blog post, what the user wanted was to copy std::strings that met a certain criteria. In order to ranges::copy_if a bunch of std::strings, you need to have a range of std::strings. What the user wanted to do was this:

std::ranges::copy_if(
        people | std::views::transform(&Person::last),
        std::back_inserter(r_names),
        [](std::string const& s) { return s[0] == 'R'; });

And by now it’s hopefully clear why the original attempt was incorrect and why this one works.

§ Preserving the Source Range

Another example might help demonstrate the difference between a projection (i.e. adapting the algorithm’s predicate) and transformation (i.e. adapting the source range). Let’s say we want to look in our points for a particular Y coordinate. There are potentially two ways to do that:

// project into ranges::find
auto i = std::ranges::find(points, 5, &Point::y);

// transform points into a range of y-coordinates
auto j = std::ranges::find(points | std::views::transform(&Point::y), 5);

Which is preferred?

It helps to consider what i and j actually are. With i, we’re finding in std::vector<Point> so we get a std::vector<Point>::iterator back (note that it does not matter what we’re finding, only the source range). With j, we’re finding in a std::ranges::transform_view<...> so we get a std::ranges::transform_view<...>::iterator back out. With i, that’s a useful result. With j, that is extremely unlikely to be useful – what you need is j.base().

Which might be a moot point anyway, since the views::transform version does not even compile to begin with because points | std::views::transform(&Point::y) is not a borrowed range.

The point here is: using projections allows you to preserve the source range, which means you get back iterators into the range that you want to get back iterators into. Transforming the source range is useful in other contexts, but these two approaches solve fundamentally different and orthogonal problems.

§ Projections in the Standard Library

There are, broadly speaking, five kinds of projections used in the standard library (which is to say, in algorithms in the std::ranges namespace). Note that projections are always unary.

  1. Applied to the argument of a unary function (e.g. ranges::for_each)
  2. Applied to the argument of a unary predicate (e.g. ranges::copy_if)
  3. Applied to both arguments of a binary predicate (e.g. ranges::min)
  4. Two different projections, each applied to only one argument of a binary predicate (e.g. ranges::equal)
  5. Applied to only one argument of a binary predicate (e.g. ranges::lower_bound and ranges::upper_bound)

I walked through some examples of the first three already (well, two of ‘em anyway, there’s not much difference between the unary function and unary predicate case). The fourth and fifth kinds merit a description though. Indeed, ranges::lower_bound and ranges::upper_bound were the motivations for adding projections, so I had better start with those.

Note that ranges::find technically doesn’t fit any of those categories, but it’s basically a special case of projecting one argument into a binary predicate (the predicate being ranges::equal_to and the other argument being the value that you’re finding).

§ ranges::lower_bound

In keeping with my vector<Point> points object from earlier, let’s say I now want to start maintaining my points in sorted order, including subsequent inserts. But I don’t want to sort it fully lexicographically, I just want to sort it by just the x-coordinate. And once I do that, I want to look for a particular x-coordinate using a binary search.

With the C++17 algorithms, that would look like:

std::sort(points.begin(), points.end(), [](Point const& lhs, Point const& rhs){
    return lhs.x < rhs.x;
});
auto it = std::lower_bound(points.begin(), points.end(), x, [](Point const& p, int v){
    return p.x < v;
});

The binary predicate that std::lower_bound takes is heterogeneous, but the binary predicate that std::sort takes is homogeneous. And further, the binary predicate that std::upper_bound takes is also heterogeneous but with the reverse order of parameters. This is messy, can lead to surprises if your types happen to be cross-convertible, but more importantly means that you can’t just use the same predicate in both contexts.

Projections address this problem. It’s just that in ranges::sort, the projection is applied to both arguments, while in ranges::lower_bound (and ranges::upper_bound), the projection only has to be applied to the element from the range, not the value being looked for. This means that you don’t have to remember which order the parameters go in (did I get them right earlier??), but importantly it means you can use the same predicate and the same projection for all of these algorithms.

The above, in C++20 using projections, is:

std::ranges::sort(points, std::ranges::less{}, &Point::x);
auto it = std::ranges::lower_bound(points, x, std::ranges::less{}, &Point::x);

Or, since std::ranges::less{} is the default:

std::ranges::sort(points, {}, &Point::x);
auto it = std::ranges::lower_bound(points, x, {}, &Point::x);

It may seem initially weird that we only apply the projection to half the arguments, and indeed this case doesn’t map nicely onto hof::proj. We could write a proj_left and a proj_right. It’s not that hard to do, but then we’d go back to having to remember which one of proj_left or proj_right you use for ranges::lower_bound. None of the other algorithms have this particular issue - projections are especially valuable in this case.

§ ranges::equal

Algorithms that take one range and a binary predicate really only need one projection, since all of your elements are only coming from one range. You just can’t have more than one projection. But algorithms that take two ranges and a binary predicate could easily want to project each range separately and you need a way to do so.

For example, consider our vector<Point> points object from earlier. Let’s say we wanted to compare its x-coordinates against a range of x-coordinates. We could still use ranges::equal for that, we just have to project the range of Point into a range of x-coordinate:

std::vector<int> x_coords = {1, 1, 2, 3, 0, 6};

bool a = std::ranges::equal(points, x_coords, {}, &Point::x, {});
bool b = std::ranges::equal(x_coords, points, {}, {}, &Point::x);
assert(a and b);

Let me explain what’s going on here. In both cases, I want to use std::ranges::equal_to{} as the binary predicate. But that’s the default, so I can just write {}. Then, in the first call, I need to project only the left-hand range, while in the second call, I need to project only the right-hand range. The default projection is identity, so I can just write {} for the appropriate slot (and the trailing {} in the first call is unnecessary, I just added it so that the calls are clearly symmetric).

This is a great example for why named function arguments would be a great feature (and why strong typing is a non-solution), if I could just have written the above as:

bool a = std::ranges::equal(points, x_coords, .by1 = &Point::x);
bool b = std::ranges::equal(x_coords, points, .by2 = &Point::x);

§ In Summary

To conclude, and repeat: Projections do not change the algorithm. Projections are a simply convenient way to adapt the algorithm’s function or predicate. In most cases, this projection could be applied by the user directly using something like hof::proj (which should help make clear that projections simply adapt the function or predicate), but for ranges::lower_bound and ranges::upper_bound, the heterogeneous nature of the projection makes them especially valuable.