In-place filtering of an array

March 22nd, 2023

Let’s say you have a bunch of 2D boxes and you want to find only ones that intersect some other box. For example to do collision detection in a game. If you’re in a hurry you can just write a loop and be done with it:

// The 'Box' type is our 2D bounds struct.
std::vector<Box> filtered;
// Assume we will remove two-thirds of the elements.
filtered.reserve(boxes.size() / 3);

// Keep only elements of 'boxes' that intersect with the 'target' box.
for (const Box& b : boxes) {
    if (boxesIntersect(target, b)) {

This is absolutely fine but if you’re only removing a small fraction of the elements then you might want to avoid creating that new filtered vector at all. So let’s do in-place filtering to save memory and avoid extra copies.

The “copy from end” technique

The idea is to remove an array element by copying the last one over it, and then shrink the array size by one. Obviously this will shuffle the array but since it actually represents a set in this case, we are fine. This is how it looks:

size_t count = boxes.size();
size_t i = 0;

while (i < count) {
    if (boxesIntersect(target, boxes[i])) {
    } else {
        boxes[i] = std::move(boxes[count - 1]); // (1)
        // (2)


Perhaps trickiest part above is the else branch at (2) where the index i doesn’t get incremented, so that the newly copied element will be correctly processed on the next iteration. I also used std::move() at (1) but it doesn’t have an effect here. It’s because the Box struct consists of just four floats (see filtering.cpp) so its contents can’t be really “moved”, only copied.

However, this is somewhat tricky code so using a library function may be a good idea.

The “just call a function” technique

We can use std::remove_if() for in-place filtering. Pass in a range and a lambda that tells if an element should be removed:

auto it = std::remove_if(boxes.begin(), boxes.end(),
                            [&target](const Box &box) {
                                return !boxesIntersect(target, box);

boxes.resize(std::distance(boxes.begin(), it));

The function packs all elements we wish to keep in the beginning of the array and returns a new “end” iterator where the packed run stops. We could use the new smaller range directly in later processing but I chose to call boxes.resize() to get rid of the now unused elements at the end.

The code for the function becomes surprisingly clean once all the extra underscores are stripped away:

template<typename ForwardIterator, typename Predicate>
remove_if(ForwardIterator first, ForwardIterator last, Predicate pred)
    first = std::find_if(first, last, pred);
    if (first == last)
        return first;

    ForwardIterator result = first;

    for (; first != last; ++first)
        if (!pred(first))
            *result = std::move(*first);

    return result;

The thinking is the reverse of the “copy from end” technique: we make a copy of every element we wish to keep and skip over ones to be removed. If you know most elements will be filtered away then this approach will make fewer copies. Also there’s a std::move() in there again but Box still can’t be moved so it doesn’t have an effect.

I didn’t benchmark runtimes since the most important point was the in-place operation to save memory. But I did compare compile times :)

Build time comparison

Alright now to the part they don’t teach in schools. To access std::remove_if() you need to include the <algorithm> header which should have a visible build speed impact on our tiny test program.

I profiled the build times of each approach with Clang’s -ftime-trace feature and took the fastest time of three runs for each:

Approach Build time (ms) Build time ratio to fastest
Filter to a new list with push_back() 381 100 %
Filter in-place with copy from end 386 101 %
Filter in-place with std::remove_if() 448 118 %

As expected, the inclusion of <algorithm> header largely explains the difference:

Clang’s build time “flame graph” for one compilation of the std::remove_if approach. Note how parsing the <algorithm> header takes only a quarter of <iostream>’s time. Clang version 14.0.0-1ubuntu1.

Surprisingly <iostream> is dominating the compile time here so the build slowdown between std::remove_if() and our custom loop is only 15-20%.

On Windows you can do the same analysis with vcperf & cpp-build-analyzer.

How it looks like in a debugger

One small but persistent annoyance caused by STL types and algorithms are the very long stack traces. For example compare what happens if I put a failing assert in the boxesIntersect() function, build in debug mode, and run it in gdb:

Stack traces of a failing predicate with “Copy from end” vs std::remove_if().

The second one is literally twice the size! In C++ you accept this as a fact of life and step through all these wrappers everywhere. If you’re debugging on Windows with Visual Studio, you can press Shift+Alt+F11 (Debugger.StepIntoSpecific) to open a little drop down menu to step directly into the function of interest, which helps a bit.

Source code for this experiment: filtering.cpp.

Thanks to noby for feedback on a draft of this page.