March 22nd, 2023
Let’s say you have a bunch of 2D boxes and you want to find only ones that intersect some other box. For example to do collision detection in a game. If you’re in a hurry you can just write a loop and be done with it:
// The 'Box' type is our 2D bounds struct.
std::vector<Box> filtered;
// Assume we will remove two-thirds of the elements.
.reserve(boxes.size() / 3);
filtered
// Keep only elements of 'boxes' that intersect with the 'target' box.
for (const Box& b : boxes) {
if (boxesIntersect(target, b)) {
.push_back(b);
filtered}
}
This is absolutely fine but if you’re only removing a small fraction of the elements then you might want to avoid creating that new filtered vector at all. So let’s do in-place filtering to save memory and avoid extra copies.
The idea is to remove an array element by copying the last one over it, and then shrink the array size by one. Obviously this will shuffle the array but since it actually represents a set in this case, we are fine. This is how it looks:
size_t count = boxes.size();
size_t i = 0;
while (i < count) {
if (boxesIntersect(target, boxes[i])) {
++;
i} else {
[i] = std::move(boxes[count - 1]); // (1)
boxes--;
count// (2)
}
}
.resize(count); boxes
Perhaps trickiest part above is the else branch at (2) where the index i doesn’t get incremented, so that the newly copied element will be correctly processed on the next iteration. I also used std::move() at (1) but it doesn’t have an effect here. It’s because the Box struct consists of just four floats (see filtering.cpp) so its contents can’t be really “moved”, only copied.
However, this is somewhat tricky code so using a library function may be a good idea.
We can use std::remove_if() for in-place filtering. Pass in a range and a lambda that tells if an element should be removed:
auto it = std::remove_if(boxes.begin(), boxes.end(),
[&target](const Box &box) {
return !boxesIntersect(target, box);
});
.resize(std::distance(boxes.begin(), it)); boxes
The function packs all elements we wish to keep in the beginning of the array and returns a new “end” iterator where the packed run stops. We could use the new smaller range directly in later processing but I chose to call boxes.resize() to get rid of the now unused elements at the end.
The code for the function becomes surprisingly clean once all the extra underscores are stripped away:
template<typename ForwardIterator, typename Predicate>
ForwardIterator(ForwardIterator first, ForwardIterator last, Predicate pred)
remove_if{
= std::find_if(first, last, pred);
first if (first == last)
return first;
= first;
ForwardIterator result ++first;
for (; first != last; ++first)
if (!pred(first))
{
*result = std::move(*first);
++result;
}
return result;
}
The thinking is the reverse of the “copy from end” technique: we make a copy of every element we wish to keep and skip over ones to be removed. If you know most elements will be filtered away then this approach will make fewer copies. Also there’s a std::move() in there again but Box still can’t be moved so it doesn’t have an effect.
I didn’t benchmark runtimes since the most important point was the in-place operation to save memory. But I did compare compile times :)
Alright now to the part they don’t teach in schools. To access std::remove_if() you need to include the <algorithm> header which should have a visible build speed impact on our tiny test program.
I profiled the build times of each approach with Clang’s
-ftime-trace
feature and took the fastest time of three
runs for each:
Approach | Build time (ms) | Build time ratio to fastest |
---|---|---|
Filter to a new list with push_back() | 381 | 100 % |
Filter in-place with copy from end | 386 | 101 % |
Filter in-place with std::remove_if() | 448 | 118 % |
As expected, the inclusion of <algorithm> header largely explains the difference:
Surprisingly <iostream> is dominating the compile time here so the build slowdown between std::remove_if() and our custom loop is only 15-20%.
On Windows you can do the same analysis with vcperf & cpp-build-analyzer.
One small but persistent annoyance caused by STL types and algorithms are the very long stack traces. For example compare what happens if I put a failing assert in the boxesIntersect() function, build in debug mode, and run it in gdb:
The second one is literally twice the size! In C++ you accept this as a fact of life and step through all these wrappers everywhere. If you’re debugging on Windows with Visual Studio, you can press Shift+Alt+F11 (Debugger.StepIntoSpecific) to open a little drop down menu to step directly into the function of interest, which helps a bit.
Source code for this experiment: filtering.cpp.
Thanks to noby for feedback on a draft of this page.