Mathieu Ropert

Meeting C++ 2022 trip report

2022-12-11T00:00:00+00:00

Unlike some other C++ conferences, I don’t have a great attendance record at Meeting C++. The last (and only other) time I was there was in 2017, which incidentally was also the year I started speaking publicly at tech conferences. I tried to come back in 2018, but most of my proposals didn’t make it that year (CppCon was the only conference to accept any of the new ideas I came up with). The year after that, I travelled so much for events that I probably would not have made it to Berlin even if I had been invited. And then the pandemic happened.

All that to say, it had been a while, my memories of the conference were a bit fuzzy, and I wasn’t sure what to expect.

Venue & city

Since its inception 10 years ago, Meeting C++ has been held at the Vienna House Hotel in East Berlin. Speakers are lodged on site, and attendees are encouraged to do the same. That makes it easy to pop back to your room in a flash if you forgot something, or to grab your coat before heading out with a dinner group.

Despite that, I found the post-conference evenings at the hotel bar and lobby pretty quiet. I could only meet a handful of people there each night, mostly other speakers. Either folks got tired early, or they headed into town and went straight back to their rooms afterwards. I’m told the hotel also has a fancy sky bar; sadly, it was either closed or booked for private events during our stay.

I mention all this because, as good as the talks can get, I believe the hallway track and the evening track are the real reason I urge people to attend conferences. I’ve drawn some inspiration from live talks in the past, but not nearly as much as from conversations with other attendees during breaks or in the evening. To me this is the one thing that online conferences have failed to replicate (not for lack of trying, mind you).

Flights to Stockholm being rare, I arrived a day early and had another almost full day after the conference due to a late flight back. Given the hotel’s location, it was quite easy to head into town and see the city. Berlin doesn’t lack sights, museums, restaurants, concerts and the like. If you find yourself at the conference in the future with some spare time, you should easily find something of interest to do.

Keynotes

Thursday

The conference opened with Nicolai Josuttis’ Belle Views on C++ Ranges, the Details, and the Devil. In a wide-ranging tour of C++ std::views, Nico shared his concerns about the pitfalls and traps that lie in wait for the unwary programmer. In particular, views of non-const containers allow mutation of their elements (similar to std::span), yet there is no simple way to build a view to const, nor a simple way to write a generic function that constrains the view to const elements so that any confusion between = and == is stopped in its tracks by the compiler.

Worse, he also demonstrated several dangerous edge cases stemming from views possibly caching data to guarantee amortized constant time on begin(), which makes some filter views behave differently depending on the underlying container. I’m told some of those points were already mentioned in Nico’s book on C++20, and they led him to recommend staying away from std::views until the issues are addressed (std::ranges are fine, though).

Finally, he briefly demonstrated an alternative implementation he wrote, named Belle Views, which diverges from the standard to avoid all those pitfalls (at the cost of a different set of tradeoffs). Sadly, the repository had to be taken offline the following day, as it turned out some code had been lifted from GPL-licensed STL implementations, whose license was incompatible with the project’s. I’m sure the repo will come back once those issues are sorted out.

There has been some discussion online since the talk, and so far there doesn’t seem to be a clear consensus on the topic. Some countered that those examples can be convoluted (especially reusing views, or using views created before a container is mutated), that views are explicitly designed to behave like span, and that no one should expect a different const propagation model.

Friday

The second day continued with Contemporary C++ in Action from Daniela Engert. In about an hour and a half, she walked us through the entire code of one of her recent projects, written in C++23. The whole thing was a client/server app for Windows that streams animated GIFs from a server to a client, which displays them as video.

While it did of course make use of third-party libraries (such as FFmpeg, SDL and Boost.Asio) for video decoding, rendering and networking, it was impressive to be able to read through the entire thing in the span of one talk. One might have expected the code to be quite verbose, but the point was that contemporary C++ features made it possible to fit the whole application in about 1,000 lines of code.

Still, impressive as it was, it was a lot of code to go through, and I felt a few more high-level slides would have helped digest the whole thing. I guess the challenge was fitting it all into an hour and a half, which is difficult for most “real” projects.

Saturday

The closing keynote was Breaking Dependencies: The Path to High-Quality Software by Klaus Iglberger. In it, he made the interesting observation that C++ conferences focus a lot on language features and other “implementation details”, and not so much on software architecture. A given conference will probably have a bunch of talks about the latest features coming with C++23, but barely one about design patterns or how to build good C++ software from a higher-level point of view.

Sadly, the talk felt like it lost its way halfway through, spending in my opinion too much time on the nitty-gritty details of implementing design patterns like the Singleton or the Factory in Modern C++, and it ran out of steam for me.

Still, I believe there is an argument to be made that we should talk more about software architecture in C++ compared to the amount of time we spend on NRVO, SFINAE and other random letter combinations.

Talks

As usual in my trip reports, I will not go over each talk I attended but rather mention the ones that especially caught my attention, one way or the other.

Lambdas—how to capture everything and stay sane from Dawid Zalewski gave a good recap of all the lambda capture rules, reminding us of the caveats and best practices to keep in mind.
Foundations of GPU programming by Filipe Mulonde showed that there is a lot to explore in GPU programming, but sadly an hour was not enough to cover all he wanted to talk about and I’d recommend splitting that talk into two, or maybe three parts.
Hana Dusíková’s Lightning Updates demonstrated an interesting way to implement real-time update and hot reload of program data (in her case an antivirus database) with a rather elegant model.
After attending and (re)watching a few talks about the topic, I finally managed to make sense of Deciphering Coroutines - A Visual Approach from Andreas Weis. I think he did his best to make them palatable, I just find the topic quite hard to follow due to the amount of glue code that needs to be shown when using C++20 coroutines (partially due to lack of library support).
C++ MythBusters by Victor Ciura was a nice and refreshing experience after some of the more “heavy” talks, exploring the most common myths and legends about C++ and whether or not they still make sense today (if they ever did).

Final thoughts

This might be a controversial opinion, but I really do feel the return to in-person conferences is a blessing. Some folks might be comfortable with the online experience, but I have to admit my attention span during the last three years of CppCon (online each time) plummeted to almost nothing, to the point that I would now only attend online as a speaker, if the program needed content. Something just doesn’t work for me when sitting in Zoom calls with barely any interaction with the speakers or the attendees. Between Meeting C++ and ACCU this year, I was really happy to reconnect with friends and peers, discuss ideas and get some inspiration.

Do I recommend attending Meeting C++ if you can? Absolutely! Will I try to be there next year? Maybe. As more and more conferences reopen in-person, I’ll have to make some choices as to which ones I can reasonably attend. But it’s high on my list.

ACCU 2022 trip report

2022-04-11T00:00:00+00:00

Is it a coincidence that the last trip report I wrote was for ACCU 2019? Maybe it’s due to timing (I usually end up on a plane soon after and can write this while it’s still fresh)? Or maybe there’s something about this small British conference. A je ne sais quoi that keeps bringing me back?

In any case, I was in Bristol from April 5th to April 10th for ACCU 2022, my first in-person conference since the world shut down in 2020.

Flights, luggage, drugs and horrible support

Stockholm-Arlanda isn’t a big airport. In 2019 I got lucky and managed to find a combination of flights that avoided any connection (I cheated a bit by making a stop in Paris). This year I had to go through AMS. I showed up at Arlanda with a cabin bag that they promptly asked me to check in due to an overbooked plane (quite common in the past few years, sadly). They assured me it would follow me to Bristol and was fine to check as long as it didn’t contain any electronics. And then, of course, they lost it.

If Arlanda isn’t a big airport, then Bristol is a strip of dirt in the middle of nowhere. I had lost luggage before, and every time there was a kiosk by the baggage belt where someone would promptly scan my receipt, tell me where my case was and schedule a delivery within a day. Bristol has no such thing. They had a pile of paper forms, a mailbox, and a sign telling you to fill in A by hand and put it in B.

I still don’t know what went wrong (if anything) with my paper form, but after 3 days I had no news. The airport wasn’t taking calls. Emails to the baggage service went unanswered. The airline website told me to contact the airport. The airport website told me to contact the airline. The airline hotline took 2 days to reply that I should use the lost luggage form on the website (the one I had already tried many times). And so I resigned myself to the only thing that seems to get results these days: I screamed at the airline on social media. Within 12 hours I had a human to explain my problem to. It took them another 2 days to file a missing baggage report on my behalf so I could finally track the bag down. I finally got a call today about scheduling a delivery, after I had already left Bristol.

But my troubles weren’t over. See, that bag that was never supposed to be checked in the first place contained my insulin supply for the trip. I had some extra on me as always, but not enough for the whole stay. This led to another round of endlessly calling hotlines to get my problem solved. Much like the airline had no escalation mechanism when the airport wasn’t in its system, the NHS hotline isn’t equipped to deal with foreign phone numbers, prescriptions written in Swedish and people with no existence in the British medical system. Although I eventually got it sorted out, it took me most of Friday, including three and a half hours sitting in my hotel room (the only place with a landline the NHS would take a call from) getting bounced between hotlines.

I believe there’s a lesson there for fellow software engineers. Let’s ask ourselves: what can our users do when the “golden path” in our application workflow fails? Do they have a means to escalate? Do they get a special code allowing them to skip the steps they already tried 10 times before calling tech support? We shouldn’t expect a given system to handle every possible exceptional case, but it should be possible to get a human on the phone when the workflow breaks down. And there should be an easier way to do it than making a scene on social media until the PR department panics.

Anyway, don’t expect many reviews of Friday’s talks; I missed most of them due to the above.

Keynotes

Wednesday

I was quite excited to hear what Guy Davidson had to say about Growing Better Programmers. Like many others, I’m told, I’ve been noticing growing difficulties in hiring and retaining programmers, especially senior ones. Two years of remote work have changed work patterns a lot: more and more employers offer remote positions from further away (or anywhere, really), which brings new competition into your local area.

The talk didn’t give me as many answers as I’d hoped, although it’s probably fair to say this is a complex topic with many variables, a bunch of them depending on your local environment in ways that may not translate well to others. That being said, the talk did reinforce my belief that we should spend more time on mentoring; it is unlikely that our current structures do enough of it.

An interesting takeaway from the post-talk chat was that there is sometimes an aura of mystery surrounding experts and leads that makes them seem all-knowing and unattainable, while in fact, when asked a specific technical question, they refer to documentation, talks, their conversation history or good old trial and error on Godbolt, much like everyone else. This was likely reinforced over the past two years, when there wasn’t really a way for people to glance at how we handled things from the confines of our remote workstations. Now that we might be going back to the office, perhaps some extra attention to making our expertise look more mundane would help encourage people to grow into those roles themselves?

Thursday

Thursday’s keynote gave us Hannah Dee’s Diversity Is Too Important to Be Left to Women. I’m afraid I wouldn’t do it justice by trying to summarize it. In short, it was great: go and watch it, and encourage your colleagues (especially male colleagues) to do the same. It’s a very solid recap of the diversity issues our profession is facing, of studies about how and why they happen, and more importantly of what we can do about it.

Saturday

The conference closed with Titus Winters’ Tradeoffs in the Software Workflow, an attempt at answering the age-old question “how can I put a number on how much money we would save (or lose) by having quality gates in our development process?”. As he rightfully observed, answering that question often means trying to estimate the cost of something that did not happen. Some may say that if you can catch the worst in beta testing, or patch production quickly enough when needed, the value is 0. Others will add up the potential damages of recent near-misses and claim that’s how much revenue loss the company avoided. One might even say that blocking merges on a red CI test makes the company lose money because it stops other engineers from checking in their work.

Instead, Titus argues we should look at the cost of fixing a bug at the point where it’s caught to see which quality gates give us the best bang for our buck, because that is a number we can agree upon. Issue caught by a unit test on the developer’s machine before they submit for review? An hour of their time, maybe. Issue caught during RC testing? A couple of engineer hours (at least) to find the root cause and fix it, plus the time spent making a new build and re-running the RC validation process. Merging something that breaks the build? The sum of the salaries of the teammates who can’t work until it’s fixed.
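Expressed in engineer-hours, the comparison is simple arithmetic. A minimal sketch follows; the FixCost struct, its fields and every number used with it are hypothetical, chosen only to show the shape of the calculation rather than taken from the talk.

```cpp
#include <cassert>

// Hypothetical cost model: everything is counted in engineer-hours.
struct FixCost {
    double fix_hours;       // time spent finding the root cause and fixing it
    double blocked_people;  // teammates who cannot work in the meantime
    double blocked_hours;   // how long they stay blocked
};

// The fixer's own time plus the time of everyone idled by the breakage.
double engineer_hours(const FixCost& c) {
    return c.fix_hours + c.blocked_people * c.blocked_hours;
}

// Multiply by a (hypothetical) loaded hourly rate to get a money figure.
double cost_in_money(const FixCost& c, double hourly_rate) {
    return engineer_hours(c) * hourly_rate;
}
```

With these made-up figures, a unit-test catch of {1, 0, 0} costs 1 engineer-hour, while a broken build that takes 2 hours to fix and blocks 10 teammates for those 2 hours, {2, 10, 2}, costs 22.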

The talk also touched upon a few studies showing that shorter release cycles lead to better software value and quality. The issue for me is that this isn’t an easy change to make, because it is tied to the business model. For companies like Google, which get revenue from freemium or ad-driven apps, this is a non-issue. But for other industries (like games), where you make money by selling apps or expansions to an app, it’s a harder problem. I for one cannot sell a patch every week or two; the shortest cycle we can manage in our market is probably a small DLC every 3 to 6 months. Some games do run on a subscription model, but as far as I know they are far from the norm. It’s probably more common to be on a free-to-play model than a monthly subscription, even if you discount mobile games. And personally, I don’t think free-to-play games offer better value to their players.

Talks

It has been a while since I talked about build systems (my energy for it comes and goes), so I was happy to see CB Bailey take up the mantle early on Wednesday with their Things I Learnt While Trying to Avoid Becoming a CMake Expert talk. I found it a good introduction to CMake for beginners. I really do appreciate any attempt at reminding the world that build files aren’t sorcery, nor a job only for the build engineer. There was also a good share of practical tips about caveats that can trip up CMake newcomers.

On the topic of build-related talks, Diego Rodriguez-Losada was there to present the upcoming Conan 2.0. Overall I liked where this is going, although I worry a bit that the focus on supporting every build anti-pattern in the world will make the tool more complicated for the rest of us, the same way CMake offers you a million ways to shoot yourself in the foot. While I understand the value of offering Conan’s existing users a way to integrate with their build in its current state, I would really like to see all those options behind an opt-in “advanced mode” toggle so that new projects can stay on the safe side. A lot of bad builds today are the product of engineers not quite understanding what they are doing and fiddling with settings until something kinda works. I’d rather have something more restrictive with a checkbox that reads “I understand that my build is bad and I should really refactor it, but I need the foot gun right now”. Maybe not the best label when you’re trying to sell this to companies, I suppose.

A short must-see talk from this year, Björn Fahller’s Phoenix? was a very personal story about burnout and an important warning to all of us. As with Hannah’s talk, I can’t say much more than “check it out online as soon as it’s uploaded”.

The same session brought a few other short talks, including Remote is a Four Letter Word by Dom Davis, a reflection on what it means to work remotely, and how it should rather be considered “co-located”. Remote implies isolation and being hard to reach, which is usually not the case with today’s technology, and approaching it that way will probably lead to bad communication in the team. I’d also like to give a quick mention to Jim Pascoe’s Why it’s Good to Be a Software Manager, which discussed the hard-to-name role of Tech Lead / Software Manager and gave me some food for thought. Programmers aren’t good at names, and that includes job titles.

If you know anything about me you know I like history, so of course I really appreciated the double feature of Jim Hague’s The Victorian Internet and Gail Ollis’ Do you want fries with that? From punched cards to devops and beyond. If you’re interested in the history of computers and networking, those are both worth a watch, and only 45 minutes each.

The best talk of the conference was, of course, The Basics of Profiling by yours truly. Joking aside, the talk went well despite the previous day’s adventures. If I managed to convince at least one person to check out Optick or a similar instrumentation profiler, I’ll consider it a win. And I believe a few people in the audience told me they would.

Another game-related talk, Sandbox Games by my fellow ex-gamedev Ólafur Waage, showed us how to use WebAssembly to mix native code with HTML/JS tech, and even how to make a client web app that is about 99% C++. The tech still has a few things to figure out (such as providing some form of threading/TBB equivalent), but I wouldn’t be surprised if we soon start seeing some classic (and more recent) games running at reasonable performance in a browser. Anything that relies on SDL and OpenGL should be a relatively straightforward port, as long as it doesn’t need multithreading (for now). Asset loading might also prove a bit more difficult, as data needs to be streamed over HTTP, which is quite a bit slower than a local SSD, though maybe not much worse than a good old mechanical HDD depending on your internet connection.

Finally, Frances Buontempo’s Crowd Your Way Out of a Paper Bag was a nice introduction to cellular/swarm AI with a dash of multi-agent. I think this could be turned into a good workshop/kata/mob programming exercise as the AI and rules can be evolved quite iteratively once the basic rendering has been made. I might give it a shot with my team once the code is published.

Closing Thoughts

While the talks were great, they pale in comparison to the social aspect of the conference. This was not only an opportunity to meet up with long-lost C++ friends after two years, but also a great source of energy and inspiration sparked by random post-talk (or bar) discussions with other attendees. I have mentioned it in past trip reports, and I believe some conference organizers have been caught saying something like “we know our real value is the hallway track”. To me this rings 100% true. Not only did I have serious trouble keeping my attention up during remote conferences, but it was also much harder to strike up a random conversation with other people and get a lot of value out of it. Don’t get me wrong, organizers have tried their damn best to emulate the social aspect of conferences through various online platforms, but at the end of the day there’s no good substitute for haunting the venue corridors and overhearing conversations.

I really do hope this isn’t just a phase and that we’ll keep seeing more conferences return to normal. You see, despite the lost luggage, the struggle to get meds, British food and the hotel bar wine list, I would still absolutely recommend you go to ACCU next year!

PhysFS performance, a story of threading and locking

2020-07-26T00:00:00+00:00

Loading screens are pretty cool. They let artists showcase some nice art while the intro theme starts playing. Used well, they can set the stage for the game by putting the player in the mood. But that’s only a side effect. Their main purpose is to keep the user busy while your game loads and initializes everything it needs to render the main menu, and possibly more. But after the first hundred or so launches, the experience may get old, especially if that loading bar seems to be stuck forever.

Our Grand Strategy titles at Paradox have a reputation for being slow to load, especially considering that they are not exactly next-gen graphics games with tens of gigabytes of assets. A couple of months back, I decided to take a look at how Clausewitz (our internal engine) loads things up and what was taking so long.

The answer was interesting: most of the time was spent doing nothing. More precisely, either doing nothing by sleeping on a condition, or burning CPU cycles waiting for a boolean to become true. We had a locking problem.

Thread safe, but not thread efficient

The big lock contention was on file access. Every read, write, open, close or enumerate ended up locking a mutex or spin lock somewhere. And most of it was not even due to our engine.

To ease filesystem access, we rely on a third-party library called PhysicsFS, or PhysFS. It’s a pretty handy way of mounting several directories and archives under the same logical hierarchy and treating them as one, handling things like priority when resolving names, which is crucial to support DLCs and user mods properly. It also supports a variety of platforms and archive formats. But it had one fatal flaw: it was bad at threading. Really bad.

Some time ago, my predecessors decided to load the game data from disk with several concurrent threads. While that was probably efficient at the time, today it had become quite the pain, since PhysFS implements thread safety through one global mutex that is locked on every API call. This meant one thread doing a read() would block any other thread trying to read, open, close or seek a file (be it the same file or another one).

static void *stateLock = NULL;     /* protects other PhysFS static state. */

int PHYSFS_close(PHYSFS_File *_handle)
{
    FileHandle *handle = (FileHandle *) _handle;
    int rc;

    __PHYSFS_platformGrabMutex(stateLock);

    /* ... (remainder of PHYSFS_close elided) ... */
}

PhysFS has been around for quite some time, and at some point an issue was raised about the thread safety of its API, which was resolved by adding a global mutex or two. While probably good enough at the time, this turns out to be a serious performance killer on a modern machine with 8 cores trying to access the filesystem concurrently. In several cases it was even slower than processing the same data serially on one thread.

Improving the threading model

Luckily for me, it had recently been decided that we would embed a modified copy of PhysFS in our engine, which made it easier to tinker with it until I had something satisfactory. While I could probably make those changes public, they would have little chance of being accepted upstream as is, since one of my first decisions was to drop C in favour of C++. Instead, I’ll describe here what I did to make our implementation more thread-friendly.

As I hinted, the first thing I did was to tweak the build (of course!) so that our local fork of PhysFS was compiled with a C++ compiler instead of a C compiler. Our engine is entirely written in C++, and there was little reason for me to keep dealing with C in 2020, especially where locks were involved. For example, using macros and gotos instead of RAII to ensure proper unlocking on early returns is pure masochism if you ask me. See for yourself:

#define GOTO(e, g) do { if (e) PHYSFS_setErrorCode(e); goto g; } while (0)
#define GOTO_ERRPASS(g) do { goto g; } while (0)
#define GOTO_IF(c, e, g) do { if (c) { if (e) PHYSFS_setErrorCode(e); goto g; } } while (0)
#define GOTO_IF_ERRPASS(c, g) do { if (c) { goto g; } } while (0)
#define GOTO_MUTEX(e, m, g) do { if (e) PHYSFS_setErrorCode(e); __PHYSFS_platformReleaseMutex(m); goto g; } while (0)
#define GOTO_MUTEX_ERRPASS(m, g) do { __PHYSFS_platformReleaseMutex(m); goto g; } while (0)
#define GOTO_IF_MUTEX(c, e, m, g) do { if (c) { if (e) PHYSFS_setErrorCode(e); __PHYSFS_platformReleaseMutex(m); goto g; } } while (0)
#define GOTO_IF_MUTEX_ERRPASS(c, m, g) do { if (c) { __PHYSFS_platformReleaseMutex(m); goto g; } } while (0)
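For comparison, here is roughly what the same early-return pattern looks like once RAII is available. This is a hypothetical sketch (g_state_lock, g_open_files and close_file are illustrative names, not PhysFS code): std::lock_guard releases the mutex on every return path, and on exceptions too.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for library state (not actual PhysFS code).
std::mutex g_state_lock;
std::vector<std::string> g_open_files;

// std::lock_guard unlocks in its destructor, so every return path below
// (and any exception) releases g_state_lock without a GOTO_*_MUTEX macro.
bool close_file(const std::string& name) {
    std::lock_guard<std::mutex> guard(g_state_lock);
    if (name.empty()) {
        return false;  // invalid argument: unlocked automatically
    }
    auto it = std::find(g_open_files.begin(), g_open_files.end(), name);
    if (it == g_open_files.end()) {
        return false;  // not found: unlocked automatically
    }
    g_open_files.erase(it);
    return true;  // success: unlocked automatically
}
```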

Once that was done, I was able to put all the API state (a few globals) inside a struct that could only be accessed through a RAII accessor that locks on construction and unlocks on destruction. Then I started checking whether that state could be split into several sub-structures, each with its own lock. This gave me a clearer understanding of the data model and of where the contention was.
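A minimal sketch of such an accessor, assuming a hypothetical Guarded<T> wrapper (not the actual code from our fork): the state is only reachable through an Accessor object that holds a std::unique_lock for its whole lifetime, and splitting the former globals into several Guarded<> instances gives each group its own mutex.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical wrapper: the value is only reachable through an Accessor,
// which locks the mutex in its constructor and unlocks it in its destructor.
template <typename T>
class Guarded {
public:
    class Accessor {
    public:
        Accessor(std::mutex& m, T& value) : lock_(m), value_(value) {}
        T* operator->() { return &value_; }
        T& operator*() { return value_; }

    private:
        std::unique_lock<std::mutex> lock_;
        T& value_;
    };

    // C++17 guaranteed copy elision: the lock is taken exactly once here.
    Accessor access() { return Accessor(mutex_, value_); }

private:
    std::mutex mutex_;
    T value_{};
};

// Splitting the former globals into independent groups, each with its own lock.
struct MountState { std::vector<std::string> mount_points; };
struct OpenFileTable { std::vector<std::string> open_files; };

Guarded<MountState> g_mounts;
Guarded<OpenFileTable> g_file_table;
```

Holding accessors to g_mounts and g_file_table at the same time is fine since they use different mutexes, which is the whole point of the split (lock ordering still matters if code ever nests them in both orders).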

As it turned out, about 75% of the locking was done to ensure that the filesystem configuration was not changed during pathname resolution. As PhysFS aggregates several mount points under one root (like UNIX mount does), the API needed to lock the list of mount points to protect against data races should another thread mount or unmount a path at the same time.

The thing is, mounting or unmounting paths is a rare and specific operation. It is usually set up once when the game starts, and only changes at very specific points (for example if the game allows enabling or disabling mods on the fly). Almost 100% of the calls to mount() and unmount() are made while no other thread is loading, as even with locks the results would be unpredictable. Locking the whole API for that very rare case made no sense for us.

To solve this I introduced a freeze_config() call in the PhysFS API that would turn the configuration read-only internally. With proper use of const accessors, it allowed me to bypass locking entirely for read accesses to configuration while the freeze flag was set. Any non-const access to the configuration state in that mode would assert or throw an exception to trap the logic error. Since all access to the configuration singleton was now handled by a RAII accessor, it was easy to handle the change at one single point and have the whole PhysFS code behave like I wanted. This reduced the lock contention on FS access drastically, making threaded code actually run in parallel.
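The freeze idea can be sketched as follows. MountConfig, mount(), is_mounted() and this freeze_config() are hypothetical stand-ins rather than the real PhysFS API, and the sketch throws std::logic_error where production code might assert instead. Readers check an atomic flag with acquire ordering; since freeze_config() sets it with release ordering while holding the same mutex mount() uses, every mount made before the freeze is visible to the lock-free readers.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical mount configuration, not the real PhysFS API.
class MountConfig {
public:
    void mount(const std::string& path) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frozen_.load(std::memory_order_relaxed)) {  // the mutex orders this with freeze_config()
            throw std::logic_error("mount() called while the configuration is frozen");
        }
        mounts_.push_back(path);
    }

    // From now on the mount list is read-only.
    void freeze_config() {
        std::lock_guard<std::mutex> lock(mutex_);
        frozen_.store(true, std::memory_order_release);
    }

    // Path resolution: lock-free once frozen, locked otherwise.
    bool is_mounted(const std::string& path) const {
        if (frozen_.load(std::memory_order_acquire)) {
            return contains(path);  // no writer can exist any more
        }
        std::lock_guard<std::mutex> lock(mutex_);
        return contains(path);
    }

private:
    bool contains(const std::string& path) const {
        return std::find(mounts_.begin(), mounts_.end(), path) != mounts_.end();
    }

    mutable std::mutex mutex_;
    std::atomic<bool> frozen_{false};
    std::vector<std::string> mounts_;
};
```

The sketch has no way to unfreeze; supporting mods toggled on the fly would need one, and it would only be safe while no other thread is resolving paths.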

The next lock I got rid of was the error lock. PhysFS keeps track of errors with a form of per-thread errno, stored in a linked list of thread id / error code pairs:

typedef struct __PHYSFS_ERRSTATETYPE__
{
    void *tid;
    PHYSFS_ErrorCode code;
    struct __PHYSFS_ERRSTATETYPE__ *next;
} ErrState;

static ErrState *errorStates = NULL;

static void *errorLock = NULL;     /* protects error message list.        */

The API offers a way to get a description of the last error encountered on the calling thread. The canonical solution would have been to change the error reporting entirely, as keeping the last error code as internal state is considered bad form today (for good reasons), and instead have each API call return an error code or throw an exception. This would, of course, have required changing the whole API and all of its callers.

Instead, I went for a quicker solution that made the last error code be a thread_local static variable inside PhysFS. This allowed me to get rid of most of the error code storage implementation, as there was no more need to handle a manually implemented linked list or lock anything. Unless the library is used in an environment where threads are really tight on thread local storage (which wasn’t my case), I found it to be a simple and elegant enough solution to the problem.
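A minimal sketch of the thread_local approach, with a hypothetical ErrorCode enum and accessor names. Each thread gets its own copy of last_error, so neither a lock nor a lookup by thread id is needed. Clearing the code on read is a choice made for this sketch.

```cpp
#include <cassert>
#include <thread>

enum class ErrorCode { ok, not_found, io_error };  // hypothetical subset

namespace {
// One slot per thread: no linked list of (thread id, code) pairs and no lock.
thread_local ErrorCode last_error = ErrorCode::ok;
}  // namespace

void set_last_error(ErrorCode e) { last_error = e; }

// Returns the calling thread's last error and resets it to ok.
ErrorCode get_last_error() {
    ErrorCode e = last_error;
    last_error = ErrorCode::ok;
    return e;
}
```

An error set on a worker thread never leaks into the main thread’s slot, and vice versa.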

A lock might hide another

A surprise that came out of my new locking mechanism is that some lower bits of the library were not using any form of synchronization and simply relied on the main API lock to protect against data races. The archive file driver, for instance, shared a bunch of data meaning that two concurrent accesses to files in the same archive could result in a race condition.

Luckily, it turned out that those archive I/O handles were accessed in very few places, so I could easily solve the issue by putting a lock at the archive level. I would have searched for a more elegant solution if needed, but this lock never showed up in the profiler, so I left it as is.
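The archive-level lock could look something like this hypothetical sketch, where a std::istringstream stands in for the archive’s shared file handle. Because all entries share one read position, the seek and the read must happen under the same lock; different archives still don’t contend with each other.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <mutex>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical archive whose entries all share one underlying stream, and
// therefore one read position: seek + read must be done as a single step.
class Archive {
public:
    explicit Archive(std::string bytes) : stream_(std::move(bytes)) {}

    // Reads up to `size` bytes starting at `offset`.
    std::string read_at(std::size_t offset, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);  // one lock per archive
        std::string out(size, '\0');
        stream_.clear();  // reset EOF/fail bits left by a previous short read
        stream_.seekg(static_cast<std::streamoff>(offset));
        stream_.read(out.data(), static_cast<std::streamsize>(size));
        out.resize(static_cast<std::size_t>(stream_.gcount()));
        return out;
    }

private:
    std::mutex mutex_;
    std::istringstream stream_;
};
```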

Still, this illustrates that a global lock may hide a bunch of potential data races, which will only surface once that lock is replaced by a more efficient mechanism, requiring cascading changes throughout the implementation, down to the lowest levels.

The final lock I had to remove turned out to be on our side, as apparently someone had encountered an issue with PhysFS thread safety in the past on UNIX platforms and had added another spin lock on the call site of the API to address it. With the entire locking inside PhysFS now handled by std::recursive_mutex and std::atomic I could safely remove it.

Cascading bottlenecks

As my readers may know, removing a bottleneck does not always solve the issue. Quite often, it only makes the next one visible. In my case, the biggest one to show up once I was done with PhysFS was our asset loading system. Since PhysFS used to perform so poorly, there was little point in trying to load assets in parallel, especially back when our main renderer was DirectX 9.

Graphics APIs were historically also bad at threading. For example, DirectX 9 isn’t thread-safe by default, and its “thread-safe” flag just enables a big mutex, exactly like PhysFS does. The official documentation recommends against using it and instead making all API calls from one thread. It’s one of the reasons a lot of games still use a dedicated “rendering” thread to this day. While there are pros to this approach, the big drawback is that all model and texture loading has to be done serially, which is quite suboptimal.
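For readers unfamiliar with the pattern, a dedicated rendering thread boils down to a command queue with a single consumer. In the sketch below (RenderThread and submit() are hypothetical names), any thread may submit work, but only the worker thread executes it, one command at a time, which is precisely why resource loading ends up serialized.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-consumer command queue: the (non thread-safe) graphics
// API would only ever be called from worker_.
class RenderThread {
public:
    RenderThread() : worker_([this] { run(); }) {}

    ~RenderThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // run() drains the remaining commands before returning
    }

    void submit(std::function<void()> command) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            commands_.push(std::move(command));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> command;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !commands_.empty(); });
                if (commands_.empty()) {
                    return;  // stopping, and nothing left to execute
                }
                command = std::move(commands_.front());
                commands_.pop();
            }
            command();  // e.g. a texture or vertex buffer upload, strictly one at a time
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> commands_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so every member above exists before it starts
};
```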

Fortunately, more recent graphics APIs such as DirectX 11/12 and Vulkan are perfectly capable of loading textures, vertex buffers and the like from multiple threads. With the unexpected help of my great technical director, we were able to load most 3D models in parallel when the renderer supports it. This led to another drastic improvement in loading times, as 8 cores can process vertices much faster than one.
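A minimal sketch of fanning model loading out across cores could use std::async. The Model type and load_model() function are hypothetical placeholders for the real parsing and vertex processing. In a real engine they would also create GPU resources through an API that allows it from worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical model type and loader standing in for the engine's own.
struct Model {
    std::string name;
    std::size_t vertex_count = 0;
};

Model load_model(const std::string& path) {
    // Placeholder for CPU-heavy work such as decoding and processing vertices.
    return Model{path, path.size() * 100};
}

// Starts one asynchronous task per model, then collects the results.
std::vector<Model> load_models_parallel(const std::vector<std::string>& paths) {
    std::vector<std::future<Model>> pending;
    pending.reserve(paths.size());
    for (const auto& path : paths)
        pending.push_back(std::async(std::launch::async, load_model, path));

    std::vector<Model> models;
    models.reserve(pending.size());
    for (auto& f : pending)
        models.push_back(f.get()); // results come back in input order
    return models;
}
```

With std::launch::async, implementations typically start one thread per task. For hundreds of assets, a thread pool or task scheduler would likely be a better fit.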

One observation I’d like to highlight is that those upgrade requirements stack up. For example, without those changes to PhysFS there would be little reason for us to port our older games to a newer version of DirectX, unless we had plans to use the new graphics capabilities offered by the API. But with the file access contention issue solved, switching to a rendering API that supports threaded loading becomes much more interesting. Sometimes technical upgrades and refactoring are more valuable for the opportunities they open than for the immediate gains.

Tooling

Before concluding this article, I want to stress the importance of using a profiler to find bottlenecks. In my case, this round of optimizations was strongly guided by Intel VTune.

The timeline visualization of per-core activity was particularly handy for figuring out when the CPU could be used more effectively. I found it useful both to verify assumptions and to get a sense of parts of the codebase I wasn’t familiar with.

For example, it made me realize that all the game music in some of our titles was loaded upfront, while the only track we really cared about during startup was the main theme. A moderate refactoring of the music player implementation allowed me to load only that track at startup. A low priority background thread then loads the rest, ready for when the player starts interacting with the in-game music player.
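Here is a rough sketch of that startup strategy. It assumes a hypothetical Track type and decode_track() function in place of the real music player code. The main theme is decoded synchronously and the rest of the playlist is handed to a background task. The caller only blocks if the music player is opened before that task has finished. Standard C++ has no portable way to give a thread a low priority. That would need a platform call on the native thread handle (SetThreadPriority on Windows, for instance), which the sketch leaves out.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical track type and decoder standing in for the real audio code.
struct Track {
    std::string name;
};

Track decode_track(const std::string& name) {
    return Track{name}; // placeholder for the expensive decoding step
}

class MusicPlayer {
public:
    // Precondition: the playlist is non-empty and starts with the main theme.
    explicit MusicPlayer(std::vector<std::string> playlist)
        : main_theme_(decode_track(playlist.front())) {
        playlist.erase(playlist.begin());
        // Decode the remaining tracks in the background. Lowering this
        // thread's priority would need a platform-specific call, omitted here.
        rest_ = std::async(std::launch::async, [names = std::move(playlist)] {
            std::vector<Track> tracks;
            for (const auto& n : names)
                tracks.push_back(decode_track(n));
            return tracks;
        });
    }

    const Track& main_theme() const { return main_theme_; }

    // Blocks only if the music menu is opened before loading has finished.
    const std::vector<Track>& other_tracks() {
        if (rest_.valid())
            loaded_ = rest_.get();
        return loaded_;
    }

private:
    Track main_theme_;
    std::future<std::vector<Track>> rest_;
    std::vector<Track> loaded_;
};
```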

Knowing how your users will interact with your application is key when deciding when to do computation in the background. It makes it easy to pick a time frame that is usually low on CPU utilization, where the work won’t hurt the performance of the main activity.

In Conclusion

First, see for yourself the difference in a side-by-side comparison I recorded on my home computer booting Stellaris (i7 2600K, 16 GB of RAM, game installed on an SSD):

Now, I do not consider myself new to threading and related topics, yet at first I still did not believe I would be able to achieve such a drastic improvement. By the end of the iteration, we had at least one game starting up more than twice as fast as it used to on my machine (from over a minute to a few tens of seconds).

Looking back at my engineering school days in the 2000s and the years of work that followed, I now wonder how many opportunities were missed, both in teaching and in applying those teachings. We keep saying over and over that modern CPUs are fast and can crunch an insane number of instructions per second, but the true challenge is to make effective use of them. It is one thing to see the impact of threading or cache friendliness on slideware, but it’s another to apply it to a codebase that has existed for years or decades.

So, as many before me have recommended, please run your code through a profiling tool and see how good your CPU utilization is. It may come at the price of a costly refactoring, but the potential of getting twice or even ten times as fast is there…

… the only drawback is that now your composer needs to make the intro part of your game’s theme song much shorter.

Fifty shades of debug

2019-08-03T00:00:00+00:00

“They’ve done it again”, he exclaimed.

“The Committee has made a draft of what will become the next iteration of C++, and this time again they crammed a bunch of new language features but did not make my code faster in debug!”, he continued.

As the outrage reached its peak, he dared to ask what everybody had been secretly wondering all along: “What are those guys even paid for?!”

Every once in a while, a similar drama flares up on social media. I could go to great lengths to explain that nobody pays people to attend ISO meetings except (sometimes) their own employers, or that WG21 doesn’t write compilers, but this is not the article for it. For that, you can read my previous entry.

Instead today I will argue about semantics.

The semantics of Debug

The recent episode reminded me that this is not the first time “C++ performance in debug” has been brought up, and that once again I didn’t understand the need. I almost never run a build of my game in what I’d consider “Debug”. The speed tradeoff is just not worth what it brings to my day-to-day work. I only run unit tests in debug.

But let’s take a step back.

In his great talk, What Do You Mean?, Kevlin Henney made a good observation: semantics are quite important when you are trying to make a point, else your discussion isn’t gonna go far.

So when we say “Debug build”, what do we mean? As a quick experiment, I ran a poll on Twitter:

In C++, Debug means...
— Mathieu Ropert (@MatRopert) July 30, 2019

As you can see from the 287 answers, debug symbols got the most votes, but that’s not exactly what I’d call a consensus. Worse, out of all the options I offered, this is by far the one with the least impact on runtime performance compared to a full “Release” build!

There was also a trick in my question: those 4 options are (almost) entirely orthogonal. You can turn any of them on while leaving the others off. A better poll would have allowed multiple answers, but Twitter is quite limited there, so let’s not try to draw too many conclusions from this.

Four Axes of Debug

As conveniently laid out in my poll, a “classic” C++ build profile will offer (at least) 4 axes to play with on the debug spectrum:

-O0, /Od and the like control optimization. When disabled, the generated code will strongly resemble the C++ source code. No inlining will be performed, no instruction reordering, loop unrolling, variable elimination, nothing. This will have a huge impact on performance, as we usually write our code for humans first, and machines second.
-g, /Zi and friends decide whether or not the compiler emits debug information. The process of compilation being very lossy, the debugger needs extra data to be able to link back machine code to source code when debugging. That database can be stored outside the actual binary, and has no impact on performance by itself.
/MTd, /MDd will switch the C++ runtime to a special debug version that contains extra checks. On Windows, the debug runtime will help catch heap corruptions and bad iterator accesses. It comes with a performance cost that is not nearly as bad as it looks if optimizations are enabled and the standard algorithms are used. Unlike the two previous options, it has ABI implications and should be turned on for all the objects in the binary (including dynamic libraries).
-UNDEBUG (or the absence of NDEBUG in defines) ensures asserts are built in. The assert() macro is required to produce runtime checks only if NDEBUG isn’t defined. The impact will greatly depend on how much the codebase uses it. A badly placed assert in a tight loop can kill performance for good, whereas a sanity check on the inputs of a database API will probably not have a noticeable effect.
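To illustrate how independent these axes are, here is a small sketch that reports which of them are active in a given translation unit. It relies on compiler-specific predefined macros. GCC and Clang define __OPTIMIZE__ when optimizing, MSVC defines _DEBUG with /MDd and /MTd, and NDEBUG is whatever the build passes. There is no standard macro for debug symbols, and MSVC has no equivalent of __OPTIMIZE__, so this is best effort at most.

```cpp
#include <cassert>

// Which "debug" axes are active for the translation unit this is compiled in.
struct BuildAxes {
    bool optimized = false;     // GCC/Clang define __OPTIMIZE__ at -O1 and above
    bool asserts_on = false;    // assert() is live unless NDEBUG is defined
    bool debug_runtime = false; // MSVC defines _DEBUG with /MDd or /MTd
    // Debug symbols (-g, /Zi) have no predefined macro, so they can't be
    // detected this way.
};

BuildAxes current_build_axes() {
    BuildAxes axes;
#if defined(__OPTIMIZE__)
    axes.optimized = true;
#endif
#if !defined(NDEBUG)
    axes.asserts_on = true;
#endif
#if defined(_DEBUG)
    axes.debug_runtime = true;
#endif
    return axes;
}
```

Building the same file with, say, -O2 -g and no -DNDEBUG reports optimizations and asserts on at the same time, which no default profile does.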

As previously mentioned, those 4 options can be mixed and matched depending on what’s needed and which tradeoffs are acceptable.

I think it’s important to notice that the most sought-after feature, debug symbols, has no impact on performance and can even be stored outside the program for added isolation. For example, it is customary for libraries on Windows to ship .pdb files alongside .dll files so that developers can get some modicum of information even in release mode.

I am genuinely curious as to why it came up as the most strongly associated with Debug while it is in fact probably the least tied to the “performance in debug” debate. A small suspicion I have is that it could be due to CMake defaults, in which the Release profile doesn’t include debug symbols. In which case let me tell you, dear reader, that CMake defaults are wrong and you should override them. For example, if you create a project using MSVC’s wizard, it will only generate 2 profiles, Debug and Release, both of which produce debug symbols.

Debug runtimes and ABI fun

The second one, the debug runtime (/MDd or /MTd) is a bit more complex. In my opinion its biggest impact is on compile time. Since it’s potentially an ABI breaker, all of the program (including 3rd parties) should be recompiled using that flag. In a classic workflow where the programmer suddenly realizes he needs the debug runtime to track a nasty bug, the thought of needing to recompile everything can be quite disheartening. And then, there’s the runtime performance cost of checked iterators.

The compilation issue can be partially mitigated by a good build system and package manager. Simply put, there is usually a lot of code in a project that doesn’t need recompiling often and could be cached. Third parties are the obvious example, but low level libraries, frameworks and other engines could also probably be built every night and shared across developers. Readers who want to know more about the state of package management in C++ can go and watch this guy talking about it at ACCU.

As for the performance impact, I ran my own benchmarks and discovered that it is indeed utterly terrible in tight loops… when used with custom algorithms. As it turns out, the optimizer has a hard time lifting iterator validity checks out of the inner loop, leading to extremely redundant checks. This is why MSVC’s STL has iterator-debugging-aware code in <algorithm> and <numeric> to keep performance decent.

A brief glance at Visual C++ 2017’s implementation of std::accumulate(), for example, reveals that an initial check is made on the validity of the first and last parameters. Unwrapped iterators are then used in the tight while() loop to ensure vectorization can still happen. C++11 range-for loops also seem to be optimized properly. The big thing to keep an eye on is hand-written code that runs tight loops using STL iterators.
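A sketch of the same “check once, then unwrap” idea, applied by hand to a hypothetical custom loop, follows. Under MSVC’s debug runtime (_ITERATOR_DEBUG_LEVEL 2), each ++ and * on a std::vector iterator is validated. The pointer version touches the vector only once, through v.data(), and gives up those checks inside the loop. That is the tradeoff, so it is worth measuring before applying it anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hand-written loop over iterators: with the debug runtime, every ++it and
// *it goes through iterator validity checks.
long long sum_checked(const std::vector<int>& v) {
    long long total = 0;
    for (auto it = v.begin(); it != v.end(); ++it)
        total += *it;
    return total;
}

// Same loop, "unwrapped": the range is taken once, then raw pointers (which
// carry no debug checks) are used in the hot loop.
long long sum_unwrapped(const std::vector<int>& v) {
    long long total = 0;
    const int* first = v.data();
    const int* last = first + v.size();
    for (; first != last; ++first)
        total += *first;
    return total;
}

// Or let the standard algorithm do the unwrapping itself.
long long sum_std(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}
```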

Help me, /Ob1 Kenobi

Disabling optimizations in C++ is, of course, a serious tradeoff. Sure, no inlining or vectorizing makes stepping through code easier when debugging, but it comes at the price of losing most of what makes C++ abstractions zero-cost.

Fortunately, compiler vendors do not consider optimization an “all or nothing” deal and offer ways for users to decide which tradeoff they are willing to accept. Clang and GCC, for instance, have the -Og flag to enable all optimizations that don’t interfere with debugging. MSVC doesn’t have an equivalent, but it has an “optimized debug” feature (/Zo) that is on by default and includes extra debug info to track local variables and inline functions. As MSVC users might tell you, results may vary…

Another option for MSVC is to disable optimization (/Od) but keep a degree of inlining so that the STL stays reasonably fast: /Ob1. It allows inlining of functions declared inline, __inline or __forceinline. Incidentally, the STL is tagged accordingly so that most of it can still be inlined. CMake’s default RelWithDebInfo build profile keeps the optimization flag (/O2) but lowers the inlining level from /Ob2 (inline all the things) to /Ob1.

It’s important to note that optimization flags do not break ABI. It’s perfectly fine to build 3rd parties and other “stable” code with full optimizations while keeping the user’s project on less aggressive optimization settings. Again, a package manager would help here, as it should allow the user to use a release toolchain for dependencies while sticking with a debug toolchain for the local project (as long as they only differ through optimization flags).

Another option would be to make external code inlined while the user’s code isn’t. A simple way would be similar to the dllimport / dllexport preprocessor trick. A macro would expand to either __inline or __forceinline when no particular define is set (used from the outside), or to nothing when that define is set (when that library is being built for debug). This way, the /Ob1 flag would make the compiler inline code from external sources only.
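Here is a sketch of a variation on that macro. MYLIB_INLINE, MYLIB_DEBUG_BUILD and mylib_clamp are hypothetical names. One adjustment is needed for header-defined functions. Expanding to nothing would break the one-definition rule once the header is included in several translation units, and plain inline would still be inlined under /Ob1. So the debug branch keeps inline for linkage and adds each compiler’s “don’t inline” attribute instead.

```cpp
#include <cassert>

// Consumers of the library get forced inlining. Defining MYLIB_DEBUG_BUILD
// while working on the library itself keeps its functions out of line.
#if defined(MYLIB_DEBUG_BUILD)
#  if defined(_MSC_VER)
#    define MYLIB_INLINE inline __declspec(noinline)
#  else
#    define MYLIB_INLINE inline __attribute__((noinline))
#  endif
#else
#  if defined(_MSC_VER)
#    define MYLIB_INLINE __forceinline
#  else
#    define MYLIB_INLINE inline __attribute__((always_inline))
#  endif
#endif

// Example header-only library function using the macro.
MYLIB_INLINE int mylib_clamp(int value, int lo, int hi) {
    return value < lo ? lo : (value > hi ? hi : value);
}
```

GCC and Clang honor always_inline even at -O0, so this variation does not depend on /Ob1 at all on those compilers.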

A more reliable (and less intrusive) solution would require compiler support for an inlining level in which code found through -isystem inclusion (or module import) would be a candidate for inlining, but code found through the -I search path wouldn’t. One would then just need to tell CMake to translate INCLUDE_DIRECTORIES properties to -I, whereas INTERFACE_INCLUDE_DIRECTORIES would be pulled through -isystem (or equivalent). A package manager could also ensure that dependencies’ include directories are passed through -isystem. Both solutions should be discussed with compiler and build system vendors, not through WG21, as inlining isn’t defined in the C++ Standard (well, the keyword is, but only in the context of linkage).

Oh, and I’d like to quickly mention -fomit-frame-pointer, the dreaded optimization that supposedly breaks debuggers. First, it doesn’t, as the C++ standard requires the runtime to be able to unwind local variables if an exception is thrown, meaning the debugger will also be able to find them. Second, omitting the frame pointer is a dubious optimization anyway. The x86 and x86_64 calling conventions define EBP/RBP as non-volatile, so a function that uses it has to restore it. It only really saves a push and a pop in select cases, for a total of probably less than 10 cycles (as the stack is probably in L1 cache anyway). Benchmarking is advised before adding this flag to a project build profile.

assert(fibonacci(10000) == 3e10^2089)

The standard assert() macro is disabled by setting the NDEBUG preprocessor define. The standard requires it. What it doesn’t require is for optimizations to disable it.

Contrary to popular belief, passing -O2 or /O2 does not imply -DNDEBUG or /DNDEBUG. The two are orthogonal, but build systems usually put one with the other when generating pre-defined profiles. For example CMake defines NDEBUG in both Release and RelWithDebInfo.

Which means, again, that one can perfectly enable asserts in a build profile other than Debug. I’m not advocating for enabling asserts in release (I don’t think anyone does?) but there’s no reason why a tailor-made “development” profile couldn’t have a modicum of optimization while keeping asserts (and debug symbols) on.

When it comes to usage, the only things to really watch out for are dubious asserts that have side effects, perform heavy computations or run in tight critical loops. In my experience, these creep in precisely because so many default profiles disable asserts, so they are rarely exercised.
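A small sketch of both points in this section follows. It relies on the standard guarantee that <cassert> redefines assert() according to the state of NDEBUG each time it is included, so one file can hold both configurations. The two functions are hypothetical and deliberately put a side effect inside the assert. The optimization level plays no part in the result: only NDEBUG does.

```cpp
// Part 1: asserts enabled, whatever the command line says.
#undef NDEBUG
#include <cassert>

// Bad practice on purpose: the increment lives inside the assert.
int count_with_asserts_on() {
    int n = 0;
    assert(++n == 1); // evaluated: n becomes 1
    return n;
}

// Part 2: defining NDEBUG and including <cassert> again turns assert() into
// ((void)0) from this point on.
#define NDEBUG
#include <cassert>

int count_with_asserts_off() {
    int n = 0;
    assert(++n == 1); // expands to nothing useful: the increment is gone
    return n;
}

// Restore live asserts for any code that follows.
#undef NDEBUG
#include <cassert>
```

The two functions return different values from the same source line, which is exactly why side effects in asserts tend to go unnoticed until someone builds a profile that flips NDEBUG.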

Conclusion

As we have seen, a build profile is not a boolean that can be either Debug or Release. Compiler vendors know that this is a complex problem that is unlikely to have one solution, and they offer a myriad of settings to play with.

For the same reason the C++ Standard will probably never specify what Debug means or how fast it should run.

And as my regular readers probably know by now, many things we regularly complain about end up being a build system issue more than anything. This one is just another on the list. There is much to be gained by figuring out which build profile suits one’s use case, formalizing it as a toolchain file, and making sure one’s build can correctly use it. So go out there and clean those CMakeLists.txt!

ACCU 2019 trip report

2019-04-19T00:00:00+00:00

This year’s edition of ACCU was held from April 10th to April 13th, in Bristol as always. I arrived a day earlier from Paris after a short stop in France which was supposed to offer some supply of good weather and trips to a few winemakers in preparation for the harsh conditions of Great Britain.

From the start, things went awry. I could only spare half an hour for a visit to a winemaker in Vouvray who turned out to be quite forgettable, not to mention the weather, which was barely holding up. Still, I didn’t immediately notice that something was off, having spent the past months enduring the cold winter of Sweden. It took a second flight, from Paris to Bristol, to realize it: spring had arrived (although a couple of Bristol locals apologized for the weather being unexpectedly non-terrible).

Many meetings

My arrival was pretty unremarkable. It was, of course, raining and people still drove on the wrong side of the road. I had come across my former colleague Jonathan Boccara (of Fluent C++ fame) while waiting at my gate. We traded some war stories and he told me about the book he would be showing at the conference. I haven’t had the time to read it yet, but I have already heard some positive feedback about it.

ACCU is, like most conferences, a good time for me to spend some time face to face with friends from the C++ community living around the world. It is sometimes said that there is more value in the discussions with the people you meet at conferences than in the conference content, and I would partly agree. Depending on the circumstances, I feel like the bulk of the value falls slightly on one side or the other. At times there’s a presentation that justifies the whole ticket by itself. Other times I meet someone and have a discussion that is as valuable to me as all the talks I attend combined.

The other reason I often see so many familiar faces is that, in my opinion, people don’t try to attend conferences nearly enough. After asking around a bit, it does seem like I’m not the only one to have noticed that. Regardless of the company, there will be a small minority that asks their manager to be sent there, and a large majority who never will. I am not sure how to explain it. Not feeling like it’s worth the time? Thinking it’s only for some “elite”? Maybe simply too focused on the day-to-day job, on the next deadline?

I don’t claim to have the answer, but I will certainly encourage anyone who never asks to go to do so, and those who do to encourage their colleagues to do the same. We are always happy to see new faces, meet new people and buy them a drink at the conference bar at the end of the day.

Keynotes

As I mentioned in last year’s trip report, I was a bit disappointed by 2018’s opening keynote. This time was quite the opposite. ACCU 2019 opened with M. Angela Sasse telling us about security. The key takeaway was that the human, the user, will always be the weak link regardless of the technology deployed. More importantly, because security is everyone’s business and not just the IT department’s, it must offer a good UX, otherwise it will be badly used or worked around. This was pointed out in the 90s and it still holds true today, with sadly little progress to show for it.

The great Herb Sutter travelled from his Redmond office to England to tell us about his vision for the future of error handling in C++. While I already knew about his work on the matter (it was sent to SG14 for review a couple of months back), it was nice to have a refresher in front of the whole conference. In short, the direction is toward better exceptions with bounded, predictable throw and catch times. There is no more need for dynamic allocation and no extra cost when no exception occurs (this is already mostly the case on x86_64). There is also a push for noexcept to become the default unless otherwise specified.

The closing keynote was given by none other than Kate Gregory, who walked us through a nice lecture on code empathy. The idea is to read the previous programmer’s emotions through existing code, try to understand what triggered those emotions and decide how to react when confronted with them. I have a hunch that it will pair nicely with Jonathan’s book on how to deal with legacy code, as the two seem closely related.

Talks

At the rate of 3 talks a day outside keynotes, there was a total of 12 I could potentially attend during the conference. Subtract one because I had to attend mine and perhaps another one where I was busy writing slides and we get a rough estimate of about 10. While that number could make for a nice clickbait section (“10 ACCU talks you won’t believe I attended”), I will stick to my boring routine of mentioning the ones I remember the most. Also keep in mind that there were 5 tracks, meaning I saw roughly 17% of the conference content.

The two talks that made the biggest impression on me were Vittorio Romeo’s Higher-order functions and function_ref and Andreas Weiss’ Taming Dynamic Memory - An Introduction to Custom Allocators. The first one did a good job of explaining what higher-order functions are, as well as the content and benefits of the proposed function_ref addition to the C++ standard, all in one session. The second one offered a good tour of custom allocators, how they work and when they can be considered to replace the standard ones. Both presenters also had to accomplish their tasks while fending off the many questions coming from John Lakos, who sat in the first row each time (a feat he congratulated them on at the end).

The next two talks I can think of were Peter Bindels and Simon Brand’s Hello world from scratch, and Andy Balaam and CB Bailey’s How does git actually work?. Both explained things we do every day by taking a very simple use case (building a very simple program, committing some changes to a VCS) and showing what happens under the hood. They also both ran out of time before showing everything they had planned, because it turns out abstraction is no myth: even our simplest tasks are fairly complex when you look at how they are done. I think they both did a good job of it, and I would gladly schedule both in a “how does XXX work” track. That is a good theme that I would suggest having at every conference.

Next up is Kevlin Henney’s What do you mean. Kevlin is quite the celebrity in Bristol and I really liked his talk at the previous edition. While perhaps not as remarkable (I would have appreciated a clearer outline to follow), this one was still quite interesting. The main point was that meaning is derived from both what is said or written and the context that surrounds it. The latter being subjective, it implies a bunch of assumptions by both parties that, when not in line, lead to quite the misunderstanding. The main obstacle to solving that problem is that assumptions are, by definition, implicit and so can only be discovered when proved wrong (“Oh but I assumed that…”). This of course brings us back to the software craftsmanship practices of frequent iterative deliveries and testing.

Finally I’d like to mention Christopher Di Bella’s How to Teach C++ and Influence a Generation. Last year, Christopher started SG20, a study group in the standard committee focused on education. Education and teaching is an important subject to me, partly because of my own personal experience of learning C++ in school, then learning another language also called C++ around 5-10 years later. As you may guess, the first one was more in the line of “C with classes” while the second looked more like the C++ we know and recommend today. To that end the group has worked on some guidelines on how to write teaching materials. They also run polls to better understand how and when people learn C++. A good complement to this talk would be Bjarne’s keynote at CppCon 2017 Learning and Teaching Modern C++.

Lightning talks

One of the best things at ACCU is how the lightning talks sessions are organized. They are simply done in the keynote room as the closing session of each day. That way, most of the conference attends before going out for beers or dinner. Program is usually decided between the day before and a couple hours before the session, meaning last minute entries are definitely an option.

It’s a great opportunity to bring up a point you had in mind but couldn’t fit into a talk, respond to a previous talk (or lightning talk) or simply raise awareness in the community about a particular matter. For example, upon arriving in Bristol on Tuesday, I learnt that the great people from the Paris meetup were planning to announce a new conference. I put a few slides together, slipped in a joke or two about English food and Brexit, then went up on stage on Wednesday to tell everyone about CPPP.

Of all the C++ conferences I have been to, I think this formula works best, and it is one of the reasons ACCU feels like a big family gathering. If you are a conference organizer and have some lightning talk sessions, I strongly suggest you consider this option. It might feel intimidating to step up on stage in front of the entire conference, but then again I feel the familial atmosphere helps reduce the pressure.

Until next time

On Friday I gave my talk, The State of Package Management in C++. Frequent readers of this blog will probably be familiar with the topic. I gave a tour of package management in C++, why we want it and how far we’ve come (spoiler warning: far enough for you to try it). As you can see, ACCU did a fantastic job of uploading the recording to YouTube in less than a week.

But the greatest learning of all for me came after the conference, when I discovered that airlines will now charge you £50 at boarding for bringing a laptop bag along with your carry-on luggage. I used to do that all the time, but it now appears you can be charged extra depending on the mood. I suppose next time I will have to put my stuff in the hold :(

Do not let that stop you from attending conferences though, I still hope to see you there!

How do you keep up with tech in your game?

2019-03-31T00:00:00+00:00

World of Warcraft was initially released in 2004. Wikipedia tells me the latest expansion was published last summer, 14 years after release. The original game could run on Windows 98. Today it requires both hardware and software that didn’t exist at release.

While WoW is certainly one of the most extreme examples, it’s not the only video game to have an active lifespan of half a decade or more. MMORPGs and other multiplayer games fill a good part of that category, but they are not alone.

At Paradox, we keep releasing new content for our strategy games more than 5 years after release. The title I’m working on, Europa Universalis IV, will get a new expansion pack later this year, while it celebrates its 6th birthday.

People are sometimes surprised to hear me say that the build uses C++14. After all, ISO C++14 was not standardized at the time of release, and even C++11 support in compilers was a bit wonky when the initial development started.

Don’t fix it if it ain’t broken?

The first question my readers might ask is why bother upgrading at all? After all, once we made a release with a given set of tech and know it works, using anything else will come with a cost and a risk, right?

In fact, it’s a bit more complicated. The longer development will continue on a given title, the more it will cost to not upgrade. New patterns demonstrated in the field will not be usable, programmers will ask to be moved to a “newer” project, interviews will feature uneasy talks about the age of the compiler and one day a vendor will deprecate some of the technologies used.

To give some concrete examples:

A title released 5 years ago is unlikely to use C++11, as compilers during its development didn’t have great support for it, especially on Windows/MSVC platforms. Using C++03 today will be a hard sell to programmers, especially new hires.
5 years ago, around 20% of Steam users were still running a 32-bit OS, making x86_32 a likely target for a release. The next release of OSX will not support 32-bit binaries.
Metal was only released in 2014, followed by Vulkan in 2016, making OpenGL the obvious option for OSX and Linux at the time. In 2018, Apple announced the deprecation of OpenGL in favor of Metal.

The cost of upgrading

Over the course of a year, our team has done a bunch of tech upgrades to our title. Here’s a short overview of the efforts it required.

Compiler upgrades

Moving to MSVC 2017 was a pet peeve of mine. While it hasn’t been rolled into the main branch yet, I’ve been toying with a test branch for some time. Most of the effort was spent on rebuilding pre-built 3rd parties that were not binary compatible. Since the project doesn’t use a package manager (a shame, I know :)), it took a day or two to do. I’ve shamelessly used a local install of Conan to generate a new build of OpenSSL, then spent the rest of the time fighting the build of libvpx.

Operating system upgrade

Our Linux target has been Ubuntu 14.04 LTS for a long time. The next patch should move that to 16.04 LTS. While it was mostly painless, we had to deal with libstdc++ switching to the new C++11 ABI by default. Again, it was an issue with prebuilt binaries, but fortunately we did have some for both targets, so it boiled down to making the CMake finder smart enough to pick the right one.

OSX proved a bit more of an issue because (in my experience) backward compatibility and building for an older release are badly documented. For example, did you know that setting an OSX deployment target silently switches your runtime from libstdc++ to libc++ past a given release? The solution for me was to let Xcode set its defaults and ensure we used prebuilt binaries built for the same target. Sadly, the OSX deployment target is usually not considered in platform triplets (it should be), whether in a binary distribution of a 3rd party or in a package manager like Conan or vcpkg.
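One cheap safeguard against these runtime and ABI mismatches might be to report, at startup or in build logs, which standard library a binary was actually compiled against. The sketch below relies on each implementation’s identifying macros, which any standard header (here <string>) brings in. _LIBCPP_VERSION identifies libc++, and __GLIBCXX__ identifies libstdc++, where _GLIBCXX_USE_CXX11_ABI also tells the two ABIs apart. The function name is hypothetical, and the _MSC_VER branch is only a rough fallback for the MSVC STL.

```cpp
#include <cassert>
#include <string>

// Reports the C++ standard library (and, for libstdc++, the string/list ABI)
// that this translation unit was compiled against.
std::string std_library_in_use() {
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
#  if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    return "libstdc++ (C++11 ABI)";
#  else
    return "libstdc++ (old ABI)";
#  endif
#elif defined(_MSC_VER)
    return "MSVC STL";
#else
    return "unknown";
#endif
}
```

Comparing this string between the game and a prebuilt 3rd party (if it exposes something similar) would catch a libstdc++/libc++ or old/new ABI mismatch before it turns into a link error or a crash.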

C++ Standard Upgrade

One last tech upgrade I’ve toyed with but not pushed yet is a full C++17 build. Of course, it first required some of the previously mentioned upgrades. On Windows, it means switching to MSVC 2017. On Linux, it means using a newer Clang or GCC and linking the C++ runtime statically. OSX is trickier since only the latest 10.14 (Mojave) deployment target will toggle the compiler to C++17. I could try to force it anyway, but Apple goes to great lengths to make it hard, so I’m not 100% sold on the idea. The alternative would be to bump the minimum system requirement to the latest version of OSX for all users. While Mac users tend to upgrade much faster than Windows or Linux users, it’s still not an easy decision.

Some third parties proved a bit too old to handle C++17 properly. 99% of those were due to the use of std::auto_ptr, deprecated since C++11 and removed in C++17. When I could, I simply upgraded the 3rd party in question. One of them wasn’t maintained anymore, but fortunately it was not a big one, so I could simply replace all uses with std::unique_ptr manually. In hindsight, had I started the work on Linux, I could simply have run clang-tidy with modernize-replace-auto-ptr and let it do the work for me. In theory it could be done on Windows too, but I have had bad experiences with that in the past.
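The manual replacement is mostly mechanical. The one semantic trap is that copying a std::auto_ptr silently transferred ownership, while std::unique_ptr only allows it through an explicit std::move. Here is a minimal before/after sketch with a hypothetical Texture type; the old code is kept in comments since it no longer compiles as C++17.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Texture {
    int id;
};

// Pre-C++17 code:
//   std::auto_ptr<Texture> a(new Texture{42});
//   std::auto_ptr<Texture> b = a;   // "copy" steals ownership, a is now null
//
// std::unique_ptr equivalent: the transfer has to be spelled out.
std::unique_ptr<Texture> transfer_ownership(std::unique_ptr<Texture>& from) {
    std::unique_ptr<Texture> to = std::move(from); // `from` is left empty
    return to;
}
```

This is also roughly what modernize-replace-auto-ptr does: it changes the type and wraps ownership-transferring copies in std::move.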

Finally, there was the question of the game code itself. My readers will be pleased to hear that EU4 doesn’t use trigraphs (also removed in C++17), so again std::auto_ptr was the only serious obstacle and was swiftly dealt with using the same techniques described in the previous paragraph.

Migration to 64 bits

Moving from 32 to 64 bits involved a lot of stuff that I’ve already discussed, from upgrading and rebuilding 3rd parties to changing operating systems. Code-wise, the largest issue we had by far was the use of long. In theory, long would be a better choice than int, since C and C++ only guarantee at least 32 bits for long (int only needs to be 16 bits wide to be compliant). In practice, however, int is 32 bits on all x86 platforms. Meanwhile, long is 32 or 64 bits depending on the platform’s data model: LP64 on 64-bit Linux and OSX, LLP64 on 64-bit Windows.

A good way to make your project future proof on that matter is to follow this guideline:

When doing serialization (files or network), only use <cstdint> types
Everywhere else, use int unless you have a very good reason not to
Never use long on x86 (long long is OK though)
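To make the first guideline concrete, a minimal sketch of a serializer that only deals in <cstdint> types and writes an explicit byte order, so the format is identical whether long is 32 or 64 bits on the machine that wrote the file. The write_u32/read_u32 names and the little-endian layout are assumptions for the example, not a format EU4 actually uses:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in little-endian order, one byte at a time, so the
// output depends neither on host endianness nor on sizeof(int)/sizeof(long).
void write_u32(std::vector<std::uint8_t>& out, std::uint32_t value)
{
  for (int shift = 0; shift < 32; shift += 8)
    out.push_back(static_cast<std::uint8_t>(value >> shift));
}

// Read back a value written by write_u32, starting at offset.
std::uint32_t read_u32(const std::vector<std::uint8_t>& in, std::size_t offset)
{
  std::uint32_t value = 0;
  for (int i = 0; i < 4; ++i)
    value |= static_cast<std::uint32_t>(in[offset + i]) << (8 * i);
  return value;
}
```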

Third party upgrades

Some of the third party software we use was upgraded not because it blocked another upgrade, but simply because a newer version brought some benefits. An update of Intel TBB fixed some aggressive CPU usage on idle threads, an upgrade of PhysFS gave us better loading times, and an update of the Steam SDK fixed some bugs in the in-game browser on Linux.

Once more, most of the challenge here was fighting CMake and the absence of a package manager, but some upgrades also required adapting our code to newer APIs. Fortunately, the one that changed the most (PhysFS) was already hidden behind an abstraction layer, so the impact was quite limited. That example illustrates some good judgement calls from the engineers who preceded me on when to use an API directly and when to wrap it.

My personal rule is usually to look at the quality level and stability of a 3rd party API. I am fine with using Boost directly, but I would clearly wrap anything that looks like a C API, or that has changed several times in the past already.
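A minimal sketch of that kind of abstraction layer, assuming a game that only needs to read whole files. The FileSystem interface, the InMemoryFileSystem backend and file_size_or_zero are hypothetical names; a real backend would forward to the third-party library (PhysFS in the example above), so an API change in that library would only touch the backend:

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical interface owned by the game: callers never see the 3rd party API.
class FileSystem
{
public:
  virtual ~FileSystem() = default;
  virtual std::optional<std::string> read_file(const std::string& path) const = 0;
};

// Hypothetical backend. A real one would wrap the library's C-style calls here.
class InMemoryFileSystem : public FileSystem
{
public:
  void add_file(std::string path, std::string content)
  {
    m_files[std::move(path)] = std::move(content);
  }

  std::optional<std::string> read_file(const std::string& path) const override
  {
    auto it = m_files.find(path);
    if (it == m_files.end())
      return std::nullopt;
    return it->second;
  }

private:
  std::map<std::string, std::string> m_files;
};

// Game code depends only on the interface.
std::size_t file_size_or_zero(const FileSystem& fs, const std::string& path)
{
  auto content = fs.read_file(path);
  return content ? content->size() : 0;
}
```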

Client targets

One topic that I feel gets too little attention is how to decide on a given target. How can we know that moving to 64 bits will be OK for our users? Or that raising the minimum required CPU or operating system will be?

To me the answer is to have some form of telemetry. Collecting some anonymous data about your users (with the proper GDPR consideration) will allow you to know when it’s safe to push an upgrade on them. Steam may publish monthly surveys, but they are about all users, not your users. Players who mostly play the latest generation AAA title will not follow the same patterns as those playing indie pixel art games.

For example, on EU4 we made sure that more than 99% of our users were running a 64-bit OS before deciding to drop 32-bit support in the next patch. Furthermore, with some data on the CPUs of users still running a 32-bit OS, we could confirm that most of them would be able to simply upgrade their OS to a 64-bit variant, without changing hardware, to keep playing our game. In the future, this could also help us decide when to enable AVX and other CPU-specific optimizations.
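The decision itself is simple arithmetic over the collected samples. A minimal sketch, where the SystemSample record, its two fields and the function names are hypothetical, and the 99% default mirrors the EU4 example above rather than any general rule:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical anonymous telemetry record, limited to what the decision needs
struct SystemSample
{
  bool os_is_64bit;
  bool cpu_is_64bit;
};

// Fraction of samples matching a predicate (0 when there is no data)
template <typename Pred>
double share(const std::vector<SystemSample>& samples, Pred pred)
{
  if (samples.empty())
    return 0.0;
  const auto matching = std::count_if(samples.begin(), samples.end(), pred);
  return static_cast<double>(matching) / static_cast<double>(samples.size());
}

// Is dropping 32-bit builds acceptable? The threshold is an assumption.
bool can_drop_32bit(const std::vector<SystemSample>& samples, double threshold = 0.99)
{
  return share(samples, [](const SystemSample& s) { return s.os_is_64bit; }) >= threshold;
}

// Among users still on a 32-bit OS, the share whose CPU could run a 64-bit one
double upgradable_share(const std::vector<SystemSample>& samples)
{
  std::vector<SystemSample> legacy;
  std::copy_if(samples.begin(), samples.end(), std::back_inserter(legacy),
               [](const SystemSample& s) { return !s.os_is_64bit; });
  return share(legacy, [](const SystemSample& s) { return s.cpu_is_64bit; });
}
```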

Wrapping up

The number one lesson I derive from this experience should not surprise my usual readers: having simpler build files and using a package manager would have saved me quite some time on those upgrades. Someone who hadn’t already spent time debugging CMakeLists (who hasn’t?) would probably have been stuck even longer.

So again, I’ll strongly suggest you keep your build files simple, use a toolchain description and a package manager.

I will also suggest something possibly unpopular: have some form of telemetry in your product. It doesn’t have to “spy” on users. Keep it simple and anonymous, be mindful of GDPR. But remember that if you don’t know about your users’ hardware and software, you are likely to break something for them in an update, or postpone updates fearing that you might.

Copy and Swap, 20 years later

2019-01-07T00:00:00+00:00

Once upon a time, in the 90s, we started preaching one of the oldest pillars of Modern C++: RAII. We taught programmers the simple rule that a constructor must leave an object in a usable state, that we should be able to copy it, and that the destructor must clean up all owned resources, no matter what.

To help reasoning, we explained the Rule of Three, which reminds everyone that if they customize any of the copy constructor, the copy assignment operator or the destructor, they should most likely also do something about the other two.

Then came C++11 and with move semantics it became the Rule of Five which is basically the same, but adds the move constructor and move assignment operator to the list.

Things started to become confusing so nowadays we prefer to advertise the Rule of Zero which basically says “when in doubt, use = default”. When using STL containers and smart pointers, your compiler will generate suitable defaults for most classes, even the ones that hold resources.
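A minimal sketch of the Rule of Zero at work (the Level type is made up for the example): each member manages its own resource, so the compiler-generated special members copy, move and free correctly, and the class declares none of the five. The static_asserts check the consequences at compile time, assuming a C++17 standard library:

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical class owning resources only through standard types.
// It declares no destructor, copy or move operation (Rule of Zero).
struct Level
{
  std::string name;
  std::vector<int> tiles;                        // copied and freed automatically
  std::unique_ptr<std::vector<char>> save_data;  // freed automatically, makes Level move-only
};

// The generated members follow from the members' own semantics
static_assert(!std::is_copy_constructible_v<Level>, "unique_ptr member: no copies");
static_assert(std::is_nothrow_move_constructible_v<Level>, "all members move without throwing");
```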

But why do we have all those rules in the first place?

It’s dangerous to go alone

Copy/move constructors, copy/move assignment operators and destructors are the key parts of an object’s lifecycle. If one is wrong, users will get dangling references, leaks, double deletes and other unsavoury things. And of course, they must do all that without leaking anything if an exception occurs.

Since the best way to avoid writing bugs is to write no code at all, the Rule of Zero is obviously an excellent solution. Except sometimes we cannot rely on it. In those cases (for example when writing custom containers) we will need to follow the Rule of Five.

This comes with some constraints:

Our five functions need to form a consistent whole when they acquire, copy, move and destroy resources, else we’ll leak or leave dangling pointers
Some operations like construction and assignment are quite similar so we would prefer to write one by calling the other (again reuse reduces the amount of code to review)
Construction and memory allocation may throw, meaning we must offer at least the basic exception safety guarantee.
All those should be efficient, as there would be little reason to write custom containers that underperform the STL (why not use the STL in that case?).

Fortunately, the problem is not a recent one, so we have Copy and Swap.

Back to the 90s

The oldest reference to this idiom I could find is from Herb Sutter’s Guru of the Week #59, published in July 1999. The phrasing of the article seems to reference the pattern as already known, so I suspect it’s even older.

The concept is well known:

Write a destructor that deletes any owned resource
Write a copy constructor that duplicates any owned resource and takes ownership of it
Write a non-throwing swap() function that will exchange the contents of two containers by swapping the internal bits
Write the copy-assignment operator by making a temporary copy of the source object, then swap the copy with this.

This 4th point is the most elegant and important part of the idiom. Copy assignment is usually the trickiest one to write, since it must delete the existing content, insert a copy of the source object’s content and survive if an exception is thrown somewhere in the process.

We solve all that by writing this simple code that works whatever we did in the other 2 (or 4):

T& operator=(const T& rhs)
{
  T tmp(rhs);
  swap(tmp);
  return *this;
}

The idiom even scales to the Rule of Five easily by throwing one std::move in the recipe:

T& operator=(T&& rhs)
{
  T tmp(std::move(rhs));
  swap(tmp);
  return *this;
}

In 3 lines we solved the problem while offering the strong exception guarantee. That’s brilliant! That part is more than 20 years old and I still find it magical.
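Putting the four steps and the two assignment operators together gives something like the following minimal sketch. IntBuffer is a hypothetical fixed-size container of ints, and the move constructor is an addition needed by the move-assignment version:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical fixed-size buffer owning a heap-allocated array of ints.
class IntBuffer
{
public:
  explicit IntBuffer(std::size_t count = 0)
    : m_size(count), m_data(count ? new int[count]() : nullptr) {}

  // 1. Destructor releases the owned resource
  ~IntBuffer() { delete[] m_data; }

  // 2. Copy constructor duplicates the resource
  IntBuffer(const IntBuffer& rhs) : IntBuffer(rhs.m_size)
  {
    std::copy(rhs.m_data, rhs.m_data + rhs.m_size, m_data);
  }

  // Move constructor steals the resource and leaves rhs empty
  IntBuffer(IntBuffer&& rhs) noexcept
    : m_size(std::exchange(rhs.m_size, 0)), m_data(std::exchange(rhs.m_data, nullptr)) {}

  // 3. Non-throwing swap of the internals
  void swap(IntBuffer& other) noexcept
  {
    std::swap(m_size, other.m_size);
    std::swap(m_data, other.m_data);
  }

  // 4. Copy assignment: copy first (may throw), then swap (cannot throw)
  IntBuffer& operator=(const IntBuffer& rhs)
  {
    IntBuffer tmp(rhs);
    swap(tmp);
    return *this;  // tmp's destructor frees our old buffer
  }

  IntBuffer& operator=(IntBuffer&& rhs) noexcept
  {
    IntBuffer tmp(std::move(rhs));
    swap(tmp);
    return *this;
  }

  std::size_t size() const noexcept { return m_size; }
  int& operator[](std::size_t i) { return m_data[i]; }
  const int& operator[](std::size_t i) const { return m_data[i]; }

private:
  std::size_t m_size;
  int* m_data;
};
```

Self-assignment needs no special case here: a = a copies a into tmp, then swaps.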

Well, except one thing…

The catch

Remember the constraints we enumerated before? Especially the one about performance? Because we have a problem here. Two actually.

With Copy and Swap, we will always allocate new resources and then throw the present ones away, even if our collection could fit in the already allocated storage (remember, it’s an assignment). And as we know, allocation can be unpredictably slow, especially for small collections that would otherwise have been fairly cheap to copy.
The whole operation will at some point require 3 times the resources: one for this, one for the source object, and one for the intermediate copy. Compared to copying in place, the cost is one extra copy of a potentially very large collection. CPU-wise we still perform one copy operation per element, which is perfectly fine, but memory-wise this comes with an extra 50% cost.

Alas! But is there a better option?

There is, but it will not be free. Copy and Swap is not an old and outdated idiom to which we have a much better answer today. It remains one of the best solutions to the problem, because to offer the strong exception guarantee there is no way around it: a temporary copy must be made first, so that we can simply delete it if something goes wrong, without touching the existing collection.

To get better performance, we will have to give up something.

Warranty void if an exception happens

Sadly we don’t have many options: we have to demote the strong exception safety guarantee to the basic one. Our object will still be destructible if an exception occurs during assignment, but the content of our collection will be unspecified.

For example let’s say we’re making a vector-like container:

MyVector& operator=(const MyVector& rhs)
{
  if (this != &rhs)
  {
    clear();              // Deletes content but leaves buffer itself intact
    reserve(rhs.size());  // Reallocates buffer if needed
    std::uninitialized_copy(rhs.begin(), rhs.end(), m_end);
    m_end += rhs.size();
  }
  return *this;
}

There, we do the same job, but if an exception is thrown we only offer basic guarantee (the vector will be empty). In exchange we can reuse the same buffer if suitable and avoid a costly new.

Well, almost. It would be preferable to ensure that reserve() is smart enough to require extra runtime memory only when growing from a non-zero size. On the other hand, the user will probably be unhappy if a failed reallocation destroys all his data when he calls reserve(), so there we still need to offer a strong guarantee:

// Requires <cstddef>, <memory> and <type_traits> (C++17)
using storage_type = std::aligned_storage_t<sizeof(T), alignof(T)>;

void reserve(std::size_t new_capacity)
{
  if (new_capacity > capacity())
  {
    if (empty())
    {
      // We don't have any value to preserve, we can delete first
      delete[] reinterpret_cast<storage_type*>(m_begin);
      // Reset first so the object stays consistent if new[] throws
      m_begin = m_end = m_end_of_storage = nullptr;
      m_begin = reinterpret_cast<T*>(new storage_type[new_capacity]);
      m_end = m_begin;
      m_end_of_storage = m_begin + new_capacity;
    }
    else
    {
      // Alloc first, if OK move data and then delete
      // (strong guarantee only if T's move constructor cannot throw)
      const std::size_t old_size = size();  // computed while the old buffer is alive
      auto new_storage = std::make_unique<storage_type[]>(new_capacity);
      auto new_begin = reinterpret_cast<T*>(new_storage.get());
      std::uninitialized_move(m_begin, m_end, new_begin);
      std::destroy(m_begin, m_end);
      delete[] reinterpret_cast<storage_type*>(m_begin);
      m_begin = new_begin;
      m_end = new_begin + old_size;
      m_end_of_storage = m_begin + new_capacity;
      new_storage.release();
    }
  }
}

Since reserve() is a no-op when there is already enough space to fit, our copy assignment operator will reuse the existing buffer. And if reallocation is needed, since we called clear() first it will delete then allocate instead of the other way around to save up runtime memory.

In Conclusion

20 years later, it feels like Copy and Swap still does what we expect of it: ease the implementation of the Rule of Three (or Five) while offering the strong guarantee.

We can do better by dropping that guarantee to basic, but as always, software engineering is a matter of tradeoffs. We gave up something and increased the maintenance cost of our container (the code is clearly harder to review and understand than the original Copy and Swap) to save precious allocations in some cases. You may also have noticed that our first version was generic, whereas the second one requires a different implementation for each container.

Which choice did the C++ Standard make, by the way? Well, as far as I know, the standard doesn’t say. There is no exception guarantee specified for copy assignment, so it would seem that implementers are free to choose. A quick glance at MSVC shows me that they went for the basic guarantee, so I’d expect Clang and GCC to be similar.

I’d like to give credit to the great Howard Hinnant, who pops up every now and then on Stack Overflow threads about Copy and Swap to remind us of the tradeoff. He was a good inspiration for this article, and you can find a whole talk about the matter here, in which he makes a case for always providing the basic (and not strong) guarantee for assignments, and then using a generic template function to do a strong copy and swap when really needed.

Oh and by the way, did you notice that we wrote vector copy and reallocation without a single raw loop using only the STL? :)

Gamedev introduction to ‘Modern’ C++

2019-01-02T00:00:00+00:00

I was initially going to write a response to Aras Pranckevičius’ Modern C++ Lamentations but the great Ben Deane beat me to it.

Reading the whole discussion on social media, I felt like a UFO: I am part of the videogame industry, but I don’t believe that C++ is headed in the wrong direction or that the committee has lost touch with the community. I think it is probably a matter of background.

I didn’t start my career in videogames. My engineering thesis was on operating systems design, then I worked on the web (in C) and then in finance (in C then C++), for more than 10 years in total before I joined a videogame company. Before opening the codebase of a videogame, I went to a couple of big C++ conferences, spoke at some of them, organized meetups for my local user group and of course in the process met a lot of people who are committee members, voters, writers or simply influencers.

As Ben put it, I feel like my fellow gamedevs could gain a lot by participating more in the C++ community that’s out there, and in turn bring out topics that may have been overlooked.

When writing my original thoughts, I also realized a lot of discussion reminded me of Dan Saks’ Talking to C Programmers about C++, so before I suggest any other resource, I want to recommend this one. It explains why debugging isn’t the #1 priority for the committee and explains better than I could why discussing technical points with different perspectives can be difficult.

The Standard and the Committee

C++ is an international standard. As such, it falls under the rulings of ISO and is one of the few programming languages to do so (C being the other one). This comes with a few organizational constraints.

While the final draft has to be voted on by people representing national standards organizations (ANSI, AFNOR, BSI…), the whole editing process is open to anyone who wants to participate. Participants usually include national body representatives, but also academics and engineers from all across the world.

So how does one participate? Well, that’s easy: they just have to show up. The C++ committee usually meets three times a year, most often in the US or in Europe, and anyone who comes can participate in the meetings and debates. The only thing they can’t do is vote in the formal polls, since ISO requires voters to be members or representatives of a national organization.

Most participants are sponsored by their company or university to attend, but some do come at their own expenses. There are also a few non-profit organizations that sponsor people to attend, such as the C++ Alliance.

Being unable to find the time or sponsorship to attend is absolutely not a showstopper. In fact, a lot of work is done between meetings, by email and other electronic exchanges. Meetings are usually about discussing and voting on proposals, while all the writing and editing is done outside. One can perfectly well submit a paper, address comments, and then find a champion among the attendees to defend it at committee meetings. More can be found here.

Finally, since the C++ standard is a big piece, work is usually split by topic. WG21 has created a number of study groups over the years to discuss particular bits of the language. The one that should interest my readers the most is SG14: Game Development and Low Latency. It’s free to join, all the discussions are available on the mailing-list and there’s a teleconference most Wednesdays on European evening time.

In short, most of the process that leads up to the adoption of a new C++ standard is open to anyone who wants to follow or participate. For more information, refer to the ISO C++ website.

The community

Not everyone in the C++ community participates in the standard, nor do they have to. Getting in touch with fellow developers is already a big plus.

In my opinion, the best way is to attend conferences. Here’s an incomplete schedule for 2019:

CppOnSea (Folkestone, UK), February 4-6th
ACCU (Bristol, UK), April 10-13th
C++Now (Aspen, CO, USA), May 5-10th
CppCon (Aurora, CO, USA), September 15-20th
Pacific++ (Australia/New Zealand), usually mid October
Meeting C++ (Berlin, Germany), usually mid November

Most employers would gain a lot by sending their developers there, but if that’s not an option, the conferences usually offer some student and diversity tickets at a discount. Of course the best way is to submit a talk and get most/all expenses paid by the conference!

Outside of those special gatherings, there’s probably a C++ meetup happening nearby. And if not, I would recommend starting one. I personally know a couple of people who launched one because they couldn’t find any, and quickly got dozens of attendees.

Finally, there’s the Cpplang Slack!

Online resources

Should you miss a conference, or simply be curious about what happened there, most (if not all) of them publish their recordings for free on YouTube.

To help you follow all the guidelines and best practices shared at conferences, in blog posts and elsewhere, Bjarne and Herb (and many more now) maintain the C++ Core Guidelines. When in doubt, check them!

For readers who commute to work, or go running, or whatever else, it’s always good to save up a couple of CppCast episodes to listen to. Each week Rob and Jason interview someone from the community and share recent news about C++.

Also, since historically the gamedev community doesn’t trust its compiler, the Godbolt¹ is an invaluable resource.

And of course, you should always measure, so you’d better run the code through QuickBench.

Mythbusting

In closing, I’d like to suggest a couple talks that give some perspective on where the C++ community is heading:

Using C++17 to make a video game on a legacy platform by Jason Turner
How smart can compilers be? by Matt Godbolt
Ranges for the Standard Library by Eric Niebler
Writing good C++ by default by Herb Sutter

Happy watching and welcome here, I hope to see you around soon!

¹ Sorry Matt, I know it’s called Compiler Explorer but it just doesn’t stick :(

CppCon 2018 trip report

2018-10-14T00:00:00+00:00

CppCon 2018 was held in Bellevue, Washington, on the last week of September this year. Flying from Stockholm proved a bit more painful than from Paris, as no direct flights are offered. My trip started somewhere around 11:15 at ARN, and ended in SEA around an eternity later (official sources settling around 15 hours).

Pre-show warmup

With most of Saturday spent flying and brooding about the absence of direct destinations from Arlanda, I can’t say much happened the first day.

Sunday offered optional trainings that I didn’t subscribe to, but I heard from those present that they were pretty good. One in particular was about teaching attendees how to be better speakers and offered a very attractive deal to first-time speakers, which I hope they all took. On my side, I used my (dubious) position as a CppCon veteran to show two of my colleagues from Paradox Arctic around Seattle. We went to see the Living Computer Museum, which hosts an impressive collection of restored (and still working) mainframes such as DEC PDPs and VAXs and Xerox Sigma. You can even request telnet access to them.

While visiting the place, I rolled on the table of random encounters and found Robert Maynard, who turned out to work on CMake. Unfortunately his occupation wasn’t known to me when we met so I missed a crucial occasion to introduce myself by saying “Hey, your build system is horrible, but all the others I tried are even worse. Anyway pleased to meet you.” Next time!

The evening was kicked off by the ritual T-Shirt Dinner, which requires you to put on a C++ T-shirt and then go to one of Bellevue’s restaurants to meet other developers. The place I picked started by giving us two tables, before they realized that more than 50 people would show up there and hastily found a larger space, which still felt a bit overcrowded. In retrospect, it foreshadowed my impression that CppCon is growing too big for its venue.

Keynotes, plenaries and where is my brain teaser?

As I usually mention in my trip reports, I tend to have somewhat (too) high expectations of keynotes and plenaries at conferences. This is where I expect to get challenged and walk out with a new perspective in mind.

Great past examples at CppCon would be Eric Niebler’s Ranges for the Standard Library, Herb Sutter’s Writing Good C++14… By Default and Jason Turner’s Rich Code for Tiny Computers.

This year, my personal award would go not to any keynote, but to Hana Dusíková’s Regular Expressions (video was removed, a new one is being uploaded), which is probably the best practical use of constexpr I’ve seen so far. The idea of generating state machines from strings at compile time is not only smart, but also proves faster than parsers generated by code generators (lex/yacc style). The concept demonstrated is applicable to any LL(1) grammar, which opens up a wide range of applications.

As for the keynotes themselves, I’d say they were good, but not as good as in past years. This is an entirely subjective opinion though, heavily weighted by the fact that most topics didn’t feel as new (Bjarne Stroustrup on Concepts, Kate Gregory on Simplicity, Herb Sutter on Metaclasses and GSL) or as interesting (Chandler Carruth on Spectre). Of course, viewers who didn’t follow the previous editions of CppCon, or the other conferences of the season, would probably rate those talks much higher.

The keynote I didn’t mention yet was Mark Elendt’s Patterns and Techniques Used in the Houdini 3D Graphics Application which did a retrospective on the 30+ years his application lived through. I felt some personal connection with the talk, having worked on similar codebases in the past and I think it’s important to speak about those at least so that other developers realize they are not alone out there. Yes there were bad decisions to be made and fads that would have been better left alone, and no your company is not the only one to have fallen for it. As I explained in my previous blog post, Unicorns aren’t real.

SG14

This year marked the first time I attended the SG14 face-to-face meeting. It was quite insightful to see standardization discussions from the inside, at the price of missing a full day of the conference.

For the readers who don’t know already, SG14 is a study group that gives advice on C++ standard proposals relevant to Low Latency. It is open to anyone interested and is mostly attended by folks from the Gaming, Trading and Embedded industries.

A few papers were discussed, with especially two catching my attention:

Object relocation by Arthur O’Dwyer (D1144R0) proposes adding a new trait, “trivially relocatable”, to the language, which basically tells the compiler “moving this type amounts to a memcpy() and not calling the destructor”. This could potentially optimize the relocation of any container, smart pointer or resource holder type.
Linear Algebra by Guy Davidson (D1166R0) proposes adding linear algebra types and operations to the standard library. A worthy addition, if you ask me, as every video game company has to write its own or use some 3rd party BLAS implementation. Having standard vocabulary types would make it much simpler for two libraries using vectors and matrices to interact.

One thing I felt by the end is that some papers only really interest part of the audience: for example, Linear Algebra probably didn’t speak much to the Embedded people, whereas I couldn’t care much about the Freestanding Library proposal. This might call for better planning on my side next time, to skip some hours.

2018, The year of package management?

One of my big expectations for this year was to discuss the direction of build and package management for C++. Bjarne himself said during his CppCon 2017 keynote that the lack of accessibility of 3rd party libraries is pulling students away from the language, as it’s just too painful to write something more interactive than a terminal application.

While the talks and panels themselves didn’t seem to bring much (although I missed Peter Bindels’), there was a good deal of discussion on the side between the people from Conan, the Bincrafters, the maintainers of VCPKG, CMake, build2 and of course Isabella Muerte. A lot was said, and it would take (at least) a post by itself to summarize. In short, my takeaways are:

CMake remains the best we have (for better or worse)
Modules still need some work, especially on the interaction with the compiler to find them
VCPKG’s goal is to package as many libraries as possible, working around broken builds like Linux distributions do when they package for apt or emerge
Conan has put more focus on the tool itself and providing enterprise services for it, but is now investing to deliver more packaged libraries to the community
The idea of a standardized package descriptor isn’t getting much traction, we’re more likely headed towards a de-facto standard if one software manages to win the market
New tools such as build2 are interesting as research on the topic but I’m doubtful they will offer a practical solution to the current ecosystem
Overcomplicated builds and reliance on non-native approaches (Cygwin/MSYS) is still the #1 reason why a library isn’t trivial to package
Header-only libraries continue to be the “preferred” way to avoid the topic entirely

Looking forward

As this edition ended, I couldn’t help thinking that it wasn’t my favourite. The place felt too crowded, I couldn’t quote as many memorable talks, and I didn’t meet as many new people. But then I realized this was perfectly normal. As the conference is growing, so am I. This is not my first time anymore. Or my first gathering of the year. I can’t meet Matt Godbolt (among others) in person for the first time again.

When you first jump into this world, you’ll certainly be amazed by all those topics you hadn’t heard about before, and those awesome people you can finally see in person. As you become more of a veteran, your reasons for going may change. You go for the hidden gem, or for the late night discussion with the 3 experts in whatever field you’re interested in.

Whatever your profile, I’m pretty sure you’ll find it worthwhile so… see you in Colorado next year!

Wait, Stockholm has no direct flight to Denver? Forget what I’ve just said, everything sucks!

Chasing Unicorns

2018-09-01T00:00:00+00:00

Picture this situation: a new developer joins a team. Sooner or later, he discovers the existing codebase and starts playing with it. Some time may pass, but at some point he encounters a pattern he finds weird. Maybe he read a case against it in a book, or heard it in a talk, or simply learned from experience that this is probably not the recommended way of solving that particular problem.

He brings up the matter with another programmer and points out that particular construct before suggesting the different approach. He makes his case but is swiftly rebuffed: “we don’t do things like that here”. Surprised but not defeated, our developer brings out his sources: it’s not just his weird idea, it’s more of a consensus that the industry has reached over the years. And then the hammer falls: “this doesn’t apply to us, we have unique constraints”.

He has found a unicorn.

One in a million

Does this story feel familiar? Maybe it brings back a past experience, or perhaps a story shared over a beer?

It usually works as follows: a suggestion to use an engineering best practice is ignored because it can’t possibly apply to this particular project or company. Maybe it comes from another industry, from a company of a different size, it may even have been tried in the past with unsuccessful results!

In some cases, that reason may hold. Maybe the idea comes from a very specific and different use case and can’t be generalized. But most of the time, what we call best practices are accepted across the board.

What are the chances that a given project is, in fact, a pioneer in the domain and has indeed a use case that reveals a pitfall in an accepted practice? There is certainly a possibility, but one must ask himself whether it really is the one that defies statistics, because most of the time, it won’t be.

Take the Unicorn Test

Reading this article, my readers may already have some concrete examples in mind that would be related to the point I’m making here.

Let’s start with one of the most obvious: testing. I believe that today everybody knows that testing is good, usually the more the better, and done at the lowest possible level. This is not a new idea, in fact as Kevlin Henney pointed out at ACCU 2018, you could find references to the idea of Test Driven Development in the 1970s. Yet it is not that hard today, more than 40 years later, to find projects without any unit tests at all.

Another well-known practice is to stay away from goto instructions. As Kevlin also pointed out in his talk, the recommendation came from Dijkstra himself, in 1968. A 50-year-old best practice that still managed to be ignored and produced a critical vulnerability in iOS in 2014!

Even more concerning, in my opinion, is the fact that while looking for references to the bug I could find recent articles defending the use of goto, stating that the problem was somewhere else. A 2015 paper even analyzed GitHub projects and concluded that “developers limit themselves to using goto appropriately in most cases”. Fortunately the C++ Core Guidelines still stand by Dijkstra, stating “ES.76: Avoid goto”.

Most common C++ unicorns

(This section will delve into C++ specific practices. If you, dear reader, aren’t interested in C++, first I’d like to thank you for reading my blog anyway, that must be a difficult experience, and second, I suggest you skip to the next one)

As I mentioned in the previous paragraph, the C++ Core Guidelines are a formidable source of best practices in C++ today. They may not be the best way to learn (books, talks, articles and conferences are a better vehicle for that), but when in doubt, they’re a pretty solid go-to reference (pun intended).

To help my readers figure out if they may be working on a unicorn C++ project, I’ll list a few best practices that I found or heard to be discouraged in some places:

Using the STL. That may look obvious, but some still don’t trust it today. This may come from previous bad experiences (some implementations were indeed buggy in the 90s/2000s), a policy against templates (see next item) or simply a case of NIH (Not Invented Here). Whatever the reason, the fact remains that the STL today is a well designed library provided with every compiler on every platform. Yes, some bits have their shortcomings (iostreams, futures…), and some containers could be better optimized if some constraints from the standard were to be relaxed (std::unordered_map for example), but in which case there’s probably a replacement or a more specialized alternative in discussion somewhere. And if not, it’s probably a prime subject for a new paper. You may use alternatives to some parts of the STL if you know what you’re doing, but not using it at all makes little sense.
Using templates. Once upon a time, compilers were having a hard time with two-phase name lookup, or SFINAE, or inlining. I had to deal with a linker that did not eliminate duplicate instantiations of std::string across translation units, leading to ridiculous binary sizes. But that time is beyond us. Today GCC, Clang and MSVC will do wonders, allowing us to write generic containers and algorithms that can be optimized better than assembly code written by humans.
Using RAII. The idea is more 30 year old now. This is the reason why C++ doesn’t need goto, or manual use of new and delete. This is why we should use smart pointers. This is how we can have extremely performant native code without resource leaks or garbage collection. Scope-based management of resources is in my opinion one of the “killer” features of C++ that most languages lack.
Avoiding raw loops. C++ algorithms are fast, have names that express intent, and ensure that collections with only 0 or 1 elements are still handled correctly. If you don’t know by heart what std::rotate() does, you probably haven’t watched my favourite talk of all time by Sean Parent. It even contains a short story about how he was told that “we don’t use std::rotate() here” and what impact it had on the product.
Preferring values over pointers. Pointers are useful, but their semantics are weird. They can be null, copying them doesn’t really copy anything, pointer data members don’t enforce const correctness the same way as values, and at some point they may end up referring to freed memory. Pointers to arrays are even worse as they don’t carry information about boundaries. Herb Sutter made a good case against storing pointers in his CppCon 2015 keynote.
Using auto. Try a little contest with your compiler: pick any number of C++ expressions and try to find out their type. Guess which one of you will be 100% correct and which one will miss a few corner cases. Spoiler warning: the compiler wins. In fact, Bjarne himself wanted a similar mechanism in the early stages of the design of C++. In C (before C99) you could declare a variable without a type and it would default to int. Bjarne wanted to change that to whatever the type of the right-hand side expression is, but couldn’t because of backward compatibility. In 2011 we finally got auto, the magical keyword that always deduces the right type and raises an error if the variable is not initialized at declaration. Yes, it doesn’t implicitly propagate const, & and &&, and for a good reason: you want those intents to be explicitly expressed. Again, Herb’s argument in its favor is 5 years old now.
Preferring C++ abstractions over C (or assembly). As a generalization of some of the previous items, some projects may still argue that C is simpler and closer to the metal than C++, and therefore better or faster. GCC’s own developers had to fight for the right to use C++ over C. One of the reasons LLVM/Clang exists today is that GCC’s C codebase was impossible to work with at the time. For those still worried that higher-level abstractions mean lower performance, I’d recommend Jason Turner’s CppCon 2016 keynote and Matt Godbolt’s CppCon 2017 keynote.
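To make the RAII item above a bit more concrete, here is a minimal sketch. The Connection type and the g_open_connections counter are hypothetical and exist only for illustration. A real resource would be a socket, a file handle or a lock. The counter just lets us observe that everything acquired in process() is released, whether the function returns normally or throws.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical counter of live connections, only there so the example can
// observe that every acquired resource gets released.
int g_open_connections = 0;

// Hypothetical resource type: acquired in the constructor, released in the
// destructor. A real one would wrap a socket, a file handle, a mutex, etc.
class Connection {
public:
    Connection() { ++g_open_connections; }
    ~Connection() { --g_open_connections; }
    Connection(const Connection&) = delete;             // one owner per resource
    Connection& operator=(const Connection&) = delete;
};

// Acquires two resources, then may fail. There is no goto, no cleanup label
// and no delete: both objects are destroyed when the scope exits, whether it
// returns normally or unwinds because of an exception.
void process(bool fail) {
    Connection on_stack;
    auto on_heap = std::make_unique<Connection>();  // heap-allocated, still scope-managed
    if (fail) {
        throw std::runtime_error("processing failed");
    }
}
```

After either a normal call or a throwing call to process(), g_open_connections is back to zero. Neither exit path needs any cleanup code of its own.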
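For the raw loops item, the snippet below is a small illustration of what std::rotate() can do. It is a sketch of a slide helper in the spirit of the Sean Parent talk mentioned above, written from memory, so treat the name and exact shape as an assumption rather than a quote. The helper moves a block of elements to a new position in a sequence. A hand-written loop for this is easy to get wrong, and std::rotate() already handles the empty and single-element cases. bring_to_top is a hypothetical usage function.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Moves the range [first, last) so that it starts at `pos` (when pos is before
// the range) or ends at `pos` (when pos is after it). Returns the new bounds.
// Requires random-access iterators because of the `<` comparisons.
template <typename RandomIt>
std::pair<RandomIt, RandomIt> slide(RandomIt first, RandomIt last, RandomIt pos) {
    if (pos < first) {
        // std::rotate returns the new position of the element that was at
        // `pos`, which sits right after the moved range.
        return {pos, std::rotate(pos, first, last)};
    }
    if (last < pos) {
        // Here it returns the new position of the element that was at
        // `first`, i.e. the new start of the moved range.
        return {std::rotate(first, last, pos), pos};
    }
    return {first, last};  // pos within [first, last]: nothing to move
}

// Hypothetical usage: bring the selected items [sel_first, sel_last) to the
// top of a list, keeping the relative order of everything else.
void bring_to_top(std::vector<int>& items, std::size_t sel_first, std::size_t sel_last) {
    slide(items.begin() + sel_first, items.begin() + sel_last, items.begin());
}
```

For example, calling bring_to_top on {10, 20, 30, 40, 50} with the selection [2, 4) gives {30, 40, 10, 20, 50}.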
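The values-over-pointers item mentions two pitfalls: copying a pointer member copies nothing, and const does not reach through it. Both show up with two tiny structs. ByPointer, ByValue and reset are hypothetical names chosen for this sketch.

```cpp
// Hypothetical types, for illustration only.
struct ByPointer {
    int* value;  // copying a ByPointer copies the address, not the int
};

struct ByValue {
    int value;   // copying a ByValue copies the int itself
};

// Compiles: on a const ByPointer, only the pointer itself is const,
// so the int it points to can still be modified.
void reset(const ByPointer& p) { *p.value = 0; }

// The equivalent for ByValue would be rejected by the compiler,
// because a const ByValue& makes its value member read-only:
// void reset(const ByValue& v) { v.value = 0; }
```

Copy a ByPointer and modify the copy, and the original changes too, because both share the same int. A copy of a ByValue is fully independent.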
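As a taste of the “guess the type” contest from the auto item, the sketch below lets the compiler check a few deductions with static_assert. The first group shows promotions and conversions that are easy to get wrong by hand. The second group shows that plain auto drops top-level const and references, so keeping them has to be spelled out. The auto_demo namespace and all variable names are made up for this example.

```cpp
#include <type_traits>
#include <vector>

namespace auto_demo {

// The "guess the type" contest: the compiler applies promotions and
// conversions that humans tend to forget.
short a = 1, b = 2;
auto sum = a + b;       // int, not short: both operands are promoted to int
auto diff = 1u - 2;     // unsigned int: 2 is converted, so the result wraps around
auto ratio = 1 / 2.0f;  // float

static_assert(std::is_same<decltype(sum), int>::value, "short + short is int");
static_assert(std::is_same<decltype(diff), unsigned int>::value, "unsigned wins");
static_assert(std::is_same<decltype(ratio), float>::value, "int / float is float");

// Plain auto deduces a value type: top-level const and references are
// dropped, so keeping them has to be asked for explicitly.
const std::vector<int> numbers{1, 2, 3};
auto copy = numbers.front();         // int, an independent copy
const auto& view = numbers.front();  // const int&, refers to the element

static_assert(std::is_same<decltype(copy), int>::value, "copy is a plain int");
static_assert(std::is_same<decltype(view), const int&>::value, "view is a reference");

// auto missing;  // does not compile: auto requires an initializer

}  // namespace auto_demo
```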

If you’re arguing, you’re losing

The reason I wrote this article is mostly to help my readers recognize that they might be working on a unicorn project. One of the big things I get from conferences is the realization that my problems are in fact not unique at all. They are shared by other people who may have solved them, or who may be stuck with the same dogmas I’ve been facing. I do believe that simply finding someone else in the profession with the same issue helps dispel the unicorn myth.

Still, knowing you’re facing a unicorn is only half the battle; you then probably want to slay the beast. I have found that most of those antipatterns are tied to the project’s (or company’s) culture and are not easy to change. Academic research has found that humans usually have a strong bias against ideas from the “outside”.

I’ve linked some good papers and talks defending those practices and their history in this article, and you can certainly find a hundred more freely on the internet. On a more general topic, Dan Saks gave a brilliant keynote at CppCon 2016 about changing people’s minds on a given practice. I fear that trying to summarize it here would not do it justice, so I strongly suggest that my readers go and watch it.

In conclusion, once again I encourage my readers to go to conferences and meetups, talk to their peers, watch more talks and read more articles. For every problem you face, chances are someone else has already faced it and the profession has found a good solution to it. Remember: unicorns aren’t real. Like most mythical creatures, they’re cool because they can’t exist.