Performance benchmarking 101


We all want top-quality software without any bugs. But if there is something as frustrating as buggy software, it's slow software. Who has never raged against software that seemed frozen? Who has never reloaded a browser tab, only to see the interesting content make a brief appearance before the page reloads again?

So how do we fix slow software, and why is it hard to improve? Because before fixing a problem, we need to measure it, and measuring performance is a complex task in itself.

At Octobus, a lot of our work focuses on Mercurial, a command-line tool. You invoke it dozens or thousands of times per day to create, rewrite and exchange commits. The two facets that interest us are the time commands take to finish (independently of human interaction time) and lower-level benchmarking of individual functions.

Measuring performance

Measuring performance is hard. Why is it hard? Because the result is not a single boolean value (fast or not); it's a continuous value that the environment influences in many ways. Here is a non-exhaustive list of things that can impact a performance measurement:

  • Other running processes.
  • Hardware (CPU/Disk/Memory) performance.
  • CRON jobs.
  • AV scans.
  • CPU context switching.
  • Temperature throttling.
  • OS tuning.
  • Network traffic.
  • Cosmic radiation.
  • Noise (no kidding).
  • etc...

The first "solution" is to reduce the impact of these variables on the measurement. The other "solution" is to run the performance test not once but several times, potentially hundreds or thousands of times. The resulting set of values gives a better representation of the performance of what we test than any single run.
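As a rough illustration of the "run it many times" approach, here is a minimal sketch using only the Python standard library. The `workload` function and the `measure` and `summarize` helpers are hypothetical names chosen for this example; they are not part of Mercurial or of any benchmarking tool.

```python
import statistics
import timeit


def measure(func, repeat=20, number=100):
    """Return per-call timings (in seconds) for `func`, one per repetition."""
    # timeit.repeat returns, for each repetition, the total time of `number` calls
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return [total / number for total in totals]


def summarize(timings):
    """Reduce a list of timings to a few descriptive statistics."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
        "stdev": statistics.stdev(timings),  # needs at least two timings
    }


def workload():
    # Hypothetical placeholder for the operation under test
    sorted(range(1000, 0, -1))


if __name__ == "__main__":
    summary = summarize(measure(workload))
    for name, value in summary.items():
        print(f"{name}: {value * 1e6:.2f} us")
```

A large gap between the minimum and the maximum, or a high standard deviation, is a hint that the environment disturbed the run and that the numbers deserve some suspicion.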

Another thing to consider: there is little value in comparing measurements obtained on different machines. What we can and will do is measure the performance of a single action on the same machine before and after applying a patch, to see its impact on performance.
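A before/after comparison on one machine could be sketched as follows. The `slowdown` and `verdict` helpers, the 10% threshold and the timings are all made up for illustration; the threshold in particular is an arbitrary assumption, not a recommended value.

```python
import statistics


def slowdown(before, after):
    """Return how many times slower `after` is than `before` (ratio of medians)."""
    return statistics.median(after) / statistics.median(before)


def verdict(before, after, threshold=1.10):
    """Classify a change, ignoring variations smaller than `threshold`."""
    ratio = slowdown(before, after)
    if ratio >= threshold:
        return "regression"
    if ratio <= 1 / threshold:
        return "improvement"
    return "unchanged"


if __name__ == "__main__":
    # Made-up timings in seconds, each list collected on the same machine
    before_patch = [0.100, 0.102, 0.098, 0.101, 0.150]
    after_patch = [0.151, 0.153, 0.149, 0.152, 0.210]
    print(verdict(before_patch, after_patch))
```

Using the median rather than the mean keeps a single disturbed run (the last value of each list here) from dominating the comparison.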

Performance tools

Luckily for us, performance tools already exist. But what features does such a tool need?

  • Track a remote repository and associate measurement with each commit.
  • Build the environment for the selected commit.
  • Run the performance test suite for the built commit.
  • Save the results. As we run performance tests several times, it's useful to store more than one value per test (the median of the results, for example) in order to quantify the quality of the run.
  • Generate a report, ideally in HTML.

For our use case, we chose ASV, which stands for airspeed velocity. The documentation can be found here: Our repository with the config and the performance tests is here:


ASV handles a lot for us:

  • It can track a Mercurial repository, which suits us well as we are benchmarking Mercurial itself.
  • The building part is already tailored for Python projects.
  • It runs the whole test suite, or only a subset of tests, for the built commit.
  • It stores the results in JSON files, one directory per machine.
  • It creates an easy-to-host HTML report.
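To give an idea of what a test in such a suite can look like, here is a minimal sketch that relies on ASV's convention of timing methods whose names start with `time_`, with `setup` run beforehand and excluded from the measurement. The class and what it measures are hypothetical and are not taken from our repository.

```python
class TimeSorting:
    """Hypothetical benchmark class, for illustration only."""

    def setup(self):
        # Prepare the data before each timed method; not part of the timing
        self.data = list(range(10_000, 0, -1))

    def time_sorted(self):
        # Timed: sort a reversed list
        sorted(self.data)

    def time_reversed_copy(self):
        # Timed: build a reversed copy of the list
        self.data[::-1]
```

The timed methods return nothing: the tool measures how long they take to run, not what they compute.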

The bonus features of ASV are nice:

  • ASV accepts a complex set of commits to test (for example, all tagged commits on a specific branch).
  • It can also sample a subset of the given commits: you can ask it to take only 10 commits from the whole repository history. This is very useful to quickly get a glance at the performance evolution of a project or release cycle.
  • Auto-detection of regressions between commits or ranges of commits, such as operation A being 1.5x slower between commit N and N+1, or operation B being 1.8x slower between tag X and X+1.
  • Automatic bisection of regressions. When a regression appears in a range of commits, ASV has the tooling to bisect it and pinpoint the commit that introduced it.
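The idea behind bisecting a regression can be sketched in a few lines. This illustrates the general technique and is not ASV's actual implementation: `first_slow_commit`, the `measure` callable and the 1.5x threshold are assumptions made for the example, and it presumes a single regression point between a fast first commit and a slow last commit.

```python
def first_slow_commit(commits, measure, threshold=1.5):
    """Return the first commit at least `threshold` times slower than the baseline.

    `commits` is ordered from oldest to newest and `measure` is a hypothetical
    callable returning a timing for a commit. The first commit is assumed to
    be fast and the last one slow.
    """
    baseline = measure(commits[0])
    low, high = 0, len(commits) - 1  # commits[low] is fast, commits[high] is slow
    while high - low > 1:
        middle = (low + high) // 2
        if measure(commits[middle]) >= threshold * baseline:
            high = middle  # the regression is at `middle` or earlier
        else:
            low = middle  # the regression comes after `middle`
    return commits[high]


if __name__ == "__main__":
    # Made-up timings in seconds, keyed by commit identifier
    timings = {"c0": 1.0, "c1": 1.0, "c2": 1.1, "c3": 1.8, "c4": 1.8, "c5": 1.9}
    print(first_slow_commit(list(timings), timings.get))
```

Each step halves the range, so only a handful of builds and measurements are needed even for a long range of commits. That matters when every measurement means building and benchmarking a commit.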

In the next blog posts about ASV, we will show you how to get started with it and begin benchmarking your code.

Further reading

We made a presentation, in French, on the subject:

Our friend Victor Stinner, a Python core developer, also writes very good blog posts on the subject: