Modern Mercurial

Antoine Cezar, Raphaël Gomès
2020-11-26

General perception

Mercurial draws a wide range of opinions from its users, both former and current, and even from those who have not tried it yet. These range from "it's super easy, way simpler to use than Git" to "it's slow and the UX sucks", along with "Mercurial? Isn't it dead?", "It's not on GitHub" and the (usually) quieter "I like the features" and "it scales very well".

As with most technical discussions, whether any opinion is "right" really depends on who you ask.

With 15-year-old software like Mercurial, when you asked a particular question also matters, as things may have changed since you last made up your mind. Your workflow might still not be well supported, the tools you need might not exist... you might also just not like it.

Our goal with this article is to go over some of the changes that Mercurial has seen over the past few years and to review some of its strengths and weaknesses.

Hopefully, you will come out of this read with more up-to-date information to help you make informed decisions about the tools you're using.

Current state of affairs

Phases

Phases were introduced in Mercurial 2.1 (2012) and have become one of its central workflow features.

Every changeset (or "commit") is in one of three phases: public, draft or secret.

By default, when you commit something, it's in the draft phase: you can change it, move it in history and even delete it using any of the many history-rewriting commands in Mercurial (more on that later). The public phase is meant to make changesets immutable: once they're public, you cannot change anything about them, and all rewriting commands will abort when used on a public changeset.

This is useful as a project management tool, much like you would use protected branches in forges such as GitLab, but built into your VCS and more granular. You communicate to all other contributors that these changes can be relied on for future work, your CI will not fail because your hash does not exist anymore, etc. This additional protection is preventive, and saves you time in the long run.

The secret phase is useful for committing changes that you do not want to share, like local config files for example.
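To make these rules concrete, here is a toy Python model of the three phases. It is only a sketch of the behavior described above, not Mercurial's actual code: the names Phase, check_rewritable and is_exchanged, as well as the changeset names, are made up for the illustration.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The three phases, ordered from most to least shareable."""
    PUBLIC = 0
    DRAFT = 1
    SECRET = 2


class ImmutableChangeset(Exception):
    """Raised when a rewrite targets a public changeset."""


def check_rewritable(phase: Phase) -> None:
    # Rewriting commands abort on public changesets; draft and secret are fine.
    if phase is Phase.PUBLIC:
        raise ImmutableChangeset("cannot rewrite a public changeset")


def is_exchanged(phase: Phase) -> bool:
    # Secret changesets stay local; public and draft ones can be shared.
    return phase is not Phase.SECRET


# Toy history: hypothetical changeset name -> phase
history = {"a1": Phase.PUBLIC, "b2": Phase.DRAFT, "c3": Phase.SECRET}

rewritable = []
for name, phase in history.items():
    try:
        check_rewritable(phase)
        rewritable.append(name)
    except ImmutableChangeset:
        pass

shared = [name for name, phase in history.items() if is_exchanged(phase)]

print(rewritable)  # ['b2', 'c3']
print(shared)      # ['a1', 'b2']
```

In a real repository, hg phase -r . shows the phase of the current changeset.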

Distributed history rewriting

Because Mercurial is a distributed tool, whenever you rewrite history (change the contents, move changesets around, etc.), you run the risk of conflicting with or erasing the work of other people.

Users of other DVCS usually come up with rules (be they enforced by humans or through an ad-hoc permissions system) to prevent the pitfalls of concurrent editing.

Mercurial takes a different approach that, to our knowledge, is unique among DVCS: distributed history rewriting should not only be possible, but safe and with very little friction.

Enter "Changeset Evolution": Mercurial keeps records of every operation you've made on your history and shares them along with the standard history. This enables automated conflict resolution when sharing history, turning a usually very tedious task into a simple one-command fix. The core concept was introduced in the evolve extension and subsequently integrated into Mercurial itself. Some commands and experimental features still reside in evolve, where they can be iterated on and changed faster according to user feedback.
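The record-keeping idea can be pictured with a small Python sketch. This is a deliberately simplified model under our own assumptions, not how Mercurial stores or processes these records (which it calls obsolescence markers): the Markers type, the current_versions function and the changeset names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Each record says: "this changeset was replaced by these ones".
# An empty tuple means the changeset was discarded with no replacement.
Markers = Dict[str, Tuple[str, ...]]


def current_versions(node: str, markers: Markers) -> List[str]:
    """Follow rewrite records from `node` to the changesets replacing it today."""
    if node not in markers:
        return [node]  # never rewritten: still current
    result: List[str] = []
    for successor in markers[node]:
        for final in current_versions(successor, markers):
            if final not in result:
                result.append(final)
    return result


# Toy records (this sketch assumes they contain no cycles).
markers: Markers = {
    "a1": ("a2",),       # a1 was amended into a2
    "a2": ("a3",),       # a2 was then rebased into a3
    "b1": ("b2", "b3"),  # b1 was split in two
    "c1": (),            # c1 was discarded
}

print(current_versions("a1", markers))  # ['a3']
print(current_versions("b1", markers))  # ['b2', 'b3']
print(current_versions("c1", markers))  # []
print(current_versions("d1", markers))  # ['d1']
```

If a colleague had built work on top of a1, receiving these records along with the history is what lets a tool work out that the work now belongs on top of a3, without anyone having to explain what happened.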

Existing commands and extensions, such as the standard rebase, histedit (similar to Git's interactive rebase) and absorb, were ported years ago to use the new mechanism when evolve is detected. absorb is a tool that automatically finds where to amend the different bits of your working directory into the right changesets and does so interactively.

You can run hg help evolve --extension to get more precise info and a list of all commands (scroll down for how to install it).

Feature branches

Mercurial has had the concept of branches since its very inception. These branches differ from what Git calls branches: they are not pointers to a changeset, but instead refer to a set of changesets that may grow over time. They are often used to separate different long-term development efforts. In the Mercurial repository itself, you have the "default" and the "stable" branches, the latter corresponding to the latest stable release and any other bugfixes since that release. A changeset always belongs to a named branch, "default" being the... default.

The Mercurial community struggled to define a nice way to handle 'topic' branches (sometimes also called 'feature' branches), especially when it comes to sharing them with other people, mainly for code review or collaboration.

topic is an extension created in 2015 that addresses that problem in a way that integrates nicely with Changeset Evolution and does not get in the way of its users. While still experimental, topics are used by a lot of people from different backgrounds, in different companies, and the overall feedback is very positive.

Topics differ from Git branches in that each changeset carries the topic information, as opposed to just the tip of the "branch". The user can clearly differentiate multiple topics, switch from one to the other, manipulate history within the current topic, etc.

Finally, topics are temporary and fade out once the changesets are published.

This is the workflow that Heptapod chose when adapting GitLab to Mercurial.
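The two properties of topics described above (the topic is carried by every changeset, and it fades out on publication) can be sketched with a toy Python model. This is an illustration under our own assumptions rather than the extension's implementation: the Changeset class, its fields and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Changeset:
    node: str             # hypothetical short identifier
    branch: str           # every changeset belongs to a named branch
    topic: Optional[str]  # topic recorded on the changeset itself
    public: bool          # True once the changeset has been published


def visible_topic(cs: Changeset) -> Optional[str]:
    """Topic as seen by the user: ignored once the changeset is public."""
    return None if cs.public else cs.topic


def changesets_in_topic(history: List[Changeset], topic: str) -> List[str]:
    """All changesets of a topic, not only its tip."""
    return [cs.node for cs in history if visible_topic(cs) == topic]


history = [
    Changeset("a1", "default", None, public=True),
    Changeset("b2", "default", "fix-parser", public=False),
    Changeset("c3", "default", "fix-parser", public=False),
]

before = changesets_in_topic(history, "fix-parser")
print(before)  # ['b2', 'c3']

# Publishing the changesets makes the topic fade out.
for cs in history:
    cs.public = True

after = changesets_in_topic(history, "fix-parser")
print(after)  # []
```

Note that the branch of each changeset stays "default" throughout: in this model, a topic is a temporary label on top of the named branch, not a replacement for it.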

Other powerful features

Mercurial supports a functional language for selecting a set of revisions. Expressions in this language are called revsets, and most commands accept one through their --rev argument. Here are a few examples:

  • hg diff -r 1.0::. will show the diff for all revisions from the 1.0 tag up to the current revision
  • hg rebase -r my-topic -d "branch(default) and public()" will rebase all changesets in my-topic onto the public tip of the default branch.
  • hg log -r "(keyword(bug) or keyword(issue)) and not ancestors(tag())" will list changesets mentioning "bug" or "issue" that are not in a tagged release.

This allows you to build complex queries that would otherwise require writing a script or hoping that developers thought to include a certain flag on the specific command you're using.
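To give an intuition of what a revset such as 1.0::. selects, here is a minimal Python sketch over a toy history. It only models the x::y range operator as "descendants of x that are also ancestors of y"; the graph, the revision names and the helper functions are assumptions made for the example, and real revsets are evaluated by Mercurial itself.

```python
from typing import Dict, List, Set

# Toy history: changeset -> list of parents (hypothetical names).
PARENTS: Dict[str, List[str]] = {
    "r0": [],
    "r1": ["r0"],  # imagine this one carries the 1.0 tag
    "r2": ["r1"],
    "r3": ["r1"],  # a side line of development
    "r4": ["r2"],  # imagine this is the current revision, "."
}


def ancestors(rev: str) -> Set[str]:
    """Return `rev` and everything reachable through its parents."""
    seen: Set[str] = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def descendants(rev: str) -> Set[str]:
    """Return `rev` and every changeset that has it as an ancestor."""
    return {r for r in PARENTS if rev in ancestors(r)}


def dag_range(start: str, end: str) -> Set[str]:
    """Model of `start::end`: descendants of start that are ancestors of end."""
    return descendants(start) & ancestors(end)


selected = sorted(dag_range("r1", "r4"))
print(selected)  # ['r1', 'r2', 'r4']
```

The side line r3 is left out because it is not an ancestor of the current revision; composing such set operations with "and", "or" and "not" is what makes the language expressive.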

Finally, another common and powerful feature of Mercurial is its templating system. Almost any command's output can be customized from the command line, without external tools, simplifying a lot of tasks that would otherwise require heavy scripting. Here are a few examples:

  • hg log -T"{node|short}\n" will list the short hashes of all revisions, separated by a newline
  • hg log -T. -r "author(rgomes) and date(2020)" | wc -c will output one . character for each revision Raphaël wrote in 2020; wc -c then counts these dots to give you the total
  • hg log -r 0 --template "{ifeq(branch, 'default', 'on the main branch', 'on branch {branch}')}\n" uses a conditional to test for the default branch.

Setup

Backwards-compatibility is one of the strengths of Mercurial, but also one of its weaknesses. A Mercurial 5.5 client will work against a 0.9 server just fine, and the vast majority of users will never encounter a breaking behavior change when upgrading.

However, (almost) never breaking backwards compatibility also means being stuck with the choices made as long ago as 2005, forever. This includes default configuration options and user-facing behavior, both of which can be very subjective (more on that below).

There exists a tweakdefaults config knob that enables recommended defaults for various commands, blessed by the maintainers.

Additionally, here is our take on a simple config to get you started with modern Mercurial. Other Mercurial users and contributors will have differing opinions and other suggestions, but this is what works for us.

[ui]
# Explained above
tweakdefaults = true
# Make merge conflict markers in files (they look like "<<<<<<<") more verbose
mergemarkers = detailed
# Uses the internal non-interactive simple merge algorithm for merging
# files. It will fail if there are any conflicts and leave markers in the
# partially merged file. Markers will have three sections, one from each
# side of the merge and one for the base content.
merge = :merge3

[commands]
# Require the user to pass a revision (or revset) to "push" instead of pushing
# all heads by default.
push.require-revs = true

[paths]
# Default to the current revision (`.`) when pushing to `default`. 
default:pushrev = .

[phases]
# "Server-side" configuration. If you or anyone else pushes to this Mercurial
# install, the changesets will not be published, contrary to the default.
publish = false

[extensions]
# ==== "Core" extensions ====
# Extensions that come pre-packaged with Mercurial:

# Adds the `rebase` command
rebase =
# Shelve is like Git's `stash`
shelve =
# Similar to Git's interactive `rebase`
histedit =
# Automatically finds where to amend the different bits of your working
# directory into the right changesets and does so interactively.
absorb =

# ==== External extensions ====

# These two are installed with `pip install hg-evolve`.
# Additional commands and features on top of Mercurial's "Changeset Evolution"
evolve =
# Adds support for topics and new commands like `topic`, `stack`...
topic =

[experimental]
# Force the user to specify the topic when committing.
# Use `topic-mode = random` to generate a random topic name
topic-mode = enforce
# Force Mercurial to use Changeset Evolution for all commands even when evolve
# is not found
evolution = all

Issues and ideas on how to fix them

Slowness

Mercurial suffers from a "slowness threshold": most commands will run slower on a small repository than a comparable command would in Git. This is in part due to Mercurial being written in Python, which has a slow startup. It leads users to believe that this slowness will get worse the larger the repository.

This is, however, largely untrue: handling large repositories and scaling well was one of Mercurial's primary design goals, which is why a lot of big companies use it.

This is not to say that the situation cannot be improved; in fact, a few efforts have either landed or started towards fixing that slowness threshold. Octobus contributes a lot of Rust code upstream that has already made great strides in certain areas and will continue to do so for the foreseeable future.

In 5.6, hg status on a Mozilla working copy (about 4 times larger than Linus Torvalds' Linux repository) runs about 3.2 times faster with the Rust extensions:

  • hg 5.6 (without Rust): 1310 ms
  • hg 5.6 (with Rust): 408 ms

In general, Mercurial has a steady history of performance improvements, such as clone bundles back in 2015 (https://gregoryszorc.com/blog/2015/10/22/cloning-improvements-in-mercurial-3.6/), more recent work on discovery (push/pull performance), and many algorithmic improvements all over the code base (like https://twitter.com/octobus_net/status/1325833268135124992).

Good/bad defaults and backwards compatibility

As explained above, Mercurial is backwards compatible. For tasks like automation and upgradability, this is very useful as nothing really ever breaks, and it also extends to the command-line interface and the configuration defaults.

That last part means that we cannot really change defaults, UI and behavior that are no longer good. New users run into the same issues over and over again, and Mercurial's reputation for being user-friendly suffers.

One of the ways we could approach this problem is by splitting Mercurial into the "core" hg and the "consumer" (or interface). This is a very common practice nowadays, and separates the concerns of the actual VCS from its UI.

More specifically, this could mean publishing a new hg executable that is still compatible at the repository and network levels, but has a different interface that is less surprising and more aligned with the modern use of a VCS.

This is not an actual goal of the Mercurial project itself, but has been discussed a few times, and the work Octobus is doing with the Rust rewrite of the core could be a good start.

Conclusion

We hope you enjoyed our presentation of modern Mercurial, warts and all, and that you will give it a try (again). Tell us what you think, don't be shy!