Presenting Mercurial Discovery Algorithm at LAGOS 2023

Pierre-Yves David
2023-09-25

Over its almost 20 years of existence, Mercurial has encountered many challenges and developed data structures and algorithms to overcome them. As Mercurial's feature set and the scale at which it is used grow, these challenges evolve; meeting them requires a deeper understanding of the problems we face and iterative improvements to the existing solutions.

Yet this knowledge and these algorithms tend to remain "hidden" within the Mercurial source code and community. To improve the situation, Octobus has been working with computer scientists. These researchers are happy to be provided with real-life challenges and algorithms, and in return can offer their expertise in formal analysis. This collaboration allows us to document the knowledge accumulated by the Mercurial community and offers a fresh perspective for further improving the existing solutions.

One of the visible results of this joint work is our contribution to the LAGOS 2023 Conference, where our paper, The Problem of Discovery in Version Control Systems, was presented. It describes the concept of "changeset discovery" that takes place during each exchange (push or pull) between peers, where the local client queries the server to figure out which parts of the history are missing and need to be exchanged and which parts are common and do not need to be sent over the network. The article describes the Mercurial approach to this problem and how it evolved over time, compares it to what Git does, and offers ideas on how to improve it further.
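To make the idea concrete, here is a minimal Python sketch of discovery on a toy history. It is not Mercurial's actual algorithm: the graph, the `remote_knows` callback and the naive sampling strategy are all assumptions made for illustration. It relies on only two properties of history: if the server knows a changeset, it also knows all of its ancestors, and if it does not know a changeset, it cannot know any of its descendants.

```python
def ancestors(parents, node):
    """Return `node` plus every changeset reachable through parent links."""
    seen, todo = set(), [node]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(parents[current])
    return seen


def descendants(parents, node):
    """Return `node` plus every local changeset that has it as an ancestor."""
    return {n for n in parents if node in ancestors(parents, n)}


def discover(parents, remote_knows, sample_size=2):
    """Classify local changesets as common or missing on the remote.

    `parents` maps each local changeset to the list of its parents.
    `remote_knows` is a hypothetical stand-in for one network round trip:
    it takes a list of changesets and returns one boolean per changeset.
    """
    undecided, common, missing = set(parents), set(), set()
    round_trips = 0
    while undecided:
        sample = sorted(undecided)[:sample_size]  # naive sampling (assumption)
        round_trips += 1
        for node, known in zip(sample, remote_knows(sample)):
            if known:
                # The remote has this changeset, hence all its ancestors.
                decided = ancestors(parents, node)
                common |= decided
            else:
                # The remote lacks it, hence all its descendants too.
                decided = descendants(parents, node)
                missing |= decided
            undecided -= decided
    return common, missing, round_trips


# Toy history: a <- b <- c <- d, plus a side branch b <- e.
local = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["b"]}
server = {"a", "b", "e"}
common, missing, round_trips = discover(
    local, lambda nodes: [n in server for n in nodes]
)
print(sorted(common), sorted(missing), round_trips)
```

Each call to `remote_knows` stands for a network round trip, so a real implementation has to choose its samples far more carefully than this sketch does to stay fast on large histories.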

New Evolve extension release: version 11.0.2

2023-07-05

We released a new version of the evolve extension: 11.0.2.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI, and upgrade is recommended.

This is a bugfix release. The most notable change is compatibility with the upcoming Mercurial 6.5.

Thanks to all the people involved:

Evolve: 11.0.2

  • compatibility with Mercurial 6.5

  • packaging: explicitly use python3 for running tests in debian/rules

Topic: 1.0.2

  • compatibility with Mercurial 6.5

New Evolve extension release: version 10.5.0

2022-02-23

We released a new version of the evolve extension: 10.5.0.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This is a feature release. The most notable change is a new command, fixup, which adds changes from the working directory to an arbitrary revision. An issue with histedit not preserving topics in certain situations was also fixed (see issue6550). Evolve is now compatible with the upcoming Mercurial 6.1, which includes obsolescence-aware head computation code (adapted from the evolve extension); with an up-to-date client and server, you should expect hg push to take much less time. Compatibility with Mercurial 4.7 was dropped in this release.

Thanks to all the people involved:

Evolve: 10.5.0

  • compatibility with Mercurial 6.1

  • evolve: handle cases when working directory parent has multiple successors

  • multiple commands: do not check for new divergence if divergence is allowed via configuration
  • fixup: a new experimental command to add working directory changes to a specified revision
  • pick: show abort message after pick is aborted for consistency

  • evolve, pullbundle: drop compatibility with Mercurial 4.7

Topic: 0.24.0

  • compatibility with Mercurial 6.1

  • topic: make histedit preserve topics when the first changeset in a stack is rewritten (issue6550)

  • drop compatibility with Mercurial 4.7

New Evolve extension release: version 10.4.1

2021-11-19

We released a new version of the evolve extension: 10.4.1.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This is a maintenance release. It introduces compatibility with the upcoming Mercurial 6.0 and a couple of documentation improvements.

Thanks to all the people involved:

Evolve: 10.4.1

  • compatibility with Mercurial 6.0

  • documentation: add a help section about making evolve skip content-divergence check with experimental.evolution.allowdivergence.

  • documentation: mention that pick uses the active topic if it's set

Topic: 0.23.1

  • compatibility with Mercurial 6.0

New Evolve extension release: version 10.4.0

2021-10-15

We released a new version of the evolve extension: 10.4.0.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This is a feature release. The most notable changes are: evolve will now produce the same result regardless of revision numbers when resolving content-divergence, and an annoying topic bug that sometimes made various history-rewriting commands fail with a KeyError was fixed (see issue6500). Compatibility with Mercurial 4.6 was also dropped in this release.

Thanks to all the people involved:

Evolve: 10.4.0

  • evolve: use a more stable criteria for picking p1 when solving content-divergence (most recent evolution will be used)
  • evolve: drop the deprecated --unstable, --divergent and --bumped flags, they were replaced by --orphan, --content-divergent and --phase-divergent respectively a long time ago
  • evolve: remove experimental.obshashrange.lru-size docs, that config option didn't do anything for a long time
  • evolve: use precheck function from Mercurial 5.9+ when available, mostly affects error messages and exit codes

  • next: add an --abort flag

  • evolve, topic, pullbundle: drop compatibility with Mercurial 4.6

Topic: 0.23.0

  • topic: don't cache .topic() of memctx instances, as that could produce KeyError: b'topic' during some rewrite operations (issue6500)
  • topic: drop old code for working with amends on ancient hg versions (~3.6)

New Evolve extension release: version 10.3.3

2021-08-13

We released a new version of the evolve extension: 10.3.3.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This is a bugfix release. The most notable changes are compatibility with Mercurial 5.9 and a fix for a bug that could lead to data loss when using evolve commands on a merge commit. See https://bz.mercurial-scm.org/show_bug.cgi?id=6416.

Thanks to all the people involved:

Evolve: 10.3.3

  • evolve: compatibility with Mercurial 5.9
  • fold: make sure to save commit messages in last-message.txt, also affects metaedit (issue6549)
  • touch/fold/metaedit/rewind: no longer lose changes from merge commits (issue6416). As a consequence (for technical reasons), when run with Mercurial 5.5 and earlier, these commands now require there to be no unresolved conflicts.

Topic: 0.22.3

  • topic: correctly update from public commits with a (now hidden) topic when hg update is called without any revision (issue6553)
  • topic: fix the help text to show how to disable publishing

New Evolve extension release: version 10.3.2

2021-05-28

We released a new version of the evolve extension: 10.3.2.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This is a bugfix release. The most notable changes are: changing the topic of a merge commit could previously lose some file changes, which is now fixed; and the experimental.topic.publish-bare-branch and experimental.auto-publish config options should now interact with each other correctly. To learn more about these experimental config options, try hg help -e evolve and hg help -e topic.

Thanks to all the people involved:

Evolve: 10.3.2

  • next: remove duplicated targets when updating from an unstable changeset
  • evolve: use "served" repo filter to guess what the server will publish

Topic: 0.22.2

  • topic: don't lose any file changes when changing topic of a merge commit
  • topic: announce ext-topics-publish capability in case of SSH and HTTP too

New Evolve extension release: version 10.3.1

2021-04-25

We released a new version of the evolve extension: 10.3.1.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This is a bugfix release that addresses various issues and adds compatibility with the upcoming Mercurial 5.8. Check the changelog for details. The most notable changes are: cache reuse on different systems (such as using 64-bit and 32-bit Python to push/pull the same repo) should now be safe, and hg next now handles unstable changesets with topics more correctly.

Thanks to all the people involved:

Evolve: 10.3.1

  • cache: fix corruption issue when mixing 32-bit and 64-bit environments

  • next: unstable changesets with a different topic are no longer targets for hg next as long as it's invoked without --no-topic flag

  • next: when some potential targets are unstable, ask user which changeset they want to update to (only mixing stable and unstable when --evolve flag is given, which is the default)

  • packaging: default to using Python 3 in Makefile

Topic: 0.22.1

  • compatibility with Mercurial 5.8

New Evolve extension release: version 10.3.0

2021-03-12

We released a new version of the evolve extension: 10.3.0.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This is a feature release that comes with a variety of improvements and bugfixes. Check the changelog for details. The most notable changes are: improved rewind logic in cases involving folds, an experimental ability to perform evolution in-memory (only on Mercurial 5.6 and newer), improved content-divergence resolution logic in cases involving parent changes.

More details on in-memory evolve can be found in hg help -e evolve.

Thanks to all the people involved:

Evolve: 10.3.0

  • doc: document stack as a substitute for MQ's qseries
  • doc: document revsets provided by evolve extension

  • evolve: add a experimental.evolution.in-memory config for running evolve in memory (hg >= 5.6)

  • evolve: improve content-divergence resolution that involves parent changes
  • evolve: preserve wdir parent when using hg evolve --stop

  • obslog: clarify the command name in the help

  • pdiff, pstatus: drop some irrelevant command flags inherited from hg diff and hg status respectively

  • rewind: detect and abort on cases when we rewind to changesets that are predecessors / successors of each other

  • rewind: when user gives only some parts of a fold, include the other parts as well, or abort if they are missing from local repo

Topic: 0.22.0

  • doc: change topic phrase 'disappear' to 'fade out'

New Evolve extension release: version 10.2.0

2021-02-01

We released a new version of the evolve extension: 10.2.0.post1.

(There was an issue with the 10.2.0 release, so we made a 10.2.0.post1)

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This version introduces support for Mercurial 5.7 and comes with a variety of improvements and bugfixes. Check the changelog for details. The most notable changes are: improved logic to decide the resolution parent for content-divergence resolution in case only one of the changesets was moved from its original position, and better handling of topics when rebasing.

Thanks to all the people involved:

Evolve: 10.2.0

  • compatibility with Mercurial 5.7

  • doc: update the MQ To Evolve guide and fix build warning for index.rst

  • evolve: improve resolution of some case of parent divergence

  • evolve: respect command-templates.oneline-summary if configured
  • evolve: remove spurious "working directory is now at ..." messages
  • evolve: various documentation improvements

  • packaging: default to building docs on Python 3

  • strip: remove experimental.prunestrip option

Topic: 0.21.0

  • performance: speed up various operations using an in-memory cache for topic

  • rebase: prevent in-memory rebase from silently dropping topic (by disabling the feature)

  • topic: rework how ctx.branch() is wrapped

  • topic: look for topic heads only when necessary, this fixes the output of e.g. hg heads when topics are in play

Modern Mercurial

Antoine Cezar, Raphaël Gomès
2020-11-26

General perception

Mercurial draws a wide range of opinions from its users, both former and current, and even from those who have not tried it yet. Such opinions range from "it's super easy, way simpler to use than Git" to "it's slow and the UX sucks", along with "Mercurial? Isn't it dead?" and "It's not on GitHub" and the (usually) quieter "I like the features" and "it scales very well".

Like with most technical discussions, whether any opinion is "right" really depends on who you ask.

In the case of 15-year-old software like Mercurial, when you asked a particular question matters, as things may have changed since you last made up your mind. Your workflow might not be well supported, the tools you need might not exist... you might also just not like it.

Our goal with this article is to re-state some of the evolutions that Mercurial has seen over the past few years and to review some of its strong points and weaknesses.

Hopefully, you will come out of this read with more up-to-date information to help you make informed decisions about the tools you're using.

Current state of affairs

Phases

Phases were introduced to Mercurial in version 2.1 (2012) and have become one of its central workflow features.

Every changeset (or "commit") is in one of three phases: public, draft or secret.

By default, when you commit something, it's in the draft phase; that means that you can change it, move it in history and even delete it using any of the many history-rewriting commands in Mercurial (more on that later). The public phase is meant to make changesets immutable: once they're public, you cannot change anything about them, and all rewriting commands will abort when used on a public changeset.

This is useful as a project management tool, much like you would use protected branches in forges such as GitLab, but built into your VCS and more granular. You communicate to all other contributors that these changes can be relied on for future work, your CI will not fail because your hash does not exist anymore, etc. This additional protection is preventive and saves you time in the long run.

The secret phase is useful for committing changes that you do not want to share, such as local config files.
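The rules described above can be summarized in a few lines of Python. This is only an illustrative model of the behavior, not Mercurial's implementation; the `Abort` exception and the helper names `check_rewrite` and `is_exchanged` are hypothetical.

```python
from enum import IntEnum


class Phase(IntEnum):
    PUBLIC = 0  # immutable, shared
    DRAFT = 1   # mutable, can be shared (the default for new commits)
    SECRET = 2  # mutable, never shared


class Abort(Exception):
    """Raised when an operation is refused (hypothetical name)."""


def check_rewrite(phase):
    """Refuse to rewrite immutable (public) changesets."""
    if phase == Phase.PUBLIC:
        raise Abort("cannot rewrite public changesets")


def is_exchanged(phase):
    """Secret changesets stay local; public and draft ones can be shared."""
    return phase != Phase.SECRET


check_rewrite(Phase.DRAFT)  # allowed, returns silently
try:
    check_rewrite(Phase.PUBLIC)
    refused = False
except Abort:
    refused = True
print(refused, is_exchanged(Phase.DRAFT), is_exchanged(Phase.SECRET))
```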

Distributed history rewriting

Because Mercurial is a distributed tool, whenever you rewrite history (change the contents, move changesets around, etc.), you run the risk of conflicting with or erasing the work of other people.

Users of other DVCS usually come up with rules (be they enforced by humans or through an ad-hoc permissions system) to prevent the pitfalls of concurrent editing.

Mercurial takes a different approach that, to our knowledge, is unique among DVCS: distributed history rewriting should not only be possible, but safe and with very little friction.

Enter "Changeset Evolution": Mercurial keeps records of every operation you've made on your history and shares them along with the standard history. This enables automated conflict resolution when sharing history, turning a usually very tedious task into a simple one-command fix. The core concept was introduced in the evolve extension and subsequently integrated in Mercurial itself. Some commands and experimental features still reside in evolve where they can be iterated on and changed faster according to user feedback.
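The recorded operations can be pictured as a list of "this changeset was replaced by that one" records. The Python sketch below is a deliberately simplified model that assumes each rewrite produces exactly one successor (real histories also contain splits, folds and divergence); the names `markers` and `latest_successor` are hypothetical and do not come from Mercurial's code.

```python
def latest_successor(markers, node):
    """Follow rewrite records from `node` to its most recent version.

    `markers` is a list of (predecessor, successor) pairs, one per
    rewrite operation. Assumes at most one successor per changeset.
    """
    successor_of = dict(markers)
    seen = {node}
    while node in successor_of:
        node = successor_of[node]
        if node in seen:  # guard against cycles in the toy data
            raise ValueError("cycle in rewrite records")
        seen.add(node)
    return node


# Alice amends A into A', then rebases A' into A''.
markers = [("A", "A'"), ("A'", "A''")]

# Bob committed B on top of the original A. Once he receives the
# records, a tool can work out that B should be moved onto A''.
print(latest_successor(markers, "A"))
print(latest_successor(markers, "B"))  # never rewritten, returned as-is
```

Because the records travel with the history, every peer can reach the same conclusion without asking the person who did the rewrite.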

Existing commands and extensions, such as the standard rebase, histedit (similar to Git's interactive rebase) and absorb, were ported years ago to use the new mechanism when evolve is detected. absorb is a tool that automatically finds where to amend the different bits of your working directory into the right changesets and does so interactively.

You can run hg help evolve --extension to get more precise info and a list of all commands (scroll down for how to install it).

Feature branches

Mercurial has had the concept of branches since its very inception. These branches differ from what Git calls branches: they are not pointers to a changeset, but instead refer to a set of changesets that may grow over time. They are often used to separate different long-term development efforts. In the Mercurial repository itself, you have the "default" and the "stable" branches, the latter corresponding to the latest stable release and any other bugfixes since that release. A changeset always belongs to a named branch, "default" being the... default.

The Mercurial community struggled to define a nice way to handle 'topic' branches (sometimes also called 'feature' branches), especially when it comes to sharing them with other people mainly for code review or collaborating.

topic is an extension created in 2015 that addresses that problem in a way that integrates nicely with Changeset Evolution and does not get in the way of its users. While still experimental, topics are used by a lot of people from different backgrounds and in different companies, and the overall feedback is very positive.

Topics differ from Git branches in that each changeset carries the topic information, as opposed to just the tip of the "branch". The user can clearly differentiate multiple topics, switch from one to the other, manipulate history within the current topic, etc.

Finally, topics are temporary and fade out once the changesets are published.
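The difference in data model can be sketched in Python as follows. This is a toy illustration only: the dictionaries and the `topic_members` helper are hypothetical and say nothing about how either tool stores its data.

```python
# Git-style branch: a single named pointer to the tip changeset.
git_branches = {"my-feature": "c3"}

# Topic: every changeset records the topic it belongs to.
changesets = [
    {"id": "c1", "phase": "public", "topic": ""},
    {"id": "c2", "phase": "draft", "topic": "my-feature"},
    {"id": "c3", "phase": "draft", "topic": "my-feature"},
]


def topic_members(changesets, name):
    """List the changesets of a topic; published ones no longer count."""
    return [
        c["id"]
        for c in changesets
        if c["topic"] == name and c["phase"] != "public"
    ]


before = topic_members(changesets, "my-feature")
for c in changesets:  # publishing the whole stack
    c["phase"] = "public"
after = topic_members(changesets, "my-feature")
print(before, after)
```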

This is the workflow that Heptapod chose when adapting GitLab to Mercurial.

Other powerful features

Mercurial supports a functional language for selecting a set of revisions. Expressions in this language are called revsets; most commands have a --rev argument. Here are a few examples:

  • hg diff -r 1.0::. will show the diff for all revisions from the 1.0 tag up to the current revision
  • hg rebase -r my-topic -d "branch(default) and public()" will rebase all changesets in my-topic onto the public tip of the default branch.
  • hg log -r "(keyword(bug) or keyword(issue)) and not ancestors(tag())" will list changesets mentioning "bug" or "issue" that are not in a tagged release.

This allows you to build complex queries that would otherwise require writing a script or hoping that developers thought to include a certain flag on the specific command you're using.
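One way to think about revsets is as ordinary set algebra over revisions. The Python sketch below evaluates the last example on a made-up four-revision history; the `keyword`, `tag` and `ancestors` functions are much-simplified stand-ins written for this illustration, not Mercurial's implementation.

```python
# Toy history: revision number -> (parents, description, tags)
history = {
    0: ([], "initial import", []),
    1: ([0], "fix bug in parser", ["1.0"]),
    2: ([1], "add feature", []),
    3: ([2], "fix issue with merge", []),
}
ALL = set(history)


def keyword(word):
    """Revisions whose description contains `word` (simplified)."""
    return {r for r, (_, desc, _) in history.items() if word in desc}


def tag():
    """Revisions that carry at least one tag."""
    return {r for r, (_, _, tags) in history.items() if tags}


def ancestors(revs):
    """The given revisions plus everything reachable through parents."""
    seen, todo = set(), list(revs)
    while todo:
        rev = todo.pop()
        if rev not in seen:
            seen.add(rev)
            todo.extend(history[rev][0])
    return seen


# (keyword(bug) or keyword(issue)) and not ancestors(tag())
result = (keyword("bug") | keyword("issue")) & (ALL - ancestors(tag()))
print(sorted(result))  # [3]
```

Revision 1 mentions "bug" but is part of the tagged 1.0 release, so only revision 3 remains.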

Finally, another common and powerful feature of Mercurial is its templating system. Almost any command's output can be customized from the command line, without external tools, simplifying a lot of tasks that would otherwise require heavy scripting. Here are a few examples:

  • hg log -T"{node|short}\n" will list the short hashes of all revisions, separated by a newline
  • hg log -T. -r "author(rgomes) and date(2020)" | wc -c will output one . character for each revision Raphaël wrote in 2020; wc -c then counts these dots to give you the total
  • hg log -r 0 --template "{ifeq(branch, 'default', 'on the main branch', 'on branch {branch}')}\n" uses a conditional to test for the default branch.

Setup

Backwards-compatibility is one of the strengths of Mercurial, but also one of its weaknesses. A Mercurial 5.5 client will work against a 0.9 server just fine, and the vast majority of users will never encounter a breaking behavior change when upgrading.

However, (almost) never breaking backwards compatibility also means being stuck with the choices made as long ago as 2005, forever. This includes default configuration options and user-facing behavior, both of which can be very subjective (more on that below).

There exists a tweakdefaults config knob that enables recommended defaults for various commands, blessed by the maintainers.

Additionally, here is our take on a simple config to get you started with modern Mercurial. Other Mercurial users and contributors will have differing opinions and other suggestions, but this is what works for us.

[ui]
# Explained above
tweakdefaults = true
# Merge conflict markers in files (they look like "<<<<<<<") are more verbose
mergemarkers = detailed
# Uses the internal non-interactive simple merge algorithm for merging
# files. It will fail if there are any conflicts and leave markers in the
# partially merged file. Marker will have three sections, one from each
# side of the merge and one for the base content.
merge = :merge3

[commands]
# Require the user to pass a revision (or revset) to "push" instead of pushing
# all heads by default.
push.require-revs = true

[paths]
# Default to the current revision (`.`) when pushing to `default`. 
default:pushrev = .

[phases]
# "Server-side" configuration. If you or anyone else pushes to this Mercurial
# install, the changesets will not be published, contrary to the default.
publish = false

[extensions]
# ==== "Core" extensions ====
# Extensions that come pre-packaged with Mercurial:

# Adds the `rebase` command
rebase =
# Shelve is like Git's `stash`
shelve =
# Similar to Git's interactive `rebase`
histedit =
# Automatically finds where to amend the different bits of your working
# directory into the right changesets and does so interactively.
absorb =

# ==== External extensions ====

# These two are installed with `pip install hg-evolve`.
# Additional commands and features on top of Mercurial's "Changeset Evolution"
evolve =
# Adds support for topics and new commands like `topic`, `stack`...
topic =

[experimental]
# Force the user to specify the topic when committing.
# Use `topic-mode = random` to generate a random topic name
topic-mode = enforce
# Force Mercurial to use Changeset Evolution for all commands even when evolve
# is not found
evolution = all

Issues and ideas on how to fix them

Slowness

Mercurial suffers from a "slowness threshold": most commands will run slower on a small repository than a comparable command would in Git. This is partly due to Mercurial being written in Python, which is slow to start up, and it leads users to believe that the slowness will get worse as the repository grows.

This is, however, largely untrue: the vast majority of Mercurial was designed with handling large repositories and scaling well as a primary goal, which is why a lot of big companies use it.

This is not to say that the situation cannot be improved; in fact, a few efforts have either landed or started towards fixing that slowness threshold. Octobus contributes a lot of Rust code upstream that has already made great strides in certain areas and will continue to do so for the foreseeable future.

In 5.6, here are the timings for hg status on a Mozilla working copy (about 4 times larger than Torvalds' Linux):

  • hg 5.6 (without Rust): 1310ms
  • hg 5.6 (with Rust): 408ms

In general, Mercurial has had a steady story of improving performance, like clone bundles back in 2015 (https://gregoryszorc.com/blog/2015/10/22/cloning-improvements-in-mercurial-3.6/), more recently work around discovery (push/pull performance), and a lot of various algorithmic improvements all over the code base (like https://twitter.com/octobus_net/status/1325833268135124992).

Good/bad defaults and backwards compatibility

As explained above, Mercurial is backwards compatible. For tasks like automation and upgradability, this is very useful as nothing really ever breaks, and it also extends to the command-line interface and the configuration defaults.

That last part means that we cannot really change defaults, UI and behaviors that are no longer good. New users run into the same issues over and over again, and the reputation that Mercurial has had of being user-friendly suffers.

One of the ways we could approach this problem is by splitting Mercurial into the "core" hg and the "consumer" (or interface). This is a very common practice nowadays, and separates the concerns of the actual VCS from its UI.

More specifically, this could mean publishing a new hg executable that is still compatible at the repository and network levels, but has a different interface that is less surprising and more aligned with the modern use of a VCS.

This is not an actual goal of the Mercurial project itself, but has been discussed a few times, and the work Octobus is doing with the Rust rewrite of the core could be a good start.

Conclusion

We hope you enjoyed our presentation of Modern Mercurial, warts and all, and that you will give it a try (again). Tell us what you think, don't be shy!

New Evolve extension release: version 10.1.0

2020-10-31

We pushed a new release for the evolve extension: 10.1.0.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This version introduces support for Mercurial 5.6 and comes with a variety of improvements and bugfixes. Check the changelog for details. The most notable changes are: the rewind command now has a --dry-run flag that shows what would be done; it's now possible to address the whole stack (as defined by the topic extension) on newer Mercurial (5.4+) using the foo#stack operator (where foo is any revset, multiple revisions supported); and there's now an experimental config option allowing a merge with an ancestor in specific cases.

Thanks to all the people involved:

Evolve: 10.1.0

  • compatibility with Mercurial 5.6

  • numerous minor changes to packaging, Makefile, README moved to README.rst

  • evolve: various improvements to content-divergence resolution

  • evolve: fix various issues with --continue when solving content-divergence
  • evolve: specify the source of config override for server.bundle1=no
  • evolve: avoid leaving mergestate after instability resolution
  • evolve: while resolving conflicts, the evolved node will no longer be a dirstate parent (won't show up in hg parents and not as @ in hg log -G, but it will show up as % with hg >= 5.4)

  • metaedit: update bookmark location when applicable

  • rewind: add a --dry-run flag

  • rewind: properly record rewind of splits as folds

Topic: 0.20.0

  • stack: support foo#stack relation revset (hg-5.4+ only)
  • merge: add an experimental.topic.linear-merge option to allow oedipus merges in some cases

New Evolve extension release: version 10.0.2

2020-09-09

We pushed a new release for the evolve extension: 10.0.2.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This is a bugfix release that addresses issues in the split and uncommit commands from the evolve extension and in the revset logic of the topic extension. Check the changelog for details.

Thanks to all the people involved:

Evolve: 10.0.2

10.0.2 -- 2020-09-08

  • py3: use '%d' for formatting revision numbers in stable range cache warning (issue6390)

  • split: correctly handle discard action after previously splitting changes into more than one commit

  • uncommit: fix situation where added file would be left in a wrong state

Topic: 0.19.2

  • revset: when processing topic(REVSET), no longer return changesets without topic from REVSET

Announcing the Mercurial public Bitbucket archive

2020-08-05

Back in April of this year we announced a partnership with Software Heritage to archive all public Mercurial data on Bitbucket before they remove it permanently.

Now that the Bitbucket deadline is well behind us, we are happy to say that the public archive website we promised is complete and visible at the following URL: https://bitbucket-archive.softwareheritage.org/

You will find all of the data we were able to download from Bitbucket's API (with some help from their VCS team, thanks!) in a very simple static index.

Note: if you want to download the entire archive (it's about 6TB) or a large portion of it, please contact us or Software Heritage before writing your own crawlers and abusing the server; we will be happy to help.

New Evolve extension release: version 10.0.1

Anton Shestakov
2020-07-31

We pushed a new release for the evolve extension: 10.0.1.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This version introduces support for Mercurial 5.5 and comes with a variety of improvements and bugfixes. Check the changelog for details. The most notable change is: push --topic and outgoing --topic now work as expected when using a topic name that doesn't exist and "." as the topic.

Thanks to all the people involved:

Evolve: 10.0.1

10.0.1 -- 2020-07-31

  • compatibility with Mercurial 5.5

  • evolve: update the template keywords section in hg help -e evolve

  • obslog: make obslog --no-origin -f work with multiple successor sets

Topic: 0.19.1

  • compatibility with Mercurial 5.5

  • topic: hg push --topic does-not-exist now doesn't try to push unrelated changesets and aborts instead

  • topic: hg outgoing/push --topic . will use current topic

Not everything is UTF-8

Raphaël Gomès
2020-06-05

Over the past few weeks I've helped a new developer get started with both Mercurial and Rust, exposing them to somewhat niche subjects that they've had (understandably) little experience with.

One of them is the encoding (or lack thereof) in Mercurial and how it affects how we write code in both Python and Rust. As easy as it was to explain the issue to said developer, in the few instances of asking around for help on implementation details (mostly to get information about what had already been done and what I needed to do myself) I've noticed that not everyone I'd interacted with outside of our circle of VCS developers even understood the problem I was trying to solve.

Please note that I am not pointing fingers or accusing anyone of being disingenuous: just about everyone I talked to was very much trying to help me and to understand what it was that I wanted to solve in the first place. I usually don't have that much trouble explaining things to people in those situations, so I figured this warranted a full blog post.

The core issue

There Ain’t No Such Thing As Plain Text

This is a quote from Joel Spolsky, best known as the co-founder and (until recently) CEO of Stack Overflow. It's from an article of his from 2003 called The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). Read that one first and then come back, because it covers a lot of the "general not-VCS-related" encoding stuff that serves as a basis for the rest of this post, and it is still relevant today.

In version control software like Mercurial, we have to make no assumptions about what the contents of tracked files are and their encoding. For all we know, file foo could be a binary file, a latin-1 file, or even a mixed encoding file: it is a very real and relevant need for a VCS to be able to track and manipulate data without assuming it to be text (of any encoding).

Take the following example:

$ hg init test-repo
$ cd test-repo
$ echo -n "Raphaël Gomès" > foo  # assuming UTF-8 default
$ hg commit -Am "UTF-8"
$ iconv -f UTF-8 -t WINDOWS-1252 foo > foo2
$ mv foo2 foo
$ hg commit -Am "WINDOWS-1252"

Here, we create a new empty repository, create the (UTF-8) foo file containing my name, commit it, then convert it from UTF-8 to WINDOWS-1252, then commit that.

Running HGPLAIN= hg export here (HGPLAIN= ensures you are not customizing output with a separate diff tool; export is like git show) will show the correct bytes in each "half" of the diff if your terminal encoding is set to UTF-8 or CP1252: no bytes are lost by Mercurial. Even without changing encodings in a commit, simply using an encoding other than UTF-8 like KOI-8 would be unusable if not for the diff algorithm being encoding-agnostic. Because the bytes are sent as-is by Mercurial, all the user has to do is have a terminal that has the right encoding, and everything will be fine: nowhere did the user need to provide encoding information.
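To see what changed between the two commits at the byte level, here is a small Python illustration (Python being the language Mercurial itself is written in). It only uses the standard library's codecs and is not how Mercurial computes diffs.

```python
name = "Raphaël Gomès"
as_utf8 = name.encode("utf-8")
as_cp1252 = name.encode("cp1252")

print(as_utf8)    # b'Rapha\xc3\xabl Gom\xc3\xa8s'
print(as_cp1252)  # b'Rapha\xebl Gom\xe8s'

# A byte-level comparison needs no encoding information at all.
changed = as_utf8 != as_cp1252

# Only the reader (here, whoever decodes) needs to know the encoding.
assert as_utf8.decode("utf-8") == as_cp1252.decode("cp1252") == name
```

The two accented characters take two bytes each in UTF-8 and one byte each in Windows-1252, so the file shrinks from 15 to 13 bytes while displaying the same text.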

But forget binary files for a minute, their diffs are usually useless compared to a hexdump and we could also use LFS for them, right? Couldn't users just convert the rest of their repositories to UTF-8 and be done with it? I think that every developer including myself would be much happier if they didn't have to consider multiple encodings and that text were UTF-8 everywhere... but the world is unfortunately more complicated than that.

Say you're designing a new VCS from scratch in Rust or, in my case, rewriting core parts of a VCS in Rust: which type do you use to manipulate file contents? If your answer was String, you've just disqualified any file that isn't UTF-8 from being tracked by your VCS at any point in the history. That means that anyone converting from Mercurial to your shiny new system will lose at least part of their history if not all of it: for example, you can't convert the nginx repo losslessly because early revisions used ISO/IEC 8859-5, not to mention any binary or mixed-encoding files (common in translation files). What type do you use to represent a file path? If your answer was String, you've made valid UNIX and Windows MBCS paths impossible to represent in your software. If your answer was PathBuf (or OsString), good guess, but it is also wrong for our use case: file paths tracked by Mercurial need to be abstracted away from the current OS, otherwise you open yourself up to normalization and cross-OS/cross-FS compatibility issues that stem from the distributed nature of Mercurial.
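To make the String problem concrete, here is a minimal sketch (not code from Mercurial). The store_contents function is a hypothetical stand-in for "whatever type the VCS uses for file contents", and the byte values are written out explicitly so the example does not depend on the encoding of the source file.

```rust
// Bytes of "Raphaël" in two encodings.
const NAME_UTF8: &[u8] = b"Rapha\xc3\xabl"; // 'ë' is 0xC3 0xAB in UTF-8
const NAME_CP1252: &[u8] = b"Rapha\xebl"; // 'ë' is 0xEB in Windows-1252

/// Hypothetical "tracked file" representation: contents are opaque bytes.
fn store_contents(data: &[u8]) -> Vec<u8> {
    data.to_vec()
}

fn main() {
    // Both revisions are stored losslessly as bytes.
    assert_eq!(store_contents(NAME_UTF8), NAME_UTF8);
    assert_eq!(store_contents(NAME_CP1252), NAME_CP1252);

    // Only the UTF-8 revision can become a String at all.
    assert!(String::from_utf8(NAME_UTF8.to_vec()).is_ok());
    assert!(String::from_utf8(NAME_CP1252.to_vec()).is_err());

    // A lossy conversion "works", but silently changes the data:
    // the 0xEB byte is replaced by U+FFFD.
    let lossy = String::from_utf8_lossy(NAME_CP1252);
    assert_eq!(lossy, "Rapha\u{FFFD}l");
    assert_ne!(lossy.as_bytes(), NAME_CP1252);
}
```

With a String-based design, the second commit from the example repository above would either be rejected or stored with different bytes than the user committed.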

EDIT 2020-06-09: The unusual reality is: most of our output has to be mixed-encoding. As mentioned in https://www.mercurial-scm.org/wiki/EncodingStrategy, hg log --patch will contain internal strings in local encoding to mark fields, UTF-8 metadata, and file contents in an unknown encoding.

Whatever the user puts in, the user gets back. It is their responsibility to have a compatible codepage/terminal encoding.

An ecosystem issue

I will be using Rust as the reference language, but this applies to all programmers of all languages, from embedded to web developers. Most of the time you might not have to take encoding into account because you're interacting with only UTF-8, as you have for the past 10 years: if that's the case, I'm happy for you.

But if you're doing anything that may handle text (or data) of unknown origin, I urge you to ask yourself "should there be a bytes API?". Too many times I've stumbled across a library that provides interesting functionality that assumed everything to be UTF-8 when there was no real need for it.

I think part of the reason is that Rust is one of the few languages that actually handles string types correctly. String, OsString, CString all play a distinct role that is needed to properly represent strings: String is for UTF-8 data, OsString for strings in your OS's representation (that may not be UTF-8), and CString for compatibility with C. This last one could die in theory in a world where C didn't exist, but Free Pascal didn't win so here we are. Because Rust makes it easy to properly handle UTF-8 data through String, developers are empowered to... sometimes do the wrong thing: in my opinion this is absolutely not a flaw in Rust, but merely a side-effect of how misunderstood encoding issues are. The decision not to have types and APIs for bytestrings in the Rust stdlib was probably made for the same reason as many others: to keep it minimal.
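As a small illustration of those three roles, here is a sketch using only the standard library. The to_c_string helper is hypothetical and only exists to keep the example short.

```rust
use std::ffi::{CString, OsString};

/// Hypothetical helper: wrap raw bytes for a C API without decoding them.
/// Returns None if the bytes contain an interior NUL.
fn to_c_string(bytes: &[u8]) -> Option<CString> {
    CString::new(bytes.to_vec()).ok()
}

fn main() {
    // String: guaranteed to be valid UTF-8.
    let s = String::from("Rapha\u{eb}l");
    assert_eq!(s.chars().count(), 7);
    assert_eq!(s.len(), 8); // 'ë' takes two bytes in UTF-8

    // OsString: the OS's representation; viewing it as &str may fail,
    // which is why to_str() returns an Option.
    let os = OsString::from("foo.txt");
    assert_eq!(os.to_str(), Some("foo.txt"));

    // CString: NUL-terminated bytes with no encoding guarantee.
    let c = to_c_string(b"Rapha\xebl").unwrap(); // Windows-1252 bytes
    assert_eq!(c.as_bytes(), b"Rapha\xebl");
    assert!(c.to_str().is_err()); // not valid UTF-8
    assert!(to_c_string(b"a\0b").is_none()); // interior NUL is rejected
}
```

None of these three is a general-purpose "bytes that are probably text" type, which is the gap discussed here.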

Even well-known, widely used crates like regex or clap made by programmers that definitely understand the underlying issue did not have a non-String interface (regex#85, clap#262) until a few versions in because an issue was opened. There probably are other reasons why this feature wasn't implemented, but to me this underlines the lack of attention that this problem receives.

Please, look at your crates/packages/gems/whathaveyou and try to think for a minute if that UTF-8/Unicode restriction is really necessary.

Bytestring formatting

Because "There Ain’t No Such Thing As Plain Text", we do a lot of bytestring manipulation in Mercurial; in Python that would be b"this is a bytestring!", and in Rust you would use a Vec<u8> or maybe the bstr crate.

The initial question I had for the people I mentioned at the beginning of the article was as follows: is there a crate that allows me to do bytestring formatting like we use the format!() macro for String formatting? I wasn't able to find anything online in a good hour or so of searching, but I might have missed something. A particular person I interacted with was adamant that "implementing Display is enough", but Display uses std::fmt::Formatter, which only handles UTF-8 string data. So all the format!-related macros in the Rust stdlib understandably use String, because Rust is voting for a UTF-8 future, which I am all for.
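To show why Display does not help here, consider this sketch. Author is a hypothetical type holding bytes in an unknown encoding; the only way to push it through a Formatter is to decode it first, and that is where the data gets damaged.

```rust
use std::fmt;

/// Hypothetical type: raw bytes in an unknown encoding.
struct Author(Vec<u8>);

impl fmt::Display for Author {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Formatter only accepts &str, so the bytes must be decoded first.
        // Lossy decoding replaces every invalid sequence with U+FFFD.
        f.write_str(&String::from_utf8_lossy(&self.0))
    }
}

fn main() {
    let author = Author(b"Rapha\xebl".to_vec()); // Windows-1252 bytes
    let formatted = format!("{}", author);

    // The output is valid UTF-8, but it is no longer what the user put in.
    assert_eq!(formatted, "Rapha\u{FFFD}l");
    assert_ne!(formatted.as_bytes(), &author.0[..]);
}
```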

That, however, does not help me solve my issue. Even Python, which had bytestring formatting in Python 2, removed it in Python 3.0 and only re-introduced it in 3.5 after it was made clear that it is a very real need, albeit a somewhat niche one.

I'm planning on writing a macro for that very purpose soon, probably called format_bytes!, and putting it in a crate.

EDIT 2020-06-09: I should have put a more thorough explanation here, so here goes:

I don't intend the format_bytes! macro to have all the bells and whistles of the original one, but to use it more as a mixed-encoding concatenation helper, so as not to have to write multiple writes to a Vec<u8> all the time, with maybe a few formatting tricks. format_bytes!(b"ascii text {} other ascii text", &vec_of_bytes) could very well end up being the syntax. This assumes an ASCII bytestring as the format string (as is the policy in Mercurial, see EncodingStrategy), and any slice of bytes as argument(s).
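The idea can be sketched with a plain function, under a few assumptions: format_bytes_sketch is a hypothetical name, it is a function rather than the macro described above, and it only understands a bare {} placeholder. It is meant to show the concatenation behavior, not the eventual API.

```rust
/// Hypothetical helper, not the planned macro: replaces each `{}` in an
/// ASCII format bytestring with the next argument, copied verbatim.
fn format_bytes_sketch(format: &[u8], args: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(format.len());
    let mut args = args.iter();
    let mut i = 0;
    while i < format.len() {
        if format[i..].starts_with(b"{}") {
            // Arguments are arbitrary bytes: no decoding, no validation.
            let arg = args.next().expect("not enough arguments");
            out.extend_from_slice(arg);
            i += 2;
        } else {
            out.push(format[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    let name: &[u8] = b"Rapha\xebl"; // Windows-1252, not valid UTF-8
    let email: &[u8] = b"r@example.com";
    let line = format_bytes_sketch(b"user: {} <{}>\n", &[name, email]);

    // The non-UTF-8 byte comes out exactly as it went in.
    assert_eq!(line, b"user: Rapha\xebl <r@example.com>\n".to_vec());
}
```

A real macro could check the number of arguments at compile time, which this runtime sketch cannot do.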

If anyone already has similar functionality somewhere, I'd be happy to not do this work, otherwise I'll keep you posted.

Archiving Bitbucket Content: Status Report

Pierre-Yves David
2020-05-26

We are continuing our effort to archive all of Bitbucket's Mercurial content before Atlassian deletes it all, and we are making great progress: we have already retrieved all existing public content. So far we have identified a total of 244,609 public projects using Mercurial.

Currently archived

In these 244,609 projects we identified:

  • 244,569 source code repositories,
  • 81,154 wiki repositories,
  • 213,345 issues in their bug trackers (with 603,334 comments and 4,445 images),
  • 86,372 pull requests and their comments and 1,320 images,
  • 45,977 project attachments (for about 750GB of data).

Looking at the 5.5TB of repositories more closely gives interesting data:

  • 27,662 (8.5%) repositories went missing, either deleted or made private, since we observed them in February,
  • 98 (0.03%) repositories are inaccessible (Bitbucket itself crashing trying to access them),
  • about a couple hundred repositories are still receiving pushes, less than 1 week away from the initial deadline.

What is the plan for this content?

Ultimately, we plan to offer a set of tarballs for each project. People will be able to download:

  • the main Mercurial repositories,
  • the wiki repositories,
  • the set of metadata associated with a project (as JSON),
  • the individual project attachments.

However… we will only do this once the set of data is frozen. So for the coming months, our server will tirelessly gather all new content that gets pushed to Bitbucket. Starting July 1st, when the Mercurial content stops getting updated, we will start building and serving tarballs through the Software Heritage infrastructure.

Anything else?

Yes! In addition to offering tarballs, we are also planning to import all this content inside the Software Heritage database. This will provide us with an excellent corpus to make Software Heritage's Mercurial importer more robust. This effort can start right now, so stay tuned for more news soon.

New Evolve extension release: version 10.0.0

Pierre-Yves David
2020-05-09

We pushed a new release for the evolve extension: 10.0.0.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This version introduces support for Mercurial 5.4 and comes with a variety of improvements and bugfixes. Check the changelog for details. The most notable change is a new obslog flag, --origin, that shows predecessors instead of successors and is enabled by default in this release (use --no-origin for the old behavior). The --no-origin mode might be dropped in the future. In addition, this release comes with a new template keyword, {obsorigin}, and improved divergence resolution.

Thanks to all the people involved:

Evolve: 10.0.0

  • compat: clean up old compatibility code
  • compat: compatibility with Mercurial 5.4
  • evolve: add {obsorigin}, a template keyword that works similarly to {obsfate}, but shows predecessors of a changeset
  • evolve: fix permissions of new cache files using SQLite
  • evolve: always create commit when resolving divergence
  • evolve: handle relocation during divergence resolution producing no changes
  • evolve: provide cache to successorssets() in more cases
  • obslog: make --all and --filternonlocal work properly with --no-graph
  • obslog: add --origin flag to show predecessors instead of successors
  • obslog: make --origin flag the default
  • stablerangecache: sanity check subranges

Topic 0.19.0

  • auto-publish: issue the capabilities in all cases
  • topic: provide cache to successorssets() in one more case

BitBucket, Alternative and Archival

Georges Racinet
2020-04-23

After BitBucket's decision to discontinue support for the Mercurial version control system, and to remove Mercurial-based repositories by June 30, 2020, we decided to step up and make sure that all these precious projects would not be lost.

We have created a new hosting platform for active projects to move to, and we are archiving all public projects, in order to include them in the Software Heritage Archive, the universal archive of source code.

We are delighted to announce today that the intense work we have done over the past months is now bearing fruit.

Heptapod: a new home for Mercurial-based projects

We're excited to open up to the world, in a joint effort with Clever Cloud, heptapod.host: the ultimate Mercurial BitBucket replacement. Heptapod.host is a brand new code hosting platform where active projects using Mercurial can now migrate seamlessly.

Heptapod sports a powerful and intuitive BitBucket import feature that allows you to recover not only the repository itself, but also issues and pull requests. Not only do they become Heptapod (GitLab) issues and merge requests, but they also preserve their original numbering.

This feature will help reduce the disruption for lots of projects: it is indeed common that commit messages reference issues or pull requests by number, be it in an explicit form (#123) or as a link (https://bitbucket.org/my/repo/issues/123). How many of us developers have been frustrated trying to understand a change done a while ago because such references have become useless in the meantime? How many emails or spreadsheets have turned into useless noise because of that? After an import to Heptapod, these references will still be usable. Learn more in our Heptapod.host presentation at FOSDEM.

In addition to the commercial instance, Heptapod.host, we've also launched a free instance for open source projects at foss.heptapod.net. If you are an open-source project, please submit a hosting request.

Archiving data and metadata of 250 000 projects

Over 250 000 Mercurial repositories are at risk of disappearing in the next few weeks, and we could not let this happen. In collaboration with Software Heritage, we've started a project to archive hard copies of all public project data and metadata. This means the source code repositories, of course, but also the wikis, the issues, the pull requests, attachments, associated comments and inline images.

The project is public and available on heptapod.host. You can track its progress on the Software Heritage mailing list. After June 30th, once BitBucket deletes the original repositories, we will make the archived results available to the public.

Importing source code in the Software Heritage database

For the last few months, we have been working with Software Heritage to improve its capability to import Mercurial repositories.

Thanks to a European grant managed by NLnet, we are developing specialized connectors that allow for efficiently archiving software developed using the Mercurial version control system. This effort will ensure that no public Mercurial repository will be lost.

Hurry up!

Would you like to import your BitBucket Mercurial projects? Register on the Clever Cloud platform, create a Clever Cloud organization, and sign in to the Heptapod instance.

Contact us for more information!

New Evolve extension release: version 9.3.1

Pierre-Yves David
2020-04-08

We pushed a new release for the evolve extension: 9.3.1.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

The release contains improvements and bugfixes on multiple aspects. Check the changelog for details. The most notable changes are fixes and improvements to the discovery and exchange algorithms (and pushing in general) and improved support for Mercurial 5.3.

Thanks to all the people involved:

Evolve: 9.3.1

  • compat: make touch-noise and rewind-hash extra field be bytes

  • obsexchange: avoid sending too large request to http server

  • obsdiscovery: server no longer aborts with a 500 error if client sends a request without obscommon
  • obsdiscovery: avoid considering locally hidden changeset
  • single-heads: ignore obsolete section when enforcing one head per branch

  • evolve: improved behavior when evolving above the result of a split

  • evolve: checking for new head on push is no longer confused by mixed branches (or topics)

Topic 0.18.1

  • topic: fix auto-publish=abort with server that auto-publishes bare branches

Augmenting Software Heritage archiving capabilities

2020-03-24

Paris, Tuesday, March 24th 2020 — Two ambitious French companies, Octobus and Tweag, will contribute new key open source components to Software Heritage, a non profit initiative started by Inria, in collaboration with UNESCO.

Software Heritage’s mission is to collect, preserve and share all software that is publicly available in source code form, which is an important part of humankind’s heritage. “We are delighted to welcome Octobus and Tweag, with support from an European grant managed by NLNet, as contributors to Software Heritage’s long term mission”, said Roberto Di Cosmo, director of Software Heritage, “Building the universal archive of software source code is a humbling undertaking, and the participation of leading experts to our development effort is of paramount importance to succeed”.

Octobus will develop specialised connectors that allow for efficiently archiving software developed using the Mercurial version control system. “We are happy to put ten years of Mercurial expertise at the service of a great mission. We will improve Software Heritage's capability to import Mercurial repositories, and the experience gained in this process will allow us, Mercurial developers, to better understand our user base”, said Pierre-Yves David, CEO of Octobus. “Working with Software Heritage to refine their data model is a great opportunity to learn more about the other systems and to find new ways to make Mercurial an even better version control system”.

Tweag will develop the components needed to ensure that the source code used to build packages using the Nix functional package manager is systematically archived in the Software Heritage archive. “Reproducible components are the basis of collaboration and progress in software engineering. It is this belief that made Tweag a pioneer of reproducible software systems at industry scale and a fervent supporter of Nix, a tool that allows putting such systems into place”, said Mathieu Boespflug, CEO of Tweag. “Collaborating with Software Heritage to combine Nix with a long-term source code archive is really the natural step forward”.

“The Next Generation Internet Initiative is a significant R&D effort backed by the European Commission to make the internet more trustworthy, resilient and sustainable. The internet is an amazing global technical and social resource, but it was built for the short term - like a house of cards”, says Michiel Leenaars, director of Strategy at NLnet Foundation and project lead for NGI Zero. “Everything that we can click to, download or link to today, may be gone tomorrow. Software Heritage secures the technology commons of today for the long future ahead of us, so how could we not support such a forward-looking endeavour given our mission to reimagine the internet for the next millennium and beyond?”

"The European Commission appreciates the forward looking nature of Software Heritage, and we are very happy to contribute to the long term sustainability of the technology ecosystem", states Oliver Bringer, Head of the Next Generation Internet Unit at the European Commission. "This is the first R&D programme that takes such a step, and it is a natural fit for the NGI initiative. We need solid technological foundations to build the internet of tomorrow and an open repository of publicly available software source code is clearly one of these foundations."

About Octobus

Octobus is a company focussing on commercial support for the Mercurial source control system. Their work ranges from building a hosting solution (https://heptapod.net) to ad-hoc development for companies who need performance boosts, custom features or workflow consulting. Octobus provides a significant part of current Mercurial development. In particular, for a couple of years, Octobus drove the effort to use the Rust programming language in Mercurial, improving both the performance and the robustness of the codebase. You can learn more at https://octobus.net/.

About Tweag

Tweag is a software innovation lab that helps deep tech startups quickly scale their engineering performance and execute on high-risk, high-reward projects with confidence. Tweag’s team of engineers are behind today’s boldest innovations in machine learning, distributed computing and biotech. Applying mathematics, computer science and the methods of open source to software engineering, Tweag stretches what’s possible for clients. Learn more at tweag.io.

About NLnet Foundation

NLnet Foundation is an independent organisation whose means come from donations and legacies. The history of NLnet goes back to 1982 when Teus Hagen announced the European Unix Network (EUnet) which became the first public wide area network in Europe and the place where internet was introduced to Europe. NLnet also pioneered the world's first dial-in and ISDN infrastructure with full country coverage. In 1997 all commercial activities were sold to UUnet (now Verizon) and since that time NLnet has focused on supporting the open internet and the privacy and security of internet users. The articles of association for the NLnet Foundation state: "to promote the exchange of electronic information and all that is related or beneficial to that purpose". NLnet's core business is to support independent organizations and people that contribute to an open information society and to a safe, secure and open internet. NLnet currently spearheads NGI Zero, a unique consortium that funds privacy and trust enhancing technologies and improves search and discovery as part of the Next Generation Internet initiative of the European Commission. Please visit https://nlnet.nl

About Software Heritage

Software Heritage is a non-profit initiative whose stated mission is to collect, preserve and share all software that is publicly available in source code form. Started by Inria, in collaboration with UNESCO, Software Heritage is building the largest public archive of software source code, for the benefit of society as a whole. For more information, please visit https://www.softwareheritage.org.

New Evolve extension release: version 9.3.0

Pierre-Yves David
2020-03-04

We pushed a new release for the evolve extension: 9.3.0.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

The release contains improvements and bugfixes on multiple aspects. Check the changelog for details. The most notable changes are: improved support for Mercurial 5.3, dropped support for Mercurial 4.5, and a new configuration in topic to hide changesets with a topic set from clients without topic support.

Thanks to all the people involved:

Evolve: 9.3.0

9.3.0 -- 2020-03-04

  • compat: compatibility with Mercurial 5.3
  • compat: drop compatibility with Mercurial 4.5
  • compat: cleanup old compatibility code for Mercurial < 4.5

  • evolve: extensive cleanup of functions, template keywords and compatibility code related to obsfate and successorssets

  • evolve: add content divergence checking to the standard pre-rewrite check
  • evolve: improve the message associated with content divergence
  • evolve: correctly handle --continue and --stop when relocating content-divergent changesets

  • exchange: dropped more bundle-1 related dead code

  • help: categorizing evolve and topic commands

  • obslog: make templatable (more change coming in the next version)

  • obslog: show folds and use more specific verbs when possible

Topic 0.18.0

  • topic: add an experimental.topic.server-gate-topic-changesets config

Talks we gave at FOSDEM 2020

2020-02-05

Octobus was at FOSDEM 2020, which was, as usual, a success for the free and open-source community, and tons of fun.

Georges gave a lightning talk about Heptapod, our friendly fork of GitLab with support for Mercurial. The video should become available in the coming days at this link. In short, Heptapod is more alive than ever, with a free instance for FOSS projects that qualify and a commercial option coming very soon. Fear not, Heptapod is FOSS itself, and you can also run it self-hosted. See the announcements for more information: https://heptapod.net/category/announcements.html

Georges getting ready for his lightning talk:

Raphaël gave two talks on the subject of bridging Python and Rust together, one in the Python devroom and one in the Rust devroom. Interesting progress is being made in Rust/Python FFI within rust-cpython. Mercurial is always getting faster, and will be even more so in the near future.

Audience at Raphaël's talk in the Python devroom:

A few minutes before Raphaël's talk:

We hope you enjoyed your FOSDEM! If you could not come, we hope to see you there next year. Who knows, we might have more good news for Mercurial and FOSS in general. ;)

New Evolve extension release: version 9.2.2

Pierre-Yves David
2020-01-31

We pushed a new release for the evolve extension: 9.2.2.

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This version introduces support for Mercurial 5.3, fixes installation issues on Python 3, improves various aspects of the documentation and comes with a variety of other bugfixes.

Thanks to all the people involved:

Evolve: 9.2.2

Bug fixes:

  • debian: allow to build with python 3
  • evolve: fix content-divergence resolution when p1 is null (issue6201)
  • evolve: make sure divergence resolution doesn't undo changes (issue6203)
  • evolve: preserve date when resolving content-divergence (issue6202)
  • metaedit: don't change commit date by default (issue5994)
  • pick: don't create any successors when there were no changes (issue6093)
  • py3: fix documentation generation
  • py3: fix setup.py --version
  • py3: fix some exception formatting

Improvements:

  • amend: abort when both --patch and --extract are passed
  • compatibility for changes in upcoming Mercurial 5.3
  • documentation: update text and add missing figures
  • evolve: also merge the date field when solving content-divergence
  • evolve: use more often seen metavariables in command synopsis strings
  • rewind: preserve date

Topic 0.17.2

  • topic: add more options to command synopsis
  • topic: fix a bug in logic of choosing destination for hg update
  • topic: fix a bug in logic to choose destination when no active topic

News from Heptapod

Georges Racinet
2020-01-28

For the past 6 months, we have put a lot of effort into Heptapod, the friendly fork of GitLab with Mercurial support. A lot of important features have been added, like Continuous Integration (CI), SSH support and Bitbucket import. We also secured a partnership with Clever Cloud, a hosting company that is sponsoring the new foss.heptapod.net instance and preparing a commercial offer for Heptapod.

The coming months will keep being full of Heptapod-related news. We have work planned to remove the internal usage of hg-git in favor of direct communication between GitLab and Mercurial, and more efforts to catch up with the moving target that is the current GitLab version. To follow Heptapod progress more closely, check the Heptapod blog!

New Evolve extension release: version 9.2.0

Pierre-Yves David
2019-09-28

We pushed a new release for the evolve extension: 9.2.0

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This version introduces support for hg abort, beta support for Python 3.6+ and various other improvements and bugfixes.

Thanks to all the people involved:

Evolve: 9.2.0

Bug fixes:

  • evolve: check that relocating makes sense in _solvedivergent() (issue5958)
  • evolve: test that target is not orig in _solveunstable() (issue6097)
  • fold: check allowdivergence before folding obsolete changesets (issue5817)
  • obslog: correct spacing of patch output with word-diff=yes (issue6175)
  • evolve: avoid possible race conditions by locking earlier

Improvements:

  • prune: improve documentation for --pair
  • python3: beta support for Python 3.6+ (thanks to Ludovic Chabant, Martin von Zweigbergk and Raphaël Gomès for their hard work)
  • prune: clarify error message when no revisions were passed
  • abort: add support for evolve and pick to hg abort (hg-5.1+)
  • rewind: add --keep flag to preserve working copy

Topic 0.17.0

  • stack: make sure to preserve dependencies, fixes certain complex cases

Heptapod's default workflow

Pierre-Yves David
2019-09-04

Why is having a clear default workflow important?

When we started working on Heptapod, our version of GitLab with support for the Mercurial version control system, an important question arose: what should be the recommended workflow?

Great versatility and extensibility are two of Mercurial's strengths. However, these properties also have their drawbacks, as different groups of users might end up using the tool in quite different ways, hence forcing collaboration platforms to deal with many variants.

Selecting one standard workflow for Heptapod is important to avoid these traps. Having one unified and smooth way to interact with Heptapod makes such a platform much more powerful and much more comfortable for users.

Furthermore, selecting one default workflow means we can optimize the default user experience towards it. This helps us to offer the best to people sticking to that workflow.

However, selecting a default workflow and optimizing for it does not mean blocking all other alternatives. It is possible to configure Heptapod to unlock other behaviors if users want to.

What needs should this workflow meet?

We want a workflow that:

  • sticks to Mercurial's core concepts and empowers them as much as possible,
  • fits the popular Merge Request system of GitLab,
  • keeps changesets as first class citizens,
  • accommodates both simple use cases and more advanced usage in a smooth way,
  • in particular, is smoothly compatible with Mercurial's history editing capabilities.

What choices did we make?

To accommodate these goals, we made a handful of choices in the default Heptapod configuration and user experience:

  • Server side, each name (e.g. a named branch) cannot have multiple heads.

  • The branching model uses named branches and topics. Long-lived branches are based on named branches, feature branches are based on topic-branches.

  • Phase movements use a simple scheme based on the branching model: changesets get published upon pushes unless they have a topic.

  • Support for changeset evolution is enabled by default.

  • Bookmarks aren't allowed by default.

What do these choices mean?

Using a single head per name

While having multiple heads locally offers great flexibility, it gets confusing for users and tooling when a name points to ambiguous content. Users might learn to work around it but tooling usually can't. To address this, the Heptapod server will only accept a single head per named branch. This also applies to topics: each topic needs to have a single head server side.

To give an example: having a single head for each branch and topic means the Merge Request logic can be driven by these names without fear of ambiguity or sudden/drastic changes in their meaning.

To accommodate older repositories or different workflows, this constraint can be lifted through configuration. However, this might prevent the use of some features of Heptapod on those branches.

More details in our FAQ

Branching model and mutability

We decided to use named branches and topics for our branching model as both fit well in the general philosophy of Mercurial.

In Heptapod, named branches are meant to be used for long-term branches: their content is immutable and they come with all the clarity and advantages that they usually provide. However, named branches are not ideal for short-lived feature branches, so we decided to use topics for that. Topics are very similar to named branches but are designed for a shorter and automatic life cycle.

Each changeset can be assigned a topic in addition to its named branch. All changesets within this topic will make a small feature branch that Heptapod will use for Merge Requests. However, once the Merge Request gets merged, the changesets move to the public phase, the topic "fades away" and gets fully integrated into its target named branch. This lightweight life cycle is appealing: opening and closing Merge Requests comes at a very low overhead and won't leave unwelcome long-term marks in the history.

In addition, since all changesets are explicitly in a topic, the content of each feature branch is clearly defined both for Heptapod, server side, and for the user, locally. This was an important criterion for us.

The strong link between the Heptapod Merge Request workflow and Mercurial concepts works both ways. Users can decide to locally publish a feature branch and Heptapod will be able to acknowledge the operation once pushed.

Users can also decide to use named branches only. The Merge Request system fully supports a branch to branch workflow.

Mutability and evolution

An important choice we made was to enforce everything outside of topics to be public. This comes with various advantages.

First, it provides backward compatibility with how Mercurial has been working for many years. People using default Mercurial configuration before going to Heptapod, without topic, will keep the same behavior. Whatever they push will move to the public space. However it makes it easy for them (even a subset of them) to exchange draft feature branches, by simply start using topic for it. This way, instead of having to deal with the combination of multiple dimensions (branch type + mutability), the different concepts align to expose a clear boundary and behavior: feature branch are mutable, long term branch are not.

Second, having the phase movements deeply integrated with the general workflow comes with an additional benefit. The amount of draft changesets is naturally kept under control, since they need to become public to reach the main branches. In such a context, enabling changeset evolution for everybody seems like a sane default. The topic space keeps on being mutable and can be amended and rebased, while the named branch space provides strong immutability guarantees and will need explicit merge changesets to integrate content from one branch to another.

(Of course, users can also change their instance configuration to whatever publication mode they want.)

No bookmark by default

The branching model and its integration with GitLab Merge Requests are based exclusively on branches and topics; we do not support bookmarks yet. Bookmarks have a long history of being less well integrated with other Mercurial concepts and still have some issues. We believe that branches and topics cover enough use cases to avoid dealing with an extra concept here, especially since it could be a source of technical issues.

Still, we might gradually improve bookmarks support in Heptapod as time passes, depending on what kind of needs people express. The most probable first step would be limited support for bookmarks whilst providing a mirror of an external Git source repository.

New Evolve extension release: version 9.1.0

Pierre-Yves David
2019-07-29

We pushed a new release for the evolve extension: 9.1.0

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This version introduces support for Mercurial 5.1 and comes with a variety of improvements and bug fixes. Check the full changelog below for details. Significant progress toward Python 3 compatibility has been made too; the next version might come with experimental Python 3 support.

Thanks to all the people involved:

Evolve: 9.1.0

Bug fixes:

  • pick: no longer forget file in case of conflict (issue6037),
  • pick: properly report and cleanup "unfinished state",
  • prune: don't update working copy parent if pruned revisions are unrelated (issue6137),
  • prune: update to the successor of wdir also with --pair/--biject (issue6142),
  • evolve: properly prune changeset with no change in case of conflict (issue5967),

Improvements:

  • prune: spell --successor flag without any unnecessary shortcuts,

  • evolve: compatibility with Mercurial 5.1,

  • evolve: use the same wording as core in case of unresolved conflict,
  • evolve: minor output message improvements,
  • evolve: improve hg evolve --all behavior when "." is obsolete,

  • touch: detect resulting divergence in more cases (issue6107),

  • touch: now works with merge commit too,
  • rewind: fix behavior with merge commit,
  • fold: allow fold with merge commit,

  • metaedit: now works on merge commit.

Topic 0.16.0

  • topic: fix confusion in branch heads checking logic.

Sharing references between Python and Rust

Raphaël Gomès
2019-07-25

In 2018, the Mercurial project decided to use Rust to improve the performance and maintainability of previous high-performance code. You can read more about it in the Oxidation Plan.

While one may argue that Rust took inspiration from Python in some aspects of its semantics, the two languages don't share a lot of similarities at a lower level. Rust's strict memory borrowing rules and default immutability don't play very nice with some of Python's features: its dynamic typing, mutability rules, classes and garbage collection, to name the big ones.

We have faced some interesting challenges when bridging the Python implementation with the new Rust code, and this is one that I have not found any literature about.

Technological stack

There are two main crates used for bridging CPython and Rust: rust-cpython and PyO3. The latter is a fork of the former that happened when rust-cpython was seemingly abandoned, and has more features like support for properties and an arguably nicer syntax thanks to (then unstable) procedural macros.

We are however using rust-cpython because PyO3 does not (yet) compile on stable Rust, but the idea behind this article is still relevant regardless of the bridge used.

The issue at hand

During the rewrite of some core parts of Mercurial, we had to present a class-like interface to Python that would run Rust code. More often than not, that class implemented __iter__, which requires Python to hold a reference to a Rust iterator. Whenever we faced that issue, we just copied the entire structure to something Python understands, which is terrible. But that was good enough for the purpose of continuing the rewrite.

However, as the frontier between Python and Rust got more defined, we knew that we couldn't wait any longer to solve that issue. My colleague Georges (over at blog.racinet.fr) took the opportunity of a long train trip to dig into shared references with a minimalist example.

A minimalist example

Let's implement a stripped-down version of Python's set that only works with int, for simplicity's sake.

The basic features are pretty easy to implement with rust-cpython and would look like this:

extern crate cpython;
use cpython::*;

use std::cell::RefCell;
use std::collections::HashSet;

type Inner = HashSet<u32>;

py_class!(class RustSet |py| {
    data hs: RefCell<Inner>;

    def __new__(_cls) -> PyResult<RustSet> {
        Self::create_instance(py, RefCell::new(Inner::new()))
    }

    def __contains__(&self, v: u32) -> PyResult<bool> {
        Ok(self.hs(py).borrow().contains(&v))
    }

    def add(&self, v: u32) -> PyResult<PyObject> {
        // Plain RefCell borrow; a guarded borrow_mut helper is introduced later
        self.hs(py).borrow_mut().insert(v);
        Ok(py.None())
    }

    def extend(&self, iterable: &PyObject) -> PyResult<PyObject> {
        // RefCell::borrow_mut takes no argument and returns a RefMut directly
        let mut hs = self.hs(py).borrow_mut();
        for vobj in iterable.iter(py)? {
            hs.insert(vobj?.extract::<u32>(py)?);
        }
        Ok(py.None())
    }
});

The py_class! macro helps us define a Python class that we insert in a shared library that Python will treat like a normal Python module. How exactly that is done is explained in rust-cpython's documentation and the repository of this experiment (which uses Heptapod, GitLab with beta Mercurial support!) is available here.
In short, the hs data attribute will hold all the data in a Rust HashSet; the rest is basic encapsulation. Why a RefCell is used is also explained below.

So far, nothing really exciting is happening, but this allows us to define the following in a Python script using the generated .so as .shared_ref.

import sys

try:
    from .shared_ref import RustSet
except ImportError:
    sys.stderr.write(
        "Rust extension not found. Please run 'cargo build' first.\n"
    )
    sys.exit(1)


def test_basic():
    """Test basic scaffolding API: not needing to share refs."""
    rs = RustSet()
    rs.add(3)
    assert 3 in rs
    assert 4 not in rs
    rs.extend(x**2 for x in range(10))
    assert 4 in rs
    assert 81 in rs
    assert 65 not in rs


def run():
    test_basic()

Now on to the good stuff.

We need an iterator class that exposes __next__ and __iter__ to use from the Python side.

use std::collections::hash_set::Iter;

py_class!(class RustSetIterator |py| {
    data hs: RustSet;
    data it: RefCell<Iter<'static, u32>>;

    def __next__(&self) -> PyResult<Option<u32>> {
        Ok(self.it(py).borrow_mut().next().map(|r| *r))
    }

    def __iter__(&self) -> PyResult<Self> {
        Ok(self.clone_ref(py))  // `clone_ref` gives a new Python reference
    }

});

The hs data attribute holds an instance of the RustSet Python class, which keeps the set alive for as long as the iterator exists. The it data attribute is a Rust iterator (here from the hash_set module) over u32 values, the element type of our Inner set, with a 'static lifetime.

But, you might say, those integers are not static, they will be defined at runtime! Well, since Rust has no way of knowing what Python does with the object, every data attribute must be Send + 'static. And because a Python object can be freely shared, data attributes are only reachable through shared references, hence the use of a RefCell (or Cell, for simpler types) whenever the Rust data held by our Python object needs to be mutated.

However, just writing 'static does not magically make the compiler happy. For that, we need to enter the world of unsafe, one of the major use cases of which is FFI code, just like what we're doing.

Our RustSet class will then implement __iter__ the following way:

def __iter__(&self) -> PyResult<RustSetIterator> {
    let ptr = self.hs(py).as_ptr();
    let as_static: &'static Inner = unsafe {&*ptr};

    RustSetIterator::create_instance(
        py,
        self.clone_ref(py),
        RefCell::new(as_static.iter())
    )
}

Here, we take a raw pointer to our inner HashSet and (unsafely) tell Rust that its lifetime is 'static, which means that we are forgoing the help of the borrow-checker and we will have to do a bit of manual work to make all of that work.

For the time being, we have a basic iterator interface, which looks like:

def test_iter():
    rs = RustSet()
    start_count = sys.getrefcount(rs)  # should be 2 (see Python doc)
    rs.extend(range(4))

    it = iter(rs)
    assert sys.getrefcount(rs) == start_count + 1
    assert set(it) == {0, 1, 2, 3}
    del it

    assert sys.getrefcount(rs) == start_count
    it2 = iter(rs)
    del rs
    assert set(it2) == {0, 1, 2, 3}

    del it2

Nice. But this is just reading from the iterator, and our naive lifetime trick will not work for very long once mutation is involved.

Let's add a clear method to introduce mutation of the data now shared between Python and Rust.

def clear(&self) -> PyResult<PyObject> {
    let mut hs = self.hs(py).borrow_mut();
    hs.clear();
    // Force freeing of underlying memory to underline the risk of
    // segfault
    hs.shrink_to_fit();
    Ok(py.None())
}

As is said in the inline comment, we also force the inner HashSet to free its unused memory to trigger a segfault if Python still tries to access it.

This allows us to showcase the bug in a simple Python test:

def test_race_safety():
    rs = RustSet()
    # have Rust allocate some real amount of memory
    rs.extend(range(10000))
    it = iter(rs)

    # Trigger freeing the underlying memory
    rs.clear()

    next(it)  # segfault

Reference counting

We need to implement a higher-level system for ensuring memory safety. Let's add a reference counter to our RustSet to prevent mutation when Python holds a reference to the unsafe pointer.

Inside RustSet:

// ...
// Replaces the previous `use`
use std::cell::{Cell, RefCell};
// ...
data leak_count: Cell<usize>;
// ...
def __new__(_cls) -> PyResult<RustSet> {
    Self::create_instance(py, RefCell::new(Inner::new()), Cell::new(0))
}
def add(&self, v: u32) -> PyResult<PyObject> {
    // Changed from self.hs(py).borrow_mut()
    self.borrow_mut(py)?.insert(v);
    // ...
}
def extend(&self, iterable: &PyObject) -> PyResult<PyObject> {
    // Changed from self.hs(py).borrow_mut()
    let mut hs = self.borrow_mut(py)?;
    // ...
}
def __iter__(&self) -> PyResult<RustSetIterator> {
    RustSetIterator::create_instance(
        py,
        self.clone_ref(py),
        RefCell::new(self.leak_immutable(py).iter()),
        Cell::new(false),
    )
}
def clear(&self) -> PyResult<PyObject> {
    // Changed from self.hs(py).borrow_mut()
    let mut hs = self.borrow_mut(py)?;
    // ...
}

For our private functions, we can step out of the py_class! macro and create the following impl block:

/// Replaces the previous `use`
use std::cell::{Cell, RefCell, RefMut};

impl RustSet {
    fn leak_immutable(&self, py: Python) -> &'static Inner {
        let ptr = self.hs(py).as_ptr();
        // Data attributes are reached through their accessor, which needs `py`
        self.leak_count(py).replace(self.leak_count(py).get() + 1);
        unsafe { &*ptr }
    }
    fn borrow_mut<'a>(&'a self, py: Python<'a>) -> PyResult<RefMut<Inner>> {
        match self.leak_count(py).get() {
            0 => Ok(self.hs(py).borrow_mut()),
            _ => Err(AlreadyBorrowed::new(
                py,
                "Can't mutate while there are immutable \
                references in Python objects",
            )),
        }
    }
    fn decrease_leak_count(&self, py: Python) {
        // saturating_sub keeps the counter from underflowing below zero
        self.leak_count(py)
            .replace(self.leak_count(py).get().saturating_sub(1));
    }
}

Here leak_immutable does the same lifetime extension "trick" that we did earlier, but also increments the leak_count data attribute to keep track of references. Of course, there is also a function to decrease the leak count.

The more interesting bit is the borrow_mut function, which wraps the method of the same name on the inner RefCell and checks that the leak count is 0 before allowing a mutable borrow, raising a custom Python exception otherwise. This very simple mechanism is only possible because we hold the GIL (witnessed by py, of type Python) and are thus guaranteed a single-threaded context.
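
The counting protocol itself does not depend on Rust, so it can be modelled in a few lines of pure Python. The sketch below only illustrates the guard logic: GuardedSet, leak and release are hypothetical names that exist neither in rust-cpython nor in Mercurial, RuntimeError stands in for the custom AlreadyBorrowed exception, and the leaked view is a copy because Python has no raw pointers to share.

```python
class GuardedSet:
    """Pure-Python model of the leak counter (hypothetical, for illustration)."""

    def __init__(self):
        self._data = set()
        self._leak_count = 0  # outstanding read-only references handed out

    def leak(self):
        """Hand out a read-only view and remember that it exists.

        The view is a copy here; the Rust version shares the memory instead.
        """
        self._leak_count += 1
        return frozenset(self._data)

    def release(self):
        """Counterpart of decrease_leak_count: never goes below zero."""
        self._leak_count = max(self._leak_count - 1, 0)

    def _borrow_mut(self):
        # Mirrors the match on the leak count in the Rust borrow_mut.
        if self._leak_count != 0:
            raise RuntimeError(
                "Can't mutate while there are immutable references"
            )
        return self._data

    def add(self, value):
        self._borrow_mut().add(value)

    def clear(self):
        self._borrow_mut().clear()


gs = GuardedSet()
gs.add(1)
view = gs.leak()          # an iterator would do this on creation
try:
    gs.clear()            # refused: a read-only reference is still out
    refused = False
except RuntimeError:
    refused = True
gs.release()              # an iterator would do this once depleted
gs.clear()                # now allowed
```

Every mutating method goes through the single _borrow_mut check, which is the same design choice as routing add, extend and clear through borrow_mut on the Rust side.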

Inside our RustSetIterator, we can replace our __next__ method to make use of our newly defined reference counter:

data done: Cell<bool>;

def __next__(&self) -> PyResult<Option<u32>> {
    if self.done(py).get() {
        return Ok(None);
    }
    Ok(match self.it(py).borrow_mut().next() {
        None => {
            self.done(py).replace(true);
            self.hs(py).decrease_leak_count(py);
            None
        }
        Some(&r) => Some(r),
    })
}

We use a simple boolean to determine if the iterator is done, separating the lifetime of the iterator from that of its data, as shown in this example:

def test_race_safety():
    rs = RustSet()
    # have Rust allocate some real amount of memory
    rs.extend(range(10000))
    it = iter(rs)

    # Trigger freeing the underlying memory
    try:
        rs.clear()
    except AlreadyBorrowed:
        pass  # \o/
    else:
        raise AssertionError("Should not have been able to clear RustSet "
                             "instance while holding an iterator on it")
        next(it)  # that would be a segfault

    # Consume iterator
    assert len([x for x in it]) == 10000
    rs.clear()

    # the consumed iterator is actually still usable (doesn't need the
    # data anymore to raise StopIteration)
    assert [x for x in it] == []

We are unable to mutate our RustSet when an iterator still refers to its data, hurray!

Now, this differs from the Python behavior. Let's see what happens if we replicate the example with a standard set:

>>> s = set([1, 2, 3])
>>> it = iter(s)
>>> s.clear()
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Set changed size during iteration

Python does not complain when we mutate the set, only when we try to use the iterator, which is arguably a little too late. Most well written Python code should not have to adapt too much to this change to a stricter behavior.

However, there exist some use cases where this "feature" is useful; I've seen them in Mercurial code, where this implementation falls apart because some iterator is never depleted, so you add a del my_iterator, which only really fixes things on Python implementations that use reference counting... blablabla. Plus, "breaking user space" is always a bad idea unless it is really, really important, so we will need to improve on that later.
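
The portability caveat about del can be observed from plain Python. The sketch below uses only the standard library (weakref.finalize and gc.collect); the Resource class, the iterate function and the released list are hypothetical names introduced for the illustration. On CPython, del drops the last reference and the cleanup runs right away, while other implementations may only run it at a later collection, which is why the example forces a collection instead of relying on the timing.

```python
import gc
import weakref


class Resource:
    """Stands in for an object whose cleanup releases a borrow (hypothetical)."""


def iterate(resource):
    # The generator frame keeps `resource` alive until the generator
    # finishes or is itself reclaimed.
    for i in range(3):
        yield i


released = []

res = Resource()
# Register a callback that runs when the Resource instance is reclaimed.
weakref.finalize(res, released.append, "released")

it = iterate(res)
next(it)                 # partially consumed, never depleted

del res                  # the generator still references the object
gc.collect()
still_held = not released

del it                   # CPython reclaims the generator immediately;
gc.collect()             # other implementations may need a collection
```

After the last two lines the callback has run, but only the explicit gc.collect() makes that true regardless of the interpreter.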

Harnessing the garbage collector

The code above will run just fine given our conditions, but this example is a tad simplistic and our RustSet still needs a bit of work to play nice with Python in the general case. Indeed, our shared references might get garbage collected by Python and result in a memory leak (and a runtime bug) because Rust still thinks that those references exist.

As suggested in rust-cpython error messages, the proper way to implement Drop for PythonObject is to implement it on one of the data members. Let's create a struct that will manage our reference count on Drop:

struct RustSetLeakedRef {
    rs: RustSet,
}

impl RustSetLeakedRef {
    fn new(py: Python, rs: &RustSet) -> Self {
        RustSetLeakedRef {
            rs: rs.clone_ref(py),
        }
    }
}

impl Drop for RustSetLeakedRef {
    fn drop(&mut self) {
        let gil = Python::acquire_gil();
        let py = gil.python();
        self.rs.decrease_leak_count(py);
    }
}

This struct simply holds a Python reference to the RustSet instance, and decreases the leak count when dropped. Cool, now our RustSetIterator can use this new struct to make rust-cpython's Drop hook happy:

py_class!(class RustSetIterator |py| {
    data rs: RefCell<Option<RustSetLeakedRef>>;
    // ...

    def __next__(&self) -> PyResult<Option<u32>> {
        let mut rs_opt = self.rs(py).borrow_mut();
        if rs_opt.is_some() {
            Ok(match self.it(py).borrow_mut().next() {
                None => {
                    // replace Some(rs) by None, hence drop RustSetLeakedRef
                    rs_opt.take();
                    None
                }
                Some(&r) => Some(r)
            })
        } else {
            Ok(None)
        }
    }
    // ...
});

You will notice that we got rid of the done boolean in favor of an Option, which is much more idiomatic.

Lastly, our RustSet needs to update its __iter__ method to use the new RustSetLeakedRef:

def __iter__(&self) -> PyResult<RustSetIterator> {
    RustSetIterator::create_instance(
        py,
        RefCell::new(Some(RustSetLeakedRef::new(py, &self))),
        RefCell::new(self.leak_immutable(py).iter()),
    )
}

We can now test this from the Python side:

def test_drop_before_end():
    rs = RustSet()
    # have Rust allocate some real amount of memory
    rs.extend(range(10))
    it = iter(rs)

    # Implementation of get_leak_count() left as
    # an exercise to the reader ;)
    assert rs.get_leak_count() == 1
    next(it)
    del it
    assert rs.get_leak_count() == 0

Works just as intended!

Did you spot the bug?

This safety mechanism guards us against a mutable borrow after immutable borrows, but fails to prevent immutable borrows following a mutable borrow!
We need to tell mutable and immutable borrows apart. The nice thing is, you can only have at most one mutable reference; this allows us to just use a boolean instead of a more complicated structure.

Let's start by updating the API inside our RustSet:

data mutably_borrowed: Cell<bool>;
// ...
def __new__(_cls) -> PyResult<RustSet> {
    Self::create_instance(
        py,
        RefCell::new(Inner::new()),
        Cell::new(0),
        Cell::new(false),
    )
}
// ...
fn leak_immutable(&self, py: Python) -> PyResult<&'static Inner> {
    // We add this check right at the top of the function. 
    // Notice the signature change to a `PyResult`.
    if self.mutably_borrowed(py).get() {
        return Err(AlreadyBorrowed::new(
            py,
            "Cannot borrow immutably while there is a \
             mutable reference in Python objects",
        ));
    }
    // ...
    unsafe { Ok(&*ptr) }
}

fn decrease_leak_count(&self, py: Python, mutable: bool) {
    // ...
    if mutable {
        self.mutably_borrowed(py).replace(false);
    }
}

I've purposely kept borrow_mut at the end since it needs a bit more explaining:

fn borrow_mut<'a>(&'a self, py: Python<'a>) -> PyResult<PyRefMut<Inner>> {
    // Same check at the top of the function.
    if self.mutably_borrowed(py).get() {
        return Err(AlreadyBorrowed::new(
            py,
            "Cannot borrow mutably while there exists another \
             mutable reference in a Python object",
        ));
    }
    match self.leak_count(py).get() {
        0 => {
            // Update the flag upon borrow
            self.mutably_borrowed(py).replace(true);
            Ok(PyRefMut::new(py, self.hs(py).borrow_mut(), &self))
        }
        // ...
    }
}

As you can see, we're not returning a RefMut anymore, but a PyRefMut. It's a new struct that wraps the RefMut to add a crucial bit of additional behavior to it.

Let's see how we implement it:

struct PyRefMut<'a, T> {
    inner: RefMut<'a, T>,
    rs: RustSet,
}

The 'a lifetime annotation ties our PyRefMut to the inner RefMut: it cannot outlive the borrow it wraps.

impl<'a, T> PyRefMut<'a, T> {
    fn new(py: Python, rm: RefMut<'a, T>, rs: &RustSet) -> Self {
        Self {
            inner: rm,
            rs: rs.clone_ref(py),
        }
    }
}

impl<'a, T> Drop for PyRefMut<'a, T> {
    fn drop(&mut self) {
        let gil = Python::acquire_gil();
        let py = gil.python();
        self.rs.decrease_leak_count(py, true);
    }
}

Here, the drop implementation removes the mutable reference through decrease_leak_count, with mutable set to true.

This means that when our PyRefMut goes out of scope, our reference count is correct and the mutability lock is freed!
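
Python does not tie cleanup to scope exit the way Rust's Drop does, but a context manager gives a comparable guarantee and makes the mechanism easy to try out. The following is a rough pure-Python analogy, not the Mercurial code: BorrowState and its borrow_mut method are hypothetical names, and RuntimeError replaces AlreadyBorrowed.

```python
from contextlib import contextmanager


class BorrowState:
    """Tracks one mutable borrow, like the mutably_borrowed flag (hypothetical)."""

    def __init__(self, data):
        self._data = data
        self.mutably_borrowed = False

    @contextmanager
    def borrow_mut(self):
        if self.mutably_borrowed:
            raise RuntimeError("already mutably borrowed")
        self.mutably_borrowed = True
        try:
            yield self._data
        finally:
            # Runs when the `with` block exits, even on error: this plays
            # the role of the Drop implementation of PyRefMut.
            self.mutably_borrowed = False


state = BorrowState(set())
nested_refused = False
with state.borrow_mut() as data:
    data.add(42)
    try:
        # A second mutable borrow while the first is alive is refused.
        with state.borrow_mut():
            pass
    except RuntimeError:
        nested_refused = True
```

Once the outer with block is left, the flag is reset and a new mutable borrow can be taken.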

The final piece of the puzzle:

use std::ops::{Deref, DerefMut};
// ...
impl<'a, T> Deref for PyRefMut<'a, T> {
    type Target = RefMut<'a, T>;

    fn deref(&self) -> &Self::Target {
        &self.inner
    }
}
impl<'a, T> DerefMut for PyRefMut<'a, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.inner
    }
}

This ensures that our PyRefMut delegates all of its interface to its inner RefMut, making it essentially a "fat pointer".

Future work

The only thing left to do to have complete support for garbage collection is to allow our RustSet to hold Python objects as well. The documentation of rust-cpython explains that the __traverse__ and __clear__ methods need to be implemented to dispatch the garbage collection to the inner objects.

As of now, the need has yet to present itself in the development of Mercurial... so we didn't do it. Sorry, maybe another time. :)

The actual code inside Mercurial is a bit different, as we moved the refcounting and mutable borrow check inside a separate struct, and I've written a simple macro to automate parts of it. I am not very happy with it yet: it's not as easy to use as I would like, and it's missing some features (leaking multiple attributes, automatic data attributes, method prefixing, etc.). I don't want to write a procedural macro since it's not battle tested yet.

If you have a better way of going about this problem, please let me know.

Mercurial's format source extension version 0.4.0

Pierre-Yves David
2019-07-17

Version 0.4.0 of the formatsource extension for Mercurial just got pushed. The extension deals with codebase formatting and smooths its consequences during merges.

The release is available on PyPI and upgrading is recommended.

Version 0.4.0 brings the following changes:

  • Formatting of the help text was improved

  • The format-source command gained an --extra-config-file flag. Use it to record custom config files when running initial formatting.

  • It is now possible to track the tool version number when formatting files. The value will be put to use in a later version of the extension to help react to tool version changes. Tools coming with a default configuration will now track their version.

  • A new format-source.run-mode config option has been added. Set it to yes to forcibly run formatters during any merge involving formatted files. Set it to no to disable any formatting during merges.

  • The format-source command gained a --current flag to run formatters on the working copy files. Each file gets formatted according to the formatter defined in a previous hg format-source call.

New Evolve extension release: version 9.0.0

Pierre-Yves David
2019-06-06

We pushed a new release for the evolve extension: 9.0.0

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This version drops support for Mercurial version 4.4 and introduces a significant behavior change.

The hg evolve command now uses the --all and --no-update flags by default. Starting from this version, hg evolve will try to stabilize all changesets related to the current working copy parent (e.g. your stack if you use topic), but will remain on the current changeset (or its successor if applicable). To perform iterative evolution, use the hg next command or the --no-all or --rev flags.

It also comes with a variety of improvements and bug fixes. Check the full changelog below for details. Thanks to all the people involved:

  • Anton Shestakov
  • Faheem Mitha
  • Joerg Sonnenberger
  • Martin von Zweigbergk
  • Matt Harbison
  • Pierre-Yves David
  • Pulkit Goyal
  • Sushil khanchi

Evolve: 9.0.0

Behavior changes

  • evolve: preserve the working directory after resolving instability (BC)

        (use `hg next` or `hg evolve --update` to get the old behavior)
  • evolve: evolve all relevant revisions by default (BC)

        (use `hg next` or `--no-all` to evolve only one)
  • obsdiscovery: drop support for deprecated discovery protocol obshash (Make sure your servers are configured to use the obshashrange one. It is available in evolve 7.2 and above. If you are just migrating now, check the obshashrange documentation.)

Compatibility

  • evolve: drop compatibility with 4.4
  • evolve: fix compatibility with narrow repositories

Improvements:

  • evolve: use "unstable" instead of "troubled" in some output
  • evolve: run multiple stabilization in the same transaction
  • evolve: improve username merging during content-divergence
  • evolve: reduce the verbosity of content-divergence resolution
  • documentation: various improvements and vocabulary update
  • packaging: fix documentation build step on Debian
  • progress: improved support in various commands
  • help: avoid duplicated entries for some templates

Topic 0.15.0

  • stack: handle hash sizes when --debug flag is provided
  • stack: remove 'topic.' prefix from colors/labels (BC)
  • stack: always provide (full) node hash to non-default --template
  • topic: drop the b# alias. It conflicted with normal hashes.
  • topic: add an experimental.topic.allow-publish option (default: True)

New Evolve extension release: version 8.5.1

Pierre-Yves David
2019-04-23

We pushed a new bugfix release for the evolve extension: 8.5.1

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This version brings a couple of bug fixes and compatibility adjustments.

Talking about compatibility, the next non-bugfix version will drop support for Mercurial 4.4 and probably a couple of others.

Evolve 8.5.1

  • evolve: make sure we use upstream merge code with 5.0,
  • evolve: restore compatibility with 4.4
        (This regresses the narrow compatibility)
  • evolve: fix progress display with hg <= 4.6

Topic 0.14.1

  • topic: compatibility with mercurial-5.0,
  • topic: improve extensions isolation (issue6121).

Mercurial April 2019 mini-sprint reports

Pierre-Yves David
2019-04-17

From April 4th to April 7th, a group of 12 Mercurial developers and users gathered in Paris for hacking and discussing.

  • Arthur Lutz (Logilab)
  • Boris Feld (Octobus)
  • Denis Laxalde (Logilab)
  • Georges Racinet (Octobus)
  • Joerg Sonnenberger (NetBSD)
  • Manuel Jacob (Pypy)
  • Marcin Kizminski (Rhode-Code)
  • Philippe Pepiot (Logilab)
  • Pierre Verkest (Anybox)
  • Pierre-Yves David (Octobus)
  • Raphaël Gomès (Octobus)
  • Simon Sapin (Mozilla)
  • Augie Fackler (Google) also made a 1h video call.

The work and discussion covered various aspects of Mercurial.

Denis improved the user experience around hg commit/revert -i FILES and wrote an amazing code documentation on the "matcher" classes in the Mercurial codebase.

Philippe made some packaging-related improvements, clearing the path to an experimental Python 3 package, and looked into including chg in the official Debian package.

Boris set up multiple alternative builds in Octobus' Jenkins instance. Most notably, we now have CI coverage for the hg-git extension and for the main test suite of Mercurial using Pypy.

Manuel Jacob worked with Pierre-Yves on offering an option to access obsolete changesets in hgweb and during pull/clone. The result of this work is currently in review. Manuel also worked on his own on build reproduction and fixes to bugs affecting corner cases around rebase, evolve and topics.

Georges Racinet mostly worked on various aspects of the Heptapod project. He reviewed and integrated various improvements from Thomas Riboulet, worked with Arthur to fix various bugs and gathered additional feedback, supported Pierre who was installing his own heptapod instance and chatted with Marcin, Pierre-Yves (and others) about unifying topic based workflow across platforms.

Raphaël worked on various aspects of the Mercurial oxidation with regard to status, including the matcher classes that Denis has been documenting. He also discussed the interaction of Rust and Python in Mercurial with Simon. Finally, he worked on ASV to add support for an environment matrix in build and test phases, which will ease Mercurial benchmarking down the line.

Joerg looked into hg incoming issues with pull-bundle and various hooks needed by the NetBSD project. He also discussed with Pierre-Yves the ability to filter arbitrary revisions on "shares" for servers with, reviewed the "stable-range" feature and discussed its possible application for exchanging arbitrary notes applied to changesets at any point in time (tags, code signing, CI status, etc.).

Marcin updated evolve and topic versions on Rhode-Code and improved topic-based workflow in Rhode-Code. He worked with Marla da Silva (Logilab), Pierre-Yves and Georges to bootstrap the first user-focused Mercurial conference to happen on May 28th in Paris (France).

Pierre-Yves chatted and worked with various people on various topics (obsolescence access, shares filtering, the conference, etc.; see above) and also worked on integrating various contributions to the evolve extension, as well as fixing a couple of bugs. This work resulted in the release of evolve-8.5.0 last week.

Thanks to all who attended and to our 3 hosts: Logilab, Sup'Internet and Mozilla Paris.

We are looking forward to our next gathering at the end of May around the Mercurial user conference.

New Evolve extension release: version 8.5.0

Pierre-Yves David
2019-04-12

We pushed a new release for the evolve extension: 8.5.0

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI and upgrade is recommended.

This version brings a variety of improvements and bug fixes. Notably, evolve now supports smooth automatic resolution of a wider range of divergence instabilities.

The next evolve version will likely include behavior changes to the hg evolve command, making --no-update and --all the defaults. For a few versions already, users have been able to rely on hg next for day-to-day incremental evolution.

Check the full changelog below for details. Thanks to all the people involved:

  • Anton Shestakov
  • Joerg Sonnenberger
  • Manuel Jacob
  • Martijn Pieters
  • Martin von Zweigbergk
  • Matt Harbison
  • Pierre-Yves David
  • Sangeet Kumar Mishra
  • Sushil khanchi

Evolve: 8.5.0

Bugfixes

  • evolve: fix an unrecoverable state (issue6053),
  • evolve: share evolve related cache between shares,
  • evolve: make sure the extensions are only active on repository that enables it (issue6057).
  • evolve: preserve --[no-]update value over --continue,
  • evolve: make sure divergence resolution keep the initial author (issue6113),
  • pick: align working dir branch with the one from the pick result (issue6089),
  • prune: fix error message when pruning public changesets,
  • split: preserve phases (issue6048),
  • touch: fix error message when touching public changesets,

Improvements

  • evolve: improved compatibility with narrow repositories,
  • evolve: improved support for content-divergence with public changesets,
  • pick: add the standard --tool option,
  • uncommit: abort if an explicitly given file cannot be uncommitted.

Topic 0.14.0

  • prune: fix error message when pruning public changesets,
  • split: preserve phases (issue6048),
  • touch: fix error message when touching public changesets.

New Evolve extension release: version 8.4.0

Pierre-Yves David
2019-01-22

We pushed a new release for the evolve extension: 8.4.0

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on pypi and upgrade is recommended.

This version drops support for Mercurial version 4.3 and brings a variety of improvements and bug fixes. Check the full changelog below for details. Thanks to all the people involved:

  • Anton Shestakov
  • Boris Feld
  • James Reynolds
  • Martijn Pieters
  • Martin von Zweigbergk
  • Matt Harbison
  • Pierre-Yves David
  • Pulkit Goyal
  • Sushil khanchi

Evolve: 8.4.0

Compatibility

  • compat: add compatibility with Mercurial 4.9
  • compat: drop compatibility with Mercurial 4.3

Bug Fixes

  • evolve: avoid potential crash when stabilizing orphan merges
  • evolve: pick right destination in split+prune cases issue5686 (hg-4.9 only)
  • evolve: prioritize --rev/--any/--all option over obsolete working directory

Behavior Changes

  • fold: concatenate commit message in revision order
  • next: evolve aspiring children by default (use --no-evolve to skip)
  • next: pick lower part of a split as destination
  • push: have --publish overrule the auto-publish config
  • split: accept file patterns
  • split: improve and update the user prompt (BC)
  • split: make it possible to drop change during a split
  • split: no longer accept revision without --rev (BC)
  • split: support for non interactive splits

Topic 0.13.0

  • stack: introduce a --children flag (see help for details)
  • stack: support for '#stack[idx]' absolute indexing in revset (hg-4.9+ only)
  • topic: support for '#topic[idx]' relative indexing in revset (hg-4.9+ only)
  • topic: make --age compatible with the usual other display for hg topic
  • topics: improve the message around topic changes

Mercurial gathering in New York City

Pierre-Yves David
2019-01-16

We are organizing a meeting in New York City on Tuesday, January 22nd with the folks from Backstage.com, an online casting platform. They are happy users of, and contributors to, Mercurial and its evolve extension.

We will be talking about the news in the Mercurial world, and in particular about the new features of the upcoming 4.9. So, if you want to chat about the latest performance improvements, the Heptapod collaboration platform, the changeset evolution concept, modern workflows or the rise of Rust in Mercurial, this is the perfect opportunity. You'll be able to meet some of the core developers involved in these topics, chat with other Mercurial users or hack with us on your favorite version control system.

In particular, we are the people behind Heptapod, an ongoing effort to add Mercurial support to Gitlab that led to a working prototype last year. You can expect exciting news about it in 2019. In the meantime, by attending the meeting, you can learn more about it, get a demo, or become an early tester.

Backstage will welcome you with drinks and food from 5:30 PM at the Backstage office: 45 Main St, Brooklyn, NY 11201, USA.

If you can't make it on the 22nd, we are in town for the week. Ping us on our twitter or come say hi on our IRC channel #octobus-hg on freenode.net.

Key info

  • What: Meeting with Mercurial users and core developers,
  • When: 22nd January 2019 from 17:30,
  • Where: 45 Main St, Suite 416, Brooklyn, NY 11201, USA
  • Contact: please email jreynolds@backstage.com if you plan to attend

Format Source version 0.3.0 is released

Pierre-Yves David
2019-01-13

We pushed a new release for the format-source extension: 0.3.0.

This extension smooths out the consequences of changes to source-code formatting policy in a code base.

The release is available on pypi.

Changelog:

  • ship with default configuration for the following tools:
    • clang-format,
    • black,
    • yapf,
    • gofmt,
    • rustfmt,
    • prettier,
  • improved file pattern support,
  • improved error handling,
  • improved configuration of the tools' input/output methods,
  • dropped compatibility with Mercurial < 4.4,
  • Format-source's sources are now formatted with black.

New Evolve extension release: version 8.3.3

Pierre-Yves David
2018-12-24

We pushed a new bugfix release for the evolve extension: 8.3.3

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on pypi and upgrade is recommended.

This version fixes a couple of minor issues:

  • evolve: properly detect unresolved merge conflicts (issue-5966)
  • evolve: fix possible crash when the repo changes during evolve (issue-6028)
  • test: avoid leaking hg serve process
  • topic: fix error message for the ngtip revset

Be aware that the next non-bugfix version should drop support for Mercurial 4.3 and 4.4.

Mercurial's format source extensions

Pierre-Yves David
2018-12-03

Last Friday, Mozilla pushed a massive patch to their Firefox repository. This patch enforces their new coding style by automatically formatting their source code using clang-format.

This formatting implies a large number of changes spread all across the code base. This can result in a lot of conflicts with all the other branches and in-progress code written before the formatting.

To address this issue, Mozilla has been using Mercurial's format-source extension. This extension tracks the formatting that has been applied and the files it has been applied to. Mercurial can then leverage this information during merges to take the formatting out of the equation, focusing on the conflicts that come from the actual edits.

How does it work?

To perform its duty, the format-source extension tracks formatting operations within the change history. To start formatting, one uses the hg format-source command, specifying the tool used for formatting and the affected files. The command then formats the target files and records the operation. Later, when merges happen, the extension checks whether the formatting differs on each side of the merge. When one side of the merge has a more recent formatting, all content involved in the automated merging process is pre-formatted using the appropriate tools. Formatting all items (including the merge base) removes the formatting changes from the equation. The merge tool can then focus on solving conflicts on actual code changes, and provide readable conflicts to users if they occur.
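The merge-time behavior described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the extension's code: normalize is a toy stand-in for a real formatter, and three_way_merge and merge_with_preformat are hypothetical helpers that work on whole files rather than on hunks.

```python
def normalize(text):
    """Toy stand-in for a real formatter (clang-format, black, ...).

    It collapses runs of whitespace inside each line and drops the
    trailing newline. A real tool does far more; this is an assumption
    made to keep the example self-contained.
    """
    return "\n".join(" ".join(line.split()) for line in text.splitlines())


def three_way_merge(base, local, other):
    """Whole-file three-way merge: keep whichever side changed."""
    if local == other or other == base:
        return local
    if local == base:
        return other
    raise ValueError("conflict: both sides changed the file")


def merge_with_preformat(base, local, other, formatter):
    """Format the base and both sides with the same tool, then merge.

    Once every input went through the formatter, formatting-only
    differences disappear and only real edits remain.
    """
    return three_way_merge(formatter(base), formatter(local), formatter(other))


base = "x  =  1\ny = 2"    # old formatting
local = "x = 1\ny = 2"     # this side only applied the new formatting
other = "x  =  1\ny = 3"   # this side made a real edit, old formatting

merged = merge_with_preformat(base, local, other, normalize)
```

Merging the three versions directly reports a conflict, because both sides differ from the base. After pre-formatting, the local side is identical to the base, so the real edit from the other side is picked without any conflict.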

What next?

More projects are now considering using format-source and we are improving the extension to support their more diverse use cases. Once the extension has matured into solid support for a wide enough set of use cases, we will work on getting the feature integrated into the main version of Mercurial.

The initial implementation of this extension was made by Octobus last year, with funding from Mozilla. Thanks to Sylvestre Ledru for making it happen.

New Evolve extension release: version 8.3.2

Pierre-Yves David
2018-11-27

We pushed a new bugfix release for the evolve extension: 8.3.2

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on pypi and upgrade is recommended.

This version fixes a couple of minor issues:

  • evolve: no longer attempt to translate revision's descriptions (issue6016)
  • evolve: fix compatibility with mercurial 4.8's narrow extension.
  • pick: fix summary help text
  • topic: only use pager when it makes sense

Be aware that the next non-bugfix version should drop support for Mercurial 4.3 and 4.4.

New Evolve extension release: version 8.3.1

Pierre-Yves David
2018-10-25

We pushed a new bugfix release for the evolve extension: 8.3.1

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on pypi and upgrade is recommended.

This version introduces various bugfixes and other changes.

This version is the first one compatible with Mercurial 4.8 and fixes a couple of other issues:

  • evolve+topic: fix possible crash during content-divergence evolution
  • use "new" unstabilities vocabulary in help
  • compat: compatibility with Mercurial 4.8rc0

Be aware that the next non-bugfix version should drop support for Mercurial 4.3 and 4.4.

New Evolve extension release: version 8.3.0

Pierre-Yves David
2018-10-12

We pushed a new release for the evolve extension: 8.3.0

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on pypi and upgrade is recommended.

This version introduces various bugfixes and other changes.

Full changelog:

Evolve (8.3.0)

  • evolve: avoid redundant output when handling linear orphans
  • evolve: use stack alias s# in hg evolve messages
  • next, prev: use stack alias s# when relevant
  • rewind: add an undo alias (thanks to Pulkit Goyal)
  • caches: skip warming the stablerange cache on strip in "auto" mode

Topic (0.12.0)

  • topic: properly register the '{topicidx}' for mercurial <= 4.5

In addition, this release contains version 0.1.0 of the 'pullbundle' prototype that slices pulls and caches the associated bundles. It is mostly unrelated to evolve. Read the mailing list discussion about pullbundle for details.

Adding Mercurial support to Gitlab

2018-09-18

Good news everyone!

We are taking on the largest pain-point in the Mercurial community: integration with the newest collaborative development platforms. Using the Mercurial version control system locally is pleasant, but when it comes to collaborating, things get bumpy.

For that reason, we have experimented with adding Mercurial to the Gitlab platform.

Yes, you read right, we are making it possible to use our favorite version control system with the trendiest collaborative development platform.

We have built a prototype that we and a few others now use internally. Now that we have proven feasibility, we are in contact with Gitlab to work on adding Mercurial support upstream. So far, we have received warm feedback. It encourages us to push forward our efforts to offer the best development workflow and developer experience for all Mercurial users.

Why Work On A Collaborative Development Platform

Version control is a critical part of the development workflow, but it is only a part of it. It needs to integrate with other tools in order to provide a complete workflow, including access management, code review, continuous integration, continuous deployment, ticket tracking, etc.

Over the years, these tools grew more and more powerful, becoming a larger part of the developer experience. Collaborative development platforms (forges) supporting Mercurial exist but they now tend to lag behind the solutions existing around another version control tool: Git.

Fortunately, a lot of what makes current collaborative development platforms so appealing is only loosely related to the underlying version control system, which makes it relatively easy to add support for an alternative one.

That lack of support in the best collaborative development platforms out there has been affecting Mercurial users for several years. We have decided to contribute to a solution. At Octobus, our core expertise is version control so the best course of action for us is to collaborate with an existing platform for the rest of the development tooling.

Why Pick Gitlab?

Gitlab has been one of the fastest growing platforms in the past couple of years. It has a strong feature set praised by many. We have seen all kinds of users switching from Mercurial to Git just to be able to use it. So the quality and the reputation of the product itself made us look into it.

Moreover, Gitlab has an open source version "Gitlab CE", provided under an MIT license. They have a healthy community with significant external open source contribution. It fits with our own open source philosophy and allows us to directly hack on the product core to add support for Mercurial.

Also, other open source communities, Gnome and Debian, picked Gitlab as their tool. Having such serious actors of the free software world trust Gitlab was important in our choice.

Why Mercurial?

You might be wondering: why pick Mercurial in 2018?

Mercurial and Git are similar in many respects. They were created at the same time, by people from the same project, using the same sources of inspiration and for the same use-cases. Yet they have different implementations and made different choices that have a significant impact on the user experience. Having multiple tools offering similar services is beneficial for innovation. Since their creation, the two projects have been reusing each other's good ideas.

Overall, we believe Mercurial has a more extensible codebase allowing for more innovation. It has been used by large companies for several years, giving it pretty good scalability properties. We find it has a clearer user experience for both simple and advanced use-cases. Furthermore, the Mercurial development is very much alive, meaning that things will keep getting better. For example, we are getting real narrow cloning as a first class feature, allowing developers to combine the performance of using a small repository and the benefits of having a larger mono-repository. The Changeset Evolution feature is also unlocking new collaboration possibilities for team members.

However, we realized that all of Mercurial's powerful features needed an appropriate ecosystem to achieve their full potential. That is why it has become critical for Mercurial to be supported by state-of-the-art collaborative development platforms. In return, new features from Mercurial will help shape the workflow that those platforms will offer in the future.

Why us?

At Octobus, we have been working on Mercurial for many years, getting involved in its community. Part of that involvement is talking to many users with diverse workflows, structures and backgrounds. We have been hearing more and more complaints about the lack of a good collaborative development platform for Mercurial. In the past two years, those grew at an alarming rate. We are convinced that this is the biggest challenge that Mercurial is facing today. So, as much as we enjoy making the core of Mercurial faster and better, we decided we had to do something to address that tooling issue.

What now?

In addition to discussing with Gitlab how to get the changes needed for Mercurial support upstreamed, we are now improving our prototype, cleaning up the code and making it more capable. Our main goal is to provide people with a first class Mercurial experience with Gitlab.

If you are interested in testing the prototype or in helping us achieve that goal, please contact us at contact@octobus.net, or join us on the #octobus-hg IRC channel on Freenode.

(And if you are a developer interested in the version control world, we are hiring.)

New Evolve extension release: version 8.2.1

Pierre-Yves David
2018-09-14

We pushed a new release for the evolve extension: 8.2.1

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on pypi and upgrade is recommended.

This version fixes issues around the obshashrange caches (enabled by default since 8.2.0).

Thanks to all the people who reported issues. Special thanks to Gerald Squelart from Mozilla, who submitted a patch along with his bug report ☺

Full changelog:

Evolve (8.2.1)

  • obshashrange: issue the "long warmup time" message only once
  • obshashrange: reduce impact of cache invalidation from many new obsmarkers
  • obshashrange: properly silence permission error related to caches

New Evolve extension release: version 8.2.0

Pierre-Yves David
2018-09-03

We pushed a new release for the evolve extension: 8.2.0

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on pypi and upgrade is recommended.

This version introduces user interface adjustments and turns the obshashrange based discovery on by default. Thanks to Pulkit Goyal and Dan Villiom Podlaski Christiansen for helping with those questions.

The topic extension received improvements too.

Full changelog:

Evolve (8.2.0)

  • prune: rename --biject flag to --pair (old flag is kept as an alias)
  • pick: rename the grab command to pick (to avoid ambiguity with graft)
  • discovery: enable obshashrange based discovery by default

Topic (0.11.0)

  • revset: topic("patterns") now handles standard patterns ("re:", etc)
  • revset: topic(REVS) matches revisions with same topic as REVS
  • topic: using s# alias instead of t# and b# alias (compat with old form is preserved)

Status of Mercurial's Changeset Evolution

Pierre-Yves David
2018-09-03

TL;DR: All the questions that we needed to clarify for Changeset Evolution to be completed have been answered. We are now working on having a final implementation enabled by default in Mercurial core.

In the past couple of years, multiple important milestones have been reached for Mercurial's Changeset Evolution project. Changeset Evolution allows for simple, safe and powerful exchange of draft changesets that can be rewritten by a distributed team. (The concept is also useful locally, but the most innovative part is the distributed ability.) This progress is the result of the combined effort of a dozen people; thanks to all of them.

In particular, we have been working on final algorithms for two of the central concepts of changeset evolution: exchanging markers and automatic resolution of instabilities. They now just need a final implementation in Core.

In addition, the last uncharted territory, the restoration of older evolutions of changesets, has been explored. We discovered that we have to track more data regarding changeset folding to perform this task well. This will trigger what we expect to be the last update in tracked data and marker format.

Many improvements were made apart from the ones we just highlighted, and we'll go over them later in this post. However, the highlighted ones are important milestones. All the major algorithmic questions are now cleared: we know how each complex issue will be handled and what associated data storage we need. Removing unknowns on those questions was a priority because they can have a high impact on how we design user interfaces and on-disk data formats, two things that are complicated to change afterwards.

Now that all those areas have been explored, we have a clear view of the route to Evolve completion and can focus on actions directly moving toward this goal. We need to polish and upstream the concepts and code that are still inside the Evolve extension. Some of that code is already in good shape to go upstream.

The road is clear, but there is a lot to carry along; any kind of help is welcome!

Summary of work done since 2017

In general, all areas received many fixes, improvements, and polish. Some important areas made significant progress:

Vocabulary changes

Evolution provides powerful features, capable of handling complex issues. We must do all we can to smooth this complexity, in particular when it comes to naming new concepts. To address repeated feedback regarding the naming scheme, a group of people dedicated a significant amount of time to come up with a new naming scheme for most of the changeset evolution concepts. The resulting scheme was then applied in Core Mercurial, the Evolve extension and TortoiseHg:

  • Instability replaces Troubles
  • Unstable replaces Troubled
  • Predecessor replaces Precursor
  • Orphan replaces Unstable
  • Content-divergent replaces Divergent
  • Phase-Divergent replaces Bumped

Check the wiki page associated with the renaming discussion for details.

In addition, we made some updates to command and flag names:

  • The hg uncommit command is merged into hg amend as hg amend --extract
  • The unloved hg prune --biject flag has been renamed to hg prune --pair

Obsolescence History

To make distributed history editing manageable by the users involved, it is important to make it easy for someone to understand what happened outside of their local repository.

Changeset evolution tracks the changes to all changesets that have been rewritten. This information is valuable in many ways, for example for displaying what happened to a changeset, or for automatically stabilizing it. We made multiple improvements to make sure the information tracked was more informative and better put to use. Previously the obsolescence information was mostly used to detect and resolve instabilities. Obsmarkers now store more data:

  • the effect of an obsolescence marker (message change, patch change, etc),
  • the operation that created it (amend, rebase, ...),
  • optionally, a user-defined note.

And that data gets used in more places:

  • New hg obslog command that displays the evolution of a changeset,
  • Display successors and predecessors in hgweb when appropriate,
  • Point at latest successors of obsolete changeset in hg log,
  • Point to latest successors of obsolete working copy and accessed hidden changesets.

This greatly improved users' ability to understand the evolution that they, and their team members, have performed, making Changeset Evolution more accessible to people unfamiliar with it.
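As a rough illustration of the kind of record described above, and of how "latest successors" can be derived from it, here is a small Python sketch. The Marker class, its field names and the latest_successors helper are hypothetical and chosen for readability; they are not Mercurial's actual data structures or on-disk format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Marker:
    """Hypothetical, simplified obsolescence marker."""
    predecessor: str
    successors: tuple                # an empty tuple means "pruned"
    effects: frozenset = frozenset() # e.g. {"description", "content"}
    operation: Optional[str] = None  # e.g. "amend", "rebase"
    note: Optional[str] = None       # optional user-defined note


def latest_successors(node, markers):
    """Follow markers forward until reaching changesets that were not rewritten."""
    by_predecessor = {}
    for marker in markers:
        by_predecessor.setdefault(marker.predecessor, []).append(marker)

    found, seen, todo = set(), set(), [node]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        rewrites = by_predecessor.get(current)
        if not rewrites:
            found.add(current)       # nothing rewrote it: it is a latest successor
            continue
        for marker in rewrites:
            todo.extend(marker.successors)
    return sorted(found)


markers = [
    Marker("a1", ("a2",), frozenset({"description"}), "amend"),
    Marker("a2", ("a3",), frozenset({"parent"}), "rebase", note="moved onto default"),
]
tips = latest_successors("a1", markers)
```

With the two markers above, a1 was amended into a2, which was then rebased into a3, so the latest successor of a1 is a3. The effects, operation and note fields are what lets a tool explain each step to the user instead of only knowing that a rewrite happened.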

This area is in very good shape. There are multiple possible improvements, but overall we have the features we need to provide a reasonable experience. A good share of them has been implemented in Core already; the rest needs to be upstreamed. See the "What next?" section for possible future improvements in this area.

History Rewriting Commands

Smooth distributed history rewriting requires a good set of commands to rewrite history. They need to be easy to use and to record the user's intent well so that we can leverage it.

Changeset evolution allows more flexible history editing but also relies on the user using the right command for the right operation to build a useful evolution history. A quality evolution history provides the user with the best experience. To achieve this goal we need a clear, compact and powerful set of commands to edit history. Progress was made on existing commands and new commands:

  • hg fold got flags to clarify its two modes, --from and --exact (we are not entirely happy with the result yet),
  • hg uncommit is now available as hg amend --extract. This reduces the number of commands and closes the debate about the uncommit name,
  • hg amend gains a new --patch flag to directly edit a changeset's diff,
  • The hg grab alias got upgraded to a full hg pick command (with proper --abort/--continue support). The command is a mix of graft and rebase. The command can be used to reorganize a stack of changesets.

An important milestone is the introduction of a rewind command. It is a command to restore a stack of changesets to a predecessor state. The command is at an early stage and will need some time to mature; however, it is already useful and part of some people's main workflow.

Implementing the rewind command was part of our exploration of the evolution problem space. And, indeed, we gathered very important insights. The hg rewind command can automatically find and restore predecessors of other changesets. To do so, it walks the evolution graph, but it does so in the opposite direction to the hg evolve command. It turns out that, in the same way we had to store special information for split, we will need extra information to properly differentiate folds from divergence resolutions. We currently do not have this information.

This is a good example of a data requirement we had to discover early in order to shape a functional changeset evolution in the end.
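One possible way to picture the two walking directions, and why a fold is hard to tell apart from a divergence resolution, is the toy Python sketch below. Markers are modeled as plain (predecessor, successors) pairs, and direct_successors and direct_predecessors are hypothetical helpers; the scenario is our own simplified reading of the problem, not the actual rewind implementation.

```python
def direct_successors(node, markers):
    """Forward walk (the direction hg evolve follows)."""
    return sorted(s for pred, succs in markers if pred == node for s in succs)


def direct_predecessors(node, markers):
    """Backward walk (the direction hg rewind follows)."""
    return sorted(pred for pred, succs in markers if node in succs)


# Fold: two distinct changesets A and B are combined into C.
fold = [("A", ("C",)), ("B", ("C",))]

# Divergence resolution: X was rewritten twice (X1 and X2), then the two
# competing versions were reconciled into R.
divergence = [
    ("X", ("X1",)),
    ("X", ("X2",)),
    ("X1", ("R",)),
    ("X2", ("R",)),
]

fold_shape = direct_predecessors("C", fold)
divergence_shape = direct_predecessors("R", divergence)
```

Walking backward from C or from R gives the same local picture: one changeset with two direct predecessors. Yet the intent differs. In the fold case the predecessors are two separate changesets, while in the other case they are two versions of the same one. Under this toy encoding, nothing in the markers around C or R records which situation occurred.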

Obsolescence-markers exchange

Changeset Evolution's core purpose is to unlock history editing in a distributed setting. To achieve this goal, the exchange of obsolescence information between repositories is critical.

The question of which obsolescence markers should be exchanged during push and pull was solved a while ago. Further testing in diverse environment setups has confirmed this logic is correct.

However, at the start of 2017, an important issue remained: how to discover which markers are missing on the other side? Without an efficient way to detect them, we could not provide an efficient synchronization of the obsolescence data.

Fortunately, we developed an algorithm and protocol that can perform an efficient discovery for obsolescence markers. In our daily usage on the Mercurial repository, this saves us the extra minute we were spending on obsmarkers discovery using the previous method.

The data structure used by this algorithm scales well with repository size, an important point that qualifies this solution for all usages.

This new method got tested in various settings and it will be used by default in the next version of the Evolve extension.

In addition, one data structure we developed for this algorithm, "stablerange", will likely be useful to help with exchanges and caching of other data. It could be used for pull bundle caching decisions.

The main goal of the current implementation was to validate the approach and the scaling property of our algorithm. Its performance and cache storage implementation are not great and will have to be reworked when upstreamed.
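To give an intuition of why hashing ranges helps discovery, here is a toy Python sketch. It makes strong simplifying assumptions: both sides share the same linear list of revisions (real history is a graph, which is what "stablerange" deals with), and both marker sets are available in one process, whereas a real exchange only sends hashes over the network. The names range_hash and differing_revs are hypothetical and this is not the obshashrange protocol itself.

```python
import hashlib


def range_hash(markers_by_rev, start, stop):
    """Hash every marker attached to revisions in [start, stop)."""
    digest = hashlib.sha1()
    for rev in range(start, stop):
        for marker in sorted(markers_by_rev.get(rev, ())):
            digest.update(f"{rev}:{marker}\n".encode())
    return digest.hexdigest()


def differing_revs(local, remote, start, stop):
    """Find revisions whose markers differ, comparing hashes of ranges.

    Matching ranges are skipped in a single comparison; mismatching
    ranges are split in two and examined recursively.
    """
    if range_hash(local, start, stop) == range_hash(remote, start, stop):
        return []
    if stop - start == 1:
        return [start]
    middle = (start + stop) // 2
    return (differing_revs(local, remote, start, middle)
            + differing_revs(local, remote, middle, stop))


local = {3: {"m1"}, 10: {"m2"}}   # we know one more marker, on revision 10
remote = {3: {"m1"}}
missing = differing_revs(local, remote, 0, 16)
```

In this example only the ranges containing revision 10 need to be split, so the difference is located with a handful of hash comparisons instead of comparing all markers one by one.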

Workflow and stack

Something important connected to evolution is the clear definition of the group of changesets related to the current user work. Such clarity is a great help to provide good behavior and information to the users regarding possible instability in their current work. We made much progress in this area in the past couple of years.

In particular:

  • Automatic instability resolution, using hg evolve, restricts itself to the current topic by default. This avoids selecting unrelated changesets during stabilization, something that has been confusing users.
  • The next and prev commands are also restricting themselves to the current topic, making their movement more predictable and useful.
  • The hg stack command allows for a quick view of the work in progress, including listing changesets in a semantic order, the same as if they were all stabilized.

This work also offers a simple, yet powerful, workflow for feature branching in Mercurial. Having topics tightly linked to phases makes them a good tool to enforce a healthy phase movement practice in a project. Since Changeset evolution is also tightly coupled with phases, healthy phase movements are important here. Here is the progress made:

  • New server repository mode: changesets with a topic stay draft on push, others get published. This is especially useful as it bridges the gap between publishing and non-publishing repositories. Using this mode by default will preserve backward compatibility with current Mercurial behavior regarding phase movement while allowing the user to opt in from the client side for exchanging drafts when they want to.

  • New push flag --publish: to publish the selected changesets on push.
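The phase rule of this server mode, combined with the --publish flag, can be summarized with a short Python sketch. The phases_after_push function and its (node, topic) input format are hypothetical, used only to illustrate the rule stated above; they do not correspond to Mercurial's internal API.

```python
def phases_after_push(changesets, publish=False):
    """Return the phase each pushed changeset ends up in on the server.

    changesets: list of (node, topic) pairs, where topic is None for a
    changeset without topic. publish models a push with --publish
    (simplified here to apply to every pushed changeset).
    """
    phases = {}
    for node, topic in changesets:
        if topic is not None and not publish:
            phases[node] = "draft"    # topic changesets stay draft
        else:
            phases[node] = "public"   # bare changesets get published
    return phases


pushed = [("a1b2", "new-feature"), ("c3d4", None)]
default_push = phases_after_push(pushed)
publishing_push = phases_after_push(pushed, publish=True)
```

With a plain push, the changeset carrying the new-feature topic stays draft while the other one becomes public. The same push with the publish option turns both public.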

There have been various improvements on topic usability. Notably:

  • clarification of topic activation, state, and movement on push/pull and publish,
  • ability to force new draft commit to have a topic (or automatically assign a random one),
  • improved topic discoverability with hg topic --age,
  • introduction of an s0 label referring to the parent of the topic root
    • update to s0 will keep the topic active, making insertion of new changeset at the start of the topic simpler
    • hg prev can go down to s0

We are getting good feedback from people using this workflow. We would like to upstream all these improvements. Some of them are not really attached to topic (eg: hg push --publish) but we feel that topic is overall a good branching solution for Mercurial and would like to see more of it upstream.

Instability Management

This is a critical part of Evolution and we made much important progress in this area. Exchanging draft changesets can bring "instabilities" into one's repository. We can use various strategies to reduce the odds of it happening; however, given the distributed nature of Mercurial, we cannot guarantee it won't.

Because instability can happen, we need a good automatic resolution of it. We are now in pretty good shape. The hg evolve command can now keep track of multi-step operations, unlocking important features:

  • Good handling of evolution interrupted by merge conflicts: the --abort, --continue and --stop flags work as expected (for both hg evolve and hg next),
  • Automatic stabilization of situations involving an orphan merge,
  • Automatic stabilization of phase-divergence in most complex cases,
  • Automatic stabilization of content-divergence in most complex cases.

So we now have all the instability types well under control. There are still some corner cases involving split, fold or merge that remain to be properly handled. However, the core tool to handle them now exists.

With all these improvements to hg evolve and hg next, we will be able to update the default behavior of these two commands to something final, and upstream them (more about this in the next section).

What next?

This section focuses on concrete actions to bring Changeset Evolution to completion. It is intended for people with some knowledge of Mercurial internals.

Summary

The most important thing to clear up right now is the need for better tracking of folds. Fixing it properly might eventually affect disk storage and various algorithms. Getting this out of the way as soon as possible is important.

In the same move, we need to mature the hg rewind command. Easy undo is a critical item if we are to hand changeset evolution to all users.

In parallel, there are multiple areas in the Evolve extension that are ready to be cleaned up and upstreamed (obsmarkers discovery, cache, internal rewriting toolkit, …).

We should also resume the upstreaming of the improvements regarding stack management and publishing workflow contained in the topic extension. At first, this area might seem unrelated to enabling changeset evolution in Core. In practice, the clarification of working set and publishing workflows greatly reduces complexity around local and distributed work on draft changesets. So time spent improving these areas is well balanced by the time we do not have to spend solving more complex user experience issues elsewhere.

Finally, there are a couple of areas that are mostly done but need focus to be wrapped up. This is mainly about the history rewriting commands and automatic instability resolution. In these cases, there are small actions to take before freezing the user experience in Core. Of course, more improvements will be made to them once in Core. However, such improvements do not seem necessary to enable evolution by default.

Internals

Fold tracking

We need better tracking of fold operations. This is necessary to provide a fully functional hg rewind command. This command is important because users need a simple way to undo mistakes.

We need to update the marker creation API and to store this information. Such a change might impact the on-disk storage format and more. The way we currently store splits has a couple of quirks (eg: when only some of the split successors are exchanged). The current ideas for storing fold data could also be applied to split, handling these cases better than with the current split encoding. Tracking splits and folds is quite an interesting issue; we'll get back to it in an independent blog post.

These possible low-level changes make the tracking of folds a priority target. Upgrading the on-disk storage of existing users is never simple and it might also affect multiple algorithms that will have to cope with a different split encoding.

History rewriting toolkits

To power its history rewriting commands and its UI, Evolve has a full toolkit for building history rewriting commands. This toolkit is ready to be upstreamed and would be a good first step toward upstreaming the commands, free of any question related to user experience.

Evolution history

All the current work done around better evolution history is either already upstream, or ready to be upstreamed. We should keep that effort going.

There is one extra feature that users have been requesting: a clear transaction log to keep track of exactly what happened in each past operation. This would be especially useful for push and pull, as they may introduce many changes from other people into a repository. This would fit the journal extension, making it track a wider range of data. This is not a blocker, but it would be nice to have.

Commands

The Evolve extension provides multiple commands. Most of them are close to being "done" enough for our goal of having Evolve enabled by default in Core.

Rewind command

The one command that needs serious attention to mature is the rewind command. Users can "rewind" the evolution of a stack of changesets using this command. The first version of this command revealed that we needed to record more data about changeset folding. Once this data is available, we should have all the pieces to build a final user experience for this command. Having this command is very important for the Changeset Evolution experience. To trust the tool, users need to be confident they can undo their mistakes.

History rewriting command

Over the years, the history rewriting commands in the Evolve extension evolved into a pretty solid base. As mentioned in a previous subsection, the internal toolkit powering these functions could be upstreamed now. The commands themselves need various adjustments to ensure consistency and settle some long-standing questions (those we'll cover in the second part of this blog post). All these adjustments are small and should be easy to perform. That last round of polish is best done in the Evolve extension, to give it the widest possible testing before we settle on it.

Here are examples of small adjustments we need to make:

  • hg amend --extract needs to expose a flag similar to hg uncommit --rev before we can fully drop the older command,
  • Currently, hg fold requires one of --from or --exact. We should pick one to be the default. --exact has a few more supporters among Mercurial developers, but it also requires the user to learn some revsets, which is a barrier. If we have access to the shorter in-stack references from topic (s1, s2, etc), the hg fold s1+s2 form might be good enough, provided we give a proper example of how to use it in the documentation.
  • The hg split command is inconsistent with other similar commands (amend, revert, uncommit, etc). It does not accept filenames and is interactive by default. We should align its behavior with the other commands.

Of course, this command-set will keep getting larger improvements. For example, ideas have been floating around about making it easier to control changeset order in the stack via new command flags. However, those improvements are not on the critical path to enable changeset evolution in Core, so we don't plan to spend time on them until then.

Automatic evolution

There has been a long-standing debate: should we automatically evolve descendants of rewritten changesets? Automatically evolving orphans provides a smoother experience to users in many cases but comes with multiple drawbacks:

  • It can trigger a merge earlier than the user wants, forcing them to switch context away from the changeset they were currently rewriting.
  • When rewriting all changesets in a stack, automatically evolving all descendants means the creation of O(len(stack)²) changesets. N² complexity scales very badly, and this can have a dire impact on the user's repository and overall experience. So we tend to steer users toward gradual evolution instead. This issue is referred to as "obs markers explosion".

However, there has been a large demand for automatic evolution, at least in some cases. Some commands can guarantee they won't introduce conflicts, lifting the related concern. We could even consider extending that logic to all commands as long as no conflict is detected.

The N² obsmarkers explosion is trickier because all commands could possibly trigger it. Math does not really make compromises: N² is unsustainable for most values of N. However, we still have room for an improved experience. We could imagine allowing auto evolution if the number of descendants is small (maybe 4 or 5), or try to detect repeated rewrites in the stack. Once we detect a problematic pattern or stack size, we could skip auto evolution and redirect the user toward a helpful documentation page.
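To make the quadratic growth concrete, here is a minimal Python sketch. It relies on a simplified model that is an assumption of this example, not the extension's actual accounting: each changeset of the stack is rewritten once, from bottom to top, and every rewrite creates one new changeset plus one per automatically evolved descendant. The function names and the default limit of 5 are hypothetical.

```python
def created_with_auto_evolve(stack_size: int) -> int:
    """Changesets created when each rewrite also evolves all descendants.

    Assumed model: rewriting the changeset at position i (1 = bottom)
    creates 1 new changeset, plus (stack_size - i) evolved descendants.
    """
    return sum(1 + (stack_size - position)
               for position in range(1, stack_size + 1))


def created_with_gradual_evolve(stack_size: int) -> int:
    """Changesets created when only the next changeset is evolved.

    Assumed model: every changeset is rewritten once, and every changeset
    but the bottom one is evolved once onto its rewritten parent.
    """
    return stack_size + max(stack_size - 1, 0)


def should_auto_evolve(descendant_count: int, limit: int = 5) -> bool:
    """Hypothetical guard: only auto-evolve a small number of descendants."""
    return descendant_count <= limit


for size in (5, 20, 100):
    print(size, created_with_gradual_evolve(size), created_with_auto_evolve(size))
```

Under this model the automatic variant grows as N(N+1)/2 while the gradual one stays linear, which is why a small descendant limit looks like a reasonable compromise.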

The hg stack command provided by the topic extension gives a clear view of the current work in progress, even when orphans are involved. It is a good tool to reduce the user's urge to run hg evolve --all prematurely.

Instability prevention

A good way to avoid exposing users to the complexity inherent to instability is to avoid creating it in the first place. Core Mercurial already has a mechanism that history rewriting commands can use to check if their planned rewrite is valid. However, we need to make sure it is used by all history rewriting operations and that it catches all the instability types we want to prevent.

Obsmarker Discovery

All the logic we use for obsmarkers discovery is sound and ready to be upstreamed. However, the code will have to be rewritten as it gets in. The current implementation evolved from a multi-step experiment; it uses too many complicated indirections and is not very efficient in general. Some of the cache storage we use is also problematic and will have to be re-implemented.

In practice, most of the data we cache is not volatile. It is an inherent property of each changeset, so we could imagine storing these values directly in a "changelog v3" format instead of keeping an independent cache structure.

I expect the discovery to keep being improved over time. However, these improvements can come later and are not on the critical path for enabling evolution in Core.

The work needed to upstream this is well defined but significant. Finding other use cases for the stablerange data might get more people interested in making it happen.

The caches used for obsmarkers discovery share a lot of code with other performance-related caches that should move upstream too.

Stack and workflow

The topic extension provides various features that contribute to a smooth Changeset Evolution experience. While not strictly necessary, these features simplify the task of making changeset evolution accessible to all users.

Stack definition

One of the core features of topic is the clear definition of the current working set: the stack. As each changeset explicitly carries the topic information, there is no room for ambiguity. Related changesets that a user is actively working on are not always linear. Either user actions or the instability brought in by distributed rewriting can spread them over multiple topological branches. The clear stack definition from topic handles these situations, making it an ideal candidate.

This stack definition simplifies operations around evolution. For example, restricting most operations to the current stack makes things much more predictable: hg prev and hg next ignore other unrelated changesets; hg evolve only selects items in the stack. Having a predictable outcome for these commands is important for users to trust them. Without a clear stack boundary, the behavior of these commands becomes either more limited or more complicated to explain.

The stack defines a limited number of changesets relevant to the current situation. A small number allows for a better UI. For example, we can provide an hg stack command that displays orphan changesets as if they were already evolved. This gives users a preview of the final structure of their stack, even if some of it is still orphan. This is a powerful tool to make changeset evolution accessible to all kinds of users.

The limited number of changesets makes it possible to bring back incremental numbers to refer to a changeset. With topic, the first item in the stack is nicknamed "s1", the second "s2". This is very useful to refer to a changeset without having to copy and paste obscure hashes around. The numbering is also preserved across rewrites, making the aliases useful for a longer time. Changeset Evolution empowers commit-centric workflows, and the "s#" aliases make it easier to reference individual commits.

In practice, there are other ways to define resilient stacks that provide these benefits. For example, a stack definition based on phases and named branches would do. Named branches are a strong fit for long-lived branches, but they have life cycle limitations. Topic feels like a better solution for feature branching.
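As a small illustration of why such aliases survive rewrites, here is a Python sketch. It assumes the alias is purely positional (the index of the changeset in the stack, bottom first); the helper name and the short hashes are made up for the example and are not taken from the topic extension.

```python
def stack_aliases(stack):
    """Map "s1", "s2", ... to changeset ids, bottom of the stack first."""
    return {f"s{index}": node for index, node in enumerate(stack, start=1)}


# A three-changeset stack, identified by hypothetical short hashes.
before = ["a1f3", "b27c", "c9e0"]

# After amending the second changeset and evolving the third, both get a
# new hash, but their position in the stack is unchanged.
after = ["a1f3", "d45b", "e8a2"]

print(stack_aliases(before))
print(stack_aliases(after))
```

The hashes behind "s2" and "s3" change, yet the aliases themselves keep pointing at the same logical step of the work.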

Exchange and Publishing workflow

Another interesting aspect of topics is how their life cycle is tied to phases.

Changeset Evolution can be used on the non-public part of history, and for the sake of sanity, this part should stay fairly limited. Nobody wants to see a six-month-old changeset rewritten, creating tons of orphan changesets in the process. A core property of topics is that they fade away when they are published. This means that accepting a topic into the main branch requires publishing it. Having a workflow step that explicitly involves publishing makes sure the set of non-public changesets remains reasonable.

Topic also solves another pain point regarding phases. The main purpose of Changeset Evolution is to unlock distributed collaboration on draft changesets. However, the current default for any Mercurial server is to publish changesets on push … so much for exchanging drafts. It is possible to configure the server to no longer publish on push, but this has multiple drawbacks. First, this is a server-side config, usually more complicated to set up. Second, it means the phase cycle has to be manually thought about and handled for all changesets. Finally, the change affects all users: you can't have a small group of advanced users playing with advanced features without impacting other users. A common workaround is to have two repositories, a publishing one and a non-publishing one. This makes things more complicated than we would like.

Because topic is a new concept that people have to opt in to, we have more flexibility there. We could have a new default mode for servers where changesets are published on push as usual unless they have a topic. This way, users can opt in to draft changeset exchange without server-side configuration and without impacting the other users. And since exchanging drafts requires them to have a topic, they won't interfere with the usual branch resolution of users not interested in topics. This offers a smooth path to draft changeset exchange without breaking backward compatibility. As a bonus, that scheme forces draft changesets to explicitly belong to a feature branch.
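The rule described above can be written down as a small decision function. This is only a sketch of the idea in Python: the mode names ("publishing", "non-publishing", "topic-aware"), the Changeset class and the force_publish parameter are hypothetical labels chosen for the example, not Mercurial's or topic's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Changeset:
    node: str
    topic: Optional[str] = None  # None means the changeset has no topic


def phase_after_push(changeset: Changeset, server_mode: str,
                     force_publish: bool = False) -> str:
    """Return the phase a pushed changeset ends up in on the server.

    server_mode is one of three hypothetical labels:
      "publishing":     everything pushed becomes public (current default),
      "non-publishing": everything pushed stays draft,
      "topic-aware":    only changesets without a topic become public.
    force_publish models an explicit request to publish on push.
    """
    if force_publish or server_mode == "publishing":
        return "public"
    if server_mode == "non-publishing":
        return "draft"
    if server_mode == "topic-aware":
        return "draft" if changeset.topic else "public"
    raise ValueError(f"unknown server mode: {server_mode}")


print(phase_after_push(Changeset("a1f3"), "topic-aware"))
print(phase_after_push(Changeset("b27c", topic="my-feature"), "topic-aware"))
```

Users who never set a topic see the same behavior as with a publishing server, which is what keeps the scheme backward compatible.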

To deal with the phase cycle, the topic extension provides various options:

  • A new hg push --publish flag that makes a push publishing even when pushing to a non-publishing server,
  • A config option to have the server behave as described above (non-publishing for topic only).

Topics also come with small workflow improvements, like an intuitive rebase destination, making them easier to use. One of the key improvements is the ability to push a new topic to a server without --force, removing the cumbersome need to use this dangerous flag.

Upstreaming

The concepts explained above are in good overall shape, and we got good feedback on them. So, what would it take to add support for all these stack and workflow related concepts upstream?

Some of them are clear and good workflow improvements not strongly related to topic:

  • The hg push --publish option proved very useful,
  • The experimental config option to limit a repository to 1 head per name should learn about closed heads and graduate from experimental.

Some of the "stack" related logic is also independent of topic and could be first implemented around named branches:

  • "s#" aliases for changesets in a stack,
  • Features from hg stack (eg: evolution aware order),
  • Constraints on the range of hg evolve, hg next and hg prev.

Finally, there are interesting pieces directly related to topic:

  • The topic manipulation commands and lifecycle,
  • The new publishing mode for servers,
  • The ability to push new topic without --force.

General performance and cleanup work will be needed. However, implementing these specific features directly in Core will be significantly simpler than doing so from the extension.

Instability Management

Commands changes and upstreaming

There are some incoming changes to hg next: the clear definition of a working set to move within and the improved recovery when conflicts occur during evolution mean we no longer need to hide hg next triggered evolution behind a --evolve flag. It should become the default.

And a related change to hg evolve: step-by-step evolution is an important feature. However, since hg next will be able to fill this role, it makes sense for hg evolve to evolve all changesets in the current stack by default. In addition, matching the rebase behavior of preserving the working copy parent seems in order.

These changes will happen in the Evolve extension. However, after they happen, both hg evolve and hg next should be ready to be upstreamed provided we have access to a clear definition of the "current working set" of revisions (as topic provides).

We already have a couple of ways to trigger automatic instability resolution (eg: next, evolve, …). However, there could be another good vector to expose it to the user: hg update. The command already deals with working copy changes and merge conflicts. Adding an hg update --evolve flag could make sense and offer a simple user experience for some of the simpler cases. Lowering the barrier to entry is always useful.

Instability resolution

Even if hg evolve can now handle the majority of instability cases, some remain to be handled. The unhandled cases usually involve a mix of phase-divergence and content-divergence, or some merges, splits or folds. We need to hunt down and handle these remaining cases.

It is probably simpler to tackle the last corner cases of automatic stabilization from the Evolve extension, so that we can have a wider set of users test them more quickly. Besides that, the whole logic is in good shape and ready to be upstreamed. We can probably start to upstream some of the core bits sooner (eg: upstreaming hg next would make a good excuse to upstream the orphan resolution). An alternative approach would be to upstream the current instability resolution logic in a way that lets the Evolve extension monkey patch fixes for people using an older version. This might be more work overall.

User Documentation

Inline documentation and tutorials have been written alongside evolve development. However, most of them do not cover the latest commands and workflows. Help is welcome to refresh them.

Conclusion

I'm happy with the progress made in the past years. Like many, I wish we could have done more, faster, but I'm very excited to now have a clear view of project completion. By project completion, I mean Mercurial Core containing a good enough subset of Changeset Evolution that it can be enabled by default.

At Octobus, we do our best to bring this to completion and to mobilize people and resources to reach this goal. Next, our own effort will focus on getting the foundational concepts into Core: the rewind command, automated resolution of instabilities and efficient obsmarkers exchange. In parallel, we'll take care of the supporting concepts necessary to safely enable Changeset Evolution for all users (obs-history viewing, supporting commands, stack and workflow, …).

Of course, this effort is not just performed by people at Octobus or people working closely with us. Other Mercurial contributors are also working on their own to complete this project. What exactly they will do and in what order is something for them to decide.

All sorts of help are welcome. For multiple years now, we have helped the project move forward in many ways: through direct contributions of course, but also by training new people to contribute to the concept, and finally by efficiently gathering and spending money, reaching out, funding, and steering other members of our open source community. Reach out to us if you want to contribute time or money to see Changeset Evolution enabled by default in Mercurial.

Discussions around the Changeset Evolution concept usually happen on the #hg-evolve IRC channel on freenode. If you are using the Evolve extension, do not forget to subscribe to the user list, Evolve-testers@mercurial-scm.org.

New Evolve extension release: version 8.1.2

Pierre-Yves David
2018-08-28

We pushed a new bugfix release for Mercurial's evolve extension: 8.1.2

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI, and upgrade is recommended.

This release fixes concurrency robustness issues around the sqlite based storage used by the experimental version of the obshashrange discovery caches.

Now that we have more experience running Python sqlite in highly concurrent situations, it does not seem to handle the job too well. Since it is only used for caching, the problematic cases can be silently ignored when they occur.

Thanks to the people who have been actively testing the obshashrange discovery. This helps detect these situations.

In addition, this release invalidates previous versions of the final cache layer, which was affected by issues fixed in 8.1.1. This ensures everybody has valid cache content without requiring manual intervention (besides the update). This cache will be recomputed once (and is not the slowest to compute).

Version changelog

evolve (8.1.2)

Bug fixes

  • obshashrange: improved robustness of the cache under heavy load
  • obshashrange: force recomputation of the final obshash related cache (to make sure people benefit from the 8.1.1 fixes)

New Evolve extension release: version 8.1.1

Pierre-Yves David
2018-08-21

We pushed a new bugfix release for Mercurial's evolve extension: 8.1.1

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI, and upgrade is recommended.

This release contains fixes for hg next, the obshashrange cache and other areas.

The obshashrange protocol is used to discover missing markers during exchanges. It might get enabled by default in the next version of the evolve extension. We recommend trying it now by adding the following to your config:

[experimental]
obshashrange = yes

If you were already using it, we recommend clearing your caches after upgrading to this release, as their consistency might have been impacted by (now fixed) bugs.

rm .hg/cache/evoext*

Version changelog

evolve (8.1.1)

Bug fixes

  • evolve: properly set second parent during conflict (issue5927)
  • next: delete the evolvestate after aborting interrupted next --evolve
  • next: fix topic restriction when passing --evolve
  • prune: improve documentation
  • clone: fix possible crash when using clone bundle and forcing cache warming
  • obshashrange: fix speed and consistency issues during cache invalidation
  • obshashrange: properly persist all caches involved in obshashrange discovery

New Evolve extension release: version 8.1.0

Pierre-Yves David
2018-08-04

We pushed a new release for Mercurial's evolve extension: 8.1.0

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI, and upgrade is recommended.

This release contains many improvements to the hg evolve command. It now handles content-divergence and phase-divergence much better. In addition, a first version of the hg rewind command has been pushed. This command lets you restore a stack of changesets to a previous state. See the changelog for other changes.

This release supports Mercurial versions from 4.3 to the latest, 4.7, tagged a couple of days ago. Support for Mercurial 4.3 might be dropped in the next version.

Version changelog

evolve (8.1.0)

New features and improvements

  • evolve: improve multiple aspects of content-divergence automatic resolution
    • branch changes handling,
    • parent changes handling,
    • description changes handling,
    • divergent stack handling,
    • improved resume, stop and abort of divergent resolution
  • evolve: improve orphan resolution when combined with divergence (issue5946)
  • evolve: improved automatic resolution of phase-divergence
  • rewind: first limited version of the rewind command to restore a stack of commits to a
        precursor state (check command help for details and limitations)
  • evolve: add --update and --no-update flags to evolve. They control the final
        working copy parent
  • obslog: new --filternonlocal flag to hide changes unknown locally
  • evolve: new help section dedicated to resuming operations interrupted by
        merge conflict, `hg help evolve.interrupted`.
  • evolve: show unfinished state information in hg status -v (issue5886)

Bug fixes

  • evolve: move bookmarks also when updating to successors (issue5923)
  • amend: abort --patch by saving an empty file (issue5925)

Compatibility changes

  • compatibility with mercurial 4.7

topic (0.10.0)

  • display a hint when a topic becomes empty
  • compatibility with mercurial 4.7

Performance benchmarking 101

2018-07-30

We all want top-quality software without any bugs. But if there is something as frustrating as buggy software, it's slow software. Who has never raged against software that seemed frozen? Who has never reloaded a browser tab only to see the interesting content make a brief appearance before the page reloads?

So how to fix software slowness? Why is it hard to improve? Because before fixing a problem, we need to measure it. And measuring performance is a complex task by itself.

At Octobus, a lot of our work focuses on Mercurial, a command-line tool. You invoke it dozens or thousands of times per day, creating, rewriting and exchanging commits. The two facets that are interesting for us are the time commands take to finish (independently of human interaction time) and the benchmarking of lower-level functions.

Measuring performance

Measuring performance is hard. Why is it hard? Because it's not a single boolean value (fast or not), it's a continuous value heavily influenced by the environment. Here is a non-exhaustive list of things that can impact performance measurement:

  • Other running processes.
  • Hardware (CPU/Disk/Memory) performance.
  • CRON jobs.
  • AV scans.
  • CPU context switching.
  • Temperature throttling.
  • OS tuning.
  • Network traffic.
  • Cosmic radiations.
  • Noise (no kidding: https://www.youtube.com/watch?v=tDacjrSCeq4).
  • etc...

The "solution" is to try reducing the impact of these variables on the measurement. The other "solution" is to run the performance test not once but several times, potentially hundreds or thousands of times. This will give us a better representation of the performance of what we test.
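A minimal Python sketch of the "run it many times" approach, using only the standard library. The measure helper, its repeat parameter and the summed range used as a workload are illustrative choices for this example, not code from our benchmark suite.

```python
import statistics
import time


def measure(func, repeat=100):
    """Run func `repeat` times and summarize the wall-clock timings."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return {
        "runs": repeat,
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


def workload():
    # Stand-in for the operation under test.
    sum(range(10_000))


result = measure(workload, repeat=50)
print("median: {median:.6f}s (min {min:.6f}s, max {max:.6f}s)".format(**result))
```

Keeping the minimum and maximum next to the median gives a rough idea of how noisy a given run was.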

One more thing to consider: there is little value in comparing values obtained on different machines. What we can and will do is measure the performance of a single action on the same machine before and after applying a patch to see its impact on performance.

Performance tools

Luckily for us, performance tools already exist. But what are the required performance tools features?

  • Track a remote repository and associate measurement with each commit.
  • Build the environment for the selected commit.
  • Run the performance test suite for the built commit.
  • Save the results. As we run performance tests several times, it's useful to store more than 1 value (the median of the results for example) in order to be able to quantify the quality of the run.
  • Generate a report, ideally in HTML.

For our use-case, we chose ASV (https://github.com/airspeed-velocity/asv) which stands for airspeed velocity. The documentation can be found here: https://asv.readthedocs.io/en/stable/. Our repository with the config and the performance tests is here: https://bitbucket.org/octobus/bighgperf.

Why ASV

ASV handles a lot for us:

  • It can track a Mercurial repository, which suits us well as we are benchmarking Mercurial itself.
  • The building part is already tailored for Python projects.
  • It runs the whole test suite or only a subset of tests for the built commit.
  • It stores the results in JSON files, one directory per machine.
  • It creates an easy-to-host HTML report.
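For readers who have never seen one, here is roughly what an ASV benchmark looks like. ASV collects methods whose name starts with time_ as timing benchmarks and calls setup before them. The class name and the workload below (sorting a list of integers) are made up for illustration and are not taken from our bighgperf repository.

```python
class RevisionSortSuite:
    """A tiny benchmark suite following ASV naming conventions."""

    def setup(self):
        # Called by ASV before the benchmark; not included in the timing.
        self.revisions = list(range(100_000, 0, -1))

    def time_sort_revisions(self):
        # The body of a ``time_`` method is what ASV measures.
        sorted(self.revisions)


# The suite is plain Python, so it can also be exercised by hand:
suite = RevisionSortSuite()
suite.setup()
suite.time_sort_revisions()
```

A real Mercurial benchmark would replace the sorting with the operation under test, for example a command run against a reference repository.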

The bonus features of ASV are nice:

  • ASV allows passing a complex set of commits to test (for example all tagged commits on a specific branch).
  • It can also take only a subset of the passed commits: you can ask it to take only 10 commits from the whole repository history. Very useful to quickly get a glance at the performance evolution of a project or release cycle.
  • Auto-detection of regressions between commits or ranges of commits. For example, operation A is 1.5x slower between commit N and N+1, or operation B is 1.8x slower between tag X and X+1.
  • Automatic bisection of regressions. In case of a regression in a range of commits, ASV has the tooling to bisect which commit introduced the regression, which is very useful to pinpoint the responsible commit.
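The last two features can be illustrated with a short Python sketch. This is not ASV's actual algorithm, only a naive version of the idea: flag a regression when a commit is at least `factor` times slower than the previous one, and bisect a range of commits to find the first slow one. The function names, the 1.5 threshold and the sample timings are assumptions made for the example.

```python
def find_regressions(timings, factor=1.5):
    """Return (previous, current, ratio) for each slowdown >= factor.

    `timings` is a list of (commit, seconds) pairs in history order,
    with strictly positive durations.
    """
    regressions = []
    for (prev_commit, prev_time), (commit, cur_time) in zip(timings, timings[1:]):
        ratio = cur_time / prev_time
        if ratio >= factor:
            regressions.append((prev_commit, commit, ratio))
    return regressions


def bisect_regression(commits, measure, baseline, factor=1.5):
    """Find the first slow commit, assuming the first commit of the range
    is fast, the last one is slow, and the slowdown persists once it
    has been introduced."""
    low, high = 0, len(commits) - 1  # low: known fast, high: known slow
    while high - low > 1:
        mid = (low + high) // 2
        if measure(commits[mid]) >= baseline * factor:
            high = mid
        else:
            low = mid
    return commits[high]


history = [("a", 1.00), ("b", 1.02), ("c", 1.60), ("d", 1.58)]
print(find_regressions(history))

fake_times = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.7, "e": 1.7}
print(bisect_regression(list(fake_times), fake_times.get, baseline=1.0))
```

Bisection only needs a logarithmic number of measurements, which matters when each measurement means building and benchmarking a commit.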

In the next blog posts about ASV we will show you how to get started with ASV to start benchmarking your code.

Further reading

We made a presentation, in French, on the subject: https://octobus.net/presentations/perf_test.html.

Our friend Victor Stinner, a Python Core-Developer, also writes very good blog posts on the subject: https://vstinner.readthedocs.io/benchmark.html.

New Evolve extension release: version 8.0.1

Pierre-Yves David
2018-06-11

We pushed a new release for Mercurial's evolve extension: 8.0.1

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI, and upgrade is recommended.

Version changelog

evolve (8.0.1)

  • compatibility with mercurial 4.6.1,
  • next-prev: respect commands.update.check config option (issue5808),
  • next-prev: fix evolve --abort on conflicts (issue5897),
  • obslog: fix breakage when commit has no description,
  • amend: use context manager for locks (issue5887),
  • evolve: fix detection of interactive shell.

topic (0.9.1)

  • topic: fix documentation formatting.

Mercurial Mini-Sprint - May 2018

Pierre-Yves David
2018-05-28

From Monday, May 21st to Friday, May 25th, around 12 people gathered in the Logilab office to discuss version control and hack on Mercurial. A big thanks to Logilab for hosting us. We covered multiple topics:

Mercurial 4.6

  • Fixing Mercurial 4.6 compatibility for the following extensions:
    • hgsubversion,
    • hg-git (and preparation of the associated release),
    • confman,
    • preparing a Windows installer for tortoise-hg 4.6

hgweb

  • First version of a Readme extension that displays the latest content of a project README on the hgweb landing page,
  • Various usability improvements for hg serve (automatic browser opening, printing URL, etc),
  • Displaying obsolescence log information in hgweb,
  • Ability to view and exchange hidden changeset through hgweb,
  • Discussion on how to facilitate theme development and maintenance.

Performance tracking

  • Upstreaming of some of our work on ASV,
  • Work on benchmarking clone and stream clone time,
  • Discussion with a Git core developer about similar efforts in the git community.

Evolution and feature branches

  • New flag for obslog to filter out changesets that are unknown locally,
  • Improvement of core ability to do in memory merge and in memory diffing,
  • New message about local phase movement on pull,
  • Improvement of message around empty topics,
  • Implementation of an "internal phase" for changesets that are a byproduct of Mercurial command (eg: evolve, shelve).
  • Discussion about feature branch and workflow.

Mercurial hosting

  • Installation of an experimental heptapod instance (gitlab + Mercurial) for further testing,
  • Mercurial compatibility for the Gitlab CI runners.

Other

  • Experiment with tortoise-hg styling to get a mode closer to what hgview does,
  • Implementing strip-less shelve using the new internal phase,
  • Demo of an experimental test orchestrator atop the Mercurial test runner,
  • Refactoring of context.py to better facilitate in-memory merges.

Thanks to Anton Shestakov, Arthur Lutz, Aurelien Campeas, Boris Feld, Christian Couder, David Douard, Denis Laxalde, Nicolas Spanti, Paul Morelle, Phillipe Pépiot, Pierre-Yves David and Sean Farley for being part of this event. Extra thanks go to Pulkit Goyal for remotely helping with the shelve code.

New Evolve extension release: version 8.0.0

Pierre-Yves David
2018-04-25

We pushed a new release for Mercurial's evolve extension: 8.0.0

This extension extends core features around history rewriting and draft changesets sharing.

As usual, the release is available on PyPI, and upgrade is recommended.

The release drops support for Mercurial 4.1 and 4.2 and adds support for Mercurial 4.6, to be released in a couple of days. Support for some deprecated flags and templates has been dropped too.

Version changelog

evolve (8.0.0)

New features

  • evolve: a new --abort flag which aborts an interrupted evolve resolving orphans,
  • hg evolve now returns 0 if there is nothing to evolve,
  • amend: a new --patch flag to make changes to current changeset by editing patch,

Bug fixes

  • evolve: fixed some memory leak issues,
  • evolve: prevent some crashes with merge and split (issue5833 and issue5832),
  • evolve: improved support for solving phase-divergence situations,
  • evolve: improved support for solving orphan situations,
  • obs-discovery: added units to various progress bars,
  • evolve: record "operation" for commands where it was missing,

Compatibility changes

(for both evolve version 8.0.0 and topic version 0.9.0)

  • compatibility with Mercurial 4.6,
  • drop support for Mercurial 4.1 and 4.2,
  • --obsolete and --old-obsolete flags for hg graft are dropped,
  • templatekw: remove obsfatedata templatekw. Individual fields are available in core as single template functions,
  • topic: restrict names to letters, '-', '_' and '.'

New Evolve extension release: version 7.2.0

Pierre-Yves David
2018-01-16

We pushed a new release of evolve and topic: 7.2.0.

As usual, the release is available on PyPI, and upgrade is recommended.

This version comes with the usual various bug fixes and improvements. It also includes a large rework of the experimental obsolescence markers discovery protocol we call "obshashrange". The newer protocol is faster to compute and cache. It was able to handle all the repositories it has been tested on (up to around 1 million revisions). The experimental protocol is still turned off by default; use experimental.obshashrange=yes to enable it.

Version changelog

evolve (7.2.0)

  • evolve: changes to the on-disk format for interrupted evolve
  • evolve: --continue now properly preserves phase (issue5720)
  • evolve: --continue now properly reports merges as evolve
  • commit: suggest using topic on new heads
  • uncommit: --revert flag added to clean the wdir after uncommit
  • obslog: add color support to content-diff output with --patch
  • fix hg prev behavior on obsolete changesets
  • no longer issues "obsolete working copy" message during no-op
  • use the new instabilities names from mercurial 4.4+ (in hg evolve --list and other messages)

New algorithm for obshashrange discovery:

The new algorithm is faster, simpler to cache and has better complexity. It is able to handle repositories of any size (the naive Python implementation is a bit slow). Support for the previous experimental approach has been dropped, so please update both clients and servers. This new discovery method is disabled by default. Use experimental.obshashrange=yes on both client and server.

topic (0.7.0)

  • fix compatibility with Mercurial-4.3
  • new template keyword topic to get a changeset's topic

New Evolve extension release: version 7.0.1

Pierre-Yves David
2017-11-14

We just pushed a small bug fix release for Evolve: 7.0.1.

As usual, the release is available on pypi and upgrade is recommended.

Version changelog

evolve (7.0.1)

  • obsdiscovery: extend the config option to disable discovery to
              server-side (it was previously only honored on the client
              side),
  • server: avoid exposing 'abort' to evolution enabled client talking
        to server with the extension but obsolescence marker exchange
        disabled.

topic (0.5.1)

  • fix new-heads check when pushing new topic with --publish.

New ConfigExpress extension release: 0.3.0

Pierre-Yves David
2017-11-03

The Config Express extension allows for clients to share more details with the server and the server to push recommended config to the client.

Version 0.3.0 has just been published. It brings various improvements and bug fixes:

  • improvement to the extension documentation,
  • Mercurial-4.4 compatibility,
  • fixes configuration recommendations on Windows,
  • allows clients to send the list of enabled extensions and their versions.

New Evolve extension release: version 7.0.0

Pierre-Yves David
2017-11-02

We just pushed a release for Evolve: 7.0.0.

As usual, the release is available on pypi and upgrade is recommended.

This release does not bring new features to evolve but drops compatibility with older Mercurial versions (4.0 and below) and drops multiple old and deprecated protocols. This allowed us to clean up the code and should help in situations where Evolve is selectively turned on for only some repositories server side.

On the other hand, Topic got multiple new workflow-related options to experiment with.

Version changelog

evolve (7.0.0)

  • drop compatibility with Mercurial 3.8, 3.9 and 4.0,
  • drop support for old and deprecated method to exchange obsmarkers (now requires bundle2 enabled clients),
  • forbid usage of the old pushkey based protocol to exchange obsmarkers,
  • evolve: rename --contentdivergent flag to --content-divergent,
  • evolve: rename --phasedivergent flag to --phase-divergent.

topic (0.5.0)

  • add an experimental flag to enforce one head per name policy (off by default, see hg help -e topic for details),
  • add an experimental flag to have changesets without topic published on push (off by default, see hg help -e topic for details),
  • add a --publish flag to hg push (4.4+ only).

New Evolve extension release: version 6.8.0

Pierre-Yves David
2017-10-23

We just pushed a new release for evolution: 6.8.0.

As usual, the release is available on pypi and upgrade is recommended.

The main thing about this release is compatibility with the Mercurial 4.4-rc. Multiple bits of evolve have been upstreamed into core recently and evolution now relies on them when available.

Be advised that we are planning to drop compatibility for multiple older Mercurial versions and deprecated older experimental protocols in the next version. Our current plan is to support Mercurial 4.1 and above in the next release (7.0.0).

Version changelog

evolve (6.8.0)

  • compatibility with Mercurial 4.4 (use upstream implementation for obsfate and effect-flags starting hg 4.4+).
  • pager: pager support to obslog and evolve --list.

topic (0.4.0)

  • topic: fix handling of bookmarks and phases while changing topics. (Mercurial 4.2 and above only)
  • topic: fix 'topic-mode' behavior when amending.
  • pager: pager support to topics and stack.

New evolve extension release: version 6.7.0

Pierre-Yves David
2017-09-28

We just pushed a new release for evolution: 6.7.0.

As usual, the release is available on pypi and upgrade is recommended.

The most notable new features are the --interactive flag support for hg amend --extract (also known as hg uncommit) and a command to help migration from bookmark to topic (hg debugconvertbookmark).

It also contains multiple small improvements to both evolve and topic, in particular to the documentation, following the evolve mini-sprint in August.

We would like to thank all the people who helped with this version of evolve during the August minisprint and the Pycon-fr sprint last week. Here is the contributor list for this release:

  • Aurélien Campéas
  • Boris Feld
  • Denis Laxalde
  • FUJIWARA Katsunori
  • Philippe Pépiot
  • Pierre-Yves David
  • Pulkit Goyal
  • Ryan McElroy

Version changelog

evolve (6.7.0)

  • compatibility with the changes made so far for the upcoming Mercurial 4.4 (as of this release date),
  • documentation: improvement to content, wording and graphs,
  • obslog: improved templatability,
  • obslog/log: improve verb used to describe an evolution,
  • pstatus/pdiff: update to full command. They now appear in the help,
  • uncommit: add a --interactive option (4.3+ only).

topic (0.3.0)

  • push: add a --topic option to mirror --bookmark and --branch,
  • stack: improve display of interleaved topic,
  • stack: improve display of merge commit,
  • topic: add a new 'debugconvertbookmark' command (4.3+ only). It helps migrating from bookmark feature branches to topic feature branches,
  • topic: --age flag also shows the user who last touched the topic,
  • topic: be more informative about topic activation and deactivation,
  • topic: gain a --current flag,
  • topic: small clarification and cleanup on various output.

Evolve documentation mini-sprint

Pierre-Yves David
2017-09-17

A few weeks ago we organized a mini-sprint on Evolve to improve the project documentation, smooth some rough edges and make the project more newcomer-friendly.

The sprint gathered several folks:

  • Boris Feld (me),
  • Pierre-Yves David (Octobus Founder),
  • Pulkit Goyal invited by Octobus,
  • Denis Laxalde (from Logilab),
  • Philippe Pépiot (from Logilab),
  • Christophe de Vienne (from orus.io) who made a nice debrief,
  • Aurélien Campéas (from Pythonian),
  • Ryan McElroy (from Facebook).

The sprint was hosted by Logilab in Paris for two days. Thanks a lot!

The sprint was focused on three main goals:

Improve the documentation content

A big goal of the sprint was to improve the evolve documentation. Pierre-Yves, Philippe Pépiot and Ryan McElroy worked on this subject to:

  • Update installation instructions to use pip, which should make installing the extension easier for newcomers.
  • Reorder and lighten the landing page.
  • Various fixes of typos, case consistency and general rewording.
  • Missing and outdated content is now listed in a page, providing raw pointers. This should help readers be aware of what they are missing and investigate on their own.
  • Content from the inline documentation is now directly reused in the online Sphinx documentation (e.g. hg help amend).

The online documentation now reflects these changes.

Improve the documentation tooling

One thing that was desperately missing from the evolve tutorials was history graphs. We wanted to have graphs like this one:

[Figure: example history graph rendered with graphviz]

We improved one of our extensions, mercurial-docgraph, which can generate a graphviz image based on a Mercurial repository and a revset.

Boris Feld and Christophe de Vienne worked to:

  • Make it a proper Mercurial extension.
  • Add command line arguments to choose whether to output a png image or the graphviz dot content on stdout.
  • Add support for a --sphinx-directive option that outputs a directive compatible with the Sphinx graphviz extension on stdout.
  • Work on grouping and aligning nodes.
  • Polish the theme used.
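
To give an idea of what turning a piece of history into graphviz dot content involves, here is a minimal sketch. It is not the mercurial-docgraph code: the function name history_to_dot, the parents mapping and the dashed style for obsolete changesets are all assumptions made for this illustration.

```python
def history_to_dot(parents, obsolete=frozenset()):
    """Render a small changeset graph as graphviz dot text.

    `parents` maps each revision name to the list of its parents.
    Revisions listed in `obsolete` are drawn with a dashed outline.
    (Hypothetical helper, not part of any Mercurial extension.)
    """
    lines = ["digraph history {", "    rankdir=LR;"]
    for rev in sorted(parents):
        style = ", style=dashed" if rev in obsolete else ""
        lines.append('    "%s" [shape=circle%s];' % (rev, style))
    for rev in sorted(parents):
        for parent in parents[rev]:
            # edges go from parent to child so history reads left to right
            lines.append('    "%s" -> "%s";' % (parent, rev))
    lines.append("}")
    return "\n".join(lines)


# C was amended into C': both share the parent B, and C is obsolete
graph = {"A": [], "B": ["A"], "C": ["B"], "C'": ["B"]}
dot = history_to_dot(graph, obsolete={"C"})
print(dot)
```

The printed text can be fed to the graphviz dot program to produce an image.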

The evolve tutorials are now updated with nice graphs.

Usability bugs

Pulkit Goyal worked on:

  • Add the --current option to hg topic to help set the current topic on several changesets at a time.
  • Restrict topic names to classic rules for tags and branches (no integer only name, no reserved words...).
  • Add a new debugconvertbookmark command to convert bookmarks to topics.
  • Improve the wording and rendering of various topic-related commands.
  • Add --interactive support for hg amend --extract (also known as uncommit).

Bugs cleaning

  • Fix a bug where obsolete tag could sneak back into the tag file (issue5539).
  • Improve behavior of rebase when obsolete and unstable revisions are present (issue5300) (still in progress).

Thanks again to all attendees for the progress made and to Logilab for hosting. We are looking forward to the next mini-sprint.

Mercurial 4.3 release

2017-08-02

Mercurial 4.3 has been released and it's a pretty big release. You can find the release notes here, or read on below where we highlight some of the changes:

Notable changes

  • Support for Python 2.6 has been dropped, so if you are still using it, stay on version 4.2 or update to Python 2.7.
  • Bundles now store phase information, you can now safely share draft and public changesets with your colleagues using bundles.
  • Two security issues related to ssh repository URLs and symlinks have been fixed. See below for details.
  • The share extension now properly shares relevant caches between shares.
  • Server can now allow more concurrent pushes as long as they do not affect the same heads.

Multiple interesting experiments are going on:

  • Status is gaining a 'terse' mode. In this mode, the status of files is aggregated by directory.
  • A new option to improve manifest storage on repository with many branches has been added.
  • The sparse extensions have been integrated into the core distribution as experimental.

There have been a couple of updates on the changeset evolution side:

  • Strip extension now removes relevant obsmarkers. So if you want to get rid of an amend, strip the successor and relevant obs-markers will be stripped too. The backup bundle will contain the stripped obsmarkers so they can be restored if necessary.
  • Pushes and pulls now inform the user when changesets get obsoleted.

A handful of significant internal reworks happened in this release:

  • The support for Python 3 has been improved, every release brings us closer to being fully Python 3 compatible.
  • Transactions now store precise data about the changes they contain. This allows for accurate reports and more advanced extension logic.
  • The rebase extension's internals were reworked to allow more advanced rebase scenarios with multiple destinations at the same time.
  • Config options can now be formally registered. This will unlock nice features in the future like config validation and automatic documentation generation.

Security issue

The 4.3 release contains fixes for two CVEs: CVE-2017-1000115 and CVE-2017-1000116. Upgrade as soon as possible. See the mailing-list for more information about the vulnerabilities.

Please be aware that the 4.3 release drops support for Python 2.6. Due to the vulnerabilities, a 4.2.3 version has been released with the security fixes. Upgrade to 4.2.3 if you are still using Python 2.6. You can download the release here.

We had to backport the security patches for Mercurial 4.1.3 for some of our customers. Due to the severity of the vulnerabilities, we made the patches available for everyone at the following address: https://bitbucket.org/octobus/mercurial-backport/branch/backport-4.1.

User-visible changes

Packaging

We did the necessary work to generate Linux wheels for the 4.3 release. For people who don't know what a wheel is, it's basically a compiled package of Mercurial installable by pip. We built wheels for Linux, which bring:

  • No need for a compiler
  • Declarative metadata
  • Faster installation

On my laptop, the installation of Mercurial through pip went from 13 seconds with the default package to 1 second only.

If you want to test them, here are the instructions. Be aware that we provide them as an experiment and that you are using them at your own risk:

Run pip install -U --index-url=https://packagecloud.io/octobus/Mercurial/pypi/simple mercurial to update your Mercurial installation to the 4.3 release. Be warned that these are Linux wheels only; Windows wheels are available on PyPI and should be selected automatically. Mac OS X wheels are not available yet.

Behavior changes

There are a couple of backward-incompatible changes, be sure to take a look at them! We compiled a non-exhaustive list of the most notable ones:

  • The backup bundle name has changed for most operations.
  • The clonebundle hook arguments have changed.
  • hg update now shows the commit it updated to in case of multiple heads.

Sharing caches

The share extension helps working with big repositories. It can share the 'store' (i.e. history) portion of a repository between multiple working copies. One thing was still painful: the various caches were not shared, so each repository had to compute the phase cache, tag cache, bookmark cache, etc., which took a significant amount of time per repository.

That was before: with Mercurial 4.3, caches are now correctly shared, improving performance when working with big repositories.

Concurrent pushes

When pushing, a client first exchanges data with the server to discover the minimal information that actually needs to be pushed. Then it prepares a bundle containing this information and pushes it. It is possible for another client to update the repository between the discovery phase and the actual push phase, voiding the validation that the client might have made before sending the data. Mercurial has a mechanism to detect such a situation and abort the raced push.

This race detection can get in the way for repositories with a lot of incoming traffic. So we introduced a new server.concurrent-push-mode option to control this behavior. The server config can be set up to accept pushes prepared concurrently if they affect unrelated heads.

    [server]
    concurrent-push-mode = check-related
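
The difference between the two behaviors can be illustrated with a toy model. This is only a sketch of the idea, not Mercurial's implementation: the push_allowed function, its arguments and the head names are all hypothetical.

```python
def push_allowed(heads_at_discovery, heads_now, affected_heads,
                 allow_unrelated=False):
    """Decide whether a push prepared earlier may still be applied.

    `heads_at_discovery`: heads the client saw during discovery.
    `heads_now`: heads of the repository when the bundle arrives.
    `affected_heads`: heads the incoming changesets are built upon.
    (Toy model; every name here is an assumption of this sketch.)
    """
    if not allow_unrelated:
        # default behavior: any head change since discovery is a race
        return heads_at_discovery == heads_now
    # relaxed behavior: only a change to a head we build upon is a race
    return affected_heads <= heads_now


seen = {"h1", "h2"}     # heads seen by the client during discovery
current = {"h1", "h3"}  # meanwhile, another client replaced h2 with h3

print(push_allowed(seen, current, {"h1"}))                        # False
print(push_allowed(seen, current, {"h1"}, allow_unrelated=True))  # True
print(push_allowed(seen, current, {"h2"}, allow_unrelated=True))  # False
```

In the relaxed case, the push on top of h1 goes through even though h2 moved, while a push on top of h2 is still rejected.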

Changes in the experimental realm

Manifest size

We have been investigating a sudden increase in the manifest size for some of our clients. The issue comes from a heuristic responsible for triggering new full snapshots. That heuristic fails badly when the number of concurrent branches becomes too large.

During the 4.3 cycle, we upstreamed a new experimental option named 'maxdeltachainspan'. With this option set to 0, we have seen a reduction of manifest size on the order of 70x for our clients. This impacts the duration of push, pull and day-to-day commands. The manifest size explosion is triggered when the number of parallel branches grows. So the more branches and merges you have, the bigger the win should be.
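
To see why many interleaved branches can defeat a span-based heuristic, here is a toy simulation. It assumes the heuristic stores a full snapshot whenever the distance in the storage file between the start of a delta chain and the end of the new delta exceeds a limit; the real storage code checks more conditions. The function count_snapshots, its parameters and all the sizes are made up for this sketch, and max_span=None plays the role of "no limit".

```python
def count_snapshots(branches, revisions, text_len=1000, delta_len=10,
                    max_span=None):
    """Count full snapshots when revisions are appended round-robin
    over `branches` interleaved branches (toy model, hypothetical names).
    """
    offset = 0        # current end of the storage file
    chain_start = {}  # branch -> offset where its delta chain begins
    snapshots = 0
    for rev in range(revisions):
        branch = rev % branches
        start = chain_start.get(branch)
        too_far = (
            start is not None
            and max_span is not None
            and offset + delta_len - start > max_span
        )
        if start is None or too_far:
            # store the full text; a new delta chain begins here
            chain_start[branch] = offset
            offset += text_len
            snapshots += 1
        else:
            # store a small delta against the branch's previous revision
            offset += delta_len
    return snapshots


print(count_snapshots(1, 1000, max_span=4000))   # 4
print(count_snapshots(50, 1000, max_span=4000))  # 1000
print(count_snapshots(50, 1000, max_span=None))  # 50
```

With a single branch the limit rarely triggers. With 50 interleaved branches, the data of the other branches sits between two consecutive revisions of the same branch, so in this model the limit triggers on every revision. Lifting the limit brings the count back to one snapshot per branch.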

If you want to test it against your repositories:

  1. Make a new local clone
  2. Update the config to contain

         [format]
         aggressivemergedeltas = yes
         [experimental]
         # 0 means no limit
         maxdeltachainspan = 0
     
  3. Run hg debugupgraderepo --optimize redeltaall --run --config format.aggressivemergedeltas=yes --config experimental.maxdeltachainspan=0 (this will likely take a long time).

Terse status

Pulkit Goyal started to integrate the terse status feature in the core hg status command. When the new terse flag is passed, a directory summary is used whenever possible. For example:

    $ hg status
    ? foo/x
    ? foo/y

Becomes:

    $ hg status --terse u
    ? foo/
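
The aggregation itself can be sketched in a few lines. This is an illustration of the idea only, not the code used by hg status: the terse function is hypothetical, it handles a single directory level, and it assumes the input lists every file of each directory.

```python
import posixpath
from collections import defaultdict


def terse(entries):
    """Collapse (status, path) pairs directory by directory.

    A directory is collapsed into one entry when every listed file
    directly inside it shares the same status. (Hypothetical helper.)
    """
    by_dir = defaultdict(list)
    for status, path in entries:
        by_dir[posixpath.dirname(path)].append((status, path))

    result = []
    for directory in sorted(by_dir):
        files = by_dir[directory]
        statuses = {status for status, _ in files}
        if directory and len(statuses) == 1:
            # all files agree: report the directory once
            result.append((statuses.pop(), directory + "/"))
        else:
            # top-level files or mixed statuses: keep individual entries
            result.extend(sorted(files, key=lambda entry: entry[1]))
    return result


print(terse([("?", "foo/x"), ("?", "foo/y")]))
# [('?', 'foo/')]
print(terse([("M", "bar/a"), ("?", "bar/b"), ("?", "top")]))
# [('?', 'top'), ('M', 'bar/a'), ('?', 'bar/b')]
```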

[Experimental] - Sparse extension

The sparse extension is now shipped as an experimental extension. It is helpful with big mono-repositories when you only want a subset of your repository files to be checked out.

All the information can be found in the sparse extension documentation, accessible with: hg help sparse.

Contribution statistics

To conclude, we compiled contributions in number of changesets from various contribution sources. Changesets are not an absolute metric, but they help to get an idea of where contributions come from. Contributors from the same company have been grouped for clarity.

361 @octobus.net
291 @google.com
220 @fb.com
197 yuya@tcha.org
127 @mozilla.com
104 7895pulkit@gmail.com
84 matt_harbison@yahoo.com
61 foozy@lares.dti.ne.jp
18 sean@farley.io
12 kbullock@ringworld.org
6 wbruna@softwareexpress.com.br
6 @logilab.com
5 @nokia.com
4 demelier.david@gmail.com
4 danek.duvall@oracle.com
4 av6@dwimlabs.net
3 rishabhmadan96@gmail.com
3 marutosijp2@gmail.com
2 @unity.com
2 steve@borho.org
2 bamccaig@gmail.com
1 mtietze@gmx.com
1 cryo@cyanite.org
1 blacktrash@gmx.net
1 andrew.zwicky@gmail.com
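
For readers who want to compile similar numbers for their own repository, here is one way the grouping could be done. This is a sketch under assumptions, not the script used for the list above: the contribution_counts function, the sample addresses and the set of company domains are all hypothetical.

```python
from collections import Counter


def contribution_counts(author_emails, company_domains):
    """Count changesets per contributor, grouping known company domains.

    `author_emails` holds one email address per changeset.
    Addresses whose domain is in `company_domains` are merged under
    "@domain"; other addresses are counted individually.
    """
    counts = Counter()
    for email in author_emails:
        domain = email.rsplit("@", 1)[-1].lower()
        if domain in company_domains:
            key = "@" + domain
        else:
            key = email.lower()
        counts[key] += 1
    # most_common() sorts from the largest count to the smallest
    return counts.most_common()


# made-up sample data, one entry per changeset
emails = ["alice@octobus.net", "bob@octobus.net", "carol@example.org"]
print(contribution_counts(emails, {"octobus.net"}))
# [('@octobus.net', 2), ('carol@example.org', 1)]
```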