Sharing references between Python and Rust
Raphaël Gomès2019-07-25
In 2018, the Mercurial project decided to use Rust to improve performance and maintainability of previous high-performance code, you can read more about it in the Oxidation Plan.
While one may argue that Rust took inspiration from Python in some aspects of its semantics, the two languages don't share a lot of similarities at a lower level. Rust strict memory borrowing rules and default immutability don't play very nice with some of Python's features: its dynamic typing, mutability rules, classes, garbage collecting, to name the big ones.
We have faced some interesting challenges when bridging the Python implementation with the new Rust code, and this is one that I have not found any literature about.
Technological stack
There are two main crates used for bridging CPython and Rust: rust-cpython
and PyO3
. The latter is a fork of the former that happened when rust-cpython
was seemingly abandoned, and has more features like support for properties and an arguably nicer syntax thanks to (then unstable) procedural macros.
We are however using rust-cpython
because PyO3
does not (yet) compile on stable Rust, but the idea behind this article is still relevant regardless of the bridge used.
The issue at hand
During the rewrite of some core parts of Mercurial, we had to present a class-like interface to Python that would run Rust code. More often than not, that class implemented __iter__
, which requires Python to hold a reference to a Rust iterator. Whenever we faced that issue, we just copied the entire structure to something Python understands, which is terrible. But that was good enough for the purpose of continuing the rewrite.
However, as the frontier between Python and Rust got more defined, we knew that we couldn't wait any longer to solve that issue. My colleague Georges (over at blog.racinet.fr) took the opportunity of a long train trip to dig into shared references with a minimalist example.
A minimalist example
Let's implement a stripped-down version of Python's set
that only works with int
, for simplicity's sake.
The basic features are pretty easy to implement with rust-cpython
and would look like this:
extern crate cpython;
use cpython::*;
use std::cell::RefCell;
use std::collections::HashSet;
type Inner = HashSet<u32>;
py_class!(class RustSet |py| {
data hs: RefCell<Inner>;
def __new__(_cls) -> PyResult<RustSet> {
Self::create_instance(py, RefCell::new(Inner::new()))
}
def __contains__(&self, v: u32) -> PyResult<bool> {
Ok(self.hs(py).borrow().contains(&v))
}
def add(&self, v: u32) -> PyResult<PyObject> {
self.borrow_mut(py)?.insert(v);
Ok(py.None())
}
def extend(&self, iterable: &PyObject) -> PyResult<PyObject> {
let mut hs = self.hs(py).borrow_mut(py)?;
for vobj in iterable.iter(py)? {
hs.insert(vobj?.extract::<u32>(py)?);
}
Ok(py.None())
}
});
The py_class!
macro helps us define a Python class that we insert in a shared library that Python will treat like a normal Python module. How exactly that is done is explained in rust-cpython's documentation and the repository of this experiment (which uses Heptapod, Gitlab with beta Mercurial support!) is available here.
In short, the hs
data attribute will hold all the data in a Rust HashSet
, the rest is basic encapsulation. Why a RefCell
is used is also explained below.
So far, nothing really exciting is happening, but this allows us to define the following in a Python script using the generated .so
as .shared_ref
.
import sys
try:
from .shared_ref import RustSet
except ImportError:
sys.stderr.write(
"Rust extension not found. Please run 'cargo build' first.\n"
)
sys.exit(1)
def test_basic():
"""Test basic scaffolding API: not needing to share refs."""
rs = RustSet()
rs.add(3)
assert 3 in rs
assert 4 not in rs
rs.extend(x**2 for x in range(10))
assert 4 in rs
assert 81 in rs
assert 65 not in rs
def run():
test_basic()
Now on to the good stuff.
We need an iterator class that exposes __next__
and __iter__
to use from the Python side.
use std::collections::hash_set::Iter;
py_class!(class RustSetIterator |py| {
data hs: RustSet;
data it: RefCell<Iter<'static, u32>>;
def __next__(&self) -> PyResult<Option<u32>> {
Ok(self.it(py).borrow_mut().next().map(|r| *r))
}
def __iter__(&self) -> PyResult<Self> {
Ok(self.clone_ref(py)) // `clone_ref` gives a new Python reference
}
});
If RustSet is the Python class, the hs
data attribute is an instance of the class. it
is a Rust iterable (here from the hash_set
module) of u32
, just like our Inner
type, with a 'static
lifetime.
But, you might say, those integers are not static, they will be defined at runtime! Well, since Rust has no way of knowing what Python does, everything must be Send + 'static
, hence the use of a RefCell
(or Cell
, for simpler types) whenever we want our Python object to hold a reference to Rust data.
However, just writing 'static
does not magically make the compiler happy. For that, we need to enter the world of unsafe
, one of the major use cases of is FFI code just like what we're doing.
Our RustSet
class will then implement __iter__
the following way:
def __iter__(&self) -> PyResult<RustSetIterator> {
let ptr = self.hs(py).as_ptr();
let as_static: &'static Inner = unsafe {&*ptr};
RustSetIterator::create_instance(
py,
self.clone_ref(py),
RefCell::new(as_static.iter())
)
}
Here, we take a raw pointer to our inner HashSet
and (unsafely) tell Rust that its lifetime is 'static
, which means that we are forgoing the help of the borrow-checker and we will have to do a bit of manual work to make all of that work.
For the time being, we have a basic iterator interface, which looks like:
def test_iter():
rs = RustSet()
start_count = sys.getrefcount(rs) # should be 2 (see Python doc)
rs.extend(range(4))
it = iter(rs)
assert sys.getrefcount(rs) == start_count + 1
assert set(it) == {0, 1, 2, 3}
del it
assert sys.getrefcount(rs) == start_count
it2 = iter(rs)
del rs
assert set(it2) == {0, 1, 2, 3}
del it2
Nice. But this is just reading from the iterator, and our naive lifetime trick will not work for very long once mutation is involved.
Let's add a clear
method to introduce mutation of the data now shared between Python and Rust.
def clear(&self) -> PyResult<PyObject> {
let mut hs = self.hs(py).borrow_mut();
hs.clear();
// Force freeing of underlying memory to underline the risk of
// segfault
hs.shrink_to_fit();
Ok(py.None())
}
As is said in the inline comment, we also force the inner HashSet
to free its unused memory to trigger a segfault if Python still tries to access it.
This allows us to showcase the bug in a simple Python test:
def test_race_safety():
rs = RustSet()
# have Rust allocate some real amount of memory
rs.extend(range(10000))
it = iter(rs)
# Trigger freeing the underlying memory
rs.clear()
next(it) # segfault
Reference counting
We need to implement a higher-level system for ensuring memory safety.
Let's add a reference counter to our RustSet
to prevent mutation when Python holds a reference to the unsafe pointer.
Inside RustSet
:
// ...
/// Replaces the previous `use`
use std::cell::{Cell, RefCell};
// ...
data leak_count: Cell<usize>;
// ...
def __new__(_cls) -> PyResult<RustSet> {
Self::create_instance(py, RefCell::new(Inner::new()), Cell::new(0))
}
def add(&self, v: u32) -> PyResult<PyObject> {
/// Changed from self.hs(py).borrow_mut()
self.borrow_mut(py)?.insert(v);
// ...
}
def extend(&self, iterable: &PyObject) -> PyResult<PyObject> {
/// Changed from self.hs(py).borrow_mut()
let mut hs = self.borrow_mut(py)?;
// ...
}
def __iter__(&self) -> PyResult<RustSetIterator> {
RustSetIterator::create_instance(
py,
self.clone_ref(py),
RefCell::new(self.leak_immutable(py).iter()),
Cell::new(false),
)
}
def clear(&self) -> PyResult<PyObject> {
/// Changed from self.hs(py).borrow_mut()
let mut hs = self.borrow_mut(py)?;
// ...
}
For our private functions, we can step out of the py_class!
macro and create the following impl
block:
/// Replaces the previous `use`
use std::cell::{Cell, RefCell, RefMut};
impl RustSet {
fn leak_immutable(&self, py: Python) -> &'static Inner {
let ptr = self.hs(py).as_ptr();
self.leak_count.replace(self.leak_count.get() + 1);
unsafe { &*ptr }
}
fn borrow_mut<'a>(&'a self, py: Python<'a>) -> PyResult<RefMut<Inner>> {
match self.leak_count.get() {
0 => Ok(self.hs(py).borrow_mut()),
_ => Err(AlreadyBorrowed::new(
py,
"Can't mutate while there are immutable \
references in Python objects",
)),
}
}
fn decrease_leak_count(&self, py: Python) {
self.leak_count
.replace(self.leak_count.get().saturating_sub(1));
}
}
Here leak_immutable
does the same lifetime extension "trick" that we did earlier, but also increments the leak_count
data attribute to keep track of references. Of course, there is also a function to decrease the leak count.
The more interesting bit is the borrow_mut
function, which encapsulates the same function of the inner HashSet
and checks if the leak count is 0
before allowing Python to borrow, raising a custom Python exception otherwise. This very simple mechanic is only possible because we hold the reference to the GIL (via py
, of type Python
) and are thus guaranteed a single-threaded context.
Inside our RustSetIterator
, we can replace our __next__
method to make use of our newly defined reference counter:
data done: Cell<bool>;
def __next__(&self) -> PyResult<Option<u32>> {
if self.done(py).get() {
return Ok(None);
}
Ok(match self.it(py).borrow_mut().next() {
None => {
self.done(py).replace(true);
self.hs(py).decrease_leak_count(py);
None}
Some(&r) => Some(r)
})
}
We use a simple boolean to determine if the iterator is done, separating the lifetime of the iterator from that of its data, as shown in this example:
def test_race_safety():
rs = RustSet()
# have Rust allocate some real amount of memory
rs.extend(range(10000))
it = iter(rs)
# Trigger freeing the underlying memory
try:
rs.clear()
except AlreadyBorrowed:
pass # \o/
else:
raise AssertionError("Should not have been able to clear RustSet "
"instance while holding an iterator on it")
next(it) # that would be a segfault
# Consume iterator
assert len([x for x in it]) == 10000
rs.clear()
# the consumed iterator is actually still usable (doesn't need the
# data anymore to raise StopIteration)
assert [x for x in it] == []
We are unable to mutate our RustSet when an iterator still refers to its data, hurray!
Now, this differs from the Python behavior, let's see what happens if we replicate the example with a standard set
:
>>> s = set([1, 2, 3])
>>> it = iter(s)
>>> s.clear()
>>> next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Set changed size during iteration
Python does not complain when we mutate the set, only when we try to use the iterator, which is arguably a little too late. Most well written Python code should not have to adapt too much to this change to a stricter behavior.
However there exists some use cases where this "feature" it is useful; I've seen those use cases in Mercurial code where this implementation falls apart because some iterator is never depleted, so you add a del my_iterator
which only really fixes the Python implementations that are reference counting... blablabla. Plus, "breaking user space" is always a bad idea unless really really important, so we will need to improve on that later.
Harnessing the garbage collector
The code above will run just fine given our conditions, but this example is a tad simplistic and our RustSet
still needs a bit of work to play nice with Python in the general case.
Indeed, our shared references might get garbage collected by Python and result in a memory leak (and a runtime bug) because Rust still thinks that those references exist.
As suggested in rust-cpython
error messages, the proper way to implement Drop
for PythonObject
is to implement it on one of the data members.
Let's create a struct that will manage our reference count on Drop
:
struct RustSetLeakedRef {
rs: RustSet,
}
impl RustSetLeakedRef {
fn new(py: Python, rs: &RustSet) -> Self {
RustSetLeakedRef {
rs: rs.clone_ref(py),
}
}
}
impl Drop for RustSetLeakedRef {
fn drop(&mut self) {
let gil = Python::acquire_gil();
let py = gil.python();
self.rs.decrease_leak_count(py);
}
}
This struct simply holds a Python reference to the RustSet
instance, and decreases the leak count when dropped. Cool, now our RustSetIterator
can use this new struct to make rust-cpython
's Drop
hook happy:
py_class!(class RustSetIterator |py| {
data rs: RefCell<Option<RustSetLeakedRef>>;
// ...
def __next__(&self) -> PyResult<Option<u32>> {
let mut rs_opt = self.rs(py).borrow_mut();
if rs_opt.is_some() {
Ok(match self.it(py).borrow_mut().next() {
None => {
// replace Some(rs) by None, hence drop RustSetLeakedRef
rs_opt.take();
None
}
Some(&r) => Some(r)
})
} else {
Ok(None)
}
}
// ...
});
You will notice that we got rid of the done
boolean in favor of an Option
, which is much more idomatic.
Lastly, our RustSet
needs to update its __iter__
method to use the new RustSetLeakedRef
:
def __iter__(&self) -> PyResult<RustSetIterator> {
RustSetIterator::create_instance(
py,
RefCell::new(Some(RustSetLeakedRef::new(py, &self))),
RefCell::new(self.leak_immutable(py).iter()),
)
}
We can now test this from the Python side:
def test_drop_before_end():
rs = RustSet()
# have Rust allocate some real amount of memory
rs.extend(range(10))
it = iter(rs)
# Implementation of get_leak_count() left as
# an exercise to the reader ;)
assert rs.get_leak_count() == 1
next(it)
del it
assert rs.get_leak_count() == 0
Works just as intended!
Did you spot the bug?
This safety mechanism guards us against a mutable borrow after immutable borrows, but fails to prevent immutable borrows following a mutable borrow!
We need to tell mutable and immutable borrows apart. The nice thing is, you can only have at most one mutable reference; this allows us to just use a boolean instead of a more complicated structure.
Let's start by updating the API inside our RustSet
:
data mutably_borrowed: Cell<bool>;
// ...
def __new__(_cls) -> PyResult<RustSet> {
Self::create_instance(py, RefCell::new(
Inner::new()),
Cell::new(0),
Cell::new(false),
)
}
// ...
fn leak_immutable(&self, py: Python) -> PyResult<&'static Inner> {
// We add this check right at the top of the function.
// Notice the signature change to a `PyResult`.
if self.mutably_borrowed(py).get() {
return Err(AlreadyBorrowed::new(
py,
"Cannot borrow immutably while there is a \
mutable reference in Python objects",
));
}
// ...
unsafe { Ok(&*ptr) }
}
fn decrease_leak_count(&self, py: Python, mutable: bool) {
// ...
if mutable {
self.mutably_borrowed(py).replace(false);
}
}
I've purposely kept borrow_mut
at the end since it needs a bit more explaining:
fn borrow_mut<'a>(&'a self, py: Python<'a>) -> PyResult<PyRefMut<Inner>> {
// Same check at the top of the function.
if self.mutably_borrowed(py).get() {
return Err(AlreadyBorrowed::new(
py,
"Cannot borrow mutably while there exists another \
mutable reference in a Python object",
));
}
match self.leak_count(py).get() {
0 => {
// Update the flag upon borrow
self.mutably_borrowed(py).replace(true);
Ok(PyRefMut::new(py, self.hs(py).borrow_mut(), &self))
}
// ...
}
}
As you can see, we're not returning a RefMut
anymore, but a PyRefMut
. It's a new struct that wraps the RefMut
to add a crucial bit of additional behavior to it.
Let's see how we implement it:
struct PyRefMut<'a, T> {
inner: RefMut<'a, T>,
rs: RustSet,
}
The 'a
lifetime annotation tells us that out PyRefMut
lives exactly as long as the inner RefMut
.
impl<'a, T> PyRefMut<'a, T> {
fn new(py: Python, rm: RefMut<'a, T>, rs: &RustSet) -> Self {
Self {
inner: rm,
rs: rs.clone_ref(py),
}
}
}
impl<'a, T> Drop for PyRefMut<'a, T> {
fn drop(&mut self) {
let gil = Python::acquire_gil();
let py = gil.python();
self.rs.decrease_leak_count(py, true);
}
}
Here, the drop
implementation removes the mutable reference through decrease_leak_count
, with mutable
set to true
.
This means that when our PyRefMut
goes out of scope, our reference count is correct and the mutability lock
is freed!
The final piece of the puzzle:
use std::ops::{Deref, DerefMut};
// ...
impl<'a, T> Deref for PyRefMut<'a, T> {
type Target = RefMut<'a, T>;
fn deref(&self) -> &Self::Target {
&self.inner
}
}
impl<'a, T> DerefMut for PyRefMut<'a, T> {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.inner
}
}
This ensures that our PyRefMut
delegate all of its interface to its inner RefMut
, making it essentially a "fat pointer".
Future work
The only thing left to do to have complete support for garbage collection is to allow our RustSet
to hold Python objects as well. The documentation of rust-cpython
explains that the __traverse__
and __clear__
methods need to be implemented to dispatch the garbage collection to the inner objects.
As of now, the need has yet to present itself in the development of Mercurial... so we didn't do it. Sorry, maybe another time. :)
The actual code inside Mercurial is a bit different as we moved the refcounting and mutable borrow check inside a separate struct, and I've written a simple macro to automate parts of it, but I am not very happy with it yet; it's not as easy to use as I would like to and it's missing some features; leaking multiple attributes, automatic data attributes, method prefixing, etc.. I don't want to write a procedural macro since it's not battle tested yet.
If you have a better way of going about this problem, please let me know.