## January 21, 2017

### Planet Mozilla — Assigning blame to unsafe code

While I was at POPL the last few days, I was reminded of an idea regarding how to bring more structure to the unsafe code guidelines process that I’ve been kicking around lately, but which I have yet to write about publicly. The idea is fresh on my mind because while at POPL I realized that there is an interesting opportunity to leverage the “blame” calculation techniques from gradual typing research. But before I get to blame, let me back up and give some context.

### The guidelines should be executable

I’ve been thinking for some time that, whatever guidelines we choose, we need to adopt the principle that they should be automatically testable. By this I mean that we should be able to compile your program in a special mode (“sanitizer mode”) which adds in extra assertions and checks. These checks would dynamically monitor what your program does to see if it invokes undefined behavior: if they detect UB, then they will abort your program with an error message.

Plenty of sanitizers or sanitizer-like things exist for C, of course. My personal favorite is valgrind, but there are a number of other examples (the data-race detector for Go also falls in a similar category). However, as far as I know, none of the C sanitizers is able to detect the full range of undefined behavior. Partly this is because C UB includes untestable (and, in my opinion, overly aggressive) rules like “every loop should do I/O or terminate”. I think we should strive for a sound and complete sanitizer, meaning that we guarantee that if there is undefined behavior, we will find it, and that we have no false positives. We’ll see if that’s possible. =)

The really cool thing about having the rules be executable (and hopefully efficiently executable) is that, in the (paraphrased) words of John Regehr, it changes the problem of verifying safety from a formal one into a matter of test coverage, and the latter is much better understood. My ultimate goal is that, if you are the developer of an unsafe library, all you have to do is to run cargo test --unsafe (or some such thing), and all of the normal tests of your library will run but in a special sanitizer mode where any undefined behavior will be caught and flagged for you.

But I think there is one other important side-effect. I have been (and remain) very concerned about the problem of programmers not understanding (or even being aware of) the rules regarding correct unsafe code. This is why I originally wanted a system like the Tootsie Pop rules, where programmers have to learn as few things as possible. But having an easy and effective way of testing for violations changes the calculus here dramatically: I think we can likely get away with much more aggressive rules if we can test for violations. To play on John Regehr’s words, this changes the problem from being one of having to learn a bunch of rules to having to interpret error messages. But for this to work well, of course, the error messages have to be good. And that’s where this idea comes in.

### Proof of concept: miri

As it happens, there is an existing project that is already doing a limited form of the kind of checks I have in mind: miri, the MIR interpreter created by Scott Olson and now with significant contributions by Oliver Schneider. If you haven’t seen or tried miri, I encourage you to do so. It is very cool and surprisingly capable – in particular, miri can not only execute safe Rust, but also unsafe Rust (e.g., it is able to interpret the definition of Vec).

The way it does this is to simulate the machine at a reasonably low level. So, for example, when you allocate memory, it stores that as a kind of blob of bytes of a certain size. But it doesn’t only store bytes; rather, it tracks additional metadata about what has been stored into various spots. For example, it knows whether memory has been initialized or not, and it knows which bits are pointers (which are stored opaquely, not with an actual address). This allows it to interpret a lot of unsafe code, but it also allows it to detect various kinds of errors.
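To make this concrete, here is a minimal sketch (hypothetical types, not miri’s actual data structures) of an allocation that carries this kind of metadata: the raw bytes, a per-byte initialization mask, and a table of which offsets hold opaque pointer values:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct AllocId(u64);

// One allocated "blob of bytes" plus side metadata.
struct Allocation {
    bytes: Vec<u8>,                       // the raw contents
    init: Vec<bool>,                      // per byte: has it been written?
    relocations: HashMap<usize, AllocId>, // offsets that hold opaque pointers
}

impl Allocation {
    fn new(size: usize) -> Self {
        Allocation {
            bytes: vec![0; size],
            init: vec![false; size],
            relocations: HashMap::new(),
        }
    }

    fn write_byte(&mut self, offset: usize, value: u8) {
        self.bytes[offset] = value;
        self.init[offset] = true;
    }

    // Reading a byte that was never written is a detectable error,
    // rather than silently yielding garbage as on real hardware.
    fn read_byte(&self, offset: usize) -> Result<u8, &'static str> {
        if !self.init[offset] {
            return Err("reading uninitialized memory");
        }
        Ok(self.bytes[offset])
    }
}
```

Because reads consult the metadata, the interpreter can flag errors at the moment they occur instead of letting them corrupt later execution.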

### An example

```rust
fn main() {
    let mut b = Box::new(22);
    innocent_looking_fn(&b);
    *b += 1;
}

fn innocent_looking_fn(b: &Box<usize>) {
    // This wicked little bit of code will take a borrowed
    // Box and free it.
    unsafe {
        let p: *const usize = &**b;
        let q: Box<usize> = std::mem::transmute(p);
    }
}
```


The problem here is that this “innocent looking function” claims to borrow the box b but it actually frees it. So now when main() comes along to execute *b += 1, the box b has been freed. This situation is often called a “dangling pointer” in C land. We might expect then that when you execute this program, something dramatic will happen, but that is not (necessarily) the case:

```
> rustc tests/dealloc.rs
> ./dealloc
```


As you can see, I got no error or any other indication that something went awry. This is because, internally, freeing the box just throws its address on a list for later re-use. Therefore when I later make use of that address, it’s entirely possible that the memory is still sitting there, waiting for me to use it, even if I’m not supposed to. This is part of what makes tracking down a “use after free” bug incredibly frustrating: oftentimes, nothing goes wrong! (Until it does.) It’s also why we need some kind of sanitizer mode that will do additional checks beyond what really happens at runtime.
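A toy model (purely illustrative, not how any real allocator is implemented) shows why the dangling read often “works”: freeing just records the slot for later reuse, and the old contents remain readable until the slot is handed out again:

```rust
// Toy allocator: "addresses" are indices into a backing vector.
struct ToyAllocator {
    heap: Vec<i64>,        // backing storage
    free_list: Vec<usize>, // addresses available for reuse
}

impl ToyAllocator {
    fn new() -> Self {
        ToyAllocator { heap: Vec::new(), free_list: Vec::new() }
    }

    fn alloc(&mut self, value: i64) -> usize {
        match self.free_list.pop() {
            Some(addr) => { self.heap[addr] = value; addr }
            None => { self.heap.push(value); self.heap.len() - 1 }
        }
    }

    fn free(&mut self, addr: usize) {
        // Nothing is erased; the address just goes on the list.
        self.free_list.push(addr);
    }

    fn read(&self, addr: usize) -> i64 {
        self.heap[addr]
    }
}
```

A use-after-free read here quietly returns the stale value, and only misbehaves once the slot is recycled for a new allocation, which is exactly the “nothing goes wrong (until it does)” pattern.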

### Detecting errors with miri

But what happens when I run this through miri?

```
> cargo run tests/dealloc.rs
    Finished dev [unoptimized + debuginfo] target(s) in 0.2 secs
     Running `target/debug/miri tests/dealloc.rs`
error: dangling pointer was dereferenced
 --> tests/dealloc.rs:8:5
  |
8 |     *b += 1;
  |     ^^^^^^^
  |
note: inside call to main
 --> tests/dealloc.rs:5:1
  |
5 |   fn main() {
  |  _^ starting here...
6 | |     let mut b = Box::new(22);
7 | |     innocent_looking_fn(&b);
8 | |     *b += 1;
9 | | }
  | |_^ ...ending here

error: aborting due to previous error
```


(First, before going further, let’s just take a minute to be impressed by the fact that miri bothered to give us a nice stack trace here. I had heard good things about miri, but before I started poking at it for this blog post, I expected something a lot less polished. I’m impressed.)

You can see that, unlike the real computer, miri detected that *b was freed when we tried to access it. It was able to do this because when miri is interpreting your code, it does so with respect to a more abstract model of how a computer works. In particular, when memory is freed in miri, miri remembers that the address was freed, and if there is a later attempt to access it, an error is thrown. (This is very similar to what tools like valgrind and electric fence do as well.)
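A sketch of that abstract-machine check (hypothetical names; miri’s real implementation differs): give each allocation an id, forget the id on free, and validate every access against the set of live ids:

```rust
use std::collections::HashMap;

// Abstract memory: allocations live in a map keyed by id, so a freed
// allocation simply ceases to exist instead of lingering as raw bytes.
struct AbstractMemory {
    live: HashMap<u64, Vec<u8>>,
    next_id: u64,
}

impl AbstractMemory {
    fn new() -> Self {
        AbstractMemory { live: HashMap::new(), next_id: 0 }
    }

    fn allocate(&mut self, size: usize) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.live.insert(id, vec![0; size]);
        id
    }

    fn deallocate(&mut self, id: u64) {
        // Removing the entry is what lets us catch later accesses.
        self.live.remove(&id);
    }

    fn access(&self, id: u64) -> Result<&Vec<u8>, &'static str> {
        self.live.get(&id).ok_or("dangling pointer was dereferenced")
    }
}
```

Since pointers in the abstract machine refer to ids rather than reusable addresses, a dangling access can never silently alias a fresh allocation.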

So even just using miri out of the box, we see that we are starting to get a certain amount of sanitizer rules. Whatever the unsafe code guidelines turn out to be, one can be sure that they will declare it illegal to access freed memory. As this example demonstrates, running your code through miri could help you detect a violation.

### Blame

This example also illustrates another interesting point about a sanitizer tool. The point where the error is detected is not necessarily telling you which bit of code is at fault. In this case, the error occurs in the safe code, but it seems clear that the fault lies in the unsafe block in innocent_looking_fn(). That function was supposed to present a safe interface, but it failed to do so. Unfortunately, for us to figure that out, we have to trawl through the code, executing backwards and trying to figure out how this freed pointer got into the variable b. Speaking as someone who has spent years of his life doing exactly that, I can tell you it is not fun. Anything we can do to get a more precise notion of what code is at fault would be tremendously helpful.

It turns out that there is a large body of academic work that I think could be quite helpful here. For some time, people have been exploring gradual typing systems. This is usually aimed at the software development process: people want to be able to start out with a dynamically typed bit of software, and then add types gradually. But it turns out when you do this, you have a similar problem: your statically typed code is guaranteed to be internally consistent, but the dynamically typed code might well feed it values of the wrong types. To address this, blame systems attempt to track where you crossed between the static and dynamic typing worlds so that, when an error occurs, the system can tell you which bit of code is at fault.

Traditionally this blame tracking has been done using proxies and other dynamic mechanisms, particularly around closures. For example, Jesse Tov’s Alms language allocated stateful proxies to allow for owned types to flow into a language that didn’t understand ownership (this is sort of roughly analogous to dynamically wrapping a value in a RefCell). Unfortunately, introducing proxies doesn’t seem like it would really work so well for a “no runtime” language like Rust. We could probably get away with it in miri, but it would never scale to running arbitrary C code.

Interestingly, at this year’s POPL, I saw a paper that seemed to present a solution to this problem. In Big types in little runtime, Michael Vitousek, Cameron Swords (ex-Rust intern!), and Jeremy Siek describe a system for doing gradual typing in Python that works even without modifying the Python runtime – this rules out proxies, because the runtime would have to know about them. Instead, the statically typed code keeps a log “on the side” which tracks transitions to and from the unsafe code and other important events. When a fault occurs, they can read this log and reconstruct which bit of code is at fault. This seems eminently applicable to this setting: we have control over the safe Rust code (which we are compiling in a special mode), but we don’t have to modify the unsafe code (which might be in Rust, but might also be in C). Exciting!
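As a rough illustration of the idea (hypothetical types, not the paper’s actual design), the safe side could append boundary-crossing events to a side log, and on a fault, blame the most recent unsafe region it passed through:

```rust
// Events recorded by instrumented safe code at the safe/unsafe boundary.
enum Event {
    EnteredUnsafe { func: &'static str },
    ExitedUnsafe { func: &'static str },
}

struct BlameLog {
    events: Vec<Event>,
}

impl BlameLog {
    fn new() -> Self {
        BlameLog { events: Vec::new() }
    }

    fn enter(&mut self, func: &'static str) {
        self.events.push(Event::EnteredUnsafe { func });
    }

    fn exit(&mut self, func: &'static str) {
        self.events.push(Event::ExitedUnsafe { func });
    }

    // On a fault, walk the log backwards and blame the most recent
    // unsafe region, rather than the (possibly safe) crash site.
    fn blame(&self) -> Option<&'static str> {
        self.events.iter().rev().find_map(|e| match e {
            Event::EnteredUnsafe { func } => Some(*func),
            _ => None,
        })
    }
}
```

In the example from earlier, the log would point at `innocent_looking_fn` even though the fault fires later in `main`.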

### Conclusion

This post has two purposes, in a way. First, I want to advocate for the idea that we should define the unsafe code guidelines in an executable way. Specifically, I think we should specify predicates that must hold at various points in the execution. In this post we saw a simple example: when you dereference a pointer, it must point to memory that has been allocated and not yet freed. (Note that this particular rule only applies to the moment at which the pointer is dereferenced; at other times, the pointer can have any value you want, though it may wind up being restricted by other rules.) It’s much more interesting to think about assertions that could be used to enforce Rust’s aliasing rules, but that’s a good topic for another post.

Probably the best way for us to do this is to start out with a minimal “operational semantics” for a representative subset of MIR (basically a mathematical description of what MIR does) and then specify rules by adding side-clauses and conditions into that semantics. I have been talking to some people who might be interested in doing that, so I hope to see progress here.

That said, it may be that we can instead do this exploratory work by editing miri. The codebase seems pretty clean and capable, and a lot of the base work is done.

In the long term, I expect we will want to instead target a platform like valgrind, which would allow us to apply these rules even to unsafe C code. I’m not sure if that’s really feasible, but it seems like the ideal.

The second purpose of the post is to note the connection with gradual typing and the opportunity to apply blame research to the problem. I am very excited about this, because I’ve always felt that guidelines based simply on undefined behavior were going to be difficult for people to use, since errors are often detected in code that is quite disconnected from the origin of the problem.

### Planet Mozilla — An Overview of Asia Tech Conferences in 2017

I’ve been attending and even speaking at tech conferences for some time. One of the challenges is to keep track of when those conferences will take place. Also, there is no single list of all the conferences I’m interested in. There are some websites that collect them, but they often miss community-organized events in Asia. And there are some community-maintained lists of open source conferences (Thanks Barney!), but they don’t include for-profit conferences.

Therefore I built a simple website that collects all the conferences I know of in Asia, focusing on open source software, the web, and startups:

### The Technology Stack

Since I don’t really need dynamically generated content, I use the Jekyll static site generator. For the look and feel, I use the Material Design Lite (MDL) CSS framework. (I did try other material design frameworks like Materialize or MUI, but MDL is the most mature and clean one I could find.)

One of the challenges is to provide the list in different languages. I found a plugin-free way to make Jekyll support i18n (internationalization). The essence is to create language-specific sub-directories like en/index.md and zh_tw/index.md, then put all language-specific strings in those index.md files. One pitfall is that by adding another level of directory, the relative paths (e.g. paths to CSS and JS files) might not work, so you might need to use absolute paths instead. For the Traditional and Simplified Chinese translations, I’m too lazy to maintain two copies of the data, so I use a JavaScript snippet to do the translation on-the-fly.

### How to Contribute

If you know any conference, meetup or event that should be on the list, please feel free to drop an email to asia.confs@gmail.com. Or you can create a pull request or file an issue in our GitHub repo.

Enjoy the conferences and Happy Chinese New Year!

### Planet Mozilla — Adding CSP to bugzilla.mozilla.org

We're about to enable a Content Security Policy (CSP) on bugzilla.mozilla.org. CSP will mitigate several types of attack on our users and our site, including Cross-Site Request Forgery (XSRF) and Cross-Site Scripting (XSS).

The first place we're deploying this is in the bug detail page in the new Modal view (which, you may recall, we're making the default view) with a goal for the site to have complete CSP coverage.

As a side-effect of this work, CSP may break add-ons that modify the bug detail page. If we have broken something of yours, we can quickly fix it. We're already enabling the Socorro Lens add-on. You can see how that was addressed.

WebExtensions can modify the DOM of a bug detail page through content.js. Add-ons and WebExtensions will not be able to load resources from third parties into the bug detail page unless we make an exception for you.

Long term, if you have a feature from an add-on you'd like to make part of BMO, please seek me out on irc://irc.mozilla.org/bteam or open a new ticket in the bugzilla.mozilla.org product in Bugzilla and set the severity to 'enhancement'.

ETA: clarify what an add-on or WebExtension is allowed to do. Thanks to the WebExtensions team for answering questions on IRC tonight.

### Planet Mozilla — 45.7.0 available for realsies

Let's try that again. TenFourFox 45.7.0 is now available for testing ... again (same download location, same release notes, new hashes), and as before, will go live late Monday evening if I haven't been flooded out of my house by the torrential rains we've been getting in currently-not-so-Sunny So Cal. You may wish to verify you got the correct version by manually checking the hash on the off-chance the mirrors are serving the old binaries.

## January 20, 2017

### Planet Mozilla — Migrating to WebExtensions: port your stored data

WebExtensions are the new standard for add-on development in Firefox, and will be the only supported type of extension in release versions of Firefox later this year. Starting in Firefox 57, which is scheduled to arrive in November 2017, extensions other than WebExtensions will not load, and developers should be preparing to migrate their legacy extensions to WebExtensions.

If you have a legacy extension that writes data to the filesystem, and you’re planning to port it to WebExtensions, Embedded WebExtensions are available now in Firefox 51 to help you transition. Embedded WebExtensions can be used to transfer the stored data of your add-on to a format that can be used by WebExtensions. This is essential because it lets you convert your users without the need for them to take any action.

### What is an Embedded WebExtension?

An Embedded WebExtension is an extension that combines two types of extensions in one, by incorporating a WebExtension inside of a bootstrapped or SDK extension.

### Why use an Embedded WebExtension?

Legacy add-ons can store information in ways that are not available to WebExtensions: user preferences, arbitrary file system access for storing assets, configuration information, stateful information, and others. If your add-on makes use of functionality like this to store information, you can use an Embedded WebExtension to access your legacy add-on data and move it over to a WebExtension. The earlier you do this, the more likely it is that all your users will transition over smoothly.

It’s important to emphasize that Embedded WebExtensions are intended to be a transition tool, and will not be supported past Firefox 57. They should not be used for add-ons that are not expected to transition to WebExtensions.

### How do I define an Embedded WebExtension?

https://github.com/mdn/webextensions-examples/tree/master/embedded-webextension-sdk

### Planet Mozilla — Nightlies in TaskCluster - go team!

As catlee has already mentioned, yesterday we shipped the first nightly builds for Linux and Android off our next-gen Mozilla continuous integration (CI) system known as TaskCluster. I eventually want to talk more about why this is important and how we got here, but for now I’d like to highlight some of the people who made this possible.

Thanks to Aki’s meticulous work planning and executing on a new chain of trust (CoT) model, the nightly builds we now ship on TaskCluster are arguably more secure than our betas and releases. Don’t worry though, we’re hard at work porting the chain of trust to our release pipeline. Jordan and Mihai tag-teamed the work to get the chain-of-trust-enabled workers doing important things like serving updates and putting binaries in the proper spots. Kim did the lion’s share of the work getting our task graphs sorted to tie together the disparate pieces. Callek wrangled all of the l10n bits. On the testing side, gbrown did some heroic work getting reliable test images setup for our Linux platforms. Finally, I’d be remiss if I didn’t also call out Dustin who kept us all on track with his migration tracker and who provided a great deal of general TaskCluster platform support.

Truly it was a team effort, and thanks to all of you for making this particular milestone happen. Onward to Mac, Windows, and release promotion!

### Planet Mozilla — Communicating the Dangers of Non-Secure HTTP

HTTPS, the secure variant of the HTTP protocol, has long been a staple of the modern Web. It creates secure connections by providing authentication and encryption between a browser and the associated web server. HTTPS helps keep you safe from eavesdropping and tampering when doing everything from online banking to communicating with your friends. This is important because over a regular HTTP connection, someone else on the network can read or modify the website before you see it, putting you at risk.

To keep users safe online, we would like to see all developers use HTTPS for their websites. Using HTTPS is now easier than ever. Amazing progress in HTTPS adoption has been made, with a substantial portion of web traffic now secured by HTTPS:

### Changes to Firefox security user experience
Up until now, Firefox has used a green lock icon in the address bar to indicate when a website is using HTTPS and a neutral indicator (no lock icon) when a website is not using HTTPS. The green lock icon indicates that the site is using a secure connection.

Current secure (HTTPS) connection

Current non-secure (HTTP) connection

In order to clearly highlight risk to the user, starting this month in Firefox 51 web pages which collect passwords but don’t use HTTPS will display a grey lock icon with a red strike-through in the address bar.

Clicking on the “i” icon will show the text, “Connection is Not Secure” and “Logins entered on this page could be compromised”.

This has been the user experience in Firefox Dev Edition since January 2016. Since then, the percentage of login forms detected by Firefox that are fully secured with HTTPS has increased from nearly 40% to nearly 70%, and the number of HTTPS pages overall has also increased by 10%, as you can see in the graph above.

In upcoming releases, Firefox will show an in-context message when a user clicks into a username or password field on a page that doesn’t use HTTPS.  That message will show the same grey lock icon with red strike-through, accompanied by a similar message, “This connection is not secure. Logins entered here could be compromised.”:

In-context warning for a password field on a page that doesn’t use HTTPS

### What to expect in the future
To continue to promote the use of HTTPS and properly convey the risks to users, Firefox will eventually display the struck-through lock icon for all pages that don’t use HTTPS, to make clear that they are not secure. As our plans evolve, we will continue to post updates but our hope is that all developers are encouraged by these changes to take the necessary steps to protect users of the Web through HTTPS.

### Thanks!
Thank you to the engineering, user experience, user research, quality assurance, and product teams that helped make this happen – Sean Lee, Tim Guan-tin Chien, Paolo Amadini, Johann Hofmann, Jonathan Kingston, Dale Harvey, Ryan Feeley, Philipp Sackl, Tyler Downer, Adrian Florinescu, and Richard Barnes. And a very special thank you to Matthew Noorenberghe, without whom this would not have been possible.

### Planet Mozilla — What is participation design anyway?

As part of our insights phase for Diversity & Inclusion for Participation at Mozilla, we’ve identified ‘Participation Design’ as one of 5 important topics for focus group discussion. Here is how I describe Participation Design (and thanks to Paul for the question):

Participation design is the framework(s) we use to generate contribution opportunities that empower volunteers to ….

• Recognize, embrace, and personalize the opportunity of lending time and skills to a project at Mozilla – technical and non-technical.
• Understand the steps they need to take to be successful and engaged at a very basic level.  (task trackers, chat rooms, blogs, newsletters, wikis).
• Complete a contribution with success on project goals, and value to the volunteer.
• Grow in skills, knowledge and influence as community members, and leaders/mobilizers at Mozilla and in the broader open source community.

In our focus group for this topic, we’ll explore from both contributor and maintainer perspectives – what it means to design for participation for diversity, equality and inclusion.   If you want to know more about how focus groups work – here’s a great resource.

If you, or someone you know from Mozilla past, present or future, has insights, experience, and vision for inclusive participation design, please nominate them! (and select the topic ‘Participation Design’).

### Planet Mozilla — Webdev Beer and Tell: January 2017

Once a month web developers across the Mozilla community get together (in person and virtually) to share what cool stuff we've been working on in...

### Planet WebKit — Introducing Riptide: WebKit’s Retreating Wavefront Concurrent Garbage Collector

As of r209827, 64-bit ARM and x86 WebKit ports use a new garbage collector called Riptide. Riptide reduces worst-case pause times by allowing the app to run concurrently to the collector. This can make a big difference for responsiveness since garbage collection can easily take 10 ms or more, even on fast hardware. Riptide improves WebKit’s performance on the JetStream/splay-latency test by 5x, which leads to a 5% improvement on JetStream. Riptide also improves our Octane performance. We hope that Riptide will help to reduce the severity of GC pauses for many different kinds of applications.

This post begins with a brief background about concurrent GC (garbage collection). Then it describes the Riptide algorithm in detail, including the mature WebKit GC foundation, on which it is built. The field of incremental and concurrent GC goes back a long time and WebKit is not the first system to use it, so this post has a section about how Riptide fits into the related work. This post concludes with performance data.

## Introduction

Garbage collection is expensive. In the worst case, for the collector to free a single object, it needs to scan the entire heap to ensure that no objects have any references to the one it wants to free. Traditional collectors scan the entire heap periodically, and this is roughly how WebKit’s collector has worked since the beginning.

The problem with this approach is that the GC pause can be long enough to cause rendering loops to miss frames, or in some cases it can even take so long as to manifest as a spin. This is a well-understood computer science problem. The originally proposed solution for janky GC pauses, by Guy Steele in 1975, was to have one CPU run the app and another CPU run the collector. This involves gnarly race conditions that Steele solved with a bunch of locks. Later algorithms like Baker’s were incremental: they assumed that there was one CPU, and sometimes the application would call into the collector but only for bounded increments of work. Since then, a huge variety of incremental and concurrent techniques have been explored. Incremental collectors avoid some synchronization overhead, but concurrent collectors scale better. Modern concurrent collectors like DLG (short for Doligez, Leroy, Gonthier, published in POPL ’93 and ’94) have very cheap synchronization and almost completely avoid pausing the application. Taking garbage collection off-core rather than merely shortening the pauses is the direction we want to take in WebKit, since almost all of the devices WebKit runs on have more than one core.

The goal of WebKit’s new Riptide concurrent GC is to achieve a big reduction in GC pauses by running most of the collector off the main thread. Because Riptide will be our always-on default GC, we also want it to be as efficient — in terms of speed and memory — as our previous collector.

## The Riptide Algorithm

The Riptide collector combines:

• Marking: The collector marks objects as it finds references to them. Objects not marked are deleted. Most of the collector’s time is spent visiting objects to find references to other objects.
• Constraints: The collector allows the runtime to supply additional constraints on when objects should be marked, to support custom object lifetime rules.
• Parallelism: Marking is parallelized on up to eight logical CPUs. (We limit to eight because we have not optimized it for more CPUs.)
• Generations: The collector lets the mark state of objects stick if memory is plentiful, allowing the next collection to skip visiting those objects. Sticky mark bits are a common way of implementing generational collection without copying. Collection cycles that let mark bits stick are called eden collections in WebKit.
• Concurrency: Most of the collector’s marking phase runs concurrently to the program. Because this is by far the longest part of collection, the remaining pauses tend to be 1 ms or less. Riptide’s concurrency features kick in for both eden and full collections.
• Conservatism: The collector scans the stack and registers conservatively, that is, checking each word to see if it is in the bounds of some object and then marking it if it is. This means that all of the C++, assembly, and just-in-time (JIT) compiler-generated code in our system can store heap pointers in local variables without any hassles.
• Efficiency: This is our always-on garbage collector. It has to be fast.

This section describes how the collector works. The first part of the algorithm description focuses on the WebKit mark-sweep algorithm on which Riptide is based. Then we dive into concurrency and how Riptide manages to walk the heap while the heap is in flux.

### Efficient Mark-Sweep

Riptide retains most of the basic architecture of WebKit’s mature garbage collection code. This section gives an overview of how our mark-sweep collector works: WebKit uses a simple segregated storage heap structure. The DOM, the Objective-C API, the type inference runtime, and the compilers all introduce custom marking constraints, which the GC executes to fixpoint. Marking is done in parallel to maximize throughput. Generational collection is important, so WebKit implements it using sticky mark bits. The collector uses conservative stack scanning to ease integration with the rest of WebKit.

#### Simple Segregated Storage

WebKit has long used the simple segregated storage heap structure for small and medium-sized objects (up to about 8KB):

• Small and medium-sized objects are allocated from segregated free lists. Given a desired object size, we perform a table lookup to find the appropriate free list and then pop the first object from this list. The lookup table is usually constant-folded by the compiler.
• Memory is divided into 16KB blocks. Each block contains cells. All cells in a block have the same cell size, called the block’s size class. In WebKit jargon, an object is a cell whose JavaScript type is “object”. For example, a string is a cell but not an object. The GC literature would typically use object to refer to what our code would call a cell. Since this post is not really concerned with JavaScript types, we’ll use the term object to mean any cell in our heap.
• At any time, the active free list for a size class contains only objects from a single block. When we run out of objects in a free list, we find the next block in that size class and sweep it to give it a free list.
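A sketch of the size-class lookup (illustrative constants and types, not WebKit’s actual values or code): round the requested size up to a size class, then pop from that class’s free list:

```rust
// Illustrative granularity: size classes at multiples of 16 bytes.
const CELL_GRANULARITY: usize = 16;
const NUM_CLASSES: usize = 8;

struct SegregatedHeap {
    // One free list of cell addresses per size class.
    free_lists: [Vec<usize>; NUM_CLASSES],
}

impl SegregatedHeap {
    fn new() -> Self {
        SegregatedHeap { free_lists: std::array::from_fn(|_| Vec::new()) }
    }

    // Round the requested size up to the granularity, then index
    // by multiple. In a real allocator this table lookup is often
    // constant-folded by the compiler.
    fn size_class(size: usize) -> usize {
        (size + CELL_GRANULARITY - 1) / CELL_GRANULARITY - 1
    }

    fn allocate(&mut self, size: usize) -> Option<usize> {
        let class = Self::size_class(size);
        self.free_lists[class].pop()
    }
}
```

The fast path is just a table lookup and a pop, which is what makes segregated storage attractive for an always-on allocator.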

Sweeping is incremental in the sense that we only sweep a block just before allocating in it. In WebKit, we optimize sweeping further with a hybrid bump-pointer/free-list allocator we call bump’n’pop (here it is in C++ and in the compilers). A per-block bit tells the sweeper if the block is completely empty. If it is, the sweeper will set up a bump-pointer arena over the whole block rather than constructing a free-list. Bump-pointer arenas can be set up in O(1) time while building a free-list is a O(n) operation. Bump’n’pop achieves a big speed-up on programs that allocate a lot because it avoids the sweep for totally-empty blocks. Bump’n’pop’s bump-allocator always bumps by the block’s cell size to make it look like the objects had been allocated from the free list. This preserves the block’s membership in its size class.
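A simplified sketch of the bump’n’pop decision (not WebKit’s actual C++): a known-empty block gets an O(1) bump arena that advances by the cell size, while a partially full block pops from the free list the sweeper built:

```rust
// A block is either a bump arena (it was completely empty) or a
// free list built by the sweeper (it was partially full).
enum BlockState {
    Bump { next: usize, end: usize, cell_size: usize },
    FreeList { cells: Vec<usize> },
}

fn allocate(state: &mut BlockState) -> Option<usize> {
    match state {
        // Empty block: bump by the cell size so the result looks as
        // if it had been popped from a free list, preserving the
        // block's size class.
        BlockState::Bump { next, end, cell_size } => {
            if *next + *cell_size <= *end {
                let cell = *next;
                *next += *cell_size;
                Some(cell)
            } else {
                None
            }
        }
        // Partially full block: pop the sweeper's free list.
        BlockState::FreeList { cells } => cells.pop(),
    }
}
```

The win is on the setup side: a bump arena costs O(1) to establish, while building a free list is O(n) in the number of cells, so allocation-heavy programs that empty whole blocks skip the sweep entirely.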

Large objects (larger than about 8KB) are allocated using malloc.

#### Constraint-Based Marking

Garbage collection is ordinarily a graph search problem and the heap is ordinarily just a graph: the roots are the local variables, their values are directional edges that point to objects, and those objects have fields that each create edges to some other objects. WebKit’s garbage collector also allows the DOM, compiler, and type inference system to install constraint callbacks. These constraints are allowed to query which objects are marked and they are allowed to mark objects. The WebKit GC algorithm executes these constraints to fixpoint. GC termination happens when all marked objects have been visited and none of the constraints want to mark any more objects. In practice, the constraint-solving part of the fixpoint takes up a tiny fraction of the total time. Most of the time in GC is spent performing a depth-first search over marked objects that we call draining.

#### Parallel Draining

Draining takes up most of the collector’s time. One of our oldest collector optimizations is that draining is parallelized. The collector has a draining thread on each CPU. Each draining thread has its own worklist of objects to visit, and ordinarily it runs a graph search algorithm that only sees this worklist. Using a local worklist means avoiding worklist synchronization most of the time. Each draining thread will check in with a global worklist under these conditions:

• It runs out of work. When a thread runs out of work, it will try to steal 1/Nth of the global worklist where N is the number of idle draining threads. This means acquiring the global worklist’s lock.
• Every 100 objects visited, the draining thread will consider donating about half of its worklist to the global worklist. It will only do this if the global worklist is empty, the global worklist lock can be acquired without blocking, and the local worklist has at least two entries.
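
The donation rule in the second bullet can be sketched like this. The names are hypothetical and the worklists hold `int` placeholders rather than real GC objects:

```cpp
#include <deque>
#include <mutex>

struct GlobalWorklist {
    std::mutex lock;
    std::deque<int> objects;
};

// Called periodically (e.g. every 100 objects visited): donate about half of
// the local worklist, but only if the global list is empty, its lock can be
// taken without blocking, and we would keep at least one entry for ourselves.
inline void maybeDonate(std::deque<int>& local, GlobalWorklist& global) {
    if (local.size() < 2)
        return;
    std::unique_lock<std::mutex> guard(global.lock, std::try_to_lock);
    if (!guard.owns_lock() || !global.objects.empty())
        return;
    auto toDonate = local.size() / 2;
    while (toDonate--) {
        global.objects.push_back(local.back());
        local.pop_back();
    }
}
```

The `try_to_lock` acquisition is what keeps the common case synchronization-free: a busy global lock simply means we skip donating this time.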

This algorithm appears to scale nicely to about eight cores, which is good enough for the kinds of systems that WebKit usually runs on.

Draining in parallel means having to synchronize marking. Our marking algorithm uses a lock-free CAS (atomic compare-and-swap instruction) loop to set mark bits.
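
A mark-bit CAS loop of this shape might look like the following sketch, assuming mark bits are packed into 64-bit words (the exact layout in WebKit differs):

```cpp
#include <atomic>
#include <cstdint>

// Returns true if this thread won the race to mark the object at bitIndex,
// false if some other thread had already marked it.
inline bool tryMark(std::atomic<uint64_t>& markBits, unsigned bitIndex) {
    uint64_t mask = uint64_t(1) << bitIndex;
    uint64_t oldBits = markBits.load(std::memory_order_relaxed);
    for (;;) {
        if (oldBits & mask)
            return false; // already marked; don't visit twice
        // On failure, compare_exchange_weak reloads oldBits for the retry.
        if (markBits.compare_exchange_weak(oldBits, oldBits | mask))
            return true;
    }
}
```

Exactly one of the racing drain threads wins the CAS, so each object is pushed onto a worklist at most once.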

#### Sticky Mark Bits

Generational garbage collection is a classic throughput optimization, pioneered by Lieberman and Hewitt, and by Ungar. It assumes that recently allocated objects are unlikely to survive. Therefore, focusing the collector on objects that were allocated since the last GC is likely to free up lots of memory — almost as much as if we collected the whole heap. Generational collectors track the generation of objects: either young or old. Generational collectors have (at least) two modes: eden collection that only collects young objects and full collection that collects all objects. During an eden collection, old objects are only visited if they are suspected to contain pointers to new objects.

Generational collectors need to overcome two hurdles: how to track the generation of objects, and how to figure out which old objects have pointers to new objects.

The collector needs to know the generation of objects in order to determine which objects can be safely ignored during marking. In a traditional generational collector, eden collections move objects and then use the object’s address to determine its generation. Our collector does not move objects. Instead, it uses the mark bit to also track generation. Quite simply, we don’t clear any mark bits at the start of an eden collection. The marking algorithm will already ignore objects that have their mark bits set. This is called sticky mark bit generational garbage collection.
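
The sticky mark-bit trick boils down to what the collector does at the start of a cycle. Here is a minimal sketch with a hypothetical `StickyHeap` (real mark bits live in block headers and are cleared logically via versioning):

```cpp
#include <cstddef>
#include <vector>

struct StickyHeap {
    std::vector<bool> markBits; // one bit per object, indexed by object id

    void startCollection(bool isFullCollection) {
        if (isFullCollection)
            markBits.assign(markBits.size(), false); // fresh marks for everyone
        // Eden collection: leave mark bits alone. Marking already skips
        // objects whose bit is set, so old (marked) objects are ignored
        // for free — the mark bit doubles as the generation bit.
    }

    bool shouldVisit(std::size_t objectId) const { return !markBits[objectId]; }
};
```

Because an eden collection never clears bits, survivors of earlier collections stay "sticky" and are treated as old without any extra bookkeeping.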

The collector will avoid visiting old objects during an eden collection. But it cannot avoid all of them: if an old object has pointers to new objects, then the collector needs to know to visit that old object. We use a write barrier — a small piece of instrumentation that executes after every write to an object — that tells the GC about writes to old objects. In order to cheaply know which objects are old, the object header also has a copy of the object’s state: either it is old or it is new. Objects are allocated new and labeled old when marked. When the write barrier detects a write to an old object, we tell the GC by setting the object’s state to old-but-remembered and putting it on the mark stack. We use separate mark stacks for objects marked by the write barrier, so when we visit the object, we know whether we are visiting it due to the barrier or because of normal marking (i.e. for the first time). Some accounting only needs to happen when visiting the object for the first time. The complete barrier is simply:

```cpp
object->field = newValue;
if (object->cellState == Old)
    remember(object);
```


Generational garbage collection is an enormous improvement in performance on programs that allocate a lot, which is common in JavaScript. Many new JavaScript features, like iterators, arrow functions, spread, and for-of allocate lots of objects and these objects die almost immediately. Generational GC means that our collector does not need to visit all of the old objects just to delete the short-lived garbage.

#### Conservative Roots

Garbage collection begins by looking at local variables and some global state to figure out the initial set of marked objects. Introspecting the values of local variables is tricky. WebKit uses C++ local variables for pointers to the garbage collector’s heap, but C-like languages provide no facility for precisely introspecting the values of specific variables of arbitrary stack frames. WebKit solves this problem by marking objects conservatively when scanning roots. We use the simple segregated storage heap structure in part because it makes it easy to ask whether an arbitrary bit pattern could possibly be a pointer to some object.
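
The "could this bit pattern be a pointer?" test that simple segregated storage enables might look like this sketch (hypothetical `Block` layout; the real check also consults per-block metadata):

```cpp
#include <cstdint>

struct Block {
    uintptr_t payloadBegin; // first byte of the block's object payload
    uintptr_t payloadEnd;   // one past the last byte
    uintptr_t cellSize;     // every object in this block has this size
};

// Conservative root scanning: an arbitrary word is a candidate pointer only
// if it lands inside a known block and is aligned to that block's cell size.
inline bool couldBeObjectPointer(uintptr_t candidate, const Block& block) {
    if (candidate < block.payloadBegin || candidate >= block.payloadEnd)
        return false;
    return (candidate - block.payloadBegin) % block.cellSize == 0;
}
```

Because every cell in a block has the same size, membership and alignment are the whole test — no per-object lookup is needed.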

We view this as an important optimization. Without conservative root scanning, C++ code would have to use some API to notify the collector about what objects it points to. Conservative root scanning means not having to do any of that work.

#### Mark-Sweep Summary

Riptide implements complex notions of reachability via arbitrary constraint callbacks and allows C++ code to manipulate objects directly. For performance, it parallelizes marking and uses generations to reduce the average amount of marking work.

### Handling Concurrency

Riptide makes the draining phase of garbage collection concurrent. This works because of a combination of concurrency features:

• Riptide is able to stop the world for certain tricky operations like stack scanning and DOM constraint solving.
• Riptide uses a retreating wavefront write barrier to manage races between marking and object mutation. Using retreating wavefront allows us to avoid any impedance mismatch between generational and concurrent collector optimizations.
• Retreating wavefront collectors can suffer from the risk of GC death spirals, so Riptide uses a space-time scheduler to put that in check.
• Visiting an object while it is being reshaped is particularly hard, and WebKit reshapes objects as part of type inference. We use an obstruction-free double collect snapshot to ensure that the collector never marks garbage memory due to a visit-reshape race.
• Lots of objects have tricky races that aren’t on the critical path, so we put a fast, adaptive, and fair lock in every JavaScript object as a handy way to manage them. It fits in two otherwise unused bits.

While we wrote Riptide for WebKit, we suspect that the underlying intuitions could be useful for anyone wanting to write a concurrent, generational, parallel, conservative, and non-copying collector. This section describes Riptide in detail.

#### Stopping The World and Safepoints

Riptide does draining concurrently. It is a goal to eventually make other phases of the collector concurrent as well. But so long as some phases are not safe to run concurrently, we need to be able to bring the application to a stop before performing those phases. The place where the collector stops needs to be picked so as to avoid reentrancy issues: for example, stopping to run the GC in the middle of the GC’s allocator would create subtle problems. The concurrent GC avoids these problems by only stopping the application at those points where the application would trigger a GC. We call these safepoints. When the collector brings the application to a safepoint, we say that it is stopping the world.
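
A stop-the-world handshake of this shape can be sketched as follows. This is a generic, hypothetical coordinator, not Riptide’s actual mechanism; the important property is that the application only polls at points where it could itself trigger a GC, which sidesteps reentrancy:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

struct SafepointCoordinator {
    std::atomic<bool> stopRequested{false};
    std::mutex lock;
    std::condition_variable resumed;

    // Collector side: stop and resume the world.
    void requestStop() { stopRequested.store(true); }
    void resumeWorld() {
        {
            std::lock_guard<std::mutex> guard(lock);
            stopRequested.store(false);
        }
        resumed.notify_all();
    }

    // Application side: called only at safepoints (GC trigger points,
    // e.g. the allocator's slow path), never from arbitrary code.
    void pollSafepoint() {
        if (!stopRequested.load(std::memory_order_relaxed))
            return; // fast path: nobody wants us to stop
        std::unique_lock<std::mutex> guard(lock);
        resumed.wait(guard, [&] { return !stopRequested.load(); });
    }
};
```

The fast path is a single relaxed load, so polling at every potential GC trigger point stays cheap.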

Riptide currently stops the world for most of the constraint fixpoint, and resumes the world for draining. After draining finishes, the world is again stopped. A typical collection cycle may have many stop-resume cycles.

#### Retreating Wavefront

Draining concurrently means that just as we finish visiting some object, the application may store to one of its fields. We could store a pointer to an unmarked object into an object that is already visited, in which case the collector might never find that unmarked object. If we don’t do something about this, the collector would be sure to prematurely delete objects due to races with the application. Concurrent garbage collectors avoid this problem using write barriers. This section describes Riptide’s write barrier.

Write barriers ensure that the state of the collector is still valid after any race, either by marking objects or by having objects revisited (GC Handbook, chapter 15). Marking objects helps the collector make forward progress; intuitively, it is like advancing the collector’s wavefront. Having objects revisited retreats the wavefront. The literature is full of concurrent GC algorithms, like the Metronome, C4, and DLG, that all use some kind of advancing wavefront write barrier. The simplest such barrier is Dijkstra’s, which marks objects anytime a reference to them is created. I used these kinds of barriers in my past work because they make it easy to make the collector very deterministic. Adding one of those barriers to WebKit would be likely to create some performance overhead since this means adding new code to every write to the heap. But the retreating wavefront barrier, originally invented by Guy Steele in 1975, works on exactly the same principle as our existing generational barrier. This allows Riptide to achieve zero barrier overhead by reusing WebKit’s existing barrier.

It’s easiest to appreciate the similarity by looking at some barrier code. Our old generational barrier looked like this:

```cpp
object->field = newValue;
if (object->cellState == Old)
    remember(object);
```


Steele’s retreating wavefront barrier looks like this:

```cpp
object->field = newValue;
if (object->cellState == Black)
    revisit(object);
```


Retreating wavefront barriers operate on the same principle as generational barriers, so it’s possible to use the same barrier for both. The only difference is the terminology. The black state means that the collector has already visited the object. This barrier tells the collector to revisit the object if its cellState tells us that the collector had already visited it. This state is part of the classic tri-color abstraction: white means that the GC hasn’t marked the object, grey means that the object is marked and on the mark stack, and black means that the object is marked and has been visited (so is not on the mark stack anymore). In Riptide, the tri-color states that are relevant to concurrency (white, grey, black) perfectly overlap with the sticky mark-bit states that are relevant to generations (new, remembered, old). The Riptide cell states are as follows:

• DefinitelyWhite: the object is new and white.
• PossiblyGrey: the object is grey, or remembered, or new and white.
• PossiblyBlack: the object is black and old, or grey, or remembered, or new and white.

A naive combined generational/concurrent barrier might look like this:

```cpp
object->field = newValue;
if (object->cellState == PossiblyBlack)
    slowPath(object);
```


This turns out to need tweaking to work. The PossiblyBlack state is too ambiguous, so the slowPath needs additional logic to work out what the object’s state really was. Also, the order of execution matters: the CPU must run the object->cellState load after it runs the object->field store. That’s hard, since CPUs don’t like to obey store-before-load orderings. Finally, we need to guarantee that the barrier cannot retreat the wavefront too much.
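
One textbook way to get the store-before-load ordering is a full (StoreLoad) fence between the field store and the cellState load. The sketch below is illustrative of that general technique, not necessarily how Riptide solves it; `slowPath` here is a stand-in that the real barrier would replace with the mark-bit logic described next:

```cpp
#include <atomic>
#include <cstdint>

enum CellState : uint8_t { DefinitelyWhite, PossiblyGrey, PossiblyBlack };

struct Cell {
    std::atomic<CellState> cellState{DefinitelyWhite};
    std::atomic<Cell*> field{nullptr};
};

int slowPathCalls = 0;
void slowPath(Cell*) { ++slowPathCalls; } // stand-in for the real slow path

inline void barrieredStore(Cell* object, Cell* newValue) {
    object->field.store(newValue, std::memory_order_relaxed);
    // StoreLoad fence: force the field store to be ordered before the
    // cellState load, which plain x86/ARM store-load pairs do not guarantee.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (object->cellState.load(std::memory_order_relaxed) == PossiblyBlack)
        slowPath(object);
}
```

A full fence on every heap write would be expensive, which is exactly why this ordering requirement is called out as one of the hard parts.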

##### Disambiguating Object State

The GC uses the combination of the object’s mark bit in the block header and the cellState byte in the object’s header to determine the object’s state. The GC clears mark bits at the start of full collection, and it sets the cellState during marking and barriers. It doesn’t reset objects’ cellStates back to DefinitelyWhite at the start of a full collection, because it’s possible to infer that the cellState should have been reset by looking at the mark bit. It’s important that the collector never scans the heap to clear marking state, and even mark bits are logically cleared using versioning. If an object is PossiblyBlack or PossiblyGrey and its mark bit is logically clear, then this means that the object is really white. Riptide’s barrier slowPath is almost like our old generational slow path but it has a new check: it will not do anything if the mark bit of the target object is not set, since this means that we’re in the middle of a GC and the object is actually white. Additionally, the barrier will attempt to set the object back to DefinitelyWhite so that the slow path does not have to see the object again (at least not until it’s marked and visited).
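
The disambiguation logic can be sketched like this. The `Object` layout is hypothetical (in WebKit the mark bit lives in the block header and is cleared logically via versioning), but the decision mirrors the paragraph above:

```cpp
#include <cstdint>

enum CellState : uint8_t { DefinitelyWhite, PossiblyGrey, PossiblyBlack };

struct Object {
    CellState cellState;
    bool markBit; // logically clear means the object hasn't been marked
};

// Barrier slow path: returns true if the GC must be told about this write.
inline bool barrierSlowPath(Object& object) {
    if (!object.markBit) {
        // We're mid-collection and the object was never marked, so despite
        // its ambiguous cellState it is really white. Reset it so the fast
        // path won't send it here again until it's marked and visited.
        object.cellState = DefinitelyWhite;
        return false;
    }
    return true; // genuinely visited: remember/revisit the object
}
```

This is why the ambiguous PossiblyBlack state is safe: the mark bit breaks the tie, and resetting to DefinitelyWhite keeps the fast path fast.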