Planet Mozilla: Assigning blame to unsafe code

While I was at POPL the last few days, I was reminded of an idea regarding how to bring more structure to the unsafe code guidelines process that I’ve been kicking around lately, but which I have yet to write about publicly. The idea is fresh on my mind because while at POPL I realized that there is an interesting opportunity to leverage the “blame” calculation techniques from gradual typing research. But before I get to blame, let me back up and give some context.

The guidelines should be executable

I’ve been thinking for some time that, whatever guidelines we choose, we need to adopt the principle that they should be automatically testable. By this I mean that we should be able to compile your program in a special mode (“sanitizer mode”) which adds in extra assertions and checks. These checks would dynamically monitor what your program does to see if it invokes undefined behavior: if they detect UB, then they will abort your program with an error message.

Plenty of sanitizers or sanitizer-like things exist for C, of course. My personal favorite is valgrind, but there are a number of other examples (the data-race detector for Go also falls in a similar category). However, as far as I know, none of the C sanitizers is able to detect the full range of undefined behavior. Partly this is because C UB includes untestable (and, in my opinion, overly aggressive) rules like “every loop should do I/O or terminate”. I think we should strive for a sound and complete sanitizer, meaning that we guarantee that if there is undefined behavior, we will find it, and that we have no false positives. We’ll see if that’s possible. =)

The really cool thing about having the rules be executable (and hopefully efficiently executable) is that, in the (paraphrased) words of John Regehr, it changes the problem of verifying safety from a formal one into a matter of test coverage, and the latter is much better understood. My ultimate goal is that, if you are the developer of an unsafe library, all you have to do is to run cargo test --unsafe (or some such thing), and all of the normal tests of your library will run but in a special sanitizer mode where any undefined behavior will be caught and flagged for you.

But I think there is one other important side-effect. I have been (and remain) very concerned about the problem of programmers not understanding (or even being aware of) the rules regarding correct unsafe code. This is why I originally wanted a system like the Tootsie Pop rules, where programmers have to learn as few things as possible. But having an easy and effective way of testing for violations changes the calculus here dramatically: I think we can likely get away with much more aggressive rules if we can test for violations. To play on John Regehr’s words, this changes the problem from being one of having to learn a bunch of rules to having to interpret error messages. But for this to work well, of course, the error messages have to be good. And that’s where this idea comes in.

Proof of concept: miri

As it happens, there is an existing project that is already doing a limited form of the kind of checks I have in mind: miri, the MIR interpreter created by Scott Olson and now with significant contributions by Oliver Schneider. If you haven’t seen or tried miri, I encourage you to do so. It is very cool and surprisingly capable – in particular, miri can not only execute safe Rust, but also unsafe Rust (e.g., it is able to interpret the definition of Vec).

The way it does this is to simulate the machine at a reasonably low level. So, for example, when you allocate memory, it stores that as a kind of blob of bytes of a certain size. But it doesn’t only store bytes; rather, it tracks additional metadata about what has been stored into various spots. For example, it knows whether memory has been initialized or not, and it knows which bits are pointers (which are stored opaquely, not with an actual address). This allows it to interpret a lot of unsafe code, but it also allows it to detect various kinds of errors.
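For instance, tracking allocation bounds at the byte level is what lets an interpreter catch an out-of-bounds read like the one below. This is my own tiny illustration (separate from the example in the next section); a normal compiled run would likely appear to work.

fn main() {
    let v = vec![1u8, 2, 3];
    let p = v.as_ptr();
    // Read one byte past the end of the allocation. A normal run will
    // usually just return whatever bytes happen to be there, but an
    // interpreter that tracks allocation bounds can flag this as UB.
    let past_end = unsafe { *p.offset(3) };
    println!("{}", past_end);
}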

An example

Let’s start with a simple example of some bogus unsafe code.

fn main() {
    let mut b = Box::new(22);
    innocent_looking_fn(&b);
    *b += 1;
}

fn innocent_looking_fn(b: &Box<usize>) {
    // This wicked little bit of code will take a borrowed
    // `Box` and free it.
    unsafe {
        let p: *const usize = &**b;
        let q: Box<usize> = std::mem::transmute(p);
    }
}

The problem here is that this “innocent looking function” claims to borrow the box b but it actually frees it. So now when main() comes along to execute *b += 1, the box b has been freed. This situation is often called a “dangling pointer” in C land. We might expect then that when you execute this program, something dramatic will happen, but that is not (necessarily) the case:

> rustc tests/dealloc.rs
> ./dealloc

As you can see, I got no error or any other indication that something went awry. This is because, internally, freeing the box just throws its address on a list for later re-use. Therefore when I later make use of that address, it’s entirely possible that the memory is still sitting there, waiting for me to use it, even if I’m not supposed to. This is part of what makes tracking down a “use after free” bug incredibly frustrating: oftentimes, nothing goes wrong! (Until it does.) It’s also why we need some kind of sanitizer mode that will do additional checks beyond what really happens at runtime.

Detecting errors with miri

But what happens when I run this through miri?

> cargo run tests/dealloc.rs
    Finished dev [unoptimized + debuginfo] target(s) in 0.2 secs
     Running `target/debug/miri tests/dealloc.rs`
error: dangling pointer was dereferenced
 --> tests/dealloc.rs:8:5
  |
8 |     *b += 1;
  |     ^^^^^^^
  |
note: inside call to main
 --> tests/dealloc.rs:5:1
  |
5 |   fn main() {
  |  _^ starting here...
6 | |     let mut b = Box::new(22);
7 | |     innocent_looking_fn(&b);
8 | |     *b += 1;
9 | | }
  | |_^ ...ending here

error: aborting due to previous error

(First, before going further, let’s just take a minute to be impressed by the fact that miri bothered to give us a nice stack trace here. I had heard good things about miri, but before I started poking at it for this blog post, I expected something a lot less polished. I’m impressed.)

You can see that, unlike the real computer, miri detected that *b was freed when we tried to access it. It was able to do this because when miri is interpreting your code, it does so with respect to a more abstract model of how a computer works. In particular, when memory is freed in miri, miri remembers that the address was freed, and if there is a later attempt to access it, an error is thrown. (This is very similar to what tools like valgrind and electric fence do as well.)

So even just using miri out of the box, we see that we are starting to get some of the sanitizer checks we want. Whatever the unsafe code guidelines turn out to be, one can be sure that they will declare it illegal to access freed memory. As this example demonstrates, running your code through miri could help you detect a violation.

Blame

This example also illustrates another interesting point about a sanitizer tool. The point where the error is detected is not necessarily telling you which bit of code is at fault. In this case, the error occurs in the safe code, but it seems clear that the fault lies in the unsafe block in innocent_looking_fn(). That function was supposed to present a safe interface, but it failed to do so. Unfortunately, for us to figure that out, we have to trawl through the code, executing backwards and trying to figure out how this freed pointer got into the variable b. Speaking as someone who has spent years of his life doing exactly that, I can tell you it is not fun. Anything we can do to get a more precise notion of what code is at fault would be tremendously helpful.

It turns out that there is a large body of academic work that I think could be quite helpful here. For some time, people have been exploring gradual typing systems. This is usually aimed at the software development process: people want to be able to start out with a dynamically typed bit of software, and then add types gradually. But it turns out when you do this, you have a similar problem: your statically typed code is guaranteed to be internally consistent, but the dynamically typed code might well feed it values of the wrong types. To address this, blame systems attempt to track where you crossed between the static and dynamic typing worlds so that, when an error occurs, the system can tell you which bit of code is at fault.

Traditionally this blame tracking has been done using proxies and other dynamic mechanisms, particularly around closures. For example, Jesse Tov’s Alms language allocated stateful proxies to allow for owned types to flow into a language that didn’t understand ownership (this is roughly analogous to dynamically wrapping a value in a RefCell). Unfortunately, introducing proxies doesn’t seem like it would really work so well for a “no runtime” language like Rust. We could probably get away with it in miri, but it would never scale to running arbitrary C code.

Interestingly, at this year’s POPL, I saw a paper that seemed to present a solution to this problem. In Big types in little runtime, Michael Vitousek, Cameron Swords (ex-Rust intern!), and Jeremy Siek describe a system for doing gradual typing in Python that works even without modifying the Python runtime – this rules out proxies, because the runtime would have to know about them. Instead, the statically typed code keeps a log “on the side” which tracks transitions to and from the unsafe code and other important events. When a fault occurs, they can read this log and reconstruct which bit of code is at fault. This seems eminently applicable to this setting: we have control over the safe Rust code (which we are compiling in a special mode), but we don’t have to modify the unsafe code (which might be in Rust, but might also be in C). Exciting!

Conclusion

This post has two purposes, in a way. First, I want to advocate for the idea that we should define the unsafe code guidelines in an executable way. Specifically, I think we should specify predicates that must hold at various points in the execution. In this post we saw a simple example: when you dereference a pointer, it must point to memory that has been allocated and not yet freed. (Note that this particular rule only applies to the moment at which the pointer is dereferenced; at other times, the pointer can have any value you want, though it may wind up being restricted by other rules.) It’s much more interesting to think about assertions that could be used to enforce Rust’s aliasing rules, but that’s a good topic for another post.
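To illustrate what I mean by an executable predicate, here is a rough sketch of the bookkeeping a sanitizer-mode build might maintain. Every name here (Allocation, AllocationTable, check_deref) is invented for illustration and is not part of miri or of any proposed guideline.

// Hypothetical sanitizer-side bookkeeping; these names are invented for
// illustration and do not come from miri or the guidelines.
struct Allocation {
    base: usize,
    size: usize,
    freed: bool,
}

struct AllocationTable {
    allocations: Vec<Allocation>,
}

impl AllocationTable {
    /// The predicate checked at the moment a pointer is dereferenced:
    /// the access must fall entirely inside some allocation that has
    /// been allocated and not yet freed.
    fn check_deref(&self, addr: usize, len: usize) {
        let ok = self.allocations.iter().any(|a| {
            !a.freed && addr >= a.base && addr + len <= a.base + a.size
        });
        assert!(ok, "UB: dangling or out-of-bounds dereference at {:#x}", addr);
    }
}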

Probably the best way for us to do this is to start out with a minimal “operational semantics” for a representative subset of MIR (basically a mathematical description of what MIR does) and then specify rules by adding side-clauses and conditions into that semantics. I have been talking to some people who might be interested in doing that, so I hope to see progress here.

That said, it may be that we can instead do this exploratory work by editing miri. The codebase seems pretty clean and capable, and a lot of the base work is done.

In the long term, I expect we will want to instead target a platform like valgrind, which would allow us to apply these rules even around unsafe C code. I’m not sure if that’s really feasible, but it seems like the ideal.

The second purpose of the post is to note the connection with gradual typing and the opportunity to apply blame research to the problem. I am very excited about this, because I’ve always felt that guidelines based simply on undefined behavior were going to be difficult for people to use, since errors are often detected in code that is quite disconnected from the origin of the problem.

Planet Mozilla: An Overview of Asia Tech Conferences in 2017

I’ve been attending and even speaking at tech conferences for some time. One of the challenges is keeping track of when those conferences will take place. Also, there is no single list of all the conferences I’m interested in. There are some websites that collect them, but they often miss community-organized events in Asia. There are also community-maintained lists of open source conferences (Thanks Barney!), but they don’t include for-profit conferences.

Therefore I built a simple website that collects all the conferences I know of in Asia, focusing on open source software, the web, and startups:

https://asia-confs.github.io/asia-tech-confs/

The Technology Stack

Since I don’t really need dynamically generated content, I use the Jekyll static site generator. For the look and feel, I use the Material Design Lite (MDL) CSS framework. (I did try other material design frameworks like Materialize or MUI, but MDL is the most mature and clean one I could find.)

One of the challenges is providing the list in different languages. I found a plugin-free way to make Jekyll support I18N (internationalization). The essence is to create language-specific sub-directories like en/index.md and zh_tw/index.md, then put all the language-specific strings in the index.md files. One pitfall is that by adding another level of directory, relative paths (e.g. paths to CSS and JS files) might not work, so you might need to use absolute paths instead. For the Traditional and Simplified Chinese translations, I’m too lazy to maintain two copies of the data, so I use a JavaScript snippet to do the translation on the fly.

How to Contribute

If you know any conference, meetup or event that should be on the list, please feel free to drop an email to asia.confs@gmail.com. Or you can create a pull request or file an issue in our GitHub repo.

Enjoy the conferences and Happy Chinese New Year!

Planet Mozilla: Adding CSP to bugzilla.mozilla.org

We're about to enable a Content Security Policy (CSP) on bugzilla.mozilla.org. CSP will mitigate several types of attack on our users and our site, including Cross-Site Request Forgery (XSRF) and Cross-Site Scripting (XSS).

The first place we're deploying this is in the bug detail page in the new Modal view (which, you may recall, we're making the default view) with a goal for the site to have complete CSP coverage.

As a side-effect of this work, CSP may break add-ons that modify the bug detail page. If we have broken something of yours, we can quickly fix it. We're already enabling the Socorro Lens add-on. You can see how that was addressed.

WebExtensions can modify the DOM of a bug detail page through content.js. Add-ons and WebExtensions will not be able to load resources from third parties into the bug detail page unless we make an exception for you.

Long term, if you have a feature from an add-on you'd like to make part of BMO, please seek me out on irc://irc.mozilla.org/bteam or open a new ticket in the bugzilla.mozilla.org product in Bugzilla and set the severity to 'enhancement'.

ETA: clarify what an add-on or WebExtension is allowed to do. Thanks to the WebExtensions team for answering questions on IRC tonight.

Planet Mozilla: 45.7.0 available for realsies

Let's try that again. TenFourFox 45.7.0 is now available for testing ... again (same download location, same release notes, new hashes), and as before, will go live late Monday evening if I haven't been flooded out of my house by the torrential rains we've been getting in currently-not-so-Sunny So Cal. You may wish to verify you got the correct version by manually checking the hash on the off-chance the mirrors are serving the old binaries.

Planet Mozilla: Migrating to WebExtensions: port your stored data

WebExtensions are the new standard for add-on development in Firefox, and will be the only supported type of extension in release versions of Firefox later this year. Starting in Firefox 57, which is scheduled to arrive in November 2017, extensions other than WebExtensions will not load, and developers should be preparing to migrate their legacy extensions to WebExtensions.

If you have a legacy extension that writes data to the filesystem, and you’re planning to port it to WebExtensions, Embedded WebExtensions are available now in Firefox 51 to help you transition. Embedded WebExtensions can be used to transfer the stored data of your add-on to a format that can be used by WebExtensions. This is essential because it lets you convert your users without the need for them to take any action.

What is an Embedded WebExtension?

An Embedded WebExtension is an extension that combines two types of extensions in one, by incorporating a WebExtension inside of a bootstrapped or SDK extension.

Why use an Embedded WebExtension?

Legacy add-ons have capabilities for storing add-on-related information that are not available in WebExtensions. Examples include user preferences, arbitrary file system access for storing assets, configuration information, stateful information, and others. If your add-on makes use of functionality like this to store information, you can use an Embedded WebExtension to access your legacy add-on data and move it over to a WebExtension. The earlier you do this, the more likely all your users will transition over smoothly.

It’s important to emphasize that Embedded WebExtensions are intended to be a transition tool, and will not be supported past Firefox 57. They should not be used for add-ons that are not expected to transition to WebExtensions.

How do I define an Embedded WebExtension?

To get started, read the documentation below. You can also contact us—we’re here to help you through the transition.

MDN docs: https://developer.mozilla.org/en-US/Add-ons/WebExtensions/Embedded_WebExtensions

Examples: https://github.com/mdn/webextensions-examples/tree/master/embedded-webextension-bootstrapped

https://github.com/mdn/webextensions-examples/tree/master/embedded-webextension-sdk

Planet Mozilla: Nightlies in TaskCluster - go team!

As catlee has already mentioned, yesterday we shipped the first nightly builds for Linux and Android off our next-gen Mozilla continuous integration (CI) system known as TaskCluster. I eventually want to talk more about why this is important and how we got here, but for now I’d like to highlight some of the people who made this possible.

Thanks to Aki’s meticulous work planning and executing on a new chain of trust (CoT) model, the nightly builds we now ship on TaskCluster are arguably more secure than our betas and releases. Don’t worry though, we’re hard at work porting the chain of trust to our release pipeline. Jordan and Mihai tag-teamed the work to get the chain-of-trust-enabled workers doing important things like serving updates and putting binaries in the proper spots. Kim did the lion’s share of the work getting our task graphs sorted to tie together the disparate pieces. Callek wrangled all of the l10n bits. On the testing side, gbrown did some heroic work getting reliable test images setup for our Linux platforms. Finally, I’d be remiss if I didn’t also call out Dustin who kept us all on track with his migration tracker and who provided a great deal of general TaskCluster platform support.

Truly it was a team effort, and thanks to all of you for making this particular milestone happen. Onward to Mac, Windows, and release promotion!

Planet Mozilla: Communicating the Dangers of Non-Secure HTTP

Password Field with Warning Drop Down

HTTPS, the secure variant of the HTTP protocol, has long been a staple of the modern Web. It creates secure connections by providing authentication and encryption between a browser and the associated web server. HTTPS helps keep you safe from eavesdropping and tampering when doing everything from online banking to communicating with your friends. This is important because over a regular HTTP connection, someone else on the network can read or modify the website before you see it, putting you at risk.

To keep users safe online, we would like to see all developers use HTTPS for their websites. Using HTTPS is now easier than ever. Amazing progress in HTTPS adoption has been made, with a substantial portion of web traffic now secured by HTTPS:

Changes to Firefox security user experience
Up until now, Firefox has used a green lock icon in the address bar to indicate when a website is using HTTPS and a neutral indicator (no lock icon) when a website is not using HTTPS. The green lock icon indicates that the site is using a secure connection.

Address bar showing green lock at https://example.com

Current secure (HTTPS) connection

Address bar at example.com over HTTP

Current non-secure (HTTP) connection

In order to clearly highlight risk to the user, starting this month in Firefox 51, web pages which collect passwords but don’t use HTTPS will display a grey lock icon with a red strike-through in the address bar.

Control Center message when visiting an HTTP page with a Password field

Clicking on the “i” icon will show the text, “Connection is Not Secure” and “Logins entered on this page could be compromised”.

This has been the user experience in Firefox Dev Edition since January 2016. Since then, the percentage of login forms detected by Firefox that are fully secured with HTTPS has increased from nearly 40% to nearly 70%, and the number of HTTPS pages overall has also increased by 10%, as you can see in the graph above.

In upcoming releases, Firefox will show an in-context message when a user clicks into a username or password field on a page that doesn’t use HTTPS.  That message will show the same grey lock icon with red strike-through, accompanied by a similar message, “This connection is not secure. Logins entered here could be compromised.”:

Login form with Username and Password field; Password field shows warning

In-context warning for a password field on a page that doesn’t use HTTPS

What to expect in the future
To continue to promote the use of HTTPS and properly convey the risks to users, Firefox will eventually display the struck-through lock icon for all pages that don’t use HTTPS, to make clear that they are not secure. As our plans evolve, we will continue to post updates but our hope is that all developers are encouraged by these changes to take the necessary steps to protect users of the Web through HTTPS.

For more technical details about this feature, please see our blog post from last year. In order to test your website before some of these changes are in the release version of Firefox, please install the latest version of Firefox Nightly.

Thanks!
Thank you to the engineering, user experience, user research, quality assurance, and product teams that helped make this happen – Sean Lee, Tim Guan-tin Chien, Paolo Amadini, Johann Hofmann, Jonathan Kingston, Dale Harvey, Ryan Feeley, Philipp Sackl, Tyler Downer, Adrian Florinescu, and Richard Barnes. And a very special thank you to Matthew Noorenberghe, without whom this would not have been possible.

Planet Mozilla: What is participation design anyway?

As part of our insights phase for Diversity & Inclusion for Participation at Mozilla, we’ve identified ‘Participation Design’ as one of 5 important topics for focus group discussion. Here is how I describe Participation Design (and thanks to Paul) for the question:

Participation design is the framework(s) we use to generate contribution opportunities that empower volunteers to ….

 

  • Recognize, embrace, and personalize the opportunity of lending time and skills to a project at Mozilla – technical and non-technical.
  • Understand the steps they need to take to be successful and engaged at a very basic level.  (task trackers, chat rooms, blogs, newsletters, wikis).
  • Complete a contribution with success on project goals, and value to the volunteer.
  • Grow in skills, knowledge and influence as community members, and leaders/mobilizers at Mozilla and in the broader open source community.

 

In our focus group for this topic, we’ll explore, from both contributor and maintainer perspectives, what it means to design for participation for diversity, equality and inclusion. If you want to know more about how focus groups work, here’s a great resource.

If you, or someone you know from Mozilla past, present or future, has insights, experience and vision for inclusive participation design, please nominate them! (And select the topic ‘Participation Design’.)

 


Planet Mozilla: Webdev Beer and Tell: January 2017

Once a month web developers across the Mozilla community get together (in person and virtually) to share what cool stuff we've been working on in...

Planet WebKit: Introducing Riptide: WebKit’s Retreating Wavefront Concurrent Garbage Collector

As of r209827, 64-bit ARM and x86 WebKit ports use a new garbage collector called Riptide. Riptide reduces worst-case pause times by allowing the app to run concurrently to the collector. This can make a big difference for responsiveness since garbage collection can easily take 10 ms or more, even on fast hardware. Riptide improves WebKit’s performance on the JetStream/splay-latency test by 5x, which leads to a 5% improvement on JetStream. Riptide also improves our Octane performance. We hope that Riptide will help to reduce the severity of GC pauses for many different kinds of applications.

This post begins with a brief background about concurrent GC (garbage collection). Then it describes the Riptide algorithm in detail, including the mature WebKit GC foundation, on which it is built. The field of incremental and concurrent GC goes back a long time and WebKit is not the first system to use it, so this post has a section about how Riptide fits into the related work. This post concludes with performance data.

Introduction

Garbage collection is expensive. In the worst case, for the collector to free a single object, it needs to scan the entire heap to ensure that no objects have any references to the one it wants to free. Traditional collectors scan the entire heap periodically, and this is roughly how WebKit’s collector has worked since the beginning.

The problem with this approach is that the GC pause can be long enough to cause rendering loops to miss frames, or in some cases it can even take so long as to manifest as a spin. This is a well-understood computer science problem. The originally proposed solution for janky GC pauses, by Guy Steele in 1975, was to have one CPU run the app and another CPU run the collector. This involves gnarly race conditions that Steele solved with a bunch of locks. Later algorithms like Baker’s were incremental: they assumed that there was one CPU, and sometimes the application would call into the collector but only for bounded increments of work. Since then, a huge variety of incremental and concurrent techniques have been explored. Incremental collectors avoid some synchronization overhead, but concurrent collectors scale better. Modern concurrent collectors like DLG (short for Doligez, Leroy, Gonthier, published in POPL ’93 and ’94) have very cheap synchronization and almost completely avoid pausing the application. Taking garbage collection off-core rather than merely shortening the pauses is the direction we want to take in WebKit, since almost all of the devices WebKit runs on have more than one core.

The goal of WebKit’s new Riptide concurrent GC is to achieve a big reduction in GC pauses by running most of the collector off the main thread. Because Riptide will be our always-on default GC, we also want it to be as efficient — in terms of speed and memory — as our previous collector.

The Riptide Algorithm

The Riptide collector combines:

  • Marking: The collector marks objects as it finds references to them. Objects not marked are deleted. Most of the collector’s time is spent visiting objects to find references to other objects.
  • Constraints: The collector allows the runtime to supply additional constraints on when objects should be marked, to support custom object lifetime rules.
  • Parallelism: Marking is parallelized on up to eight logical CPUs. (We limit to eight because we have not optimized it for more CPUs.)
  • Generations: The collector lets the mark state of objects stick if memory is plentiful, allowing the next collection to skip visiting those objects. Sticky mark bits are a common way of implementing generational collection without copying. Collection cycles that let mark bits stick are called eden collections in WebKit.
  • Concurrency: Most of the collector’s marking phase runs concurrently to the program. Because this is by far the longest part of collection, the remaining pauses tend to be 1 ms or less. Riptide’s concurrency features kick in for both eden and full collections.
  • Conservatism: The collector scans the stack and registers conservatively, that is, checking each word to see if it is in the bounds of some object and then marking it if it is. This means that all of the C++, assembly, and just-in-time (JIT) compiler-generated code in our system can store heap pointers in local variables without any hassles.
  • Efficiency: This is our always-on garbage collector. It has to be fast.

This section describes how the collector works. The first part of the algorithm description focuses on the WebKit mark-sweep algorithm on which Riptide is based. Then we dive into concurrency and how Riptide manages to walk the heap while the heap is in flux.

Efficient Mark-Sweep

Riptide retains most of the basic architecture of WebKit’s mature garbage collection code. This section gives an overview of how our mark-sweep collector works: WebKit uses a simple segregated storage heap structure. The DOM, the Objective-C API, the type inference runtime, and the compilers all introduce custom marking constraints, which the GC executes to fixpoint. Marking is done in parallel to maximize throughput. Generational collection is important, so WebKit implements it using sticky mark bits. The collector uses conservative stack scanning to ease integration with the rest of WebKit.

Simple Segregated Storage

WebKit has long used the simple segregated storage heap structure for small and medium-sized objects (up to about 8KB):

  • Small and medium-sized objects are allocated from segregated free lists. Given a desired object size, we perform a table lookup to find the appropriate free list and then pop the first object from this list. The lookup table is usually constant-folded by the compiler.
  • Memory is divided into 16KB blocks. Each block contains cells. All cells in a block have the same cell size, called the block’s size class. In WebKit jargon, an object is a cell whose JavaScript type is “object”. For example, a string is a cell but not an object. The GC literature would typically use object to refer to what our code would call a cell. Since this post is not really concerned with JavaScript types, we’ll use the term object to mean any cell in our heap.
  • At any time, the active free list for a size class contains only objects from a single block. When we run out of objects in a free list, we find the next block in that size class and sweep it to give it a free list.

Sweeping is incremental in the sense that we only sweep a block just before allocating in it. In WebKit, we optimize sweeping further with a hybrid bump-pointer/free-list allocator we call bump’n’pop (here it is in C++ and in the compilers). A per-block bit tells the sweeper if the block is completely empty. If it is, the sweeper will set up a bump-pointer arena over the whole block rather than constructing a free-list. Bump-pointer arenas can be set up in O(1) time while building a free-list is an O(n) operation. Bump’n’pop achieves a big speed-up on programs that allocate a lot because it avoids the sweep for totally-empty blocks. Bump’n’pop’s bump-allocator always bumps by the block’s cell size to make it look like the objects had been allocated from the free list. This preserves the block’s membership in its size class.
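As a rough illustration of this allocation fast path (the names and layout here are simplified stand-ins, not WebKit’s actual allocator):

#include <cstddef>

// Simplified sketch of the segregated-storage fast path with the bump'n'pop
// idea: a per-size-class free list, plus a bump cursor that is set up when a
// swept block turns out to be completely empty.
struct FreeCell { FreeCell* next; };

struct LocalAllocator {
    size_t cellSize { 0 };
    FreeCell* freeList { nullptr }; // free list for the current block
    char* bumpCursor { nullptr };   // non-null only for fully-empty blocks
    char* bumpEnd { nullptr };

    void* allocate()
    {
        // Bump path: O(1) setup, just advance by the block's cell size.
        if (bumpCursor && bumpCursor + cellSize <= bumpEnd) {
            void* result = bumpCursor;
            bumpCursor += cellSize;
            return result;
        }
        // Pop path: take the head of the free list built by sweeping.
        if (FreeCell* cell = freeList) {
            freeList = cell->next;
            return cell;
        }
        // Slow path: sweep the next block in this size class (not shown).
        return allocateSlow();
    }

    void* allocateSlow();
};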

Large objects (larger than about 8KB) are allocated using malloc.

Constraint-Based Marking

Garbage collection is ordinarily a graph search problem and the heap is ordinarily just a graph: the roots are the local variables, their values are directional edges that point to objects, and those objects have fields that each create edges to some other objects. WebKit’s garbage collector also allows the DOM, compiler, and type inference system to install constraint callbacks. These constraints are allowed to query which objects are marked and they are allowed to mark objects. The WebKit GC algorithm executes these constraints to fixpoint. GC termination happens when all marked objects have been visited and none of the constraints want to mark any more objects. In practice, the constraint-solving part of the fixpoint takes up a tiny fraction of the total time. Most of the time in GC is spent performing a depth-first search over marked objects that we call draining.
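The shape of that fixpoint is roughly the following; this is an illustrative sketch, and the Visitor interface here is a stand-in rather than WebKit’s real API:

#include <functional>
#include <vector>

struct Visitor {
    void drain();          // depth-first search over marked-but-unvisited objects
    bool hasWork() const;  // did the constraints (or barriers) mark anything new?
};

void runMarkingFixpoint(Visitor& visitor,
                        const std::vector<std::function<void(Visitor&)>>& constraints)
{
    do {
        visitor.drain();                  // most of the GC's time is spent here
        for (auto& constraint : constraints)
            constraint(visitor);          // e.g. DOM, compiler, type-inference rules
    } while (visitor.hasWork());          // terminate when nothing new was marked
}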

Parallel Draining

Draining takes up most of the collector’s time. One of our oldest collector optimizations is that draining is parallelized. The collector has a draining thread on each CPU. Each draining thread has its own worklist of objects to visit, and ordinarily it runs a graph search algorithm that only sees this worklist. Using a local worklist means avoiding worklist synchronization most of the time. Each draining thread will check in with a global worklist under these conditions:

  • It runs out of work. When a thread runs out of work, it will try to steal 1/Nth of the global worklist where N is the number of idle draining threads. This means acquiring the global worklist’s lock.
  • Every 100 objects visited, the draining thread will consider donating about half of its worklist to the global worklist. It will only do this if the global worklist is empty, the global worklist lock can be acquired without blocking, and the local worklist has at least two entries.

This algorithm appears to scale nicely to about eight cores, which is good enough for the kinds of systems that WebKit usually runs on.
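In code, the per-thread check-in policy described above might look roughly like this; it is a simplified sketch with invented names, and the real worklists, counters, and locking discipline are more involved:

#include <algorithm>
#include <deque>
#include <mutex>

struct GlobalWorklist {
    std::mutex lock;
    std::deque<void*> objects;
    int idleThreads { 1 };
};

struct DrainingThread {
    std::deque<void*> localWorklist;
    GlobalWorklist* global;

    // Every ~100 visited objects: donate about half of the local worklist,
    // but only if the global list is empty, the lock is uncontended, and
    // there are at least two local entries.
    void maybeDonate()
    {
        if (localWorklist.size() < 2 || !global->objects.empty())
            return;
        std::unique_lock<std::mutex> locker(global->lock, std::try_to_lock);
        if (!locker.owns_lock())
            return;
        size_t toDonate = localWorklist.size() / 2;
        while (toDonate--) {
            global->objects.push_back(localWorklist.back());
            localWorklist.pop_back();
        }
    }

    // When out of work: steal 1/Nth of the global worklist, where N is the
    // number of idle draining threads.
    bool steal()
    {
        std::lock_guard<std::mutex> locker(global->lock);
        if (global->objects.empty())
            return false;
        size_t share = std::max<size_t>(
            global->objects.size() / std::max(global->idleThreads, 1), 1);
        while (share-- && !global->objects.empty()) {
            localWorklist.push_back(global->objects.front());
            global->objects.pop_front();
        }
        return true;
    }
};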

Draining in parallel means having to synchronize marking. Our marking algorithm uses a lock-free CAS (atomic compare-and-swap instruction) loop to set mark bits.
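A marking CAS loop of that flavor might look like this (again a sketch, not WebKit’s actual mark bitmap code):

#include <atomic>
#include <cstddef>
#include <cstdint>

// One bit per cell in a block's mark bitmap; the CAS loop guarantees that
// exactly one draining thread "wins" each object.
bool tryMark(std::atomic<uint32_t>* markBits, size_t cellIndex)
{
    std::atomic<uint32_t>& word = markBits[cellIndex / 32];
    uint32_t mask = 1u << (cellIndex % 32);
    uint32_t old = word.load(std::memory_order_relaxed);
    do {
        if (old & mask)
            return false;                 // another thread already marked it
    } while (!word.compare_exchange_weak(old, old | mask,
                                         std::memory_order_relaxed));
    return true;                          // we marked it; push it on our worklist
}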

Sticky Mark Bits

Generational garbage collection is a classic throughput optimization first introduced by Lieberman and Hewitt and Ungar. It assumes that objects that are allocated recently are unlikely to survive. Therefore, focusing the collector on objects that were allocated since the last GC is likely to free up lots of memory — almost as much as if we collected the whole heap. Generational collectors track the generation of objects: either young or old. Generational collectors have (at least) two modes: eden collection that only collects young objects and full collection that collects all objects. During an eden collection, old objects are only visited if they are suspected to contain pointers to new objects.

Generational collectors need to overcome two hurdles: how to track the generation of objects, and how to figure out which old objects have pointers to new objects.

The collector needs to know the generation of objects in order to determine which objects can be safely ignored during marking. In a traditional generational collector, eden collections move objects and then use the object’s address to determine its generation. Our collector does not move objects. Instead, it uses the mark bit to also track generation. Quite simply, we don’t clear any mark bits at the start of an eden collection. The marking algorithm will already ignore objects that have their mark bits set. This is called sticky mark bit generational garbage collection.

The collector will avoid visiting old objects during an eden collection. But it cannot avoid all of them: if an old object has pointers to new objects, then the collector needs to know to visit that old object. We use a write barrier — a small piece of instrumentation that executes after every write to an object — that tells the GC about writes to old objects. In order to cheaply know which objects are old, the object header also has a copy of the object’s state: either it is old or it is new. Objects are allocated new and labeled old when marked. When the write barrier detects a write to an old object, we tell the GC by setting the object’s state to old-but-remembered and putting it on the mark stack. We use separate mark stacks for objects marked by the write barrier, so when we visit the object, we know whether we are visiting it due to the barrier or because of normal marking (i.e. for the first time). Some accounting only needs to happen when visiting the object for the first time. The complete barrier is simply:

object->field = newValue;
if (object->cellState == Old)
    remember(object);

Generational garbage collection is an enormous improvement in performance on programs that allocate a lot, which is common in JavaScript. Many new JavaScript features, like iterators, arrow functions, spread, and for-of allocate lots of objects and these objects die almost immediately. Generational GC means that our collector does not need to visit all of the old objects just to delete the short-lived garbage.

Conservative Roots

Garbage collection begins by looking at local variables and some global state to figure out the initial set of marked objects. Introspecting the values of local variables is tricky. WebKit uses C++ local variables for pointers to the garbage collector’s heap, but C-like languages provide no facility for precisely introspecting the values of specific variables of arbitrary stack frames. WebKit solves this problem by marking objects conservatively when scanning roots. We use the simple segregated storage heap structure in part because it makes it easy to ask whether an arbitrary bit pattern could possibly be a pointer to some object.

We view this as an important optimization. Without conservative root scanning, C++ code would have to use some API to notify the collector about what objects it points to. Conservative root scanning means not having to do any of that work.

Mark-Sweep Summary

Riptide implements complex notions of reachability via arbitrary constraint callbacks and allows C++ code to manipulate objects directly. For performance, it parallelizes marking and uses generations to reduce the average amount of marking work.

Handling Concurrency

Riptide makes the draining phase of garbage collection concurrent. This works because of a combination of concurrency features:

  • Riptide is able to stop the world for certain tricky operations like stack scanning and DOM constraint solving.
  • Riptide uses a retreating wavefront write barrier to manage races between marking and object mutation. Using retreating wavefront allows us to avoid any impedance mismatch between generational and concurrent collector optimizations.
  • Retreating wavefront collectors can suffer from the risk of GC death spirals, so Riptide uses a space-time scheduler to put that in check.
  • Visiting an object while it is being reshaped is particularly hard, and WebKit reshapes objects as part of type inference. We use an obstruction-free double collect snapshot to ensure that the collector never marks garbage memory due to a visit-reshape race.
  • Lots of objects have tricky races that aren’t on the critical path, so we put a fast, adaptive, and fair lock in every JavaScript object as a handy way to manage them. It fits in two otherwise unused bits.

While we wrote Riptide for WebKit, we suspect that the underlying intuitions could be useful for anyone wanting to write a concurrent, generational, parallel, conservative, and non-copying collector. This section describes Riptide in detail.

Stopping The World and Safepoints

Riptide does draining concurrently. It is a goal to eventually make other phases of the collector concurrent as well. But so long as some phases are not safe to run concurrently, we need to be able to bring the application to a stop before performing those phases. The place where the collector stops needs to be picked so as to avoid reentrancy issues: for example stopping to run the GC in the middle of the GC’s allocator would create subtle problems. The concurrent GC avoids these problems by only stopping the application at those points where the application would trigger a GC. We call these safepoints. When the collector brings the application to a safepoint, we say that it is stopping the world.

Riptide currently stops the world for most of the constraint fixpoint, and resumes the world for draining. After draining finishes, the world is again stopped. A typical collection cycle may have many stop-resume cycles.

Retreating Wavefront

Draining concurrently means that just as we finish visiting some object, the application may store to one of its fields. We could store a pointer to an unmarked object into an object that is already visited, in which case the collector might never find that unmarked object. If we don’t do something about this, the collector would be sure to prematurely delete objects due to races with the application. Concurrent garbage collectors avoid this problem using write barriers. This section describes Riptide’s write barrier.

Write barriers ensure that the state of the collector is still valid after any race, either by marking objects or by having objects revisited (GC Handbook, chapter 15). Marking objects helps the collector make forward progress; intuitively, it is like advancing the collector’s wavefront. Having objects revisited retreats the wavefront. The literature is full of concurrent GC algorithms, like the Metronome, C4, and DLG, that all use some kind of advancing wavefront write barrier. The simplest such barrier is Dijkstra’s, which marks objects anytime a reference to them is created. I used these kinds of barriers in my past work because they make it easy to make the collector very deterministic. Adding one of those barriers to WebKit would be likely to create some performance overhead since this means adding new code to every write to the heap. But the retreating wavefront barrier, originally invented by Guy Steele in 1975, works on exactly the same principle as our existing generational barrier. This allows Riptide to achieve zero barrier overhead by reusing WebKit’s existing barrier.

It’s easiest to appreciate the similarity by looking at some barrier code. Our old generational barrier looked like this:

object->field = newValue;
if (object->cellState == Old)
    remember(object);

Steele’s retreating wavefront barrier looks like this:

object->field = newValue;
if (object->cellState == Black)
    revisit(object);

Retreating wavefront barriers operate on the same principle as generational barriers, so it’s possible to use the same barrier for both. The only difference is the terminology. The black state means that the collector has already visited the object. This barrier tells the collector to revisit the object if its cellState tells us that the collector had already visited it. This state is part of the classic tri-color abstraction: white means that the GC hasn’t marked the object, grey means that the object is marked and on the mark stack, and black means that the object is marked and has been visited (so is not on the mark stack anymore). In Riptide, the tri-color states that are relevant to concurrency (white, grey, black) perfectly overlap with the sticky mark-bit states that are relevant to generations (new, remembered, old). The Riptide cell states are as follows:

  • DefinitelyWhite: the object is new and white.
  • PossiblyGrey: the object is grey, or remembered, or new and white.
  • PossiblyBlack: the object is black and old, or grey, or remembered, or new and white.

A naive combination generational/concurrent barrier might look like this:

object->field = newValue;
if (object->cellState == PossiblyBlack)
    slowPath(object);

This turns out to need tweaking to work. The PossiblyBlack state is too ambiguous, so the slowPath needs additional logic to work out what the object’s state really was. Also, the order of execution matters: the CPU must run the object->cellState load after it runs the object->field store. That’s hard, since CPUs don’t like to obey store-before-load orderings. Finally, we need to guarantee that the barrier cannot retreat the wavefront too much.

Disambiguating Object State

The GC uses the combination of the object’s mark bit in the block header and the cellState byte in the object’s header to determine the object’s state. The GC clears mark bits at the start of full collection, and it sets the cellState during marking and barriers. It doesn’t reset objects’ cellStates back to DefinitelyWhite at the start of a full collection, because it’s possible to infer that the cellState should have been reset by looking at the mark bit. It’s important that the collector never scans the heap to clear marking state, and even mark bits are logically cleared using versioning. If an object is PossiblyBlack or PossiblyGrey and its mark bit is logically clear, then this means that the object is really white. Riptide’s barrier slowPath is almost like our old generational slow path but it has a new check: it will not do anything if the mark bit of the target object is not set, since this means that we’re in the middle of a GC and the object is actually white. Additionally, the barrier will attempt to set the object back to DefinitelyWhite so that the slow path does not have to see the object again (at least not until it’s marked and visited).

Store-Before-Barrier Ordering

The GC must flag the object as PossiblyBlack just before it starts to visit it and the application must store to field before loading object->cellState. Such ordering is not guaranteed on any modern architecture: both x86 and ARM will sink the store below the load in some cases. Inserting an unconditional store-load fence, such as lock; orl $0, (%rsp) on x86 or dmb ish on ARM, would degrade performance way too much. So, we make the fence itself conditional by playing a trick with the barrier’s condition:

object->field = newValue;
if (object->cellState <= blackThreshold)
    slowPath(object);

Where blackThreshold is a global variable. The PossiblyBlack state has the value 0, and when the collector is not running, blackThreshold is 0. But once the collector starts marking, it sets blackThreshold to 100 while the world is stopped. Then the barrier’s slowPath leads with a check like this:

storeLoadFence();
if (object->cellState != PossiblyBlack)
    return;

This means that the application takes a slight performance hit while Riptide is running. In typical programs, this overhead is about 5% during GC and 0% when not GCing. The only additional cost when not GCing is that blackThreshold must be loaded from memory, but we could not detect a slow-down due to this change. The 5% hit during collection is worth fixing, but to put it in perspective, the application used to take a 100% performance hit during GC because the GC would stop the application from running.

The complete Riptide write barrier is emitted as if the following writeBarrier function had been inlined just after any store to target:

ALWAYS_INLINE void writeBarrier(JSCell* target)
{
    if (LIKELY(target->cellState() > blackThreshold))
        return;
    storeLoadFence();
    if (target->cellState() != PossiblyBlack)
        return;
    writeBarrierSlow(target);
}

NEVER_INLINE void writeBarrierSlow(JSCell* target)
{
    if (!isMarked(target)) {
        // Try to label this object white so that we don't take the barrier
        // slow path again.
        if (target->compareExchangeCellState(PossiblyBlack, DefinitelyWhite)) {
            if (Heap::isMarked(target)) {
                // A race! The GC marked the object in the meantime, so
                // pessimistically label it black again.
                target->setCellState(PossiblyBlack);
            }
        }
        return;
    }

    target->setCellState(DefinitelyGrey);
    m_mutatorMarkStack->append(target);
}

The JIT compiler inlines the part of the slow path that rechecks the object’s state after doing a fence, since this helps keep the overhead low during GC. Moreover, our just-in-time compilers optimize the barrier further by removing barriers if storing values that the GC doesn’t care about, removing barriers on newly allocated objects (which must be white), clustering barriers together to amortize the cost of the fence, and removing redundant barriers if an object is stored to repeatedly.

Revisiting

When the barrier does append the object to the m_mutatorMarkStack, the object will get revisited eventually. The revisit could happen concurrently to the application. That’s important since we have seen programs retreat the wavefront enough that the total revisit pause would be too big otherwise.

Unlike advancing wavefront, retreating wavefront means forcing the collector to redo work that it has already done. Without some facilities to ensure collector progress, the collector might never finish due to repeated revisit requests from the write barrier. Riptide tackles this problem in two ways. First, we defer all revisit requests. Draining threads do not service any revisit requests until they have no other work to do. When an object is flagged for revisiting, it stays in the grey state for a while and will only be revisited towards the end of GC. This ensures that if an old object often has its fields overwritten with pointers to new objects, then the GC will usually only scan two snapshots’ worth of those fields: one snapshot whenever the GC visited the object first, and another towards the end when the GC gets around to servicing deferred revisits. Revisit deferral reduces the likelihood of runaway GC, but fully eliminating such pathologies is left to our scheduler.

Space-Time Scheduler

The bitter end of a retreating wavefront GC cycle is not pretty: just as the collector goes to visit the last object on the mark stack, some object that had already been visited gets written to, and winds up back on the mark stack. This can go on for a while, and before we had any mitigations we saw Riptide using 5x more memory than with synchronous collection. This death spiral happens because programs allocate a lot all the time and the collector cannot free any memory until it finishes marking. Riptide prevents death spirals using a scheduler that controls the application’s pace. We call it the space-time scheduler because it links the amount of time that the application gets to run for in a timeslice to the amount of space that the application has used by allocating in the collector’s headroom.

The space-time scheduler ensures that the retreating wavefront barrier cannot wreak havoc by giving the collector an unfair advantage: it will periodically stop the world for short pauses even when the collector could be running concurrently. It does this just so the collector can always outpace the application in case of a race. If this was meant as a garbage collector for servers, you could imagine providing the user with a bunch of knobs to control the schedule of these synthetic pauses. Different applications will have different ideal pause lengths. Applications that often write to old memory will retreat the collector’s wavefront a lot, and so they will need a longer pause to ensure termination. Functional-style programs tend to only write to newly allocated objects, so those could get away with a shorter pause. We don’t want web users or web developers to have to configure our collector, so the space-time scheduler adaptively selects a pause schedule.

To be correct, the scheduler must eventually pause the world for long enough to let the collector terminate. The space-time scheduler is based on a simple idea: the length of pauses increases during collection in response to how much memory the application is using.

The space-time scheduler selects the duration and spacing of synthetic pauses based on the headroom ratio, which is a measure of the amount of extra memory that the application has allocated during the concurrent collection. A concurrent collection is triggered by memory usage crossing the trigger threshold. Since the collector allows the application to keep running, the application will keep allocating. The space that the collector makes available for allocation during collection is called the headroom. Riptide is tuned for a max headroom that is 50% larger than the trigger threshold: so if the app needed to allocate 100MB to trigger a collection, its max headroom is 50MB. We want the collector to complete synchronously if we ever deplete all of our headroom: at that point it’s better for the system to pause and free memory than to run and deplete even more memory. The headroom ratio is simply the available headroom divided by the max headroom. The space-time scheduler will divide time into fixed timeslices, and the headroom ratio determines how much time the application is resumed for during that period.

The default tuning of our collector is that the collector timeslice is 2 ms, and the first C ms of it is given to the collector and the remaining M ms is given to the mutator. We always let the collector pause for at least 0.6 ms. Let H be the headroom ratio: 1 at the start of collection, and 0 if we deplete all headroom. With a 0.6 ms minimum pause and a 2 ms timeslice, we define M and C as follows:

M = 1.4 H
C = 2 – M

For example, at the start of usual collection we will give 0.6 ms to the collector and then 1.4 ms to the application, but as soon as the application starts allocating, this window shifts. Aggressive applications, which both allocate a lot and write to old objects a lot, will usually end collection with the split being closer to 1 ms for the collector followed by 1 ms for the application.
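As a sanity check on the arithmetic, the per-timeslice split can be written as a tiny function; this is only a sketch, since the real scheduler tracks headroom and timing as part of the collector’s state:

// The split of each 2 ms timeslice, following the formulas above:
// M = 1.4 H ms for the mutator, C = 2 - M ms for the collector.
struct TimesliceSplit { double collectorMs; double mutatorMs; };

TimesliceSplit splitTimeslice(double headroomRatio) // 1 at trigger, 0 when headroom is gone
{
    double mutatorMs = 1.4 * headroomRatio; // M = 1.4 H
    double collectorMs = 2.0 - mutatorMs;   // C = 2 - M
    return { collectorMs, mutatorMs };
}
// splitTimeslice(1.0) -> 0.6 ms collector, 1.4 ms mutator
// splitTimeslice(0.0) -> 2.0 ms collector, 0.0 ms mutator (world stays stopped)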

Thanks to the space-time scheduler, the worst that an adversarial program could do is cause the GC to keep revisiting some object. But it can’t cause the GC to run out of memory, since if the adversary uses up all of the headroom then M becomes 0 and the collector gets to stop the world until the end of the cycle.

Obstruction-Free Double Collect Snapshot

Concurrent garbage collection means finding exciting new ways of side-stepping expensive synchronization. In traditional concurrent mark-sweep GCs, which focused on nicely-typed languages, the worst race was the one covered by the write barrier. But since this is JavaScript, we get to have a lot more fun.

JavaScript objects may have properties added to them at any time. The WebKit JavaScript object model has three features that makes this efficient:

  • Each object has a structure ID: The first 32 bits of each object is its structure ID. Using a table lookup, this gives a pointer to the object’s structure: a kind of meta-object that describes how its object is supposed to look. The object’s layout is governed by its structure. Some objects have immutable structures, so for those we know that so long as their structure IDs stay the same, they will be laid out the same.
  • The structure may tell us that the object has inline storage. This is a slab of space in the object itself, left aside for JavaScript properties.
  • The structure may tell us about the object’s butterfly. Each object has room for a pointer that can be used to point to an overflow storage for additional properties that we call a butterfly. The butterfly is a bidirectional object that may store named properties to the left of the pointer and indexed properties to the right.

It’s imperative that the garbage collector visits the butterfly using exactly the structure that corresponds to it. If the object has a mutable structure, it’s imperative that the collector visits the butterfly using the data from the structure that corresponds to that butterfly. The collector would crash if it tried to decode the butterfly using wrong information.

To accomplish this, we use a very simple obstruction-free version of Afek et al’s double collect snapshot. To handle the immutable structure case, we just ensure that the application uses this protocol to set both the structure and butterfly:

  1. Nuke the structure ID — this sets a bit in the structure ID to indicate to the GC that the structure and butterfly are changing.
  2. Set the butterfly.
  3. Set the new (decontaminated) structure ID — decontaminating means clearing the nuke bit.

Meanwhile the collector does this to read both the structure and the butterfly:

  1. Read the structure ID.
  2. Read the butterfly.
  3. Read the structure ID again, and compare to (1).

If the collector ever reads a nuked structure ID, or if the structure IDs read in (1) and (3) differ, then we know that we might have a butterfly-structure mismatch. But if none of these conditions hold, then we are guaranteed that the collector will have a consistent structure and butterfly. See here for the proof.
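
The protocol is small enough to sketch directly. The following is a rough C++ rendering of both sides using std::atomic stand-ins; it is not WebKit's actual cell layout, and the bit chosen for the nuke flag is assumed for illustration.

    // Rough sketch of the immutable-structure protocol; not WebKit's real types.
    #include <atomic>
    #include <cstdint>
    #include <optional>

    constexpr uint32_t nukeBit = 0x80000000u; // assumed bit position

    struct Butterfly;

    struct ObjectHeader {
        std::atomic<uint32_t> structureID;
        std::atomic<Butterfly*> butterfly;
    };

    // Application side: nuke, set the butterfly, install the decontaminated ID.
    void setStructureAndButterfly(ObjectHeader& obj, uint32_t newStructureID,
                                  Butterfly* newButterfly)
    {
        obj.structureID.store(obj.structureID.load() | nukeBit); // 1. nuke
        obj.butterfly.store(newButterfly);                       // 2. set butterfly
        obj.structureID.store(newStructureID & ~nukeBit);        // 3. decontaminate
    }

    // Collector side: read ID, read butterfly, re-read ID; bail on nuke or mismatch.
    struct Snapshot {
        uint32_t structureID;
        Butterfly* butterfly;
    };

    std::optional<Snapshot> trySnapshot(const ObjectHeader& obj)
    {
        uint32_t first = obj.structureID.load();     // 1.
        Butterfly* butterfly = obj.butterfly.load(); // 2.
        uint32_t second = obj.structureID.load();    // 3.
        if ((first & nukeBit) || (second & nukeBit) || first != second)
            return std::nullopt; // obstructed: schedule the object for a revisit
        return Snapshot { first, butterfly };
    }

The default sequentially consistent atomics used here are stronger than the store-store/load-load ordering that the real implementation relies on, as described below for the mutable-structure case.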

Harder still is the case where the structure is mutable. In this case, we ensure that the protocol for setting the fields in the structure is to set them after the structure is nuked but before the new one is installed. The collector reads those fields before/after as well. This allows the collector to see a consistent snapshot of the structure, butterfly, and a bit inside the structure without using any locking. All that matters is that the stores in the application and the loads in the collector are ordered. We get this for free on x86, and on ARM we use store-store fences in the application (dmb ishst) and load-load fences in the collector (dmb ish).

This algorithm is said to be obstruction-free because it will complete in O(1) time no matter what kind of race it encounters, but if it does encounter a race then it’ll tell you to try again. Obstruction-free algorithms need some kind of contention manager to ensure that they do eventually complete. The contention manager must provably maximize the likelihood that the obstruction-free algorithm will eventually run without any race. For example, this would be a sound contention manager: exponential back-off in which the actual back-off amount is a random number between 0 and X where X increases exponentially on each try. It turns out that Riptide’s retreating wavefront revisit scheduler is already a natural contention manager. When the collector bails on visiting an object because it detected a race, it schedules revisiting of that object just as if a barrier had executed. So, the GC will visit any object that encountered such a race again anyway. The GC will visit the object much later and the timing will be somewhat pseudo-random due to OS scheduling. If an object did keep getting revisited, eventually the space-time scheduler will increase the collector’s synthetic pause to the point where the revisit will happen with the world stopped. Since there are no safepoints possible in any of the structure/butterfly atomic protocols, stopping the world ensures that the algorithm will not be obstructed.
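
For reference, the randomized exponential back-off mentioned above as a sound contention manager could look like the sketch below. This is purely illustrative; as the paragraph explains, Riptide does not need a separate contention manager because the revisit scheduler and the space-time scheduler already play that role.

    // Illustrative randomized exponential back-off; not used by Riptide itself.
    #include <chrono>
    #include <random>
    #include <thread>

    template <typename TryOnce>
    void runWithBackoff(TryOnce tryOnce)
    {
        std::mt19937_64 rng { std::random_device{}() };
        double maxDelayMicros = 1.0; // X: upper bound, doubled on each failed attempt
        while (!tryOnce()) {
            std::uniform_real_distribution<double> delay(0.0, maxDelayMicros);
            std::this_thread::sleep_for(
                std::chrono::duration<double, std::micro>(delay(rng)));
            maxDelayMicros *= 2.0;
        }
    }

An obstruction-free operation such as the snapshot read sketched earlier could then be wrapped as runWithBackoff([&] { return trySnapshot(obj).has_value(); }).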

Embedded WTF Locks

The obstruction-free object snapshot is great, but it’s not scalable — from a WebKit developer sanity standpoint — to use it everywhere. Because we have been adding more concurrency to WebKit for a while, we made this easier by already having a custom locking infrastructure in WTF (Web Template Framework). One of the goals of WTF locks was to fit locks in two bits so that we may one day stuff a lock into the header of each JavaScript object. Many of the loony corner-case race conditions in the concurrent garbage collector happen on paths where acquiring a lock is fine, particularly if that lock has a great inline fast path like WTF locks. So, all JavaScript objects in WebKit now have a fast, adaptive, and fair WTF lock embedded in two bits of what is otherwise the indexingType byte in the object header. This internal lock is used to protect mutations to all sorts of miscellaneous data structures. The collector will hold the internal lock while visiting those objects.
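
To give a feel for what a lock squeezed into two bits looks like, here is a heavily simplified sketch. The bit positions are assumed for illustration, and contention handling is reduced to yielding; the real WTF::Lock instead parks contending threads, which is what the second bit is for.

    // Heavily simplified two-bit lock living in a byte that also holds other flags.
    #include <atomic>
    #include <cstdint>
    #include <thread>

    constexpr uint8_t isLockedBit  = 1 << 6; // assumed positions; the remaining
    constexpr uint8_t hasParkedBit = 1 << 7; // bits keep holding the indexing type

    void lock(std::atomic<uint8_t>& bits)
    {
        for (;;) {
            uint8_t old = bits.load(std::memory_order_relaxed);
            // The CAS only flips the lock bit, leaving the other bits intact.
            if (!(old & isLockedBit)
                && bits.compare_exchange_weak(old, uint8_t(old | isLockedBit),
                                              std::memory_order_acquire))
                return;
            std::this_thread::yield(); // the real lock would park via hasParkedBit
        }
    }

    void unlock(std::atomic<uint8_t>& bits)
    {
        bits.fetch_and(uint8_t(~isLockedBit), std::memory_order_release);
    }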

Locking should always be used with care since it can slow things down. In Riptide, we only use locking to protect uncommon operations. Additionally, we use an optimized lock implementation to reduce the cost of synchronization even further.

Algorithm Summary

Riptide is an improvement to WebKit’s collector and retains most of the things that made the old algorithm great. The changes that transformed WebKit’s collector were landed over the past six months, starting with the painful work of removing WebKit’s previous use of copying. Riptide combines Guy Steele’s classic retreating wavefront write barrier with a mature sticky-mark-sweep collector and lots of concurrency tricks to get a useful combination of high GC throughput and low GC latency.

Related Work

The paper that introduced retreating wavefront did not claim to implement the idea — it was just a thought experiment. We are aware of two other implementations of retreating wavefront. The oldest is the BDW (Boehm-Demers-Weiser) collector’s incremental mode. That collector uses a page-granularity revisit because it relies entirely on page faults to trigger the barrier. The collector makes pages that have black objects read-only and then any write to that page triggers a fault. The fault handler makes the page read-write and logs the entire page for revisiting. Riptide uses a software barrier that precisely triggers revisiting only for the object that got stored to. The BDW collector uses page faults for a good reason: so that it can be used as a plug-in component to any kind of language environment. The compiler doesn’t have to be aware of retreating wavefronts or generations since the BDW collector will be sure to catch all of the writes that it cares about. But in WebKit we are happy to have everything tightly integrated and so Riptide relies on the rest of WebKit to use its barrier. This was not hard since the new barrier is almost identical to our old one.

Another user of retreating wavefront is ChakraCore. It appears to have both a page-fault-based barrier like BDW and a software card-marking barrier that can flag 128-byte regions of memory as needing revisit. (For a good explanation of card-marking, albeit in a different VM, see here.) Riptide uses an object-granularity barrier instead. We tried card-marking, but found that it was slower than our barrier unless we were willing to place our entire heap in a single large virtual memory reservation. We didn’t want our memory structure to be that deterministic. All retreating wavefront collectors require a stop-the-world snapshot-at-the-end increment that confirms that there is no more marking left to do. Both BDW and ChakraCore perform all revisiting during the snapshot-at-the-end. If there is a lot of revisiting work, that increment could take a while. That risk is particularly high with card-marking or fault-based barriers, in which a write to a single object usually causes the revisiting of multiple objects. Riptide can revisit objects with the application resumed. Riptide can also resume the application in between executions of custom constraints. Riptide is tuned so that the snapshot-at-the-end is only confirming that there is no more work, rather than spending an unbounded amount of time creating and chasing down new work.

Instead of retreating wavefront, most incremental, concurrent, and real-time collectors use some kind of advancing wavefront barrier. In those kinds of barriers, the application marks the objects it interacts with under certain conditions. Baker’s barrier marks every pointer you load from the heap. Dijkstra’s barrier marks every pointer you store into the heap. Yuasa’s barrier marks every pointer you overwrite. All of these barriers advance the collector’s wavefront in the sense that they reduce the amount of work that the collector will have to do — the thinking goes that the collector would have marked the object anyway so the barrier is helping. Since these collectors usually allocate objects black during collection, marking objects will not postpone when the collector can finish. This means that advancing wavefront collectors will mark all objects that were live at the very beginning of the cycle and all objects allocated during the cycle. Keeping objects allocated during the GC cycle (which may be long) is called floating garbage. Retreating wavefront collectors largely avoid floating garbage since in those collectors an object can only be marked if it is found to be referenced from another marked object.
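
The different barrier flavors are easier to compare in code. The sketch below is schematic C++ with toy stand-ins for the collector's mark state; it only illustrates which pointer each barrier acts on, not how a real collector implements marking.

    // Schematic comparison of barriers; the helpers are toy stand-ins.
    #include <deque>
    #include <unordered_set>

    struct HeapCell;
    static std::deque<HeapCell*> markQueue;          // toy work list
    static std::unordered_set<HeapCell*> blackCells; // toy mark state

    static void markObject(HeapCell* cell)    { if (cell) markQueue.push_back(cell); }
    static void revisitLater(HeapCell* cell)  { markQueue.push_back(cell); }
    static bool isMarkedBlack(HeapCell* cell) { return blackCells.count(cell) != 0; }

    // Baker: mark every pointer the application loads from the heap.
    HeapCell* bakerLoadBarrier(HeapCell** slot)
    {
        HeapCell* value = *slot;
        markObject(value);
        return value;
    }

    // Dijkstra: mark every pointer the application stores into the heap.
    void dijkstraStoreBarrier(HeapCell** slot, HeapCell* newValue)
    {
        markObject(newValue);
        *slot = newValue;
    }

    // Yuasa: mark every pointer the application overwrites.
    void yuasaStoreBarrier(HeapCell** slot, HeapCell* newValue)
    {
        markObject(*slot);
        *slot = newValue;
    }

    // Retreating wavefront (Riptide-style): mark nothing new; if a black object
    // is written to, schedule that object to be visited again.
    void retreatingStoreBarrier(HeapCell* owner, HeapCell** slot, HeapCell* newValue)
    {
        if (isMarkedBlack(owner))
            revisitLater(owner);
        *slot = newValue;
    }

The first three barriers only ever add objects for the collector to mark, which advances the wavefront; the last one can push an already-visited object back onto the queue, which is why the wavefront is said to retreat.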

Advancing wavefront barriers are not a great match for generational collection. The generational barrier isn’t going to overlap with an advancing wavefront barrier the way that Riptide’s, ChakraCore’s, and BDW’s do. This means double the barrier costs. Also, in an advancing wavefront generational collector, eden collections have to be careful to ensure that their floating garbage doesn’t get promoted. This requires distinguishing between an object being marked for survival versus being marked for promotion. For example, the Domani, Kolodner, Petrank collector has a “yellow” object state and special color-toggling machinery to manage this state, all so that it does not promote floating garbage. The Frampton, Bacon, Cheng, and Grove version of the Metronome collector maintains three nurseries to gracefully move objects between generations, and in their collector the eden collections and full collections can proceed concurrently to each other. While those collectors have incredible features, they are not in widespread use, probably because of increased baseline costs due to extra bookkeeping and extra barriers. To put in perspective how annoying the concurrent-generational integration is, many systems like V8 and HotSpot avoid the problem by using synchronous eden collections. We want eden collections to be concurrent because although they are usually fast, we have no bound on how long they could take in the worst case. Not having floating garbage is another reason why it’s so easy for retreating wavefront collectors to do concurrent eden collection: there’s no need to invent states for black-but-new objects.

Using retreating wavefront means we don’t get the advancing wavefront’s GC termination guarantee. We make up for it by having more aggressive scheduling. It’s common for advancing wavefront collectors to avoid all global pauses because all of collection is concurrent. In the most aggressive advancing wavefront concurrent collectors, the closest thing to a “pause” is that at some point each thread must produce a stack scan. Even if all of Riptide’s algorithms were concurrent, we would still have to artificially stop the application simply to ensure termination. That’s a trade-off that we’re happy with, since we get to control how long these synthetic pauses are.

In many ways, Riptide is a classic mark-sweep collector. Using simple segregated storage is very common, and variants of this technique can be found in Jikes RVM, the Metronome real-time garbage collector, the BDW collector, the Bartok concurrent mark-sweep collector, and probably many others. Combining mark-sweep with bump-pointer is not new; Immix is another way to do it. Our conservative scan is almost like what the BDW collector does. Sticky mark bits are also used in BDW, Jikes RVM, and ChakraCore.

Evaluation

We enabled Riptide once we were satisfied that it did not have any major remaining regressions (in stability, performance, and memory usage) and that it demonstrated an improvement on some test of GC pauses. Enabling it now exposes it to a lot of testing as we continue to tune and validate this collector. This section summarizes what we know about Riptide’s performance so far.

The synchronization features that enable concurrent collection were landed in many revisions over a six month period starting in July 2016. This section focuses on the performance boost that we get once we enable Riptide. Enabling Riptide means that draining will resume the application and allow the application and collector to run alongside each other. The application will still experience pauses: both synthetic pauses from the space-time scheduler and mandatory pauses for things like DOM constraint evaluation. The goal of this evaluation is to give a glimpse of what Riptide can do for observed pauses.

The test that did the best job of demonstrating our garbage collector’s jankiness was the Octane SplayLatency test. This test is also included in JetStream. WebKit was previously not the best at either version of this test, so we wanted a GC that would give us a big improvement. The Octane version of this test reports the reciprocal of the root-mean-square of the latency samples, which rewards uniform performance. JetStream reports the reciprocal of the average of the worst 0.5% of samples, which rewards fast worst-case performance. We tuned Riptide on the JetStream version of this test, but we show results from both versions.

The performance data was gathered on a 15″ MacBook Pro with a 2.8 GHz Intel Core i7 and 16GB RAM. This machine has four cores, and eight logical CPUs thanks to hyperthreading. We took care to quiet down the machine before running benchmarks, by closing almost all apps, disconnecting from the network, disabling Spotlight, and disabling ReportCrash. Our GC is great at taking advantage of hyperthreaded CPUs, so it runs eight draining threads on this machine.

[Figure: JetStream splay-latency score, with and without Riptide]

The figure above shows that Riptide improves the JetStream/splay-latency score by a factor of five.

[Figure: Octane SplayLatency score, with and without Riptide]

The figure above shows that Riptide improves the Octane/SplayLatency score by a factor of 2.5.

[Figure: Time per iteration over 10,000 iterations of Splay, with and without Riptide]

The chart above shows what is happening over 10,000 iterations of the Splay benchmark: without Riptide, an occasional iteration will pause for >10 ms due to garbage collection. Enabling Riptide brings these hiccups below 3 ms.

You can run this benchmark interactively if you want to see how your browser’s GC performs. That version will plot the time per iteration in milliseconds over 2,000 iterations.

We continue to tune Riptide as we validate it on a larger variety of workloads. Our goal is to continue to reduce pause times. That means making more of the collector concurrent and improving the space-time scheduler. Continued tuning is tracked by bug 165909.

Conclusion

This post describes the new Riptide garbage collector in WebKit. Riptide does most of its work off the main thread, allowing for a significant reduction in worst-case pause times. Enabling Riptide leads to a five-fold improvement in latency as reported by the JetStream/splay-latency test. Riptide is now enabled by default in WebKit trunk and you can try it out in Safari Technology Preview 21. Please try it out and file bugs!

Planet MozillaA hyper update

hyper 0.10

A new version of hyper was released last week, v0.10.0, to fix a lot of dependency-related problems that the 0.9 versions were having. The biggest problem was version incompatibilities with OpenSSL. You can read all about the problem and solution in the issue tracker, but the tl;dr is that hyper 0.10 no longer depends on OpenSSL, or any TLS implementation really. TLS is a big part of HTTP, but there are also several different implementations, and they all release on their own schedules. Even just an optional dependency on OpenSSL could cause unrecoverable dependency conflicts.

This should be the last feature release of hyper using blocking IO. You can sort of think of it as an LTS version. If there are serious bugs or security fixes, a point release will be made. But all development is completely focused on the new release that will be using tokio. Speaking of…

hyper and tokio

The work to make hyper use non-blocking IO has been a long road. In recent months, it has progressed with the help of the tokio library, which itself just recently shipped a release. We just merged the tokio branch into master this week!

wrk -t 10 -d 10s -c 20: 225759.00 requests per second [1]

The full pipeline works great, and it’s fast! [2] Using futures feels very natural when programming with the asynchronicity of HTTP messages. Instead of including inline code examples here that may grow stale, I’ll just point to the examples in the hyper repository. And yes, full guides will be coming soon, as part of the 0.11 release.

There are still things to touch up. We’re still trying to find the best ways to a) set up an HTTP server or client for easy use, and b) plug an HTTP server or client into the wider tokio ecosystem. There are still internal performance improvements we could make to get even faster. But there’s a light at the end of this tunnel. It’s growing. If you’d like, join in! Or try to port your framework to use it, and provide feedback.

Soon, we’ll have a much better answer for are we web yet?


  1. The benchmarks are hand-wavey at the moment. I surprisingly don’t have an environment available to me to benchmark a bunch of different settings and against other HTTP libraries. If you’d like to help record some benchmarks, I’d greatly appreciate it. 

  2. Of course, the biggest benefit of non-blocking IO is that it is the best way to scale when you have other IO to do in order to serve a request (files, databases, other networking, etc), or when the payloads are bigger than the network packet size, and you want to serve thousands of those requests at the same time.

Planet MozillaCalling on the New U.S. Presidential Administration to Make the Internet a Priority

A new U.S. Presidential Administration takes office today. While there are many serious issues to tackle and many hard-fought debates to commence, no matter what your political views are, I think one thing we can all agree on is the need for progress on the issues that impact internet users around the world.

So, I’ll make a short and sweet request to all U.S. policymakers, new and returning:

 

Please make the internet a priority.

What do we mean by that?

Protect and advance cybersecurity.

Many of the most critical issues that affect internet users are related to cybersecurity. It’s about more than just attacks and protecting nation states. Encryption, secure communications, government surveillance, lawful hacking, and even online privacy and data protection, at the end of the day, are fundamentally about securing data and protecting users. It’s about the importance and challenges of the day to day necessities of making systems secure and trustworthy for the internet as a global public resource.

We’ve talked about how protecting cybersecurity is a shared responsibility.  There is a need for governments, tech companies and users to work together on topics like encryption, security vulnerabilities and surveillance.  We want to help make that happen.  But we need this Administration to sign on and sign up to do it.

A bipartisan Congressional working group recently released a report that concluded encryption backdoors aren’t really necessary and can, in fact, be harmful. The report included questions about other important cybersecurity issues as well, including “lawful hacking” by governments and government disclosure of security vulnerabilities. We were encouraged by these recommendations, but we need to see more progress.

Promote innovation and accessibility.

No one owns the internet – it is all of ours to create, shape, and benefit from. And for the future of our society and our economy, we need to keep it that way – open and distributed.

The internet gives everyone a voice and creates a place for self expression and innovation. We need to keep the internet open and accessible to all.

We need to create pathways and partnerships for the public and private sectors to work together to make progress on challenging issues like net neutrality, copyright and patent policy. We can also create a space for innovation by investing more in, promoting and using open source software, which benefits not only technology but also everything it touches.

What Else?

I promised to keep it short and sweet, so these are just a few of the most important internet issues we need to work on, together.

The Obama Administration worked well with Mozilla and other companies, researchers and constituents to make progress in these areas. We were pleased to see the recent appointment of the nation’s first Federal Chief Information Security officer as part of the Cybersecurity National Action Plan. We hope this type of bi-partisan activity continues, to advance cybersecurity and internet health for all of us.

We’re calling on you, new and returning U.S. policymakers, to lead, and we stand ready to work with you. We make this ask of you because we’re not your average technology company. We do this as part of our role as the champions of internet health. Mozilla was founded with a mission to promote openness, innovation and opportunity online. We fight and advocate for that mission everyday to protect and advance the health of the internet, in addition to creating technology products and solutions to support our mission.

We know there are many policy challenges in front of you and many competing priorities to balance, but we can’t sit back and wait for another blow to internet health – we must work together to make the internet as strong as possible.

Planet MozillaNightly builds from Taskcluster

Yesterday, for the very first time, we started shipping Linux Desktop and Android Firefox nightly builds from Taskcluster.

We now have a much more secure, resilient, and hackable nightly build and release process.

It's more secure, because we have developed a chain of trust that allows us to verify all generated artifacts back to the original decision task and docker image. Signing is no longer done as part of the build process, but is now split out into a discrete task after the build completes.

The new process is more resilient because we've split up the monolithic build process into smaller bits: build, signing, symbol upload, upload to CDN, and publishing updates are all done as separate tasks. If any one of these fail, they can be retried independently. We don't have to re-compile the entire build again just because an external service was temporarily unavailable.

Finally, it's more hackable - in a good way! All the configuration files for the nightly build and release process are contained in-tree. That means it's easier to inspect and change how nightly builds are done. Changes will automatically ride the trains to aurora, beta, etc.

Ideally you didn't even notice this change! We try and get these changes done quietly, smoothly, in the background.

This is a giant milestone for Mozilla's Release Engineering and Taskcluster teams, and is the result of many months of hard work, planning, coding, reviewing and debugging.

Big big thanks to jlund, Callek, mtabara, kmoir, aki, dustin, sfraser, jlorenzo, coop, jmaher, bstack, gbrown, and everybody else who made this possible!

Planet MozillaRust Meetup January 2017

Rust Meetup January 2017 Rust meetup for January 2017

Planet MozillaWhat’s Up with SUMO – 19th January 2017

Hello, SUMO Nation!

Welcome to the third post of the year, friends and revolutionaries :-) Time flies! We have a large set of updates to share with you, so let’s dig in without delay. As always, if you think we’re missing something in this post (or you’d like to see something mentioned in the future) use the comments section!

Welcome, new contributors!

If you just joined us, don’t hesitate – come over and say “hi” in the forums!

SUMO Community meetings

  • LATEST ONE: 18th of January – you can read the notes here (and see the video at AirMozilla).
  • NEXT ONE: happening on the 25th of January!
  • Reminder – if you want to add a discussion topic to the upcoming meeting agenda:
    • Start a thread in the Community Forums, so that everyone in the community can see what will be discussed and voice their opinion here before Wednesday (this will make it easier to have an efficient meeting).
    • Please do so as soon as you can before the meeting, so that people have time to read, think, and reply (and also add it to the agenda).
    • If you can, please attend the meeting in person (or via IRC), so we can follow up on your discussion topic during the meeting with your feedback.

Community

Platform

Social

Support Forum

Knowledge Base & L10n

  • Over 400 edits in the KB in all locales since the last blog post – thank you for making the Knowledge Base better, everyone!
  • Jeff from the l10n team shared our “big cousin’s” plans for 2017 – read all about it here!
  • You can see the L10n migration bugs in progress and fixed here: (the list includes making sure localized content lands in the right language category – yay!)
  • We now have an “Other languages” component linking to existing locales. It can be seen on article pages in the side panel. We’re working to figure out the best way to make it easier for non-existing locales to be translated using the same (or similar) setup. Thanks for all your feedback on that!
  • Michal is taking a well-deserved break from making SUMO awesomer for Czech users. Thank you Mikk & good luck with your final educational stretch! In the meantime, Jiri is taking over the keys to the Czech part of the SUMO kingdom :-) Teamwork!

Firefox

  • for iOS
    • Version 6.0 was released on January 17th, including:
      • an update for mailto:// links.
      • something for developers: Swift 2.3!
    • You can find the contributor forums for iOS here.
    • Firefox Focus / Firefox Klar comes out next week! There will be a “Set your default search engine” article and an update to the existing article in the works.

…and that’s that for this week, dear people of SUMO (and not only). We hope you’ve had a good week so far and that the next seven days bring a lot of interesting, exciting, and inspiring moments. See you next week!

Planet MozillaConnected Devices Weekly Program Update, 19 Jan 2017

Connected Devices Weekly Program Update Weekly project updates from the Mozilla Connected Devices team.

Planet MozillaEmoji and Bugzilla

As announced earlier, we will enable emoji in user inputs in Bugzilla.

We have not committed to a date to release this feature as it requires a production database change which we haven't yet scheduled.

Meanwhile, this change means that as a Bugzilla API consumer, you will need to be ready to accept emoji in your systems.

In particular, if your client application uses MySQL, you'll need to update your databases and tables to use utf8mb4 instead of utf8 encoding, otherwise if you try writing strings containing emoji, they would be truncated at the first emoji which would be a 💩 situation. Adjust that caveat as needed for other data stores such as Postgres.
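
As an illustration only, a migration using the MySQL C API could look roughly like the sketch below. The database, table, and credentials are hypothetical; the important parts are the ALTER ... CONVERT TO CHARACTER SET utf8mb4 statements and making sure the connection itself uses utf8mb4.

    // Hypothetical sketch of a utf8mb4 migration via the MySQL C API.
    #include <mysql.h>
    #include <cstdio>

    int main()
    {
        MYSQL* conn = mysql_init(nullptr);
        if (!mysql_real_connect(conn, "localhost", "app_user", "app_password",
                                "app_db", 0, nullptr, 0)) {
            std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
            return 1;
        }

        // The connection must also speak utf8mb4, or emoji get mangled in transit.
        mysql_set_character_set(conn, "utf8mb4");

        const char* statements[] = {
            "ALTER DATABASE app_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci",
            "ALTER TABLE bug_comments CONVERT TO CHARACTER SET utf8mb4"
            " COLLATE utf8mb4_unicode_ci",
        };
        for (const char* sql : statements) {
            if (mysql_query(conn, sql))
                std::fprintf(stderr, "'%s' failed: %s\n", sql, mysql_error(conn));
        }

        mysql_close(conn);
        return 0;
    }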

If your application will need more time to support emoji, please contact us on IRC (irc://irc.mozilla.org/bteam) or comment here. Otherwise we'll assume you're 😎 with this change, and not 😱.

Also, once we turn this feature on, some Bugzilla users will think themselves clever and create bugs with 💩💩💩 as the title. If the bug contains little else than that, it's probably a 🗑🔥 and can be closed as INVALID.

Your friendly, neighborhood 🐞👩




Planet MozillaRust Paris meetup #35

Rust Paris meetup #35 Mozilla and the Rust community invites everyone in the area of Paris to gather and share knowledge about the Rust programming language. This month we...

Planet MozillaEqual Rating Innovation Challenge: And the Semifinalists are…

Announcing the five innovative concepts that made it to the final round

About three months ago we launched this global Equal Rating Innovation Challenge to help catalyze new thinking and innovation to provide access to the open Internet to those still living without. Clearly the idea resonated. Thanks to the help of numerous digital inclusion initiatives, think tanks, impact hubs and various local communities that supported us, our challenge has spurred global engagement. We received 98 submissions from 27 countries around the world. This demonstrates that there are entrepreneurs, researchers, and innovators in myriad fields poised to tackle this huge challenge with creative products and services.

Our judging panel evaluated the submissions against the criteria of compliance with Equal Rating, affordability and accessibility, empathy, technical feasibility, as well as scalability, user experience, differentiation, potential for quick deployment, and team potential.

Here are the five projects which received the highest scores from our judges. Each team will receive 8 weeks of mentorship from experts within our Mozilla community, covering topics such as policy, business, engineering, and design. The mentorship is broad to better assist the teams in building out their proposed concepts.

Congratulations go to:

Gram Marg Solution for Rural Broadband

  • Team Leader: Prof. Abhay Karandikar
  • Location: Mumbai, India
  • Open source low-cost hardware prototype utilizing Television White Spectrum to provide affordable access to rural communities.

Freemium Mobile Internet (FMI)

  • Team Leader: Steve Song
  • Location: Lunenburg, Nova Scotia, Canada
  • A new business model for telecommunication companies to provide free 2G to enable all the benefits of the open web to all.

Afri-Fi: Free Public WiFi

  • Team Leader: Tim Human
  • Location: Cape Town, South Africa
  • Model to make Project Isizwe financially sustainable by connecting brands to an untapped, national audience, specifically low-income communities who otherwise cannot afford connectivity.

Free Networks P2P Cooperative

  • Team Leader: Bruno Vianna
  • Location: Rio de Janeiro, Brazil
  • Cooperative that enables communities to set up networks to get access to the Internet and then supports itself through the cooperative fees, while co-creating knowledge and respecting local cultures.

Zenzeleni “Do it for yourselves” Networks (ZN)

  • Team Leader: Dr Carlos Rey-Moreno
  • Location: Cape Town, South Africa
  • Bottom-up telecommunications co-operatives that allow the most disadvantaged rural areas of South Africa to self-provide affordable communications at a fraction of the cost offered by other operators.

While we will disclose further information about all of these teams and their projects in the coming weeks, here are some themes that we’ve seen in the submission process and our observations on these themes:

  • Cooperatives were a popular mechanism to grow buy-in and share responsibility and benefit across communities. This is in contrast to a more typical and transactional producer-consumer relationship.
  • Digital literacy was naturally integrated into solutions, but was rarely the lead idea. Instead it was the de facto addition. This signals that digital literacy in and of itself is not perceived as a full solution or service, but rather an essential part of enabling access to the Internet.
  • Many teams took into account the unbanked and undocumented in their solutions. There seemed to be a feeling that solutions for the people would come from the people, not governments or corporations.
  • There was a strong trend for service solutions to disintermediate traditional commercial relationships and directly connect buyers and sellers.
  • In media-centric solutions, the voice of the people was as important as authoritative sources. User generated content in the areas of local news was popular, as was enabling a distribution of voices to be heard.

What’s Next?

Following the mentorship period, on March 9, we will host a day-long event in New York City on the topic of affordable access and innovation. We will invite speakers and researchers from around the world to provide their valuable insights on the global debate, various initiatives, and the latest approaches to affordable access. The main feature of this event will be presentations by our semifinalists, with a thorough Q&A from our judges. We will then have a week of open public voting on EqualRating.com to help determine the winners of the Challenge. The winners will then be announced at RightsCon on March 29 in Brussels.

At this point we want to thank all who have sent us their ideas, organised or hosted an event, or helped to spread the word. We also want to thank our esteemed panel of judges for their time, insight, and mobilizing their communities. While we did have almost a hundred teams submit solutions, we also had thousands of people meeting and engaging in this content through our events, webinars, and website. With this in mind, Mozilla aims to further engage with more teams who sent us their concepts, connect them to our network, and continue to grow the community of people working on this important topic.

Let’s keep this spirit burning — not only through the end of our Challenge, but beyond.


Equal Rating Innovation Challenge: And the Semifinalists are… was originally published in Mozilla Open Innovation on Medium, where people are continuing the conversation by highlighting and responding to this story.

Planet MozillaReps Weekly Meeting Jan. 19, 2017

Reps Weekly Meeting Jan. 19, 2017 This is a weekly call with some of the Reps to discuss all matters about/affecting Reps and invite Reps to share their work with everyone.

Planet MozillaDigital Citizens, Let’s Talk About Internet Health

Today, Mozilla is launching the prototype version of the Internet Health Report. With this open-source research project, we want to start a conversation with you, citizens of the Internet, about what is healthy, unhealthy, and what lies ahead for the Internet.

When I first fell in love with the Internet in the mid-1990s, it was very much a commons that belonged to everyone: a place where anyone online could publish or make anything. They could do so without asking permission from a publisher, a banker or a government. It was a revelation. And it made me — and countless millions of others — very happy.

Since then, the Internet has only grown as a platform for our collective creativity, invention and self expression. There will be five billion of us on the Internet by 2020. And vast swaths of it will remain as open and decentralized as they were in the early days. At least, that’s my hope.

[Figure: Facebook’s Mark Zuckerberg as a Roman emperor on the cover of The Economist (April 9-15, 2016)]

Yet when Facebook’s Mark Zuckerberg shows up on the cover of The Economist depicted as a Roman emperor, I wonder: is the Internet being divided up into a few great empires monopolizing everyday activities like search, talking to friends or shopping? Can it remain truly open and decentralized?

Similarly, when I read about hackers turning millions of home webcams and video recorders into a botnet army, I wonder whether this precious public resource can remain safe, secure and dependable? Can it survive?

These questions are even more critical now that we move into an age where the Internet starts to wrap around us, quite literally.

Think about it: we are increasingly surrounded by connected devices meant to ‘help’ with every aspect of our lives — eating, walking, driving, growing food, finding a parking spot, having sex, building a widget, having a baby (or not), running a city. This so-called Internet of Things will include 20.8 billion devices by 2020, all collecting and collating data constantly.

The Internet of Things, autonomous systems, artificial intelligence: these innovations will no doubt bring good to our lives and society. However, they will also create a world where we no longer simply ‘use a computer,’ we live inside it.

This changes the stakes. The Internet is now our environment. How it works — and whether it’s healthy — has a direct impact on our happiness, our privacy, our pocketbooks, our economies and democracies.

This is why I wake up every day thinking about the health of the Internet. It’s also why I’m so focused on getting more people to think of it as an issue that affects all of us.

Environmentalists in the 1960s faced the same problem. Few people knew that the health of the planet was at risk. They built a global movement that helped the public understand nerdy topics like the ozone layer and renewable energy, eventually changing laws and sparking a swath of industries committed to green business. They made the environment a mainstream issue.

We need a similar movement for the health of the Internet. We need to help people understand what’s at risk and what they can do.

We have started work on the Internet Health Report at Mozilla for exactly this reason. It is an open source project to document and explain what’s happening to this valuable public resource. We put the report together with data from multiple sources and combined it with stories from the ground.

This initial version of the report unpacks the health of the web across five issues that range from familiar Mozilla topics like: decentralization, open innovation, and online privacy and security; to newer areas like digital inclusion and web literacy. We chose to focus on these issues because they all have an influence on the social, technical, political and economic shape of the internet. Deeply intertwined, these issues — and the choices we make around them — have a deep impact on the health of the Internet, for better or for worse.

We’re hoping that you will read what we’ve started, comment in the margins, hack it and share it, to make it better. If you’d like to write, contribute research or otherwise get involved in future versions of the report, reach out to Solana Larsen, our Internet Health Report Editor, with your ideas. Your feedback will help build the next version of this report.

The good news is we can impact the health of the Internet. It’s designed that way. We can build new parts and teach people to get the most out of what’s there. We can point out what’s wrong and make it better. If we do this kind of work together, I believe we can expand and fuel the movement to keep the Internet much healthier for the future.

[This blog post originally appeared on blog.mozilla.org on January 19, 2017]

The post Digital Citizens, Let’s Talk About Internet Health appeared first on Mark Surman.

Planet MozillaThe values of sharing and archiving when working

This was written in the spirit of sharing and of introspecting on the way I prefer working.

I had a simple conversation (edited for expressing a general idea and not focusing on a specific case) about participation (after a request of a 1-1 meeting). I replied:

About the off-discussions, I will encourage you to do everything on record. It is a lot better and more inclusive and helps to keep history of the discussions. So use the issues tracker, the mailing-list, etc. :) but let's make it in public, not 1 to 1. ;)

which the person replied:

yes on the documentation part. but I want to gather information first, so a 1:1 would be great. Just to get a feeling in which direction the project should head.

I was about to send an email as an answer, but thought it would be better for a wider audience, and it would also probably help clarify my own ideas about it.

The thing which makes me uncomfortable is, I guess, this part: yes on the documentation part. but I want to gather information first. The gather information part, the process in which we assemble ideas to come up with a draft is essential to me in terms of understanding the path as a group we take, because the voyage is more important than the destination. There's not that much difference for me in between the gather information and the documentation part. This is probably due to two things in my life:

The more inclusive we are when drafting things, exploring ideas, recording them, the better it is for the community as large. Community doesn't necessary mean the world. It could be the community of interests. It's all about the context. Even if only two persons are discussing the issue, the fact that it is on record and archived will/might help, even the silent readers.

  • some might confirm their own opinions.
  • or they might be encouraged to make their own point because they have a slight disagreement.
  • or they might bring to the table a point that the two persons in the 1:1 had not thought about.
  • And quite important to me, it automatically describes the past, and leaves a trail of data making it easier to connect the dots in 10 to 20 years from now.

I understand that I surprise people with my own determination (stubbornness) to work this way. We are imperfect souls. We make mistakes. And we are afraid to make them in public. The typical example for me is my JavaScript skills, but in the long term, it helped me a lot more to make my naive mistakes and acknowledge them in a recorded environment than to get stuck. People want to help. Our own mistakes help others. It's also why the amazing work—86 episodes already!—done by Mike Conley, The Joy of Coding, is essential to a healthy community and an encouragement to the rest of the people. If you had to watch one, just start with the first one. See how Mike is stumbling, hesitating, going back and forth, how he is documenting his thought process. This is liberating for others who are less comfortable with this type of code.

So when you feel ashamed, struggling with imperfect thoughts, if you need a push in the back to encourage you to go on record, just think about all the people you will help by doing that. The people less knowledgeable than you. Yes, they exist. That's the altruistic part. And think about all the things you will learn in the process with people more knowledgeable than you. That's the amazing self-development part.

Comments

Thinking in The Open

A post by Akshay S Dinesh on January 18, 2017 - Thinking in The Open

… how cool it would be if every idea that every person has is documented and permanently stored on the internet.

Slightly related to this comment from Akshay: Paper trail

Otsukare!

Planet MozillaContinuing Advances in Patent Quality

[Image: Patent maths]

As we have previously written, long patent terms impede the short innovation cycle and continuous iteration of software development. Innovation is harmed by overbroad software patents that cover large swaths of general activity and create uncertainty as to litigation. That’s why Mozilla submitted written comments today to the U.S. Patent and Trademark Office explaining our belief that recent U.S. Supreme Court rulings on what is eligible for patent protection have resulted in improvements to patent quality.

Two years ago, the U.S. Supreme Court unanimously ruled in Alice Corp. Pty. Ltd. v. CLS Bank International that patent claims that merely require generic computer implementation of abstract ideas are not patentable inventions. This was an important step towards mitigating the negative effects that overbroad and poorly written software patents have had for software developers, and the Internet as a whole. As a result, other federal courts have invalidated many other broad and abstract software patents, and the USPTO has made efforts to better incorporate subject matter eligibility in their examination procedures. Companies have also improved the quality of their patent applications and are more selective in filing applications. Our USPTO comments reaffirmed our belief that Alice has had a positive effect on the industry and should continue to be integrated into USPTO procedures and case law. We believe this would be disrupted if Congress were to prematurely intervene by altering or rolling back court rulings on patent subject matter eligibility.

Our mission and work let us see first-hand how patents affect software development and innovation on the Open Web — from the software we write and ship, to our participation and collaboration on open standards. What’s more, our non-profit roots put us in a unique position to talk openly and candidly about patents. We are glad for the opportunity to provide this direct feedback to the USPTO on this very important subject for software developers, and open source projects everywhere.

Planet MozillaVideo-conferencing the right way

I work from home. I have been doing so for the last four years, ever since I joined Mozilla. Some people dislike it, but it suits me well: I get the calm, focused, relaxing environment needed to work on complex problems all in the comfort of my home.

Even given the opportunity, I probably wouldn't go back to working in an office. For the kind of work that I do, quiet time is more important than high bandwidth human interaction.

Yet, being able to talk to my colleagues and exchange ideas or solve problems is critical to being productive. That's where the video-conferencing bit comes in. At Mozilla, we use Vidyo primarily, sometimes Hangout and more rarely Skype. We spend hours every week talking to each other via webcams and microphones, so it's important to do it well.

Having a good video setup is probably the most important and yet least regarded aspect of working remotely. When you start at Mozilla, you're given a laptop and a Vidyo account. No one teaches you how to use it. Should you use an external webcam or the one built into your laptop? Do you need headphones, earbuds, a headset with a microphone? What kind of bandwidth does it use? Those things are important to good telepresence, yet most of us only learn them after months of remote work.

When your video setup is the main interface between you and the rest of your team, spending a bit of time doing it right is far from wasted. The difference between a good microphone and a shitty little one, or a quiet room and taking calls from the local coffee shop, influences how much your colleagues will enjoy working with you. I'm a lot more eager to jump on a call with someone I know has good audio and video than with someone who will drag me into 45 minutes of ambient noise and coughing into their microphone.

This is a list of tips and things that you should care about, for yourself, and for your coworkers. They will help you build a decent setup with no to minimal investment.

The place

It may seem obvious, but you shouldn't take calls from a noisy place. Airports, coffee shops, public libraries, etc. are all horribly noisy environments. You may enjoy working from those places, but your interlocutors will suffer from all the noise. Nowadays, I refuse to take calls and cut meetings short when people try to force me into listening to their surroundings. Be respectful of others and take meetings from a quiet space.

Bandwidth

Despite what ISPs are telling you, no one needs 300Mbps of upstream bandwidth. Take a look at the graph below. It measures the egress point of my gateway. The two yellow spikes are video meetings. They don't even reach 1Mbps! In the middle of the second one, there's a short spike at 2Mbps when I set Vidyo to send my stream at 1080p, but shortly reverted because that software is broken and the faces of my coworkers disappeared. Still, you get the point: 2Mbps is the very maximum you'll need for others to see you, and about the same amount is needed to download their streams.

You do want to be careful about ping: latency can increase up to 200ms without issue, but even 5% packet drop is enough to make your whole experience miserable. Ask Tarek what bad connectivity does to your productivity: he works from a remote part of France where bandwidth is scarce and latency is high. I dubbed him the inventor of the Tarek protocol, where you have to repeat each word twice for others to understand what you're saying. I'm joking, but the truth is that it's exhausting for everyone. Bad connectivity is tough on remote workers.

(Tarek thought it'd be worth mentioning that he tried to improve his connectivity by subscribing to a satellite connection, but ran into issues in the routing of his traffic: 700ms latency was actually worse than his broken DSL.)

Microphone

Perhaps the single most important aspect of video-conferencing is the quality of your microphone and how you use it. When everyone is wearing headphones, voice quality matters a lot. It is the difference between a pleasant 1h conversation, or a frustrating one that leaves you with a headache.

Rule #1: MUTE!

Let me say that again: FREAKING MUTE ALREADY!

Video-conferencing software is terrible at routing the audio of several people at the same time. This isn't the same as a meeting room, where your brain will gladly separate the voice of someone you're speaking to from the keyboard of the dude next to you. On video, everything is at the same volume, so when you start answering that email while your colleagues are speaking, you're pretty much taking over their entire conversation with keyboard noises. It's terrible, and there's nothing more annoying than having to remind people to mute every five god damn minutes. So, be a good fellow, and mute!

Rule #2: no coughing, eating, breathing, etc... It's easy enough to mute or move your microphone away from your mouth that your colleagues shouldn't have to hear you breathing like a marathoner who just finished the olympics. We're going back to rule #1 here.

Now, let's talk about equipment. A lot of people neglect the value of a good microphone, but it really helps in conversations. Don't use your laptop microphone, it's crap. And so is the mic on your earbuds (yes, even the apple ones). Instead, use a headset with a microphone.

If you have a good webcam, it's somewhat ok to use the microphone that comes with it. The Logitech C920 is a popular choice. The downside of those mics is they will pick up a lot of ambient noise and make you sound distant. I don't like them, but it's an acceptable trade-off.

If you want to go all out, try one of those fancy podcast microphones, like the Blue Yeti.

You most definitely don't need that for good mic quality, but they sound really nice. Here's a recording comparing the Plantronic headset, the Logitech C920 and the Blue Yeti.

[Audio: recording comparing the three microphones, https://ulfr.io/f/testmic_yeti_plantronic_logitech.ogg]

Webcam

This part is easy because most laptops already come with a 720p webcam that provides decent video quality. I do find the Logitech renders colors and depth better than the webcam embedded in my Lenovo Carbon X1, but the difference isn't huge.

The most important part of your webcam setup should be its location. It's a bit strange to have someone talk to you without looking straight at you, but this is often what happens when people place their webcam to the side of their screen.

I've experimented a bit with this, and my favorite setup is to put the webcam right in the middle of my screen. That way, I'm always staring right at it.

It does consume a little space in the middle of my display, but with a large enough screen - I use an old 720p 35" TV - it doesn't really bother me.

Lighting and background are important parameters too. Don't bring light from behind, or your face will look dark, and don't use a messy background so people can focus on what you're saying. These factors contribute to helping others read your facial expressions, which are an important part of good communication. If you don't believe me, ask Cal Lightman ;).

Spread the word!

In many ways, we're the first generation of remote workers, and people are learning how to do it right. I believe video-conferencing is an important part of that process, and I think everyone should take a bit of time to improve their setup. Ultimately, we're all a lot more productive when communication flows easily, so spread the word, and do tell your coworkers when their setup is getting in the way of good conferencing.

Planet Mozillacurl

Today Mozilla revealed the new logo and “refreshed identity”. The new Mozilla logo uses the colon-slash-slash sequence already featured in curl’s logo as I’ve discussed previously here from back before the final decision (or layout) was complete.

The sentiment remains though. We don’t feel that we can in any way claim exclusivity to this symbol of internet protocols – on the contrary we believe this character sequence tells something and we’re nothing but happy that this belief is shared by such a great organization such as Mozilla.

Will the “://” in either logo make some users think of the other logo? Not totally unlikely, but I don’t consider that bad. curl and Mozilla share a lot of values and practices: open source, pro users, primarily client-side. I am leading the curl project and I am employed by Mozilla. Also, when talking about and comparing brands and their recognition and importance in a global sense, curl is of course nothing next to Mozilla.

I’ve talked to the core team involved in the Mozilla logo revamping before this was all made public and I assured them that we’re nothing but thrilled.

Planet MozillaSecurity Audit Finds Nothing: News At 11

Secure Open Source is a project, stewarded by Mozilla, which provides manual source code audits for key pieces of open source software. Recently, we had a trusted firm of auditors, Cure53, examine the dovecot IMAP server software, which runs something like two thirds of all IMAP servers worldwide. (IMAP is the preferred modern protocol for accessing an email store.)

The big news is that they found… nothing. Well, nearly nothing. They managed to scrape up 3 “vulnerabilities” of Low severity.

Cure53 write:

Despite much effort and thoroughly all-encompassing approach, the Cure53 testers only managed to assert the excellent security-standing of Dovecot. More specifically, only three minor security issues have been found in the codebase, thus translating to an exceptionally good outcome for Dovecot, and a true testament to the fact that keeping security promises is at the core of the Dovecot development and operations.

Now, if we didn’t trust our auditors and they came back empty-handed, we might suspect them of napping on the job. But we do, and so this sort of result, while seemingly a “failure” or a “waste of money”, is the sort of thing we’d like to see more of! We will know Secure Open Source, and other programs to improve the security of FLOSS code, are having an impact when more and more security audits come back with this sort of result. So well done to the dovecot maintainers; may they be the first of many.

Planet MozillaThe Joy of Coding - Episode 87

mconley livehacks on real Firefox bugs while thinking aloud.

Planet MozillaData and People: A Discussion with Google's former SVP of People Operations Laszlo Bock

When Google's head of People came out with the bestselling book Work Rules! last year, he debunked many myths. Adopting an experiments-based approach with their...

Planet MozillaThese Weeks in Firefox: Issue 8

The first Firefox Desktop Engineering meeting for 2017 took place this morning! Here are the interesting bits:

Highlights

  • past wrote a blog post about some upcoming privacy and security features
  • ashughes has posted some very interesting GPU Process Experiment Results
    • TL;DR: A GPU process is a very good idea!
  • The Cliqz Test Pilot experiment has launched in Germany!
    • It’s available in other regions as well, but the suggestions it offers will be very Germany-centric

Contributor(s) of the Week

Project Updates

Add-ons
Content Handling Enhancement
Electrolysis (e10s)
Firefox Core Engineering
Form Autofill
Platform UI and other Platform Audibles
Privacy/Security
  • Florian reports that the new Captive Portal UI will ship in Firefox 52. Last bits of polish have just landed and are being uplifted.
Quality of Experience

Here are the raw meeting notes that were used to derive this list.

Want to help us build Firefox? Get started here!

Here’s a tool to find some mentored, good first bugs to hack on.

Planet MozillaArrival

Seven months since setting out to refresh the Mozilla brand experience, we’ve reached the summit. Thousands of emails, hundreds of meetings, dozens of concepts, and three rounds of research later, we have something to share. If you’re just joining this process, you can get oriented here and here.

At the core of this project is the need for Mozilla’s purpose and brand to be better understood by more people. We want to be known as the champions for a healthy Internet. An Internet where we are all free to explore and discover and create and innovate without barriers or limitations. Where power is in the hands of many, not held by few. An Internet where our safety, security and identity are respected.

Today, we believe these principles matter more than ever. And as a not-for-profit organization, we’re uniquely able to build products, technologies, and programs that keep the Internet growing and healthy, with individuals informed and in control of their online lives.

Our brand identity – our logo, our voice, our design – is an important signal of what we believe in and what we do. And because we are so committed to ensuring the Internet is a healthy global public resource, open and accessible to everyone, we’ve designed the language of the Internet into our brand identity.

Today, we’re sharing our new logo and a proposed color palette, language architecture, and imagery approach. Remaining true to our intent to engage with the design and tech community throughout this open design process, we welcome your feedback on these elements as we build out our design guidelines.

Let’s go into a bit more detail on the components of our brand identity system, developed in collaboration with our exceptional London-based design partner johnson banks.

Our logo

Our logo with its nod to URL language reinforces that the Internet is at the heart of Mozilla. We are committed to the original intent of the link as the beginning of an unfiltered, unmediated experience into the rich content of the Internet.

[Image: Mozilla-12jan-1500px_logo]

The font for the wordmark and accompanying copy lines is Zilla. Created for us by Typotheque in the Netherlands, Zilla is free and open to all.

Typotheque was an historic partner to Mozilla. They were the first type-foundry to release Web-based fonts, and Mozilla’s Firefox web browser was an early adopter of Web fonts. We chose to partner with Peter Bilak from Typotheque because of their deep knowledge of localization of fonts, and our commitment to having a font that includes languages beyond English. Prior to partnering with Typotheque, we received concepts and guidance from Anton Koovit and FontSmith.

[Image: johnsonbanks_Mozilla_zilla_type_2]

Selected to evoke the Courier font used as the original default in coding, Zilla has a journalistic feel reinforcing our commitment to participate in conversations about key issues of Internet health. It bucks the current convention of sans serif fonts. Anyone can create the Mozilla logo by typing and highlighting with the Zilla font, making the logo open and democratic. The black box surrounding the logo is a key building block of the design, and echoes the way we all select type in toolbars and programs.

Mozilla comes first in any application of the system, just as the protocol begins any internet journey. Copy lines, colors, and images all flow from that starting point, much like a web journey.

Our color palette

Our color palette, derived from the highlight colors used by Firefox and other web browsers, distinguishes our brand from its contemporaries. Color flows into our logo and changes according to the context in which the logo is used. As we develop our style guide, we’ll define color pairings, intensities, and guidelines.

[Image: Mozilla-12jan-1500px_color]

Our language and language architecture

Copy lines to the right or below the logo hold core Mozilla messages.  They also hold program, event, and team names — simplifying and unifying a multitude of different Mozilla activities. It will now be easier to know that something is “from” Mozilla and understand how our global initiatives connect and reinforce one another.

The system enables Mozilla volunteer communities across the globe to create their own identity by selecting color and choosing imagery unique to them. Meanwhile the core blocks of our system, bounding boxes and typography, will provide the consistency, making it clear that these communities are part of one Mozilla.

[Image: Mozilla-12jan-1500px_architecture]

Our Imagery

As we looked at the elements of our brand identity, the concept of one image or icon standing for the whole of Mozilla, and the entirety of the Internet, seemed anachronistic. Since imagery is an important reflection of the diversity and richness of the Internet, however, we’ve made it an important component of our system.

 

In digital applications, ever-changing imagery represents the unlimited bounty of the online ecosystem. Dynamic imagery allows the identity of Mozilla to evolve with the Internet itself, always fresh and new. Static applications of our identity system include multiple, layered images as if taken as a still frame within a moving digital experience.

[Image: Mozilla-12jan-1500px_imagery]

How might it work? We intend to invite artists, designers, and technologists to contribute to an imagery collective, and we’ll code curated GIFs, animations, and still images to flow into mozilla.org and other digital experiences. Through this open design approach, we will engage new design contributors and communities, and make more imagery available to all under Creative Commons. We’re looking for input from creative communities to help shape and expand this idea.

[Image: Mozilla-12jan-1500px_apps2]

We will roll out the new brand identity in phases, much as we have with concepts in this open design process, so please be patient with our progress. As we develop our design system, we look forward to hearing your feedback and suggestions using the comments below. You’ve been with us from the start and we’re glad you’re here. We’ll continue to share updates and comments in this space.

[Image: Mozilla-12jan-1500px_environmental]

[Image: Mozilla-12jan-1500px_apps1]

Photo credits
Brandenburg Gate https://commons.wikimedia.org/wiki/File:Berlin_-_0266_-_16052015_-_Brandenburger_Tor.jpg
Iron Filings https://www.flickr.com/photos/oskay/4581194252

Planet MozillaAnnouncing git-cinnabar 0.4.0

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows you to clone, pull and push from/to mercurial remote repositories, using git.
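
For example, once git-cinnabar is installed and on your PATH, working with a mercurial repository looks much like regular git, with an hg:: prefix on the remote URL. This is just a minimal sketch; the repository URL is only an illustration:

    # clone a mercurial repository through the remote helper
    git clone hg::https://hg.mozilla.org/mozilla-unified
    cd mozilla-unified

    # pull new mercurial changesets, and check what a push would do
    git pull
    git push --dry-run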

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.3.2?

  • Various bug fixes.
  • Updated git to 2.11.0 for cinnabar-helper.
  • Now supports bundle2 for both fetch/clone and push (https://www.mercurial-scm.org/wiki/BundleFormat2).
  • Now supports git credential for HTTP authentication.
  • Now supports git push --dry-run.
  • Added a new git cinnabar fetch command to fetch a specific revision that is not necessarily a head.
  • Added a new git cinnabar download command to download a helper on platforms where one is available.
  • Removed upgrade path from repositories used with version < 0.3.0.
  • Experimental (and partial) support for using git-cinnabar without having mercurial installed.
  • Use a mercurial subprocess to access local mercurial repositories.
  • Cinnabar-helper now handles fast-import, with workarounds for performance issues on macOS.
  • Fixed some corner cases involving empty files. This prevented cloning Mozilla’s stylo incubator repository.
  • Fixed some correctness issues in file parenting when pushing changesets pulled from one mercurial repository to another.
  • Various improvements to the rules to build the helper.
  • Experimental (and slow) support for pushing merges, with caveats. See issue #20 for details about the current status.
  • Fail graft earlier when no commit was found to graft.
  • Allow graft to work with git version < 1.9.
  • Allow git cinnabar bundle to do the same grafting as git push.

Planet MozillaCall for Nominations – D&I for Participation Focus Groups

The heart of Mozilla is people – we are committed to a community that invites in and empowers people to participate fully, introduce new ideas and inspire others, regardless of background, family status, gender, gender identity or expression, sex, sexual orientation, native language, age, ability, race and ethnicity, national origin, socioeconomic status, religion, geographic location or any other dimension of diversity.

In a previous post, I outlined a draft of a D&I for Participation Strategy. Since then, we’ve been busily designing the insights phase, which launches today. In this first phase, we’re asking Mozillians to self-nominate, or nominate others, for a series of focus groups on D&I topics relevant to regional leadership, events, project design, and participation in projects and beyond. These insights will generate initiatives and experiments that lead us to a first version of the strategy.

To be successful moving forward, our focus groups must represent the diversity of our global community including a range of:

  • First languages other than English

  • Region & time zones

  • Skill-sets (technical and non-technical contribution)

  • Active & inactive contributors

  • Students & professionals

  • Gender identities

  • Ethnicities, races, and cultural backgrounds

  • Ages

  • Bandwidth & accessibility needs
  • Students, professionals, retired and everyone in between

  • New, emerging and established contributors

  • Staff, community project maintainers designing for participation

Focus groups will be conducted in person and online, many in first languages other than English. We’ll also be reaching out for 1:1 interviews where that feels more suitable than a group interview.

If you believe that you, or someone you know, can provide key insights needed for a D&I community strategy – please, please, please nominate them! Thank you!

Nominations will close January 23rd at 12:00 UTC. For more information and updates, please check our D&I for Participation Strategy wiki.

Planet MozillaServo Talk at LCA 2017

My talk from Linux.conf.au was just posted, and you can go watch it. In it I cover some of the features of Servo that make it unique and fast, including the constellation and WebRender.

Servo Architecture: Safety & Performance by Jack Moffitt, LCA 2017, Hobart, Australia.

Planet MozillaAdd-ons Update – 2017/01

Here’s the state of the add-ons world this month.

If you haven’t read Add-ons in 2017, I suggest that you do. It lays out the high-level plan for add-ons this year.

The Review Queues

In the past month, 1,412 listed add-on submissions were reviewed:

  • 1015 (72%) were reviewed in fewer than 5 days.
  • 63 (4%) were reviewed between 5 and 10 days.
  • 334 (24%) were reviewed after more than 10 days.

There are 415 listed add-ons awaiting review.

If you’re an add-on developer and are looking for contribution opportunities, please consider joining us. Add-on reviewers are critical for our success, and can earn cool gear for their work. Visit our wiki page for more information.

Compatibility

The compatibility blog post for Firefox 51 is up, and the bulk validation was run. The blog post for 52 is also up and the bulk validation is pending.

Multiprocess Firefox is enabled for some users, and will be deployed for all users very soon. Make sure you’ve tested your add-on and either use WebExtensions or set the multiprocess compatible flag in your add-on manifest.

As always, we recommend that you test your add-ons on Beta and Firefox Developer Edition to make sure that they continue to work correctly. End users can install the Add-on Compatibility Reporter to identify and report any add-ons that aren’t working anymore.

Recognition

We would like to thank the following people for their recent contributions to the add-ons world:

  • saintsebastian
  • Atique Ahmed Ziad
  • Aniket Kudale
  • Sailesh Choyal
  • Laurent
  • Azharul Islam
  • Piyush Rungta
  • Raffaele Spinelli
  • Shubheksha Jalan
  • Rob Wu
  • euleram
  • asamusaK
  • SaminRK

You can read more about their work in our recognition page.

Planet MozillaEqual Rating Innovation Challenge: And the Semifinalists are…

Announcing the five innovative concepts that made it to the final round

 

About three months ago we launched this global Equal Rating Innovation Challenge to help catalyze new thinking and innovation to provide access to the open Internet to those still living without it. Clearly the idea resonated. Thanks to the help of numerous digital inclusion initiatives, think tanks, impact hubs and various local communities that supported us, our challenge has spurred global engagement. We received 98 submissions from 27 countries around the world. This demonstrates that there are entrepreneurs, researchers, and innovators in myriad fields poised to tackle this huge challenge with creative products and services.

[Image: Semifinalist infographic]

Our judging panel evaluated the submissions against the criteria of compliance with Equal Rating, affordability and accessibility, empathy, technical feasibility, as well as scalability, user experience, differentiation, potential for quick deployment, and team potential.

Here are the five projects which received the highest scores from our judges. Each team will receive 8 weeks of mentorship from experts within our Mozilla community, covering topics such as policy, business, engineering, and design. The mentorship is broad to better assist the teams  in building out their proposed concepts.

Congratulations go to:

Gram Marg Solution for Rural Broadband

  • Team Leader: Prof. Abhay Karandikar
  • Location: Mumbai, India
  • Open source low-cost hardware prototype utilizing Television White Spectrum to provide affordable access to rural communities.

Freemium Mobile Internet (FMI)

  • Team Leader: Steve Song
  • Location: Lunenburg, Nova Scotia, Canada
  • A new business model for telecommunication companies to provide free 2G to enable all the benefits of the open web to all.

Afri-Fi: Free Public WiFi

  • Team Leader: Tim Human
  • Location: Cape Town, South Africa
  • Model to make Project Isizwe financially sustainable by connecting brands to an untapped, national audience, specifically low-income communities who otherwise cannot afford connectivity.

Free Networks P2P Cooperative

  • Team Leader: Bruno Vianna
  • Location: Rio de Janeiro, Brazil
  • Cooperative that enables communities to set up networks to get access to the Internet, and then supports itself through the cooperative fees, while co-creating the knowledge and respecting the local cultures.

Zenzeleni “Do it for yourselves” Networks (ZN)

  • Team Leader: Dr Carlos Rey-Moreno
  • Location: Cape Town, South Africa
  • Bottom-up telecommunications co-operatives that allow the most disadvantaged rural areas of South Africa to self-provide affordable communications at a fraction of the cost offered by other operators.

While we will disclose further information about all of these teams and their projects in the coming weeks, here are some themes that we’ve seen in the submission process and our observations on these themes:

  • Cooperatives were a popular mechanism to grow buy-in and share responsibility and benefit across communities. This is in contrast to a more typical and transactional producer-consumer relationship.
  • Digital literacy was naturally integrated into solutions, but was rarely the lead idea. Instead it was the de facto addition. This signals that digital literacy in and of itself is not perceived as a full solution or service, but rather an essential part of enabling access to the Internet.
  • Many teams took into account the unbanked and undocumented in their solutions. There seemed to be a feeling that solutions for the people would come from the people, not governments or corporations.
  • There was a strong trend for service solutions to disintermediate traditional commercial relationships and directly connect buyers and sellers.
  • In media-centric solutions, the voice of the people was as important as authoritative sources. User generated content in the areas of local news was popular, as was enabling a distribution of voices to be heard.

What’s Next?

Following the mentorship period, on March 9, we will host a day-long event in New York City on the topic of affordable access and innovation. We will invite speakers and researchers from around the world to provide their valuable insights on the global debate, various initiatives, and the latest approaches to affordable access. The main feature of this event will be presentations by our semifinalists, with a thorough Q&A from our judges. We will then have a week of open public voting on EqualRating.com to help determine the winners of the Challenge. The winners will then be announced at RightsCon on March 29 in Brussels.

At this point we want to thank all who have sent us their ideas, organised or hosted an event, or helped to spread the word. We also want to thank our esteemed panel of judges for their time, insight, and mobilizing their communities. While we did have almost a hundred teams submit solutions, we also had thousands of people meeting and engaging in this content through our events, webinars, and website. With this in mind, Mozilla aims to further engage with more teams who sent us their concepts, connect them to our network, and continue to grow the community of people working on this important topic.

Let’s keep this spirit burning – not only through the end of our Challenge, but beyond.

Planet MozillaWhat’s the First Firefox Crash a User Sees?

Growth is going to be a big deal across Mozilla in 2017. We spent 2016 solidifying our foundations, and now we’re going to use that to spring into action and grow our influence and user base.

So this got me thinking about new users. We’re constantly getting new users: people who, for one reason or another, choose to install and run Firefox for the first time today. They run it and… well, then what?

Maybe they like it. They open a new tab. Then they open a staggeringly unbelievable number of tabs. They find and install an addon. Or two.

Fresh downloads and installs of Firefox continue at an excellent pace. New people, every day, are choosing Firefox.

So with the number of new users we already see, the key to Growth may not lie in attracting more of them… it might be that we need to keep the ones we already see.

So what might stop a user from using Firefox? Maybe after they open the seventy-first tab, Firefox crashes. It just disappears on them. They open it again, browse for a little while… but can’t forget that the browser, at any time, could just decide to disappear and let them down. So they migrate back to something else, and we lose them.

It is with these ideas in my head that I wondered: “Are there particular types of crashes that happen to new users? Are they more likely to crash because of a plugin, their GPU misbehaving, running out of RAM… What is their first crash, and how might it compare to the broader ecosystem of crashes we see and fix every day?”

With the new data available to me thanks to Gabriele Svelto’s work on client-side stack traces, I figured I could maybe try to answer it.

My full analysis is here, but let me summarize: sadly there’s too much noise in the process to make a full diagnosis. There are some strange JSON errors I haven’t tracked down… and even if I do, there are too many “mystery” stack frames that we just don’t have a mechanism to figure out yet.

And this isn’t even covering how we need some kind of service or algorithm to detect signatures in these stacks or at least cluster them in some meaningful way.

Work is ongoing, so I hope to have a more definite answer in the future. But for now, all I can do is invite you to tell me what you think causes users to stop using Firefox. You can find me on Twitter, or through my @mozilla.com email address.

:chutten


Planet MozillaShell script to record a window into an animated GIF

Part of my work consists of publicizing new features that land in Nightly on our Twitter account, and sometimes an animated GIF showing how a new feature works, or how to activate it, is easier than trying to squeeze explanations into 140 characters.

Initially I was doing a video screencast and then converting the video into a GIF, but I wasn't happy with the quality of the end result and the whole process was time-consuming. I searched for a better solution and found out about byzanz-record, a command that records a screencast directly as a GIF and, I think, is easier for a Linux user than playing with ffmpeg. I ended up tweaking a script I found on Stackoverflow, and that is what I use in the end.
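
A minimal sketch of that kind of wrapper, assuming byzanz-record and xwininfo are installed (the duration and output-path defaults below are just placeholders):

    #!/bin/bash
    # Record a user-selected window into an animated GIF using byzanz-record.
    # Usage: record-window.sh [duration-in-seconds] [output.gif]
    DURATION=${1:-10}
    OUTPUT=${2:-$HOME/window.gif}

    # Let the user pick a window, then parse its geometry from xwininfo.
    echo "Click the window you want to record..."
    INFO=$(xwininfo)
    X=$(echo "$INFO" | awk '/Absolute upper-left X/ {print $NF}')
    Y=$(echo "$INFO" | awk '/Absolute upper-left Y/ {print $NF}')
    W=$(echo "$INFO" | awk '/^ *Width:/ {print $NF}')
    H=$(echo "$INFO" | awk '/^ *Height:/ {print $NF}')

    # Record that region for the requested duration, directly to a GIF.
    byzanz-record --duration="$DURATION" --x="$X" --y="$Y" \
                  --width="$W" --height="$H" "$OUTPUT"
    echo "Saved $OUTPUT"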

Other people using Linux may have similar needs so maybe that will help you guys discover this command.

Planet MozillaMartes Mozilleros, 17 Jan 2017

Bi-weekly meeting to talk (in Spanish) about the state of Mozilla, the community and its projects.

Planet MozillaMy visit to the medical Holodeck – cancer research at Weill Cornell using HoloLens and the VR Cave

[Image: Interactive VR demo of going through MRI data]

I just spent a few days in New York setting up a workshop to help minority students get into development (more on that soon). I was lucky to be in Microsoft’s Reactor when Alex Sigaras, a research associate in computational biomedicine at Weill Cornell Medicine, gave a talk about how HoloLens transforms healthcare research for the HoloLens Developer Group in New York.

I took the opportunity to talk to Alex for Decoded Chats about that. We also covered other topics, such as sharing of information in healthcare, and how HoloLens, despite being a high-end and rare device, allows for collaboration among experts in all fields, not only developers.

If you prefer to have an audio version, you can download it here (MP3, 19MB)

Audio: http://techchatsdata.azurewebsites.net/DECODED-Chats-Alexandros-Sigaras-on-HoloLens-in-Medical-Research.mp3

Here are the questions we covered:

  1. You just gave a talk at a HoloLens meetup about medical research. Who are you and what do you do with HoloLens?
  2. What are the benefits of using the HoloLens as a visualisation tool in computational medicine compared to VR environments?
  3. Is there a collaboration benefit in augmented reality and mixed reality rather than virtual reality? Does it scale better in bigger groups?
  4. Genomics is known to have to deal with huge amounts of data. Isn’t that an issue on a device that is self-contained like the HoloLens?
  5. Most of the HoloLens demos you see are single person use. Your use case is pushing the collaborative part of the device. How does that work out?
  6. What is the development stack you use? Did you find it hard to port to the device and to re-use code of other, VR, solutions you already had?
  7. Do you also have some success stories where using HoloLens helped find a data correlation faster than any of the other ways you used before?
  8. Is there any way for the audience here to collaborate in your research and help you further break down silos in medical research?

You can see the HoloLens work Alex and his team are working on in this tweet.


The slides of his talk are on SlideShare and have a lot more information on the topic.

In addition to visiting Alex at work, I also got a special treat to have a demo of their other VR work, including The Cave, a room with 5 walls that are rear-projected screens allowing you to get detailed 3D views of MRI scans.

Here’s a very raw and unedited video of Vanessa Borcherding (@neezbeez) showing their research in VR and the insights it can give you.

Warning: unless you are also wearing 3D glasses, this video flickers a lot:

I left the hospital and research facility and had to take a long walk in Central Park. It is not every day that you see things you always considered science fiction and a faraway dream happening right now. I’m looking forward to working more with these people, even if I felt utterly lost, like the dummy in the room. It is great to see that technology that at first glance looks great for gaming and entertainment can help experts of all walks of life do important work to make people live longer.

Planet MozillaFirefox for iOS Users Can Now Choose Their Favorite Email Apps

For most of us email is a big part of our online lives. Today we’re excited to share that we’ve made updates to the email experience in Firefox for iOS, making it possible to choose your favorite email app when sending emails from pages browsed with Firefox.

We identified some of the mail applications preferred by Firefox users around the world and included those email apps in this update. So whether it is Microsoft Outlook, Airmail, Mail.Ru, MyMail, or Spark, you can easily send an email by tapping an email link displayed in the browser. That will open up your selected email app with the desired email address automatically populated in the address field. In a similar fashion, users can also update their settings in these email apps to automatically open any embedded link in Firefox.

You can choose your favorite email program in Firefox by going into settings in the Firefox for iOS app and selecting from the email programs listed.

You can also use Firefox to automatically open links embedded in emails by going into the settings menu of your preferred email app and selecting Firefox.

It’s clever, quick and simple – and more flexible. Because we want you to browse the Internet freely, the way you want, on Firefox.  Get the latest Firefox for iOS here.

 

List of Mail Partners

To experience the newest feature and use the latest version of Firefox for iOS, download the update and let us know what you think.

 

We hope you enjoy the latest version.

Planet MozillaNotes on HACS 2017

Real World Crypto is probably one of my favorite conferences. It’s a fine mix of practical and theoretical talks, plus a bunch of great hallway, lunch, and dinner conversations. It was broadcast live for the first time this year, and the talks are available online. But I’m not going to talk more about RWC; others have covered it perfectly.

The HACS workshop

What I want to tell you about is a lesser-known event that took place right after RWC, called HACS - the High Assurance Crypto Software workshop. An intense, highly interactive two-day workshop in its second year, organized by Ben Laurie, Gilles Barthe, Peter Schwabe, Meredith Whittaker, and Trevor Perrin.

Its stated goal is to bring together crypto-implementers and verification people from open source, industry, and academia; introduce them and their projects to each other, and develop practical collaborations that improve verification of crypto code.

The projects & people

The formal verification community was represented by projects such as miTLS, HACL*, Project Everest, Z3, VeriFast, tis-interpreter, ct-verif, Cryptol/SAW, Entroposcope, and other formal verification and synthesis projects based on Coq or F*.

Crypto libraries were represented by one or multiple maintainers of OpenSSL, BoringSSL, Bouncy Castle, NSS, BearSSL, *ring*, and s2n. Other invited projects included LLVM, Tor, libFuzzer, BitCoin, and Signal. (I’m probably missing a few, sorry.)

Additionally, there were some attendants not directly involved with any of the above projects but who are experts in formal verification or synthesis, constant-time implementation of crypto algorithms, fast arithmetic in assembler, elliptic curves, etc.

All in all, somewhere between 70 and 80 people.

HACS - Day 1

After short one-sentence introductions on early Saturday morning we immediately started with simultaneous round-table discussions, focused on topics such as “The state of crypto libraries”, “Challenges in implementing crypto libraries”, “Efficient fuzzing”, “TLS implementation woes”, “The LLVM ecosystem”, “Fast and constant-time low-level algorithm implementations”, “Formal verification/synthesis with Coq”, and others.

These discussions were hosted by a rotating set of people, not always leading by pure expertise, sometimes also moderating, asking questions, and making sure we stay on track. We did this until lunch, and continued to talk over food with the people we just met. For the rest of the day, discussions became longer and more focused.

By this point people slowly started to sense what it is they want to focus on this weekend. They got to meet most of the other attendants, found out about their skills, projects, and ideas; thought about possibilities for collaboration on projects for this weekend or the months to come.

In the evening we split into groups and went for dinner. Most people’s brains were probably somewhat fried (as was mine) after hours of talking and discussing. Everyone was so engaged that not once did you find the time, or the desire, to take out your laptop or phone, which was great.

HACS - Day 2

The second day, early Sunday morning, continued much like the previous one. We started off with a brainstorming session on what we think the group should be working on. The rest of the day was filled with long and focused discussions that were mostly a continuation from the day before.

A highlight of the day was the skill sharing session, where participants could propose a specific skill to share with others. If you didn’t find something to share you could be one of the 50% of the group that gets to learn from others.

My lucky pick was Chris Hawblitzel from Microsoft Research, who did his best to explain to me (in about 45 minutes) how Z3 works, what its limitations are, and what higher-level languages exist that make it a little more usable. Thank you, Chris!

We ended the day with signing up for one or multiple projects for the last day.

HACS - Day 3

The third day of the workshop was optional, a hacking day with roughly 50% attendance. Some folks took the chance to arrive a little later after two days of intense discussions and socializing. By now you knew most people’s names, and you had better, because no one cared to wear name tags anymore.

It was the time to get together with the people from the projects you signed up for and get your laptop out (if needed). I can’t possibly remember all the things people worked on but here are a few examples:

  • Verify DRBG implementations, various other crypto algorithms, and/or integrate synthesized implementations for different crypto libraries.
  • Brainstorm and experiment with a generic comparative fuzzing API for libFuzzer.
  • Come up with an ASCII representation for TLS records, similar to DER ASCII, that could be used to write TLS implementation tests or feed fuzzers.
  • Start fuzzing projects like BearSSL and Tor. I do remember that at least BearSSL quickly found a tiny (~900 byte) buffer overflow :)

See you again next year?

I want to thank all the organizers (and sponsors) for spending their time (or money) planning and hosting such a great event. It always pays off to bring communities closer together and foster collaboration between projects and individuals.

I got to meet dozens of highly intelligent and motivated people, and left with a much bigger sense of community. I’m grateful to all the attendants that participated in discussions and projects, shared their skills, asked hard questions, and were always open to suggestions from others.

I hope to be invited again to future workshops and check in on the progress we’ve made at improving the verification and quality assurance of crypto code across the ecosystem.

Planet MozillaE0

Long looooong ago, I wrote a deep review of the XHTML 2.0 spec that was one of the elements that led to the resuming of the HTML activity at the W3C and the final dismissal of XHTML 2.0. 

Long ago, I started a similar effort on EPUB that led to Dave Cramer's EPUB Zero. It's time (fr-FR) to draw some conclusions.

This document is maintained on GitHub and accepting contributions.

Daniel Glazman

OCF

  1. EPUB publications are not "just a zip". They are zips with special constraints. I question the "mimetype file in first uncompressed position" constraint since I think the vast majority of reading systems don't care (and can't care), because most of the people creating EPUB at least partially by hand don't know/care. The last three constraints (zip container fields) on the ZIP package described in section 4.2 of the spec are usually not implemented by Reading Systems.
  2. the container element of the META-INF/container.xml file has a version attribute that is always "1.0", whatever the EPUB version. That forces editors, filters and reading systems to dive into the default rendition to know the EPUB version and that's clearly one useless expensive step too much.
  3. having multiple renditions per publication reminds me of MIME multipart/alternative. When Borenstein and Freed added it to the draft of RFC 1341 some 25 years ago, Mail User Agents developers (yours truly counted) envisioned and experimented far more than alternatives between text/html and text/plain only. I am under the impression multiple renditions start from the same good will but fail to meet the goals for various reasons:
    1. each additional rendition drastically increases the size of the publication...
    2. most authoring systems, filters and converters on the market don't deal very well with multiple renditions
    3. EPUB 2 defined the default rendition as the first rendition with an application/oebps-package+xml mimetype while the EPUB 3 family of specs defines it as the first rendition in the container
    4. while a MIME-compliant Mail User Agent will let you compose a message in text/html and output for you the multipart/alternative between that text/html and its text/plain serialization, each Publication rendition must be edited separately.
    5. in the case of multiple renditions, each rendition has its own metadata and it's then legitimate to think the publication needs its own metadata too. But the META-INF/metadata.xml file has always been quoted as "This version of the OCF specification does not define metadata for use in the metadata.xml file. Container-level metadata may be defined in future versions of this specification and in IDPF-defined EPUB extension specifications." by all EPUB specifications. The META-INF/metadata.xml should be dropped.
    6. encryption, rights and signatures are per-publication resources while they should be per-rendition resources.
  4. the full-path attribute on rootfile elements is the only path in a publication that is relative to the publication's directory. All other URIs (for instance in href attributes) are relative to the current document instance. I think full-path should be deprecated in favor of href here, and finally superseded by href for the next major version of EPUB.
  5. we don't need the mimetype attribute on rootfile elements since the prose of EPUB 3.1 says the target of a rootfile must be a Package Document, i.e. an OPF file... While EPUB 2 OCF could directly target, for instance, a PDF file, that's not the case any more for OCF 3.
  6. absolute URIs (for instance /files/xhtml/index.html with a leading slash, cf. path-absolute construct in RFC 3986) are harmful to EPUB+Web convergence
  7. if multiple renditions are dropped, the META-INF/container.xml file becomes useless and it can be dropped.
  8. the prose for the META-INF/manifest.xml makes me wonder why this still exists: "does not mandate a format", "MUST NOT be used". Don't even mention it! Just say that extra unspecified files inside META-INF directory must be ignored (Cf. OCF section 3.5.1) , and possibly reserve the metadata.xml file name, period. Oh, and a ZIP is also a manifest of files...
  9. I am not an expert of encryption, signatures, XML-ENC Core or XML DSIG Core, so I won't discuss encryption.xml and signatures.xml files
  10. the rights.xml file still has no specified format. Strange. Cf. item 8 above.
  11. Resource obfuscation is so weak and useless it's hilarious. Drop it. It is also painful in EPUB+Web convergence.
  12. RNG schemas are not enough in the OCF spec. Section 3.5.2.1 about the container.xml file, for instance, should have models and attribute lists for each element, similarly to the Packages spec.
  13. not sure I have ever seen links and link elements in a container.xml file... (Cf. issue #374). The way these links are processed is unspecified anyway. Why are these elements normatively specified since extra elements are allowed - and explicitly ignored by spec - in the container?

Packages

  1. a Package consists of one rendition only.
  2. I have never understood the need for a manifest of resources inside the package, probably because my own publications don't use Media Overlays or media fallbacks.
  3. fallbacks are an inner mechanism also similar to multipart/alternative for renditions. I would drop it.
  4. I think the whole package should have properties identifying the required technologies for the rendition of the Package (e.g. script), and avoid these properties on a per-item basis that makes no real sense. The feature is either present in the UA for all content documents or for none.
  5. the spine element represent the default reading order of the package. Basically, it's a list. We have lists in html, don't we? Why do we need a painful and complex proprietary xml format here?
  6. the name of the linear attribute, which discriminates between primary and supplementary content, is extremely badly chosen. I always forget what linear really means because of that.
  7. Reading Systems are free, per spec, to completely ignore the linear attribute, making it pointless from an author's point of view.
  8. I have never seen the collection element used and I don't really understand why it contains link elements and not itemref elements
  9. metadata fun, as always with every EPUB spec. Implementing refines in 3.0 was a bit of a hell (despite warnings to the EPUB WG...), and it's gone from 3.1, replaced by new attributes. So no forwards compatibility, no backwards compatibility. Yet another parser and serializer for EPUB-compliant user agents.
  10. the old OPF guide element is now a html landmarks list, proving it's feasible to move OPF features to html
  11. the Navigation Document, an html document, is mandatory... So all the logics mentioned above could be there.
  12. Fixed Layout delenda est. Let's use CSS Fragmentation to make sure there's no orphaned content in the document post-pagination, and if CSS Fragmentation is not enough, make extension contributions to the CSS WG.
  13. without 3.0 refines, there is absolutely nothing any more in 3.1 preventing Package's metadata from being expressed in html; in 3.0, the refines attribute was a blocker, implying an extension of the model of the meta html element or another ugly IDREF mechanism in html.
  14. the prefix attribute on the package element is a good thing and should be preserved
  15. the rendition-flow property is weird, its values being paginated, scrolled-continuous, scrolled-doc and auto. Where is paginated-doc, the simplest paginated mode to implement?
  16. no more NCX, finally...
  17. the Navigation Document is already a html document with nav elements having a special epub:type/role (see issue #941), that's easy to make it contain an equivalent to the spine or more.

Content Documents

  1. let's get rid of the epub namespace, please...

Media Overlays

  1. SMIL support across rendering engines is in very bad shape and the SMIL polyfill does not totally help. Drop Media Overlays for the time being and let's focus on visual content.

Alternate Style Tags

  1. terrible spec... Needed because of Reading Systems' limitations but still absolutely terrible spec...
  2. If we get rid of backwards compatibility, we can drop it. Submit extensions to Media Queries if needed.

On the fly conclusions

  1. backwards compatibility is an enormous burden on the EPUB ecosystem
  2. build a new generation of EPUB that is not backwards-compatible
  3. the mimetype file is useless
  4. file extension of the publication MUST be well-defined
  5. one rendition only per publication and no more links/link elements
  6. in that case, we don't need the container.xml file any more
  7. metadata.xml and manifest.xml files removed
  8. we may still need encryption.xml, signatures.xml and rights.xml inside a META-INF directory (or directly in the package's root after all) to please the industry.
  9. application/oebps-package+xml mimetype is not necessary
  10. the EPUB spec evolution model is/was "we must deal with all cases in the world, we move very fast and we fix the mistakes afterwards". I respectfully suggest a drastic change for a next generation: "let's start from a low-level common ground only and expand slowly but cleanly"
  11. the OPF file is not needed any more. The root of the unique rendition in a package should be the html Navigation Document.
  12. the metadata of the package should be inside the body element of the Navigation Document
  13. the spine of the package can be a new nav element inside the body of the Navigation Document
  14. intermediary summary: let's get rid of both META-INF/container.xml and the OPF file... Let's have the Navigation Document mandatorily named index.xhtml so a directory browsing of the uncompressed publication through http will render the Navigation Document.
  15. let's drop the Alternate Style Tags spec for now. Submission of new Media Queries to CSS WG if needed.
  16. let's drop Media Overlays spec for now.

CONCLUSION

EPUB is a monster, made to address very diverse markets and ecosystems, too many markets and ecosystems. It's weak, complex, a bit messy, disconnected from the reality of the Web it's supposed to be built upon and some claim (link in fr-FR) it's too close to real books to be disruptive and innovative.

I am then suggesting to sever backwards-compatibility ties and restart almost from scratch, entirely and purely from W3C Standards. Here's the proposed result:


1. E0 Publication

A E0 Publication is a ZIP container. Files in the ZIP must be stored as is (no compression) or use the Deflate algorithm. File and directory names must be encoded in UTF-8 and follow some restrictions (see EPUB 3.1 filename restrictions).

The file name of a E0 Publication MUST use the e0 file extension.

A E0 Publication MUST contain a Navigation Document. It MAY contain files encryption.xml, signatures.xml and rights.xml (see OCF 3.1 for more information). All these files must be placed directly inside the root of the E0 Publication.

A E0 Publication can also contain Content Documents and their resources.

Inside a E0 Publication, all internal references (links, anchors, references to replaced elements, etc) MUST be strictly relative. With respect to section 4.2 of RFC 3986, only path-noscheme and path-empty are allowed for IRIs' relative-part. External references are not restricted.
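
For illustration only, a E0 Publication could then be produced with any standard ZIP tool. A minimal sketch, assuming the Info-ZIP command line tools and purely illustrative directory and file names:

    # package the publication directory; index.xhtml is the Navigation Document
    cd my-publication/
    zip -X -r ../moby-dick.e0 index.xhtml chapters/ css/ images/

    # sanity check: list what ended up in the container
    unzip -l ../moby-dick.e0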

2. E0 Navigation Document

A E0 Navigation Document is a html document. Its file name MUST be index.xhtml if the document is an XML document and index.html if it is not an XML document. A E0 Publication cannot contain both index.html and index.xhtml files.

A E0 Navigation Document contains at least one header element (for metadata) and at least two nav html elements (for spine and table of contents) in its body element.

2.1. E0 Metadata in the Navigation Document

E0 metadata are designed to be editable in any Wysiwyg html editor, and potentially rendered as regular html content by any Web browser.

E0 metadata are expressed inside a mandatory header html element inside the Navigation Document. That element must carry the "metadata" ID and the vocab attribute with value "http://www.idpf.org/2007/opf". All metadata inside that header element are then expressed using html+RDFa Lite 1.1. E0 metadata reuse EPUB 3.1 metadata and corresponding unicity rules, expressed in a different way.

Refinements of metadata are expressed through nesting of elements.

Example:

<header id="metadata"
        vocab="http://www.idpf.org/2007/opf">
  <h1>Reading Order</h1>
  <ul>
    <li>Author: <span property="dc:creator">glazou
        (<span property="file-as">Glazman, Daniel</span>)</span></li>
    <li>Title: <span property="dc:title">E0 Publications</span></li>
  </ul>
</header>

The mandatory title element of the Navigation Document, contained in its head element, should have the same text content as the first "dc:title" metadata inside that header element.

2.2. E0 Spine

The spine of a E0 Publication is expressed in its Navigation Document as a new nav element holding the "spine" ID. The spine nav element is mandatory.

See EPUB 3.1 Navigation Document.

2.3. E0 Table of Contents

The Table of Contents of a E0 Publication is expressed in its Navigation Document as a nav element carrying the "toc" ID. The Table of Contents nav element is mandatory.

See EPUB 3.1 Navigation Document.

2.4. E0 Landmarks

The Landmarks of a E0 Publication is expressed in its Navigation Document as a nav element carrying the "landmarks" ID. The Landmarks nav element is optional.

See EPUB 3.1 Navigation Document.

2.5. Other nav elements

The Navigation Document may include one or more additional nav elements. These additional nav elements should have a role attribute to provide a machine-readable semantic, and must have a human-readable heading as their first child.

IDs "metadata", "spine", "landmarks" and "toc" are reserved in the Navigation Document and must not be used by these extra nav elements.

2.6. Example of a Navigation Document

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="content-type">
    <title>Moby-Dick</title>
  </head>
  <body>
    <header id="metadata"
            vocab="http://www.idpf.org/2007/opf">
      <ul>
        <li>Author:
          <span property="dc:creator">Herman Melville
            (<span property="file-as">Melville Herman</span>)</span></li>
        <li>Title: <span property="dc:title">Moby-Dick</span></li>
        <li>Identifier: <span property="dc:identifier">glazou.e0.samples.moby-dick</span></li>
        <li>Language: <span property="dc:language">en-US</span></li>
        <li>Last modification: <span property="dcterms:modified">2017-01-17T11:16:41Z</span></li>
        <li>Publisher: <span property="dc:publisher">Harper & Brothers, Publishers</span></li>
        <li>Contributor: <span property="dc:contributor">Daniel Glazman</span></li>
      </ul>
    </header>
    <nav id="spine">
      <h1>Default reading order</h1>
      <ul>
        <li><a href="cover.html">Cover</a></li>
        <li><a href="titlepage.html">Title</a></li>
        <li><a href="toc-short.html">Brief Table of Contents</a></li>
        ...
      </ul>
    </nav>
    <nav id="toc" role="doc-toc">
      <h1>Table of Contents</h1>
      <ol>
        <li><a href="titlepage.html">Moby-Dick</a></li>
        <li><a href="preface_001.html">Original Transcriber’s Notes:</a></li>
        <li><a href="introduction_001.html">ETYMOLOGY.</a></li>
        ...
      </ol>
    </nav>
  </body>
</html>

3. Directories

A E0 Publication may contain any number of directories and nested directories.

4. E0 Content Documents

E0 Content Documents are referenced from the Navigation Document. E0 Content Documents are html documents.

E0 Content Documents should contain <link rel="prev"...> and <link rel="next"...> elements in their head element conformant to the reading order of the spine present in the Navigation Document. Content Documents not present in that spine don't need such elements.

The epub:type attribute is superseded by the role attribute and must not be used.

5. E0 Resources

E0 Publications can contain any number of extra resources (CSS stylesheets, images, videos, etc.) referenced from either the Navigation Document or Content Documents.

Planet MozillaA-Blast: Save the World from the Cutest Creatures in the Universe!

Are you prepared? Get your VR controllers and jump to https://aframe.io/a-blast! Make sure you have a WebVR-enabled browser. Firefox is ready in the Nightly branch. In Chromium, enable the chrome://flags/#enable-webvr and chrome://flags/#enable-gamepad-extensions flags.

Wave shooters are probably the most popular genre in the first crop of VR games. From the point of view of the implementation, they are also the easiest to make: you don’t have to move the player around (so you don’t need to implement any locomotion feature), and this simplifies the stage since there is only one fixed point of view. Interaction with the enemies is easy too: just shoot and detect bullet collisions. As simple as wave shooters are, they are quite fun to play, and some of them really make you feel the anxiety of the player character (e.g., Raw Data).

With A-Blast we wanted to create a game focused on smooth playability and quality assets, and a real example of the capabilities of A-Frame and browser performance. We also wanted to dogfood our own tools (A-Frame and Firefox with WebVR).

During testing we found many performance problems, so we needed to optimize several parts of the initial implementation, and some features were added to A-Frame to help with this task (like the pool component). We’ll share these details in a future post.

The gameplay is straightforward: just grab your weapons, aim at the characters floating around, and pull the trigger. You can also dodge and shoot enemy bullets in order to keep your 5 lives intact. The more characters you blast, the more points you get, and the higher you climb in the (local) Hall of Fame.

We wanted to keep the gameplay time under 5 minutes to have a good turnover in demo stations at fairs and conventions. For a full game, we would have designed more elaborate levels. (Yep, we know it's too short, but please keep in mind that this is a technical demo, not a whole game with hours of content.)

The game is designed for the HTC Vive but it can also be played with mouse and keyboard or in your smartphone by tapping the screen to shoot.

A-Blast was created with our A-Frame JavaScript VR framework by two programmers and one artist in two months. It debuted to dozens of Mozillians in December 2016 at the Mozilla All-Hands event in Hawaii.

A-Blast also served as a tour de force of A-Frame, testing it with a relatively complex application (the source code is slightly bigger than A-Painter's), helping to stress test and then improve A-Frame.

If you have ideas for improvement, like adding a global leaderboard or support for other controllers and devices, head to the A-Blast GitHub repository and send a pull request.

Many thanks to José Manuel Pérez Paredes (JosSs) for providing the soundtrack, which really improves the experience!

Planet MozillaThis Week in Rust 165

Hello and welcome to another issue of This Week in Rust! Rust is a systems language pursuing the trifecta: safety, concurrency, and speed. This is a weekly summary of its progress and community. Want something mentioned? Tweet us at @ThisWeekInRust or send us a pull request. Want to get involved? We love contributions.

This Week in Rust is openly developed on GitHub. If you find any errors in this week's issue, please submit a PR.

Updates from Rust Community

News & Blog Posts

Other Weeklies from Rust Community

Crate of the Week

This week's Crate of the Week is alacritty, an OpenGL-propelled Terminal application. Really fast, nice looking. Missing scrollback. Thanks to Vikrant for the suggestion!

Submit your suggestions and votes for next week!

Call for Participation

Always wanted to contribute to open-source projects but didn't know where to start? Every week we highlight some tasks from the Rust community for you to pick and get started!

Some of these tasks may also have mentors available, visit the task page for more information.

If you are a Rust project owner and are looking for contributors, please submit tasks here.

Updates from Rust Core

119 pull requests were merged in the last week.

New Contributors

  • Behnam Esfahbod
  • Benjamin Saunders
  • Ben Wiederhake
  • Bjorn Tipling
  • Christopher Armstrong
  • Craig Macomber
  • Djzin
  • Jeff Waugh
  • Tyler Julian

Approved RFCs

Changes to Rust follow the Rust RFC (request for comments) process. These are the RFCs that were approved for implementation this week:

No RFCs were approved this week.

Final Comment Period

Every week the team announces the 'final comment period' for RFCs and key PRs which are reaching a decision. Express your opinions now. This week's FCPs are:

Closed RFCs

The following proposals were rejected by the team after their 'final comment period' elapsed.

  • Abort by default v2. Specify abort-by-default in Cargo.toml when the user does cargo new --bin, as well as various other refinements to the panic strategy system.

New RFCs

No new RFCs were proposed this week.

Style RFCs

Style RFCs are part of the process for deciding on style guidelines for the Rust community and defaults for Rustfmt. The process is similar to the RFC process, but we try to reach rough consensus on issues (including a final comment period) before progressing to PRs. Just like the RFC process, all users are welcome to comment and submit RFCs. If you want to help decide what Rust code should look like, come get involved!

Ready for PR:

There's a lot of them right now, contributions here would be very welcome. If you want advice or help getting started, please ping nrc, or any other member of the style team, in #rust-style.

Issues in final comment period:

Upcoming Events

If you are running a Rust event please add it to the calendar to get it mentioned here. Email the Rust Community Team for access.

Rust Jobs

Tweet us at @ThisWeekInRust to get your job offers listed here!

Quote of the Week

I really hate the phrase "fighting". Calling it a fight doesn't do justice to the conversations you have with the borrow checker when you use Rust every day. You don't fight with the borrow checker, because there isn't a fight to win. It's far more elegant, more precise. It's fencing; you fence with the borrow checker, with ripostes and parries and well-aimed thrusts. And sometimes, you get to the end and you realize you lose anyway because the thing you were trying to do was fundamentally wrong. And it's okay, because it's just fencing, and you're a little wiser, a little better-honed, a little more practiced for your next bout.

kaosjester on Hacker News.

Thanks to Manishearth for the suggestion.

Submit your quotes for next week!

This Week in Rust is edited by: nasa42, llogiq, and brson.

Planet MozillaGPU Process Experiment Results

Update: It was pointed out that it was hard to know what the charts measure specifically due to unlabeled axes. For all charts measuring crashes (ie. not the percentage charts) the Y-Axis represents crash rate, where crash rate for Telemetry data is defined as “crashes per 1000 hours of usage” and crash rate for Socorro data is defined as “crashes per number of unique installations”. The latter really only applies to the Breakdown by Vendor chart and the vendor bars in the Breakdown of GPU Process Crashes chart. The x-axis for all charts is the date.

GPU Process has landed with 4% fewer crash reports overall!

  • 1.2% fewer Browser crashes
  • 5.6% fewer Content crashes
  • 5.1% fewer Shutdown crashes
  • 5.5% more Plugin crashes
  • 45% fewer GPU driver crash reports!

Thanks to David Anderson, Ryan Hunt, George Wright, Felipe Gomes, Jukka Jylänki
Data sources available below

Several months ago Mozilla’s head of Platform Engineering, David Bryant, wrote a post on Medium detailing a project named Quantum. Built on the foundation of quality and stability we’d built up over the previous year, and using some components of what we learned through Servo, this project seeks to enable developers to “tap into the full power of the underlying device”. As David points out, “it’s now commonplace for devices to incorporate one or more high-performance GPUs”.

It may surprise you to learn that one of these components has already landed in Nightly and has been there for several weeks: GPU Process. Without going into too much technical detail, this basically adds a separate Firefox process set aside exclusively for things we want the GPU (graphics processing unit) to handle.

I started doing quality assurance with Mozilla in 2007 and have seen a lot of pretty bad bugs over the years. From subtle rendering issues to more aggressive issues such as the screen going completely black, forcing the user to restart their computer. Even something as innocuous as playing a game with Firefox running in the background was enough to create a less than desirable situation.

Unfortunately many of these issues stem from interactions with drivers which are out of our control. Especially in cases where users are stuck with older, unmaintained drivers. A lot of the time our only option is to blacklist the device and/or driver. This forces users down the software rendering path which often results in a sluggish experience for some content, or missing out altogether on higher-end user experiences, all in an effort to at least stop them from crashing.

While the GPU Process won’t in and of itself prevent these types of bugs, it should enable Firefox to handle these situations much more gracefully. If you’re on Nightly today and you’re using a system that qualifies (currently Windows 7 SP1 or later, a D3D9 capable graphics card, and whitelisted for using multi-process Firefox aka e10s), you’ve probably had GPU Process running for several weeks and didn’t even notice. If you want to check for yourself it can be found in the Graphics section of the about:support page. To try it out do something that normally requires the video card (high quality video, WebGL game, etc) and click the Terminate GPU Process button — you may experience a slight delay but Firefox should recover and continue without crashing.

Before I go any further I would like to thank David Anderson, Ryan Hunt, and George Wright for doing the development work to get GPU Process working. I also want to thank Felipe Gomes and Jukka Jylänki for helping me work through some bugs in the experimentation mechanism so that I could get this experiment up and running in time.

The Experiment

As a first milestone for GPU Process we wanted to make sure it did not introduce a serious regression in stability and so I unleashed an experiment on the Nightly channel. For two weeks following Christmas, half of the users who had GPU Process enabled on Nightly were reverted to the old, one browser + one content process model. The purpose of this experiment was to measure the anonymous stability data we receive through Telemetry and Socorro, comparing this data between the two user groups. The expectation was that the stability numbers would be similar between the two groups and the hope was that GPU Process actually netted some stability improvements.

Now that the experiment has wrapped up I’d like to share the findings. Before we dig in I would like to explain a key distinction between Telemetry and Socorro data. While we get more detailed data through Socorro (crash signatures, graphics card information, etc), the data relies heavily on users clicking the Report button when a crash occurs; no reports = no data. As a result Socorro is not always a true representation of the entire user population. On the other hand, Telemetry gives us a much more accurate representation of the user population since data is submitted automatically (Nightly uses an opt-out model for Telemetry).  However we don’t get as much detail, for example we know how many crashes users are experiencing but not necessarily which crashes they happen to be hitting.

I refer to both data sets in this report as they are each valuable on their own but also for checking assumptions based on a single source of data. I’ve included links to the data I used at the end of this post.

As a note, I will use the terminology “control” to refer to those in the experiment who were part of the control group (ie. users with GPU Process enabled) and “disabled” to refer to those in the experiment who were part of the test group (ie. users with GPU Process disabled). Each group represents a few thousand Nightly users.

Daily Trend

To start with I’d like to present the daily trend data. This data comes from Socorro and is graphed on my own server using the MetricsGraphics.js framework. As you can see, day to day data from Socorro can be quite noisy. However when we look at the trend over time we can see that overall the control group reported roughly 4% fewer crashes than those with GPU Process disabled.

<figure class="wp-caption alignnone" id="attachment_415" style="width: 803px;">Trend in daily data coming from Socorro. [source]</figure>

Breakdown of GPU Process Crashes

Crashes in the GPU Process itself compare favourably across the board, well below 1.0 crashes per 1,000 hours of usage, and much less than the crash rates we see from other Firefox processes (I’ll get into this more below). The following chart is very helpful in illustrating where our challenges might lie and may well inform roll-out plans in the future. It’s clear that Windows 8.1 and AMD hardware are the worst of the bunch, while Windows 7 and Intel are the best.

<figure class="wp-caption alignnone" id="attachment_389" style="width: 573px;">Vendor data comes from Socorro crash reports, all other data comes from Telemetry [source]</figure>

Breakdown by Process Type

Of course, the point of GPU Process is not just to see how stable the process is itself but also to see what impact it has on crashes in other processes. Here we can see that stability in other processes is improved almost universally by 5%, except for plugin crashes which are up by 5%.

<figure class="wp-caption alignnone" id="attachment_384" style="width: 569px;">A negative percentage represents an improvement in the Control group as it compares to the Disabled group while a positive percentage represents a regression. All data comes from Telemetry [source]</figure>

GPU Driver Crashes

One of the areas we expected to see the biggest wins was in GPU driver crashes. The theory is that driver crashes would move to the GPU Process and no longer take down the entire browser. The user experience of drivers crashing in the GPU Process still needs to be vetted but there does appear to be a noticeable impact with driver crash reports being reduced overall by 45%.

<figure class="wp-caption alignnone" id="attachment_400" style="width: 553px;">All data comes from Socorro crash reports [source]</figure>

Breakdown by Platform

Now we dig in deeper to see how GPU Process impacts stability on a platform level. Windows 8.0 with GPU Process disabled is the worst, especially when Plugin crashes are factored in, while Windows 10 with GPU Process enabled seems to be quite favourable overall.

<figure class="wp-caption alignnone" id="attachment_385" style="width: 617px;">A negative percentage represents an improvement in the Control group as it compares to the Disabled group while a positive percentage represents a regression. All data comes from Telemetry [source]</figure>

Breakdown by Architecture

Breaking the data down by architecture, we can see that 64-bit seems to be much more stable overall than 32-bit. 64-bit sees improvement across the board except for plugin process crashes, which regress significantly. 32-bit sees an inverse effect, albeit at a smaller scale.

<figure class="wp-caption alignnone" id="attachment_386" style="width: 579px;">A negative percentage represents an improvement in the Control group as it compares to the Disabled group while a positive percentage represents a regression. Data comes from Telemetry [source]</figure>

Breakdown by Compositor

Taking things down to the compositor level, we can see that D3D11 performs best overall, with or without the GPU Process, but does seem to benefit from having it enabled. There is a significant 36% regression in Plugin process crashes though which needs to be looked at — we don’t see this with either D3D9 or Basic compositing. D3D9 itself seems to carry a regression as well, in Content and Shutdown crashes. These are challenges we need to address and keep in mind as we determine what to support as we get closer to release.

<figure class="wp-caption alignnone" id="attachment_387" style="width: 629px;">A negative percentage represents an improvement in the Control group as it compares to the Disabled group while a positive percentage represents a regression. Data comes from Telemetry [source]</figure>

Breakdown by Graphics Card Vendor

Looking at the data broken down by graphics card vendors there is significant improvement across the board from GPU Process, with the only exceptions being a 6% regression in Browser crashes on AMD hardware and a 12% regression in Shutdown crashes on Intel hardware. However, considering this data comes from Socorro we cannot say that these results are universal. In other words, these are regressions in the number of crashes reported, which do not necessarily map one-to-one to the number of crashes that occurred.

<figure class="wp-caption alignnone" id="attachment_388" style="width: 627px;">A negative percentage represents an improvement in the Control group as it compares to the Disabled group while a positive percentage represents a regression. Data comes from Telemetry [source]</figure>

Signature Comparison

As a final comparison, since some of the numbers above varied quite a lot between the Control and Test groups, I wanted to look at the top signatures reported by these separate groups of users to see where they did and did not overlap.

This first table shows the signatures that saw the greatest improvement. In other words these crashes were much less likely to be reported in the Control group. Of note there are multiple signatures related to Adobe Flash and some related to graphics drivers in this list.

<figure class="wp-caption alignnone" id="attachment_380" style="width: 617px;">Crashes reported less frequently with GPU Process. Signatures highlighted green are improved by more than 50%, blue is 10-50% improvement, and yellow is 0-10% improvement. Data comes from Socorro crash reports [source].</figure>This next table shows the inverse, the crashes which were more likely to be reported with GPU Process enabled. Here we see a bunch of JS and DOM related signatures appearing more frequently.

<figure class="wp-caption alignnone" id="attachment_381" style="width: 619px;">Crashes reported more frequently with GPU Process. Signatures in red are more than 50% worse, orange are 10-50% worse, and yellow are 0-10% worse. Data comes from Socorro crash reports. [source]</figure>These final tables break down the signatures that didn’t show up at all in either of the cohorts. The top table represents crashes which were only reported when GPU Process was enabled, while the second table are those which were only reported when GPU Process was disabled. Of note there are more signatures related to driver DLLs in the Disabled group and more Flash related signatures in the Enabled group.

<figure class="wp-caption alignnone" id="attachment_382" style="width: 541px;">Crashes which show up only when GPU Process is enabled, or only when disabled. Data comes from Socorro crash reports. [source]</figure>

Conclusions

In this first attempt at a GPU process things are looking good — I wouldn’t say we’re release ready but it’s probably good enough to ship to a wider test audience. We were hoping that stability would be on-par overall with most GPU related crashes moving to the GPU Process and hopefully being much more recoverable (ie. not taking down the browser). The data seems to indicate this has happened with a 5% win overall. However this has come at a cost of a 5% regression in plugin stability and seems to perform worse under certain system configurations once you dig deeper into the data. These are concerns that will need to be evaluated.

It’s worth pointing out that this data comes from Nightly users, users who trend towards more modern hardware and more up to date software. We might see swings in either direction once this feature reaches a wider, more diverse population. In addition this experiment only hit half of those users who qualified to use GPU Process which, as it turns out, is only a few thousand users. Finally, this experiment only measured crash occurrences and not how gracefully GPU Process crashes — a critical factor that will need to be vetted before we release this feature to users who are less regression tolerant.

As I close out this post I want to take another moment to thank David Anderson, Ryan Hunt, and George Wright for all the work they’ve put into making this first attempt at GPU Process. In the long run I think this has the potential to make Firefox a lot more stable and faster, not only than previous Firefox versions but potentially than the competition as well. It is a stepping stone to what David Bryant calls the “next-generation web platform”. I also want to thank Felipe Gomes and Jukka Jylänki for their help getting this experiment live. Both of them helped me work through some bugs, despite the All-hands in Hawaii and the Christmas holidays that followed — without them this experiment might not have happened in time for the Firefox 53 merge to Aurora.


If you made it this far, thank you for reading. Feel free to leave me feedback or questions in the comments. If you think there’s something I’ve reported here that needs to be investigated further please let me know and I’ll file a bug report.

Crash rate data by architecture, compositor, platform, process c/o Mozilla Telemetry
Vendor and Topcrash data c/o Mozilla Socorro

 

Planet MozillaWorking for Postbox

I’m happy to announce that I’ve started working for Postbox, doing user content and support.
This means that I won’t have time for some of my commitments within Mozilla. Over the next while, I may be cancelling or transferring some of my projects and responsibilities.

Planet MozillaThis Week In Servo 88

In the last week, we landed 109 PRs in the Servo organization’s repositories.

Planning and Status

Our overall roadmap is available online. Plans for 2017 (including Q1) will be solidified soon. Please check it out and provide feedback!

This week’s status updates are here.

Notable Additions

  • emilio updated the Freetype FFI bindings.
  • nox fixed some incorrect scheduling of async and deferred JS.
  • Ms2ger corrected the integration with SpiderMonkey’s GC to avoid hazards.
  • Manishearth integrated Stylo’s CSS parsing with Gecko’s runtime preferences.
  • notriddle fixed the behaviour of text-overflow: ellipsis when the overflow state changes.
  • karenher made inline scripts report meaningful line numbers when throwing exceptions.
  • emilio added support for inline namespaces to rust-bindgen.
  • mrnayak corrected the implementation of some crossOrigin attributes.
  • gw optimized the rendering of clip masks.
  • jrmuizel implemented an automated test harness for WebRender.
  • nox unified the implementation of text insertion for HTML and XML.
  • ioctaptceb added argument validation to some WebGL APIs.
  • hiikezoe integrated Stylo’s CSS values into Gecko’s animation storage.
  • bd339 improved the DOM integration with the Fetch network stack.
  • fiji-flo made clicking inside of a text input position the cursor appropriately.
  • shravan-achar (and other NCSU students) implemented support for non-polygonal image maps.

New Contributors

Interested in helping build a web browser? Take a look at our curated list of issues that are good for new contributors!

Screenshot

No screenshots.

Planet Mozilla[worklog] Edition 050 - Intern hell and testing

Monday was a day off in Japan.

webcompat issues

  • Do you remember the issues from last week related to the implementation of global? Well, it seems we are not the only ones. Apple announced that they released the same feature in Safari Tech Preview 21, but they had to revert it immediately because it broke their Polymer tests.

webcompat.com dev

  • After rehacking the two image sizes following a mistake I had made in one JavaScript file, my manual testing is all right, but the Intern tests using Selenium are not even starting. I tried for a couple of hours to solve it but ran into a dead end, and I have already spent too much time on this. At least I'm not the only one. It led to the need to update Intern, the functional testing tool we use.
  • Discussions around labels is going on with interesting ideas.
  • We also restarted the discussion about closing non strictly webcompat issues. Your opinion is welcome.
  • How to deal with a flash message on a URI just at the creation of a new issue.
  • Spent a bit of time reading and hacking on mock testing. I tried to rewrite parts of the image upload tests using mocking, but have not yet reached a working state. The exercise was still beneficial in some ways, because doing it forces us to improve the way we coded the uploads module.

Miscellaneous

  • Medium is in trouble.

    I’ll start with the hard part: As of today, we are reducing our team by about one third — eliminating 50 jobs, mostly in sales, support, and other business functions. We are also changing our business model to more directly drive the mission we set out on originally.

    Each time I have read an article published on the—yes, slick but not original—Medium platform, a little voice inside me has told me: "If you like this article, you should save it locally. This might disappear one day." If you publish content, you need to own and liberate your content. That seems contradictory. Ownership: you need to publish the content on your blog so you are responsible for its future. Liberation: you need to use a permissive licence (Creative Commons CC0, Public Domain, etc.), so people can disseminate it and make it more resilient this way. Culture propagates because we share.

  • Dietrich on Elephant Trail This reminds me of our walk on the same trail with webcompat team in Taipei last October.

Otsukare!

Planet Mozilla45.7.0 available (also: Talos fails)

TenFourFox 45.7.0 is now available for testing. In addition to reducing the layout paint delay I also did some tweaks to garbage collection by removing some code that isn't relevant to us, including some profile accounting work we don't need to bother computing. If there is a request to reinstate this code in a non-debug build we can talk about a specific profiling build down the road, probably after exiting source parity. As usual the build finalizes Monday evening Pacific time. I didn't notice that the release had been pushed forward another week, to January 24. If additional security patches land, there will be a respin. There will be a respin this weekend. The download links have been invalidated and cancelled.

For 45.8 I plan to start work on the built-in user-agent switcher, and I'm also looking into a new initiative I'm calling "Operation Short Change" to wring even more performance out of IonPower. Currently, the JavaScript JIT's platform-agnostic section generates simplistic unoptimized generic branches. Since these generic branches could call any code at any displacement and PowerPC conditional branch instructions have only a limited number of displacement bits, we pad the branches with nops (i.e., nop/nop/nop/bc) so they can be patched up later if necessary to a full-displacement branch (lis/ori/mtctr/bcctr) if the branch turns out to be far away. This technique of "branch stanzas" dates back all the way to the original nanojit we had in TenFourFox 4 and Ben Stuhl did a lot of optimization work on it for our JaegerMonkey implementation that survived nearly unchanged in PPCBC and in a somewhat modified form today in IonPower-NVLE.

However, in the case of many generic branches the Ion code generator creates, they jump to code that is always just a few instruction words away and the distance between them never moves. These locations are predictable and having a full branch stanza in those cases wastes memory and instruction cache space; fortunately we already have machinery to create these fixed "short branches" in our PPC-specific code generator and now it's time to further modify Ion to generate these branches in the platform-agnostic segment as well. At the same time, since we don't generally use LR actually as a link register due to a side effect of how we branch, I'm going to investigate whether using LR is faster for long branches than CTR (i.e., lis/ori/mtlr/b(c)lr instead of mtctr/b(c)ctr). Certainly on G5 I expect it probably will be because having mtlr and blr/bclr in the same dispatch group doesn't seem to incur the same penalty that mtctr and bctr/bcctr in the same dispatch group do. (Our bailouts do use LR, but in an indirect form that intentionally clobbers the register anyway, so saving it is unimportant.)

On top of all that there is also the remaining work on AltiVec VP9 and some other stuff, so it's not like I won't have anything to do for the next few weeks.

On a more disappointing note, the Talos crowdfunding campaign for the most truly open, truly kick-*ss POWER8 workstation you can put on your desk has run aground, "only" raising $516,290 of the $3.7m goal. I guess it was just too expensive for enough people to take a chance on, and in fairness I really can't fault folks for having a bad case of sticker shock with a funding requirement as high as they were asking. But you get the computer you're willing to pay for. If you want a system made cheaper by economies of scale, then you're going to get a machine that doesn't really meet your specific needs because it's too busy not meeting everybody else's. Ultimately it's sad that no one's money was where their mouths were because for maybe double-ish the cost of the mythical updated Mac Pro Tim Cook doesn't see fit to make, you could have had a truly unencumbered machine that really could compete on performance with x86. But now we won't. And worst of all, I think this will scare off other companies from even trying.

Planet MozillaMe: 2016 retrospective

My 2016 retrospective in which I talk about projects I worked on, pushed off and other things.

Read more… (6 mins to read)

Planet MozillaHappy Early BMO Push Day!

the following changes have been pushed to bugzilla.mozilla.org:

  • [1280406] [a11y] Make each start of a comment a heading 3 for easier navigation
  • [1280395] [a11y] Make the Add Comment and Preview tabs real accessible tabs
  • [1329659] Copy A11y tweaks from BugzillaA11yFixes.user.js
  • [1330449] Clarify “Edit” button
  • [1329511] Any link to user-entered URL with target=”_blank” should have rel=”noopener” or rel=”noreferrer”

discuss these changes on mozilla.tools.bmo.


Planet MozillaProject Conduit

In 2017, Engineering Productivity is starting on a new project that we’re calling “Conduit”, which will improve the automation around submitting, testing, reviewing, and landing commits. In many ways, Conduit is an evolution and course correction of the work on MozReview we’ve done in the last couple years.

Before I get into what Conduit is exactly, I want to first clarify that the MozReview team has not really been working on a review tool per se, aside from some customizations requested by users (like support for inline diff comments). Rather, most of our work was building a whole pipeline of automation related to getting code landed. This is where we’ve had the most success: allowing developers to push commits up to a review tool and to easily land them on try or mozilla-central. Unfortunately, by naming the project “MozReview” we put the emphasis on the review tool (Review Board) instead of the other pieces of automation, which are the parts unique to Firefox’s engineering processes. In fact, the review tool should be a replaceable part of our whole system, which I’ll get to shortly.

We originally selected Review Board as our new review tool for a few reasons:

  • The back end is Python/Django, and our team has a lot of experience working with both.

  • The diff viewer has a number of fancy features, like clearly indicating moved code blocks and indentation changes.

  • A few people at Mozilla had previously contributed to the Review Board project and thus knew its internals fairly well.

However, we’ve since realized that Review Board has some big downsides, at least with regards to Mozilla’s engineering needs:

  • The UI can be confusing, particularly in how it separates the Diff and the Reviews views. The Reviews view in particular has some usability issues.

  • Loading large diffs is slow; conversely, it’s unable to depaginate, so long diffs are always split across pages. This restricts the ability to search within diffs. Also, it’s impossible to view diffs file by file.

  • Bugs in interdiffs and even occasionally in the diffs themselves.

  • No offline support.

In addition, the direction that the upstream project is taking is not really in line with what our users are looking for in a review tool.

So, we’re taking a step back and evaluating our review-tool requirements, and whether they would be best met with another tool or by a small set of focussed improvements to Review Board. Meanwhile, we need to decouple some pieces of MozReview so that we can accelerate improvements to our productivity services, like Autoland, and ensure that they will be useful no matter what review tool we go with. Project Conduit is all about building a flexible set of services that will let us focus on improving the overall experience of submitting code to Firefox (and some other projects) and unburden us from the restrictions of working within Review Board’s extension system.

In order to prove that our system can be independent of review tool, and to give developers who aren’t happy with Review Board access to Autoland, our first milestone will be hooking the commit repo (the push-to-review feature) and Autoland up to BMO. Developers will be able to push a series of one or more commits to the review repo, and reviewers will be able to choose to review them either in BMO or Review Board. The Autoland UI will be split off into its own service and linked to from both BMO and Review Board.

(There’s one caveat: if there are multiple reviewers, the first one gets to choose, in order to limit complexity. Not ideal, but the problem quickly gets much more difficult if we fork the reviews out to several tools.)

As with MozReview, the push-to-BMO feature won’t support confidential bugs right away, but we have been working on a design to support them. Implementing that will be a priority right after we finish BMO integration.

We have an aggressive plan for Q1, so stay tuned for updates.

Planet MozillaHarry Potter and The Jabber Spam

After many many years of happy using XMPP we were finally awarded with the respect of spammers and suddenly some of us (especially those who have their JID in their email signature) are getting a lot of spim.

Fortunately, the world of Jabber is not so defenceless, thanks to XEP-0016 (Privacy Lists). Not only is it possible to set up a list of known spammers (not only by their complete JIDs, but also by whole domains), it is also possible to build more complicated constructs.

Usually these constructs are not very well supported by GUIs, so most of the work must be done by sending plain XML stanzas to the XMPP stream. For example, with pidgin one can open the XMPP Console by going to Tools/XMPP Console and selecting the appropriate account whose privacy lists are to be edited.

The whole system of ACLs consists of multiple lists. To get a list of all privacy lists on a particular server, we need to send this XMPP stanza:

<iq type='get' id='getlist1'>
        <query xmlns='jabber:iq:privacy'/>
</iq>

If the stanza is sent correctly and your server supports XEP-0016, the server replies with the list of all privacy lists:

<iq id='getlist1' type='result'>
        <query xmlns='jabber:iq:privacy'>
                <default name='urn:xmpp:blocking'/>
                <list name='invisible'/>
                <list name='urn:xmpp:blocking'/>
        </query>
</iq>

To get the content of one particular list, we send this stanza:

<iq type='get' id='getlist2'>
    <query xmlns='jabber:iq:privacy'>
        <list name='urn:xmpp:blocking'/>
    </query>
</iq>

And again the server replies with this list:

<iq id='getlist2' type='result'>
    <query xmlns='jabber:iq:privacy'>
        <list name='urn:xmpp:blocking'>
            <item order='0' action='deny'
                value='talk.mipt.ru' type='jid'/>
            <item order='0' action='deny'
                value='im.flosoft.biz' type='jid'/>
            <item order='0' action='deny'
                value='nius.net' type='jid'/>
            <item order='0' action='deny'
                value='jabber.me' type='jid'/>
            <item order='0' action='deny'
                value='tigase.im' type='jid'/>
            <item order='0' action='deny'
                value='pisem.net' type='jid'/>
            <item order='0' action='deny'
                value='qip.ru' type='jid'/>
            <item order='0' action='deny'
                value='crypt.mn' type='jid'/>
            <item order='0' action='deny'
                value='atteq.com' type='jid'/>
            <item order='0' action='deny'
                value='j3ws.biz' type='jid'/>
            <item order='0' action='deny'
                value='jabber.dol.ru' type='jid'/>
            <item order='0' action='deny'
                value='vpsfree.cz' type='jid'/>
            <item order='0' action='deny'
                value='buckthorn.ws' type='jid'/>
            <item order='0' action='deny'
                value='pandion.im' type='jid'/>
        </list>
    </query>
</iq>

The server goes through every item in the list and decides based on the value of the action attribute. If the stanza under consideration does not match any item in the list, the system defaults to allow.

I was building a blocking list like this for some time (I have even authored a simple Python script for adding a new JID to the list), but it seems to be a road to nowhere: spammers just keep generating new domains. The only workable solution seems to me to be a white-list: some domains are allowed, but everything else is blocked.
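
A minimal sketch of such a helper might look like this (my own illustration, not the script mentioned above); it only builds the privacy-list stanza to paste into the XMPP console, for either a deny list or a white-list with a trailing catch-all item:

# Sketch only: build a XEP-0016 privacy-list stanza from (value, action) pairs.
import xml.etree.ElementTree as ET

def privacy_list_stanza(name, items, final_action=None):
    iq = ET.Element('iq', {'type': 'set', 'id': 'setlist1'})
    query = ET.SubElement(iq, 'query', {'xmlns': 'jabber:iq:privacy'})
    lst = ET.SubElement(query, 'list', {'name': name})
    order = 0
    for value, action in items:
        order += 1
        # A 'jid' item carrying a bare domain matches every JID in that domain.
        ET.SubElement(lst, 'item', {'type': 'jid', 'value': value,
                                    'action': action, 'order': str(order)})
    if final_action:
        # Catch-all item without a value, e.g. 'deny' for a white-list.
        ET.SubElement(lst, 'item', {'action': final_action,
                                    'order': str(order + 1)})
    return ET.tostring(iq, encoding='unicode')

# Deny list as above:
print(privacy_list_stanza('urn:xmpp:blocking',
                          [('talk.mipt.ru', 'deny'), ('qip.ru', 'deny')]))

# White-list that denies everything not explicitly allowed:
print(privacy_list_stanza('urn:xmpp:whitelist',
                          [('jabber.org', 'allow'), ('ucw.cz', 'allow')],
                          final_action='deny'))

Keep in mind that sending a list with type='set' replaces the list on the server wholesale, so "adding" a JID really means re-sending the complete list.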

See this list stanza sent to the server (the answer should be a simple one-line empty XML element):

<iq type='set' id='setwl1'>
    <query xmlns='jabber:iq:privacy'>
        <list name='urn:xmpp:whitelist'>
            <item type='jid' value='amessage.de'
                  action='allow' order='1'/>
            <item type='jid' value='ceplovi.cz'
                  action='allow' order='2'/>
            <item type='jid' value='cepl.eu'
                  action='allow' order='3'/>
            <item type='jid' value='dukgo.com'
                  action='allow' order='4'/>
            <item type='jid' value='eischmann.cz'
                  action='allow' order='5'/>
            <item type='jid' value='gmail.com'
                  action='allow' order='7'/>
            <item type='jid' value='gtalk2voip.com'
                  action='allow' order='8'/>
            <item type='jid' value='jabber.at'
                  action='allow' order='9'/>
            <item type='jid' value='jabber.cz'
                  action='allow' order='10'/>
            <item type='jid' value='jabber.fr'
                  action='allow' order='11'/>
            <item type='jid' value='jabber.org'
                  action='allow' order='12'/>
            <item type='jid' value='jabber.ru'
                  action='allow' order='13'/>
            <item type='jid' value='jabbim.cz'
                  action='allow' order='14'/>
            <item type='jid' value='jankratochvil.net'
                  action='allow' order='15'/>
            <item type='jid' value='kde.org'
                  action='allow' order='16'/>
            <item type='jid' value='loqui.im'
                  action='allow' order='17'/>
            <item type='jid' value='mac.com'
                  action='allow' order='18'/>
            <item type='jid' value='metajack.im'
                  action='allow' order='19'/>
            <item type='jid' value='njs.netlab.cz'
                  action='allow' order='20'/>
            <item type='jid' value='stpeter.im'
                  action='allow' order='21'/>
            <item type='jid' value='ucw.cz'
                  action='allow' order='22'/>
            <item action='deny' order='23'/>
        </list>
    </query>
</iq>

The server goes through all items on the list in order, and if the stanza doesn’t match any item, it hits the last item in the list, which denies access.
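
To make that evaluation order concrete, here is a small Python sketch of how a server conceptually walks such a list (my own illustration, not jabberd2 code): items are checked in ascending order, the first matching item wins, and the trailing catch-all item denies everything else.

# Illustrative sketch of XEP-0016 list evaluation: first matching item wins;
# a trailing item without a value acts as the catch-all (here: deny).
def evaluate(privacy_list, sender_jid):
    bare_jid = sender_jid.split('/')[0]
    domain = bare_jid.split('@')[-1]
    for item in sorted(privacy_list, key=lambda i: int(i['order'])):
        value = item.get('value')
        # A 'jid' item carrying a bare domain matches every JID in that domain.
        if value is None or value in (bare_jid, domain):
            return item['action']
    return 'allow'  # XEP-0016 default when nothing matches

whitelist = [
    {'type': 'jid', 'value': 'jabber.org', 'action': 'allow', 'order': '12'},
    {'type': 'jid', 'value': 'ucw.cz', 'action': 'allow', 'order': '22'},
    {'action': 'deny', 'order': '23'},  # catch-all: block everyone else
]

print(evaluate(whitelist, 'friend@jabber.org/home'))   # allow
print(evaluate(whitelist, 'spammer@random-spam.biz'))  # deny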

It is also useful to make the list we have just created the default:

<iq type='set' id='default1'>
    <query xmlns='jabber:iq:privacy'>
        <default name='urn:xmpp:whitelist'/>
    </query>
</iq>

So now I am testing how it works (using jabberd2 version 2.4.0 from the RHEL-6/EPEL package as the server).

Planet MozillaFeeling safer online with Firefox

The latest privacy and security improvements in Firefox

 

[This post originally appeared on Medium]

Firefox is the only browser that answers only to you, our users; so all of us who work on Firefox spend a lot of effort making your browsing experience more private and secure. We update Firefox every 6 weeks, and every new change ships to you as fast as we can make and verify it. For a few releases now, we have been landing bits and pieces of a broader set of privacy and security changes. This post will outline the big picture of all these changes.

Site Identity and Permissions Panel

The main project involved improvements to the way Firefox handles permission requests from sites that want to do things that the web platform doesn't allow by default - like accessing your laptop's camera or GPS sensor. To find out how our existing model fares, we ran it through a number of user studies and gathered feedback from users and web developers alike.

old-prompt.png

What we found was clear: users were having trouble making full use of web permissions. Here are some of the observations:
  • It’s easy by design to dismiss a permission prompt, to prevent websites from nagging you. But it’s not so obvious how to get back to an inadvertently dismissed prompt, which users found confusing.
  • Managing the permissions of an individual site was hard, due to the multitude of presented options.
  • It was cumbersome to grant access to screen sharing. This was because it was difficult to select which area of the screen would be shared and because screen sharing was only permitted on websites included in a manually curated list.

In order for the open web platform to be on par with the capabilities of native, closed platforms, we needed to fix these issues. So we first focused on putting all privacy and security related information in one place. We call it the Site Identity and Permissions Panel, or more affectionately, the Control Center™.

control-center.png

The Site Identity panel appears when you click on the circled “i” icon – “i” for “information” – on the left side of the Awesome Bar. The panel is designed to be the one-stop shop for all security and privacy information specific to the site you’re on. This includes the encrypted connection's certificate, mixed content warning messages, tracking protection status, as well as non-default permissions. We were happy to see Chrome adopt a similar UI, too.

Elevated Privileges for Non-Default Permissions

By default, web sites need an elevated level of privilege to access your computer hardware like camera, microphone, GPS or other sensors. When a site requests such a permission and the user grants it, the Site Identity panel will display the allowed item along with an "x" button to revert it. In the case of particularly privacy-sensitive permissions, like microphone or camera access, the icon will have a red hue and a gentle animation to draw attention.


When a site has been granted elevated privileges, the "i" icon in the URL bar is badged with a dot that indicates the additional information present in the Site Identity panel. This lets you assess the security properties of your current session with a quick glance at the Awesome Bar, where the "i" and lock icons are displayed together.



Users who want even more fine-grained control over all available permissions can go to the Permissions tab in the Page Info dialog (right arrow in the Identity panel -> More Information).

Permission Prompt and Dialog

Permission dialogs are now more consistent than before, both in terms of available actions and messaging.

When a site asks for a permission, a permission prompt appears with a message and iconography specific to the type of permission being requested and the available actions. Most of the time, there will only be two: allow or don’t allow access. The default action will stand out in a blue highlight, making the common action easier to perform.


In the few cases of permission prompts with more than two actions, a drop-down menu will appear next to the secondary action button.


Permanently allowing or rejecting a permission for a site is done by checking the always present "Remember this decision" option.



We have received a lot of feedback about how these prompts are easy to dismiss and how users often couldn't figure out how to bring them back. In the new design, permission prompts stay up even when you interact with the page. You can of course ignore it and continue to use the page normally. But thanks to the persistence of the prompt, it’s now easier to associate site misbehavior – webcams that don’t work, locations that won’t display – with an allow/don’t allow button that needs your response.

Furthermore, disallowed permission requests are now displayed as strikethrough icons in the Awesome Bar to hint at the potential cause of site breakage. For example a video conferencing site will probably not be functioning very well if you reject its camera permission request. So the crossed-out camera icon will remain afterwards, next to the "i" icon, to remind you of that fact.


Going to a different tab will hide the prompt (because it’s specific to the site you have open on each tab), but when the prior tab is selected again, the prompt will reappear.


Audio, Video and Screen Sharing Permissions

WebRTC-related permissions have even more new changes.

For starters, screen sharing now doesn't require sites to be added to a separate whitelist. This means that all sites can now use WebRTC screen sharing in Firefox.

Also, screen sharing now includes a preview of the content that will be shared to make it easier to identify the right screen, application or window to share.


In the riskiest of cases, such as sharing the entire screen or sharing the Firefox application, a scary warning message is displayed to ensure you know what you are about to do.

screen-sharing.png

Moreover, when you have granted a video conferencing site access to both your camera and microphone, reverting the permission grant for one permission will also revert it for the other. This will help you avoid accidentally leaking your private data.

Add-on Panel Improvements

While working on these security improvements we fixed some old platform panel bugs that used to affect all kinds of panels, including those created by add-ons. Therefore if you are using an add-on that displays popup panels you should have an improved experience even if the panels are not related to permission prompts.

Error Pages

And finally, error pages also received some new smarts.

The most common cause for secure connection errors turns out to be user systems having the wrong time. Firefox will now detect when your clock seems way off and will suggest in the error message how to fix it.



Another common cause for broken connections is the presence of a captive portal. Firefox will now detect that case and prompt you to log in to the captive portal. Even though some operating systems have built-in support for detecting captive portals, if you regularly use social network accounts to log in, the experience with Firefox will be smoother. This change is now in Nightly and Developer Edition versions and should ship soon in the stable release.


Looking back at what we managed to accomplish in the last few months makes me proud to work with this fabulous team of talented and passionate engineers, designers, user researchers, QA engineers, product and project managers. But of course we are far from being done with privacy and security improvements for our users. Stay tuned for more exciting Firefox privacy and security updates in 2017!

[Many thanks to Bram Pitoyo, Nihanth Subramanya, Tanvi Vyas, Peter Dolanjski, Florian Quèze, and Johann Hofmann for reviewing drafts of this post.]

Planet MozillaWhat’s Up with SUMO – 12th January 2017

Hello, SUMO Nation!

Yes, it’s Friday the 13th this week, considered by some cultures to be a day of “bad luck”, so… Read quickly, before the bells chime for midnight! But, before we get there… Happy birthday, Charles Perrault! and a happy National Youth Day to everyone in India!

Welcome, new contributors!

If you just joined us, don’t hesitate – come over and say “hi” in the forums!

Contributors of the week

SUMO Community meetings

  • LATEST ONE: 11th of January – you can read the notes here (and see the video at AirMozilla).
  • NEXT ONE: happening on the 18th of January!
  • Reminder – if you want to add a discussion topic to the upcoming meeting agenda:
    • Start a thread in the Community Forums, so that everyone in the community can see what will be discussed and voice their opinion here before Wednesday (this will make it easier to have an efficient meeting).
    • Please do so as soon as you can before the meeting, so that people have time to read, think, and reply (and also add it to the agenda).
    • If you can, please attend the meeting in person (or via IRC), so we can follow up on your discussion topic during the meeting with your feedback.

Community

Platform

Social

  • Your inboxes should soon contain a message with a link to a post-2016 survey about Social, where you will be able to help us shape the near future of Social Support.
  • Reminder: you can contact Sierra (sreed@), Elisabeth (ehull@), or Rachel (guigs@) to get started with Social support. Help us provide friendly help through the likes of @firefox, @firefox_es, @firefox_fr and @firefoxbrasil on Twitter and beyond :-)
  • Reminder: that you can subscribe to the social-support@ mailing list and get updates from Sierra!

Support Forum

  • Thank you to the new contributors this month! and thank you for your help with the training for Lithium!
  • Moderators! Please check your inboxes for an opportunity to collaborate on some features that you would like to learn about in Lithium and some guides we can provide for future Moderators. Your opinion matters in shaping the months ahead!
  • Reminder: Mark your calendars for SUMO day in January – preparations for swag are beginning, spread the word!
    • Jan 24 – Release of Firefox 51 + a secret (?) bonus
    • Jan 25-26 – SUMO Day and Social SUMO Day

Knowledge Base & L10n

  • Over 220 edits in the KB in all locales since the last blog post – thank you so much for your work there, Editors of all locales!
  • We mostly had minor content updates here and there, so keep rocking the helpful knowledge in all languages and don’t forget about our forums in case you run into any issues.

Firefox

We are at the end for today, whew. Grab a few interesting links for your enjoyment and information:

We shall see you soon, friends of the web :-) Thank you for a great week and we are looking forward to another one. The further we go into 2017, the more new challenges we encounter… and the more greatness we can achieve together!

Planet MozillaConnected Devices Weekly Program Update, 12 Jan 2017

Connected Devices Weekly Program Update Weekly project updates from the Mozilla Connected Devices team.

Planet MozillaReps Weekly Meeting Jan. 12, 2017

Reps Weekly Meeting Jan. 12, 2017 This is a weekly call with some of the Reps to discuss all matters about/affecting Reps and invite Reps to share their work with everyone.

Planet MozillaTelemetry meets HBase (again)

At the end of November AWS announced that HBase on EMR supported S3 as a data store. That’s great news because it means one doesn’t have to keep around an HDFS cluster with 3x replication, which is not only costly but also comes with its own operational burden.

At the same time we had some use cases that could have been addressed with a key-value store and this seemed like a good opportunity to give HBase a try.

What is HBase?

HBase is an open source, non-relational, distributed key-value store which traditionally runs on top of HDFS. It provides a fault-tolerant, efficient way of storing large quantities of sparse data using column-based compression and storage.

In addition, it provides fast lookup of data thanks to indexing and in-memory cache. HBase is optimized for sequential write operations, and is highly efficient for batch inserts, updates, and deletes. HBase also supports cell versioning so one can look up and use several previous versions of a cell or a row.

The system can be imagined as a distributed log-structured merge tree and is ultimately an open source implementation of Google’s BigTable whitepaper. An HBase table is partitioned horizontally into so-called regions, each of which contains all rows between the region’s start and end key. Region Servers are responsible for serving regions, while the HBase master handles region assignments and DDL operations.

A region server has:

  • a BlockCache which serves as a LRU read cache;
  • a BucketCache (EMR version only), which caches reads on local disk;
  • a WAL, used to store writes not yet persisted to HDFS/S3 and stored on HDFS;
  • a MemStore per column family (a collection of columns); a MemStore is a write cache which, once it has accumulated enough data, is written to a store file;
  • a store file stores rows as sorted key values on HDFS/S3;
<figure class="wp-caption alignnone" id="attachment_4529" style="width: 1600px;">hbase-files.pngHBase architecture with HDFS storage</figure>

This is just a 10000-foot overview of the system and there are many articles out there that go into important details, like store file compaction.
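
As a rough mental model of how those pieces interact (a toy illustration, not actual HBase code), a write is appended to the WAL, buffered in the MemStore, and eventually flushed to a sorted, immutable store file; a read checks the MemStore first and then falls back to the store files:

# Toy model of a region server's write and read paths; illustration only.
class ToyRegionServer:
    def __init__(self, flush_threshold=3):
        self.wal = []          # write-ahead log (kept on HDFS in real HBase)
        self.memstore = {}     # in-memory write cache, one per column family
        self.store_files = []  # immutable, sorted files on HDFS/S3
        self.flush_threshold = flush_threshold

    def put(self, row_key, column, value):
        self.wal.append((row_key, column, value))   # durability first
        self.memstore[(row_key, column)] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Store files are sorted by key, which is what makes scans efficient.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore.clear()

    def get(self, row_key, column):
        # Read path: MemStore first, then store files from newest to oldest
        # (the real read path also consults the BlockCache/BucketCache).
        key = (row_key, column)
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            for k, v in store_file:
                if k == key:
                    return v
        return None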

<figure class="wp-caption alignnone" id="attachment_4838" style="width: 800px;">hbase_s3.pngEMR’s HBase architecture with S3 storage and BucketCache</figure>

One nice property of HBase is that it guarantees linearizable consistency, i.e. if operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state. That’s easy to do since each row can only be served by one region server.

Why isn’t Parquet good enough?

Many of our datasets are stored on S3 in Parquet form. Parquet is a great format for typical analytical workloads where one needs all the data for a particular subset of measurements. On the other hand, it isn’t really optimized for finding needles in haystacks; partitioning and sorting can help alleviate this issue only so much.

As some of our analysts have the need to efficiently access the telemetry history for a very small and well-defined sub-population of our user base (think of test-pilot clients before they enrolled), a key-value store like HBase or DynamoDB fits that requirement splendidly.

HBase stores and compresses the data per column family, unlike Parquet which does the same per column. That means the system will read far more data than is actually needed if only a small subset of columns is read during a full scan. And no, you can’t just have a column family for each individual column, as column families are flushed in concert. Furthermore, unlike Parquet, HBase doesn’t have a concept of types: both the key and the value are just bytes, and it’s up to the user to interpret those bytes accordingly.
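
To make the “just bytes” point concrete, here is a minimal sketch of what writing and reading a row could look like from Python, assuming the happybase Thrift client; the table name, column family and row-key layout below are made up for illustration and are not Mozilla’s actual schema:

# Sketch: HBase sees only bytes, so typed values must be serialized and
# deserialized by the application (happybase Thrift client assumed; names
# below are hypothetical).
import json
import happybase

connection = happybase.Connection('hbase-thrift-host')  # placeholder host
table = connection.table('example_pings')

client_id = '00000000-0000-0000-0000-000000000000'
activity_date = '20170112'
row_key = '{}:{}'.format(client_id, activity_date).encode('utf-8')

# HBase has no notion of JSON, ints or dates: we encode the value ourselves.
ping = {'reason': 'shutdown', 'session_length': 1234}
table.put(row_key, {b'cf:payload': json.dumps(ping).encode('utf-8')})

# Reading it back, interpretation of the bytes is again entirely up to us.
stored = table.row(row_key)
restored = json.loads(stored[b'cf:payload'].decode('utf-8'))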

It turns out that Mozilla’s telemetry data was once stored in HBase! If you knew that then you have been around at Mozilla much longer than I have. That approach was later abandoned as keeping around mostly un-utilized data in HDFS was costly and typical analytical workloads involving large scans were slow.

Wouldn’t it be nice to have the best of both worlds: efficient scans and fast look-ups? It turns out there is one open system out there currently being developed that aims to fill that gap. Apache Kudu provides a combination of fast inserts/updates and efficient columnar scans to enable multiple real-time analytic workloads across a single storage layer. I don’t think it’s ready for prime time just yet though.

What about DynamoDB?

DynamoDB is a managed key-value store. Leaving aside operational costs, it’s fair to wonder how much it differs in terms of pricing for our example use case.

The data we are planning to store has a compressed size of about 200 GB (~1.2 TB uncompressed) per day and consists of 400 million key-value pairs of about 3 KB each uncompressed. As we are planning to keep the data around for 90 days, the total size of the table would amount to 18 TB.

HBase costs

Let’s say the machines we want to use for the HBase cluster are m4.xlarge, which have 16 GB of RAM. As suggested in Hortonworks’ HBase guidelines, each machine can ideally serve about 50 regions. By dividing the table into, say, 1000 regions, each region would have a size of 18 GB, which is still within the recommended maximum region size. Since each machine can serve about 50 regions and we have 1000 regions, our cluster should ideally have a size of 20 nodes.

Using on-demand EMR prices the cluster would have a monthly cost of:

20 \mathrm{\ nodes\ } \times 30 \mathrm{\ day} \times \frac{24 \ \mathrm{hour}}{\mathrm{day}} \times \frac{0.239\$ + 0.060 \$}{\mathrm{hour} \times \mathrm{node}} = 4306 \$

This is an upper bound as reserved or spot instances cost less.

The daily batch job that pushes data to HBase uses 5 c3.4xlarge machines and takes 5 hours, so it would have a monthly cost of:

5 \mathrm{\ nodes\ } \times 30 \mathrm{\ day} \times \frac{5 \mathrm{\ hour}}{\mathrm{day}} \times \frac{0.840\$ + 0.210 \$}{\mathrm{hour} \times \mathrm{node}} = 788 \$

To keep around about 18 TB of data on S3 we will need 378 $ at 0.021 $ per GB. Note that this doesn’t include the price for the requests which is rather difficult to calculate, albeit low.

In total we have a cost of about 5500 $ per month for the HBase solution.
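
For reference, the back-of-the-envelope arithmetic above is easy to reproduce; the snippet below simply re-derives the numbers from the prices quoted in this post:

# Re-derive the HBase cost estimate (on-demand prices quoted above).
regions = 18000 / 18                          # 18 TB in ~18 GB regions -> 1000
nodes = regions / 50                          # ~50 regions per m4.xlarge -> 20

cluster = nodes * 30 * 24 * (0.239 + 0.060)   # ~4306 $/month for the cluster
batch = 5 * 30 * 5 * (0.840 + 0.210)          # ~788 $/month for the batch job
s3 = 18000 * 0.021                            # ~378 $/month for 18 TB on S3

print(round(cluster + batch + s3))            # ~5471, i.e. about 5500 $/month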

DynamoDB costs

DynamoDB’s pricing is based on the desired request throughput the system needs to have. The throughput is measured in capacity units. Let’s assume that one write request per second corresponds to 3 write capacity units, as one unit of write capacity is limited to items of up to 1 KB in size and we are dealing with items of about 3 KB in size. Let’s also assume that we want to use a batch job, equivalent to the one used for HBase, to push the data into the store. This means we need enough write capacity to shovel 400 M pings in 5 hours:

\frac{\mathrm{sec} \times 3\ \mathrm{write\ unit}}{\mathrm{\ write}} \times \frac{400 * 10^6 \mathrm{\ write}}{5 \mathrm{\ hour}} \times \frac{\mathrm{hour}}{3600 \mathrm{\ sec}} \times \frac{0.0065\ \$}{\mathrm{hour} \times 10 \times \mathrm {write \ unit}} \times\frac{5\ \mathrm{hour}}{\mathrm{day}} = 217 \mathrm{\ \$/day}

which amounts to about 6510 $ a month. Note that this is just the cost to push the data in; it doesn’t consider the cost of reads throughout the day.

The cost of the storage, assuming the compression ratio is the same as with HBase, is:

\frac{0.25\  \$}{\mathrm{GB}} \times 18000 \ \mathrm{GB} = 4500 \ \$

Finally, if we also consider the cost of the batch job (788 $), we arrive at a total of about 11800 $ per month.
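
The DynamoDB side of the comparison can be re-derived the same way:

# Re-derive the DynamoDB estimate (prices quoted above).
write_units = 3 * 400e6 / (5 * 3600)            # ~66,667 units to load in 5 h
ingest = write_units / 10 * 0.0065 * 5 * 30     # ~6500 $/month of write capacity
storage = 0.25 * 18000                          # ~4500 $/month for 18 TB
batch = 5 * 30 * 5 * (0.840 + 0.210)            # same batch job as before, ~788 $

print(round(ingest + storage + batch))          # ~11788, i.e. about 11800 $/month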

In conclusion the HBase solution is cheaper and more flexible. For example, one could keep around historical data on S3 and not have an HBase cluster serving it until it’s really needed. The con is that HBase isn’t automagically managed and as such it requires operational effort.

How do I use this stuff?

We have created a mirrored view in HBase of the main summary dataset which is accessible through a Python API. The API allows one to retrieve the history of main pings for a small subset of client ids sorted by activity date:

view = HBaseMainSummaryView()
history = view.get(sc, ["00000000-0000-0000-0000-000000000000"])

We haven’t yet decided if this is something we want to support and keep around and we will make that decision once we have an understanding of the usefulness it provides to our analysts.

