Boris Zbarsky (Mozilla)Regexps and writing tests

David Mandelin's blog post about one of the sunspider subtests is chock-full of fun information on regexps. It also highlights...

Surfin' Safari (WebKit)Web Inspector Redesign

It has been nine months since our last Web Inspector update and we have a lot of cool things to talk about. If you diligently use the Web Inspector in nightly builds, you might have seen some of these improvements, while other subtle changes might have gone unnoticed.

Some of the Web Inspector improvements were contributed by members of the WebKit community. We really want to get the whole community involved with making this the best web development tool available. Remember, most of the Web Inspector is written in HTML, JavaScript, and CSS, so it’s easy to get started making changes and improvements.

Redesigned Interface

First and foremost, the Web Inspector is now sporting a new design that organizes information into task-oriented groups — represented by icons in the toolbar. The toolbar items (Elements, Resources, Scripts, Profiles and Databases) are named after the fundamental items you will work with inside the respective panels.

Console

The Console is now accessible from any panel. Unlike the other panels, the Console is not just used for one task: it might be used while inspecting the DOM, debugging JavaScript or analyzing HTML parse errors. The Console toggle button in the status bar animates the Console in and out from the bottom of the Web Inspector. The Console can also be toggled with the Escape key.

Error and warning counts are now shown in the bottom right corner of the status bar. Clicking on these will also open the Console.

In addition to the visual changes to the Console, we have also greatly improved usability by adding auto-completion and tab-completion. As you type expressions, property names will automatically be suggested. If there are multiple properties with the same prefix, pressing the Tab key will cycle through them. Pressing the Right arrow key will accept the current suggestion. The current suggestion will also be accepted when pressing the Tab key if there is only one matched property.

Our compatibility with Firebug’s command line and window.console APIs has also been greatly improved by Keishi Hattori (服部慶士), a student at The University of Tokyo (東京大学) who tackled this area as a summer project.

Elements Panel

The Elements panel is largely the same as the previous DOM view, at least visually. Under the hood we have made a number of changes and unified everything into one DOM tree.

  • Descend into sub-documents — expanding a frame or object element will show you the DOM tree for the document inside that element.
  • Automatic updates — the DOM tree will update when nodes are added to or removed from the inspected page.
  • Inspect clicked elements — enabling the new inspect mode lets you hover around the page to find a node to inspect. Clicking on a node in the page will focus it in the Elements panel and turn off the inspect mode. This was contributed by Matt Lilek.
  • Temporarily disable style properties — hovering over an editable style rule will show checkboxes that let you disable individual properties.

  • Style property editing — double click to edit a style property. Deleting all the text will delete the property. Typing or pasting in multiple properties will add the new properties.
  • Stepping for numeric style values — while editing a style property value with a number, you can use the Up or Down keys to increment or decrement the number. Holding the Alt/Option key will step by 0.1, while holding the Shift key will step by 10.

  • DOM attribute editing — double click to edit a DOM element attribute. Typing or pasting in multiple attributes will add the new attributes. Deleting all the text will delete the attribute.
  • DOM property editing — double click to edit a DOM property in the Properties pane. Deleting all the text will delete the property, if allowed.
  • Metrics editing — double click to edit any of the CSS box model metrics.
  • Position metrics — the Metrics pane now includes position info for absolute, relative and fixed positioned elements.
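The stepping rule for numeric style values can be sketched in a few lines. This is only an illustration, not the Web Inspector's code (which is written in JavaScript): the Modifiers struct and both function names are hypothetical, the default step of 1 is an assumption, and so is the choice to let Alt/Option win when both modifiers are held.

```cpp
#include <cmath>

// Hypothetical modifier state for one key press; the names are illustrative only.
struct Modifiers {
    bool alt = false;    // Alt/Option held
    bool shift = false;  // Shift held
};

// Step size: 0.1 with Alt/Option, 10 with Shift.
// Assumption: a plain Up/Down press steps by 1.
double stepSize(const Modifiers& m) {
    if (m.alt) return 0.1;
    if (m.shift) return 10.0;
    return 1.0;
}

// Apply one Up (direction = +1) or Down (direction = -1) key press.
// The result is rounded to six decimal places so that repeated 0.1 steps
// do not accumulate binary floating-point noise.
double stepValue(double value, int direction, const Modifiers& m) {
    double next = value + direction * stepSize(m);
    return std::round(next * 1e6) / 1e6;
}
```

With these definitions, pressing Up on a value of 12 gives 13, and pressing Down with Shift held on a value of 20 gives 10.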

Resources Panel

The Resources panel is a supercharged version of the previous Network panel. It has a similar looking timeline waterfall, but a lot has been done to make it even more useful.

  • Graph by size — click Size in the sidebar to quickly see the largest resources downloaded.
  • Multiple sorting options — there are many sorting methods available for the Time graph, including latency and duration.
  • Latency bars — the Time graph now shows latency in the bar with a lighter shade. This is the time between making the request and the server’s first response.
  • Unified resource views — clicking a resource in the sidebar will show you the data pulled from the network (not downloaded again), including the request and response headers.
  • View XHRs — the time and size graphs also show XMLHttpRequests. Selecting an XHR resource in the sidebar will show the XHR data and headers.

Scripts Panel

The previous standalone Drosera JavaScript debugger has been replaced with a new JavaScript debugger integrated into the Web Inspector. The new integrated JavaScript debugger is much faster than Drosera, and should be much more convenient.

From the Scripts panel you can see all the script resources that are part of the inspected page. Clicking in the line gutter of the script will set a breakpoint for that line of code. There are the standard controls to pause, resume and step through the code. While paused you will see the current call stack and in-scope variables in the right-hand sidebar.

The Web Inspector has a unique feature regarding in-scope variables: it shows closures, “with” statements, and event-related scope objects separately. This gives you a clearer picture of where your variables are coming from and why things might be breaking (or even working correctly by accident).

Profiles Panel

The brand new JavaScript Profiler in the Profiles panel helps you identify where execution time is spent in your page’s JavaScript functions. The sidebar on the left lists all the recorded profiles and a tree view on the right shows the information gathered for the selected profile. Profiles that have the same name are grouped as sequential runs under a collapsible item in the sidebar.

There are two ways to view a profile: bottom up (heavy) or top down (tree). Each view has its own advantages. The heavy view allows you to understand which functions have the most performance impact and the calling paths to those functions. The tree view gives you an overall picture of the script’s calling structure, starting at the top of the call-stack.

Below the profile are a couple of data mining controls to facilitate the dissection of profile information. The focus button (Eye symbol) will filter the profile to only show the selected function and its callers. The exclude button (X symbol) will remove the selected function from the entire profile and charge its callers with the excluded function’s total time. While any of these data mining features are active, a reload button is available that will restore the profile to its original state.
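To make the exclude operation concrete, here is a minimal sketch on a top-down call tree. It is not the profiler's implementation: ProfileNode, totalMs and excludeFunction are hypothetical names, and "charging the caller" is interpreted here as adding the excluded function's total time to the caller's self time.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical call-tree node: one function invocation site in the tree view.
struct ProfileNode {
    std::string name;
    double selfMs;                      // time spent in the function itself
    std::vector<ProfileNode> children;  // callees
};

// Total time is the node's self time plus the total time of its callees.
double totalMs(const ProfileNode& node) {
    double total = node.selfMs;
    for (const ProfileNode& child : node.children) total += totalMs(child);
    return total;
}

// Remove every call to `fn` below `node` and charge each caller with the
// removed subtree's total time. The root itself is never removed.
void excludeFunction(ProfileNode& node, const std::string& fn) {
    std::vector<ProfileNode> kept;
    for (ProfileNode& child : node.children) {
        if (child.name == fn) {
            node.selfMs += totalMs(child);  // the caller absorbs the cost
        } else {
            excludeFunction(child, fn);
            kept.push_back(std::move(child));
        }
    }
    node.children = std::move(kept);
}
```

Under this interpretation the total time of the whole profile stays the same after an exclusion; only its attribution changes.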

WebKit’s JavaScript profiler is fully compatible with Firebug’s console.profile() and console.profileEnd() APIs, but you can also specify a title in console.profileEnd() to stop a specific profile if multiple profiles are being recorded. You can also record a profile using the Start/Stop Profiling button in the Profiles panel.

Databases Panel

The Databases panel lets you interact with HTML 5 Database storage. You can examine the contents of all of the page’s open databases and execute SQL queries against them. Each database is shown in the sidebar. Expanding a database’s disclosure triangle will show the database’s tables. Selecting a database table will show you a data grid containing all the columns and rows for that table.

Selecting a database in the sidebar will show an interactive console for evaluating SQL queries. The input in this console has auto-completion and tab-completion for common SQL words and phrases along with table names for the database.

Search

Accompanying the task-oriented reorganization, the search field in the toolbar now searches the current panel with results being highlighted in the context of the panel. Targeting the search to the current panel allows each panel to support specialized queries that are suited for the type of information being shown. The panels that support specialized queries are Elements and Profiles.

The Elements panel supports XPath and CSS selectors as queries in addition to plain text. Any search you perform will be attempted as a plain text search, an XPath query using document.evaluate() and a CSS selector using document.querySelectorAll(). All the search results will be highlighted in the DOM tree, with the first match being revealed and selected.

The Profiles panel supports plain text searches of the function names and resource URLs. Numeric searches that match rows in the profile’s Self, Total and Calls columns are also supported. To make numeric searching more powerful, a few operators and units extend or limit your results. For example, you can search for “> 2.5ms” to find all the functions that took longer than 2.5 milliseconds to execute. In addition to “ms”, the other supported units are “s” for time in seconds and “%” for percentage of time. The other supported operators are “<”, “<=”, “>=” and “=”. When no units are specified the Calls column is searched.
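The numeric query syntax above is small enough to sketch as a parser. This is an illustration only, not the Web Inspector's code: the Op and Unit enums, the NumericQuery struct and parseNumericQuery are hypothetical names, and treating a query with no operator as an exact match is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

enum class Op { Less, LessEq, Greater, GreaterEq, Equal };
// Unit::None means no unit was given, so the Calls column is searched.
enum class Unit { None, Milliseconds, Seconds, Percent };

struct NumericQuery {
    Op op;
    double value;
    Unit unit;
};

// Parses queries such as "> 2.5ms", "<=10" or "= 3%".
// Returns false for anything that is not a numeric query (for example a function name).
bool parseNumericQuery(const std::string& text, NumericQuery& out) {
    std::size_t i = 0;
    auto skipSpaces = [&]() {
        while (i < text.size() && std::isspace(static_cast<unsigned char>(text[i]))) ++i;
    };

    skipSpaces();
    Op op = Op::Equal;  // assumption: a bare number means an exact match
    if (text.compare(i, 2, ">=") == 0)      { op = Op::GreaterEq; i += 2; }
    else if (text.compare(i, 2, "<=") == 0) { op = Op::LessEq;    i += 2; }
    else if (text.compare(i, 1, ">") == 0)  { op = Op::Greater;   i += 1; }
    else if (text.compare(i, 1, "<") == 0)  { op = Op::Less;      i += 1; }
    else if (text.compare(i, 1, "=") == 0)  { op = Op::Equal;     i += 1; }
    skipSpaces();

    // Require a digit or decimal point so that words like "inf" are not read as numbers.
    if (i >= text.size() ||
        !(std::isdigit(static_cast<unsigned char>(text[i])) || text[i] == '.'))
        return false;
    const char* start = text.c_str() + i;
    char* end = nullptr;
    double value = std::strtod(start, &end);
    if (end == start) return false;
    i += static_cast<std::size_t>(end - start);
    skipSpaces();

    // Whatever remains (minus trailing spaces) must be a known unit or nothing.
    std::string unitText = text.substr(i);
    while (!unitText.empty() && std::isspace(static_cast<unsigned char>(unitText.back())))
        unitText.pop_back();

    Unit unit;
    if (unitText.empty())      unit = Unit::None;
    else if (unitText == "ms") unit = Unit::Milliseconds;
    else if (unitText == "s")  unit = Unit::Seconds;
    else if (unitText == "%")  unit = Unit::Percent;
    else return false;

    out = NumericQuery{op, value, unit};
    return true;
}
```

A query that fails to parse here would simply fall back to the plain text search.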

In all the panels pressing Enter in the search field or ⌘G (Ctrl+G on Windows and Linux) will reveal the next result. Pressing ⇧⌘G (Ctrl+Shift+G on Windows and Linux) will reveal the previous result. In the Resources, Scripts and Profiles panels the search will be performed on the visible view first and will automatically jump to the first result only if the visible view has a match.

Availability and Contributing

All of these things are available now in the Mac and Windows nightly builds. Give them a try today, and let us know what you like (or don’t like).

If you would like to contribute, there are some really interesting tasks in the list of Web Inspector bugs and enhancements, and other contributors in the #webkit chat room are pretty much always available to provide help and advice.

Ben Smedberg (Mozilla)A Parent’s Most Important Job

I remember clearly when I first read The Tipping Point. The book was a good read and thought-provoking, but I remember the book most clearly because of a small section near the end:

This [study] is, if you think about it, a rather extraordinary finding. Most of us believe that we are like our parents because of some combination of genes and, more important, of nurture — that parents, to a large extent, raise us in their own image. But if that is the case, if nurture matters so much, then why did the adopted kids not resemble their adoptive parents at all? The Colorado study isn’t saying that genes explain everything and that environment doesn’t matter. On the contrary, all of the results strongly suggest that our environment plays as big — if not bigger — a role as heredity in shaping personality and intelligence. What it is saying is that whatever that environmental influence is, it doesn’t have a lot to do with parents. It’s something else, and what Judith Harris argues is that that something else is the influence of peers.

Why, Harris asks, do the children of recent immigrants almost never retain the accent of their parents? How is it the children of deaf parents manage to learn how to speak as well and as quickly as children whose parents speak to them from the day they were born? The answer has always been that language is a skill acquired laterally — that what children pick up from other children is as, or more, important in the acquisition of language as what they pick up at home. What Harris argues is that this is also true more generally, that the environmental influence that helps children become who they are — that shapes their character and personality — is their peer group.

Expressed this way, I think it’s easy to come to the wrong conclusion: that parents have little influence over their children. A more useful inference would be:

A parent’s most important duty is to find the best possible peers for their children.

Ben Smedberg (Mozilla)Generating Documentation With Dehydra

One of the common complaints about the Mozilla string code is that it’s very difficult to know what methods are available on a given class. Reading the code is very difficult because it’s hidden behind a complex set of #defines, it’s parameterized for both narrow and wide strings, and because we have a deep and complex string hierarchy. The Mozilla Developer Center has a string guide, but not any useful reference documentation.

With a little hand-holding, static analysis tools can produce very useful reference documentation, which other tools simply cannot make. For example, because a static analysis tool knows the accessibility of methods, you can create a reference document that contains only the public API of a class. I spent parts of yesterday and today tweaking a Dehydra script to produce a string reference. I’m working with Eric Shepherd to figure out the best way to automatically upload the data to the Mozilla Developer Center, but I wanted to post a sample for comment. This is the public API of nsACString:

Reference for nsACString (internal version)

I am trying to keep the format of this document similar to the format we use for interfaces on MDC. It’s a bit challenging, because C++ classes have overloaded method names and frequently have many methods. In the method summary, I have grouped together all the methods with the same name.

Once the output and format are tweaked, I can actually hook the entire generation and upload process to a makefile target, and either run it on my local machine or hook it up to a buildbot. I used E4X to do the actual XML generation. It was a learning experience… I don’t think I’m a fan. I want Genshi for JavaScript. Making conditional constructs in E4X is slightly ugly, and making looping constructs is really painful: my kingdom for an XML generator so that I don’t have to loop and append to an XMLList.

Ben Smedberg (Mozilla)Salmon Cakes

I love crab cakes. But at least here in Johnstown, refrigerated crab meat is expensive enough that making crab cakes on a regular basis is impractical. But there is an affordable alternative that tastes almost as good: Salmon cakes. Canned salmon is inexpensive and is a great substitute; you can find it near the canned tuna at pretty much any decent supermarket.

Ingredients

  • 2 cans salmon (15.75oz each)
  • 1 cup breadcrumbs
  • lots of pepper
  • Spices:
    • 1 teaspoon ground mustard
    • 1 teaspoon paprika
    • 1/2 teaspoon cumin
    • 1/2 teaspoon red pepper flakes
    • Or whatever else strikes your fancy
  • 1 large onion, diced fine
  • 2 eggs, lightly beaten
  • bacon fat or frying oil (peanut, canola, sunflower, or soy oil)

Hardware

  • mixing bowl
  • can opener
  • fine strainer
  • griddle or large skillet (cast iron is best, but any heavy pan will do)
  • Metal spatula-like device: an offset spatula is best
  • Wire rack for draining: for best results, turn the rack upside-down in contact with newspaper.

Preparation

  1. Drain the salmon into a strainer. Pick through the fish and remove any backbone or other large bones, if present
  2. In a mixing bowl, combine the breadcrumbs and spices and toss
  3. Add the salmon, eggs, and onion to the bowl. Combine the ingredients with your hands. The mixture should be somewhat sticky. If it is dry, add another egg.
  4. Form the cakes with your hands:
    • The cakes can be any size from half-fist to fist sized. The cake should be a disc about twice as wide as it is thick… I can typically make 10 large-ish cakes from this recipe.
    • Squeeze in both hands to compact into roughly the correct shape.
    • While holding in the palm of one hand, cup your other hand around the outside of the cake to form it into a round.
  5. Heat the griddle on medium heat and add the frying fat.
  6. When water gently sizzles in the fat (3-4 minutes), add the cakes. It’s ok to place them close together.
  7. Turn when the first side is brown… I prefer a dark mahogany (~7 minutes), but many people prefer a more golden color (~5 minutes)
  8. When the second side is done, remove to the wire rack for draining and cover with foil. Serve immediately.

Service Suggestions

  • For a dipping sauce prepare sour cream with chives, or tartar sauce if you’re feeling very traditional.
  • Salmon cakes work well as a main dish, but you could also make smaller ones as hors d’œuvre or in a surf-n-turf combo.
  • On a cold day, pair with a warm vegetable soup.
  • On a warm day, pair with a cucumber salad.
  • Serve with Sauvignon Blanc or Corona.

Notes

Canned Salmon typically has a lot of added salt. You don’t need to add any salt, and I’d avoid salted seasoning blends (Old Bay) as well. Because the salmon is fully cooked, feel free to check for seasoning before frying.

I’ve seen recipes where the cakes are breaded before frying, typically with crushed saltine crackers. I can’t for the life of me figure out why.

If you are like me and instinctively add garlic to any dish calling for diced onions, please resist the temptation.

Ben Smedberg (Mozilla)Allocated Memory and Shared Library Boundaries

When people get started with XPCOM, one of the most confusing rules is how to pass data across XPCOM boundaries. Take the following method:

IDL markup

string getFoo();

C++ generated method signature

nsresult GetFoo(char **aResult);

Diagram showing transfer of allocation 'ownership' from the implementation method to the calling method

C++ Implementation

The aResult parameter is called an “out parameter”. The implementation of this method is responsible for allocating memory and setting *aResult:

nsresult
Object::GetFoo(char **aResult)
{
  // Allocate a string to pass back
  *aResult = static_cast<char*>(NS_Alloc(4));

  // In real life, check for out-of-memory!
  strcpy(*aResult, "foo");

  return NS_OK;
}

C++ Caller

The caller, after it is finished with the data, is responsible for freeing the data.

char *foo;
myIFace->GetFoo(&foo);
// do something with foo
NS_Free(foo);

The important thing to note is that the code doesn’t allocate memory with malloc, and doesn’t free it with free. All memory that is passed across XPCOM boundaries must be allocated with NS_Alloc and freed with NS_Free.

We have this rule because of mismatched allocators. Depending on your operating system and the position of the moon, each shared library may have its own malloc heap. If you malloc memory in one shared library and free it in a different library, the heap of each library may get corrupted and cause mysterious crashes. By forcing everyone to use the NS_Alloc/Free functions, we know that all code is using the same malloc heap.

Helper Functions

In most cases, there are helper functions which make following the rules much easier. On the implementation side, the ToNewUnicode and ToNewCString functions convert an existing nsAString/nsACString to an allocated raw buffer.

On the caller side, you should almost always use helper classes such as nsXPIDLString to automatically handle these memory issues:

Better C++ Caller

nsXPIDLCString foo;
myIFace->GetFoo(getter_Copies(foo));
// do something with foo
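To illustrate what such a helper buys you, here is a minimal standalone sketch of the same ownership pattern using only the standard library. It does not show how nsXPIDLCString or getter_Copies are implemented: demoAlloc and demoFree are hypothetical stand-ins for NS_Alloc and NS_Free, and getFoo, OwnedCString and callGetFoo are invented for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical stand-ins for NS_Alloc/NS_Free: one allocator pair that both
// sides of the boundary agree on. The counter only exists to show the balance.
static int g_liveAllocations = 0;

void* demoAlloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p) ++g_liveAllocations;
    return p;
}

void demoFree(void* p) {
    if (!p) return;
    --g_liveAllocations;
    std::free(p);
}

// Implementation side: allocates the out parameter with the shared allocator.
bool getFoo(char** aResult) {
    *aResult = static_cast<char*>(demoAlloc(4));
    if (!*aResult) return false;  // out of memory
    std::strcpy(*aResult, "foo");
    return true;
}

// Caller side: a smart pointer whose deleter always uses the matching free
// function, so the buffer can never be released with the wrong allocator.
struct DemoFreeDeleter {
    void operator()(char* p) const { demoFree(p); }
};
using OwnedCString = std::unique_ptr<char, DemoFreeDeleter>;

OwnedCString callGetFoo() {
    char* raw = nullptr;
    if (!getFoo(&raw)) return OwnedCString();
    return OwnedCString(raw);  // ownership passes to the caller here
}
```

The design point is that the free function is chosen once, in the type, rather than at every call site.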

Impact on Extension Authors

It is especially important for extension authors to follow this advice on Windows. The Windows version of Firefox uses a custom version of the Windows C runtime which we’ve patched to include the high-performance jemalloc allocator. Extension authors should link the C runtime statically, which guarantees that they will have a mismatched malloc/free heap, so anything crossing the XPCOM boundary has to go through NS_Alloc and NS_Free.

Surfin' Safari (WebKit)Full Pass of Acid3

Today we would like to announce that WebKit is the first browser engine to fully pass Acid3. A while back, we posted that we scored 100/100 and matched the reference rendering. Now, thanks to recent speedups in JavaScript, DOM and rendering, we have passed the third condition, smooth animation on reference hardware.

Here is a screenshot of a successful run:

Here is the timing reference dialog you get by clicking on the “A” in Acid3 that confirms we pass the smooth animation condition on a 2.4GHz MacBook Pro:

To try it for yourself, grab a nightly. Keep in mind that on slower machines, the timing may not be perfect, and you need to do a cached run of the test (load it once, close window, open new window, load it again) to avoid delays from the network.

Ben Smedberg (Mozilla)What’s so bad about a liquidity crisis?

I’ve been trying to follow the news and commentary about the “bailout” and financial markets in some detail; but there must be some obvious background knowledge I’m missing. From watching bits of the congressional hearing yesterday, and reading the newspapers, it seems that the major purpose of the bailout is to “restore liquidity to the markets”, which seems to be an economist’s synonym for “make sure the markets can still loan people money”.

What would happen if the markets stopped loaning people money? For consumers at least, there would be some short-term pain: people have been expecting to be able to use easy credit, so they haven’t saved money for a new car, Christmas presents, and so forth. The housing market would certainly change, and housing prices would drop even further because of a lack of buyers. But would that actually significantly disrupt the economy? Wouldn’t the population save their money for a few years, limp along with their old car, and then buy a new one with saved cash?

Presumably after all the existing bad securities are untangled, some banks will start to be able to loan money again, and these lenders will set stricter requirements on collateral and verified ability to repay loans.

Perhaps the consequences would be more serious if business credit disappeared: capitalizing a new business or making capital improvements to an existing business pretty much requires credit. If we want to preserve this essential use of credit to keep the real economy strong (not the speculative market economy), isn’t there a way the U.S. government could guarantee this kind of credit for business capital loans much more cheaply than $700B, and let the chips fall where they might everywhere else?

I invite the blogosphere to link me up to classic economic treatises and modern articles which could help me understand how a liquidity crisis would cause the economy to simply collapse.

Ben Smedberg (Mozilla)Call for Help: Boehm+jemalloc

At the Firefox summit we decided to change tack on XPCOMGC and try to use Boehm instead of MMgc. Overall I think this was a really good decision. With help from Graydon, I even have some Linux builds that use Boehm under the hood (most memory is not considered collectable, only string buffers are collected objects at this point).

Unfortunately, while Boehm is a pretty good collector, it doesn’t do so well at memory allocation and fragmentation. Heap usage is between 1.5x and 2x that of standard Firefox using jemalloc. What I really want is a combination of jemalloc and Boehm, taking the best features from each:

Boehm Features:

  • Fast and threadsafe conservative collector
  • Smart rooting of all thread stacks and static data
  • Incremental marking with hardware write barriers [1]
  • Option for parallel collection [2]
  • Ability to intermingle collected and non-collected memory

jemalloc features:

  • Better overall memory usage, primarily due to lower fragmentation
  • Very tight and well-performing allocator

Help Wanted

I’m looking for somebody who’s willing to painstakingly combine the best of these two allocators: either port the jemalloc low-fragmentation design to Boehm, or port the Boehm collection mechanism to the jemalloc allocator. If you’re interested, please contact me. Getting a solution to this problem really blocks any serious plans for further work on XPCOMGC.

Notes

  1. The key word is hardware. The MMgc solution failed because altering our codebase to have correct programmatic write barriers was going to involve boiling the ocean. And even with smart pointers, a standard MMgc write barrier involves a lot of overhead.
  2. In Boehm, parallel collection doesn’t work with most incremental collection, and so we may not actually decide to use it; avoiding large pauses with incremental collection is more important.

Ben Smedberg (Mozilla)Are you going to start plating soon?

I’m hoping to start blogging more regularly, and to model my blog after The Old New Thing. So I’m planning on posting in pairs: one technical post related to my work or Mozilla, and one non-technical post about my family, music, or other things I find interesting.

I think I am raising Food Network junkies. I was making BLT sandwiches for lunch the other day, and my three-year-old daughter Claire was hungry. She asked me “are you going to start plating soon?”

She adores Michael Symon, and often when we turn on Dinner: Impossible she says “I love Michael Symon, he’s beeeaautiful.” Claire wants to be an Iron Chef when she grows up. Ellie really enjoys when Cat Cora is the Iron Chef, or when there’s a female challenger in general. And they all watch Good Eats with rapt attention.

It’s amazing to me how much and how quickly they learn. We play a “how do we cook it” game: I pick an ingredient, and they tell me how you’d cook it. Ellie, who is four years old, recently said to me, “Daddy, if you don’t heat the carrot pieces in a pan with oil, they won’t be soft in your carrot stew.” In their play-kitchen, they are always concocting soups and baked goods, and arguing about ingredients.

Ben Smedberg (Mozilla)When linking, the order of your command-line can be important

Occasionally, people will come on the #xulrunner or #extdev channel with a question about compiling XPCOM components. The question often goes something like this:

<IRCGuy> I’m following a tutorial on making XPCOM components, but I can’t seem to get them to compile. Can anyone tell me what my problem is?

Hint for asking a good question: IRCGuy needs to tell us some combination of 1) what tutorial he’s following, 2) what the failing command is or 3) what the error message is.

This time, IRCGuy’s compile command and error message are:

IRCGuy@IRCGuy-france:/mnt/data/IRCGuy/public/project/xpcom-test$ make
g++  -I/usr/local/include/xulrunner-1.9/unstable -I/usr/local/include/xulrunner-1.9/stable -L/usr/local/lib/xulrunner-devel-1.9/lib -Wl,-rpath-link,/usr/local/bin  -lxpcomglue_s -lxpcom -lnspr4 -fno-rtti -fno-exceptions -shared -Wl,-z,defs  france2.cpp -o france2.so
/tmp/cceFg2dD.o: In function `NSGetModule':
france2.cpp:(.text+0x38c): undefined reference to `NS_NewGenericModule2(nsModuleInfo const*, nsIModule**)'

IRCGuy’s problem is one of link ordering: with most unix-like linkers, it is very important to list object files and libraries in the correct order. The general order to follow is:

  1. Object files
  2. Static libraries - specific to general
  3. Dynamic libraries

If an object file needs a symbol, the linker will only resolve that symbol in static libraries that are later in the link line.

The corrected command:

g++  -I/usr/local/include/xulrunner-1.9/unstable -I/usr/local/include/xulrunner-1.9/stable -fno-rtti -fno-exceptions -shared -Wl,-z,defs  france2.cpp -L/usr/local/lib/xulrunner-devel-1.9/lib -Wl,-rpath-link,/usr/local/bin  -lxpcomglue_s -lxpcom -lnspr4 -o france2.so
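To see why the order matters, here is a toy model of a single-pass linker. It is a simplified sketch and not how any real linker is implemented; every type and function name is hypothetical, and it assumes the missing symbol lives in a static library (as with xpcomglue_s in the command above).

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct ObjectFile {
    std::set<std::string> defines;  // symbols this object provides
    std::set<std::string> needs;    // symbols this object references
};

// One entry on the link line: either a single object file or a static library.
struct LinkInput {
    bool isArchive;
    std::vector<ObjectFile> members;  // one member for an object, many for an archive
};

struct LinkState {
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

static void addObject(LinkState& st, const ObjectFile& obj) {
    for (const std::string& s : obj.defines) { st.defined.insert(s); st.undefined.erase(s); }
    for (const std::string& s : obj.needs)
        if (!st.defined.count(s)) st.undefined.insert(s);
}

static bool resolvesSomething(const LinkState& st, const ObjectFile& obj) {
    for (const std::string& s : obj.defines)
        if (st.undefined.count(s)) return true;
    return false;
}

// Walks the link line left to right and returns the symbols still undefined at the end.
std::set<std::string> unresolvedAfterLink(const std::vector<LinkInput>& linkLine) {
    LinkState st;
    for (const LinkInput& input : linkLine) {
        if (!input.isArchive) {
            for (const ObjectFile& obj : input.members) addObject(st, obj);
            continue;
        }
        // An archive only contributes members that satisfy a symbol that is
        // undefined *at this point* on the link line; it is not revisited later.
        std::vector<bool> pulled(input.members.size(), false);
        bool changed = true;
        while (changed) {
            changed = false;
            for (std::size_t i = 0; i < input.members.size(); ++i) {
                if (!pulled[i] && resolvesSomething(st, input.members[i])) {
                    addObject(st, input.members[i]);
                    pulled[i] = true;
                    changed = true;
                }
            }
        }
    }
    return st.undefined;
}
```

In this model, a library listed before the object that needs it is scanned while nothing is undefined yet, so it contributes nothing, which matches IRCGuy's error.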

Bonus tip: correct linker flags for linking XPCOM components can be found in the Mozilla Developer Center article on the XPCOM Glue. As noted in the article, XPCOM components want to use the “Dependent Glue” linker strategy.

Surfin' Safari (WebKit)Introducing SquirrelFish Extreme

Just three months ago, the WebKit team announced SquirrelFish, a major revamp of our JavaScript engine featuring a high-performance bytecode interpreter. Today we’d like to announce the next generation of our JavaScript engine - SquirrelFish Extreme (or SFX for short). SquirrelFish Extreme uses more advanced techniques, including fast native code generation, to deliver even more JavaScript performance.

For those of you who follow WebKit development and are interested in contributing, we’d like to report our results and what we did to achieve them.

How Fast is It?

This chart shows WebKit’s JavaScript performance in different versions - bigger bars are better.

bar graph showing WebKit 3.0: 5.4; WebKit 3.1: 18.8; SquirrelFish: 29.9; SquirrelFish Extreme: 63.6

The metric is SunSpider runs per minute. We present charts this way because “bigger is better” is easier to follow when you have a wide range of performance results. As you can see, SquirrelFish Extreme as of today is more than twice as fast as the original SquirrelFish, and over 10 times the speed you saw in Safari 3.0, less than a year ago. We are pretty pleased with this improvement, but we believe there is more performance still to come.
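The ratios quoted above follow directly from the chart values. A minimal sketch of the arithmetic is below; the constant and function names are made up for the example, and runsPerMinute simply assumes a score is derived from the wall-clock time of one full run.

```cpp
// SunSpider scores from the chart, in runs per minute (higher is faster).
const double kWebKit30 = 5.4;
const double kWebKit31 = 18.8;
const double kSquirrelFish = 29.9;
const double kSquirrelFishExtreme = 63.6;

// How many times faster the newer engine is than the older one.
double speedup(double newerRunsPerMinute, double olderRunsPerMinute) {
    return newerRunsPerMinute / olderRunsPerMinute;
}

// Converts the time for one complete run into runs per minute.
double runsPerMinute(double millisecondsPerRun) {
    return 60000.0 / millisecondsPerRun;
}

// speedup(kSquirrelFishExtreme, kSquirrelFish) is about 2.13 ("more than twice as fast").
// speedup(kSquirrelFishExtreme, kWebKit30) is about 11.8 ("over 10 times the speed").
```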

Quite a few people contributed to these results. I will mention a few who worked on some key tasks, but I’d also like to thank all of the many WebKit contributors who have helped with JavaScript and performance.

What makes it so fast?

SquirrelFish Extreme uses four different technologies to deliver much better performance than the original SquirrelFish: bytecode optimizations, polymorphic inline caching, a lightweight “context threaded” JIT compiler, and a new regular expression engine that uses our JIT infrastructure.

1. Bytecode Optimizations

When we first announced SquirrelFish, we mentioned that we thought that the basic design had lots of room for improvement from optimizations at the bytecode level. Thanks to hard work by Oliver Hunt, Geoff Garen, Cameron Zwarich, myself and others, we implemented lots of effective optimizations at the bytecode level.

One of the things we did was to optimize within opcodes. Many JavaScript operations are highly polymorphic - they have different behavior in lots of different cases. Just by checking for the most common and fastest cases first, you can speed up JavaScript programs quite a bit.
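As a rough illustration of checking the common cases first, here is a sketch of an "add" operation with fast paths. It is not SquirrelFish code: the Value type and all function names are hypothetical, and the string conversion is far simpler than what JavaScript actually specifies.

```cpp
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical tagged value with just three kinds.
struct Value {
    enum Kind { Int32, Double, String };
    Kind kind = Int32;
    std::int32_t i = 0;
    double d = 0.0;
    std::string s;

    static Value fromInt(std::int32_t v)          { Value r; r.kind = Int32;  r.i = v; return r; }
    static Value fromDouble(double v)             { Value r; r.kind = Double; r.d = v; return r; }
    static Value fromString(const std::string& v) { Value r; r.kind = String; r.s = v; return r; }
};

// Only called for Int32 and Double values.
static double toNumber(const Value& v) { return v.kind == Value::Int32 ? v.i : v.d; }

// Simplified conversion for the slow path; real JavaScript rules are more involved.
static std::string toText(const Value& v) {
    if (v.kind == Value::String) return v.s;
    std::ostringstream out;
    if (v.kind == Value::Int32) out << v.i; else out << v.d;
    return out.str();
}

Value opAdd(const Value& a, const Value& b) {
    // Fast path 1: both operands are small integers, the most common case.
    if (a.kind == Value::Int32 && b.kind == Value::Int32) {
        std::int64_t sum = static_cast<std::int64_t>(a.i) + b.i;
        if (sum >= std::numeric_limits<std::int32_t>::min() &&
            sum <= std::numeric_limits<std::int32_t>::max())
            return Value::fromInt(static_cast<std::int32_t>(sum));
        return Value::fromDouble(static_cast<double>(sum));  // overflow: widen to double
    }
    // Fast path 2: both operands are numbers of some kind.
    if (a.kind != Value::String && b.kind != Value::String)
        return Value::fromDouble(toNumber(a) + toNumber(b));
    // Slow path: anything involving a string falls back to concatenation.
    return Value::fromString(toText(a) + toText(b));
}
```

The behavior is the same whichever branch is tested first; ordering the checks by how often they succeed is what saves time.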

In addition, we’ve improved the bytecode instruction set, and built optimizations that take advantage of these improvements. We’ve added combo instructions, peephole optimizations, faster handling of constants and some specialized opcodes for common cases of general operations.

2. Polymorphic Inline Cache

One of our most exciting new optimizations in SquirrelFish Extreme is a polymorphic inline cache. This is an old technique originally developed for the Self language, which other JavaScript engines have used to good effect.

Here is the basic idea: JavaScript is an incredibly dynamic language by design. But in most programs, many objects are actually used in a way that resembles more structured object-oriented classes. For example, many JavaScript libraries are designed to use objects with “x” and “y” properties, and only those properties, to represent points. We can use this knowledge to optimize the case where many objects have the same underlying structure - as people in the dynamic language community say, “you can cheat as long as you don’t get caught”.

So how exactly do we cheat? We detect when objects actually have the same underlying structure — the same properties in the same order — and associate them with a structure identifier, or StructureID. Whenever a property access is performed, we do the usual hash lookup (using our highly optimized hashtables) the first time, and record the StructureID and the offset where the property was found. Subsequent times, we check for a match on the StructureID - usually the same piece of code will be working on objects of the same structure. If we get a hit, we can use the cached offset to perform the lookup in only a few machine instructions, which is much faster than hashing.
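The lookup scheme just described can be sketched in miniature. This is a simplified illustration under stated assumptions, not the StructureID implementation: Structure, Object, PropertyCache and getProperty are hypothetical names, the cache holds a single entry per access site, and property values are plain doubles.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Objects that gain the same properties in the same order end up sharing one Structure.
struct Structure {
    std::unordered_map<std::string, std::size_t> offsets;  // property name -> slot index
    std::unordered_map<std::string, std::unique_ptr<Structure>> transitions;

    // Returns the structure reached by adding `name` (assumed not to exist yet).
    Structure* withProperty(const std::string& name) {
        auto it = transitions.find(name);
        if (it != transitions.end()) return it->second.get();
        std::unique_ptr<Structure> next(new Structure);
        next->offsets = offsets;
        next->offsets[name] = offsets.size();
        Structure* raw = next.get();
        transitions[name] = std::move(next);
        return raw;
    }
};

struct Object {
    Structure* structure;
    std::vector<double> slots;
    explicit Object(Structure* root) : structure(root) {}
    void add(const std::string& name, double value) {
        structure = structure->withProperty(name);
        slots.push_back(value);
    }
};

// One cache per property access site in the program; a site always asks for
// the same property name, so only the structure needs to be compared.
struct PropertyCache {
    const Structure* structure = nullptr;
    std::size_t offset = 0;
    int hits = 0;
    int misses = 0;
};

bool getProperty(const Object& obj, const std::string& name, PropertyCache& cache, double& out) {
    if (cache.structure == obj.structure) {  // fast path: one comparison and one load
        ++cache.hits;
        out = obj.slots[cache.offset];
        return true;
    }
    ++cache.misses;
    auto it = obj.structure->offsets.find(name);  // slow path: hash lookup
    if (it == obj.structure->offsets.end()) return false;
    cache.structure = obj.structure;  // remember where the property was found
    cache.offset = it->second;
    out = obj.slots[it->second];
    return true;
}
```

Two point objects built by adding "x" and then "y" share a structure here, so the second read of "y" at the same site skips the hash lookup.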

Here is the classic Self paper that describes the original technique. You can look at Geoff’s implementation of the StructureID class in Subversion to see more details of how we did it.

We’ve only taken the first steps on polymorphic inline caching. We have lots of ideas on how to improve the technique to get even more speed. But already, you’ll see a huge difference on performance tests where the bottleneck is object property access.

3. Context Threaded JIT

Another major change we’ve made with SFX is to introduce native code generation. Our starting point is a technique called a “context threaded interpreter”, which is a bit of a misnomer, because this is actually a simple but effective form of JIT compiler. In the original SquirrelFish announcement, we described our use of direct threading, which is about the fastest form of bytecode interpretation short of generating native code. Context threading takes the next step and introduces some native code generation.

The basic idea of context threading is to convert bytecode to native code, one opcode at a time. Complex opcodes are converted to function calls into the language runtime. Simple opcodes, or in some cases the common fast paths of otherwise complex opcodes, are inlined directly into the native code stream. This has two major advantages. First, the control flow between opcodes is directly exposed to the CPU as straight line code, so much dispatch overhead is removed. Second, many branches that were formerly inside opcode implementations are now inline, and made visible and highly predictable to the CPU’s branch predictor.
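To make the "one opcode at a time" translation concrete, here is a toy Python sketch. Generated Python source stands in for native code, which is a big simplification, and the opcode names, the register layout and `runtime_add` are all invented for the example rather than taken from SquirrelFish.

```python
def runtime_add(a, b):
    """Stand-in for a complex opcode that stays in the language runtime."""
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b


def compile_bytecode(bytecode, num_registers):
    """Translate register bytecode into one straight-line function."""
    lines = ["def compiled(r):"]
    for op, *args in bytecode:
        if op == "load_const":      # simple opcode: emitted inline
            dst, value = args
            lines.append(f"    r[{dst}] = {value!r}")
        elif op == "move":          # simple opcode: emitted inline
            dst, src = args
            lines.append(f"    r[{dst}] = r[{src}]")
        elif op == "add":           # complex opcode: call into the runtime
            dst, a, b = args
            lines.append(f"    r[{dst}] = runtime_add(r[{a}], r[{b}])")
        elif op == "ret":
            (src,) = args
            lines.append(f"    return r[{src}]")
        else:
            raise ValueError(f"unknown opcode: {op}")
    lines.append("    return None")  # keeps the body valid if there is no ret
    namespace = {"runtime_add": runtime_add}
    exec("\n".join(lines), namespace)
    compiled = namespace["compiled"]
    return lambda: compiled([None] * num_registers)


program = [
    ("load_const", 0, 40),
    ("load_const", 1, 2),
    ("add", 2, 0, 1),
    ("ret", 2),
]
run = compile_bytecode(program, num_registers=3)
result = run()
```

There is no dispatch loop left in the compiled function: each opcode has become a statement or a call in sequence, which is the property the paragraph above attributes to context threading.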

Here is a paper describing the basic idea of context threading. Our initial prototype of context threading was created by Gavin Barraclough. Several of us helped him polish it and tune the performance over the past few weeks.

One of the great things about our lightweight JIT is that there’s only about 4,000 lines of code involved in native code generation. All the other code remains cross platform. It’s also surprisingly hackable. If you thought compiling to native code is rocket science, think again. Besides Gavin, most of us have little prior experience with native codegen, but we were able to jump right in.

Currently the code is limited to x86 32-bit, but we plan to refactor and add support for more CPU architectures. CPUs that are not yet supported by the JIT can still use the interpreter. We also think we can get a lot more speedups out of the JIT through techniques such as type specialization, better register allocation and liveness analysis. The SquirrelFish bytecode is a good representation for making many of these kinds of transforms.

4. Regular Expression JIT

As we built the basic JIT infrastructure for the main JavaScript language, we found that we could easily apply it to regular expressions as well, and get up to a 5x speedup on regular expression matching. So we went ahead and did that. Not all code spends a bunch of time in regexps, but with the speed of our new regular expression engine, WREC (the WebKit Regular Expression Compiler), you can write the kind of text processing code you’d want to do in Perl or Python or Ruby, and do it in JavaScript instead. In fact we believe that in many cases our regular expression engine will beat the highly tuned regexp processing in those other languages.

Since the SunSpider JavaScript benchmark has a fair amount of regexp content, some may feel that developing a regexp JIT is an “unfair” advantage. A year ago, regexp processing was a fairly small part of the test, but JS engines have improved in other areas a lot more than on regexps. For example, most of the individual tests on SunSpider have gotten 5-10x faster in JavaScriptCore — in some cases over 70x faster than the Safari 3.0 version of WebKit. But until recently, regexp performance hadn’t improved much at all.

We thought that making regular expressions fast was a better thing to do than changing the benchmark. A lot of real tasks on the web involve a lot of regexp processing. After all, fundamental tasks on the web, like JSON validation and parsing, depend on regular expressions. And emerging technologies — like John Resig’s processing.js library — extend that dependency ever further.

A Word About Benchmarks

We have included some performance results, but don’t take our word for it. You can get WebKit nightlies for Mac and Windows and try for yourself.

The primary benchmark we use to track JavaScript performance is SunSpider. Although, like all benchmarks, it has its flaws, we think it is a balanced test that covers many dimensions of the JavaScript language and many types of code. If you look at test by test results, you will see that different JavaScript implementations have their own strengths and weaknesses. Browser vendors and independent testers have been tracking this benchmark.

Next Steps and How You Can Contribute

We believe the SquirrelFish Extreme architecture has room for lots more optimization, and we’d love to see more developers and testers pitch in. Currently, we are looking at how to use the bytecode infrastructure to gather more information at runtime and then use it to drive better code generation, and we are studying ways to make JS function calls faster. There is also a lot of basic tuning work to do to take more advantage of the basic architectural advances in SFX. In addition, we’re interested in having JIT back ends for other CPU architectures.

If you’d like to follow the development of WebKit’s JavaScript engine more closely, we have created the squirrelfish-dev@lists.webkit.org mailing list (subscribe here) and the #squirrelfish IRC channel on the FreeNode IRC network. Stop on by and you can learn more about our plans, and how you can help.

Try it Out

Try it, test it, browse with it. It’s now available in nightlies. We hope the changes we’ve made help improve your experience of the web.

UPDATE: For the curious, here are some comparisons of SFX to other leading JavaScript engines. Charles Ying has comparisons on a few more benchmarks.

UPDATE 2: For those of you who just can’t get enough of our little mascot, click the SquirrelFish below in a recent WebKit nightly for a demo of SVG animation support.

the SquirrelFish mascot

Boris Zbarsky (Mozilla)Poor test writing, part 1: Celtic Kane JavaScript speed test

This is the first in what will probably be a series of posts about poorly-written tests. Writing good tests is...

Robert Sayre (Mozilla)Just So We’re All On The Same Page, As It Were

All combinations of clear text HTTP traffic, cross domain JavaScript, DNS, and SSL present a security problem, in all browsers.

Robert Sayre (Mozilla)Mozilla is Linux

I think there’s been great progress on the Firefox first run experience. One thing that disturbed me about the initial response was the us vs. them mentality present in the feedback. It’s as though Linux users feel that Mozilla is a giant evil entity of some kind. The fact of the matter is that Mozilla is a tiny company that operates almost completely in the open, which means we make mistakes in public, and lots of us use Linux! We might have to run it on a macbook, because we need to fix bugs on three platforms minimum, but we do it. We’re also really small. Smaller than Opera, let alone Microsoft, Apple, and Google. We do employ a few lawyers. They believe in free software too.

Linux folks, please remember, we are you. As both users and developers. Here’s some Mozilla involvement you might not know about:

  • Mozilla SpiderMonkey is used by many JavaScript embeddings, and distributed as a separate package on Debian.
  • Mozilla developers actively contribute to Cairo, and Firefox 3 ships Cairo on all supported platforms. This was not practical or performant when Mozilla developers began contributing to Cairo.
  • Mozilla developers have contributed to libpng, libjpeg, littlecms, pango, pixman, and other graphics libraries.
  • Mozilla developers have actively contributed to valgrind, and built some amazing software on top of it.
  • Mozilla developers now maintain both GNOME and Qt UI code, and contribute patches upstream.
  • Mozilla helps to fund the SQLite Consortium.
  • Many Linux projects use bugzilla.
  • In 2007 alone, the Mozilla Foundation gave grants to the Perl Foundation, Creative Commons, the GNOME foundation, and the FSF, among others. It awarded contracts for work concerning OpenSSL, Accerciser, Orca, GNOME accessibility, and lots more open source accessibility work.
  • Mozilla code and tests are used by WebKit and Chrome, so if you use one of those, Mozilla has helped make it real. It’s not Internet Explorer, and that’s what’s most important. :)

Robert Sayre (Mozilla)Shapes

Everyone’s doing it.

Brendan added shapes to SpiderMonkey for Firefox 3 earlier this year. These exploit the latent types present in most JavaScript code. This gave the non-JIT bytecode VM in Firefox 3 quite a speedup. In fact, it’s still the fastest non-beta JS implementation. But, wow, there sure has been progress. In development versions, I think everyone other than IE is faster than Firefox 3 at this point.

In v8, I guess they call it a hidden class system. In SquirrelFish, I gather they call it StructureID.

Ian McKellar (Flock)Meanwhile, in the day job

A couple of months ago my role at Songbird shifted a little. Up till then I was working on the core product, fixing bugs and adding features across the whole product as part of the bird engineering team. Since we started working on 0.7 (aka Fugazi) I moved into a group initially called strategic development which then split and merged with the design and product group.

We’ve been looking through feedback from our users, primarily through surveys to determine what features we can add that will address most users’ feature requests. Doing this outside engineering has been great since they can focus on improving the core product, keeping a clear vision of what the product we want should be, while we’re working directly from end-user feedback.

My first project was a new Last.fm addon using our new playback history API. It was installed by default for all Songbird 0.7 users so it’s had quite a bit of use, some good feedback, some bug fixes and now some translations:

My next project was taking Georges’ Seeqpod addon, updating it and getting it ready for inclusion in Songbird. It wasn’t ready for Fugazi, but hopefully it’ll be ready for Genesis (our next release). Seeqpod is an MP3 search engine and our addon attempts to integrate it nicely into Songbird. I streamed a lot of random music while developing it. I spent a whole day listening to 80s Metallica.

Now I’m working on better music store integration using Songbird’s Web Page API.

Yngve Nysæter Pettersen (Opera)Extended Validation Update


Today we have formally EV-enabled the first two CAs.

For more information, please see The Rootstore's announcement

Vladimir Vukicevic (Mozilla)TraceMonkey: Coming To A Pocket Near You

The recent posts about TraceMonkey and JavaScript performance have all focused on x86, because, unsurprisingly, the majority of the web's desktop users are on x86 platforms.  However, mobile and handheld platforms are going to quickly become consumers of the full web, and core performance gains will often yield much more significant user-perceptible performance improvements.  For example, on a desktop, a 5x speedup from 500ms to 100ms for a particular action results in "hey, that feels snappier".  On lower power devices, if the same testcase originally takes 5s, speeding it up to 1s turns an action that wasn't usable into something that is.

Over the past few weeks, I've spent some time getting nanojit working on ARM.  There were two pieces of this work: the first was adding support for emulated floating point, for use on devices that do not have a floating point unit.  This work is portable to any other platform without hardware floating point; it simply translates all floating point instructions within nanojit into appropriate function calls.  The other piece was adding support to the nanojit ARM backend for the VFP (vector floating point) unit that's present in most recent ARM cores, and emitting native VFP code.  The current speedup gains are in many cases quite similar to what we see on x86, though there is still much more ARM-specific work to be done to generate the most efficient code possible.
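The emulated floating point piece can be pictured as a small rewriting pass. The Python sketch below is only an illustration of that idea: the tuple-based instruction format and the `__soft_*` helper names are made up for the example and are not nanojit's actual IR or API.

```python
# Hypothetical mapping from floating-point opcodes to software helpers.
SOFT_FLOAT_HELPERS = {
    "fadd": "__soft_fadd",
    "fsub": "__soft_fsub",
    "fmul": "__soft_fmul",
    "fdiv": "__soft_fdiv",
}


def lower_soft_float(instructions):
    """Rewrite each floating-point instruction as a call to a helper."""
    lowered = []
    for ins in instructions:
        op, dst, *srcs = ins
        helper = SOFT_FLOAT_HELPERS.get(op)
        if helper is None:
            lowered.append(ins)  # integer and other ops pass through unchanged
        else:
            lowered.append(("call", dst, helper, *srcs))
    return lowered


ir = [("iadd", "r0", "r1", "r2"), ("fmul", "f0", "f1", "f2")]
lowered_ir = lower_soft_float(ir)
```

A backend that does have a floating point unit would skip this pass and emit native floating point instructions instead, which is what the VFP work does.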

Let's look at the current speedup state.  Here are a few microbenchmarks from the SunSpider suite, testing a few core JS operations.  All the numbers are the speedup factor over current SpiderMonkey with tracing disabled (i.e., "5" means "5x as fast as no-tracing SpiderMonkey").

Next up are the individual results of the SunSpider benchmarks.

The large speedups are things that TraceMonkey can handle well currently, where most, if not all, of the benchmark is successfully traced.  The tail of tests that don't show any performance improvement is largely due to missing TraceMonkey features that lead to a trace abort, the point at which the tracing infrastructure needs to go back to the interpreter because of an operation that it doesn't know how to express.  One notable exception to that is the crypto-md5 test: the trace succeeds, but it's so large that executing the CSE optimization pass dwarfs any performance gains that happen on trace.  Hackers are on the case!

It's important to note that, much like on x86, this is still the early days of performance wins that are possible.  Core improvements in tracing will have an effect on both x86 and ARM (as well as x86-64, the three currently supported nanojit backends — anyone interested in doing a Sparc and/or PPC backend?), and there's still lots of work being done on nanojit itself.  The result of all this work will be a richer web experience on mobile and embedded devices, by allowing those users to take advantage of modern web applications that do much of their work on the browser instead of server side.  Mobile users should be able to try out the JIT in the next alpha release of Fennec by enabling a config setting, like users of our desktop Firefox nightly builds can do today.

This work was largely done on a BeagleBoard, which, as I mentioned earlier, is a great little device for any ARM work, or as a speedy little computer for multimedia/car PC/whatever else purposes.  Chris Blizzard just convinced me to do a separate blog post about the beagle, including all the bits and pieces that I needed to get things to work so that he can replicate my setup, so I'll talk about that separately soon!

Vladimir Vukicevic (Mozilla)Better Random Thoughts Than None At All

A bunch of things have happened over the past few weeks, and if I didn't write about them now, chances are I would've given up and just dropped them on the floor.  Shortly after returning from the Mozilla Summit — and surviving power outages (except for our wireless network, it stayed up, even though the rest of the hotel was out of power!), landslides, bears — I went to SIGGRAPH where I was on a panel as part of the cohosted Web3D Symposium.  There were a few things that happened at the conference that I'm excited about, and I'll be writing about those soon.

After SIGGRAPH, Mozilla hosted a number of core Cairo developers at the first-ever Cairo Summit.  It was great having everyone in the same room together (well, almost — Behdad was omnipresent by phone).  Carl and others took some great notes about the discussions, where we were able to both solve existing problems as well as do some planning for future work.

One particular work item that I was able to start at the summit was to begin setting up a buildbot for Cairo, so that we have constant build and test runs on all the major platforms that Cairo supports.  The initial plan is to set up a few flavours of Linux (Fedora and Ubuntu, as well as an x86-64 variant), Windows XP, and Mac OS X; in addition, Mozilla will be providing some Windows and Mac OS X machines that Cairo developers can use remotely for testing and development work on those platforms.  Having the builders also means that we'll be able to provide nightly (and release) binary packages for those platforms, thus lowering the bar for new developers who wish to start working with Cairo on non-Linux systems.

We've still got a lot of work ahead of us both in Cairo and in Mozilla's graphics layer, but at this point the main focus is going to be performance.  This was the main outcome from the graphics session at the summit as well: Cairo and Thebes expose the right functionality, but the uncommon cases just need to be faster.  Jeff's been focusing on optimizing common paths in pixman for ARM, and we'll soon be able to start focusing on GL acceleration.  We have some things to solve before we can go fully down the GL/DirectX route (plugins, compositor), but all that should come together.

Cairo's turned into a pretty solid cross-platform graphics library, and I'm excited that we've been able to contribute.  Another graphics library, Skia, has recently appeared as part of the Google Chrome code drop.  It's unfortunate that Google felt they needed to develop their own alternative in a closed fashion instead of joining an existing open source project.  The Cairo project, and through it the many open source projects that depend on it, could have benefitted from the work that was done on Skia behind closed doors.  Even worse, unlike most of the rest of the Chrome code, Skia is licensed under the Apache Public License v2.0.  This creates difficulties in being able to reuse the Skia code in most projects.  (From looking at the source, it seems like there's a private "upstream" for Skia as well, and that the Chrome version is just a copy... I guess maybe it'll get thrown over the wall at some point as well.)  I'll certainly be taking a detailed look, though.

I've also been spending time working on some aspects of our platform's mobile story; I picked up a BeagleBoard a while ago, and it's been a fantastic platform for development and profiling of Gecko on ARM.  That work deserves its own post, though, so I'll write up something later on this week.  (That's two things that I've now set myself up for followup posts about; I'll probably blog more this week than I have in the last two months!)

Brendan Eich (Mozilla)Popularity

It seems (according to one guru, but coming from this source, it's a left-handed compliment) that JavaScript is finally popular.

To me, a nerd from a tender age, this is something between a curse and a joke. (See if you are in my camp: isn't the green chick hotter?)

Brendan Eich convinced his pointy-haired boss at Netscape that the Navigator browser should have its own scripting language, and that only a new language would do, a new language designed and implemented in big hurry, and that no existing language should be considered for that role.

I don't know why Doug is making up stories. He wasn't at Netscape. He has heard my recollections about JavaScript's birth directly, told in my keynotes at Ajax conferences. Revisionist shenanigans to advance a Microhoo C# agenda among Web developers?

Who knows, and it's hard to care, but in this week of the tenth anniversary of mozilla.org, a project I co-founded, I mean to tell some history.

As I've often said, and as others at Netscape can confirm, I was recruited to Netscape with the promise of "doing Scheme" in the browser. At least client engineering management including Tom Paquin, Michael Toy, and Rick Schell, along with some guy named Marc Andreessen, were convinced that Netscape should embed a programming language, in source form, in HTML. So it was hardly a case of me selling a "pointy-haired boss" -- more the reverse.

Whether that language should be Scheme was an open question, but Scheme was the bait I went for in joining Netscape. Previously, at SGI, Nick Thompson had turned me on to SICP.

What was needed was a convincing proof of concept, AKA a demo. That, I delivered, and in too-short order it was a fait accompli.

Of course, by the time I joined Netscape, and then transferred out of the server group where I had been hired based on short-term requisition scarcity games (and where I had the pleasure of working briefly with the McCool twins and Ari Luotonen; later in 1995, Ari and I would create PAC), the Oak language had been renamed Java, and Netscape was negotiating with Sun to include it in Navigator.

The big debate inside Netscape therefore became "why two languages? why not just Java?" The answer was that two languages were required to serve the two mostly-disjoint audiences in the programming ziggurat who most deserved dedicated programming languages: the component authors, who wrote in C++ or (we hoped) Java; and the "scripters", amateur or pro, who would write code directly embedded in HTML.

Whether any existing language could be used, instead of inventing a new one, was also not something I decided. The diktat from upper engineering management was that the language must "look like Java". That ruled out Perl, Python, and Tcl, along with Scheme. Later, in 1996, John Ousterhout came by to pitch Tk and lament the missed opportunity for Tcl.

I'm not proud, but I'm happy that I chose Scheme-ish first-class functions and Self-ish (albeit singular) prototypes as the main ingredients. The Java influences, especially y2k Date bugs but also the primitive vs. object distinction (e.g., string vs. String), were unfortunate.

Back to spring of 1995: I remember meeting Bill Joy during this period, and discussing fine points of garbage collection (card marking for efficient write barriers) with him. From the beginning, Bill grokked the idea of an easy-to-use "scripting language" as a companion to Java, analogous to VB's relationship to C++ in Microsoft's platform of the mid-nineties. He was, as far as I can tell, our champion at Sun.

Kipp Hickman and I had been studying Java in April and May 1995, and Kipp had started writing his own JVM. Kipp and I wrote the first version of NSPR as a portability layer underlying his JVM, and I used it for the same purpose when prototyping "Mocha" in early-to-mid-May.

Bill convinced us to drop Kipp's JVM because it would lack bug-for-bug compatibility with Sun's JVM (a wise observation in those early days). By this point "Mocha" had proven itself via rapid prototyping and embedding in Netscape Navigator 2.0, which was in its pre-alpha development phase.

The rest is perverse, merciless history. JS beat Java on the client, rivaled only by Flash, which supports an offspring of JS, ActionScript.

So back to popularity. I can take it or leave it. Nevertheless, popular Ajax libraries, often crunched and minified and link-culled into different plaintext source forms, are schlepped around the Internet constantly. Can we not share?

One idea, mooted by many folks, most recently here by Doug, entails embedding crypto-hashes in potentially very long-lived script tag attributes. Is this a good idea?

Probably not, based both on theoretical soundness concerns about crypto-hash algorithms, and on well-known poisoning attacks.

A better idea, which I heard first from Rob Sayre: support an optional "canonical URL" for the script source, via a shared attribute on HTML5 <script>: If the browser has already downloaded the shared URL, and it still is valid according to HTTP caching rules, then it can use the cached (and pre-compiled!) script instead of downloading the src URL.

This avoids hash poisoning concerns. It requires only that the content author ensure that the src attribute name a file identical to the canonical ("popular") version of the library named by the shared attribute. And of course, it requires that we trust the DNS. (Ulp.)

This scheme also avoids embedding inscrutable hashcodes in script tag attribute values.
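As a sketch of how a browser might act on such a shared attribute, consider the Python fragment below. Everything in it is hypothetical: `CacheEntry`, `resolve_script` and the single `expires_at` timestamp are stand-ins, and real HTTP freshness rules are considerably more involved.

```python
class CacheEntry:
    """Cached script body plus a simplified freshness deadline."""

    def __init__(self, body, expires_at):
        self.body = body
        self.expires_at = expires_at  # stand-in for HTTP caching rules


def resolve_script(src, shared, cache, now, fetch):
    """Return script text, preferring a fresh cached copy of the shared URL."""
    if shared is not None:
        entry = cache.get(shared)
        if entry is not None and now < entry.expires_at:
            return entry.body  # cache hit: skip the download of src
    # No shared attribute, nothing cached, or the cached copy is stale.
    return fetch(src)


fetched = []


def fake_fetch(url):
    """Test double that records which URLs were downloaded."""
    fetched.append(url)
    return "body of " + url


cache = {"http://example.org/lib.js": CacheEntry("cached lib", expires_at=100.0)}
fresh = resolve_script(
    "http://site.example/lib.js", "http://example.org/lib.js", cache, 50.0, fake_fetch
)
stale = resolve_script(
    "http://site.example/lib.js", "http://example.org/lib.js", cache, 150.0, fake_fetch
)
```

Note that the fallback always downloads the src URL, never the shared one, so a page keeps working even when the canonical copy has never been seen.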

Your comments are welcome.

Ok, back to JavaScript popularity. We know certain Ajax libraries are popular. Is JavaScript popular? It's hard to say. Some Ajax developers profess (and demonstrate) love for it. Yet many curse it, including me. I still think of it as a quickie love-child of C and Self. Dr. Johnson's words come to mind: "the part that is good is not original, and the part that is original is not good."

Yet here we are. The web must evolve, or die. So too with JS, wherefore ES4. About which, more anon.

Firefox 3 looks like it will be popular too, based on space and time performance metrics. More on that soon, too.

Ian McKellar (Flock)Tracking WordPress using Git

I publish this blog through WordPress, for reasons I’ve outlined before. I run it with a custom theme and a bunch of plugins though, and I wanted a convenient way to keep my WordPress install up to date without having to reinstall everything all the time. I wanted source control for my blog install.

My first attempt involved mirroring WordPress SVN into a Git repository on github so that I had a Git version of the SVN tree (including branches, tags and each checkin kept separate), plus a separate repository holding the changes I’d made for my web site. This eventually failed for two reasons: first, the script I was using to mirror the SVN into Git had a habit of failing in bizarre ways, and second, having two repositories confused me.

Yesterday I decided to update my fairly outdated WordPress install. It had been missing security fixes for some time and was one minor version behind. Since tracking SVN hadn’t worked, I tried a simpler approach: a single Git repository containing a master branch that tracks releases and an ianloic.com branch to track the state of my site.

I set up master with a fresh download of WordPress 2.5 from wordpress.com, created my ianloic.com branch and applied the differences between my site and the 2.5 SVN tag (for all its failures my old approach at least let me do this easily). I switched back to the master branch, deleted all the files (leaving my .git directory intact) and unpacked the new WordPress 2.6.1 tarball. I checked that in (to the master branch), tagged it 2.6.1 and then merged that into my ianloic.com branch. I pushed all that to github and then checked it out on my web server (at Dreamhost).

Normally with Git you’re tracking just the master branch, but I want both master and ianloic.com branches to be tracked so my .git/config contains:

[remote "origin"]
	url = git@github.com:ianloic/wordpress.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
[branch "ianloic.com"]
	remote = origin
	merge = refs/heads/ianloic.com

Now it’s easy to track changes that I’m making to my site and update to the latest WordPress without risking losing anything. The process for updating to a new WordPress release is:

  • on my laptop check out the master branch
  • rm all the files except for .git from the directory
  • unpack the new release into the directory
  • git-add . — now git-status will indicate what has changed, been added or removed
  • git-commit to check in the new version of wordpress
  • git-tag versionnum to tag which version is currently in master
  • git-checkout ianloic.com
  • git-merge versionnum to merge the latest version into the site’s branch
  • git-push --all --tags to push all the branches and tags to github
  • on my web server, git-pull to update to the latest release

I end up with a tree that looks like this:

ianloic.com WordPress in Git

Ben Smedberg (Mozilla)Profiling Dromaeo Testcases with Shark

I’m taking a break from garbage collection for a week or so: I got stuck, and there are lots of other things going on I wanted to help out on. Yesterday and today’s project was profiling some DOM testcases.

Two days ago, Jason landed a great patch to minimize the XPConnect overhead of DOM calls (fast-path DOM). Prior to this patch, many profiles of DOM scripting were dominated by XPConnect overhead (marshaling calls from JS to binary XPCOM). So I decided to re-do some of these profiles and see if there were any easy wins lurking, now that the noise was gone. I first ran the Dromaeo tests in a build from mozilla-central and compared the results to Safari on the same machine. Now, I’m taking some of the comparatively worst performers and using Shark to profile the tests.

I figured that getting shark to profile individual tests would require some major hacking. But it turns out that Dromaeo already has support for wrapping tests with calls to generate Shark profiles! All I needed to do was hack a little bit to generate a single profile at a time.

I started by profiling the following test: DOM Modification (Prototype): update(). mozilla-central was 8x slower than Safari on this test.

  1. Start with a shark-enabled Firefox.
  2. Download or clone Dromaeo from here.
  3. Type `make web` to build a local copy of Dromaeo.
  4. Start shark for programmatic control as documented here.
  5. Point your browser at the test like so:
    file:///builds/dromaeo/web/index.html?dom-modify-prototype&shark=update&numTests=1
  6. Shark should do a little dance and pop up a profile viewer. For a quick overview on using the Shark profile viewer, see Vlad’s blog.
  7. By using the top-down view, I quickly discovered that over 70% of runtime was spent in a single function:

    Shark Top-Down View

  8. By double-clicking this function, I could see a heatmap of execution within the function: just two lines of code were responsible for most of the time!:
    A heatmap showing jsregexp.cpp.
  9. This was more than enough evidence to file a bug.
  10. After a bit of conversation with Brian Crower on IRC, I found that my initial hypothesis was wrong: the JS_ISSPACE macro is not really to blame. Every time the regexp code encountered a \s or \S in a regular expression character class, it would loop over all 65,536 characters in the unicode basic plane and ask a series of lookup tables “is this character a space?” Because there are a small number of actual whitespace characters, I could replace this large loop with a small table of whitespace character ranges.
  11. The patch cut this test’s runtime by 77%, from 850ms to 195ms.
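The fix described in step 10 can be sketched in Python as follows. This is not the actual patch to jsregexp.cpp. The range table is an assumption that approximates what \s matches in JavaScript (the exact set depends on the ECMAScript and Unicode versions), and the function names are invented for the example.

```python
# Inclusive code point ranges, in ascending order (approximate \s set).
WHITESPACE_RANGES = [
    (0x0009, 0x000D),  # tab, LF, VT, FF, CR
    (0x0020, 0x0020),  # space
    (0x00A0, 0x00A0),  # no-break space
    (0x1680, 0x1680),
    (0x2000, 0x200A),
    (0x2028, 0x2029),  # line and paragraph separators
    (0x202F, 0x202F),
    (0x205F, 0x205F),
    (0x3000, 0x3000),
    (0xFEFF, 0xFEFF),  # byte order mark
]


def is_space(code_point):
    """Per-character test, standing in for the lookup-table query."""
    return any(lo <= code_point <= hi for lo, hi in WHITESPACE_RANGES)


def class_ranges_by_scanning(predicate):
    """Old approach: test all 65,536 BMP code points and merge the runs."""
    ranges = []
    for cp in range(0x10000):
        if predicate(cp):
            if ranges and ranges[-1][1] == cp - 1:
                ranges[-1] = (ranges[-1][0], cp)
            else:
                ranges.append((cp, cp))
    return ranges


def class_ranges_from_table(negated=False):
    """New approach: copy the small table, or take its gaps for the \\S case."""
    if not negated:
        return list(WHITESPACE_RANGES)
    gaps = []
    start = 0
    for lo, hi in WHITESPACE_RANGES:
        if start < lo:
            gaps.append((start, lo - 1))
        start = hi + 1
    if start <= 0xFFFF:
        gaps.append((start, 0xFFFF))
    return gaps
```

Both approaches produce the same ranges, but the table version touches ten entries per character class instead of 65,536 code points.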

I’ve already filed a bug on another test and will be working through at least four more significant slowdowns. Doing this profiling has been a lot of fun, and a nice change of pace from the garbage collection slog. I really encourage anyone who has a mac to spend a little time with Shark and a performance issue: it actually makes visualizing and analyzing performance problems fun.

Brendan Eich (Mozilla)TraceMonkey Update

We have been busy, mostly fixing bugs for stability, but also winning a bit more performance, since TraceMonkey landed on mozilla-central, from which Firefox 3.1 alpha-stage nightly builds are built. Tonight's builds include a fix for the bug that ilooped a SunSpider test (my apologies to those of you who suffered that bug's bite).

But what I'm sure everyone wants to know is: how do we compare to V8?

Here are the results from head-to-head SunSpider on Windows XP on a Mac Mini and Windows Vista on a MacBook Pro, testing against last night's Firefox automated build and yesterday's Chrome beta:
tm-v8-sunspider-totals.png

We win by 1.28x and 1.19x, respectively. Maybe we should rename TraceMonkey "V10" ;-).

Ok, it's only SunSpider, one popular yet arguably non-representative benchmark suite. We are not about to be braggy. ("Don't be braggy" is our motto here at Mozilla ;-).)

But it's worth digging deeper into the results. Let's look at the ratios by test:
tm-v8-sunspider-detail.png

We win on the bit-banging, string, and regular expression benchmarks. We are around 4x faster at the SunSpider micro-benchmarks than V8.

This graph does show V8 cleaning our clock on a couple of recursion-heavy tests. We have a plan, to trace recursion (not just tail recursion). We simply haven't had enough hours in the day to get to it, but it's "next".

This reminds me: TraceMonkey is only a few months old, excluding the Tamarin Tracing Nanojit contributed by Adobe (thanks again, Ed and co.!), which we've built on and enhanced with x86-64 support and other fixes. We've developed TraceMonkey in the open the whole way. And we're as fast as V8 on SunSpider!

This is not a trivial feat. As we continue to trace unrecorded bytecode and operand combinations, we will only get faster. As we add recursion, trace-wise register allocation, and other optimizations, we will eliminate the losses shown above and improve our ratios linearly across the board, probably by a factor of 2 or greater.

I'll keep updating the blog every week, as we do this work. Your comments are welcome as always.

V8 is great work, very well-engineered, with room to speed up too. (And Chrome looks good to great -- the multi-process architecture is righteous, but you expected no less praise from an old Unix hacker like me.)

What spectators have to realize is that this contest is not a playoff where each contending VM is eliminated at any given hype-event point. We believe that Franz&Gal-style tracing has more "headroom" than less aggressively speculative approaches, due to its ability to specialize code, making variables constant and eliminating dead code and conditions at runtime, based on the latent types inherent in almost all JavaScript programs. If we are right, we'll find out over the next weeks and months, and so will you all.

Anyway, we're very much in the game and moving fast -- "reports of our death are greatly exaggerated." Stay tuned!

Zack Rusin (WebKit)SVG in KDE

"Commitment" is one of the words that have never been used in this blog. Which is pretty impressive given that I've managed to use such words as sheep, llamas, raspberries, ninjas, donkeys, crack or woodchuck quite extensively (especially impressive in a technology centric blog).

That's because commitment implies that whatever it is one is committed to plays an important role in their life. It's a word that goes beyond the paper or the medium on which it was written. It enters the cold reality that surrounds us.

But today is all about commitment. It's about the commitment that KDE made to a technology broadly referred to as Scalable Vector Graphics. I took some time off this week and came to Germany, where I talked about the usage of SVG in KDE.

The paper about, what I like to call, the Freedom of Beauty, is available here:

https://www.svgopen.org/2008/papers/104-SVG_in_KDE/

It talks about the history of SVG in KDE and the rendering model used by KDE, lists the ways in which we use SVG, and finally shows some problems that have been exposed by such diverse usage of SVG in a desktop environment. Please read it if you're interested in KDE or SVG.

Hopefully this paper marks the start of a more proactive role for KDE in shaping the SVG standard.

Ben Smedberg (Mozilla)Teaching wget About Root Certificates

I am setting up some temporary tinderboxes to repack localization builds. Because I don’t trust the DNS service from my home ISP, I wanted to download builds from ftp.mozilla.org using HTTPS. It turns out this was quite the challenging task, due to the following cute and relatively useless error message:

ERROR: Certificate verification error for ftp.mozilla.org: unable to get local issuer certificate
To connect to ftp.mozilla.org insecurely, use '--no-check-certificate'.

What this really means is “your copy of wget/OpenSSL didn’t come with any root certificates, and HTTPS just isn’t going to work until you get them and I know about them.”

Getting Root Certificates

The best way to get the root certificates you need is at this website. It has a tool that will convert the root certificates built-in to Mozilla NSS into the PEM format that OpenSSL expects. It also has pre-converted PEM files available for download if you’re lazy.

Installing cacert.pem into MozillaBuild (Windows)

To install cacert.pem so that it works with MozillaBuild:

  1. Copy cacert.pem to c:/mozilla-build/wget/cacert.pem
  2. Create the following configuration file at c:/mozilla-build/wget/wget.ini:
    ca_certificate=c:/mozilla-build/wget/cacert.pem

Ted filed a bug about setting this up automatically for a future version of MozillaBuild.

Installing cacert.pem on Mac:

The following instructions assume you got your wget from macports using port install wget.

  1. Copy cacert.pem to /opt/local/etc/cacert.pem
  2. Create the following configuration file at /opt/local/etc/wgetrc:
    ca_certificate=/opt/local/etc/cacert.pem

Ben Smedberg (Mozilla)IRC Communication

One of the important communication mechanisms in the Mozilla project is IRC. IRC is a great tool for instant communication among large groups of diverse people. However, it’s easy to misuse IRC, so I’d like to propose some etiquette rules:

Think/search before you ask

Google can answer a fair number of questions. Keep the signal/noise ratio high on IRC by checking FAQs and Google before asking questions.

Ask questions in the right channel

Some channels welcome newbies: #xulrunner and #extdev specifically welcome new XULRunner application authors and extension developers. Some channels (#developers) are used for serious/deep project communication, and don’t really welcome novices. If you’re not sure, feel free to silently watch the channel for a few minutes. If you don’t know which channel is right, feel free to ask “is this the right channel to ask questions about X”. The channel residents will let you know!

IRC is not good for some questions

Complicated questions are difficult to answer on IRC: “When I configure with --enable-XX while cross-compiling from YY, I have problem ZZ.” To answer this question you need to sort through all sorts of issues such as why you’re using --enable-XX, what the actual error message is, and details about the cross-compile setup. You’d be much better off posting this question to the appropriate newsgroup (mozilla.dev.builds in this case).

If I say to you “please post details about this question to the newsgroup”, it’s not because I don’t like you or don’t want to help you… it’s because IRC isn’t a good medium for answering your particular question.

IRC doesn’t take up my whole attention

Most of the people on IRC are also doing work (coding, reviewing, writing, whatever). If somebody doesn’t have time to talk to you right now, feel free to wait for somebody else to come along, or send email/post to the newsgroups. Demanding somebody’s attention on IRC is very rude. If you really need their attention, send email.

Don’t send uninvited private messages

Asking questions in the appropriate channel is nice behavior. Other people in the channel can look at the conversation and even provide help. By sending me a private message, you are demanding my attention (see above), and drastically limiting the number of people who can help you. Unless you know me really well, don’t send me private messages.

IRC is not email

If I don’t respond to you right away, don’t assume that I’ve seen your message. If I don’t respond to your message and you need to get in touch with me, please send me email. Please don’t ping every four hours until I respond to you.

Zack Rusin (WebKit)Fixes in Sonnet

As we all know inner beauty is the most important kind of beauty. Especially if you're ugly. Not ugly, don't sue me, I meant to say "easy on the eyes challenged". That's one of the reasons I like working on frameworks and libraries. It's the appeal of improving the inner beauty of certain things. I gave up on trying to improve the inner beauty of myself (when I was about 1) so this is the most I can do.

You can do it too. It's real easy. I took this week off because I'm going to Germany for SVG Open where I'll talk about SVG in KDE and today fixed a few irritating bugs in Sonnet.

One of the things that bugged me for a while was the fact that we kept marking misspelled text as red instead of using the God given red squiggly underline. Well, I say no more!
Our spelling dialog lists available dictionaries now and one can change them on the fly. That's good. Raspberries good. And raspberries are pretty darn good. Even sheep like raspberries. Or so I think, the only sheep I've ever seen was from a window of a car and it looked like an animal who enjoys raspberries. Who doesn't? The only problem was that it liked listing things like "en_GB-ise" or "en_GB-ize-w_accents" as language names which is really like a nasty bug in the raspberry. And what do you do with bugs? I'm not quite certain myself but given the way this blog is heading it's surely something disturbing... Anywho, that's also fixed. Now we list proper and readable names.

Working on Sonnet is a lot of fun. A small change in a pretty small library affects the entire KDE, which is rather rewarding. So if you wanted to get into KDE development in an easy and fun way, go to https://bugs.kde.org, search for "kspell" or "sonnet", pick an entry and simply fix it!

Boris Zbarsky (Mozilla)Looking for an external hard drive

I'm sort of looking for an external hard drive to use for shared data storage now that Emma and I...

Brendan Eich (Mozilla)TraceMonkey: JavaScript Lightspeed

I'm extremely pleased to announce the launch of TraceMonkey, an evolution of Firefox's SpiderMonkey JavaScript engine for Firefox 3.1 that uses a new kind of Just-In-Time (JIT) compiler to boost JS performance by an order of magnitude or more.

Results

Let's cut straight to the charts. Here are the popular SunSpider macro- and micro-benchmarks average scores, plus results for an image manipulation benchmark and a test using the Sylvester 3D JS library's matrix multiplication methods:

assorted-benchmarks.png

Here are some select SunSpider micro-benchmarks, to show some near-term upper bounds on performance:

micro-benchmarks.png

This chart shows speedup ratios over the SpiderMonkey interpreter, which is why "empty loop with globals" (a loop using global loop control and accumulator variables) shows a greater speedup -- global variables in JavaScript, especially if undeclared by var, can be harder to optimize in an interpreter than local variables in a function.

Here are the fastest test-by-test SunSpider results, sorted from greatest speedup to least:

sunspider-part-1.png

The lesser speedups need their own chart, or they would be dwarfed by the above results:

sunspider-part-2.png

(Any slowdown is a bug we will fix; we're in hot pursuit of the one biting binary-trees, which is heavily recursive -- it will be fixed.)

With SunSpider, some of the longest-running tests are string and regular-expression monsters, and since, like most JS engines, we use native (compiled C++) code for most of that work, there's not as much speedup. Amdahl's Law predicts that this will bound the weighted-average total SunSpider score, probably to around 2. No matter how fast we JIT the rest of the code, the total score will be... 2.
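
To make the Amdahl's Law bound concrete, here is a minimal sketch. The 50% traceable fraction is an assumption picked only because it reproduces the "2" ceiling mentioned above; it is not a measured SunSpider figure, and the function name is made up for the illustration.

```javascript
// Amdahl's Law: overall speedup when a fraction p of the run time
// is accelerated by a factor s and the rest is left unchanged.
function amdahlSpeedup(p, s) {
  return 1 / ((1 - p) + p / s);
}

// ASSUMPTION: half of the run time is in code the JIT can speed up;
// the other half is native string/regexp work that stays as it is.
const traceableFraction = 0.5;

for (const jitFactor of [2, 10, 100, Infinity]) {
  const total = amdahlSpeedup(traceableFraction, jitFactor);
  console.log(`JIT speedup ${jitFactor}x -> total ${total.toFixed(2)}x`);
}
```

Under that assumption, even an infinitely fast JIT only halves the total time, which is why moving the string and regular expression work into view of the tracer matters.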

But this is only a start. With tracing, performance will keep going up. We have easy small linear speedup tasks remaining (better register allocation, spill reduction around built-in calls). We will trace string and regular expression code and break through the "2" barrier. We will even trace into DOM methods. The tracing JIT approach scales as you move more code into JS, or otherwise into view of the tracing machinery.

Finally, schrep created a screencast (UPDATE: link fixed) that visually demonstrates the speedup gained by TraceMonkey. These speedups are not just for micro-benchmarks. You can see and feel them.

How We Did It

We've been working with Andreas Gal of UC Irvine on TraceMonkey, and it has been a blast. We started a little over sixty days (and nights ;-) ago, and just yesterday, shaver pushed the results of our work into the mozilla-central Hg repository for inclusion in Firefox 3.1. The JIT is currently pref'ed off, but you can enable it via about:config -- just search for "jit" and, if you are willing to report any bugs you find, toggle the javascript.options.jit.content preference (there's a jit.chrome pref too, for the truly adventurous).

Before TraceMonkey, for Firefox 3, we made serious performance improvements to SpiderMonkey, both to its Array code and to its interpreter. The interpreter speedups entailed two major pieces of work:

  • Making bytecode cases in the threaded interpreter even fatter, so the fast cases can stay in the interpreter function.
  • Adding a polymorphic property cache, for addressing properties found in prototype and scope objects quickly, without having to look in each object along the chain.
I will talk about the property cache and the "shape inference" it is based on in another post.

By the way, we are not letting moss grow under our interpreter's feet. Dave Mandelin is working on a combination of inline-threading and call-threading that will take interpreter performance up another notch.

While doing this Firefox 3 work, I was reminded again of the adage:

Neurosis is doing the same thing over and over again, expecting to get a different result each time.
But this is exactly what dynamically typed language interpreters must do. Consider the + operator:
a = b + c;
Is this string concatenation, or number addition? Without static analysis (generally too costly), we can't know ahead of time. For SpiderMonkey, we have to ask further: if number, can we keep the operands and result in machine integers of some kind?

Any interpreter will have to cope with unlikely (but allowed) overflow from int to double precision binary floating point, or even change of variable type from number to string. But this is neurotic, because for the vast majority of JS code, in spite of the freedom to mutate type of variable, types are stable. (This stability holds for other dynamic languages including Python.)
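
Here is a small sketch of why + is hard to pin down ahead of time, and of what a type-specialized fast path with fallbacks might look like. The specializedAdd helper and its 32-bit integer check are illustrative assumptions, not SpiderMonkey's actual value representation or code.

```javascript
// The same source expression can mean different operations at run time.
function add(b, c) {
  return b + c;
}

console.log(add(1, 2));     // number addition
console.log(add("1", 2));   // string concatenation

// ASSUMPTION: "machine integer" is modelled here as a 32-bit signed int.
function isInt32(v) {
  return typeof v === "number" && (v | 0) === v;
}

// A speculative fast path: stay in integers while the types are stable,
// fall back to doubles on overflow, and to the generic + otherwise.
function specializedAdd(b, c) {
  if (isInt32(b) && isInt32(c)) {
    const sum = b + c;
    if (isInt32(sum)) return { path: "int", value: sum };
    return { path: "double", value: sum }; // overflowed the int range
  }
  return { path: "generic", value: b + c }; // strings, objects, etc.
}

console.log(specializedAdd(1, 2));
console.log(specializedAdd(2147483647, 1));
console.log(specializedAdd("a", 1));
```

If types really are stable, the first branch is the only one that ever runs, and the checks are the only price paid for the freedom to change types.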

Another insight, which is key to the tracing JIT approach: if you are spending much time in JS, you are probably looping. There's simply not enough straight line code in Firefox's JS, or in a web app, to take that much runtime. Native code may go out to lunch, of course, but if you are spending time in JS, you're either looping or doing recursion.

The Trace Trees approach to tracing JIT compilation that Andreas pioneered can handle loops and recursion. Everything starts in the interpreter, when TraceMonkey notices a hot loop by keeping a cheap count of how often a particular backward jump (or any backward jump) has happened.

for (var i = 0; i < BIG; i++) {
    // Loop header starts here:
    if (usuallyTrue())
        commonPath();
    else
        uncommonPath();
}

Once a hot loop has been detected, TraceMonkey starts recording a trace. We use the Tamarin Tracing Nanojit to generate low-level intermediate representation instructions specialized from the SpiderMonkey bytecodes, their immediate and incoming stack operands, and the property cache "hit" case fast-lookup information.
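
The hot-loop detection described above can be pictured with a toy counter. This is only a sketch: LoopMonitor, the threshold value, and the use of a bytecode offset as the key are hypothetical choices for illustration, not TraceMonkey's actual data structures.

```javascript
// Toy hot-loop detector: counts backward jumps per loop header.
const HOT_THRESHOLD = 3; // hypothetical value, chosen only for the demo

class LoopMonitor {
  constructor(threshold = HOT_THRESHOLD) {
    this.threshold = threshold;
    this.counts = new Map(); // loop header offset -> backward jumps seen
  }

  // Returns true exactly once per header: on the jump that makes the
  // loop hot, which is when a recorder would be started.
  onBackwardJump(headerPc) {
    const count = (this.counts.get(headerPc) || 0) + 1;
    this.counts.set(headerPc, count);
    return count === this.threshold;
  }
}

const monitor = new LoopMonitor();
const decisions = [];
for (let jump = 0; jump < 5; jump++) {
  decisions.push(monitor.onBackwardJump(42)); // 42 is a made-up header offset
}
console.log(decisions);
```

The point of the sketch is that the bookkeeping on the interpreter's path is one map lookup and an increment per backward jump, which is what keeps the count cheap.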

The trace recorder completes when the loop header (see the comment in the code above) is reached by a backward jump. If the trace does not complete this way, the recorder aborts and the interpreter resumes without recording traces.

Let's suppose the usuallyTrue() function returns true (it could return any truthy value, e.g. 1 or "non-empty" -- we can cope). The trace recorder emits a special guard instruction to check that the truthy condition matches, allowing native machine-code trace execution to continue if so. If the condition does not match, the guard exits (a so-called "side-exit") the trace, returning to the interpreter at the exact point in the bytecode where the guard was recorded, with all the necessary interpreter state restored.

If the interpreter sees usuallyTrue() return true, then the commonPath(); case will be traced. After that function has been traced comes the loop update part i++ (which might or might not stay in SpiderMonkey's integer representation depending on the value of BIG -- again we guard). Finally, the condition i < BIG will be recorded as a guard.

// Loop header starts here:
inlined usuallyTrue() call, with guards
guard on truthy return value
guard that the function being invoked at this point is commonPath
inlined commonPath() call, with any calls it makes inlined, guarded
i++ code, with overflow to double guard
i < BIG condition and loop-edge guard
jump back to loop header

Thus tracing is all about speculating that what the interpreter sees is what will happen next time -- that the virtual machine can stop being neurotic.
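
The guard and side-exit behavior can be imitated in plain JavaScript. The sketch below is a toy model, not generated code: runTrace stands in for the native trace of the common path, runLoop stands in for the interpreter, and passing i to the callbacks is an addition made only so the demo can show which path each iteration took.

```javascript
// Toy stand-in for the native trace recorded for the common path.
// It runs iterations until a guard fails, then reports which guard exited.
function runTrace(state, usuallyTrue, commonPath) {
  for (;;) {
    // guard on truthy return value
    if (!usuallyTrue(state.i)) return "side-exit";
    commonPath(state.i);
    state.i++;
    // i < BIG condition and loop-edge guard
    if (!(state.i < state.BIG)) return "loop-exit";
  }
}

// Toy "interpreter": enters the trace, and on a side exit finishes the
// current iteration itself before trying the trace again.
function runLoop(BIG, usuallyTrue, commonPath, uncommonPath) {
  const state = { i: 0, BIG };
  const exits = [];
  while (state.i < state.BIG) {
    const exit = runTrace(state, usuallyTrue, commonPath);
    exits.push(exit);
    if (exit === "side-exit") {
      // Resume where the guard failed: usuallyTrue() has already returned
      // a falsy value, so take the else branch and finish the iteration.
      uncommonPath(state.i);
      state.i++;
    }
  }
  return exits;
}

const common = [];
const uncommon = [];
const exits = runLoop(5, (i) => i !== 2, (i) => common.push(i), (i) => uncommon.push(i));
console.log(common, uncommon, exits);
```

In this run only iteration 2 leaves the trace; every other iteration stays on the speculated path, which is the whole bet that tracing makes.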

And as you can see, tracing JITs can inline method calls easily -- just record the interpreter as it follows a JSOP_CALL instruction into an interpreted function.

One point about Trace Trees (as opposed to less structured kinds of tracing): you get function inlining without having to build interpreter frames at all, because the trace recording must reach the loop header in the outer function in order to complete. Therefore, so long as the JITted code stays "on trace", no interpreter frames need to be built.

If the commonPath function itself contains a guard that side-exits at runtime, then (and only then) will one or more interpreter frames need to be reconstructed.

Let's say after some number of iterations, the loop shown above side-exits at the guard for usuallyTrue() because that function returns a falsy value. We abort correctly back to the interpreter, but keep recording in case we can complete another trace back to the same loop header, and extend the first into a trace tree. This allows us to handle different paths through the control flow graph (including inlined functions) under a hot loop.

What It All Means

Pulling back from the details, a few points deserve to be called out:

  • We have, right now, x86, x86-64, and ARM support in TraceMonkey. This means we are ready for mobile and desktop target platforms out of the box.
  • As the performance keeps going up, people will write and transport code that was "too slow" to run in the browser as JS. This means the web can accommodate workloads that right now require a proprietary plugin.
  • As we trace more of the DOM and our other native code, we increase the memory-safe codebase that must be trusted not to have an exploitable bug.
  • Tracing follows only the hot paths, and builds a trace-tree cache. Cold code never gets traced or JITted, avoiding the memory bloat that whole-method JITs incur. Tracing is mobile-friendly.
  • JS-driven <canvas> rendering, with toolkits, scene graphs, game logic, etc. all in JS, are one wave of the future that is about to crest.
TraceMonkey advances us toward the Mozilla 2 future where even more Firefox code is written in JS. Firefox gets faster and safer as this process unfolds.

I believe that other browsers will follow our lead and take JS performance through current interpreter speed barriers, using just-in-time native code compilation. Beyond what TraceMonkey means for Firefox and other Mozilla projects, it heralds the JavaScript Lightspeed future we've all been anticipating. We are moving the goal posts and changing the game, for the benefit of all web developers.

Acknowledgments

I would like to thank Michael Franz and the rest of his group at UC Irvine, especially Michael Bebenita, Mason Chang, and Gregor Wagner; also the National Science Foundation for supporting Andreas Gal's thesis. I'm also grateful to Ed Smith and the Tamarin Tracing team at Adobe for the TT Nanojit, which was a huge boost to developing TraceMonkey.

And of course, mad props and late night thanks to Team TraceMonkey: Andreas, Shaver, David Anderson, with valuable assists from Bob Clary, Rob Sayre, Blake Kaplan, Boris Zbarsky, and Vladimir Vukićević.

Ben Smedberg (Mozilla)JSON serialization of interconnected object graphs

In its basic form, JSON cannot serialize cyclic graphs of objects, or graphs where multiple paths can lead to the same object. In a project I’m working on, I wanted to move such a graph of highly-interconnected objects from JS to Python. So I have invented a format built on top of JSON that can be used to serialize/deserialize such graphs.

Basically, the JSON comes across as a large list:

[
  /* list[0] is the base object at the root of the eventual object graph. */
  {
    /* string, number, true/false, and null properties are serialized directly */
    "stringprop": "stringvalue",
    "numprop": 3.1415,
    /* but lists and objects are not serialized directly. Instead, they are represented by an index
       into the base list. "sharp" is a nod to JS sharp variables, from which this was originally inspired */
    "complexprop": {"sharp": 1}
  },
  /* list[1] is referenced from list[0].complexprop. It also references itself, see below */
  [
    "simplestring",
    3,
    {"sharp": 1}
  ]
]

You can find JS for serializing these types of graphs here, and python for deserializing them here.
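
For readers without those links at hand, here is a minimal sketch of the format as described above. It is a reconstruction from the description, not the linked code: serializeGraph and deserializeGraph are made-up names, and the sketch assumes the root is an object or array and that every value is JSON-compatible.

```javascript
// Flatten an object graph into a list; list[0] is the root.
// Primitives are stored directly, objects and arrays become {"sharp": index}.
function serializeGraph(root) {
  const indexOf = new Map(); // original object -> index in the list
  const list = [];

  function visit(obj) {
    if (indexOf.has(obj)) return indexOf.get(obj); // already emitted (shared or cyclic)
    const index = list.length;
    indexOf.set(obj, index);
    const out = Array.isArray(obj) ? [] : {};
    list.push(out);
    for (const key of Object.keys(obj)) {
      const value = obj[key];
      const isComplex = value !== null && typeof value === "object";
      out[key] = isComplex ? { sharp: visit(value) } : value;
    }
    return index;
  }

  visit(root);
  return JSON.stringify(list);
}

// Rebuild the graph by replacing every {"sharp": n} with list[n].
function deserializeGraph(text) {
  const list = JSON.parse(text);
  for (const entry of list) {
    for (const key of Object.keys(entry)) {
      const value = entry[key];
      // In this format, any object-valued member is a sharp reference.
      if (value !== null && typeof value === "object") {
        entry[key] = list[value.sharp];
      }
    }
  }
  return list[0];
}

// Usage: the same graph as the example above, including the self-reference.
const shared = ["simplestring", 3];
shared.push(shared);
const root = { stringprop: "stringvalue", numprop: 3.1415, complexprop: shared };

const text = serializeGraph(root);
const back = deserializeGraph(text);
console.log(text);
console.log(back.complexprop[2] === back.complexprop);
```

Because every object or list gets its own slot in the base list, a member that is itself an object can only ever be a reference, so the deserializer never has to guess whether {"sharp": n} is data or a pointer.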

It turns out that I probably don’t actually need this code: I’ve found a simpler solution for my particular problem, but I wanted to share this solution in case other people might find it useful.

Zack Rusin (WebKit)Fast graphics

Instead of highly popular pictures of llamas today I'll post a few numbers. Not related to llamas at all. Zero llamas. These will be Qt/KDE related numbers. And there's no llamas in KDE. There's a dragon, but he doesn't hang around with llamas at all. I know what you're thinking: KDE is a multi-cultural project, surely someone must be chilling with llamas. I said it before and I'll say it again, what an average KDE developer, two llamas, one hamster and five chickens do in the privacy of their own home is none of your business.

Let's take a simple application, called qgears2, based on David Reveman's cairogears, and see how it performs with different rendering backends. Pay attention to zero relation to llamas or any other animals. The application takes a few options: -image to render using a CPU based raster engine, -render to render using X11's Xrender and -gl to render using OpenGL (the -llama option is not accepted). It has three basic tests: GEARSFANCY, which renders a few basic paths with a linear gradient alpha blended on top; TEXT, which tests some very simple text rendering; and COMPO, which is just composition and scaling of images.



The numbers come from two different machines. One is my laptop which is running Xorg server version 1.4.2. Exa is 2.2.0. Intel driver 2.3.2. GPU is 965GM, CPU is T8300 at 2.4GHz running on Debian Unstable's kernel 2.6.26-1.
The second machine is running GeForce 6600 (NV43 rev a2), NVIDIA proprietary driver version G01-173.14.09, Xorg version 7.3, kernel 2.6.25.11, CPU is Q6600 @ 2.40GHz (thanks to Kevin Ottens for those numbers, as I don't have an NVIDIA machine at the moment).

The results for each test are as follows:

GEARSFANCY

           I965       NVIDIA
Xrender    35.37      44.743
Raster     63.41      41.999
OpenGL     131.41     156.250

TEXT

           I965                  NVIDIA
Xrender    13.389                40.683
Raster     (incorrect results)   (incorrect results)
OpenGL     36.496                202.840

COMPO

           I965       NVIDIA
Xrender    67.751     66.313
Raster     81.833     70.472
OpenGL     411.523    436.681

The COMPO test isn't really fair because, as I mentioned, Qt doesn't use server-side picture transformations with Xrender, but it shows that OpenGL is certainly not slow at it.

So what these results show is that the GL backend, which hasn't been optimized at all, is between 2 and 6 times faster than anything else out there, and that the pure CPU-based raster engine is faster than the Xrender engine.

So if you're on an Intel or NVIDIA GPU, rendering using GL will immediately make your application a number of times faster. If you're running on a system with no capable GPU, then using the raster engine will make your application faster as well.
Switching Qt to use the GL backend by default would result in all applications running an order of magnitude faster. The quality would suffer though (unless HighQualityAntialiasing mode were used in Qt, in which case it would be the same). This certainly would fix our graphics performance woes and, as a side-effect, allow using GL shaders right on the widgets for some nifty effects.
On systems with no GPU, the raster engine is a great choice; on everything else, GL is clearly the best option.
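
The speedup range quoted above can be checked directly against the posted numbers. The sketch below only divides the OpenGL figure by each other backend's figure for the same test and machine; the data layout and the function name are made up for this illustration, and the TEXT raster entries are skipped because their results were reported as incorrect.

```javascript
// Results as posted; null marks the "(incorrect results)" entries.
const results = {
  GEARSFANCY: {
    I965: { Xrender: 35.37, Raster: 63.41, OpenGL: 131.41 },
    NVIDIA: { Xrender: 44.743, Raster: 41.999, OpenGL: 156.25 },
  },
  TEXT: {
    I965: { Xrender: 13.389, Raster: null, OpenGL: 36.496 },
    NVIDIA: { Xrender: 40.683, Raster: null, OpenGL: 202.84 },
  },
  COMPO: {
    I965: { Xrender: 67.751, Raster: 81.833, OpenGL: 411.523 },
    NVIDIA: { Xrender: 66.313, Raster: 70.472, OpenGL: 436.681 },
  },
};

// Ratio of the OpenGL result to each other backend, per test and machine.
function glSpeedups(data) {
  const ratios = [];
  for (const [test, machines] of Object.entries(data)) {
    for (const [machine, row] of Object.entries(machines)) {
      for (const backend of ["Xrender", "Raster"]) {
        if (row[backend] === null) continue; // no usable number to compare
        ratios.push({ test, machine, backend, ratio: row.OpenGL / row[backend] });
      }
    }
  }
  return ratios;
}

const ratios = glSpeedups(results);
for (const r of ratios) {
  console.log(`${r.test} ${r.machine}: OpenGL is ${r.ratio.toFixed(2)}x ${r.backend}`);
}
const values = ratios.map((r) => r.ratio);
console.log("min", Math.min(...values).toFixed(2), "max", Math.max(...values).toFixed(2));
```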

Robert Sayre (Mozilla)Wave of Mutilation, I mean, Standardization

RESTful JSON is a pretty good (and obvious) idea. I’ve noodled on it before. Joe has too. One thing I don’t like about Joe’s post is that it implies there’s a difference between documents and data. It’s an arbitrary distinction lacking a technical basis. It is, perhaps unintentionally, a way to apologize for Atom, the format and the protocol. Atom turned out to be a pretty good syndication format. The protocol spec is pretty good too, before you focus on the data being passed around. It turns out the pretty good syndication format is a sucky thing to author with, even though the authoring protocol is pretty good. This happened because the WG wrote the specs in the wrong order. The proposed JSON stuff tries to do so much less. No namespaces, required fields, yadda, yadda. You can Solve Any Problem… …if you’re willing to make the problem small enough.

I’ve also come to the conclusion that RESTful JSON is a dead-end, but it might be useful in the short term. This characterization is not an insult–it d