Monday, February 26, 2024

Meta Badge Post

Feb 16, 2024 marked the end of my 11.75 year career at Meta. As per tradition, I made an internal post to say good-bye accompanied by a photo of my badge. (Also pictured is my old Brik Book laptop cover with my custom Buck design, which I got way more mileage out of than I expected to, as Apple released some lousy MBPs for a stretch, so I ended up using my 2015 MBP with this cover for many years!)


What follows is a slightly modified version of my original post. See if you can solve the mystery embedded within.


Saturday, September 29, 2018

How I ended up writing opensnoop in pure C using eBPF

Hello friends, it's been awhile. I recently had the opportunity to do a deep dive on eBPF and I learned a lot in the process. There isn't a lot out there on the subject, so I decided to put together a long-form article about the experience: https://bolinfest.github.io/opensnoop-native/.

You'll notice that this is hosted on GitHub Pages rather than on bolinfest.com itself. I expect there to be typos to fix and cross-references to add over time, so it seemed easiest to colocate the article with the code and have both in version control so I can track changes properly. Also, it's nice to be able to compose things in Markdown (or in Quip and then export to Markdown) and then let a GitHub Pages template do the rest, particularly when it comes to syntax highlighting the code samples.

Monday, March 20, 2017

JavaScript vs. Python in 2017

I may be one of the last people you would expect to write an essay criticizing JavaScript, but here we are.

Two of my primary areas of professional interest are JavaScript and “programming in the large.” I gave a presentation back in 2013 at mloc.js where I argued that static typing is an essential feature when picking a language for a large software project. For this reason, among others, I historically limited my use of Python to small projects with no more than a handful of files.

Recently, I needed to build a command-line tool for work that could speak Thrift. I have been enjoying the Nuclide/Flow/Babel/ESLint toolchain a lot recently, so my first instinct was to use JavaScript for this new project, as well. However, it quickly became clear that if I went that route, I would have to spend a lot of time up front on getting the Thrift bindings to work properly. I couldn't convince myself that my personal preference for JavaScript would justify such an investment, so I decided to take a look at Python.

I was vaguely aware that there was an effort to add support for static typing in Python, so I Googled to find out what the state of the art was. It turns out that it is a tool named Mypy, and it provides gradual typing, much like Flow does for JavaScript or Hack does for PHP. Fortunately, Mypy is more like Hack+HHVM than it is like Flow in that a Python 3 runtime accepts the type annotations natively whereas Flow type annotations must be stripped by a separate tool before passing the code to a JavaScript runtime. (To use Mypy in Python 2, you have to put your type annotations in comments, operating in the style of the Google Closure toolchain.) Although Mypy does not appear to be as mature as Flow (support for incremental type checking is still in the works, for example), simply being able to succinctly document type information was enough to renew my interest in Python.

In researching how to use Thrift from Python, a Google search turned up some sample Python code that spoke to Thrift using asynchronous abstractions. After gradual typing, async/await is the other feature in JavaScript that I cannot live without, so this code sample caught my attention! As we recently added support for building projects in Python 3.6 at work, it was trivial for me to get up and running with the latest and greatest features in Python. (Incidentally, I also learned that you really want Python 3.6, not 3.5, as 3.6 has some important improvements to Mypy, fixes to the asyncio API, literal string interpolation like you have in ES6, and more!)

Coming from the era of “modern” JavaScript, one thing that was particularly refreshing was rediscovering how Python is an edit/refresh language out of the box whereas JavaScript used to be that way, but is no more. Let me explain what I mean by that:

  • In Python 3.6, I can create a new example.py file in my text editor, write Python code that uses async/await and type annotations, switch to my terminal, and run python example.py to execute my code.
  • In JavaScript, I can create a new example.js file in my text editor, write JavaScript code that uses async/await and type annotations, switch to my terminal, run node example.js, see it fail because it does not understand my type annotations, run npm install -g babel-node, run babel-node example.js, see it fail again because I don't have a .babelrc that declares the babel-plugin-transform-flow-strip-types plugin, rummage around on my machine and find a .babelrc I used on another project, copy that .babelrc to my current directory, run babel-node example.js again, watch it fail because it doesn't know where to find babel-plugin-transform-flow-strip-types, go back to the directory from which I took the .babelrc file and now copy its package.json file as well, remove the junk from package.json that example.js doesn't need, run npm install, get impatient, kill npm install, run yarn install, and run babel-node example.js to execute my code. For bonus points, babel-node example.js runs considerably slower than node example.js (with the type annotations stripped) because it re-transpiles example.js every time I run it.
Indeed, the JavaScript ecosystem offers all sorts of tools to speed up this process using daemons and caching, but you or someone on your team has to invest quite a bit of time researching, assembling, and maintaining a JavaScript toolchain for your project before anyone can write a line of “modern” JavaScript. Although you may ultimately achieve a nice edit/refresh workflow, you certainly will not have one out of the box as you would in Python.


“JavaScript is no longer an edit/refresh language.”

Another refreshing difference between JavaScript and Python is the “batteries included” nature of Python. If you look at the standard library that comes with JavaScript, it is fairly minimal. The Node environment does a modest job of augmenting what is provided by the standard library, but the majority of the functionality you need inevitably has to be fetched from npm. Specifically, consider the following functionality that is included in Python's standard library, but must be fetched from npm for a Node project:

As you can see, for many of these features, there are multiple third-party libraries that provide overlapping functionality. (For example, if you were looking for a JSON parser, would you choose parse-json, safe-json-parse, fast-json-parse, jsonparse, or json-parser?) To make matters worse, npm module names are doled out on a first-come, first-serve basis. Much like domain names, this means that great names often go to undeserving projects. (For example, judging from its download count, the npm module named logging makes it one of the least popular logging packages for JavaScript.) This makes the comparison of third-party modules all the more time-consuming since the quality of the name is not a useful signal for the quality of the library.

It might be possible that Python's third-party ecosystem is just as bad as npm's. What is impressive is that I have no idea whether that is the case because it is so rare that I have to look to a third-party Python package to get the functionality that I need. I am aware that data scientists rely heavily on third-party packages like NumPy, but unlike the Node ecosystem, there is one NumPy package that everyone uses rather than a litany of competitors named numpy-fast, numpy-plus, simple-numpy, etc.


“We should stop holding up npm as a testament to the diversity of the JavaScript ecosystem, but instead cite it as a failure of JavaScript's standard library.”

For me, one of the great ironies in all this is that, arguably, the presence of a strong standard library in JavaScript would be the most highly leveraged when compared to other programming languages. Why? Because today, every non-trivial web site requires you to download underscore.js or whatever its authors have chosen to use to compensate for JavaScript's weak core. When you consider the aggregate adverse impact this has on network traffic and page load times, the numbers are frightening.

So...Are You Saying JavaScript is Dead?

No, no I am not. If you are building UI using web technologies (which is a lot of developers), then I still think that JavaScript is your best bet. Modulo the emergence of Web Assembly (which is worth paying attention to), JavaScript is still the only language that runs natively in the browser. There have been many attempts to take an existing programming language and compile it to JavaScript to avoid “having to” write JavaScript. There are cases where the results were good, but they never seemed to be great. Maybe some transpilation toolchain will ultimately succeed in unseating JavaScript as the language to write in for the web, but I suspect we'll still have the majority of web developers writing JavaScript for quite some time.

Additionally, the browser is not the only place where developers are building UI using web technologies. Two other prominent examples are Electron and React Native. Electron is attractive because it lets you write once for Windows, Mac, and Linux while React Native is attractive because it lets you write once for Android and iOS. Both are also empowering because the edit/refresh cycles using those tools is much faster than their native equivalents. From a hiring perspective, it seems like developers who know web technologies (1) are available in greater numbers than native developers, and (2) can support more platforms with smaller teams compared to native developers.

Indeed, I could envision ways in which these platforms could be modified to support Python as the scripting language instead of JavaScript, which could change the calculus. However, one thing that all of the crazy tooling that exists in the JavaScript community has given way to is transpiler toolchains like Babel, which make it easier for ordinary developers (who do not have to be compiler hackers) to experiment with new JavaScript syntax. In particular, this tooling has paved the way for things like JSX, which I contend is one of the key features that makes React such an enjoyable technology for building UI. (Note that you can use React without JSX, but it is tedious.)

To the best of my knowledge, the Python community does not have an equivalent, popular mechanism for experimenting with DSLs within Python. So although it might be straightforward to add Python bindings in these existing environments, I do not think that would be sufficient to get product developers to switch to Python unless changes were also made to the language that made it as easy to express UI code in Python as it is in JavaScript+JSX today.

Key Takeaways

Python 3.6 has built-in support for gradual typing and async/await. Unlike JavaScript, this means that you can write Python code that uses these language features and run that file directly without any additional tooling. Further, its rich standard library means that you have to spend little time fishing around and evaluating third-party libraries to fill in missing gaps. It is very much a “get stuff done” server-side scripting language, requiring far less scaffolding than JavaScript to get a project off the ground. Although Mypy may not be as featureful or performant as Flow or TypeScript is today, the velocity of that project is certainly something that I am going to start paying attention to.

By comparison, I expect that JavaScript will remain strong among product developers, but those who use Node today for server-side code or command-line tools would probably be better off using Python. If the Node community wants to resist this change, then I think they would benefit from (1) expanding the Node API to make it more comprehensive, and (2) reducing the startup time for Node. It would be even better if they could modify their runtime to recognize things like type annotations and JSX natively, though that would require changes to V8 (or Chakra, on Windows), which I expect would be difficult to maintain and/or upstream. Getting TC39 to standardize either of those language features (which would force Google/Microsoft's hand to add native support in their JS runtimes) also seems like a tough sell.

Overall, I am excited to see how things play out in both of these communities. You never know when someone will release a new technology that obsoletes your entire toolchain overnight. For all I know, we might wake up tomorrow and all decide that we should be writing OCaml. Better set your alarm clock.

(This post is also available on Medium.)

Sunday, March 19, 2017

Python: Airing of Grievances

Although this list focuses on the negative aspects of Python, I am publishing it so I can use it as a reference in an upcoming essay about my positive experiences with Python 3.6. Update: This is referenced by my post, "JavaScript vs. Python in 2017."

I am excited by many of the recent improvements in Python, but here are some of my outstanding issues with Python (particularly when compared to JavaScript):

PEP-8

What do you get when you combine 79-column lines, four space indents, and snake case? Incessant line-wrapping, that's what! I prefer 100 cols and 2-space indents for my personal Python projects, but every time a personal project becomes a group project and  true Pythonista joins the team, they inevitably turn on PEP-8 linting with its defaults and reformat everything.

Having to declare self as the first argument of a method

As someone who is not a maintainer of Python, there is not much I could do to fix this, but I did the next best thing: I complained about it on Facebook. In this case, it worked! That is, I provoked Łukasz Langa to add a check for this (it now exists as B902 in flake8-bugbear), so at least when I inevitably forget to declare self, Nuclide warns me inline as I'm authoring the code.

.pyc files

These things are turds. I get why they are important for production, but I'm tired of seeing them appear in my filetree, adding *.pyc to .gitignore every time I create a new project, etc.

Lack of a standard docblock

Historically, Python has not had a formal type system, so I was eager to document my parameter and return types consistently. If you look at the top-voted answer to “What is the standard Python docstring format?” on StackOverflow, the lack of consensus extremely dissatisfying. Although Javadoc has its flaws, Java's documentation story is orders of magnitude better than that of Python's.

Lack of a Good, free graphical debugger

Under duress, I use pdb. I frequently edit Python on a remote machine, so most of the free tools do not work for me. I have not had the energy to try to set up remote debugging in PyCharm, though admittedly that appears to be the best commercial solution. Instead, I tried to lay the groundwork for a Python debugger in Nuclide, which would support remote development by default. I need to find someone who wants to continue that effort.

Whitespace-delimited

I'm still not 100% sold on this. In practice, this begets all sorts of subtle annoyances. Code snippets that you copy/paste from the Web or an email have to be reformatted before you can run them. Tools that generate Python code incur additional complexity because they have to keep track of the indent level. Editors frequently guess the wrong place to put your cursor when you introduce a new line whereas typing } gives it the signal it needs to un-indent without further interaction. Expressions that span a line frequently have to be wrapped in parentheses in order to parse. The list goes on and on.

Bungled rollout of Python 3

Honestly, this is one of the primary reasons I didn't look at Python seriously for awhile. When Python 3 came out, the rhetoric I remember was: “Nothing substantially new here. Python 2.7 works great for me and the 2to3 migration tool is reportedly buggy. No thanks.” It's rare to see such a large community move forward with such a poorly executed backwards-compatibility story. (Presumably this has contributed to the lack of a default Python 3 installation on OS X, which is another reason that some software shops stick with Python 2.) Angular 2 is the only other similar rage-inducing upgrade in recent history that comes to mind.

I don't understand how imports are resolved

Arguably, this is a personal failure rather than a failure of Python. That said, compared to Java or Node, whatever Python is doing seems incredibly complicated. Do I need an __init__.py file? Do I need to put something in it? If I do have to put something in it, does it have to define __all__? For a language whose mantra is “There's only one way to do it,” it is frustrating that the answer to all of these questions is “it depends.”

Busted Scoping Rules

For a language with built-in map() and filter() functions that underscore the support for first-order functions, I am perplexed with how bad the support for closures is in Python. Apparently your options are (1) lambda (which is extremely limited), or (2) a bastardized closure. Borrowing an example from an article that explains the new nonlocal keyword in Python 3, consider the following code in JavaScript:
function outside() {
  let msg = 'Outside!';
  console.log(msg);
  let inside = () => {
    msg = 'Inside!';
    console.log(msg);
  };
  inside();
  console.log(msg);
}
If you ran outside(), you would get the following output (which you would expect from a lexically scoped language like JavaScript):
Outside!
Inside!
Inside!
Whereas if you wrote this Python code:
def outside():
    msg = 'Outside!'
    print(msg)
    def inside():
        msg = 'Inside!'
        print(msg)
    inside()
    print(msg)
and ran outside(), you would get the following:
Outside!
Inside!
Outside!
In Python 2, the only way to emulate the JavaScript behavior is to use a mutable container type (which is gross):
def outside():
    msg_holder = ['Outside!']
    print(msg_holder[0])
    def inside():
        msg_holder[0] = 'Inside!'
        print(msg_holder[0])
    inside()
    print(msg_holder[0])
Though apparently Python 3 has a new nonlocal keyword that makes this less distasteful:
def outside():
    msg = 'Outside!'
    print(msg)
    def inside():
        nonlocal msg
        msg = 'Inside!'
        print(msg)
    inside()
    print(msg)
I realize that JavaScript generally forces you to type more with its var, let and const qualifiers, but I find this to be a small price to pay in exchange for unambiguous variable scopes.

Sunday, October 18, 2015

Hacking on Atom Part II: Building a Development Process

Over one year ago, I wrote a blog post: “Hacking on Atom Part I: CoffeeScript”. This post (Part II) is long overdue, but I kept putting it off because I continued to learn and improve the way I developed for Atom such that it was hard to reach a “stopping point” where I felt that the best practices I would write up this week wouldn't end up becoming obsolete by something we discovered next week.

Honestly, I don't feel like we're anywhere near a stopping point, but I still think it's important to share some of the things we have learned thus far. (In fact, I know we still have some exciting improvements in the pipeline for our developer process, but those are gated by the Babel 6.0 release, so I'll save those for another post once we've built them out.)

I should also clarify that when I say “we,” I am referring to the Nuclide team, of which I am the tech lead. Nuclide is a a collection of packages for Atom to provide IDE-like functionality for a variety of programming languages and technologies. The code is available on GitHub (and we accept contributions!), but it is primarily developed by my team at Facebook. At the time of this writing, Nuclide is composed of 40 Atom packages, so we have quite a bit of Atom development experience under our belts.

My Transpiler Quest

When I spun up the Nuclide project, I knew that we were going to produce a lot of JavaScript code. Transpilers such as Traceur were not very mature yet, but I strongly believed that ES6 was the future and I didn't want to start writing Nuclide in ES5 and have to port everything to ES6 later. On a high level, I was primarily concerned about leveraging the following language extensions:
  • Standardized class syntax
  • async/await
  • JSX (for React)
  • type annotations
Note that of those four things, only the first one is specified by ES6. async/await appears to be on track for ES7, though I don't know if the TC39 will ever be able to agree on a standard for type annotations or let JSX in, but we'll see.

Because of my diverse set of requirements, it was hard to find a transpiler that could provide all of these things:
  • Traceur provided class syntax and other ES6 features, but more experimental things like async/await were very buggy.
  • TypeScript provided class syntax and type annotations, but I knew that Flow was in the works at Facebook, so ultimately, that is what we would use for type annotations.
  • React came with jstransform, which supported JSX and a number of ES6 features.
  • recast provided a general JavaScript AST transformation pipeline. Most notably, the regenerator transform to provide support for yield and async/await was built on recast.
Given my constraints and the available tools, it seemed like investing in recast by adding more transforms for the other features we wanted seemed like the most promising way to go. In fact, someone had already been working on such an endeavor internally at Facebook, but the performance was so far behind that of jstransform that it was hard to justify the switch.

For awhile, I tried doing crude things with regular expressions to hack up our source so that we could use regenerator and jstransform. The fundamental problem was that transpilers do not compose  because if they both recognize language features that cause parse errors in the other, then you cannot use them together. Once we started adding early versions of Flow into the mix to get type checking (and Flow's parser recognized even less of ES6 than jstransform did), the problem became even worse. For a long time, in an individual file, we could have async/await or type checking, but not both.

To make matters worse, we also had to run a file-watcher service that would write the transpiled version of the code someplace where Atom could load it. We tried using a combination of gulp and other things, but all too often a change to a file would go unnoticed, the version on disk would not get transpiled correctly, and we would then have a subtle bug (this seemed to happen most often when interactive rebasing in Git).

It started to become questionable whether putting up with the JavaScript of the future was worth it. Fortunately, it was around this time that Babel (née 6to5) came on the scene. It did everything that we needed and more. I happily jettisoned the jstransform/regenerator hack I had cobbled together. Our team was so much happier and more productive, but there were still two looming issues:
  • Babel accepted a much greater input language than Flow.
  • It did not eliminate the need for a file-watching service to transpile on the fly for Atom.
Rather than continue to hack around the problems we were having, we engaged with the Flow and Atom teams directly. For the Flow team, supporting all of ES6 has been an ongoing issue, but we actively lobbied to prioritize to support features that were most important to us (and caused unrecoverable parse errors if they were not addressed), such as support for async/await and following symlinks through require/import statements.

To eliminate our gulp/file-watching contraption, we upstreamed a pull request to Atom to support Babel natively, just as it already had support for CoffeeScript. Because we didn't want to identify Babel files via a special extension (we wanted the files to be named .js rather than .es6 or something), and because (at least at the time), Babel was too slow to categorically transpile all files with a .js extension, we compromised on using the heuristic “if a file has a .js extension and its contents start with 'use babel', then transpile the file with Babel before evaluating it.” This has worked fairly well for us, but it also meant that everyone had to use the set of Babel options that we hardcoded in Atom. However, once Babel 6.0 comes out, we plan to work with Atom to let packages specify a .babelrc file so that every package can specify its own Babel options.

It has been a long road, but now that we are in a world where we can use Babel and Flow to develop Atom packages, we are very happy.

Node vs. Atom Packages

Atom has its own idea of a package that is very similar to an npm package, but differs in some critical ways:
  • Only one version/instance of an Atom package can be loaded globally in the system.
  • An Atom package cannot declare another Atom package as a dependency. (This somewhat follows from the first bullet point because if two Atom packages declared a dependency on different versions of the same Atom package, it is unclear what the right way to resolve it would be.)
  • Because Atom packages cannot depend on each other, it is not possible to pull in another one synchronously via require().
  • Atom packages have special folders such as styles, grammars, snippets, etc. The contents of these folders must adhere to a specific structure, and their corresponding resources are loaded when the Atom package is activated.
This architecture is particularly problematic when trying to build reusable UI components. Organizationally, it makes sense to put something like a combobox widget in its own package that can be reused by other packages. However, because it likely comes with its own stylesheet, it must be bundled as an Atom package rather than a Node package.

To work around these issues, we introduced the concept of three types of packages in Nuclide: Node/npm, Node/apm, and Atom/apm. We have a README that explains this in detail, but the key insight is that a Node/apm package is a package that is available via npm (and can be loaded via require()), but is structured like an Atom package. We achieved this by introducing a utility, nuclide-atom-npm, which loads the code from a Node package, but also installs the resources from the styles/ and grammars/ directories in the package, if present. It also adds a hidden property on global to ensure that the package is loaded only once globally.

Nuclide has many packages that correspond to UI widgets that we use in Atom: nuclide-ui-checkbox, nuclide-ui-dropdown, nuclide-ui-panel, etc. If you look at the implementation of any of these packages, you will see that they load their code using nuclide-atom-npm. Using this technique, we can reliably require the building blocks of our UI synchronously, which would not be the case if our UI components were published as Atom packages. This is especially important to us because we build our UI in Atom using React.

React

In July 2014, the Atom team had a big blog post that celebrated their move to React for the editor. Months later, they had a very quiet pull request that removed the use of React in Atom. Basically, the Atom team felt that to achieve the best possible performance for their editor, they needed to hand-tune that code rather than risk the overhead of an abstraction, such as React.

This was unfortunate for Nuclide because one of the design limitations of React is that it expects there to be only one instance of the library installed in the environment. When you own all of the code in a web page, it is not a big deal to commit to using only one version (if anything, it's desirable!), but when you are trying to create an extensible platform like Atom, it presents a problem. Either Atom has to choose the version of React that every third-party package must use, or every third-party package must include its own version of React.

The downside of Atom mandating the version of React is that, at some point, some package authors will want them to update it when a newer version of React comes out whereas other package authors will want them to hold back so their packages don't break. The downside of every third-party package (that wants to use React) including its own version is that multiple instances of React are not guaranteed to work together when used in the same environment. (It also increases the amount of code loaded globally at runtime.)

Further, React and Atom also conflict because they both want to control how events propagate through the system. To that end, when Atom was using React for its editor, it created a fork of React that did not interfere with Atom's event dispatch. This was based on React v0.11, which quickly became an outdated version of React.

Because we knew that Atom planned to remove their fork of React from their codebase before doing their 1.0 release, we needed to create our own fork that we could use. To that end, Jonas Gebhardt created the react-for-atom package, which is an Atom-compatible fork of React based off of React v0.13. As we did for the nuclide-atom-npm package, we added special logic that ensured that require('react-for-atom') could be called by various packages loaded by Atom, but it cached the result on the global environment so that subsequent calls to require() would return the cached result rather than load it again, making it act as a singleton.

Atom does not provide any sort of built-in UI library by default. The Atom UI itself does not use a standard library: most of the work is done via raw DOM operations. Although this gives package authors a lot of freedom, it also arrests some via a paradox of choice. On Nuclide, we have been using React very happily and successfully inside of Atom. The combination of Babel/JSX/React has facilitated producing performant and maintainable UI code.

Monorepo

Atom is developed as a collection of over 100 packages, each contained in its own repository. From the outside, this seems like an exhausting way to develop a large software project. Many, many commits to Atom are just minor version bumps to dependencies in package.json files across the various repos. To me, this is a lot of unnecessary noise.

Both Facebook and Google have extolled the benefits of a single, monolithic codebase. One of the key advantages over the multi-repo approach is that it makes it easier to develop and land atomic changes that span multiple parts of a project. For this and other reasons, we decided to develop the 100+ packages for Nuclide in a single repository.

Unfortunately, the Node/npm ecosystem makes developing multiple packages locally out of one repository a challenge. It has a heavy focus on semantic versioning and encourages dependencies to be fetched from https://www.npmjs.org/. Yes, it is true that you can specify local dependencies in a package.json file, but then you cannot publish your package.json file as-is.

Rather than embed local, relative paths in package.json files that get rewritten upon publishing to npm (which I think would be a reasonable approach), we created a script that walks our tree, symlinking local dependendencies under node_modules while using npm install to fetch the rest. By design, we specify the semantic version of a local dependency as 0.0.0 in a package.json file. These version numbers get rewritten when we publish packages to npm/apm.

We also reject the idea of semantic versioning. (Jeremy Ashkenas has a great note about the false promises of semantic versioning, calling it “romantic versioning.”) Most Node packages are published independently, leaving the package author to do the work of determining what the new version number should be based on the changes in the new release. Practically speaking, it is not possible to automate the decision of whether the new release merits a minor or major version bump, which basically means the process is imperfect. Even when the author gets the new version number right, it is likely that he/she sunk a lot of time into doing so.

By comparison, we never publish a new version of an individual Nuclide package: we publish new versions of all of our packages at once, and it is always a minor-minor version bump. (More accurately, we disallow cycles in our own packages' dependencies, so we publish them in topological order, all with the same version number where the code is derived from the same commit hash in our GitHub repo.) It is indeed the case that for many of our packages, there is nothing new from version N to N+1. That is, the only reason N+1 was published is because some other package in our repo needed a new version to be published. We prefer to have some superfluous package publications than to sink time into debating new version numbers and updating scores of package.json files.

Having all of the code in one place makes it easier to add processes that are applied across all packages, such as pre-transpiling Babel code before publishing it to npm or apm or running ESLint. Also, because our packages can be topologically sorted, many of the processes that we run over all of our packages can be parallelized, such as building or publishing. Although this seems to fly in the face of traditional Node development, it makes our workflow dramatically simpler.

async/await

I can't say enough good things about using async/await, which is something that we can do because we use Babel. Truth be told, as a side-project, I have been trying to put all of my thoughts around it into writing. I'm at 35 pages and counting, so we'll see where that goes.

An oversimplified explanation of the benefit is that it makes asynchronous code no harder to write than synchronous code. Many times, JavaScript developers know that designing code to be asynchronous will provide a better user experience, but give in to writing synchronous code because it's easier. With async/await, you no longer have to choose, and everyone wins as a result.

Flow

Similar to async/await, I can't say enough good things about static typing, particularly optional static typing. I gave a talk at the first mloc.js, “Keeping Your Code in Check,” demonstrating different approaches to static typing in JavaScript, and argued why it is particularly important for large codebases. Flow is a fantastic implementation of static typing in JavaScript.

Closing Thoughts

In creating Nuclide, we have written over 50,000 lines of JavaScript[1] and have created over 100 packages, 40 of which are Atom packages[2]. Our development process and modularized codebase has empowered us to move quickly. As you can tell from this essay (or from our scripts directory), building out this process has been a substantial, deliberate investment, but I think it has been well worth it. We get to use the best JavaScript technologies available to us today with minimal setup and an edit/refresh cycle of a few seconds.

Honestly, the only downside is that we seem to be ahead of Atom in terms of the number of packages it can support. We have an open issue against Atom about how to install a large number of packages more efficiently (note this is a problem when you install Nuclide via nuclide-installer, but not when you build from source). Fortunately, this should [mostly] be a simple matter of programming™: there is no fundamental design flaw in Atom that is getting in the way of a solution. We have had an outstanding working relationship with Atom thus far, finding solutions that improve things for not just Nuclide, but the entire Atom community. It has been a lot of fun to build on top of their platform. For me, working on Atom/Nuclide every day has been extremely satisfying due to the rapid speed of development and the tools we are able to provide to our fellow developers as a result of our work.

[1] To exclude tests and third-party code, I got this number by running:
git clone https://github.com/facebook/nuclide.git
cd nuclide
git ls-files pkg | grep -v VendorLib | grep -v hh_ide | grep -v spec | grep -e '\.js$' | xargs wc -l
[2] There is a script in Nuclide that we use to list our packages in topologically sorted order, so I got these numbers by running:
./scripts/dev/packages | wc -l
./scripts/dev/packages --package-type Atom | wc -l

Thursday, March 5, 2015

Trying to prove that WeakMap is actually weak

I don't have a ton of experience with weak maps, but I would expect the following to work:
// File: example.js
var key = {};
var indirectReference = {};
indirectReference['key'] = (function() { var m = {}; m['foo'] = 'bar'; return m; })();
var map = new WeakMap();

map.set(key, indirectReference['key']);
console.log('Has key after setting value: %s', map.has(key));

delete indirectReference['key'];
console.log('Has key after deleting value: %s', map.has(key));

global.gc();
console.log('Has key after performing global.gc(): %s', map.has(key));
I downloaded the latest version of io.js (so that I could have a JavaScript runtime with WeakMap and global.gc() and ran it as follows:
./iojs --expose-gc example.js
Here is what I see:
Has key after setting value: true
Has key after deleting value: true
Has key after performing global.gc(): true
Despite my best efforts, I can't seem to get WeakMap to give up the value that is mapped to key. Am I doing it wrong? Obviously I'm making some assumptions here, so I'm curious where I'm off.

Ultimately, I would like to be able to use WeakMap to write some tests to ensure certain objects get garbage collected and don't leak memory.

Tuesday, August 19, 2014

Hacking on Atom Part I: CoffeeScript

Atom is written in CoffeeScript rather than raw JavaScript. As you can imagine, this is contentious with “pure” JavaScript developers. I had a fairly neutral stance on CoffeeScript coming into Atom, but after spending some time exploring its source code, I am starting to think that this is not a good long-term bet for the project.

Why CoffeeScript Makes Sense for Atom

This may sound silly, but perhaps the best thing that CoffeeScript provides is a standard way to declare JavaScript classes and subclasses. Now before you get out your pitchforks, hear me out:

The “right” way to implement classes and inheritance in JavaScript has been of great debate for some time. Almost all options for simulating classes in ES5 are verbose, unnatural, or both. I believe that official class syntax is being introduced in ES6 not because JavaScript wants to be thought of as an object-oriented programming language, but because the desire for developers to project the OO paradigm onto the language today is so strong that it would be irresponsible for the TC39 to ignore their demands. This inference is based on the less aggressive maximally minimal classes proposal that has superseded an earlier, more fully-featured, proposal, as the former states that “[i]t focuses on providing an absolutely minimal class declaration syntax that all interested parties may be able to agree upon.” Hooray for design by committee!

Aside: Rant

The EcmaScript wiki is the most frustrating heap of documentation that I have ever used. Basic questions, such as, “Are Harmony, ES.next, and ES6 the same thing?” are extremely difficult to answer. Most importantly, it is impossible to tell what the current state of ES6 is. For example, with classes, there is a proposal under harmony:classes, but another one at Maximally Minimal Classes. Supposedly the latter supersedes the former. However, the newer one has no mention of static methods, which the former does and both Traceur and JSX support. Perhaps the anti-OO folks on the committee won and the transpilers have yet to be updated to reflect the changes (or do not want to accept the latest proposal)?

In practice, the best information I have found about the latest state of ES6 is at https://github.com/lukehoban/es6features. I stumbled upon that via a post in traceur-compiler-discuss that linked to some conference talk that featured the link to the TC39 member’s README.md on GitHub. (The conference talk also has an explicit list of what to expect in ES7, in particular async/await and type annotations, which is not spelled out on the EcmaScript wiki.) Also, apparently using an entire GitHub repo to post a single web page is a thing now. What a world.

To put things in perspective, I ran a git log --follow on some key files in the src directory of the main Atom repo, and one of the earliest commits I found introducing a .coffee file is from August 24, 2011. Now, let’s consider that date in the context of modern JavaScript transpiler releases:

As you can see, at the time Atom was spinning up, CoffeeScript was the only mature transpiler. If I were starting a large JavaScript project at that time (well, we know I would have used Closure...) and wanted to write in a language whose transpilation could be improved later as JavaScript evolved, then CoffeeScript would have made perfect sense. Many arguments about what the “right JavaScript idiom is” (such as how to declare classes and subclasses) go away because CoffeeScript is more of a “there’s only one way to do it” sort of language.

As I mentioned in my comments on creating a CoffeeScript for Objective-C, I see three primary benefits that a transpiled language like CoffeeScript can provide:

  • Avoids boilerplate that exists in the target language.
  • Subsets the features available in the source language to avoid common pitfalls that occur when those features are used in the target language.
  • Introduces explicit programming constructs in place of unofficial idioms.

Note that if you have ownership of the target language, then you are in a position to fix these things yourself. However, most of us are not, and even those who are may not be that patient, so building a transpiler may be the best option. As such, there is one other potential benefit that I did not mention in my original post, but has certainly been the case for CoffeeScript:

  • Influences what the next version of the target language looks like.

But back to Atom. If you were going to run a large, open source project in JavaScript, you could potentially waste a lot of time trying to get your contributors to write JavaScript in the same way as the core members of the project. With CoffeeScript, there is much less debate.

Another benefit of using CoffeeScript throughout the project is that config files are in CSON rather than JSON. (And if you have been following this blog, you know that the limitations of JSON really irritate me. Go JSON5!) However, CSON addresses many of shortcomings of JSON because it supports comments, trailing commas, unquoted keys, and multiline strings (via triple-quote rather than backslashes). Unfortunately, it also supports all of JavaScript as explained in the README:

“CSON is fantastic for developers writing their own configuration to be executed on their own machines, but bad for configuration you can't trust. This is because parsing CSON will execute the CSON input as CoffeeScript code...”
Uh...what? Apparently there’s a project called cson-safe that is not as freewheeling as the cson npm module, and it looks like Atom uses the safe version. One of the unfortunate realities of the npm ecosystem is that the first mover gets the best package name even if he does not have the best implementation. C’est la vie.

Downsides of Atom Using CoffeeScript

I don’t want to get into a debate about the relative merits of CoffeeScript as a language here (though I will at some other time, I assure you), but I want to discuss two practical problems I have run into that would not exist if Atom were written in JavaScript.

First, many (most?) Node modules that are written in CoffeeScript have only the transpiled version of the code as part of the npm package. That means that when I check out Atom and run npm install, I have all of this JavaScript code under my node_modules directory that has only a vague resemblance to its source. If you are debugging and have source maps set up properly, then this is fine, but it does not play nice with grep and other tools. Although the JavaScript generated from CoffeeScript is fairly readable, it is not exactly how a JavaScript developer would write it by hand, and more importantly, single-line comments are stripped.

Second, because Atom is based on Chrome and Node, JavaScript developers writing for Atom have the benefit of being able to rely on the presence of ES6 features as they are supported in V8. Ordinary web developers do not have this luxury, so it is very satisfying to be able to exploit it! However, as ES6 introduces language improvements, they will not be available in CoffeeScript until CoffeeScript supports them. Moreover, as JavaScript evolves into a better language (by copying features from language like CoffeeScript), web developers will likely prefer “ordinary” JavaScript because it is likely that they will have better tool support. As the gap between JavaScript and CoffeeScript diminishes, the cost of doing something more nonstandard (i.e., using a transpiler) does not outweigh the benefits as much as it once did. Arguably, the biggest threat to CoffeeScript is CoffeeScript itself!

Closing Thoughts

Personally, I plan to develop Atom packages in JavaScript rather than CoffeeScript. I am optimistic about where JavaScript is going (particularly with respect to ES7), so I would prefer to be able to play with the new language features directly today. I don’t know how long my code will live (or how long ES6/ES7 will take), but I find comfort in knowing that I am less likely to have to rewrite my code down the road when JavaScript evolves. Finally, there are some quirks to CoffeeScript that irk me enough to stick with traditional JavaScript, but I’ll save those for another time.