Liminal Existence

Saturday, January 30, 2010

Hot Code Loading in Node.js

Reading through Fever today, this post by Jack Moffitt caught my eye. In it, he discusses a hack to allow a running Python process to dynamically reload code. While the hack itself, shall we say, lacks subtlety, Jack's post got me thinking. It's true, Erlang's hot code loading is a great feature, enabling Erlang's 99.9999999% uptime claims. It occurred to me that it wouldn't be terribly difficult to implement for node.js' CommonJS-based module loader.

A few hours (and a tasty home-made paella) later, here's my answer: the Hotload node branch.

Umm… What does it do?

var http = require('http');
var requestHandler = require('./myRequestHandler');

// When the file changes on disk, evict it from this module's cache and
// require() it again; the closure below picks up the new handler.
process.watchFile('./myRequestHandler', function () {
  module.unCacheModule('./myRequestHandler');
  requestHandler = require('./myRequestHandler');
});

var reqHandlerClosure = function (req, res) {
  requestHandler.handle(req, res);
};

http.createServer(reqHandlerClosure).listen(8000);

Now, any time you modify myRequestHandler.js, the above code will notice and replace the local requestHandler with the new code. Any existing requests will continue to use the old code, while any new incoming requests will use the new code. All without shutting down the server, bouncing any requests, prematurely killing any requests, or even relying on an intelligent load balancer.

Awesome! How does it work?

Basically, all node modules are created as sandboxes: as long as you don't use global variables, you can be sure that any modules you write won't stomp on others' code, and, vice versa, that others' modules won't stomp on yours.

Modules are loaded by require()ing them and assigning the return to a local variable, like so:

var http = require('http');

The important insight is that the return value of require() is a self-contained closure. There's no reason it has to be the same each time. Essentially, require(file) says "read file, seal it in a protective case, and return that protective case." require() is smart, though, and caches modules so that multiple attempts to require() the same module don't waste time (synchronously) reading from disk. That cache is never invalidated, however: even though we can detect when files change, we can't just call require() again, since the cached version takes precedence.
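To make the caching behaviour concrete, here's a tiny illustration using stock node (no patch involved): both require() calls hand back the very same cached exports object, even if the file on disk has changed in between.

var sys = require('sys');

var first = require('./myRequestHandler');
// ...edit and save myRequestHandler.js here...
var second = require('./myRequestHandler');

// Prints "true": the second require() is served from the cache, so the
// stale version wins over whatever is now on disk.
sys.puts(first === second);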

There are a few ways to fix this, but the subtleties rapidly complicate matters. If the ultimate goal is to allow an already-executing module (e.g., an http request handler) to continue executing while new code is loaded, then automatic code reloading is out, since changing one module will change them all. In the approach I've taken here, I tried to achieve two goals:

  1. Make minimal changes to the existing node.js require() logic.
  2. Ensure that any require() calls within an already-loaded module will return functions corresponding to the pre-hot load version of the code.

The latter goal is important because a module expects a specific set of behaviours from the modules on which it depends. Hot loading only works as long as each module keeps a consistent view of the world.

To accomplish these goals, all I've done is move the module cache from a single global cache into each module itself. Reloading is minimised by copying the parent's cache into child modules (made fast and efficient thanks to V8's approach to variable handling). Any module can load a new version of any loaded module by first removing that module from its local cache. This doesn't affect any other modules (including dependent ones), but it does ensure that any sub-modules are reloaded, as long as they're not in the parent's cache.
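For the curious, here's a rough sketch of the shape of that change; it's not the actual patch, and loadModuleFromDisk() is a hypothetical stand-in for node's real loading machinery. Each module carries its own cache, seeded with a copy of its parent's, and unCacheModule() only ever touches the local copy.

function Module(parent) {
  // Seed this module's cache with a copy of its parent's, so it starts
  // out with the same view of the world.
  this.moduleCache = {};
  if (parent) {
    for (var name in parent.moduleCache) {
      this.moduleCache[name] = parent.moduleCache[name];
    }
  }
}

// Evict a module from *this* module's cache only; dependents that already
// hold the old version keep seeing it until they finish.
Module.prototype.unCacheModule = function (name) {
  delete this.moduleCache[name];
};

Module.prototype.require = function (name) {
  if (!this.moduleCache[name]) {
    // loadModuleFromDisk() stands in for the real (synchronous) loader.
    this.moduleCache[name] = loadModuleFromDisk(name, this);
  }
  return this.moduleCache[name];
};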

Because it takes a relatively conservative approach to module reloading, I believe this is a flexible and powerful way to do hot code reloading. Most server applications have a strongly hierarchical code structure; as long as code reloading is done at the top level, before many modules have been required, it can be done simply and efficiently.

While I hope this patch or a modified one will make it into node.js, this approach can be adapted to exist outside of node's core, at the expense of maintaining two require() implementations.

7 Comments:

OpenID onebigfluke.com said...

Good to see you blogging again!

Sunday, 31 January 2010 02:17:00 GMT  
Anonymous Michal Migurski said...

That's pretty awesome. The Python version, too. I think what led to the one-time total dominance of CGI and PHP for these kinds of things was the fact that entire scripts and programs were reloaded for each request, making development and deployment a brainless, FTP-driven process. It's always baffled me that mod_python flipped this around. Here's to file modification times and ease of working!

Sunday, 31 January 2010 18:54:00 GMT  
Blogger Austin said...

Very cool idea. I don't quite get how a proper reload would look within the node environment. Since you've got a pre-existing http module (for example) out there and a bunch of callbacks are registered to it, am I on the hook to programmatically reconstruct that environment using the freshly loaded module? Is there some way to know when objects using the older module become idle so you can safely decommission them? Or does the simple act of taking their port away make them eventual candidates for V8 garbage collection?

Sunday, 31 January 2010 22:09:00 GMT  
Blogger Blaine said...

Michal: Three Cheers!

Austin: the example code here *is* a proper reload; basically, as soon as the file changes on disk, the watchFile() event fires, and the requestHandler variable is updated. Since that variable is just a pointer to some abstract function, when it changes, so too does the requestHandler function inside reqHandlerClosure(). Next time an incoming request hits the socket, the new requestHandler() function gets called, rather than the old one.

You don't need to re-construct the environment, since all that's happening is that your handler code is getting re-evaluated (once) when it's required.

Old code goes away (in theory! I haven't done a detailed analysis) as soon as all the handles to it have gone. More to the point, once all the requests being handled by the old code have finished, the old handler will be automatically garbage collected by V8.

Monday, 1 February 2010 23:01:00 GMT  
OpenID id said...

One of the things particular to Erlang's hot code loading is that it uses a VM-centralized code server that handles and garbage collects the different versions of code.

This server can in turn notify subscribers about code changes; the OTP framework uses this to propagate code upgrades in its behaviours (gen_fsm, gen_event [and all the handlers], gen_server) with a callback function that lets the programmer define what changes have to be applied to the data structures and state currently held by running code.

Does node.js have anything similar to that? Or is there anything of the kind planned?

I don't really know the ins and outs of node.js, but from the blog post, it seems like there's no way to update current state, and the only real upgrades happen on new calls. Is this because node.js just doesn't have the same objectives in mind?

Tuesday, 2 February 2010 20:08:00 GMT  
Blogger jerry said...

here I get an "Object #&lt;a Module&gt; has no method 'unCacheModule'" error. why?

Wednesday, 3 February 2010 07:22:00 GMT  
Anonymous Vitaliy said...

Nice idea, thanx. Could the server go down, or is this safe?

Friday, 12 February 2010 15:19:00 GMT  
