Liminal Existence

Sunday, April 01, 2007

MapReduce in 36 lines of Ruby

This has been burning a hole in my head since August, after Joel's post made it blindingly obvious that Ruby is the perfect language for distributed programming. I have some code that properly implements partitioning, etc., but I never got around to polishing it enough for a proper release. Here's the core idea; if anyone wants the partitioning code, ping me at romeda@gmail.com.

mapreduce_enumerable.rb:

require 'rubygems'
require 'ringy_dingy'
require 'ruby2ruby'

module Enumerable
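  # Distributed map: publish one request tuple per element to the ring
  # server, then collect one result tuple per element from the runners.
  def dmap(&block)
    count = 0
    self.each_with_index do |element,idx|
      # Request tuple: [:dmap, owner pid, block source, element, index]
      ring_server.write([:dmap, Process.pid, block.to_ruby, element, idx])
      count += 1
    end

    # Count results as they arrive instead of testing results.size: results
    # can come back out of order, and assigning results[idx] pads the array
    # with nils, so its size can reach count before everything has arrived.
    results = []
    count.times do
      # Result tuple: [:dmap, owner pid, result, index]
      result, idx = ring_server.take([:dmap, Process.pid, nil, nil]).last(2)
      results[idx] = result
    end

    results
  end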

  def ring_server
    return @ring_server if @ring_server

    ringy_dingy = RingyDingy.new nil
    @ring_server = ringy_dingy.ring_server
  end
end
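
When collecting results that may arrive in any order, `results.size` is not a safe completion test, because assigning past the end of a Ruby array pads it with `nil`s. A quick plain-Ruby illustration (no ring server needed; the values are made up for the example):

```ruby
# Results from the runners can arrive in any order.
results = []
results[99] = 200                  # suppose the result for index 99 arrives first
received = results.compact.size    # non-nil entries actually received

puts results.size                  # prints 100, as if every result were in
puts received                      # prints 1
```

Counting the result tuples taken, rather than inspecting the array, avoids returning early with `nil` holes.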
mapreduce_runner.rb:

require 'rubygems'
require 'ruby2ruby'
require 'ringy_dingy'

ringy_dingy = RingyDingy.new nil
ring_server = ringy_dingy.ring_server

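loop do
  # Block until a 5-field request tuple appears; nils in the template match anything.
  pid, block, element, idx = ring_server.take([:dmap, nil, nil, nil, nil]).last(4)
  begin
    # block is Ruby source produced by ruby2ruby; eval turns it back into a Proc.
    result = eval(block).call(element)
  rescue Exception => err
    # Send the error back to the caller as the result instead of killing the runner.
    result = err
  end
  puts "Got #{result} from #{element} for #{pid}."
  # Write a 4-field result tuple addressed to the requesting process.
  ring_server.write([:dmap, pid, result, idx])
end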

From the shell:

$ sudo gem install RingyDingy
$ sudo gem install ruby2ruby
$ ring_server &
$ ruby mapreduce_runner.rb &
$ ruby mapreduce_runner.rb &

From irb:

> require 'mapreduce_enumerable'
> (1..100).to_a.dmap { |v| v * 2 }
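
To see the tuple-space protocol without a network, here is a minimal in-process sketch. `ToyTupleSpace` and `toy_dmap` are hypothetical names, not part of Rinda or RingyDingy; the class only mimics the two operations the code above relies on (`write`, and a blocking `take` where `nil` in a template matches anything), and supports nothing beyond `nil` wildcards and plain equality. Threads stand in for the runner processes, and because they share memory the Proc is passed directly instead of going through ruby2ruby and `eval`. The `:stop` tuple is a hypothetical shutdown signal added for the sketch.

```ruby
require 'thread'

# Hypothetical toy tuple space: tuples are arrays; a template matches a tuple
# of the same length where every non-nil template field equals the tuple field.
class ToyTupleSpace
  def initialize
    @tuples = []
    @lock = Mutex.new
    @changed = ConditionVariable.new
  end

  def write(tuple)
    @lock.synchronize do
      @tuples << tuple
      @changed.broadcast   # wake any take() waiting for a match
    end
  end

  # Blocks until a matching tuple exists, then removes and returns it.
  def take(template)
    @lock.synchronize do
      loop do
        i = @tuples.index { |t| matches?(template, t) }
        return @tuples.delete_at(i) if i
        @changed.wait(@lock)
      end
    end
  end

  private

  def matches?(template, tuple)
    template.size == tuple.size &&
      template.zip(tuple).all? { |want, got| want.nil? || want == got }
  end
end

space = ToyTupleSpace.new

# Runners: take 5-field request tuples, write back 4-field result tuples.
runners = 2.times.map do
  Thread.new do
    loop do
      owner, block, element, idx = space.take([:dmap, nil, nil, nil, nil]).last(4)
      break if block == :stop
      result = begin
        block.call(element)
      rescue Exception => err
        err
      end
      space.write([:dmap, owner, result, idx])
    end
  end
end

# Client side: same shape as dmap, but against the toy space.
def toy_dmap(space, owner, items, &block)
  items.each_with_index { |element, idx| space.write([:dmap, owner, block, element, idx]) }
  results = Array.new(items.size)
  items.size.times do
    result, idx = space.take([:dmap, owner, nil, nil]).last(2)
    results[idx] = result
  end
  results
end

doubled = toy_dmap(space, Process.pid, (1..10).to_a) { |v| v * 2 }
p doubled   # prints [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

2.times { space.write([:dmap, nil, :stop, nil, nil]) }   # one stop per runner
runners.each(&:join)
```

Request tuples (five fields) and result tuples (four fields) never match each other's templates, which is the same size-based separation the runner and `dmap` above depend on.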

6 Comments:

Soyapi said...

It's official. Blaine is the new Larry Page!

Monday, 2 April 2007 07:49:00 GMT+01:00  
Anonymous said...

Like some earlier attempts, it's missing the reduce part, but you managed to add code mobility which is definitely a step forward :)

Tuesday, 3 April 2007 09:17:00 GMT+01:00  
Blaine said...

The reduce part is pretty trivial; the implementation is essentially the same as for dmap, but uses Ruby's inject method instead.

Likewise, one could create distributed each, find, etc., methods.

Tuesday, 3 April 2007 11:04:00 GMT+01:00  
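
Blaine's reply describes reduce as dmap with inject in place of the per-element call. A minimal local sketch of one way to do that, assuming the block is associative (like `+` or `max`): split the input into slices, reduce each slice in parallel, then inject the partial results in order. `dreduce` and its `slices` parameter are hypothetical, and threads stand in for the remote runners; this is not the partitioning code mentioned in the post.

```ruby
# Hypothetical sketch: threads stand in for the remote runners.
# Assumes the block is associative, so per-slice results can be combined.
def dreduce(items, slices = 4, &block)
  slice_size = [(items.size.to_f / slices).ceil, 1].max
  workers = items.each_slice(slice_size).map do |slice|
    Thread.new { slice.inject(&block) }   # one partial result per slice
  end
  workers.map(&:value).inject(&block)     # combine partials in original order
end

total = dreduce((1..100).to_a) { |a, b| a + b }
puts total   # prints 5050
```

Combining the partial results in slice order keeps the answer correct for operations that are associative but not commutative, such as string concatenation.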
stoyan said...

missing .rb after the script name

ruby mapreduce_runner.rb &

is maybe the right one

Wednesday, 4 April 2007 02:16:00 GMT+01:00  
Anonymous said...

It may be the most trivial thing in the world, but it isn't really MapReduce until it Maps and Reduces... is it?

Friday, 27 April 2007 02:08:00 GMT+01:00  
Shannon -jj Behrens said...

I Googled for "MapReduce ruby" and ran into your post. I saw your picture and thought, "Hey wait a minute! I've seen that guy before!" I guess that's what happens when you interview at half the companies in Silicon Valley!

Happy Hacking!

Sunday, 10 June 2007 16:52:00 GMT+01:00  
