Liminal Existence

Sunday, April 01, 2007

MapReduce in 36 lines of Ruby

This has been burning a hole in my head since August, after Joel's post made it blindingly obvious that Ruby is the perfect language for distributed programming. I have some code that properly implements partitioning, etc., but I never got around to finishing it sufficiently for a proper release. Here's the core idea; if anyone wants the partitioning code, ping me at romeda@gmail.com.

mapreduce_enumerable.rb:

require 'rubygems'
require 'ringy_dingy'
require 'ruby2ruby'

module Enumerable
  def dmap(&block)
    self.each_with_index do |element,idx|
      ring_server.write([:dmap, Process.pid, block.to_ruby, element, idx])
    end

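    # Count results as they arrive. Checking results.size alone is not
    # enough: results[idx] = ... for a high idx pads the array with nils,
    # so an out-of-order arrival would end the loop early.
    results = []
    received = 0
    while received < self.size
      result, idx = ring_server.take([:dmap, Process.pid, nil, nil]).last(2)
      results[idx] = result
      received += 1
    end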

    results
  end

  def ring_server
    return @ring_server if @ring_server

    ringy_dingy = RingyDingy.new nil
    @ring_server = ringy_dingy.ring_server
  end
end

mapreduce_runner.rb:

require 'rubygems'
require 'ruby2ruby'
require 'ringy_dingy'

ringy_dingy = RingyDingy.new nil
ring_server = ringy_dingy.ring_server

loop do
  pid, block, element, idx = ring_server.take([:dmap, nil, nil, nil, nil]).last(4)
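  begin
    # Rebuild the block from its source text and run it on the element
    result = eval(block).call(element)
  rescue Exception => err
    # Send the error back as the result instead of killing the runner
    result = err
  end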
  puts "Got #{result} from #{element} for #{pid}."
  ring_server.write([:dmap, pid, result, idx])
end

From the shell:

$ sudo gem install RingyDingy
$ sudo gem install ruby2ruby
$ ring_server &
$ ruby mapreduce_runner.rb &
$ ruby mapreduce_runner.rb &

From irb:

> require 'mapreduce_enumerable'
> (1..100).to_a.dmap { |v| v * 2 }
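
To follow the tuple flow without installing RingyDingy or starting a ring server, the same exchange can be sketched in a single process. This is an illustration under assumptions, not the code from the post: a local Rinda::TupleSpace (Rinda ships with Ruby) stands in for the ring server, threads stand in for the runner processes, and the block is passed as a hand-written source string rather than generated with ruby2ruby. The names start_worker and local_dmap are hypothetical.

```ruby
require 'rinda/tuplespace'

# In-process stand-in for the ring server (assumption: the real setup
# reaches a shared tuple space over the network through RingyDingy).
SPACE = Rinda::TupleSpace.new

# Worker loop, mirroring mapreduce_runner.rb: take a 5-element work tuple,
# eval the block source, write back a 4-element result tuple.
def start_worker(space)
  Thread.new do
    loop do
      pid, source, element, idx = space.take([:dmap, nil, nil, nil, nil]).last(4)
      result = begin
        eval(source).call(element)
      rescue StandardError => err
        err
      end
      space.write([:dmap, pid, result, idx])
    end
  end
end

# Client side, mirroring Enumerable#dmap, but taking the block as source text.
def local_dmap(space, items, source)
  items.each_with_index do |element, idx|
    space.write([:dmap, Process.pid, source, element, idx])
  end

  results = Array.new(items.size)
  items.size.times do
    # nil in a template is a wildcard; the tuple length must match exactly,
    # so this never picks up a pending 5-element work tuple.
    result, idx = space.take([:dmap, Process.pid, nil, nil]).last(2)
    results[idx] = result
  end
  results
end

workers = 2.times.map { start_worker(SPACE) }
doubled = local_dmap(SPACE, (1..10).to_a, 'proc { |v| v * 2 }')
workers.each(&:kill)

puts doubled.inspect
```

Work tuples and result tuples share the :dmap tag and are told apart only by their length, which is why the runner's template has five slots and the caller's has four.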

6 Comments:

Blogger Soyapi said...

It's official. Blaine is the new Larry Page!

Monday, 2 April 2007 07:49:00 o'clock IST  
Anonymous Anonymous said...

Like some earlier attempts, it's missing the reduce part, but you managed to add code mobility which is definitely a step forward :)

Tuesday, 3 April 2007 09:17:00 o'clock IST  
Blogger Blaine said...

The reduce part is pretty trivial; the implementation is essentially the same as for dmap, but uses ruby's inject method, instead.

Likewise, one could create distributed each, find, etc., methods.

Tuesday, 3 April 2007 11:04:00 o'clock IST  
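
The reduce step described in the comment above can be sketched locally. This is a minimal illustration, not code from the post: chunked_reduce is a hypothetical name, plain map stands in for dmap so the snippet runs without a ring server, and the result only matches a straight inject when the operation is associative (sums, products, max, and so on).

```ruby
module Enumerable
  # Hypothetical helper (not from the original post): reduce in two stages.
  # Stage 1 reduces each slice on its own; in a distributed version this is
  # the part that would be farmed out the way dmap farms out elements.
  # Stage 2 combines the partial results in the calling process.
  def chunked_reduce(chunk_size, &block)
    partials = each_slice(chunk_size).map { |chunk| chunk.inject(&block) }
    partials.inject(&block)
  end
end

total = (1..100).to_a.chunked_reduce(10) { |sum, v| sum + v }
puts total
```

How the reducing block would be shipped to the runners is left out here; the post's dmap serializes its block with ruby2ruby, and a distributed reduce would need the same treatment.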
Blogger stoyan said...

missing .rb after the script name

ruby mapreduce_runner.rb &

is maybe the right one

Wednesday, 4 April 2007 02:16:00 o'clock IST  
Anonymous Anonymous said...

It may be the most trivial thing in the world, but it isn't really MapReduce until it Maps and Reduces... is it?

Friday, 27 April 2007 02:08:00 o'clock IST  
Blogger Shannon -jj Behrens said...

I Googled for "MapReduce ruby" and ran into your post. I saw your picture and thought, "Hey wait a minute! I've seen that guy before!" I guess that's what happens when you interview at half the companies in Silicon Valley!

Happy Hacking!

Sunday, 10 June 2007 16:52:00 o'clock IST  
