This has been burning a hole in my head since August, after Joel's post made it blindingly obvious that Ruby is the perfect language for distributed programming. I have some code that properly implements partitioning, etc., but I never got around to finishing it sufficiently for a proper release. Here's the core idea; if anyone wants the partitioning code, ping me at romeda@gmail.com.
mapreduce_enumerable.rb:
require 'rubygems'
require 'ringy_dingy'
require 'ruby2ruby'

module Enumerable
  # Distributed map: publish one job tuple per element, then gather the results.
  def dmap(&block)
    source = block.to_ruby  # serialize the block once so runners can eval it
    count = 0
    each_with_index do |element, idx|
      ring_server.write([:dmap, Process.pid, source, element, idx])
      count += 1
    end

    results = Array.new(count)
    # Results can arrive in any order, so count takes instead of checking results.size.
    count.times do
      result, idx = ring_server.take([:dmap, Process.pid, nil, nil]).last(2)
      results[idx] = result
    end
    results
  end

  # Look up the ring server once and cache it.
  def ring_server
    @ring_server ||= RingyDingy.new(nil).ring_server
  end
end
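A note on collecting results by index: Ruby pads an array with nil when you assign past its end, so the array's size says nothing reliable about how many results have actually come back. Here is a tiny sketch of that behaviour in plain Ruby, no ring server involved (the `received` counter is just an illustrative name):

```ruby
results = []
received = 0

# Suppose the result for index 3 is the first one to come back.
results[3] = 8
received += 1

puts results.inspect  # [nil, nil, nil, 8]
puts results.size     # 4, even though only one result has arrived
puts received         # 1
```

Counting the takes, rather than asking the array how big it is, keeps the collection loop waiting until every job has reported in.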
mapreduce_runner.rb:
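require 'rubygems'
require 'ruby2ruby'
require 'ringy_dingy'

ringy_dingy = RingyDingy.new nil
ring_server = ringy_dingy.ring_server

loop do
  # Block until a job tuple appears: [:dmap, pid, block source, element, idx].
  pid, block, element, idx = ring_server.take([:dmap, nil, nil, nil, nil]).last(4)
  begin
    result = eval(block).call(element)
  rescue Exception => err
    # Catch everything (including a SyntaxError from eval) and hand the error
    # back as the result, so the caller is not left waiting.
    result = err
  end
  puts "Got #{result} from #{element} for #{pid}."
  ring_server.write([:dmap, pid, result, idx])
end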
From the shell:
$ sudo gem install RingyDingy
$ sudo gem install ruby2ruby
$ ring_server &
$ ruby mapreduce_runner.rb &
$ ruby mapreduce_runner.rb &
From irb:
> require 'mapreduce_enumerable'
> (1..100).to_a.dmap { |v| v * 2 }
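If you want to see the tuple flow without setting up a ring server, here is a rough in-process sketch of the same protocol. It is not the code above: it assumes a local `Rinda::TupleSpace` (from the rinda library that ships with Ruby) standing in for the ring server, threads standing in for the runner processes, and a hand-written source string standing in for `block.to_ruby`. The `:stop` marker used to shut the workers down is my own addition for the demo.

```ruby
require 'rinda/tuplespace'

# In-process stand-in for the ring server's tuple space (assumption: the real
# setup reaches a shared space over DRb via RingyDingy instead).
space = Rinda::TupleSpace.new

# Hypothetical stand-in for block.to_ruby: the block's source as a string.
block_source = 'proc { |v| v * 2 }'
job_id = Process.pid
input = [1, 2, 3, 4, 5]

# Two worker threads play the role of the mapreduce_runner processes.
workers = Array.new(2) do
  Thread.new do
    loop do
      # Job tuples have five fields; nil in a template matches anything.
      _tag, id, source, element, idx = space.take([:dmap, nil, nil, nil, nil])
      break if source == :stop  # demo-only shutdown marker
      space.write([:dmap, id, eval(source).call(element), idx])
    end
  end
end

# Publish one job tuple per element.
input.each_with_index do |element, idx|
  space.write([:dmap, job_id, block_source, element, idx])
end

# Result tuples have four fields, so this template never picks up a job tuple.
results = Array.new(input.size)
input.size.times do
  result, idx = space.take([:dmap, job_id, nil, nil]).last(2)
  results[idx] = result
end

# Tell each worker to exit, then wait for them.
workers.size.times { space.write([:dmap, job_id, :stop, nil, nil]) }
workers.each(&:join)

puts results.inspect  # [2, 4, 6, 8, 10]
```

The thing to notice is that jobs and results share the `:dmap` tag and are told apart only by tuple length: the runners take five-element tuples and the caller takes four-element ones tagged with its own pid.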