distributed computing

Prologue This post is my take on reviving an old project (the last commit was 3 years ago) born around 2007/2008 at Nokia Research Center and written in Erlang. What was exciting for me is the fact that Disco project is capable of running Python MapReduce Jobs against an Erlang core, how awesome is that! — Erlang is a synonym for parallel processing and high availability. I successfully built it though and ran a 250M records dataset which is 10GB+ in size using a Python MapReduce job that finished in 28 minutes (improved from 44 minutes) using a cluster of 3 EC2 free-tier t2....

distributed computing

Disco: Erlang/Python MapReduce #2

Intro to Disco: an Erlang/Python MapReduce