Packaging Elasticrawl using Traveling Ruby

13 Jan 2015   ruby

Phusion, the makers of the Passenger Ruby server, recently launched Traveling Ruby. It simplifies deploying Ruby tools by generating self-contained packages for OS X and Linux. These packages ship with a built-in Ruby 2.1 interpreter, so the user doesn’t need to install or upgrade their local version of Ruby.

I decided to give it a try with my elasticrawl tool. It’s a CLI tool that automates creating AWS Elastic MapReduce jobs that process Common Crawl data. This post has a walkthrough of what it does.

Problems with deploying Ruby tools

I love developing with Ruby but distributing client apps written in Ruby is painful. The Ruby version that ships with operating systems usually lags far behind the latest version. The RubyGems repository and tools like rbenv are great if you’re a Ruby developer. But if your end-users aren’t Ruby developers they don’t want to learn these tools just to use your app.

So when I was writing my elasticrawl CLI tool I was torn. I considered using Go since it’s easier to deploy. This is a common problem. For example, the popular Vagrant tool is developed in Ruby, but HashiCorp, the makers of Vagrant, have used Go for all their newer tools like Packer and Serf.

For me, learning Go would have taken a lot of time that I didn’t have. Deployment was also the only problem I had with Ruby. All the libraries I needed existed for Ruby, and speed wasn’t a concern since the app mainly just makes API calls. So when I heard about Traveling Ruby it looked like exactly what I needed.

Installing via RubyGems

I started out distributing elasticrawl as a Ruby gem, which adds an elasticrawl command that provides the CLI.

The app uses an embedded SQLite database for keeping track of the Common Crawl data and the Hadoop jobs. This means it uses the sqlite3 gem, which has a C extension that has to be compiled. So the user also has to install the development headers for SQLite, and the package to install depends on the OS version or Linux distro version. The nokogiri gem is an HTML/XML parser that also has a C extension for performance reasons, so more development headers need to be installed.

This deploy process is a terrible user experience for non-Ruby developers and could also turn into a maintenance nightmare for me.

Packaging using Traveling Ruby

The Traveling Ruby approach is to generate three packages: one each for OS X, Linux 32-bit and Linux 64-bit. The deployment packages are tarballs that contain a Ruby 2.1 interpreter. It also supports bundling gems into the package, including some widely used gems that require C extensions. These include the sqlite3 and nokogiri gems used by elasticrawl, so I was all set.

I followed this three-part tutorial that covers how to develop a Rake task that generates the three deploy packages. This post from Dr Nic on packaging BOSH was also very useful. I shamelessly copied his approach of creating a new Git repository called traveling-elasticrawl. This just contains a Gemfile and a Rakefile. The Gemfile installs the elasticrawl gem and specifies the versions for the nokogiri and sqlite3 gems. The Rakefile automates the process of generating the three deploy packages.
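To give a rough idea of what the Rakefile has to keep track of, here is a minimal sketch of the naming logic for the three packages. It is plain Ruby rather than a real Rake task, and it is not the code from the traveling-elasticrawl repository. The version number is a placeholder, and the helper names and directory layout are my assumptions, loosely modelled on the conventions in the Traveling Ruby tutorial.

```ruby
# Illustrative sketch only: these constants and helpers are assumptions,
# not values taken from the real traveling-elasticrawl Rakefile.
PACKAGE_NAME = "elasticrawl"
APP_VERSION  = "0.0.0" # placeholder version number

# The three platforms: Linux 32-bit, Linux 64-bit and OS X.
TARGETS = %w[linux-x86 linux-x86_64 osx].freeze

# Directory that holds everything for one platform before it is archived.
def package_dir(target)
  "#{PACKAGE_NAME}-#{APP_VERSION}-#{target}"
end

# Name of the tarball that users would download for one platform.
def tarball_name(target)
  "#{package_dir(target)}.tar.gz"
end

# Assumed layout inside a package: a wrapper script at the top level,
# with the app, the bundled Ruby interpreter and the gems under lib/.
def package_layout(target)
  dir = package_dir(target)
  {
    wrapper: "#{dir}/#{PACKAGE_NAME}",
    app:     "#{dir}/lib/app",
    ruby:    "#{dir}/lib/ruby",
    vendor:  "#{dir}/lib/vendor"
  }
end

# List the tarballs a full packaging run would be expected to produce.
TARGETS.each { |target| puts tarball_name(target) }
```

A real Rake task would wrap these helpers in one task per target that downloads the matching interpreter, copies in the bundled gems and creates the tarball.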

Installing via a deploy package

The new deploy instructions are much simpler. You download the deploy package from CloudFront, extract the tar archive, and run the elasticrawl command from the extracted directory.


I’ve been really impressed with Traveling Ruby. It solves a very real problem with distributing Ruby apps, and it’s currently being expanded to support more C extensions. The Rake task is quite complex, but the BOSH example from Dr Nic was very helpful and, luckily for me, elasticrawl has fewer dependencies than BOSH.

I was surprised at how large the packages are: 22 MB compressed and 112 MB extracted. Most of this is the gem dependencies (88 MB) rather than the Ruby interpreter (21 MB).
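To put those numbers in proportion, here is a quick back-of-the-envelope calculation in Ruby. It only uses the sizes quoted above. The variable names and the "other" category (whatever is left after the gems and the interpreter) are my own labels.

```ruby
# Package sizes in MB, as quoted in this post.
compressed_mb  = 22
extracted_mb   = 112
gems_mb        = 88
interpreter_mb = 21

# Whatever is not gems or interpreter (wrapper script, app code and so on).
other_mb = extracted_mb - gems_mb - interpreter_mb

# Share of the extracted package, as a percentage to one decimal place.
def share(part_mb, total_mb)
  (part_mb.to_f / total_mb * 100).round(1)
end

puts "gems:        #{share(gems_mb, extracted_mb)}%"
puts "interpreter: #{share(interpreter_mb, extracted_mb)}%"
puts "other:       #{other_mb} MB"
puts "compressed size is #{share(compressed_mb, extracted_mb)}% of extracted size"
```

So the gems account for a bit over three quarters of the extracted package, and the tarball compresses to roughly a fifth of that size.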
