Thursday, February 07, 2013

A workflow for managing Chef with knife

My colleague Jeff Roberts has been trying hard to teach me how to properly manage Chef cookbooks, roles, and nodes using the knife utility. Here is a workflow for doing this in a way that aims to be as close as possible to 'best practices'. This assumes that you already have a Chef server set up.

1) Install chef on your personal machine

In my case, my personal laptop is running OS X (gasp! I used to be a big-time Ubuntu-everywhere user, but I changed my mind after being handed a MacBook Air).

I won't go into the gory details on installing chef locally, but here are a few notes:

  •  I installed the Xcode command line tools, then I installed Homebrew. Note that plain Xcode wasn't sufficient; I had to download the dmg package for the command line tools and install that.
  •  I installed git via brew.
  •  I installed chef via
  •  I installed the EC2 plugin for knife via
    • cd /opt/chef/embedded/bin/; sudo ./gem install knife-ec2

2) Clone chef-repo locally

Best practices dictate that you keep your chef-repo directory structure in version control. If you are using git, like we do, then you need to clone that locally via a git clone command.

3) Deploy chef client and validation keys

The keys are kept in chef-repo/.chef in our case. You need 2 keys: your_username.pem and validation.pem. You need to coordinate with your Chef server administrator to get them. A good way of passing keys around is to encrypt them on encypher.it, send the link in an email, and communicate the decryption key by some out-of-band mechanism (such as voice).

4) Configure knife via knife.rb

You need a knife.rb file which sits in chef-repo/.chef as well. Here's a sample (replace username with your actual username, and set the proper EC2 access keys):

log_level :info
log_location STDOUT
node_name 'username'
client_key '/Users/username/chef-repo/.chef/username.pem'
validation_client_name 'chef-validator'
validation_key '/Users/username/chef-repo/.chef/validation.pem'
chef_server_url 'http://chef.example.com:4000'
cache_type 'BasicFile'
cache_options( :path => '/tmp/checksums' )
cookbook_path [ './cookbooks' ]
# EC2:
knife[:aws_access_key_id] = "xxxxxxxxxx"
knife[:aws_secret_access_key] = "XXXXXXXXXXXXXXXXXXX"

5) Test your knife setup

An easy way to see if knife can communicate properly with the Chef server at this point is to list the nodes in your infrastructure via

knife node list

If this doesn't work, you need to troubleshoot it until you make it work ;-)
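A cheap first troubleshooting step is to check that the three files knife needs are actually where steps 3 and 4 put them. The following is a minimal Ruby sketch, not something knife provides: the helper name missing_knife_files is made up for illustration, and it assumes you run it from the top of chef-repo with 'username' replaced by your own.

```ruby
# List the files knife expects under chef-repo/.chef that are missing.
# chef_dir: path to the .chef directory; username: your Chef client name.
def missing_knife_files(chef_dir, username)
  %W[knife.rb #{username}.pem validation.pem]
    .map { |name| File.join(chef_dir, name) }
    .reject { |path| File.file?(path) }
end

# Assumption: the current directory is the top of chef-repo.
missing = missing_knife_files(File.join(Dir.pwd, '.chef'), 'username')
if missing.empty?
  puts 'All knife files are in place'
else
  puts "Missing: #{missing.join(', ')}"
end
```

If nothing is reported missing and knife node list still fails, the next suspects are the chef_server_url value and the node_name/key pairing.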

BTW, in my case, I need to run knife while in the chef-repo directory, for it to properly read the files in the .chef subdirectory.
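One way to loosen that requirement is to build the paths in knife.rb from the location of the file itself instead of hard-coding them or relying on the current directory. This is a sketch of the same sample under that approach, assuming the layout described above (keys in chef-repo/.chef, cookbooks in chef-repo/cookbooks). Reading the EC2 credentials from the environment is also my own variation, and the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are an assumption about what you export in your shell.

```ruby
# chef-repo/.chef/knife.rb (sketch)
# Directory that holds this file, i.e. chef-repo/.chef
current_dir = File.dirname(__FILE__)

log_level              :info
log_location           STDOUT
node_name              'username'
client_key             "#{current_dir}/username.pem"
validation_client_name 'chef-validator'
validation_key         "#{current_dir}/validation.pem"
chef_server_url        'http://chef.example.com:4000'
cache_type             'BasicFile'
cache_options(:path => '/tmp/checksums')
# Resolve cookbooks relative to knife.rb rather than the current directory
cookbook_path          [File.expand_path('../cookbooks', current_dir)]

# EC2: read credentials from the environment (variable names are an assumption)
knife[:aws_access_key_id]     = ENV['AWS_ACCESS_KEY_ID']
knife[:aws_secret_access_key] = ENV['AWS_SECRET_ACCESS_KEY']
```

This also keeps the AWS secrets out of the git repository, which matters once chef-repo is shared with the rest of the team.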

6) Create a new cookbook

For this example, I'll create a cookbook called myblog. The idea is to install nginx and Octopress.

The proper command to use is:

# knife cookbook create myblog

This will create a directory called myblog under chef-repo/cookbooks, and it will populate it with files and subdirectories pertaining to that cookbook (such as attributes, definitions, recipes, etc).

7) Download any other required cookbooks

For this example, I will download the nginx cookbook from the Opscode community cookbooks. I first search for the nginx cookbook, then I install it:

# knife cookbook site search nginx
# knife cookbook site install nginx


Once the nginx cookbook is installed locally, you still need to upload it to the Chef server:

# knife cookbook upload nginx

8) Create recipe for installing nginx and Octopress in new cookbook

Now that the pre-requisite cookbook is installed and uploaded to Chef, you can use it in your custom cookbook. You need to add references to the pre-requisite cookbook (nginx) in the following 2 files under cookbooks/myblog:


Add this to metadata.rb:

depends "nginx"


Add this to README.md:

Requirements
============
* nginx


The actual custom recipe for myblog lives in cookbooks/myblog/recipes/default.rb. In my case, here's what I do to install Octopress:


include_recipe 'nginx'
include_recipe 'ruby::1.9.1'

# set default ruby to point to 1.9.1 (which is actually 1.9.3!)
system("update-alternatives --install /usr/bin/ruby ruby /usr/bin/ruby1.9.1 400 --slave /usr/share/man/man1/ruby.1.gz ruby.1.gz /usr/share/man/man1/ruby1.9.1.1.gz --slave /usr/bin/ri ri /usr/bin/ri1.9.1 --slave /usr/bin/irb irb /usr/bin/irb1.9.1 --slave /usr/bin/rdoc rdoc /usr/bin/rdoc1.9.1")

# install bundler via gems
system("gem install bundler")

# get octopress source code and install it via bundle and rake
system("cd /opt/; git clone git://github.com/imathis/octopress.git octopress")
system("cd /opt/octopress; bundle install; rake install")


It's a pretty convoluted way of installing Octopress, and it requires installing version 1.9.1 of ruby via the Opscode ruby cookbook first. It took me a few tries to get it right, but it seems to do the job, although I know running system commands on the remote node is not the preferred way of configuring nodes with Chef.
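For comparison, here is a sketch of how the same steps might look using Chef resources instead of system calls. A system call in a recipe runs every time the recipe file is evaluated, so the git clone above fails on the second chef-client run because /opt/octopress already exists. Resources are only acted on when needed. I have not run this version against the setup described here. It assumes git is already installed on the node, it leaves out the update-alternatives step (which could go into another execute resource), and the creates guard assumes that rake install produces the /opt/octopress/source directory.

```ruby
# cookbooks/myblog/recipes/default.rb (sketch using Chef resources)
include_recipe 'nginx'
include_recipe 'ruby::1.9.1'

# Install bundler; does nothing if the gem is already installed
gem_package 'bundler'

# Check out Octopress into /opt/octopress; :sync is safe to repeat
git '/opt/octopress' do
  repository 'git://github.com/imathis/octopress.git'
  revision   'master'
  action     :sync
end

# Install the gems, then the default theme; skipped once the theme exists
execute 'install octopress' do
  cwd     '/opt/octopress'
  command 'bundle install && rake install'
  creates '/opt/octopress/source' # assumption: created by rake install
end
```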

9) Upload new cookbook to Chef server

In order for the new cookbook you created to be available, you need to upload it to the Chef server:

# knife cookbook upload myblog

10) Test new cookbook by deploying new node

At this point, you are ready to test your shiny new cookbook. I did this by launching a new EC2 instance associated with the recipe in the myblog cookbook.

What follows is a command line using the knife EC2 plugin which took me a few tries to get right. It works for me, so I hope it will work for you too if you ever decide to do something similar. I had to dig into the knife-ec2 source code to get to some of these options, since they aren't documented in the README.

# knife ec2 server create -r "role[base], recipe[myblog]" -I ami-0d153248 --flavor c1.medium --region us-west-1 -g sg-7babb117 -i ~/.ssh/mykey.pem -x ubuntu -N myblog -S mykey -s subnet-ca6d20a3 -T T=techblog --ebs-size 50

This tells knife to launch an Ubuntu 12.04 instance (the -I AMI_ID option) associated with a 'base' role and the 'myblog' recipe (the -r option), size c1.medium (the --flavor option), in the us-west-1 region (the --region option), in a given security group (the -g option) and a given VPC subnet (the -s option), using the mykey.pem to ssh into the instance (the -i option -- where mykey.pem is the private key corresponding to the keypair you specify with the -S option) as user ubuntu (the -x option), using the mykey keypair name (the -S option -- this is a keypair that you must already have created), with a Chef node name of myblog (the -N option), an EC2 tag of techblog (the -T option), and finally an EBS root volume size of 50 GB (the --ebs-size option). Whew.
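Since that command line is long enough to be error-prone, one option is to keep the options in a small Ruby script where each flag sits next to a comment saying what it does. This is only a sketch: KNIFE_EC2_OPTIONS and knife_ec2_create are names I made up, the values are the ones from the command above, and the private key path is written out in full (an assumed location) because a tilde is not expanded when no shell is involved.

```ruby
require 'shellwords'

# Options for `knife ec2 server create`, keyed by the flags used above.
KNIFE_EC2_OPTIONS = {
  '-r'         => 'role[base], recipe[myblog]',       # run list
  '-I'         => 'ami-0d153248',                     # AMI ID (Ubuntu 12.04)
  '--flavor'   => 'c1.medium',                        # instance size
  '--region'   => 'us-west-1',                        # EC2 region
  '-g'         => 'sg-7babb117',                      # security group
  '-i'         => '/Users/username/.ssh/mykey.pem',   # private key used for ssh
  '-x'         => 'ubuntu',                           # ssh user
  '-N'         => 'myblog',                           # Chef node name
  '-S'         => 'mykey',                            # EC2 keypair name
  '-s'         => 'subnet-ca6d20a3',                  # VPC subnet
  '-T'         => 'T=techblog',                       # EC2 tag
  '--ebs-size' => '50'                                # EBS root volume size in GB
}

# Build the argument vector: the knife subcommand followed by flag/value pairs.
def knife_ec2_create(options)
  %w[knife ec2 server create] + options.flat_map { |flag, value| [flag, value] }
end

argv = knife_ec2_create(KNIFE_EC2_OPTIONS)
# Print a shell-escaped version for review before launching anything.
puts Shellwords.join(argv)
# To actually launch the instance, uncomment:
# system(*argv)
```

Passing the arguments as a list to system avoids shell quoting problems with the brackets and the space in the run list.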

If everything goes well, you'll see something similar to this:

Instance ID: i-3c98b265
Flavor: c1.medium
Image: ami-0d153248
Region: us-west-1
Availability Zone: us-west-1b
Security Group Ids: sg-7babb117
Tags: TtechblogNametechblog
SSH Key: mykey

Waiting for server................
Subnet ID: subnet-ca7e10a3
Private IP Address: 10.10.14.4

followed by the output of an ssh session in which chef-client will run on the newly created instance. You'll be able to see if the chef-client run was successful or not. In either case, you should be able to ssh into the new instance with the mykey.pem private key.

11) Commit any new or modified cookbooks

Now that you have tested your cookbooks (both the prerequisite ones such as nginx, and new ones such as myblog), you need to commit them to the chef-repo git repository so other members of your team can take advantage of them. You do this with git add, git commit and git push.

12) Other useful knife commands

You should be able to get information from the Chef server about the new node you just launched (named techblog in the output below) by running:

# knife node show techblog
Node Name:   techblog
Environment: _default
FQDN:        
IP:          10.10.14.4
Run List:    role[base],  recipe[myblog]
Roles:       base, sysadmin_sudoers
Recipes:     apt, ntp, timezone, chef-client::service, chef-client::delete_validation, base-apps, users::sysadmins, sudo, nagios-plugins, ruby, rubygems, sensu::client, myblog
Platform:    ubuntu 12.04
Tags:        

You can also edit the run list of a given node by running:

# knife node edit techblog

(you need to set your EDITOR variable to your favorite editor first).

To inspect a given role, use:

# knife role show monitoring
chef_type:            role
default_attributes:   
description:          Installs the sensu monitoring client and related software
env_run_lists:        
json_class:           Chef::Role
name:                 monitoring
override_attributes:  
run_list:            
    recipe[nagios-plugins]
    recipe[ruby]
    recipe[rubygems]
    recipe[sensu::client]

There are many other knife commands you can use -- in fact, using knife to its full potential is an art in itself. Here is a sample of knife commands, courtesy of our Chef guru Jeff Roberts:

This command searches sensu.client.subscriptions and finds nodes that are running the mysql check.

knife search node "sensu_client_subscriptions:mysql"  

Show the sensu subscriptions for the jira.corp node.

knife node show jira.corp -a sensu.client.subscriptions
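Note that the two commands above refer to the same attribute in two notations: knife node show -a takes a dotted path, while the search index flattens nested attributes into a single key joined with underscores. A tiny Ruby sketch of the conversion, with helper names (search_key, search_query) that are mine and not part of knife:

```ruby
# Convert a dotted attribute path (as passed to `knife node show -a`)
# into the underscore-joined key used in a `knife search` query.
def search_key(attribute_path)
  attribute_path.split('.').join('_')
end

# Build a "key:value" search query for a nested attribute.
def search_query(attribute_path, value)
  "#{search_key(attribute_path)}:#{value}"
end

puts search_query('sensu.client.subscriptions', 'mysql')
# prints "sensu_client_subscriptions:mysql"
```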

Show the EC2 attrs for the test_box node.

knife node show test_box -a "ec2"

Search all nodes and find ones in the "us-west-*" availability zone.

knife search node "ec2_placement_availability_zone:us-west-*" -a "ec2"

Search for all nodes in the "webserver" role and show the "apache.sites" attribute.

knife search node "role:webserver" -a apache.sites

List all of the versions of the cookbook "nginx".

knife cookbook show nginx

Find all of the nodes in the "prod" environment.

knife search node "chef_environment:prod"

Find the next available UID (list all UIDs in use, sorted, and look at the last one).

knife search users "*:*" -a uid | grep uid | sort
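One caveat with the pipeline above is that sort compares text, so a three-digit UID sorts after a four-digit one. Here is a sketch that takes the numeric maximum instead. The helper name next_uid is made up, and the sample text is a guess at what knife prints (one "uid: N" line per user), so adjust the pattern to your actual output.

```ruby
# Return the next free UID given the text printed by
#   knife search users "*:*" -a uid
# Returns nil when no uid lines are found.
def next_uid(knife_output)
  uids = knife_output.scan(/^\s*uid:\s*(\d+)\s*$/).flatten.map(&:to_i)
  uids.empty? ? nil : uids.max + 1
end

# Hypothetical sample output; the real format may differ.
sample = <<~OUTPUT
  3 items found

  id:  alice
  uid: 2001

  id:  bob
  uid: 999

  id:  carol
  uid: 2010
OUTPUT

puts next_uid(sample)
# prints 2011
```

To use it with live data, the output of the knife command could be read from standard input with next_uid($stdin.read).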

Monday, February 04, 2013

Some gotchas when installing Octopress on Ubuntu

Here are some quick notes I took while trying to install the Octopress blog engine on a box running Ubuntu 12.04. I tried following the official instructions and I chose the RVM method. The first gotcha is that you have to have the development tools (compilers, linkers etc) installed already. So you need to run:

# apt-get install build-essential


Then I ran the recommended commands in order to install rvm and rubygems:

# curl -L https://get.rvm.io | bash -s stable --ruby
# source /usr/local/rvm/scripts/rvm
# rvm install 1.9.3
# rvm use 1.9.3
# rvm rubygems latest



I then installed git and grabbed the octopress source code:
 
# apt-get install git
# git clone git://github.com/imathis/octopress.git octopress
# cd octopress/


When trying to install bundler, I got this error:

# gem install bundler

ERROR:  Loading command: install (LoadError)
   cannot load such file -- zlib
ERROR:  While executing gem ... (NameError)
   uninitialized constant Gem::Commands::InstallCommand


Googling around, I found this answer on Stack Overflow which talked about the same error. The solution was to install the zlib1g-dev package, then reinstall Ruby 1.9.3 via rvm so that it is built with zlib support, then install bundler.

# apt-get install  zlib1g-dev
# rvm reinstall 1.9.3
# gem install bundler
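A quick way to confirm that the rebuilt Ruby can load zlib, before retrying gem install, is a check along these lines. This is a small sketch of my own (zlib_available? is a made-up helper), run with the rvm-selected ruby:

```ruby
# Report whether this Ruby can load the zlib extension that RubyGems needs.
def zlib_available?
  require 'zlib'
  true
rescue LoadError
  false
end

if zlib_available?
  puts "zlib OK (#{Zlib::ZLIB_VERSION})"
else
  puts 'zlib missing: install zlib1g-dev, then run rvm reinstall 1.9.3'
end
```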


At this point I was able to continue the installation by running these commands (in the octopress source top directory):

# bundle install
# rake install


That's about it for installing Octopress. I still have to figure out how to front Octopress with nginx, and how to actually start using it.

Friday, February 01, 2013

IT stories from the trenches #2

The year was 1996. I was a graduate student in CS at USC, and this was my first contact with Unix (I had started my career as a programmer in C++ under DOS and later Windows 3.1; yes, I am dating myself big time here!). My first task as a research assistant was to recompile the X server so it could use shared memory extensions. This was part of an experiment that was studying multimedia applications over a high speed proprietary network.

I started optimistically. I got acquainted with the compilers and linkers on the HP-UX platform we were using, I learned all kinds of neat Unix command line utilities, but there was only one glitch -- the X binary wouldn't compile. I tried everything in my power. I even searched on Altavista (no Google in 1996). Nothing worked. Finally, I posted a plea for help on comp.windows.x. A gentle soul by the name of Kaleb Keithley replied first on that thread, then on separate email threads (I was using pine as my email client at the time) and nudged me towards the solution to my issue. It turns out that the X server's frame buffer was not supported on that particular HP workstation where I was trying to compile it. I tried immediately on another type of HP workstation and everything worked like a charm. This was almost 3 months after I started on this project.

Lesson learned? Don't give up. It's a very unpleasant feeling to bang your head against a wall, but in my experience, I noticed over and over again that it's absolutely the best way to learn a new technology or tool. In my case, I knew Unix commands and the Unix development toolchain pretty well at the end of those 3 months. Just persevere in the face of that sinking feeling in the pit of your stomach that another day has passed and you haven't made much progress. There will be an EUREKA moment, I guarantee it.

Another lesson learned: ask for help early and often. These days it's probably on IRC that you would get help most quickly, but between IRC, Stack Exchange, Google Groups and Twitter, chances are somebody has already seen the problem you're facing.

As a side note, when you do find a solution to a long-standing problem, do everybody a favor and blog about it. More often than not, I find solutions to my technical problems by reading blog posts. Even if they bring me only 80% of the way to a solution, it's a huge help.

For the curious, here's a PDF of the paper that resulted from my X server experiments.
