Monday, June 26, 2006

OpenWengo Code Camp

Found this via the buildbot-devel mailing list: OpenWengo Code Camp. It seems similar in philosophy and goals to the Google Summer of Code. Excerpt from the home page:

"OpenWengo Code Camp is a friendly, challenging and mind-stimulating contest aimed at pushing open source software projects forward.
Students apply for proposed software development subjects for which they have a particular interest in. These subject proposals describe ways to bring enhancements to existing or new FOSS projects, generally by writing source code.

If their application is accepted, they get the chance to be mentored by open source software contributors to work during 2 months on the subject for which they applied. At the end of summer, mentors give their appreciation: if goals were successfully reached, students get 3500 euros of cash.

Mentors get 500 euros of cash if they played their role which consist mainly in helping students to complete their work successfully and evaluating their work at intermediate and final stages."

Sounds pretty reasonable to me, and the cash is not bad either :-)

Looks like one of the proposals involves building a SQL backend for buildbot -- hopefully the project will go through.

Devon

The 3rd milestone for the Cheesecake/SoC project has been completed -- code name devon. This iteration had 3 stories:

1. Create functional tests that actually execute the cheesecake_index script. Check that Cheesecake is:
  • properly cleaning up
  • leaving the log file when a package is broken and removing it otherwise
  • computing the score properly
  • handling its command line options properly

2. Write a script that will automatically download and score all packages from PyPI.
  • Each package should have its score and complete Cheesecake output logged.
  • Gather time statistics for each package.
  • Make a summary after scoring all packages:
    • number of packages for which Cheesecake raised an exception
    • manually check first/last 10 packages and think about improving scoring techniques
3. Add support for egg packages
  • Refactor supported packages interface
  • Add support for installing eggs via setuptools easy_install
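
The bookkeeping part of story #2 (per-package timing, counting exceptions, and picking the first/last packages for manual review) can be sketched roughly as follows. This is only an illustration, not the script written for the milestone: `score_package` is a hypothetical callable standing in for whatever actually downloads a package and runs Cheesecake on it, and `ScoreRun`, `score_all` and `summarize` are names made up for this example.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoreRun:
    """Outcome of scoring one package."""
    package: str
    seconds: float
    score: Optional[int] = None   # None when scoring raised an exception
    error: Optional[str] = None   # repr() of the exception, if any


def score_all(packages, score_package):
    """Score every package, recording elapsed time and any exception."""
    runs = []
    for name in packages:
        start = time.perf_counter()
        try:
            run = ScoreRun(name, 0.0, score=score_package(name))
        except Exception as exc:  # keep going so one bad package cannot stop the run
            run = ScoreRun(name, 0.0, error=repr(exc))
        run.seconds = time.perf_counter() - start
        runs.append(run)
    return runs


def summarize(runs, n=10):
    """Count failures and list the n highest- and lowest-scoring packages."""
    scored = sorted((r for r in runs if r.error is None),
                    key=lambda r: r.score, reverse=True)
    return {
        "total": len(runs),
        "failed": sum(1 for r in runs if r.error is not None),
        "top": [r.package for r in scored[:n]],
        "bottom": [r.package for r in scored[-n:]],
    }
```

Passing the scoring function in as a parameter keeps the harness testable without network access or a Cheesecake installation.
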
As far as story #2 is concerned, Michał and I discussed some modifications and tweaks we need to do to the scoring algorithms so that more packages get higher scores. Here are some ideas that we already implemented:
  • don't decrease installability score if a package is not hosted on PyPI (the package still needs to have a valid download link on its PyPI page);
  • split required files and directories into 3 categories: high, medium, and low importance, each category getting a score of 30, 20, and 10 points respectively;
  • here is the current classification, where Doc means the file can also have a 'txt' or 'html' extension, and OneOf means the score is given if any one of the files/directories in the specified list is found:
cheese_files = {
    Doc('readme'): 30,
    OneOf(Doc('license'), Doc('copying')): 30,
    OneOf(Doc('announce'), Doc('changelog')): 20,
    Doc('install'): 20,
    Doc('authors'): 10,
    Doc('faq'): 10,
    Doc('news'): 10,
    Doc('thanks'): 10,
    Doc('todo'): 10,
}
cheese_dirs = {
    OneOf('doc', 'docs'): 30,
    OneOf('test', 'tests'): 30,
    'demo': 10,
    OneOf('example', 'examples'): 10,
}
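
To make the classification above more concrete, here is a minimal sketch of how Doc and OneOf could behave. It is not the actual Cheesecake implementation: the `matches` method, the `_matches` helper and the `score` function are assumptions made for this example, and the accepted case variants (readme, README, Readme) are borrowed from the filename rule in the brie bugfix story.

```python
import os


class Doc:
    """A documentation file that may also carry a 'txt' or 'html' extension."""

    EXTENSIONS = ("", ".txt", ".html")

    def __init__(self, name):
        self.name = name

    def matches(self, names):
        # Assumption: only all-lower, all-upper and capitalized stems count.
        stems = {self.name.lower(), self.name.upper(), self.name.capitalize()}
        for filename in names:
            stem, ext = os.path.splitext(filename)
            if stem in stems and ext.lower() in self.EXTENSIONS:
                return True
        return False


class OneOf:
    """Satisfied as soon as any one of its alternatives is found."""

    def __init__(self, *alternatives):
        self.alternatives = alternatives

    def matches(self, names):
        return any(_matches(alt, names) for alt in self.alternatives)


def _matches(rule, names):
    """A plain string must appear verbatim; Doc and OneOf decide for themselves."""
    if isinstance(rule, str):
        return rule in names
    return rule.matches(names)


def score(rules, names):
    """Sum the points of every rule satisfied by the given file or directory names."""
    return sum(points for rule, points in rules.items() if _matches(rule, names))


rules = {
    Doc('readme'): 30,
    OneOf(Doc('license'), Doc('copying')): 30,
    Doc('install'): 20,
}
print(score(rules, ["README.txt", "COPYING", "setup.py"]))  # prints 60
```

Using the rule objects as dictionary keys works here because Python objects are hashable by identity unless told otherwise.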

We're getting ready to release cheesecake out in the wild pretty soon, I'd say in a couple of weeks -- so stay tuned!

We've also seen some activity on the cheesecake-users and cheesecake-dev mailing lists, and as always we encourage people interested in this project to send us feedback/suggestions/criticisms. We've been known to always take constructive criticism into account :-)

Update

Read also Michał's post on devon.

Sunday, June 11, 2006

Camembert

The second week of the Cheesecake/SoC project has ended, and all the stories have been completed. We chose the name camembert for this iteration. It included some very tasty (or should I say tasteful) refactoring from Michał, who sprinkled some magic pixie dust in the form of metaclasses and __getitem__ wizardry. It also included some development environment-related tasks, all of them executed via buildbot: automatically generating epydoc documentation and publishing it, running coverage numbers and publishing them, and converting the reST-based README file into Trac Wiki format. This last task had as a side-effect the creation of a little tool that Michał called rest2trac, which will be made available in the near future. Currently it does the conversions that we need for the markup we use in the README file.

All in all, another productive week, and lots of good work from Michał. Check out his Mousebender blog for more information.

Friday, June 09, 2006

Xen installation and configuration

Courtesy of my co-worker Henry Wong, here's a guide on installing and configuring Xen on an RHEL4 machine.

Introduction

Xen is a set of kernel extensions that allow operating systems supporting them to be paravirtualized, which gives the guest operating systems near-native performance. Each paravirtualized guest requires a compatible kernel so that it is aware of the underlying Xen host. The Xen host itself needs to be modified in order to be able to host these systems. More information can be found at the Xen website.

Sometime in the future, XenSource will release a stable version that supports the installation of unmodified guest machines on top of the Xen host. This requires that the host machine hardware have some sort of virtualization technology integrated into the processor. Both Intel and AMD have their own versions of virtualization technology, VT for short, to meet this new requirement. To distinguish between the two competing technologies, we will refer to Intel's VT by its codename, Vanderpool, and to AMD's VT as Pacifica.


Installation

Before starting, it is highly recommended that you visit the Xen Documentation site. This has a more general overview of what is involved with the setup, as well as some other additional information.

Terminology

  • domain 0 (dom0): In terms of Xen, this is the host domain that hosts all of the guest machines. It allows for the creation and destruction of virtual machines through the use of Python-based configuration files that describe how each machine is to be constructed. It also allows for the management of any resources that are taken up by the guest domains, i.e. networking, memory, physical space, etc.

  • domain U (domU): In terms of Xen, this is the guest domain, or the unprivileged domain. The guest domain has resources assigned to it from the host domain, along with any limits that are set by the host domain. None of the physical hardware is available directly to the guest domain; instead, the guest domain must go through the host interface to access the hardware.

  • hypervisor: Xen itself is a hypervisor, or in other words, something that is capable of running multiple operating systems. A more general definition is available here.

Prerequisites

Xen Hypervisor Requirements
  • A preexisting Linux installation, preferably something running a 2.6 kernel. In this case, we'll be running Red Hat Enterprise Linux 4 Enterprise Server Update 3.

  • At least 1GB of RAM

  • 40GB+ disk space available

  • (OPTIONAL) Multiple CPUs. Hyperthreading doesn't count in this case. The more, the better, since Xen 3.0 is capable of virtualized SMP for the guest operating system.

Guest Domain Requirements

  • A preexisting Linux installation, preferably something running either the same kernel version as the host-to-be or newer. More on this later in the page.

  • Some storage for the guest domain. An LVM-based partitioning scheme would be ideal, but you can use a file to back the storage for the machine.

Xen Hypervisor Installation Procedure

  1. Obtain the installation tarball from the XenSource download page. In this case, grab the one for RHEL4.

  2. Extract the tarball to a directory with sufficient space and follow the installation instructions that are provided by XenSource. For RHEL4, it is recommended that you force the upgrade of glibc and the xen-kernel RPMs. This will be explained in detail further in the page.

  3. Append the following to the grub.conf/menu.lst configuration file for the GRUB bootloader:

    title Red Hat Enterprise Linux ES-xen (2.6.16-xen3_86)
    root (hd0,0)
    kernel /xen-3.0.gz dom0_mem=192M
    module /vmlinuz-2.6-xen root=/dev/VolGroup00/LogVol00 ro console=tty0
    module /initrd-2.6-xen.img

    This might change depending on the version that is installed, but for the most part, using just the major versions should work. Details about the parameters will be explained later in the page.

  4. Reboot the machine with the new kernel.

The machine should now be running the Xen kernel.

Guest Domain Storage Creation Procedure

LVM Backed Storage

By default, RHEL4 (and basically any new Linux distribution that uses a 2.6 kernel by default) uses LVM (the Logical Volume Manager) to keep track of system partitions in a logical fashion. Two concepts matter in LVM: the logical volume and the volume group. The volume group consists of several physical disks that are grouped together during creation, with each volume group having a unique identifier. Logical volumes are then created on these volume groups, and each can be given a unique name. A logical volume takes its space from the pool available on the volume group, with whatever size and properties you specify. If you wish to learn more about LVM, a visit to the LVM HOWTO on the Linux Documentation Project site is recommended.

Physical Partition Backed Storage

Far easier to create than an LVM volume, but a little less flexible, physical partition backed storage for a guest machine just uses a system partition to store the data of the virtual machine. If you are using the paravirtualization approach for domain creation, this partition needs to be formatted with a filesystem that is supported by the host.

File-Backed Storage

By far the easiest way to get a guest domain up and running, a file-backed store for the guest allows you to put the file anywhere there is space. You don't have to give up any extra partitions in order to create the virtual machine. However, this incurs a performance penalty.

Guest Domain Installation Procedure

  1. Create an image tarball from the preexisting Linux installation for the guest. Use tar along these lines:

    tar --exclude=/<tarball> --exclude=/sys/* --exclude=/tmp/* --exclude=/dev/* --exclude=/proc/* -czpvf <tarball> /

    Here <tarball> is a placeholder for the name of the tarball file. Note that the excludes come before the short flags rather than after them. This is because the -f option takes the archive name as its argument, so the name must follow it immediately.

  2. Move the tarball over to the Xen hypervisor machine.

  3. Mount the desired location of the guest storage on the hypervisor.

  4. Unpack the tarball into the guest storage partition.

  5. Copy the modules for the Xen kernel into the guest's /lib/modules directory. You can use the following command to copy the modules directory, replacing <guest-mount> with the guest storage mount point:

    $ cp -r /lib/modules/`uname -r`/ <guest-mount>/lib/modules/

  6. Move the /lib/tls directory to /lib/tls.disabled for the guest. This operation is specific to Red Hat-based systems. Due to the way that glibc is compiled, the guest operating system will incur a performance penalty if this is not done. Skip this step on any non-Red Hat system.

Initial setup of the guest is completed.
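
For readers who would rather script the image creation in step 1, the same idea can be sketched with the tarfile module from Python's standard library. This is an alternative to the tar command in the guide, not part of it: `EXCLUDED_DIRS`, `skip_excluded` and `create_image` are names invented for this example, and the sketch assumes the tarball is written somewhere that is not itself archived (for example under one of the excluded directories).

```python
import os
import tarfile

# Directories whose contents are skipped, mirroring the --exclude patterns.
EXCLUDED_DIRS = ("sys", "tmp", "dev", "proc")


def skip_excluded(tarinfo):
    """Keep the excluded directories themselves but drop everything below them."""
    parts = tarinfo.name.split("/")
    if len(parts) > 1 and parts[0] in EXCLUDED_DIRS:
        return None  # returning None tells tarfile to leave this member out
    return tarinfo


def create_image(root, tarball_path):
    """Archive everything under root into a gzip-compressed tarball."""
    with tarfile.open(tarball_path, "w:gz") as tar:
        for entry in sorted(os.listdir(root)):
            # arcname makes member names relative to root, e.g. 'etc/hosts'
            tar.add(os.path.join(root, entry), arcname=entry, filter=skip_excluded)
```

A call such as create_image("/", "/tmp/guest.tar.gz") would need to run as root to read the whole filesystem; the path used here is only an example.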


Running With Xen

Creating and starting a guest domain

  1. Create a guest configuration file under /etc/xen. Use the following example as a guideline:

    kernel = "/boot/vmlinuz-2.6-xen"                # The kernel to be used to boot the domU
    ramdisk = "/boot/initrd-2.6.16-xenU.img" # Need the initrd, since most of these systems run udev

    memory = 256 # Base memory allocation
    name = "xmvm1" # Machine name
    cpus = "" # Specific CPUs to assign to the VM, leave blank
    vcpus = 1 # Number of CPUs available to the system
    vif = [ '' ] # Defines the virtual network interface

    # LVM-based storage
    disk = [ 'phy:VolGroup01/xenvm1-root,hda1,w', # Guest storage device mapping to the virtual machine
    'phy:VolGroup01/xenvm1-swap,hda2,w' ]

    root = "/dev/hda1 ro" # Root partition kernel parameter

  2. Mount the guest storage partition and edit the guest's /etc/fstab to reflect any changes made to the configuration file. Remove any extraneous mount points that the guest won't recognize when the system starts; otherwise the guest machine will not boot.

  3. Start the machine using the following command, replacing <config-file> with the guest configuration file created in step 1:

    $ xm create -c <config-file>

    This will create the machine and attach it to a virtual console. You can detach from the console using CTRL-].

Further setup is still required, but it is OS-specific. The network interfaces will need to be set up for the guest machine.

Python at UC Riverside

I interviewed a candidate for a QA position a couple of days ago; he had a bachelor's degree in Comp. Science from UC Riverside. I was happy to find out that Python is the main language taught there, along with C++. They participated in an XP-style project based on Python and Pygame. What's not to like?

One thing I don't really get, though, is that they didn't seem to put any emphasis on unit testing. They had short iterations, customer feedback, pair programming, but no unit tests. How can you teach XP without stressing the importance of unit tests? To me "XP with no unit tests" is an oxymoron, up there with "making soup in a sieve", or even -- dare I say -- "work on the Cheesecake project to keep the cuddly teddy-bear of an effbot happy" (extra Cheesecake points to anyone who can spot the multiple oxymorons in the last phrase.)

Update 06/11/06

Peter Fröhlich contacted me via email, and told me that, in all probability, the XP class I mentioned is one that he taught when he was at UCR. Peter pointed out that unit tests were mentioned in the class, and students were encouraged to use them, but were not forced to do so. I stand corrected in my assessment above. This shows that fact checking should be a practice more widely used by bloggers! :-)

I'm also happy to report that Peter uses Python in the classes he teaches at JHU too. I just wish more people in academia would follow his example.

Monday, June 05, 2006

Got brie?

Cheesecake/Summer of Code project: 1 down, 11 to go. The numbers represent weeks that are allocated for Google SoC projects this year, up to the August 21st deadline.

We decided to have 1-week iterations, which we entered as milestones in Trac. Each iteration consists of several stories which are entered as tickets of type 'enhancement' in Trac. Stories are estimated in points, with roughly 2 points per day. So a 4-point story is estimated at roughly 2 days of work. We'll keep the maximum number of points per iteration to 8, a bit less than the maximum velocity, but more realistic, because there's always something that comes up and needs to be taken care of. Plus, a story is not done if it's not well tested.
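
The arithmetic behind these estimates is simple enough to write down. The snippet below is just a sketch of the rules stated above (2 points per day, at most 8 points per iteration); the function names are made up, and the example uses the point values of the two brie stories described later in this post.

```python
POINTS_PER_DAY = 2             # "roughly 2 points per day"
MAX_POINTS_PER_ITERATION = 8   # cap chosen a bit below the maximum velocity


def estimated_days(points):
    """Convert story points into an estimate in working days."""
    return points / POINTS_PER_DAY


def fits_iteration(stories):
    """True if the stories' total points stay within the iteration cap."""
    return sum(stories.values()) <= MAX_POINTS_PER_ITERATION


brie = {"Bugfixes": 2, "Docstring index enhancements": 6}
print(estimated_days(4))      # prints 2.0, i.e. a 4-point story is about 2 days
print(fits_iteration(brie))   # prints True: 2 + 6 = 8 points
```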

Each story is split into tasks that take roughly a few hours each, and the tasks are entered as tickets of type 'task' in Trac (here are all the tickets we've entered so far, by milestone.)

To keep things fun, each iteration has a code name inspired by cheese varieties out of the Monty Python Cheese Shop skit. We assign code names in alphabetical order, skipping certain letters because of obscure/hard-to-remember cheese varieties for that letter -- such as appenzeller.

The first milestone/iteration was the delicious brie. I'm very pleased to report that Michał made great progress and completed the following 2 stories that were selected for this iteration:
  • Bugfixes (2 points)
    • Score is decreased for .pyc files; the same should be done for .pyo.
    • Filename checking should be a bit more restrictive than a regular expression check. Readme, README and readme are acceptable names, but ReAdMe is not.
    • Files that change the Cheesecake index cannot be empty.
  • Docstring index enhancements (6 points)
    • Use the latest pydoctor.
    • Check docstring contents: make sure they're not empty.
    • Check docstrings for use of epytext.
    • Check docstrings for use of ReST.
    • Check docstrings for use of Javadoc.
    • Write finer-grained unit tests for docstring index.
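
The three bugfix rules from the first story are easy to express as small predicates. The following is a sketch of the rules as described in the list, not the code that went into Cheesecake; the function names are hypothetical.

```python
import os

COMPILED_EXTENSIONS = (".pyc", ".pyo")


def is_compiled(filename):
    """True for byte-compiled files, which lower the score."""
    return filename.endswith(COMPILED_EXTENSIONS)


def has_acceptable_case(filename, name):
    """Accept readme, README and Readme, but not mixed case such as ReAdMe."""
    stem = os.path.splitext(filename)[0]
    return stem in (name.lower(), name.upper(), name.capitalize())


def counts_toward_index(path):
    """A file only changes the index if it exists and is not empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0
```
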
I think that splitting the workload into stories, and stories into well-defined tasks, has helped the development and testing process a lot, while keeping the project light (or agile, if you prefer) on project management overhead. Another thing I like about Trac is that it shows closed ticket numbers with a strike-through line, so by enumerating the tasks that a story depends on -- like here -- we can see at a glance when the story has been completed.

We also have a 'product backlog' of sorts: the SummerOfCode06 Wiki page where we jot down ideas about things to implement next. Stories for the next iterations will be chosen out of the pool of stories that are already there. The rough month-to-month schedule is: June is mostly about Cheesecake 'core' enhancements, July will be dedicated to the PyPI integration, and August to investigating and implementing ways to run Cheesecake in a VM/sandbox. Things might shift a bit, but having small iterations and frequent releases will help in adapting to changes that will undoubtedly occur. Speaking of releases, we're planning on releasing Cheesecake itself around the end of June.

Also check out Michał's SoC blog -- aptly named Mousebender -- for more details on what he's working on.

I encourage people interested in this project to start posting to the cheesecake-dev mailing list. The more feedback we get from the community, the more we can fine-tune the various Cheesecake index measurements.

And speaking of feedback: some people expressed their displeasure at having the Cheesecake scores up on PyPI, thinking that this would turn PyPI into a 'hall of shame'. A good compromise might be to make the Cheesecake score for a given PyPI package visible only to the creator of the package (Richard Jones's idea), while showing the top N or top X% packages in each index category to everybody, so that people can have practical examples of packages that scored high (my idea). This would avoid the whole 'public hall of shame' controversy, and turn Cheesecake+PyPI into a 'public hall of fame'. Michał is also thinking about adding links to explanatory pages next to PyPI scores, so that people understand what counts for the score, why it's worth improving it and how to improve it.

Here also is some great feedback from Will Guaraldi. We'll certainly start working on this very soon:

"Maybe create a --recommend option that for each category if the score is less than half (or something along those lines), Cheesecake spits out a short blurb about what the category score means and a url for where to go for more information about how to fix it. Then at that url (it could point to a specific page on this wiki) are resources regarding that category.

For example, somewhere else on this wiki I asked about where I might find documentation on what kind of information should go into files like README, CHANGELOG, ... That information would be really useful to have for all the categories.

If you need help--I'd be happy to help build such documentation. If you we put the information in the wiki, then it can grow and evolve over time which would also be useful because it might provide feedback into those specific categories and tests."
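
One way Will's suggestion could work is sketched below. Nothing here exists in Cheesecake yet: the `recommend` function, the category names in the usage example, the blurbs and the example.org URLs are all placeholders invented to illustrate the "less than half" rule from his comment.

```python
def recommend(scores, blurbs, threshold=0.5):
    """Return advice lines for categories scoring below threshold * maximum.

    scores maps a category name to a (score, maximum) pair;
    blurbs maps a category name to a (blurb, url) pair.
    """
    lines = []
    for category, (score, maximum) in scores.items():
        if score < threshold * maximum:
            blurb, url = blurbs[category]
            lines.append(f"{category}: {score}/{maximum}. {blurb} See {url}")
    return lines


# Hypothetical data, for illustration only.
scores = {"installability": (40, 50), "documentation": (20, 100)}
blurbs = {
    "installability": ("Measures how easily the package installs.",
                       "http://example.org/wiki/Installability"),
    "documentation": ("Measures files such as README and CHANGELOG.",
                      "http://example.org/wiki/Documentation"),
}
for line in recommend(scores, blurbs):
    print(line)
```

Keeping the blurbs and URLs in a separate mapping would let the wiki pages evolve without touching the scoring code.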

Anyway, for now, enjoy the brie and get ready for some camembert!

Friday, June 02, 2006

Cheesecake mailing lists

If you are interested in the Cheesecake project, you can now subscribe to two mailing lists: cheesecake-dev and cheesecake-users (thanks, Titus!)

In the near future, most of the discussions will take place on cheesecake-dev, since the Cheesecake/SoC project is in full swing.

Michał and I already posted a couple of threads with feedback that we got from the CommentsPage on the Trac Wiki. We'd love to get more feedback, so please don't spare us! You can also add ideas to the SummerOfCode06 Wiki page.

Thursday, June 01, 2006

Sparklines and sparkplot

Via Darren Rowse: how infosthetics is using sparklines to display its relative daily Google AdSense earnings. Looks like they're using the PHP sparkline library, but hey, you can always give my sparkplot module a try too! Maybe you want to restore romance to the sports page? Then sparkplot might just be the ticket :-)

Several people expressed interest recently in sparkplot, so I guess it's time for me to dust it off a bit and release it in the wild. Stay tuned.

Treasure trove: AYE conference articles

If you're interested in Amplifying Your Effectiveness, then you'll enjoy these articles from the AYE conference, written by AYE hosts and guests who explore both the technical and the human sides of software and IT development. Also check out the links to various blogs on the AYE conference page. To summarize in 3 short words: Jerry Weinberg rulez :-)
