Monday, June 05, 2006

Got brie?

Cheesecake/Summer of Code project: 1 down, 11 to go. The numbers represent weeks that are allocated for Google SoC projects this year, up to the August 21st deadline.

We decided to have 1-week iterations, which we entered as milestones in Trac. Each iteration consists of several stories, which are entered as tickets of type 'enhancement' in Trac. Stories are estimated in points, at roughly 2 points per day, so a 4-point story is estimated at roughly 2 days of work. We'll cap each iteration at 8 points -- a bit less than the theoretical maximum velocity of 10 points for a 5-day week, but more realistic, because something always comes up and needs to be taken care of. Plus, a story is not done if it's not well tested.

Each story is split into tasks that take roughly a few hours each, and the tasks are entered as tickets of type 'task' in Trac (here are all the tickets we've entered so far, by milestone).

To keep things fun, each iteration has a code name inspired by the cheese varieties in the Monty Python Cheese Shop skit. We assign code names in alphabetical order, skipping letters whose cheeses are obscure or hard to remember -- appenzeller, for example.

The first milestone/iteration was the delicious brie. I'm very pleased to report that Michał made great progress and completed the following 2 stories that were selected for this iteration:
  • Bugfixes (2 points)
    • The score is decreased for .pyc files; the same should be done for .pyo files.
    • Filename checking should be a bit more restrictive than a plain regular expression match: Readme, README and readme are acceptable names, but ReAdMe is not (see the sketch after this list).
    • Files that affect the Cheesecake index cannot be empty.
  • Docstring index enhancements (6 points)
    • Use the latest pydoctor.
    • Check docstring contents: make sure they're not empty.
    • Check docstrings for use of epytext.
    • Check docstrings for use of ReST.
    • Check docstrings for use of Javadoc.
    • Write finer-grained unit tests for the docstring index.
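
To make the filename and docstring checks above a bit more concrete, here is a minimal Python sketch of the kind of logic involved. This is just an illustration, not the actual Cheesecake code -- the function names, regular expressions and accepted extensions are made up, and the real docstring index work is certainly more thorough.

    import re

    # Hypothetical sketch only -- not the actual Cheesecake code. It just
    # illustrates the two kinds of checks described in the stories above.

    # 1. Stricter filename check: accept only sensible capitalizations of the
    #    readme file, instead of a case-insensitive match that would also let
    #    oddities like 'ReAdMe' through. Accepted extensions are made up here.
    ACCEPTABLE_README_NAMES = ('readme', 'Readme', 'README')

    def is_acceptable_readme(filename):
        base = re.sub(r'\.(txt|rst|html)$', '', filename)  # tolerate an extension
        return base in ACCEPTABLE_README_NAMES

    # 2. Rough docstring check: flag empty docstrings and guess at the markup
    #    style (epytext/Javadoc-like '@param' fields vs. reST ':param:' fields).
    def docstring_format(docstring):
        if not docstring or not docstring.strip():
            return 'empty'
        if re.search(r'@(param|return|type)\b', docstring):
            return 'epytext/Javadoc'
        if re.search(r':(param|returns?|rtype)\b[^:]*:', docstring):
            return 'reST'
        return 'plain'

    assert is_acceptable_readme('README')
    assert is_acceptable_readme('Readme.txt')
    assert not is_acceptable_readme('ReAdMe')
    assert docstring_format('Add two numbers.\n\n:param a: first addend') == 'reST'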
I think that splitting the workload into stories, and stories into well-defined tasks, has helped the development and testing process a lot, while keeping project-management overhead light (or agile, if you prefer). Another thing I like about Trac is that it shows closed ticket numbers with a strike-through line, so by enumerating the tasks that a story depends on -- like here -- we can see at a glance when the story has been completed.

We also have a 'product backlog' of sorts: the SummerOfCode06 Wiki page where we jot down ideas about things to implement next. Stories for the next iterations will be chosen out of the pool of stories that are already there. The rough month-to-month schedule is: June is mostly about Cheesecake 'core' enhancements, July will be dedicated to the PyPI integration, and August to investigating and implementing ways to run Cheesecake in a VM/sandbox. Things might shift a bit, but having small iterations and frequent releases will help in adapting to changes that will undoubtedly occur. Speaking of releases, we're planning on releasing Cheesecake itself around the end of June.

Also check out Michał's SoC blog -- aptly named Mousebender -- for more details on what he's working on.

I encourage people interested in this project to start posting to the cheesecake-dev mailing list. The more feedback we get from the community, the more we can fine-tune the various Cheesecake index measurements.

And speaking of feedback: some people expressed their displeasure at having the Cheesecake scores up on PyPI, thinking that this would turn PyPI into a 'hall of shame'. A good compromise might be to make the Cheesecake score for a given PyPI package visible only to the creator of the package (Richard Jones's idea), while showing the top N or top X% packages in each index category to everybody, so that people can have practical examples of packages that scored high (my idea). This would avoid the whole 'public hall of shame' controversy, and turn Cheesecake+PyPI into a 'public hall of fame'. Michał is also thinking about adding links to explanatory pages next to PyPI scores, so that people understand what counts for the score, why it's worth improving it and how to improve it.

Here also is some great feedback from Will Guaraldi. We'll certainly start working on this very soon:

"Maybe create a --recommend option that for each category if the score is less than half (or something along those lines), Cheesecake spits out a short blurb about what the category score means and a url for where to go for more information about how to fix it. Then at that url (it could point to a specific page on this wiki) are resources regarding that category.

For example, somewhere else on this wiki I asked about where I might find documentation on what kind of information should go into files like README, CHANGELOG, ... That information would be really useful to have for all the categories.

If you need help, I'd be happy to help build such documentation. If we put the information in the wiki, then it can grow and evolve over time, which would also be useful because it might provide feedback into those specific categories and tests."
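
Just to make the idea concrete, here is a rough sketch of how such a --recommend option might behave. Everything here is hypothetical -- the category names, blurbs, scores and wiki URLs are made up for illustration, and the real option (once we implement it) would pull this information from the actual index definitions and the Cheesecake wiki.

    # Hypothetical sketch of the suggested --recommend behaviour; not part of
    # Cheesecake yet. Category names, blurbs, scores and URLs are made up.
    CATEGORY_HELP = {
        'installability': ('Can the package be downloaded, unpacked and installed?',
                           'http://example.org/wiki/Installability'),
        'documentation':  ('Are README, CHANGELOG and docstrings present and non-empty?',
                           'http://example.org/wiki/Documentation'),
    }

    def recommend(scores, max_scores):
        # For every category scoring under half its maximum, print a short
        # blurb explaining the category plus a URL with more details.
        for category, score in sorted(scores.items()):
            maximum = max_scores[category]
            if score < maximum / 2.0:
                blurb, url = CATEGORY_HELP.get(category, ('(no description yet)', ''))
                print('%s: %d/%d -- %s' % (category, score, maximum, blurb))
                print('    see %s' % url)

    recommend({'installability': 15, 'documentation': 5},
              {'installability': 20, 'documentation': 30})

The wiki pages those URLs point to could then document what a good README or CHANGELOG looks like, along the lines Will suggests above.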

Anyway, for now, enjoy the brie and get ready for some camembert!

1 comment:

PJE said...

PyPI already is a hall of shame for many packages; IMO we might as well make it official. ;-)

Seriously, not showing *all* the scores publicly isn't as good an idea, because people can't tell how they compare to the norm. It's a lot easier for somebody to just assume that "everybody else's score is probably just as bad".

Mainly, though, the purpose of showing the score is to benefit the *users*, not the *authors*. The authors can run Cheesecake before uploading if they're worried about being embarrassed. Showing the score allows users not to waste their time on things that can't be installed or in some cases even downloaded.
