Tuesday, March 02, 2010

Getting past hung remote processes in Fabric

This is a note for myself, but maybe it will be useful to other people too.

I've been using Fabric 1.0a lately, and it's been working very well, with one important exception: when a 'run' command launches a remote process that daemonizes itself, the 'run' call hangs, and I have to forcefully kill the 'fab' process on the server where I run the 'fab' commands.

I vaguely remembered reading something on the Fabric mailing list about the ssh channel not being closed properly, so I hacked the source code (operations.py) to close the channel before waiting for the stdout/stderr capture threads to exit.

Here's my diff:
git diff fabric/operations.py 
diff --git a/fabric/operations.py b/fabric/operations.py
index 9c567c1..fe12450 100644
--- a/fabric/operations.py
+++ b/fabric/operations.py
@@ -498,12 +498,13 @@ def _run_command(command, shell=True, pty=False, sudo=False, user=None):
     # Close when done
     status = channel.recv_exit_status()
     
+    # Close channel
+    channel.close()
+
     # Wait for threads to exit so we aren't left with stale threads
     out_thread.join()
     err_thread.join()

-    # Close channel
-    channel.close()

     # Assemble output string
     out = _AttributeString("".join(capture_stdout).strip())
I realize this is a hack, but it solved my particular problem. If you've seen this and have found a different solution, please leave a comment.
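
To see why the order of the two steps in the diff matters, here is a toy model in plain Python. It is only a sketch: FakeChannel, capture and run_and_collect are made-up names for illustration, not Fabric or paramiko code, and the real channel is a network object rather than a queue. The assumption it encodes is the one from the post: the capture thread reads until it sees end-of-stream, and a daemonized remote process never sends one on its own.

```python
import queue
import threading


class FakeChannel:
    """Toy stand-in for an SSH channel (hypothetical, not a real Fabric/paramiko class)."""

    def __init__(self):
        self._data = queue.Queue()

    def feed(self, chunk):
        # Simulates output arriving from the remote side.
        self._data.put(chunk)

    def recv(self):
        # Blocks until data arrives or close() is called; "" means end-of-stream.
        return self._data.get()

    def close(self):
        # Closing the channel is what finally delivers end-of-stream to readers.
        self._data.put("")


def capture(channel, sink):
    # Mimics an output-capture thread: keep reading until end-of-stream.
    while True:
        chunk = channel.recv()
        if not chunk:
            break
        sink.append(chunk)


def run_and_collect(close_first, join_timeout=0.2):
    """Return (captured_output, thread_was_stuck_at_join)."""
    channel = FakeChannel()
    captured = []
    thread = threading.Thread(target=capture, args=(channel, captured), daemon=True)
    thread.start()
    channel.feed("starting daemon\n")
    # The "remote daemon" keeps the stream open, so no end-of-stream arrives by itself.
    if close_first:
        channel.close()            # patched order: close, then join
    thread.join(join_timeout)      # a timeout stands in for "hangs forever"
    stuck = thread.is_alive()
    if not close_first:
        channel.close()            # original order: close only after the join
        thread.join()
    return "".join(captured), stuck
```

Calling run_and_collect(close_first=False) reports that the join was still waiting when the timeout expired, which is the toy version of the hang. With close_first=True the thread sees end-of-stream right away and the join returns immediately, with the output intact.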

3 comments:

Ian Bicking said...

Often hangs like these can be avoided if the thread is created in daemon mode. Well... sometimes. This at least feels similar to hung thread problems with an HTTP server, except you only really notice those when you try to abort the server (which is maybe your problem too?)

Wes Winham said...

If your command is hanging, it's probably because your command doesn't close its input/output streams. To test, try running:
ssh user@myserver.com -C "/etc/init.d/hanging_command start"

If it hangs like that, then it's not fabric's problem specifically. If rewriting your script to properly handle input and output isn't an option or is just too difficult, an easy workaround that I use a lot is to call the command from fabric like:

run("/etc/init.d/hanging_command start < /dev/null &> /dev/null")

I'm sure someone who is a real sysadmin would understand this more than I do, but that's what's been working for me. /etc/init.d/rabbitmq-server is one command that I always seem to need to do this for.
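
The effect of that redirection can be reproduced locally without Fabric or ssh. The sketch below is an illustration under stated assumptions: a POSIX 'sh' and 'sleep' are available, 'sleep 2' stands in for the daemon, and timed_launch, HOLDS_STREAMS and DETACHED are made-up names. It uses the portable spelling '> /dev/null 2>&1' because '&>' is a bash extension.

```python
import subprocess
import time

# A background process that inherits the caller's stdout/stderr pipes.
HOLDS_STREAMS = "sleep 2 &"
# The same process, with all three standard streams pointed at /dev/null.
DETACHED = "sleep 2 < /dev/null > /dev/null 2>&1 &"


def timed_launch(command):
    """Run `command` through sh, capturing its output, and return the elapsed seconds."""
    start = time.monotonic()
    subprocess.run(
        ["sh", "-c", command],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,   # like a remote runner, we read until end-of-file
        stderr=subprocess.PIPE,
    )
    return time.monotonic() - start
```

timed_launch(HOLDS_STREAMS) takes about two seconds even though the shell itself exits at once, because the backgrounded 'sleep' still holds the write end of the pipes and the reader is waiting for end-of-file. timed_launch(DETACHED) returns almost immediately, since nothing is left holding the pipes open.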


Grig Gheorghiu said...

Ian and Wes -- thanks for the comments. In my case the processes I'm launching remotely are daemonized tornado processes whose stdout/stderr is captured to a pipe going to the logger utility, then to syslog. Fairly complex, and maybe this is what trips Fabric up. I'll clean it up a little by not using logger.
