Tuesday, June 21, 2005

Keyword-based logging with the py library

I've been experimenting lately with the logging functionality that was recently introduced in the py library. Most of it is still at an early, experimental stage, but the main ideas are pretty well defined. I want to review some of these ideas here and hopefully get some feedback.

Holger Krekel, the main py library developer, introduced the idea of keyword-based logging and of logging producers and consumers. A log producer is an object used by the application code to send messages to various log consumers. The consumers can be STDOUT, STDERR, log files, syslog, the Windows Event Log, etc. When you create a log producer, you define a set of keywords that are then used both to route the logging messages to consumers and to prefix those messages.

Here's a quick example of code that works with the current svn version of the py library (which you can get from http://codespeak.net/svn/py/dist). I saved the following code in a file called logmodule.py:
# logmodule.py

import py

log = py.log.Producer("logmodule")

def runcmd(cmd):
    log.cmd(cmd)
    # real code follows

def post2db(data):
    log.post2db(data)
    # real code follows

if __name__ == "__main__":
    runcmd("Running command CMD1")
    post2db("PASS")
I started by instantiating a log object of class py.log.Producer with the keyword "logmodule". Since I didn't associate any consumer with the log producer, a default consumer is used, which prints all messages to STDOUT.

The log producer object does not have any pre-defined methods associated with it. Instead, you can invoke arbitrarily-named methods. When accessing a brand new method (such as log.cmd or log.post2db), another producer object is returned on the fly, with the name of the method added as a keyword to the existing set of keywords. When calling the returned object as a function with the log message as the argument, the keywords are looked up to see if there is any log consumer associated with them. If no consumer is found, the default one is used and the message is printed to STDOUT.

Running the code above produces:
[logmodule:cmd] Running command CMD1
[logmodule:post2db] PASS
What happened behind the scenes when invoking log.cmd("Running command CMD1") was that a log producer object called log.cmd was created with ('logmodule', 'cmd') as the set of keywords. When calling the object as a function with the message as the argument, the message was by default printed to STDOUT, since no consumer was found to be associated with those keywords.
The message was also prefixed with the keywords, joined by ":".
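
To make the mechanism more concrete, here is a minimal sketch of how such a keyword-accumulating producer could work. This is illustrative only, not the actual py.log implementation; the class name Producer is reused, but everything else is my own simplification:
# producer_sketch.py -- illustrative only, not the actual py.log code

class Producer(object):
    def __init__(self, keywords):
        # accept a space-separated string or a tuple of keywords
        if isinstance(keywords, str):
            keywords = tuple(keywords.split())
        self.keywords = keywords

    def __getattr__(self, name):
        # accessing log.cmd returns a new producer with 'cmd' appended
        return Producer(self.keywords + (name,))

    def __call__(self, msg):
        # the real library looks up a consumer for self.keywords here;
        # this sketch just prefixes the message and prints to STDOUT
        print "[%s] %s" % (":".join(self.keywords), msg)

log = Producer("logmodule")
log.cmd("Running command CMD1")   # prints: [logmodule:cmd] Running command CMD1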

Now let's assume we don't want any messages to be logged when using the logmodule module directly. To achieve this, we associate a null consumer with the "logmodule" keyword by adding the following line somewhere at the top of our module:
py.log.setconsumer("logmodule", None)
(NOTE: the name of the setconsumer function might change, but the idea will be the same, namely that any keyword or set of keywords can be associated with log consumers).

Now, when we invoke log.cmd("Running command CMD1"), the keyword matching logic will first look for a consumer associated with ('logmodule', 'cmd'), since this is the complete set of keywords. If it doesn't find it, it looks for a consumer associated with 'logmodule'. In our case, it will find the consumer we designated (None), so it will use that consumer and not log the message anywhere.
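
The lookup itself can be sketched in a few lines. Again, this is a simplification of the idea rather than the library's actual code, and the names consumers and getconsumer are hypothetical:
# keyword-matching sketch -- hypothetical names, not the py.log internals

consumers = {}   # maps keyword tuples to consumer functions (None = discard)

def default_consumer(msg):
    # fallback when no consumer is registered: print to STDOUT
    print msg

def getconsumer(keywords):
    # try the full keyword tuple first, then progressively shorter
    # prefixes: ('logmodule', 'cmd') before ('logmodule',)
    for i in range(len(keywords), 0, -1):
        if keywords[:i] in consumers:
            return consumers[keywords[:i]]
    return default_consumer
With this matching rule, registering None for "logmodule" silences every producer whose keywords start with 'logmodule', unless a more specific consumer is registered for a longer keyword tuple.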

The simple mechanism of associating log consumers with keywords offers a tremendous amount of flexibility in configuring logging for an application. The association of keywords with consumers can be done at run time, via a configuration file or via command-line arguments, and the application code doesn't need to know where logging gets routed. As an example, let's assume that we want to use the two top-level functions runcmd and post2db from the logmodule module, and at the same time we want to log messages to other consumers. Here is some code I saved in a file called use_logmodule.py:
# use_logmodule.py

import py
from logmodule import runcmd, post2db

py.log.setconsumer("logmodule", py.log.STDERR)
py.log.setconsumer("logmodule post2db", None)

runcmd("Running command CMD2")
post2db("FAIL")
My goal was to print all messages to STDERR by default, and not to print any messages associated with the "post2db" keyword (in a real-life situation, those messages could be associated with some database-management code, for example).

To achieve my goal, I associated a consumer that uses STDERR with the keyword "logmodule" and a "/dev/null"-type consumer (i.e. None) with the keywords "logmodule post2db". Now when I run the above code, I get the following message printed to STDERR:
[logmodule:cmd] Running command CMD2
I could just as easily have routed the logging messages to a log file by associating a consumer of type py.log.Path (currently there are three types of pre-defined consumers offered by the py.log package: py.log.STDOUT, py.log.STDERR and py.log.Path; in the future, they will be augmented with other types of consumers such as syslog, Windows Event Logs, etc.). Here's how to log all "post2db"-related messages to a file:
py.log.setconsumer("logmodule post2db", py.log.Path("/tmp/logmodule.out"))
When running the code in use_logmodule.py, the following line will be printed to /tmp/logmodule.out:
[logmodule:post2db] FAIL
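
Since these keyword-to-consumer associations are plain run-time calls, nothing stops you from driving them from an external configuration source. Here is a hedged sketch of what that might look like; the config file format and the configure_logging helper are my own invention, not part of py.log:
# configure_logging.py -- hypothetical config-driven routing, not a py.log feature

import py

def configure_logging(configfile):
    # each non-comment line maps keywords to a destination, e.g.:
    #   logmodule         -> stderr
    #   logmodule post2db -> /tmp/logmodule.out
    for line in open(configfile):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keywords, dest = [s.strip() for s in line.split("->")]
        if dest == "stdout":
            consumer = py.log.STDOUT
        elif dest == "stderr":
            consumer = py.log.STDERR
        elif dest == "none":
            consumer = None
        else:
            consumer = py.log.Path(dest)   # anything else is a file path
        py.log.setconsumer(keywords, consumer)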
A question you might have at this point is how all this relates to a more traditional use of logging, based on severity levels such as debug, info, warn, error and critical. Let's assume we want our application code to issue calls to log.debug, log.info and log.error, while being able at the same time to direct these various severity levels to different logging mechanisms. We could, for example, log debug messages to a log file, informational messages to the console (STDOUT or STDERR), and error messages to both a log file and the console. Here's one way to achieve this via the py.log keyword-based approach. I saved the following code in a file called config.py:
# config.py

import sys
import py

log = py.log.Producer("")
log.debug = py.log.Producer("logfile")
log.info = py.log.Producer("console")
log.error = py.log.Producer("console logfile")

logfile = "/tmp/myapp.out"
py.log.setconsumer("logfile", py.log.Path(logfile))
py.log.setconsumer("console", py.log.STDOUT)

l = open(logfile, 'a', 1)

def console_logfile_logger(msg):
    print >>sys.stdout, str(msg)
    print >>l, str(msg)

py.log.setconsumer("console logfile", console_logfile_logger)
Most of this stuff should look familiar by now. I instantiated an object called log of class Producer, with no keywords associated with it. This object will serve as the default producer object, so the application will be able to call log(some_msg) and have some_msg printed to STDOUT by default. Next, I created the debug, info and error attributes for the log object. This will enable me to call log.debug(msg), log.info(msg) and log.error(msg) in my application code, without necessarily caring where those messages get routed. As it stands right now in the configuration file, the log.debug object is declared as a log.Producer object associated with the keyword "logfile". Similarly, the log.info object is associated with the keyword "console" and the log.error object is associated with both keywords "console logfile".

The next step is defining consumers for the various keywords. For "logfile" I set a consumer of type Path, pointing to the file /tmp/myapp.out. For "console" I set the STDOUT consumer, and for "console logfile" I set a custom consumer that prints its argument (the message to be logged) to STDOUT and also appends it to the log file. As an aside, I want to note that consumers are simply functions that take a message as an argument and handle it as they see fit (if you look at the testing code for the py.log module in the py library, you'll see, for example, consumers that simply append the message to a list).
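
Capturing messages in a test could then be as simple as the following sketch (the expected string rendering is based on the prefixing behavior shown earlier):
# capture_test.py -- a minimal list-appending consumer

import py
from logmodule import runcmd

captured = []
py.log.setconsumer("logmodule", captured.append)

runcmd("Running command CMD1")
print str(captured[0])   # should render as: [logmodule:cmd] Running command CMD1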

Here is an example of application code that uses the above config.py file:
# appcode.py

import config

log = config.log

log.debug("DEBUG MESSAGE")
log.info("INFO MESSAGE")
log.error("ERROR MESSAGE")
log.warn("WARNING MESSAGE")
Running appcode.py produces:
[console] INFO MESSAGE
[console:logfile] ERROR MESSAGE
[warn] WARNING MESSAGE
At the same time, the file /tmp/myapp.out contains:
[logfile] DEBUG MESSAGE
[console:logfile] ERROR MESSAGE
Note that we didn't declare log.warn anywhere. It was dynamically transformed by py.log into a producer object with keyword "warn" and printed by default to STDOUT, since the "warn" keyword didn't have any consumer associated with it.

I didn't want to complicate things too much in the code above, but in my opinion it would be better to have 'debug', 'info' and 'error' among the keywords that prefix each message, so that you can see at a glance what severity level the message has. To achieve that, you would change the declarations for log.debug, log.info and log.error in config.py to:
log.debug = py.log.Producer("logfile").debug
log.info = py.log.Producer("console").info
log.error = py.log.Producer("console logfile").error
or alternatively to:
log.debug = py.log.Producer("logfile debug")
log.info = py.log.Producer("console info")
log.error = py.log.Producer("console logfile error")
Now if you run appcode.py, you'll see this printed on the screen:
[console:info] INFO MESSAGE
[console:logfile:error] ERROR MESSAGE
[warn] WARNING MESSAGE
This also makes it easier to turn off debugging, for example. All you need to do is associate the null consumer with the keywords "logfile debug":
py.log.setconsumer("logfile debug", None)
Note that the consumer associated with the single keyword "logfile" is still the Path consumer that prints to /tmp/myapp.out, so non-debug messages would still get routed to that file.

In conclusion, I think that the keyword-based approach works really well for the logging needs of any application. I emphasize once more that the py.log code is still in its infancy and as such is prone to changes (including naming changes), so I wouldn't rely on py.log just yet for production code. But the main idea of associating log producers with log consumers via keywords should stay the same, and this, I think, is the novelty of the approach.

One enhancement that I've discussed with Holger is a way to specify that each of multiple keywords should be routed to a different consumer. So, for example, if you declared
log.error = py.log.Producer("console logfile")
a call to log.error(some_msg) would try to match the "console" keyword individually to a consumer, and the "logfile" keyword to another consumer.

Right now, the keyword matching logic only looks for a single consumer that matches either ("console", "logfile") or just ("console",), so you need to declare a custom consumer such as the console_logfile_logger function I described above.

In the multiple-keyword approach, you wouldn't need to do that, instead you would just associate "logfile" with a consumer of type Path, "console" with a consumer of type STDOUT/STDERR, and both consumers would then receive the message when log.error(some_msg) was called.
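
A hedged sketch of what such a dispatch might look like (hypothetical, since this enhancement does not exist yet; getconsumer is the lookup helper sketched earlier):
# hypothetical multi-keyword dispatch -- NOT current py.log behavior

def dispatch(keywords, msg):
    # match each keyword individually instead of looking for a single
    # consumer for the whole tuple, and send the message to every
    # consumer found; None still means "discard"
    for kw in keywords:
        consumer = getconsumer((kw,))
        if consumer is not None:
            consumer(msg)
With this scheme, log.error(some_msg) with keywords ("console", "logfile") would deliver some_msg to both the STDOUT consumer and the Path consumer, with no custom glue function needed.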

If you think you can use the keyword-based approach to logging in your own application code, or if you have a use case that you would like to see implemented via py.log, or if you have any suggestions at all, please leave a comment here or send email to py-dev at codespeak dot net.

4 comments:

Ian Bicking said...

I haven't been able to keep up with the discussion on the mailing list, so thanks for summarizing the state here. This is definitely something I'll be looking into using with Paste, since I have been putting off using the logging module for some time, even though I know I need some kind of logging. This seems much more approachable.

Some things I'll probably be thinking about are using logging to replace print statements (with consumers that show the accumulated log directly on web pages) and using logging for other kinds of reports (like unexpected exceptions or warnings -- both of which should produce the same kind of report, but one of which aborts the request).

I think a hierarchy of keywords would be useful to define. For instance, error is-a warning (is-a info, etc.), which could be used to create logging levels and ranges.

Anonymous said...

Hi,

I very much like the idea of using method names to define logging channels.

I'd love to be able to do something like this:

log.cmd.debug("Debug message")
log.cmd.error("Error message")

or even

log.cmd("Debug message")
log.cmd.error("Error message")

And then later, be able to filter out log messages based on keywords used.

Thanks!

Anonymous said...

Maybe you can take a look at log4j and all its offspring, among them log4py.

The producer/consumer pattern was applied there first, with loggers (producers) and appenders (consumers).

You can create as many logger objects as you want (per module, or per client connection, or whatever).

An XML config file hooks loggers to one or more appenders (console, file, db, udp, etc.)

Loggers are hierarchical: disabling a high-level logger (e.g. module_A_logger) disables all child loggers (module_A_init, etc.).

Finally, tagged logging was introduced there as well.

Very nice. I hate Java, but it's one of the few cases where a really superior tool was designed in Java first, then ported to most other general-purpose languages.

For my part, I'm gonna try that py logging; thank you for pointing it out.

Ian Bicking said...

Anonymous: log4j was what inspired the standard Python logging module. It's been a tremendous non-success. Maybe it's not fair, and maybe that module deserves more respect than it gets, but empirically it's fairly easy to see that the module just isn't being used.
