(This is based on a 15 minute talk for the London Python Code Dojo – slides available from SlideShare)

My interest in FluidDB began earlier this year when I attended a talk by Nicholas Tollervey at the London Clojure Dojo. I was expecting yet another talk about yet another non-relational database, but what I discovered was something different. The idea of a shared database storing “things” which anyone could tag with data seemed to be a rather powerful concept, yet simple and elegant. I thought it was a very cool and interesting idea.

But there was a problem.

How could people actually explore the “Fluidverse”? While people using FluidDB are building up conventions, such as the naming and content of tags, and there are tools out there to drill down through the hierarchies of tags and namespaces… there had to be an easier way to find the tags that were of interest to me. I decided I needed data in order to begin finding ways to explore… but what data?

So, while pondering the idea one evening, I was listening to the band Napalm Death and realised I had the answer. One of the things I love to do is find new bands, particularly extreme metal ones, and one way I do this is follow links between bands on Wikipedia. Wikipedia is a great source of band biographies, the content is under a Creative Commons license, and the band biographies often have lists of related bands and genres.

This seemed like a really good starting point to take data I’m interested in, build relationships between the data and give me something to start exploring with. I hacked together a scraper which used Napalm Death as a starting point and branched outwards in a “six degrees of separation” way, initially dumping the information directly into the FluidDB sandbox.

After a few runs, it made more sense to scrape to an intermediate file, and load that instead – allowing me to clean up typos, adjust names, amend tags and also allow me to regenerate the data in FluidDB’s sandbox without having to keep hitting Wikipedia. An example of the output format is as follows:

band:Burzum
  metaljoe/music/band_name = Burzum
  metaljoe/music/source_url = http://en.wikipedia.org/wiki/Burzum
  metaljoe/music/genre/black_metal -> Black metal
  metaljoe/music/genre/dark_ambient -> Dark ambient
  metaljoe/music/related_bands = ['Darkthrone', 'Mayhem', 'Old Funeral']

I wasn’t planning to release the source code, but have had some interest in it so I’ve decided to release it under the MIT license. You can find the code on my BitBucket account: http://bitbucket.org/metaljoe/fluidinyourear-scraper – note this is just the scraper code, not the loader.

With the data in place, I then needed to build something for exploring the relationships in the data. Enter “Fluid In Your Ear”, a very simple web application built around Python, Django and the excellent FOM (Fluid Object Manager) created by Ali Afshar. Given the nature of the bands, there is also a liberal application of Heavy Metal Umlauts – the power of which, courtesy of a particular Black Metal band, managed to crash the FluidDB sandbox a few times by exposing a unicode bug.

The application is deliberately very simple. I’m not a graphics genius (painting with real acrylic paints is my field), and at the moment it’s a basic core – you can browse genres and bands, and explore relationships between the two. I’ve already discovered some new bands through following the links, and re-discovered some older ones.

Due to the six-degrees nature, there is quite a lot that doesn’t fit into a metal or punk category which is quite cool. I’ve encountered a jazz musician called John Zorn who has crossed into hardcore punk and grindcore, to produce some outstanding music I would probably not have found before.

The source code is pretty grotty and the first casualty was a lack of tests. Shocking. In order to improve my confidence in the code and make it easier to refactor, I added unit tests using Django’s test harness and some functional testing using the Twill web testing framework. An example of the Twill test code is as follows:

# test missing genre
go http://127.0.0.1:8000/genre/progressive_vegetarian_grindcore
code 404

# test with trailing slash
go http://127.0.0.1:8000/genre/jazz/
code 200

# test without trailing slash
go http://127.0.0.1:8000/genre/jazz
code 200

# check page contents
find '<h2>Jazz</h2>'
find '<div id="related_bands">'
find '<li><a href="/band/Frank%20Zappa">Frank Zappa</a></li>'

So where next?

Well, first off is to get the application online so my plan was to port to Google App Engine. Unfortunately, I hit a few snags with the fact my app runs Django 1.2 and App Engine is using 1.1. I considered bundling Django in the app, but it became obvious that I’m not really using much of Django’s functionality – some URL routing and templates. The creator of FOM introduced me to Flask, a lightweight web framework, and it looks perfect for my needs. So I’m going to port to Flask and Google App Engine at the same time.

Another thing I want to do is add a JavaScript “social” layer over the top, allowing some of the richness of FluidDB to shine through and allow the addition of functionality not originally envisaged. I’m also hoping people will tag bands with ratings, annotations and the like with a hope to making recommendations possible.

In a similar way, I want the application code to be reusable and reskinnable so people can customise and create their own starting point. Maybe someone will produce a Classical In Your Ear in the future?

Source code is available from BitBucket, if you fancy a giggle at the clumsy bits: http://bitbucket.org/metaljoe/fluidinyourear – released under the GNU Affero GPL.

EuroPython 2010

July 24, 2010

Well, I’m back from another great EuroPython – my thanks to John Pinner and his EuroPython crew, the presenters, Josette from O’Reilly, and all my fellow delegates. Regrettably, I might not be able to make the next EuroPython in the wonderful city of Florence, but I might be able to make it to PyCon AU in Sydney next year. Fingers crossed on that one – I might even pluck up the courage to do a talk!

Monday

Monday’s talks kicked off with Raymond Hettinger’s “Idiomatic Python”. He guaranteed everyone would learn something new and he definitely delivered. I walked away with some interesting gems of Pythonic knowledge, from basic stuff like enumerate() through to more advancing knowledge like the surprising behaviour of super().

Ezio Melotti followed after the break with a useful overview of the Python development process, which inspired me to join the Python Core sprint on Friday. I’ve already contributed my first couple of patches to Python 3.2 – the first of which has been committed to trunk already.

“How Import Works” was a run through by Brett Cannon on how, you guessed it, Python imports code. Although you’ll probably never need to modify the standard import process, it’s fascinating to discover how it all works. Best of all, new utilities in Python 3 make it much easier to customise things safely if you ever need to.

“PostgreSQL’s Python Soup” was a little disappointing, with a good chunk of the talk being a general one about relational databases. The interesting stuff about connecting to PostgreSQL came quite near the end when we were introduced to the many different connectors. And I mean many.

Monstrum is a new HTTP functional testing framework for Python 3, and was detailed in “Testing HTTP Apps with Python3″. It looks very cool, and the logo would make a great t-shirt! Continuing the web testing theme, Raymond Hettinger followed up with “Python & Selenium Testing”. Alongside an overview of Selenium, Raymond introduced Sauce Labs, a company providing commercial support for Selenium, the web testing framework, via their cloud service.

To round off the day, I attended two talks by Richard Watts and Tony Ibbs of Kynesim who presented Muddle, their open source build system which looks very cool, and KBUS which is an elegant and lightweight messaging system implemented as a Linux kernel extension.

Tuesday

The day’s talks began with a rather packed presentation by Guido on AppStats, an AppEngine monitoring tool. I must admit that I did partially go to hear him talk, as did probably a lot of other people, but also because I was interested to learn more about AppEngine. I picked up some interesting bits of information about AppEngine internals, and the tool looks fantastic. However, I did feel a bit out of my depth.

Bart Demeulenaere’s “Pyradiso Rapid Software Development” was a call to arms, a request and discussion on how to create a rapid application development framework in Python that supports the multicore world out-of-the-box.

Engaging both my computer science and art historical sides, I then attended Richard Barrett-Small’s “The Trojan Snake”. Richard works for the Victoria & Albert Museum, and has been instrumental in introducing Python and Django as a way to clean-up and replace a mess of bespoke PHP, ASP.NET and Java. An interesting look at how an organisation faced issues with bespoke applications in a variety of technologies, and found Python to be a flexible and effective solution. It’s a shame more people didn’t attend.

I was an avid reader of Michele Simionato’s blog posts, covering his experiences with Scheme, so it was great to hear him speak at EuroPython with his talk entitled “Intro to Functional Programming”. Functional programming is receiving a surge of interest amongst programmers, thanks to the search for better ways to deal with concurrency, so it was good for someone to cover other reasons to start exploring FP.

“Tickery, Pyjamas & FluidDB” by Terry Jones was a run through of Tickery, a FluidDB application for analysing Twitter friend sets. Regular readers of this blog will know that I’ve been interested in FluidDB since attending an introductory talk by Nicholas Tollervey. Definitely worth a look.

Rounding out the end of the day were Andrew Godwin’s “Fun with Databases and Django” and Russel Winder’s “Pythonic Parallelism with CSP”. Andrew’s talk provided a general overview of some of the more advanced features of the Django ORM, including Django’s interaction with non-relational databases such as MongoDB and CouchDB. Meanwhile, Russel Winder is one of the key players in pushing Python support for concurrent and parallel programming. CSP has been around for over 30 years, and now Python programmers can begin to take advantage of the ideas thanks to projects like Python-CSP and PyCSP.

Wednesday

Wednesday kicked off with Nicholas Tollervey’s “Organise a Python Code Dojo!”. Nicholas shared his experiences running the London Code Dojo – the joys, the successes… and the things that didn’t quite go to plan.

“HotPy – A comparison” was Mark Shannon’s comparison between three different VMs for Python: PyPy, Unladen Swallow and his own HotPy project which runs on top of the GVMT.

Henrik Vendelbo gave two short talks in succession. The first was “Real Time Websites with Python” which gave a simple example of the Tornado web server, released as open source by FriendFeed. The second was “Custom Runtimes with PyPy”, a very interesting talk on using PyPy to provide an easy way to bundle up a custom Python application as a self-contained app.

qooxdoo is a full-featured JavaScript framework for developing Rich Internet Applications. It’s quite an impressive piece of work, not only because of the sheer size of the code base. Managing large code bases can be a headache, especially if you also need to process that source to produce more compact, optimised versions for delivery to specific browsers. In “Python for Javascript Apps”, Thomas Herchenröder explained how they use a set of Python tools to make dealing with the code a much more pleasant experience.

“A Python’s DNA” by Erik Groeneveld illustrated an issue I’ve hit myself: how to configure a complex component-based system in a simple, maintainable way. The answer Erik came up with for Meresco is not to use XML or similar to express configurations, but use Python instead to provide a rich, flexible, and elegant way to define dependencies, data flows and configurations.

It’s not often a talk begins by telling you that the subject of the talk is now officially dead, and a new project has taken over. Kay Schlühr’s “The Trails of EasyExtend” quickly changed to “The Trails of LangScape”. LangScape evolved from a toolkit for writing language extensions to Python, but takes a much broader view with the aim of allowing easy ways to extend many different languages.

Thursday

The morning began with “Building a python web app”. Anthony Theocharis and Nathan Wright took us through their journey to find the right Python web framework to implement MediaCore. They discussed Django, TurboGears and Pylons, illustrating their experiences with each and why they settled on Pylons. The different approaches to database access and templating languages were also examined in brief, followed by how they open sourced the project. An entertaining and interesting talk, refreshing by the fact they apparently came from a non-Python background so lacked any preconceptions.

Next up, Denis Bilenko introduced gevent in his talk “gevent network library”. gevent is a network library for handling large numbers of connections efficiently and, more importantly, elegantly. Optional, seamless monkey patching of standard library modules makes it very easy to work with existing code and libraries. I also discovered Gunicorn a Python implementation of Ruby’s Unicorn webserver.

Finally, my selection of conference talks closed with “Arduino and Python” by Michael Sparks. The talk should probably have been renamed “How not to blow up your computer” as Michael crammed in a quick intro to electricity and electronics into a few slides to dispel concerns about inexpensive pieces of electronics destroying your computer. The talk was illustrated by his prototype of a rather interesting TV remote control developed at a BBC R&D workshop, built with an Arduino… and Python of course.

Friday

Big thanks to Brett Cannon and Richard Jones for putting up with me all day while I got to grips with downloading, building, testing and patching Python 3.2. It was a really enjoyable day, and very satisfying to have a patch committed to the Python source code – no matter how small.

Django Templates and Emails

October 22, 2009

A Django application I’ve been working on sends a confirmation email near the end of a workflow. The content of that email closely matches the final web page of the workflow. Constructing an email in Django is pretty straightforward, and using templates makes the process a lot easier and more flexible.

My naive initial approach was to construct my HTML in a string, using Python’s string formatting, and then use the string as the body of the email. That was a reasonable starting point, but it sucked because it’s messy, it duplicates functionality and frankly it’s a waste of the power Django has to offer.

The next step was to harness templates to generate my content for the email. I actually need an email with both HTML and plain text content, and I already mentioned that the content is reasonably close to that used elsewhere, so there’s plenty of potential for code reuse.

The django.template.loader module contains the handy render_to_string function which, surprise, allows us to render our template to a string. It parallels the more common render_to_response function found in your view code, so I can supply a template name and a context in the same way I would for my view code: (contrived example follows)

from django.template.loader import render_to_response

html_content = render_to_string( "my_template.html", my_context )

I’ll go into that HTML content shortly, but suffice to say the template can reuse snippets of template code from the rest of my application. I can therefore create reusable content that is embeddable in both my web page and email body. Likewise, I can generate a text version of the body, though reusing HTML snippets is obviously not so easy or even appropriate:

text_content = render_to_string( "my_template.txt", my_context )

I can then take these strings and use them to construct my email:

from django.core.email import EmailMultiAlternatives

my_mail = EmailMultiAlternatives( subject="Foo Confirmed",
                                  body=text_content,
                                  from_email="test@foo.bar",
                                  to=[ "recipient@foo.bar" ]
                                )
my_mail.attach_alternative( html_content, "text/html" )

Now, back to the HTML content. There are a few things to be aware of with HTML in email:

  • Keep the <head></head> section empty as the content is likely to be ignored and could even be stripped out
  • Use a <style></style> section after your <body> tag to inline your CSS code
  • Don’t use JavaScript (do you really need to be told about this?)
  • Inline references to external resources like images are likely to be blocked

Think of the HTML in your email body as needing to be a self-contained environment. I’ve included some online resources at the end of this post which will give you more in-depth information.

That’s fine for CSS, but how do you add inline images to your emails? The answer is to create MIMEImage instances of any images and attach to your email with appropriate Content-ID and Content-Disposition headers. There’s a very simple example at http://www.djangosnippets.org/snippets/1507/ that worked nicely for me.

Hope the above is a useful starting point!

Further reading:

Testing Django Emails

October 18, 2009

In the past, my experience testing the sending of emails from web frameworks had not been a particularly smooth or easy one. There was often a bit of hackery involved, and attempting to do so from a test harness of some sort was often tricky. Fortunately, testing emails from Django (1.0+) test cases is pretty straightforward and very pleasant to use.

When you run the Django testrunner, the runner will temporarily override the SMTPConnection class. Any emails sent from code called by the testrunner will not be sent out, but captured and stored for analysis within your tests.

Let’s say I have some code that will send out a notification email, and I want to test that it works. I could have a test case along the lines of:

class NotificationEmailTest( TestCase ):
    def test_send_notification_email( self ):
        send_notification_email( "Foo!" )
        ## test goes here

But how do I check the email was sent? Well, the overridden SMTPConnection stores emails in the django.core.mail.outbox list. I can therefore check to see if I have one email message in the outbox. This outbox is only available within tests, it does not exist in normal Django operation.

First, I need an import so I can access the outbox:

from django.core import mail

Then I have to revise the above test to include my assertion:

class NotificationEmailTest( TestCase ):
    def test_send_notification_email( self ):
        """ Test send_notification_email sends out one email """
        send_notification_email( "Foo!" )

        self.assertEqual( len(mail.outbox), 1 )

The outbox is guaranteed to be empty at the start of each test, so you don’t need to do anything in terms of setup or teardown.

Testing presence of an email is fine, but we normally want to test the content of a message (otherwise it could be junk!). Each item in the outbox is an instance of EmailMessage, with all the expected attributes. If we wanted to add a second assertion to the above test, say to check the subject, we could do so:

class NotificationEmailTest( TestCase ):
    def test_send_notification_email( self ):
        """ Test send_notification_email sends out one email """
        send_notification_email( "Foo!" )

        self.assertEqual( len(mail.outbox), 1 )
        self.assertEqual( mail.outbox[0].subject, "Foo!" )

In this case, the single parameter for send_notification_email is actually the subject of the email, so my test checks to see if this matches.

When testing the contents of to and bcc, there is apparently no cc, be aware these are tuples of addresses when writing your tests.

class NotificationEmailTest( TestCase ):
    def test_send_notification_email( self ):
        """ Test send_notification_email sends out one email """
        send_notification_email( "Foo!" )

        self.assertEqual( len(mail.outbox), 1 )
        self.assertEqual( mail.outbox[0].to, [ "test@foo.bar" ] )

This is fine if we have one recipient, but if our recipient list is going to have multiple addresses we need to allow for this in our tests. If we knew the exact ordering we could compare to a list but, unless the ordering matters, we end up making the test fragile. Instead, we can test for the presence of email addresses individually:

        self.assertTrue( "test@foo.bar" in mail.outbox[0].to )
        self.assertTrue( "wibble@amiga.foo" in mail.outbox[0].to )

Django makes testing of emails easy, so there’s no excuse for not incorporating email tests into your Django test suites.

Hope that gives you a starting point. Have fun!

Further reading:

Nose-Driven Development

September 15, 2009

One of the best phrases I’ve encountered in my professional programming career is “smelly code”. It’s great: it sounds childish enough to make me smile, yet also highlights an important indicator in code quality.

Like many, I first encountered the concept of code smells in Martin Fowler’s essential “Refactoring” book, although the term was coined by Kent Beck and predates the book. We’ve all seen examples, and even contributed our own aroma to code. Code duplication; excessively large classes, methods, functions, modules; unreadable code; tightly-coupled code; fragile code; overly complex code; code that circumvents encapsulation.

Over the years, I’ve picked up a nose for code smells. I get a feel for code that is starting to smell, and usually get that prick of guilt when my own code begins to emit an unpleasant fragrance, even if I don’t always know immediately how to deal with it. Deodorant comments like “# REFACTOR: this feels wrong” or “// REFACTOR: this code is ugly!” start to crop up as a warning sign that things need to be looked at, if immediate action can’t be taken.

The nose kicked in this afternoon while working on a Django view. Many smells start off with the best of intentions, and this was no exception. In order to make my tests pass, I started off with a fairly harmless bit of code that interacted with a form object. Over a period of a few tests, the code began to expand and some duplication kicked in. List comprehensions entered into the equation. The nose began to kick in. I can’t show the code, unfortunately, and I’m too lazy to concoct some example code that illustrates my point, so bear with me…

Once all the functionality I needed had been implemented and the code checked-in, I cast an eye over the code and realised it looked ugly. The warning signs were there:

  • two almost-identical list comprehensions
  • temporary variables used to make the list comprehensions more readable
  • several lines to do something which should be simple
  • the need to re-read the code more than once to “see” what it does

*facepalm* What was I thinking when I wrote that??

First step was to extract the code into a utility function, in the hope of making the view code more readable and make it easier to eliminate the duplication. I find extracting code can be an excellent first step in dealing with ugly code, but it has to be used responsibly: otherwise you end up moving the smells into another layer of abstraction.

In the process of creating the function, another smell became apparent:

  • borderline intimate knowledge of the form class

This smell instantly pointed out where and how to refactor the code. The extraction was still necessary, but instead of creating a utility function in the view it was apparent that the code could be moved into two methods of the form. A quick change and a re-run of the tests confirmed that the extraction worked and the view instantly looked simpler and more readable. However, the new methods still didn’t quite feel right.

Next step was to take the list comprehensions and break them out into multi-line for loops. It might sound counter-intuitive to take code that occupies one line (albeit with preparatory code) and turn it into four or five lines of code, but the result was for the best. The intent of the code became clearer and the need for those temporary variables was removed.

All that remains is a minor whiff, a single bit of hardcoding used to identify particular fields in the form. My nose is telling me that introspection will lead me down the path of perfumed code…

Further reading:

Also of note:

(I’ve not tried this yet – just found it!)

I’ve been playing with Django for a while now, but today was the first day writing proper production code which will end up on a public-facing website. It will be the first Django application in use by the company, so there’s a lot riding on getting it right.

Django is great for writing web applications quickly and easily – whenever I hit upon a problem, I find something in Django that already does what I need, or I can implement something small in Python that fills in the gap.

One really nice feature is that it’s easy to test Django using both of Python’s in-built testing frameworks: unittest and doctest. I’m not a fan of doctests, though I can see why some people find them useful. I do love unittest though, and so it’s great that Django has provided enhancements to unittest to support things like basic browser client tests, loading test data fixtures and using a test database (sqlite is great for this).

The client doesn’t aim to replace dedicated web test systems such as Selenium, Windmill or Twill. It just offers a lightweight way to test your Django application’s functionality out-of-the-box. You can check response codes, test which template is being rendered, fire test data to forms, or check for text in returned pages. This was very useful today when fleshing out the first part of the application: a form with various fields, buttons, drop-downs and validation logic. I could write my tests first, a bit at a time, then implement the functionality to pass the tests. Supported by the confidence given by the tests, I could check the specified logic was correct, keep refactoring the code, and keep the design supple at the same time.

Of course, I will eventually need to invest in dedicated web testing, probably Selenium, but for the moment the Django TestCase gives me a very quick and flexible way to check functionality while things are changing frequently. Django rocks!

Further reading:

Follow

Get every new post delivered to your Inbox.

Join 431 other followers