(This is based on a 15-minute talk for the London Python Code Dojo – slides available from SlideShare)
My interest in FluidDB began earlier this year when I attended a talk by Nicholas Tollervey at the London Clojure Dojo. I was expecting yet another talk about yet another non-relational database, but what I discovered was something different. The idea of a shared database storing “things” which anyone could tag with data seemed to be a rather powerful concept, yet simple and elegant. I thought it was a very cool and interesting idea.
But there was a problem.
How could people actually explore the “Fluidverse”? While people using FluidDB are building up conventions, such as the naming and content of tags, and there are tools out there to drill down through the hierarchies of tags and namespaces… there had to be an easier way to find the tags that were of interest to me. I decided I needed data in order to begin finding ways to explore… but what data?
So, while pondering the idea one evening, I was listening to the band Napalm Death and realised I had the answer. One of the things I love to do is find new bands, particularly extreme metal ones, and one way I do this is to follow links between bands on Wikipedia. Wikipedia is a great source of band biographies, the content is under a Creative Commons license, and the band biographies often have lists of related bands and genres.
This seemed like a really good starting point to take data I’m interested in, build relationships between the data and give me something to start exploring with. I hacked together a scraper which used Napalm Death as a starting point and branched outwards in a “six degrees of separation” way, initially dumping the information directly into the FluidDB sandbox.
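The "six degrees" branching can be sketched as a breadth-first crawl with a depth limit. This is only a rough illustration, not the released scraper: the `RELATED` table is a hand-written stand-in for what would be scraped from Wikipedia, and `fetch_related` and `crawl` are hypothetical names.

```python
from collections import deque

# Hand-written sample links for illustration only (not scraper output).
RELATED = {
    "Napalm Death": ["Carcass", "Godflesh"],
    "Carcass": ["Napalm Death", "Arch Enemy"],
    "Godflesh": ["Napalm Death", "Jesu"],
    "Arch Enemy": ["Carcass"],
    "Jesu": ["Godflesh"],
}


def fetch_related(band):
    """Hypothetical lookup; a real scraper would parse a Wikipedia page here."""
    return RELATED.get(band, [])


def crawl(start, max_depth=6, fetch=fetch_related):
    """Breadth-first walk outwards from `start`.

    Returns a dict mapping each band found to its distance from `start`.
    Bands at `max_depth` are recorded but not expanded any further.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        band = queue.popleft()
        depth = seen[band]
        if depth >= max_depth:
            continue
        for other in fetch(band):
            if other not in seen:
                seen[other] = depth + 1
                queue.append(other)
    return seen


if __name__ == "__main__":
    for name, distance in sorted(crawl("Napalm Death").items(), key=lambda kv: kv[1]):
        print(distance, name)
```

Keeping the `seen` dictionary is what stops the crawl looping forever when two bands list each other as related.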
After a few runs, it made more sense to scrape to an intermediate file and load that instead. This let me clean up typos, adjust names and amend tags, and regenerate the data in FluidDB’s sandbox without having to keep hitting Wikipedia. An example of the output format is as follows:
band:Burzum
metaljoe/music/band_name = Burzum
metaljoe/music/source_url = http://en.wikipedia.org/wiki/Burzum
metaljoe/music/genre/black_metal -> Black metal
metaljoe/music/genre/dark_ambient -> Dark ambient
metaljoe/music/related_bands = ['Darkthrone', 'Mayhem', 'Old Funeral']
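A loader for a file like this might start with something along these lines. It is a minimal sketch that assumes one entry per line, a `band:` line opening each record, `=` for plain tag values and `->` for genre tags with a display name. That reading of the two separators is my guess from the example, and `parse_records` is a hypothetical name, not part of the released code.

```python
import ast

SAMPLE = """\
band:Burzum
metaljoe/music/band_name = Burzum
metaljoe/music/source_url = http://en.wikipedia.org/wiki/Burzum
metaljoe/music/genre/black_metal -> Black metal
metaljoe/music/genre/dark_ambient -> Dark ambient
metaljoe/music/related_bands = ['Darkthrone', 'Mayhem', 'Old Funeral']
"""


def parse_records(text):
    """Parse the intermediate format into {band: {"tags": {...}, "genres": {...}}}."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("band:"):
            # A new record starts; later lines attach to it.
            current = {"tags": {}, "genres": {}}
            records[line[len("band:"):]] = current
        elif current is None:
            raise ValueError("tag line before any band: %r" % line)
        elif " -> " in line:
            tag, label = line.split(" -> ", 1)
            current["genres"][tag] = label
        elif " = " in line:
            tag, value = line.split(" = ", 1)
            if value.startswith("["):
                # List values are written as Python literals in the example.
                value = ast.literal_eval(value)
            current["tags"][tag] = value
        else:
            raise ValueError("unrecognised line: %r" % line)
    return records


if __name__ == "__main__":
    burzum = parse_records(SAMPLE)["Burzum"]
    print(burzum["tags"]["metaljoe/music/related_bands"])
    print(sorted(burzum["genres"].values()))
```

Using `ast.literal_eval` rather than `eval` means a hand-edited file can only ever produce plain Python literals.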
I wasn’t planning to release the source code, but have had some interest in it so I’ve decided to release it under the MIT license. You can find the code on my BitBucket account: http://bitbucket.org/metaljoe/fluidinyourear-scraper – note this is just the scraper code, not the loader.
With the data in place, I then needed to build something for exploring the relationships in the data. Enter “Fluid In Your Ear”, a very simple web application built around Python, Django and the excellent FOM (Fluid Object Manager) created by Ali Afshar. Given the nature of the bands, there is also a liberal application of Heavy Metal Umlauts – the power of which, courtesy of a particular Black Metal band, managed to crash the FluidDB sandbox a few times by exposing a Unicode bug.
The application is deliberately very simple. I’m not a graphics genius (painting with real acrylic paints is my field), and at the moment it’s a basic core – you can browse genres and bands, and explore relationships between the two. I’ve already discovered some new bands through following the links, and re-discovered some older ones.
Due to the six-degrees nature, there is quite a lot that doesn’t fit into a metal or punk category, which is quite cool. I’ve encountered a jazz musician called John Zorn who has crossed into hardcore punk and grindcore to produce some outstanding music I would probably not have found otherwise.
The source code is pretty grotty, and the first casualty was testing: there were no tests at all. Shocking. In order to improve my confidence in the code and make it easier to refactor, I added unit tests using Django’s test harness and some functional testing using the Twill web testing framework. An example of the Twill test code is as follows:
# test missing genre
go http://127.0.0.1:8000/genre/progressive_vegetarian_grindcore
code 404
# test with trailing slash
go http://127.0.0.1:8000/genre/jazz/
code 200
# test without trailing slash
go http://127.0.0.1:8000/genre/jazz
code 200
# check page contents
find '<h2>Jazz</h2>'
find '<div id="related_bands">'
find '<li><a href="/band/Frank%20Zappa">Frank Zappa</a></li>'
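To make it clearer what those checks pin down, here is a rough, framework-free sketch of the behaviour they describe: an unknown genre gives a 404, the trailing slash is optional, and the page lists related bands with URL-quoted links. It is not the application's Django code. `GENRES` and `render_genre` are hypothetical, and the data is limited to what the test above expects.

```python
from html import escape
from urllib.parse import quote

# Hypothetical in-memory data; the real application reads this from FluidDB.
GENRES = {
    "jazz": {"title": "Jazz", "bands": ["Frank Zappa"]},
}


def render_genre(path):
    """Return (status_code, html) for a /genre/<slug> path, slash optional."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "genre" or parts[1] not in GENRES:
        return 404, ""
    genre = GENRES[parts[1]]
    items = "".join(
        '<li><a href="/band/%s">%s</a></li>' % (quote(band), escape(band))
        for band in genre["bands"]
    )
    html = '<h2>%s</h2><div id="related_bands"><ul>%s</ul></div>' % (
        escape(genre["title"]),
        items,
    )
    return 200, html


if __name__ == "__main__":
    print(render_genre("/genre/progressive_vegetarian_grindcore")[0])
    print(render_genre("/genre/jazz/")[0])
    print(render_genre("/genre/jazz")[1])
```

The same three assertions (status code, trailing slash, page contents) could be run against this function in a plain unit test, which is handy when no development server is running.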
So where next?
Well, first off is getting the application online, and my plan was to port it to Google App Engine. Unfortunately, I hit a snag: my app runs Django 1.2 and App Engine is using 1.1. I considered bundling Django with the app, but it became obvious that I’m not really using much of Django’s functionality, just some URL routing and templates. The creator of FOM introduced me to Flask, a lightweight web framework, and it looks perfect for my needs. So I’m going to port to Flask and Google App Engine at the same time.
In a similar way, I want the application code to be reusable and reskinnable so people can customise and create their own starting point. Maybe someone will produce a Classical In Your Ear in the future?
Source code is available from BitBucket, if you fancy a giggle at the clumsy bits: http://bitbucket.org/metaljoe/fluidinyourear – released under the GNU Affero GPL.