Fluid In Your Ear: When FluidDB Meets Heavy Metal
October 11, 2010
(This is based on a 15 minute talk for the London Python Code Dojo – slides available from SlideShare)
My interest in FluidDB began earlier this year when I attended a talk by Nicholas Tollervey at the London Clojure Dojo. I was expecting yet another talk about yet another non-relational database, but what I discovered was something different. The idea of a shared database storing “things” which anyone could tag with data seemed to be a rather powerful concept, yet simple and elegant. I thought it was a very cool and interesting idea.
But there was a problem.
How could people actually explore the “Fluidverse”? While people using FluidDB are building up conventions, such as the naming and content of tags, and there are tools out there to drill down through the hierarchies of tags and namespaces… there had to be an easier way to find the tags that were of interest to me. I decided I needed data in order to begin finding ways to explore… but what data?
So, while pondering the idea one evening, I was listening to the band Napalm Death and realised I had the answer. One of the things I love to do is find new bands, particularly extreme metal ones, and one way I do this is follow links between bands on Wikipedia. Wikipedia is a great source of band biographies, the content is under a Creative Commons license, and the band biographies often have lists of related bands and genres.
This seemed like a really good starting point to take data I’m interested in, build relationships between the data and give me something to start exploring with. I hacked together a scraper which used Napalm Death as a starting point and branched outwards in a “six degrees of separation” way, initially dumping the information directly into the FluidDB sandbox.
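The "six degrees" branching described above is essentially a breadth-first walk with a depth limit. The following is a minimal sketch in Python 3, not the actual scraper: `crawl`, `get_related` and the `LINKS` data are hypothetical names invented for illustration, and the links are made up rather than scraped from Wikipedia.

```python
from collections import deque


def crawl(start, get_related, max_depth=6):
    """Breadth-first walk from `start`, following links at most `max_depth` steps away."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        band = queue.popleft()
        if depth_of[band] == max_depth:
            continue  # do not follow links beyond the depth limit
        for neighbour in get_related(band):
            if neighbour not in depth_of:  # skip bands we have already seen
                depth_of[neighbour] = depth_of[band] + 1
                queue.append(neighbour)
    return depth_of


# Hypothetical link data standing in for scraped "related bands" lists.
LINKS = {
    "Napalm Death": ["Carcass", "Godflesh"],
    "Carcass": ["Napalm Death", "Arch Enemy"],
    "Godflesh": ["Jesu"],
}

found = crawl("Napalm Death", lambda band: LINKS.get(band, []), max_depth=2)
print(found)
```

Keeping the depth of each band alongside its name makes it easy to cap the crawl, and the "already seen" check stops bands that link back to each other from causing an endless loop.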
After a few runs, it made more sense to scrape to an intermediate file, and load that instead – allowing me to clean up typos, adjust names, amend tags and also allow me to regenerate the data in FluidDB’s sandbox without having to keep hitting Wikipedia. An example of the output format is as follows:
band:Burzum
metaljoe/music/band_name = Burzum
metaljoe/music/source_url = http://en.wikipedia.org/wiki/Burzum
metaljoe/music/genre/black_metal -> Black metal
metaljoe/music/genre/dark_ambient -> Dark ambient
metaljoe/music/related_bands = ['Darkthrone', 'Mayhem', 'Old Funeral']
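To give a feel for how an intermediate file like this could be read back in, here is a rough Python 3 sketch. It is not the real loader (which has not been released); it assumes each entry starts with an about line followed by one `tag = value` or `tag -> value` pair per line, and `parse_entry` is a hypothetical name. The separator is recorded but not interpreted.

```python
import ast
import re

# One field per line: a tag path, a separator ("=" or "->"), then the value.
FIELD = re.compile(r"^(\S+)\s+(=|->)\s+(.*)$")


def parse_entry(text):
    """Parse one entry into its about string and a dict of tag path -> value."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    entry = {"about": lines[0], "tags": {}}
    for line in lines[1:]:
        match = FIELD.match(line)
        if match is None:
            raise ValueError("unrecognised line: %r" % line)
        path, _separator, value = match.groups()
        if value.startswith("["):
            value = ast.literal_eval(value)  # list literal, e.g. related bands
        entry["tags"][path] = value
    return entry


SAMPLE = """
band:Burzum
metaljoe/music/band_name = Burzum
metaljoe/music/source_url = http://en.wikipedia.org/wiki/Burzum
metaljoe/music/genre/black_metal -> Black metal
metaljoe/music/genre/dark_ambient -> Dark ambient
metaljoe/music/related_bands = ['Darkthrone', 'Mayhem', 'Old Funeral']
"""

entry = parse_entry(SAMPLE)
print(entry["about"], sorted(entry["tags"]))
```

Using `ast.literal_eval` rather than `eval` means a hand-edited file can only ever produce plain Python literals.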
I wasn’t planning to release the source code, but have had some interest in it so I’ve decided to release it under the MIT license. You can find the code on my BitBucket account: http://bitbucket.org/metaljoe/fluidinyourear-scraper – note this is just the scraper code, not the loader.
With the data in place, I then needed to build something for exploring the relationships in the data. Enter “Fluid In Your Ear”, a very simple web application built around Python, Django and the excellent FOM (Fluid Object Manager) created by Ali Afshar. Given the nature of the bands, there is also a liberal application of Heavy Metal Umlauts – the power of which, courtesy of a particular Black Metal band, managed to crash the FluidDB sandbox a few times by exposing a Unicode bug.
The application is deliberately very simple. I’m not a graphics genius (painting with real acrylic paints is my field), and at the moment it’s a basic core – you can browse genres and bands, and explore relationships between the two. I’ve already discovered some new bands through following the links, and re-discovered some older ones.
Due to the six-degrees nature, there is quite a lot that doesn’t fit into a metal or punk category which is quite cool. I’ve encountered a jazz musician called John Zorn who has crossed into hardcore punk and grindcore, to produce some outstanding music I would probably not have found before.
The source code is pretty grotty and the first casualty was the tests: there weren’t any. Shocking. In order to improve my confidence in the code and make it easier to refactor, I added unit tests using Django’s test harness and some functional testing using the Twill web testing framework. An example of the Twill test code is as follows:
# test missing genre
go http://127.0.0.1:8000/genre/progressive_vegetarian_grindcore
code 404

# test with trailing slash
go http://127.0.0.1:8000/genre/jazz/
code 200

# test without trailing slash
go http://127.0.0.1:8000/genre/jazz
code 200

# check page contents
find '<h2>Jazz</h2>'
find '<div id="related_bands">'
find '<li><a href="/band/Frank%20Zappa">Frank Zappa</a></li>'
So where next?
Well, first off is to get the application online so my plan was to port to Google App Engine. Unfortunately, I hit a few snags with the fact my app runs Django 1.2 and App Engine is using 1.1. I considered bundling Django in the app, but it became obvious that I’m not really using much of Django’s functionality – some URL routing and templates. The creator of FOM introduced me to Flask, a lightweight web framework, and it looks perfect for my needs. So I’m going to port to Flask and Google App Engine at the same time.
Another thing I want to do is add a JavaScript “social” layer over the top, allowing some of the richness of FluidDB to shine through and allow the addition of functionality not originally envisaged. I’m also hoping people will tag bands with ratings, annotations and the like with a hope to making recommendations possible.
In a similar way, I want the application code to be reusable and reskinnable so people can customise and create their own starting point. Maybe someone will produce a Classical In Your Ear in the future?
Source code is available from BitBucket, if you fancy a giggle at the clumsy bits: http://bitbucket.org/metaljoe/fluidinyourear – released under the GNU Affero GPL.
FluidDB Objects in 10 Minutes
July 14, 2010
When looking at FluidDB’s tags (see FluidDB Tags in 10 Minutes), we briefly touched upon FOM’s Object class (not to be confused with Python’s native object class). I instantiated an object representing the planet Mars:
>>> from fom.mapping import Object
>>> mars = Object(about="planet:Mars")
>>> mars
<Object planet:Mars>
The thing to note is that I supplied the about keyword. By instantiating this way, I have no way of knowing if it existed before or not – the appropriate FluidDB object is created if one doesn’t exist already. In the big scheme of things it probably doesn’t matter, but it’s important to be aware.
The about keyword sets the value of a useful tag that is often assigned to objects when created: fluiddb/about. The about tag is used to label an object with something descriptive. It can be anything, but certain conventions are slowly evolving amongst FluidDB developers and users.
For example, an application I am developing uses the convention “band:<band name>” to label an object as representing a particular musical band. Nicholas J Radcliffe has written more on the subject on his About Tag blog which is a recommended read.
Let’s take a look at that tag:
>>> mars.get("fluiddb/about")
('planet:Mars', 'application/vnd.fluiddb.value+json')
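The “kind:name” style of about value mentioned above (“planet:Mars”, “band:<band name>”) is easy to build and take apart with a couple of helpers. This is only a sketch of that one convention; `make_about` and `split_about` are hypothetical names and not part of FOM or FluidDB.

```python
def make_about(kind, name):
    """Build an about value such as 'band:Napalm Death'."""
    return "%s:%s" % (kind, name)


def split_about(about):
    """Split 'kind:name' into (kind, name); kind is None if there is no colon."""
    kind, separator, name = about.partition(":")
    if not separator:
        return None, about
    return kind, name


print(make_about("band", "Napalm Death"))
print(split_about("planet:Mars"))
```

Splitting on the first colon only means a name that itself contains a colon survives the round trip.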
An optional about tag is very useful but not all objects have, or indeed need, one. Fortunately, we can also refer to any object by its unique GUID which is guaranteed to exist:
>>> mars.uid
'276da99e-c8d9-42c9-99ae-3db69a5e9ef0'
If I wanted to load an object by its GUID, I simply supply the uid parameter (it’s a good job I know the ID!):
>>> red_planet = Object( uid='276da99e-c8d9-42c9-99ae-3db69a5e9ef0' )
>>> red_planet
<Object planet:Mars>
>>> red_planet.get("fluiddb/about")
('planet:Mars', 'application/vnd.fluiddb.value+json')
Tags form the real power of FluidDB, so it’s important to be able to determine what tags have been added to an object. Let’s take our Mars example first:
>>> red_planet.tag_paths
['metaljoe/foo', 'fluiddb/about']
>>> red_planet.tags
[<fom.mapping.Tag object at 0x57edb0>, <fom.mapping.Tag object at 0x57fd10>]
The first attribute stores the tag names, while the second returns these tags instantiated as FOM Tag objects. We can check they are indeed proper Tag instances by peeking at the descriptions:
>>> [ tag.description for tag in red_planet.tags ]
['behold: the foo tag', 'A description of what an object is about.']
I can also test for the presence of a tag by using the has() method:
>>> red_planet.has( "metaljoe/foo" )
True
>>> red_planet.has( "the_mekon/likes" )
False
URLs were briefly mentioned in an earlier post about tags – each tag has a corresponding URL. FluidDB’s RESTful API means each object in the system also has a unique URL. For example, on the FluidDB sandbox instance our planet:Mars object can be referenced by the URL http://sandbox.fluidinfo.com/objects/276da99e-c8d9-42c9-99ae-3db69a5e9ef0 – pointing your browser to that URL returns a little bit of JSON:
{"tagPaths": ["metaljoe\/foo", "fluiddb\/about"]}
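The backslashes in that response are just JSON's optional escaping of the forward slash, so any JSON parser hands back ordinary tag paths. A quick Python 3 check using only the standard library, with the response text copied from above:

```python
import json

# The body returned for the planet:Mars object, as shown above.
raw = r'{"tagPaths": ["metaljoe\/foo", "fluiddb\/about"]}'

tag_paths = json.loads(raw)["tagPaths"]
print(tag_paths)
```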
An object’s tags can also be referenced via URLs. For example, the fluiddb/about tag is referenced by the following URL:
http://sandbox.fluidinfo.com/objects/276da99e-c8d9-42c9-99ae-3db69a5e9ef0/fluiddb/about
Which yields the contents of the tag for that object:
"planet:Mars"
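The two URL shapes shown above follow a simple pattern, which can be sketched as a small helper. This is a Python 3 illustration based only on the sandbox URLs in this post; `object_url` and `SANDBOX` are hypothetical names, not part of FOM.

```python
from urllib.parse import quote

SANDBOX = "http://sandbox.fluidinfo.com"


def object_url(uid, tag_path=None, base=SANDBOX):
    """Build the URL for an object, or for one tag's value on that object."""
    url = "%s/objects/%s" % (base.rstrip("/"), uid)
    if tag_path:
        # quote() leaves "/" alone by default, so the tag path keeps its shape
        url += "/" + quote(tag_path)
    return url


mars_uid = "276da99e-c8d9-42c9-99ae-3db69a5e9ef0"
print(object_url(mars_uid))
print(object_url(mars_uid, "fluiddb/about"))
```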
Note that a browser will be making GET requests to these resources. Being RESTful, we can make full use of different HTTP methods (GET, PUT, POST, DELETE) to perform reading, creating, updating and deleting of our FluidDB resources, but that’s a subject for another time.
FluidDB Tags in 10 Minutes
July 7, 2010
So, with your namespaces structured it’s now time to add tags.
I’ll assume you already have a FluidDB session open. To add a tag to your current namespace:
>>> namespace.tag_names
[]
>>> namespace.create_tag( "foo", "behold: the foo tag", False )
<FluidResponse (201, 'application/json', None, {'id': 'cac795ce-777f-4d64-94cc-6df5356eb651', 'URI': 'http://fluiddb.fluidinfo.com/tags/metaljoe/foo'})>
>>> namespace.tag_names
['foo']
Okay, let’s take a look at that in more detail. The create_tag() method has three arguments: the tag name, the tag description and a mysterious boolean value at the end. The last value specifies whether the tag should be indexed – in this case, I have left it unindexed. At present, I’m not sure how the indexing works or whether it gives any noticeable advantages. I should really make an effort to find out….
When my tag has been created, FOM returns a response from FluidDB. For those not fluent in HTTP, 201 is the Created response status: my tag was successfully created. You can also see that FluidDB has assigned my tag a unique GUID, and that the URI of my tag is http://fluiddb.fluidinfo.com/tags/metaljoe/foo – everything in FluidDB can be referenced by a URL through FluidDB’s RESTful API, and everything has a GUID.
>>> namespace.tag_paths
['metaljoe/foo']
Yup, that looks pretty conclusive.
Having a tag defined is one thing; the power of that tag comes when you apply it to objects in the system. Let’s find a suitable object first:
>>> from fom.mapping import Object
>>> mars = Object(about="planet:Mars")
>>> mars
<Object planet:Mars>
Tagging the object is trivial:
>>> mars.set( "metaljoe/foo", None )
As is retrieving the value of that tag:
>>> mars.get( "metaljoe/foo" )
(None, 'application/vnd.fluiddb.value+json')
Now, setting the value to None is pretty dull. How about we set it to something more interesting?
>>> mars.set( "metaljoe/foo", "The red planet" )
>>> mars.get( "metaljoe/foo" )
('The red planet', 'application/vnd.fluiddb.value+json')
Actually, we might not want the value set in the system as a FluidDB JSON value. Let’s try setting it to the text/plain MIME type:
>>> mars.set( "metaljoe/foo", "The red planet", "text/plain" )
>>> mars.get( "metaljoe/foo" )
('The red planet', 'text/plain')
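As the sessions above show, get() hands back a pair: the value and its MIME type. To play with that shape without a FluidDB connection, here is a tiny in-memory stand-in written for Python 3. It is emphatically not FOM: `FakeObject` is a hypothetical class that only mimics the calls used in this post, and the default MIME type is simply the one seen in the output above.

```python
DEFAULT_MIME = "application/vnd.fluiddb.value+json"


class FakeObject:
    """In-memory stand-in (NOT part of FOM) mimicking the set/get calls shown here."""

    def __init__(self, about=None):
        self._values = {}
        if about is not None:
            self.set("fluiddb/about", about)

    def set(self, path, value, mime=DEFAULT_MIME):
        # Store the value together with its MIME type, replacing any old value.
        self._values[path] = (value, mime)

    def get(self, path):
        # Returns a (value, mime type) tuple; raises KeyError for a missing tag.
        return self._values[path]


mars = FakeObject(about="planet:Mars")
mars.set("metaljoe/foo", "The red planet", "text/plain")

value, mime = mars.get("metaljoe/foo")  # unpack the pair
print(value, mime)
```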
Python lists can also be stored easily enough:
>>> mars.set( "metaljoe/foo", ["The red planet", "My favourite planet"] )
>>> mars.get( "metaljoe/foo" )
(['My favourite planet', 'The red planet'], 'application/vnd.fluiddb.value+json')
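Note that the list in that session came back in a different order from the one that was stored. Going by that output alone, it seems safest to assume the order of a stored list cannot be relied upon, so compare by content rather than position. A minimal Python 3 sketch using the two lists from the session:

```python
sent = ["The red planet", "My favourite planet"]
received = ["My favourite planet", "The red planet"]  # as returned above

same_order = sent == received            # position-sensitive comparison
same_items = set(sent) == set(received)  # ignores ordering

print(same_order, same_items)
```

If a stable order matters for display, sorting the values after retrieval is a simple workaround.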
Finally, tags have their own class in FOM:
>>> from fom.mapping import Tag
>>> tag = Tag( "metaljoe/foo" )
>>> tag
<fom.mapping.Tag object at 0x5b62d0>
>>> tag.description
'behold: the foo tag'
FluidDB Namespaces in 10 Minutes
July 4, 2010
I’ve been working on a pet project with FluidDB recently and thought I’d post a few observations.
FluidDB is the core technology being developed by a company called FluidInfo. It’s in alpha test and the general idea is a central (I’m loath to use the word “cloud”) database populated by objects. That’s objects as in things rather than the object oriented programming concept of object. Objects are tagged by users, and those tags define information and relationships between objects.
Each user can build up and tear down their selection of available tags, arranging these tags in a series of namespaces. For example, I have a namespace defined for an application I’m building:
metaljoe/music
That namespace consists of my root namespace (me, my user name) with a child namespace entitled music. There are no rules on how people structure their namespaces – I could easily call it metaljoe/aural or metaljoe/rhubarb if I wanted. As FluidDB evolves, conventions will naturally arise as people, hopefully, find certain naming styles preferable or popular. For the moment, anything goes.
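Since a path is just names joined by slashes, pulling out the root namespace or the last component takes only a few lines of Python. The helpers below are an illustrative sketch (Python 3); `split_path` and `parent_and_leaf` are hypothetical names, not FOM functions.

```python
def split_path(path):
    """Return (root namespace, list of names below it)."""
    parts = path.strip("/").split("/")
    return parts[0], parts[1:]


def parent_and_leaf(path):
    """Return (parent path, last name); the parent is '' for a root namespace."""
    parent, _, leaf = path.strip("/").rpartition("/")
    return parent, leaf


print(split_path("metaljoe/music"))
print(parent_and_leaf("metaljoe/music"))
```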
First, I need to instantiate and bind my proxy to FluidDB, using the excellent FOM library for Python (please note that I am using the FluidDB sandbox rather than the live database):
>>> from fom.session import Fluid
>>> fluid = Fluid( "http://sandbox.fluidinfo.com" )
>>> fluid.db.client.login( "metaljoe", <password> )
>>> fluid.bind()
I then retrieve my user’s namespace:
>>> from fom.mapping import Namespace
>>> namespace = Namespace( u"metaljoe" )
To retrieve the list of child namespaces as either “leaf” names or full paths:
>>> namespace.namespace_names
['music']
>>> namespace.namespace_paths
[u'metaljoe/music']
If I want to create a new namespace under the current one, I can do this…
>>> namespace.create_namespace( u"test", u"A test namespace" )
<fom.mapping.Namespace object at 0x743670>
…or if I decide to remove it:
>>> namespace.namespace_names
['music', 'test']
>>> namespace = Namespace( u"metaljoe/test" )
>>> namespace.delete()
<FluidResponse (204, 'text/html', None, '')>
>>> namespace = Namespace( u"metaljoe" )
>>> namespace.namespace_names
['music']
You’ll notice a little bit of FluidDB’s RESTful API shows through here, because we obtain a 204 (No Content) response code on a successful deletion.
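If the numeric codes are hard to remember, Python's standard library can translate them. This small sketch is written for Python 3 (the sessions in this post use Python 2) and only uses the `http` module; it looks up the 204 seen above alongside 201, the code returned when something is created.

```python
from http import HTTPStatus

# Map each status code to its standard reason phrase.
phrases = {code: HTTPStatus(code).phrase for code in (201, 204)}

for code, phrase in sorted(phrases.items()):
    print(code, phrase)
```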
As well as child namespaces, a namespace can have child tags:
>>> namespace = Namespace( u"metaljoe" )
>>> namespace.tag_names
[]
Hmm, okay, let’s find a better example:
>>> namespace = Namespace( u"metaljoe/music" )
>>> namespace.tag_names
['related_bands', 'source_url', 'band_name']
Or if I want the full path for the tags:
>>> namespace.tag_paths
[u'metaljoe/music/related_bands', u'metaljoe/music/source_url', u'metaljoe/music/band_name']
I’ll cover tags another time.
So there we have it, a real quick tour of FluidDB namespaces – add, delete and view namespaces, and find out what child namespaces and tags are present.