January 28, 2012
I try to learn a new programming language each year. In recent years, my bias has been towards languages of a more functional nature, with Erlang and Clojure being two very enjoyable and intellectually satisfying choices. I didn’t make much fanfare of my new language for 2011, mainly because it was born of necessity rather than a way to challenge myself. Attendees of the London Python Code Dojo will probably have heard a few things from me about it, not always in a flattering light.
It’s roughly a year on from when I first started doing serious things with PHP and I thought it might be an excellent time for a quick bit of reflection. One problem with Erlang and Clojure has been that I’ve not directly had to use either in a production environment, or deal with a legacy codebase written in said languages. I say directly, because I’ve been working with RabbitMQ, which is written in Erlang… but that doesn’t really count.
So, first a disclaimer and some background. It’ll seem very negative, so any fans of PHP please bear with me…
I’m not sure when I first encountered PHP, but it was still called Personal Home Page at the time. Wikipedia tells me it was called Personal Home Page Tools, but anyway. I was writing Perl at the time and it didn’t appeal to me. I rediscovered it in 2000 when I spent a day updating someone’s website that was using PHP 3. At the time I was writing Python and Zope code, and it felt horribly backward and just plain wrong. (I said this would seem very negative, bear with me…)
From then on, PHP didn’t really register with me much. Generally, my encounters with PHP were via poorly written and insecure websites, written by people who really didn’t know how to program. This tainted my view of PHP and the associated community quite considerably.
Given my existing prejudice, working with the legacy code base didn’t initially endear me to PHP. There seemed to be a lot of workarounds and convoluted functions to replicate what comes for free in Python. The old bracing wars of C were back, with different past developers having their own unique style of formatting – and not always lining things up nicely, which makes Pythonistas like me cry. Don’t even get me started on all the includes being pulled in causing all kinds of chaos, the warnings, the functions with lines of code measured in four digits (I wish I was joking). A true maintenance nightmare.
But nestled amongst the mess were signs of a language that had changed radically from PHP 3, and was actually quite nice once you got to know it. Ever since I discovered Perl, I’ve always preferred dynamic languages because they don’t get in the way of solving problems. They’re quicker to write, easier to test, more adaptable to change, and very powerful. You lose a minor few things in the process, but gain a heck of a lot more. I still feel PHP has a little way to go, but on the whole it doesn’t get in my way too much these days – a little quirky in places, but people find Python and Clojure quirky too. And many people are totally freaked out when they see Erlang.
The PHPUnit testing framework is pretty good. Easy to convert to from Python’s unittest (they’re both part of the xUnit family), and comes with things like mocking and database testing tools as standard. Pretty cool. Reflection was a little clumsy to deal with at first, a little inelegant, but it works fine once you understand it. TDD wasn’t initially part of the company’s workflow but PHPUnit made it very easy to introduce, which was great.
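For anyone who hasn’t seen the xUnit shape, here’s a minimal sketch of it on the Python side using unittest. The `Basket` class and the `run_tests` helper are made up purely for illustration; a PHPUnit test follows the same outline of a test case class, a `setUp` method and `test`-prefixed methods making assertions.

```python
import unittest


class Basket:
    """Hypothetical class under test, invented for this example."""

    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)


class BasketTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh fixture.
        self.basket = Basket()

    def test_new_basket_is_empty(self):
        self.assertEqual(self.basket.total(), 0)

    def test_total_sums_item_prices(self):
        self.basket.add("tea", 2)
        self.basket.add("biscuits", 3)
        self.assertEqual(self.basket.total(), 5)


def run_tests():
    # Load and run just this test case, returning the result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BasketTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs both tests and returns a result object you can inspect with `wasSuccessful()`.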
There are three web frameworks that I’ve used/experienced. Symfony is being moved away from at work and I must admit I’m not overly fond of it – but this could just be down to the way it’s been used by bits of the legacy code base. Zend Framework is the main one we’re adopting, and it’s proving pretty good. We’ve had a few niggles and frustrations, but nothing major except that the documentation seems to be great for introductions, but not very good for the detail. There have been moments when something that should be easy to figure out wasn’t because of the docs – but the great thing about open source is that you can always wade through the code to find the information you need. I think I’ve been spoilt by Django.
Finally, honourable mention goes to the third framework, which we use for some smaller client sites: Kohana. It’s an elegant, lightweight framework, although the documentation could do with some improvements (I feel a little guilty because I was going to find time to contribute some but never did). Worth a look, so giving it some free PR.
Although Python will remain my main language of choice, and I’ll be sticking to Django and Flask for my own projects, I can’t be too dismissive of PHP and PHP developers these days (although seriously guys, who thought adding goto to PHP 5.3 was a good idea?*). The language has matured into a respectable dynamic language, and frameworks like Kohana and Zend deserve particular mention. There seems to be a lot of support and encouragement within the community for good practices like automated testing and better structuring of code which is great to see.
* Although the docs do warn you of Velociraptors:
(scroll down a bit)
So there you go: PHP ain’t that bad after all.
September 26, 2011
I attended PyCon UK 2011 at the weekend, which was loosely run as an unconference. As I was about to start writing about the conference, I realised I haven’t written anything about PyCon AU last month in Sydney, which I also had the pleasure to attend. Two conferences in two months! I’ll try to mention my Australian experiences in another blogpost, but will mention some items of relevance in this post…
Once again, John Pinner and his trusty team put on an excellent event. The TechnoCentre in Coventry provided a superb venue, and the on-site catering was very good. You couldn’t go anywhere without finding a water cooler, fine teas, coffee and other beverages. Delegates were fuelled up in the mornings with breakfast baps and pastries, offered a decent lunch (especially on the Sunday) and served a fine conference dinner on the Saturday evening.
It’s been a good year for keynotes for me: PyCon AU had Audrey Roy, Mary Gardiner and Raymond Hettinger. Audrey’s talk on diversity was particularly important, emphasising that diversity isn’t just about encouraging more women into computing. Worthy though that goal is.
PyCon UK had keynotes from Allison Randal, Laura Creighton and Lorna Jane Mitchell. Allison kicked things off with “The Fallacy of the Zero-Sum Game”, discussing why free software is the future of technology because the values of cooperation and collaboration provide a better environment for improving technology. Even though I was clutching my freedom-hating MacBook, free/open source software is something I believe in for the same reasons Allison does.
Laura Creighton is always a joy to watch in action. “Reflections on the work of Sociologist Charles Perrow, and what he can teach software developers” did seem to leave a few people in the audience scratching their heads, but I really enjoyed it. The general idea is understanding the lessons learnt by Charles Perrow on analysing “risky systems” and applying them to software development. He knows a bit about this through his study of accidents such as Three Mile Island. I’ve discovered that simple things are more complicated than I first thought, and that tightly-coupled, complex and unpredictable systems need to conform to two mutually incompatible states to be controlled and understood. Which is why they are both very risky and lead to catastrophic accidents when things go wrong: aircraft crashes, spacecraft breakups, nuclear reactor explosions. The sort of thing you don’t want to be involved in.
It’s one of the nice things about the Python community that PHP developer Lorna Jane Mitchell can be welcomed wholeheartedly. I don’t have her talk title to hand, but she conducted a wonderful, slide-free talk about contributing to open source projects and the benefits outside the project that can result: job offers, recognition, personal improvement. It was well-timed, as I was about to undertake my first conference talk…
I took the stage for the Testing Workshop alongside Michael Foord. I wasn’t quite sure what level to pitch the workshop at, or even what to cover. I’d abandoned the tutorial style I’d originally thought about, although there was a bit of legacy from that era in some of my slides. It was on researching the subject that I’d realised a fundamental issue with testing: no one really agrees on the terminology, and most books and training material for developers don’t really cover it in much breadth. I took a gamble and decided to start from basics, to cover the widest possible audience and to try and get people thinking about the types of testing and the terminology used. I also opted to cover the very basics of Python testing, ditching some material on TDD and dealing with legacy code.
As I was leading up to the presentation, I was beginning to think it might’ve been the wrong decision when I looked out at a very large audience and spotted some very knowledgeable people in the Python community looking back at me. Uh oh. I did give a clear warning, but no one left, which was a good sign. Also, despite preparing my set-up in advance, there were some additional technical hurdles to overcome – notably, coordinating with the live streaming! At least the internets don’t heckle.
My first section covered what testing is and the types of testing. The second section was really more of an overview of testing with Python and so most of the audience knew the options already – but it’s important to ensure everyone is aware. However, it led to some interesting questions and responses from the audience, which was good. I then handed over to Michael who covered his Mock library and served the audience with some much-needed in-depth material. Jonathan Hartley followed up with a talk on modifying the Django test runner, a modification I wish I’d had earlier this year while debugging a nasty issue with test fixtures and multiple databases.
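To give a flavour of what a mock object does, here’s a small sketch. It isn’t taken from Michael’s material: it uses `unittest.mock`, the form in which the Mock library ships with later Python releases, and the `notify` function, the mailer and the `demo` helper are all invented for the example.

```python
from unittest import mock


def notify(mailer, user):
    """Hypothetical function under test: sends one welcome email."""
    mailer.send(to=user["email"], subject="Welcome")
    return True


def demo():
    # A Mock stands in for the real mailer, so no email is actually sent.
    fake_mailer = mock.Mock()
    result = notify(fake_mailer, {"email": "a@example.com"})
    # The mock records how it was called, so we can assert on the interaction.
    fake_mailer.send.assert_called_once_with(to="a@example.com", subject="Welcome")
    return result, fake_mailer.send.call_count
```

The point is that the test checks the conversation between `notify` and its collaborator, without needing a real mail server anywhere in sight.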
Afterwards, I read some very nice tweets in support of the subject matter being presented, and I had some good bits of feedback post-talk. I’ve spoken a couple of times at the London Python Code Dojo, but have had the advantages of a smaller audience, smaller venue and a subject of my choice – not much different to when I stand up in front of a small group of people and teach them to snowboard. The venue was definitely a lot bigger, the audience a lot bigger and strangely the podium and lectern made me feel the most awkward. The first five minutes my throat was the driest it has probably ever been and I then worried I’d spill my bottle of water over the laptop and other electronics surrounding me!
In the end, I rather enjoyed it. To use a phrase from Russel Winder, I’ve picked up the bug. Even while I thought I was failing badly, I would look out across the audience and see someone smiling, nodding their head in agreement or looking like they’d just discovered something new. Which is all that matters. If you attended, I’d like to say thank you for your support and hope you came away with something of use.
Next year, I’m toying with either a talk on beginning Test Driven Development or something a bit more technical: common patterns for testing legacy code. Although I’ve been dealing with PHP legacy code recently, I’ll aim to keep it in Python. The principles are fairly similar across the dynamic languages.
Turning to other matters, teaching school kids to program was a recurring theme during the weekend. The BBC Micro featured prominently in discussions, as did the new Raspberry Pi – which is a wonderful piece of kit scheduled to cost just £25.
I discovered programming through the BBC Micro at school. I’m not sure the exact age I was, but I was probably about 5 or 6 and the Beeb had just been made available in schools. When we weren’t playing educational games, we could fire up the BASIC interpreter and do some very simple things. However, it wasn’t until my parents bought me a Commodore Plus/4 a couple of years later that things really took off for me. Doing it as a career wasn’t something that occurred to me until university, or just before. It was always a fun hobby.
Logo on the RM Nimbus came a bit later, and I remember returning to my old primary school a few years later and teaching a group of very excitable kids to draw on the screen using Logo. Kids of all abilities loved it, and my hope is that one of those kids ended up programming.
Back in my day (now I sound really old), computing wasn’t on the syllabus because computers were new, a bit of a novelty and the teachers had little or no experience of them. That was a reasonable excuse in the 80s. We live in the 21st century now, and I was horrified to discover recently that computing skills in the classroom consist of learning how to use Word or Powerpoint. The computer is this magic box that comes with games, Facebook and something to type out homework with. If anything, we’ve actually regressed.
At PyCon AU, I attended a talk on the NCSS Challenge which teaches school kids across Australia to program. It even caters to different levels of ability, from complete beginners to those who already have programming skills. You can watch the talk on the technical side from the PyCon AU YouTube channel.
That kind of scheme is just wonderful, and it makes me sad that the UK has nothing similar. Instead, we’re raising a new generation of kids who think a computer is indistinguishable from magic, the software just appears, and everything has a nice, neat Microsoft brand emblazoned across it. We’re not teaching kids to fully explore technology at all levels, to create applications themselves and become producers not consumers. It makes me wonder how many great programmers of the future are not discovering their calling earlier in life?
Anyway, thank you to all the crew at PyCon UK – the organisers, the venue staff, and my fellow delegates. See you all next year!
July 10, 2011
Back in 2004, I took a sabbatical from software development to pursue my passion for snowboarding. It was a decision that led to a parallel “career” as a snowboard, and later ski, instructor. Aside from a few communication skills, I never really thought the two would intertwine. However, from instructing came an interest in coaching and mentoring, and from those came helpful ideas when I found myself more and more involved with a leadership role.
When I undertook my first training course, in the mountains of New South Wales, there was an important phrase I learnt: Safety, Fun and Learning. In the UK, we use the variation of Safety, Enjoyment and Learning – which gives the acronym SEL (“sell”). We’re selling a suitable framework for a pleasant and productive lesson experience. Each word leads to the next: it’s difficult to enjoy something if you’re not safe, it’s tricky to learn something if you’re not enjoying it.
Safety, Enjoyment, Learning can be applied to many things other than snowboarding or skiing. As a team lead, senior developer or software coach, we also have to be mindful of these three words whether we realise it or not. All three are required to make sure our team has a pleasant and productive experience developing software.
Let’s look at safety first. The risk of injury, avalanches, sunburn or hypothermia is pretty low in software development – if it isn’t, I’d strongly recommend moving to a different office. But safety is more than just about physical danger. I’ve worked with teams where there’s a heavy fear of failure, an atmosphere where making a mistake is a heinous crime. The problem is, people become cautious – they stay inside their comfort zone, they don’t learn or progress, their work remains static. It’s not productive at all – for the team or the customer.
Mistakes are okay. We all make them. The important thing is that people feel they can own up to making a mistake, without retribution, and then take responsibility for correcting that mistake. Mistakes are learning experiences – do them once, try not to make them again. People need to know they can put their hand up and admit they did something wrong safely, and that the team will pull together to help that person take responsibility and correct the problem.
Leading on from this is the knowledge that someone can always ask for help, without feeling they will lose respect or be ridiculed. We all have a moment of stupidity and forget something obvious, we all stare at a problem for hours because our brain has masked the critical (and obvious) cause of the problem. Asking for help is often one of the most difficult things for people to do, so we need to ensure people feel safe enough that they can stick up a hand without fear.
Enjoyment is next. I’ve been very fortunate during my working career that I’ve nearly always worked with teams where we can laugh, joke and gossip – yet still get our work done. On the rare occasions I’ve been in a team that doesn’t have that environment, you certainly notice the difference in productivity – no one really gets anything done, and certainly not to a good standard. A former boss of mine used to say that he didn’t like a quiet office, because it meant no one was working. The social interaction feeds the mood, raises the spirits even when tackling a Project From Hell, and leads to better communication and thus a higher productivity.
Work shouldn’t be a miserable experience, because it becomes a waste of time for everyone – if you don’t enjoy your job, dust off your CV and start looking for one you will enjoy. It was that kind of decision that resulted in me ending up on a mountain on the other side of the world, a decision that changed my life in a big way. It allowed me to discover a whole new set of skills, and also helped me rediscover my love of programming.
And finally learning. If you’re not learning new things each week in your job, something is wrong. It doesn’t have to be big things like a new programming language or framework – it could be a tiny optimisation, a better way of structuring your code, a new bit of business logic, an unusual hobby of the developer sat next to you. As human beings we need to keep growing and learning or we stagnate, and when we stagnate we become less productive. I know far too many programmers who know one programming language, one framework, one operating system. Some even get quite angry that they should learn something new! After all, BASIC and 6502 assembly should be all you could ever need…
I try to learn a new language every year or two: Erlang, PHP, C#, Clojure. Even if you don’t use them for any real world work, sometimes you can gain a new insight or a better way of doing things with your existing toolset. Erlang helped me with concurrency and parallelism, Clojure taught me to write more functionally, PHP makes me appreciate Python more than ever, and I wish C# wasn’t tied to .NET/Mono. The definition of “learn” changes each time: I mainly aim to get a feel for the nuances and pros/cons of the language before deciding to stop or carry on.
Code dojos are great fun. Code katas provide useful exercises for developers. Regular retrospectives encourage developers to sit back and think about what they have or haven’t achieved and find new ways to do things. Pair programming provides knowledge transfer. All that gossip and laughter I mentioned earlier can lead to sudden revelations and new ideas. Teams should be mindful of new things, new techniques, making progress in their own skills and knowledge. Programmers should be encouraged to try new things, push their comfort zone wider and improve themselves.
Safety, enjoyment, learning.
Next time you’re with your team, take a good look and ask yourself if these three concepts are being employed successfully. If not, perhaps it’s time you started making some changes?
May 21, 2011
It’s been something like 11 or 12 years since I first heard of the book “Design Patterns: Elements of Reusable Object-Oriented Software”. It was around the time I was beginning to realise there might be something to this idea of object-oriented software development.
I’ve been pondering patterns ever since.
For those not aware of patterns (where have you been?), the idea stems from architecture of the physical building kind. People realised that there was a set of principles that had formed based on common solutions to common problems. You don’t stick a door there, you don’t put a window here, you need X number of columns for this, you need Y whenever you have a Z. You can probably tell why I never became an architect… but you get the idea. It encapsulated and documented a collective knowledge and experience of the process of architecture.
Humans have been building houses, fortresses and temples for thousands of years. That’s a lot of experience and knowledge we’ve accumulated about what we should and shouldn’t do. But when it comes to building software, we’ve only got a few decades under our belts. Much as we’d love to think of it as a mature and sophisticated process, it isn’t. When I left university, I was convinced software development was an engineering process – but the real world taught me that it just doesn’t fit well. It’s not a science either, despite my computer science degree. As a trained artist, I’d like to say it’s an art but that’s a simplification. And then someone mentioned a craft, and that is currently the best way to sum up what I do for a living. But even that isn’t a perfect analogy.
Anyway, I’ve diverged a bit. The “Gang of Four” made an attempt to encapsulate a set of knowledge and experience from software development in the same way architects had done, and that knowledge is present in their book “Design Patterns”. It’s a fascinating book to have on your book shelf and, in my mind, it presents two very important bits of information to programmers: a selection of solved problems and a vocabulary for programmers.
Solved problems are good. There’s no point wasting time re-inventing the wheel unless you can build a much, much better one. The Internet has given us a rich, supplemental way to seek out solutions to problems or advice on how best to approach common situations. The growth of free and open source software gives developers access to the real shared knowledge and literature of programming: the source code. We lead busy lives and the business wants results yesterday, so knowledge of what is a solved problem is good to have because it’s one less thing to worry about and we can concentrate on the other stuff.
The vocabulary aspect is one that often gets overlooked by people talking about patterns. Even before I discovered the ideas behind domain driven design, I’d been interested in the idea of a “ubiquitous language”, a shared glossary between developers, domain experts and users that would allow all three to communicate effectively. Too many software projects fail because of the lack of communication or, perhaps worse, misinterpreted communication. While patterns might not hold to all three types of people mentioned, they do provide a shared terminology for developers to talk to each other about common solutions. At least your developers all understand each other, right?
So patterns are a good thing then? Well, yes and no. “Design Patterns” is a fascinating book to have on your shelf, and that’s ultimately where it resides. I actually read it all the way through just after I bought it and went “great!”. I dipped into it maybe two or three times in the space of a few months afterwards and that was it. It now sits there nestled between “Patterns of Enterprise Application Architecture” and “Refactoring”. Don’t ask about my elaborate book ordering system.
So what’s wrong with patterns? Well, nothing intrinsically. As I said earlier, it provides a repository of shared knowledge/experience in solved problems, and it encourages a shared language between developers. Both of which are good. Trust me.
Let’s take the shared language idea first as it’s a bit of a flimsy argument. Most, but definitely not all, developers I’ve worked with either haven’t heard of patterns or don’t know them off by heart. I know I don’t remember all the names. That’s mainly an education issue, rather than a problem with patterns, but it puts a barrier between people – it can even be taken as a mild form of elitism. You can shove your Flyweight Pattern up your Chain of Responsibility, they say. Maybe.
Even amongst those of us “in the know”, a problem is rarely a pure example of one of the patterns so we end up adding extensions to what we’re saying. In fact, it’s often more comfortable slipping from pure Patternspeak to Domainspeak. It’s not a Command pattern, it’s an Order Request or a Unicorn Disposal. Or we go low-level and discuss language features. The Patternspeak abstraction sits in a middle ground between domain language (understandable by the business) and implementation language (understandable by geeks and compilers).
Defining language and process has an unusual side-effect in that it sets an, often arbitrary, limit on creativity. Much as there is still a school of thought that programming can be defined without the need for creativity, that is nonsense – it’s problem-solving, and problem-solving often cannot be described by a rigid formula or process. I laugh when I hear about ideas such as executable UML, because computer-assisted software engineering was The Next Big Thing when I was at university and we’re still waiting for it to deliver.
The risk with patterns is that developers can become restricted by them – they follow the pattern precisely, or refuse to see options beyond the documented patterns. Again, that’s not a problem with patterns as such, but the way they are used. Actually, it’s a limitation of the human mind – we tend to stay within our comfort zone and often stick to what we know, even when that knowledge doesn’t seem to be beneficial in the circumstances. We become blinkered.
Recently I heard mention of “refactoring to patterns”. I think it was on an episode of DotNetRocks, but I might be wrong. You write your code free from preconceived ideas. Strictly speaking, you write code confined by the limitations of your own experiences and knowledge (or that of your team). Actually, you write code based on the real problem at hand. Then you step back and realise you have a Facade, Memento, Syntax-Directed Translator, Service Stub or whatever, which may or may not be a help when refactoring your work to be cleaner, more elegant or more understandable. Or when discussing the problem with an “in-the-know” colleague.
The other thing I heard a couple of years ago was that patterns are language-dependent – and I’ll chip in the phrase “paradigm-dependent” too. Interesting idea. The clue was the Gang of Four’s subtitle “Elements of Reusable Object-Oriented Software”. Object-oriented development might have the mindshare these days, but C is still one of the most widespread languages and functional languages like Haskell, Erlang, Clojure and F# are rapidly becoming the cool kids on the block. The Gang’s patterns are based on their knowledge and experience of statically-typed, object-oriented languages.
Something that could be a pattern in C++ or Java might be a simple bit of syntax in something like Python or Ruby. How do object-oriented patterns map to functional or procedural languages? What about for dynamically typed languages? Of course, they have their own patterns – I’ve seen Erlang patterns crop up on the mailing lists, decades of Lisp and Scheme patterns seep into Clojure, and even a minority of fellow Pythonistas claiming patterns don’t exist in Python. Wishful thinking guys – we just have different patterns. When put like that, the shared vocabulary and knowledge provided by patterns becomes more dispersed – with maybe a tiny shared core somewhere in the murk.
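As a rough illustration of a pattern dissolving into syntax, here’s the Strategy pattern twice over: once as the class hierarchy you’d expect in a statically-typed object-oriented language, and once as a plain Python function passed as an argument. `Order`, the discount names and `checkout` are all made up for this sketch, and totals are whole pence to keep the arithmetic simple.

```python
class Order:
    """Hypothetical domain object, invented for this example."""

    def __init__(self, total):
        self.total = total  # whole pence


# The "by the book" version: an abstract strategy plus a concrete subclass.
class DiscountStrategy:
    def apply(self, order):
        raise NotImplementedError


class TenPercentOff(DiscountStrategy):
    def apply(self, order):
        return order.total - order.total // 10


# The Python version: functions are first-class, so the strategy is just a function.
def ten_percent_off(order):
    return order.total - order.total // 10


def no_discount(order):
    return order.total


def checkout(order, discount=no_discount):
    # Accepts any callable taking an order, including a bound method.
    return discount(order)
```

Both `checkout(Order(100), ten_percent_off)` and `checkout(Order(100), TenPercentOff().apply)` work; the second just carries more ceremony. Whether you’d still call the first one a Strategy when talking to a colleague is exactly the vocabulary question.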
In conclusion, reviewing my meandering and rambling, patterns are still a good thing because of the shared language and wisdom of solved problems. But they can be troublesome too, if used inappropriately. My copy of the book has “An Invitation” at the end which essentially says patterns are something you build up yourself, with those in the book as a starting point. It was something I’d forgotten about until I dusted off the book to sit beside me as I wrote this. I wonder how many others have forgotten this important piece of advice – or even missed it completely?
Think about the patterns in your own domain, your own choice of technology. Then think about the language you use to describe things to your fellow developers, your domain experts and your customers – are you all understanding each other? Really?
January 23, 2011
Before I start on my 2011 list, it’s time for a quick review. My list for 2010 was:
- Clojure
- CouchDB
- Natural Language Toolkit (NLTK)
- PyGame
- Twisted
- XMPP
I spent the year going to the London Clojure Dojo and I must admit that I do like the language and will continue to learn it. I’ve not had much reason to use it outside of the dojo, but I’m sure something will come up this year.
CouchDB is very impressive and CouchApp makes it pretty easy to develop self-hosted apps in CouchDB. I have an idea for a CouchDB app which I want to work on.
On the PyGame front, I did a fair bit of development early in the year, turning a game idea into… well, a mess. I started off a little too ambitiously and the unfinished result needs a serious bit of refactoring in order to progress further. I’ve got a simpler idea in the same style that I want to work on this year, which should give me a better idea about how to rework and finish the original game.
I spectacularly failed to look at XMPP (other than read the O’Reilly book), Twisted or NLTK (again, other than read the book).
So, for this year:
- PHP
- Python 3
- Celery / RabbitMQ
- XMPP
- jQuery
- Android SDK
So, PHP is my new language for the year. What gives? Well, the company I work for uses a lot of PHP code. Even though I’ve been recruited for helping migrate PHP and Perl on the back end to Python, the front end PHP code is not going away and it would be useful to be able to roll up my sleeves and help with the maintenance and development work. Unlike Erlang or Clojure, this is just a straight case of learning the syntax, and not really the same challenge of those languages.
Ah, Python 3. The company roadmap is to finish 2011 using Python 3 in production. This is, admittedly, a bit ambitious because not all the Python code we use will be Python 3 compatible (Django springs to mind). I’ve made the personal decision that the final release of Python 3.2 will signal the transition point for my home projects, at least taking into account library support.
I’ve already begun to look at Celery from a very basic point-of-view, but this is a technology we are going to be using more heavily at work this year. It’s actually one of the many things that attracted me to joining the company late last year – because I was never going to get a chance to use this commercially at my last place.
XMPP is still on the list and the London Python Dojo will be using it this year for the inter-dojo game challenge. Looking forward to it!
Finally, time to upgrade my ancient phone to one of these new-fangled “smart” phones. I had a company iPhone for a few months, but never really used it. It’s a nice enough piece of hardware and the UI is slick as you would expect… but it did nothing for me. The Android OS is open source and seems more in tune with what I might use for a phone OS. Although I would rather like an Objective-C SDK for it. Actually, it looks like Clojure and Jython are unofficially supported, albeit rather slow.
As well as the new skills above, I’ll be continuing with Clojure, PyGame and CouchDB. I’ll also be getting some commercial Django experience, continuing to explore FluidDB (FluidInYourEar might finally get a public release this year!), the Flask web framework, and hopefully refresh my Erlang skills.
Here’s to 2011 – what technologies are you planning to look at this year?
October 11, 2010
(This is based on a 15 minute talk for the London Python Code Dojo – slides available from SlideShare)
My interest in FluidDB began earlier this year when I attended a talk by Nicholas Tollervey at the London Clojure Dojo. I was expecting yet another talk about yet another non-relational database, but what I discovered was something different. The idea of a shared database storing “things” which anyone could tag with data seemed to be a rather powerful concept, yet simple and elegant. I thought it was a very cool and interesting idea.
But there was a problem.
How could people actually explore the “Fluidverse”? While people using FluidDB are building up conventions, such as the naming and content of tags, and there are tools out there to drill down through the hierarchies of tags and namespaces… there had to be an easier way to find the tags that were of interest to me. I decided I needed data in order to begin finding ways to explore… but what data?
So, while pondering the idea one evening, I was listening to the band Napalm Death and realised I had the answer. One of the things I love to do is find new bands, particularly extreme metal ones, and one way I do this is follow links between bands on Wikipedia. Wikipedia is a great source of band biographies, the content is under a Creative Commons license, and the band biographies often have lists of related bands and genres.
This seemed like a really good starting point to take data I’m interested in, build relationships between the data and give me something to start exploring with. I hacked together a scraper which used Napalm Death as a starting point and branched outwards in a “six degrees of separation” way, initially dumping the information directly into the FluidDB sandbox.
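The branching-out described above is essentially a breadth-first walk with a depth limit. What follows is a minimal Python sketch of that idea, not the actual scraper: `RELATED` is a hand-written stand-in for the links that would be scraped from Wikipedia, and `crawl` and `max_depth` are hypothetical names chosen for illustration.

```python
from collections import deque

# Illustrative stand-in for scraped "related bands" links. A real scraper
# would fetch and parse each band's Wikipedia page instead of using a dict.
RELATED = {
    "Napalm Death": ["Carcass", "Godflesh"],
    "Carcass": ["Napalm Death", "Arch Enemy"],
    "Godflesh": ["Jesu"],
    "Arch Enemy": [],
    "Jesu": [],
}


def crawl(start, max_depth=6):
    """Breadth-first walk from `start`; returns {band: degrees of separation}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        band = queue.popleft()
        depth = seen[band]
        if depth >= max_depth:
            continue  # do not branch out any further from this band
        for other in RELATED.get(band, []):
            if other not in seen:  # skip bands we have already reached
                seen[other] = depth + 1
                queue.append(other)
    return seen
```

Calling `crawl("Napalm Death")` returns every band reachable from the starting point along with how many hops away it is. Lowering `max_depth` keeps the walk closer to home.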
After a few runs, it made more sense to scrape to an intermediate file, and load that instead – allowing me to clean up typos, adjust names, amend tags and also allow me to regenerate the data in FluidDB’s sandbox without having to keep hitting Wikipedia. An example of the output format is as follows:
band:Burzum
metaljoe/music/band_name = Burzum
metaljoe/music/source_url = http://en.wikipedia.org/wiki/Burzum
metaljoe/music/genre/black_metal -> Black metal
metaljoe/music/genre/dark_ambient -> Dark ambient
metaljoe/music/related_bands = ['Darkthrone', 'Mayhem', 'Old Funeral']
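To give an idea of how a loader might read this intermediate file back in, here is a rough Python sketch. It is not the real loader (which has not been released). It assumes one entry per line, that each record starts with a `band:` line, that `=` assigns a value and `->` attaches a label to a tag. The names `parse_records` and `SAMPLE` are made up for the example.

```python
import ast

# A shortened copy of the example output, assuming one entry per line.
SAMPLE = """\
band:Burzum
metaljoe/music/band_name = Burzum
metaljoe/music/source_url = http://en.wikipedia.org/wiki/Burzum
metaljoe/music/genre/black_metal -> Black metal
metaljoe/music/related_bands = ['Darkthrone', 'Mayhem', 'Old Funeral']
"""


def parse_records(text):
    """Parse the intermediate format into a list of dicts, one per band.

    Assumes every tag line follows a "band:" line.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("band:"):
            # Start a new record for this band.
            records.append({"about": line[len("band:"):], "tags": {}})
        elif " -> " in line:
            # Tag with a human-readable label, e.g. a genre.
            tag, label = line.split(" -> ", 1)
            records[-1]["tags"][tag] = label
        elif " = " in line:
            tag, value = line.split(" = ", 1)
            if value.startswith("["):
                # List values are written as Python literals.
                value = ast.literal_eval(value)
            records[-1]["tags"][tag] = value
    return records
```

Each resulting dict could then be edited by hand or pushed into the FluidDB sandbox, without going back to Wikipedia.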
I wasn’t planning to release the source code, but have had some interest in it so I’ve decided to release it under the MIT license. You can find the code on my BitBucket account:
– note this is just the scraper code, not the loader.
With the data in place, I then needed to build something for exploring the relationships in the data. Enter “Fluid In Your Ear”, a very simple web application built around Python, Django and the excellent FOM (Fluid Object Manager) created by Ali Afshar. Given the nature of the bands, there is also a liberal application of Heavy Metal Umlauts – the power of which, courtesy of a particular Black Metal band, managed to crash the FluidDB sandbox a few times by exposing a unicode bug.
The application is deliberately very simple. I’m not a graphics genius (painting with real acrylic paints is my field), and at the moment it’s a basic core – you can browse genres and bands, and explore relationships between the two. I’ve already discovered some new bands through following the links, and re-discovered some older ones.
Due to the six-degrees nature, there is quite a lot that doesn’t fit into a metal or punk category which is quite cool. I’ve encountered a jazz musician called John Zorn who has crossed into hardcore punk and grindcore, to produce some outstanding music I would probably not have found before.
The source code is pretty grotty and the first casualty was the tests. Shocking. In order to improve my confidence in the code and make it easier to refactor, I added unit tests using Django’s test harness and some functional testing using the Twill web testing framework. An example of the Twill test code is as follows:
# test missing genre
go http://127.0.0.1:8000/genre/progressive_vegetarian_grindcore
code 404
# test with trailing slash
go http://127.0.0.1:8000/genre/jazz/
code 200
# test without trailing slash
go http://127.0.0.1:8000/genre/jazz
code 200
# check page contents
find '<h2>Jazz</h2>'
find '<div id="related_bands">'
find '<li><a href="/band/Frank%20Zappa">Frank Zappa</a></li>'
So where next?
Well, first off is getting the application online, and my plan was to port to Google App Engine. Unfortunately, I hit a few snags because my app runs Django 1.2 and App Engine is using 1.1. I considered bundling Django in the app, but it became obvious that I’m not really using much of Django’s functionality – some URL routing and templates. The creator of FOM introduced me to Flask, a lightweight web framework, and it looks perfect for my needs. So I’m going to port to Flask and Google App Engine at the same time.
In a similar way, I want the application code to be reusable and reskinnable so people can customise and create their own starting point. Maybe someone will produce a Classical In Your Ear in the future?
Source code is available from BitBucket, if you fancy a giggle at the clumsy bits:
– released under the GNU Affero GPL.
September 20, 2010
Last year was C# and Erlang, this year was Clojure thanks in part to the excellent London Clojure Code Dojo.
(Bear with me for the next paragraph, and the subjective nature of later bits)
Clojure has been an interesting experience because I have neither a Lisp nor JVM background. I’ve found myself having to deal with prefix notation and a parenthesis overdose, as well as the unfamiliarity of the Java ecosystem. The official documentation is patchy, the language is still new and evolving, dedicated development tools are relatively immature, and there are moments when the JVM leaks through. It’s the first time I’ve felt a little out-of-my-depth when learning a new programming language, which is not necessarily a bad thing.
Yet I keep going to the dojos, I still flick through the book when I get the odd spare moment, I keep improving my development environment and I’m still umming and ahhing between Clojure and Erlang for a possible home project I want to work on next year, once my FluidDB and game ideas have been unleashed upon the world properly.
So why am I still learning Clojure?
I asked myself that question recently. I still disagree with Neal Stephenson’s comment that Lisp “is the only computer language that is beautiful”, but Clojure has helped me to understand why people like Lisp as well as clean up some of the ugliness of the language. I’m sorry guys, but Lisp has always looked pretty ugly to my eyes, despite the lack of boilerplate found in languages like, say, Java or C#. When Clojurians say that it’s Lisp reloaded, they mean it – which has upset some Lisp programmers but pleased others.
There’s a greater sense that people are using Clojure to solve real problems, evolving the language to deal with real issues rather than thought exercises from the ivory towers of academia. The community itself is generally a friendly and positive one with a good mix of people from different backgrounds, not just the Lisp and Java worlds. Being there as the language and community evolves has been fascinating and quite exciting.
Getting back to Clojure’s Lisp origins, I still believe it should be the starting point, not the direction. Clojure needs to find its own way, its own metaphors and patterns – retaining Lisp’s strengths while confidently discarding the legacy baggage.
Take macros, for example: much as I can see the power of them, I think they’re a distraction rather than a tool in this day and age. A controversial view for sure, but I guess I’ve never seen a conclusive argument for their use – indeed, the wisdom seems to be that you shouldn’t really need to use them. It feels like there should be a more contemporary or Clojure-like way of providing the power of macros, without the complexity or obfuscation. I don’t know what that way is yet – but I’m not sure others do either.
Clojure encourages immutable data structures, but acknowledges that the real world doesn’t work that way so a pragmatic language needs to handle mutable state. Clojure implements this in more controlled ways than other languages. My favourite option is software transactional memory. STM is a natural fit for someone who has used relational databases for the last ten years. It feels right for many applications in a concurrent environment, and it’s right there in the language from the start – not hacked in later or provided as a third-party module.
It might not quite fit in my brain like Python, I don’t quite grok the intricacies of the language yet, and the sooner more of Clojure is implemented in Clojure the better. Yet it still has me interested – I want to learn more about the language and write more Clojure code because deep down there’s something kinda cool about it.