PHP: One Year On
January 28, 2012
I try to learn a new programming language each year. In recent years, my bias has been towards a more functional nature with Erlang and Clojure being two very enjoyable and intellectually satisfying choices. I didn’t make much fanfare of my new language for 2011, mainly because it was born of necessity rather than a way to challenge myself. Attendees to the London Python Code Dojo will probably have heard a few things from me, not always in a flattering light.
It’s roughly a year on from when I first started doing serious things with PHP and I thought it might be an excellent time for a quick bit of reflection. One problem with Erlang and Clojure has been that I’ve not directly had to use either in a production environment, or deal with a legacy codebase written in said languages. I say directly, because I’ve been working with RabbitMQ, which is written in Erlang… but that doesn’t really count.
So, first a disclaimer and some background. It’ll seem very negative, so any fans of PHP please bear with me…
I’m not sure when I first encountered PHP, but it was still called Personal Home Page at the time. Wikipedia tells me it was called Personal Home Page Tools, but anyway. I was writing Perl at the time and it didn’t appeal to me. I rediscovered it in 2000 when I spent a day updating someone’s website that was using PHP 3. At the time I was writing Python and Zope code, and it felt horribly backward and just plain wrong. (I said this would seem very negative, bear with me…)
From then on, PHP didn’t really register with me much. Generally, my encounters with PHP were via poorly written and insecure websites, written by people who really didn’t know how to program. This tainted my view of PHP and the associated community quite considerably.
So, with that background in mind, I joined a new company at the end of 2010 which had a very complex legacy web application written in… PHP. I was thrilled, obviously. But the good news was that I was recruited to help rewrite the application from scratch, replacing the considerable backend functionality with Python and using PHP and JavaScript for the front-end. For various reasons, I ended up taking over maintenance of the existing website for a few months, which gave me a rather urgent requirement to get down and dirty with PHP.
Now, working with the legacy code base, and given my prejudice already, this didn’t initially endear me to PHP. There seemed to be a lot of workarounds and convoluted functions to replicate what comes for free in Python. The old bracing wars of C were back, with different past developers having their own unique style of formatting – and not always lining things up nicely, which makes Pythonistas like me cry. Don’t even get me started on all the includes being pulled in causing all kinds of chaos, the warnings, the functions with lines of the code measured in four digits (I wish I was joking). A true maintenance nightmare.
But nestled amongst the mess were signs of a language that had changed radically from PHP 3, and was actually quite nice once you got to know it. Ever since I discovered Perl, I’ve always preferred dynamic languages because they don’t get in the way of solving problems. They’re quicker to write, easier to test, more adaptable to change, and very powerful. You lose a minor few things in the process, but gain a heck of a lot more. I still feel PHP has a little way to go, but on the whole it doesn’t get in my way too much these days – a little quirky in places, but people find Python and Clojure quirky too. And many people are totally freaked out when they see Erlang.
The PHPUnit testing framework is pretty good. Easy to convert to from Python’s unittest (they’re both part of the xUnit family), and comes with things like mocking and database testing tools as standard. Pretty cool. Reflection was a little clumsy to deal with at first, a little inelegant, but it works fine once you understand it. TDD wasn’t initially part of the company’s workflow but PHPUnit made it very easy to introduce, which was great.
There are three web frameworks that I’ve used/experienced. Symfony is being moved away from at work and I must admit I’m not overly fond of it – but this could just be down to the way it’s been used by bits of the legacy code base. Zend Framework is the main one we’re adopting, and it’s proving pretty good. We’ve had a a few niggles and frustrations, but nothing major except that the documentation seems to be great for introductions, but not very good for the detail. There have been moments when something that should be easy to figure out wasn’t because of the docs – but the great thing about open source is that you can always wade through the code to find the information you need. I think I’ve been spoilt by Django.
Finally, honourable mention goes to the third framework, which we use for some smaller client sites: Kohana. It’s an elegant, lightweight framework, although the documentation could do with some improvements (I feel a little guilty because I was going to find time to contribute some but never did). Worth a look, so giving it some free PR.
Although Python will remain my main language of choice, and I’ll be sticking to Django and Flask for my own projects, I can’t be too dismissive of PHP and PHP developers these days (although seriously guys, who thought adding goto to PHP 5.3 was a good idea?*). The language has matured into a respectable dynamic language, and frameworks like Kohana and Zend deserve particular mention. There seems to be a lot of support and encouragement within the community for good practices like automated testing and better structuring of code which is great to see.
* Although the docs do warn you of Velociraptors:
http://php.net/manual/en/control-structures.goto.php
(scroll down a bit)
So there you go: PHP ain’t that bad after all.
As for my language for 2012, I was trying to decide between ones like Lua, Squeak, Haskell, Java and er… something else (must’ve been good!). In the end, I decided not to decide and see what happens this year. Meanwhile, I’m going to keep working on improving my PHP skills, update my JavaScript knowledge (part of a general “things have changed on the client-side since 2004″ approach) and continue learning Clojure.
Pondering Patterns
May 21, 2011
It’s been something like 11 or 12 years since I first heard of the book “Design Patterns: Elements of Reusable Object-Oriented Software”. It was around the time I was beginning to realise there might be something to this idea of object-oriented software development.
I’ve been pondering patterns ever since.
For those not aware of patterns (where have you been?), the idea stems from architecture of the physical building kind. People realised that there was a set of principles that had formed based on common solutions to common problems. You don’t stick a door there, you don’t put a window here, you need X number of columns for this, you need Y whenever you have a Z. You can probably tell why I never became an architect… but you get the idea. It encapsulated and documented a collective knowledge and experience of the process of architecture.
Humans have been building houses, fortresses and temples for thousands of years. That’s a lot of experience and knowledge we’ve accumulated about what we should and shouldn’t do. But when it comes to building software, we’ve only got a few decades under our belts. Much as we’d love to think of it as a mature and sophisticated process, it isn’t. When I left university, I was convinced software development was an engineering process – but the real world taught me that it just doesn’t fit well. It’s not a science either, despite my computer science degree. As a trained artist, I’d like to say it’s an art but that’s a simplification. And then someone mentioned a craft, and that is currently the best way to sum up what I do for a living. But even that isn’t a perfect analogy.
Anyway, I’ve diverged a bit. The “Gang of Four” made an attempt to encapsulate a set of knowledge and experience from software development in the same way architects had done, and that knowledge is present in their book “Design Patterns”. It’s a fascinating book to have on your book shelf and, in my mind, it presents two very important bits of information to programmers: a selection of solved problems and a vocabulary for programmers.
Solved problems are good. There’s no point wasting time re-inventing the wheel unless you can build a much, much better one. The Internet has given us a rich, supplemental way to seek out solutions to problems or advice on how best to approach common situations. The growth of free and open source software gives developers access to the real shared knowledge and literature of programming: the source code. We lead busy lives and the business wants results yesterday, so knowledge of what is a solved problem is good to have because it’s one less thing to worry about and we can concentrate on the other stuff.
The vocabulary aspect is one that often gets overlooked by people talking about patterns. Even before I discovered the ideas behind domain driven design, I’d been interested in the idea of a “ubiquitous language”, a shared glossary between developers, domain experts and users that would allow all three to communicate effectively. Too many software projects fail because of the lack of communication or, perhaps worse, misinterpreted communication. While patterns might not hold to all three types of people mentioned, they do provide a shared terminology for developers to talk to each other about common solutions. At least your developers all understand each other, right?
So patterns are a good thing then? Well, yes and no. “Design Patterns” is a fascinating book to have on your shelf, and that’s ultimately where it resides. I actually read it all the way through just after I bought it and went “great!”. I dipped into it maybe two or three times in the space of a few months afterwards and that was it. It now sits there nestled between “Patterns of Enterprise Application Architecture” and “Refactoring”. Don’t ask about my elaborate book ordering system.
So what’s wrong with patterns? Well, nothing intrinsically. As I said earlier, it provides a repository of shared knowledge/experience in solved problems, and it encourages a shared language between developers. Both of which are good. Trust me.
Let’s take the shared language idea first as it’s a bit of a flimsy argument. Most, but definitely not all, developers I’ve worked with either haven’t heard of patterns or don’t know them off by heart. I know I don’t remember all the names. That’s mainly an education issue, rather than a problem with patterns, but it puts a barrier between people – it can even be taken as a mild form of elitism. You can shove your Flyweight Pattern up your Chain of Responsibility, they say. Maybe.
Even amongst those of us “in the know”, a problem is rarely a pure example of one of the patterns so we end up adding extensions to what we’re saying. In fact, it’s often more comfortable slipping from pure Patternspeak to Domainspeak. It’s not a Command pattern, it’s an Order Request or a Unicorn Disposal. Or we go low-level and discuss language features. The Patternspeak abstraction sits in a middle ground between domain language (understandable by the business) and implementation language (understandable by geeks and compilers).
Defining language and process has an unusual side-effect in that it sets an, often arbitrary, limit on creativity. Much as there is still a school of thought that programming can be defined without the need for creativity, that is nonsense – it’s problem-solving, and problem-solving often cannot be described by a rigid formula or process. I laugh when I hear about ideas such as executable UML, because computer-assisted software engineering was The Next Big Thing when I was at university and we’re still waiting for it to deliver.
The risk with patterns is that developers can become restricted by them – they follow the pattern precisely, or refuse to see options beyond the documented patterns. Again, that’s not a problem with patterns as such, but the way they are used. Actually, it’s a limitation of the human mind – we tend to stay within our comfort zone and often stick to what we know, even when that knowledge doesn’t seem to be beneficial in the circumstances. We become blinkered.
Recently I heard mention of “refactoring to patterns”. I think it was on an episode of DotNetRocks, but I might be wrong. You write your code free from preconceived ideas. Strictly speaking, you write code confined by the limitations of your own experiences and knowledge (or that of your team). Actually, you write code based on the real problem at hand. Then you step back and realise you have a Facade, Memento, Syntax-Directed Translator, Service Stub or whatever, which may or may not be a help when refactoring your work to be cleaner, more elegant or more understandable. Or when discussing the problem with an “in-the-know” colleague.
The other thing I heard a couple of years ago was that patterns are language-dependent – and I’ll chip in the phrase “paradigm-dependent” too. Interesting idea. The clue was the Gang of Four’s subtitle “Elements of Reusable Object-Oriented Software”. Object-oriented development might have the mindshare these days, but C is still one of the most widespread languages and functional languages like Haskell, Erlang, Clojure and F# are rapidly becoming the cool kids on the block. The Gang’s patterns are based on their knowledge and experience of statically-typed, object-oriented languages.
Something that could be a pattern in C++ or Java, might be a simple bit of syntax in something like Python or Ruby. How do object-oriented patterns map to functional or procedural languages? What about for dynamically typed languages? Of course, they have their own patterns – I’ve seen Erlang patterns crop up on the mailing lists, decades of Lisp and Scheme pattern seep into Clojure, and even a minority of fellow Pythonistas claiming patterns don’t exist in Python. Wishful thinking guys – we just have different patterns. When put like that, the shared vocabulary and knowledge provided by patterns becomes more disperse – with maybe a tiny shared core somewhere in the murk.
In conclusion, reviewing my meandering and rambling, patterns are still a good thing because of the shared language and wisdom of solved problems. But they can be troublesome too, if used inappropriately. My copy of the book has “An Invitation” at the end which essentially says patterns are something you build up yourself, with those in the book as a starting point. It was something I’d forgotten about until I dusted off the book to sit beside me as I wrote this. I wonder how many others have forgotten this important piece of advice – or even missed it completely?
Think about the patterns in your own domain, your own choice of technology. Then think about the language you use to describe things to your fellow developers, your domain experts and your customers – are you all understanding each other? Really?
Why I’m Still Learning Clojure
September 20, 2010
Last year was C# and Erlang, this year was Clojure thanks in part to the excellent London Clojure Code Dojo.
(Bear with me for the next paragraph, and the subjective nature of later bits)
Clojure has been an interesting experience because I have neither a Lisp nor JVM background. I’ve found myself having to deal with prefix notation and a parenthesis overdose, as well as the unfamiliarity of the Java ecosystem. The official documentation is patchy, the language is still new and evolving, dedicated development tools are relatively immature, and there are moments when the JVM leaks through. It’s the first time I’ve felt a little out-of-my-depth when learning a new programming language, which is not necessarily a bad thing.
Yet I keep going to the dojos, I still flick through the book when I get the odd spare moment, I keep improving my development environment and I’m still umming and ahhing between Clojure and Erlang for a possible home project I want to work on next year, once my FluidDB and game ideas have been unleashed upon the world properly.
So why am I still learning Clojure?
I asked myself that question recently. I still disagree with Neal Stephenson’s comment that Lisp “is the only computer language that is beautiful”, but Clojure has helped me to understand why people like Lisp as well as clean up some of the ugliness of the language. I’m sorry guys, but Lisp has always looked pretty ugly to my eyes, despite the lack of boilerplate found in languages like, say, Java or C#. When Clojurians say that it’s Lisp reloaded, they mean it – which has upset some Lisp programmers but pleased others.
There’s a greater sense that people are using Clojure to solve real problems, evolving the language to deal with real issues rather than thought exercises from the ivory towers of academia. The community itself is generally a friendly and positive one with a good mix of people from different backgrounds, not just the Lisp and Java worlds. Being there as the language and community evolves has been fascinating and quite exciting.
Getting back to Clojure’s Lisp origins, I still believe it should be the starting point, not the direction. Clojure needs to find its own way, its own metaphors and patterns – retaining Lisp’s strengths while confidently discarding the legacy baggage.
Take macros, for example: much as I can see the power of them, I think they’re a distraction rather than a tool in this day and age. A controversial view for sure, but I guess I’ve never seen a conclusive argument for their use – indeed, the wisdom seems to be that you shouldn’t really need to use them. It feels like there should be a more contemporary or Clojure-like way of providing the power of macros, without the complexity or obfuscation. I don’t know what that way is yet – but I’m not sure others do either.
Clojure encourages immutable data structures, but acknowledges that the real world doesn’t work that way so a pragmatic language needs to handle mutable state. Clojure implements this in more controlled ways to other languages. My favourite option is software transactional memory. STM is a natural fit to someone who has used relational databases for the last ten years. It feels right for many applications in a concurrent environment, and it’s right there in the language from the start – not hacked in later or provided as a third-party module.
It might not quite fit in my brain like Python, I don’t quite grok the intricacies of the language yet, and the sooner more of Clojure is implemented in Clojure the better. Yet it still has me interested – I want to learn more about the language and write more Clojure code because deep down there’s something kinda cool about it.
EuroPython 2010
July 24, 2010
Well, I’m back from another great EuroPython – my thanks to John Pinner and his EuroPython crew, the presenters, Josette from O’Reilly, and all my fellow delegates. Regrettably, I might not be able to make the next EuroPython in the wonderful city of Florence, but I might be able to make it to PyCon AU in Sydney next year. Fingers crossed on that one – I might even pluck up the courage to do a talk!
Monday
Monday’s talks kicked off with Raymond Hettinger’s “Idiomatic Python”. He guaranteed everyone would learn something new and he definitely delivered. I walked away with some interesting gems of Pythonic knowledge, from basic stuff like enumerate() through to more advancing knowledge like the surprising behaviour of super().
Ezio Melotti followed after the break with a useful overview of the Python development process, which inspired me to join the Python Core sprint on Friday. I’ve already contributed my first couple of patches to Python 3.2 – the first of which has been committed to trunk already.
“How Import Works” was a run through by Brett Cannon on how, you guessed it, Python imports code. Although you’ll probably never need to modify the standard import process, it’s fascinating to discover how it all works. Best of all, new utilities in Python 3 make it much easier to customise things safely if you ever need to.
“PostgreSQL’s Python Soup” was a little disappointing, with a good chunk of the talk being a general one about relational databases. The interesting stuff about connecting to PostgreSQL came quite near the end when we were introduced to the many different connectors. And I mean many.
Monstrum is a new HTTP functional testing framework for Python 3, and was detailed in “Testing HTTP Apps with Python3″. It looks very cool, and the logo would make a great t-shirt! Continuing the web testing theme, Raymond Hettinger followed up with “Python & Selenium Testing”. Alongside an overview of Selenium, Raymond introduced Sauce Labs, a company providing commercial support for Selenium, the web testing framework, via their cloud service.
To round off the day, I attended two talks by Richard Watts and Tony Ibbs of Kynesim who presented Muddle, their open source build system which looks very cool, and KBUS which is an elegant and lightweight messaging system implemented as a Linux kernel extension.
Tuesday
The day’s talks began with a rather packed presentation by Guido on AppStats, an AppEngine monitoring tool. I must admit that I did partially go to hear him talk, as did probably a lot of other people, but also because I was interested to learn more about AppEngine. I picked up some interesting bits of information about AppEngine internals, and the tool looks fantastic. However, I did feel a bit out of my depth.
Bart Demeulenaere’s “Pyradiso Rapid Software Development” was a call to arms, a request and discussion on how to create a rapid application development framework in Python that supports the multicore world out-of-the-box.
Engaging both my computer science and art historical sides, I then attended Richard Barrett-Small’s “The Trojan Snake”. Richard works for the Victoria & Albert Museum, and has been instrumental in introducing Python and Django as a way to clean-up and replace a mess of bespoke PHP, ASP.NET and Java. An interesting look at how an organisation faced issues with bespoke applications in a variety of technologies, and found Python to be a flexible and effective solution. It’s a shame more people didn’t attend.
I was an avid reader of Michele Simionato’s blog posts, covering his experiences with Scheme, so it was great to hear him speak at EuroPython with his talk entitled “Intro to Functional Programming”. Functional programming is receiving a surge of interest amongst programmers, thanks to the search for better ways to deal with concurrency, so it was good for someone to cover other reasons to start exploring FP.
“Tickery, Pyjamas & FluidDB” by Terry Jones was a run through of Tickery, a FluidDB application for analysing Twitter friend sets. Regular readers of this blog will know that I’ve been interested in FluidDB since attending an introductory talk by Nicholas Tollervey. Definitely worth a look.
Rounding out the end of the day were Andrew Godwin’s “Fun with Databases and Django” and Russel Winder’s “Pythonic Parallelism with CSP”. Andrew’s talk provided a general overview of some of the more advanced features of the Django ORM, including Django’s interaction with non-relational databases such as MongoDB and CouchDB. Meanwhile, Russel Winder is one of the key players in pushing Python support for concurrent and parallel programming. CSP has been around for over 30 years, and now Python programmers can begin to take advantage of the ideas thanks to projects like Python-CSP and PyCSP.
Wednesday
Wednesday kicked off with Nicholas Tollervey’s “Organise a Python Code Dojo!”. Nicholas shared his experiences running the London Code Dojo – the joys, the successes… and the things that didn’t quite go to plan.
“HotPy – A comparison” was Mark Shannon’s comparison between three different VMs for Python: PyPy, Unladen Swallow and his own HotPy project which runs on top of the GVMT.
Henrik Vendelbo gave two short talks in succession. The first was “Real Time Websites with Python” which gave a simple example of the Tornado web server, released as open source by FriendFeed. The second was “Custom Runtimes with PyPy”, a very interesting talk on using PyPy to provide an easy way to bundle up a custom Python application as a self-contained app.
qooxdoo is a full-featured JavaScript framework for developing Rich Internet Applications. It’s quite an impressive piece of work, not only because of the sheer size of the code base. Managing large code bases can be a headache, especially if you also need to process that source to produce more compact, optimised versions for delivery to specific browsers. In “Python for Javascript Apps”, Thomas Herchenröder explained how they use a set of Python tools to make dealing with the code a much more pleasant experience.
“A Python’s DNA” by Erik Groeneveld illustrated an issue I’ve hit myself: how to configure a complex component-based system in a simple, maintainable way. The answer Erik came up with for Meresco is not to use XML or similar to express configurations, but use Python instead to provide a rich, flexible, and elegant way to define dependencies, data flows and configurations.
It’s not often a talk begins by telling you that the subject of the talk is now officially dead, and a new project has taken over. Kay Schlühr’s “The Trails of EasyExtend” quickly changed to “The Trails of LangScape”. LangScape evolved from a toolkit for writing language extensions to Python, but takes a much broader view with the aim of allowing easy ways to extend many different languages.
Thursday
The morning began with “Building a python web app”. Anthony Theocharis and Nathan Wright took us through their journey to find the right Python web framework to implement MediaCore. They discussed Django, TurboGears and Pylons, illustrating their experiences with each and why they settled on Pylons. The different approaches to database access and templating languages were also examined in brief, followed by how they open sourced the project. An entertaining and interesting talk, refreshing by the fact they apparently came from a non-Python background so lacked any preconceptions.
Next up, Denis Bilenko introduced gevent in his talk “gevent network library”. gevent is a network library for handling large numbers of connections efficiently and, more importantly, elegantly. Optional, seamless monkey patching of standard library modules makes it very easy to work with existing code and libraries. I also discovered Gunicorn a Python implementation of Ruby’s Unicorn webserver.
Finally, my selection of conference talks closed with “Arduino and Python” by Michael Sparks. The talk should probably have been renamed “How not to blow up your computer” as Michael crammed in a quick intro to electricity and electronics into a few slides to dispel concerns about inexpensive pieces of electronics destroying your computer. The talk was illustrated by his prototype of a rather interesting TV remote control developed at a BBC R&D workshop, built with an Arduino… and Python of course.
Friday
Big thanks to Brett Cannon and Richard Jones for putting up with me all day while I got to grips with downloading, building, testing and patching Python 3.2. It was a really enjoyable day, and very satisfying to have a patch committed to the Python source code – no matter how small.
Clojure First Impressions
January 19, 2010
I’m only a few pages into Stuart Halloway’s “Programming Clojure” book but I’m already starting to like Clojure, and by extension LISP. As with every language, it has its good points and its bad points, but so far it’s been a positive experience and is bringing the same smile to my face that Erlang brought me last year (mental note to self: need to carry on exploring Erlang).
I’ve yet to be fully convinced about the “everything is data” approach, though I am aware of the merits, and worry that being able to redefine aspects of the language through macros can be both a curse and a blessing for maintenance of code. My mind is open though, and I’m looking forward to covering Clojure’s macro capabilities, something that has been of interest to me since reading Michele Simionato’s excellent “The Adventures of a Pythonista in Schemeland“.
Clojure has taken some useful steps in cleaning up LISP’s minimal syntax: less parentheses, clearer function parameters, commas as whitespace (trust me, it actually makes sense), more readable ames for built-in functions (is that the correct terminology in the LISP/Clojure world?), things which had traditionally been mildly offputting aspects of the LISP family.
One thing that can still confuse some people is Clojure’s use of prefix notation, since I’m not aware of any other general purpose language that uses it. Most kids these days have only ever seen Algol-descended languages… even if they have no idea what the heck Algol is. It’s actually not that difficult to adjust to, but seeing something like (+ 1 2) can look odd to many… until you realise it’s not that different to saying +(1,2) or add(1,2), rather than the more usual syntax of 1 + 2.
In some ways, it’s probably more natural to many of today’s programmers than Erlang’s Prolog-derived syntax would be (I’m conveniently ignoring the more imperative aspects of Erlang). Years ago someone told me that LISP was an expression of syntax trees, and that mental image has stuck with me – after all, the syntax of LISP is about lists of data/code, or more correctly symbolic expressions. Erlang is more like Prolog’s expression of Horn Clauses.
The prefix syntax does remove the problem of precedence, which is touted by some as an advantage over other languages. While true, it does feel a little bit like cheating, because writing something like (* (+ 1 2 ) 3 ) is no different than requiring explicit use of precedence-defining parentheses in other languages: ( ( 1 + 2 ) * 3 ). However, it does improve extensibility… sort of. Infix notation is a little limited: 1 + 2 is stuck rigidly to two parameters, and the singular case of + 1 isn’t quite as much fun. How do you express 1 + 2 + 3, but using only a single + character? Clojure’s syntax allows us to use + as a summing function as well as for simple addition: (+ 1 2 3 4 5). Okay, something like sum(1,2) and sum(1,2,3,4,5) in some other languages. The thing is, Clojure defines a consistent approach, whereas most languages have special-cased some of the basic mathematical operations.
Speaking of which…
user=> (/ 1 2) 1/2
Huh? Ah, I see! Integer division results in a fraction or ratio. I can see the benefits of this evn though it does feel a little odd, but that’s probably because I’ve been conditioned by other languages that would return 1, 0.5, 0 or other interesting floating point variations. To get the expected answer, you need to use (quot 1 2) for an integer value or (/ 1 2.0) to return a floating point value. The presence of the “keyword” (symbol?) quot shows that Clojure hasn’t entirely dispensed with the abbreviations that could make older versions of LISP a little cryptic and newbie-unfriendly, but it’s a heck of an improvement.
Clojure’s use of transactions for modifying mutable data is interesting. At first glance, the simple introductory example in the book looks a little unwieldy with the calls to dosync and alter just to update a single variable. But I can see how it might shine in more complex examples waiting in later chapters.
Now back to the book…
Why I’m Learning Clojure
January 10, 2010
Recently, I posed the question of which language I should learn in 2010. The options were Squeak, Common Lisp, Clojure, Groovy… or to make a second attempt with Ruby. I must admit I was veering towards Lisp when I wrote it, but all options were still possibilities. Out of the two Lisp dialects, Common Lisp was probably my preference over Clojure but in the end I decided I would give the latter a shot.
So why Clojure?
Well, the first reason is down to it being a new language. There’s something quite interesting about trying a language in the early days. Since I’m not going to be writing production code with it for the foreseeable future, having a moving target isn’t really an issue. The community surrounding the language is still forming and there’s a chance to watch and interact with a forming and evolving Clojure ecosystem. London also has a Clojure code dojo running, aimed at introducing people to Clojure, which offers an excellent opportunity to help with my learning – and best of all, there’s an overlap with some members of the London Python community. Very handy.
Secondly, Clojure is a Lisp dialect. Lisp has been on my list of languages to learn for a long time, but I’ve always ended up looking elsewhere and my only real experiences with the Lisp world have been trivial at best, and never with “the real thing”. In some ways I guess I’m still following that path, though Clojure feels more like a proper Lisp because at heart it is.
Clojure’s primary environment is of the JVM, an area I’ve only limited experience of. There was a time when Java was appealing to me, but every time I take a second/third/umpteenth look at the language it seems to have accumulated more baggage and bloat. It’s a huge and complex language and ecosystem to get into these days, though very mature and powerful. Clojure would give me an opportunity to delve into the JVM world a bit more.
The language is also making progress in supporting the .NET world. Much as I’m definitely not a Microsoft fan, .NET has grown on me and there’s some interesting work happening with it and the open source Mono implementation. I actually rather like C#, though wish someone would take the core language and move it out of the .NET world – probably a weird idea to some. Anyway, Clojure on the CLR is an interesting idea, so adds a little to the interest because I have to mix things up with .NET developers at work.
So that’s all the fluffy stuff out of the way – what are the other reasons?
Well, Clojure is a functional programming language, which is a current interest of mine. Spurred on by my adventures with Erlang, functional programming is growing on me. It’s also dynamically typed, which has been one of the big revelations during my career, perhaps even more than object-oriented development. It compiles to JVM bytecode; being Lisp, it has read/eval/print which makes language exploration and prototyping a joy; and I can delve into a language with a proper, powerful macro capability.
Clojure is also geared up for the big issue of the now: concurrency. Erlang interested me because of the way it tackled concurrency, something that all programmers are going to have to face up to sooner or later as we live in a multicore world. I like Erlang’s approach a lot, but there are other options out there and I feel it is my duty to examine a few of the options. In particular, the use of Software Transactional Memory (STM) holds an appeal to me. Having used databases the last ten years, transactions feel quite a natural approach for many problems, but definitely not all, and incorporating them into other software domains can only be a good idea. I was considering Haskell at one point to investigate STM, but I must admit to not being a massive fan of parts of Haskell’s syntax so never pursued the idea. With Clojure, I can at last take the chance to delve into STM.
I’m not going to stop learning and exploring Erlang, far from it, but I’m looking forward to adding Clojure to my programming toolkit.
Language for 2010
December 14, 2009
I’ve been a computer language geek for a long time. Even before I read the excellent book “The Pragmatic Programmer”, I would try and learn a new language every year or two – or at least look at new languages to get ideas. This year I added C# to my repertoire and made a return to learning Erlang, which was supposed to be my original choice for the year until the demands of needing to deal with .NET forced a more pragmatic change.
Even if you never use a new language long-term, just taking a look helps sharpen skills, bring about new insights or even create a renewed appreciation for your current language of choice.
My choice for 2010 is quite tricky, so I need some help in deciding from you, dear reader. The short list is as follows:
Squeak –
http://www.squeak.org/
My Smalltalk knowledge is all read and no write, so time I actually learnt the language properly and wrote some code because it looks like such a simple and elegant language. I don’t think knowing and enjoying some Objective-C quite counts, somehow!
Squeak feels like the right way for me to learn Smalltalk. I even downloaded the software a while back, but am ashamed to say I just haven’t installed it yet.
Common LISP –
http://clisp.cons.org/
LISP has been on my list (no pun intended) for years. I’ve done a tiny bit of EMACS LISP programming and the Amiga Installer uses a LISP dialect, but neither are really sufficient for me to say “I’ve done LISP”. Eric Raymond (I believe) said LISP was probably the best programming language to learn that you will never use, and I think he’s right. LISP featured things like garbage collection and dynamic typing back in 1958, features that some might believe are relatively new additions to programming languages.
I will, however, say that I don’t agree with people like author Neal Stephenson who insist LISP is the only programming language that can be described as beautiful.
Clojure –
http://clojure.org/
Groovy –
http://groovy.codehaus.org/
I made a brief look at Java (1.1) in 1996 thanks to a foresighted university lecturer, and again in 2005 when I was unemployed after returning from a season snowboarding. The JVM is pretty cool, but Java has changed a lot since the early days and not always for the better in my opinion. Apart from those two little forays, I’ve avoided the Java ecosystem as an area of interest. Now, however, things are changing. Non-Java languages on the JVM are gaining ground, including Jython and JRuby, but also new languages like Clojure and Groovy are appearing and revitalising the Java world.
Groovy seems to have quite a bit of interest from people in the Python community and, from what I’ve seen, I can understand why. It simplifies a lot of the complexity / boilerplate of Java, adds the power of dynamic typing that smug Pythonistas such as myself take for granted, and the name is suitably non-Enterprisey for me to take notice.
Clojure, on the other hand, is a dialect of LISP for the JVM. This would allow me to look into the JVM ecosystem while learning LISP. I also note there is CLR / .NET support as well, which is interesting – though I doubt I will be able to use it to interact with legacy .NET code at work.
Ruby –
http://www.ruby-lang.org/en/
Not strictly a new language for me, but I briefly looked at Ruby on two occasions (notice a Java-like pattern here?). The first time was in 2005 for a possible Ruby developer job. They couldn’t find Ruby programmers so were looking for people with Perl and Python experience – I hadn’t heard of Ruby at the time, but it sounded interesting so I had a quick play with it. The second time was in 2007 when I read the “RESTful Web Services” book.
I must admit, I found Ruby terribly disappointing on both occasions. Please don’t flame me! A language that blends elements of Perl and Python, with some LISP-like tweaks, should be pretty cool, but it just felt syntactically awkward in places. Despite my disappointment, I can see why some people like it and it’s making more of an appearance in places I wouldn’t expect a dynamic language to be mentioned. I should really give it another go.
Now your turn…
So, what language from the list do you suggest I look at and why?
Erlang In My Head (Part 3) – Counting Characters
September 16, 2009
This is part 3 of an irregular (honest!) series of posts as I tackle learning and using Erlang whenever I have a few spare minutes. The aim is to pick a small and/or trivial task that can be completed in a short amount of time and aims to shine a little light on different bits of the language. Feedback from both seasoned and novice Erlang programmers is welcome, as are contributions from fellow Pythonistas.
The challenge this time was to take a string and count the number of characters, returning a list of characters with their counts. The character counts are to be case insensitive, and the output doesn’t need to be formatted other than enough to make some sense.
In Python, I would write a few lines of code that would iterate over the string and add the results to a dictionary which stores the counts. As with the previous challenge of reversing a string, there is a recursive solution too, but it’s not very efficient or elegant. I’m deliberately not offering Python code this time in order to concentrate more on thinking in Erlang.
When I first approached the problem, I could already see how the recursive solution would be more elegant in Erlang. You have your string, which is really a list of integers, and split it into a head and a tail. Keeping efficiency in mind, I have to resist the urge to find the counts for the tail and then add the head’s character to the counts. Instead I opt for something that is, hopefully, optimised for tail recursion.
The first thing is to write a wrapper which takes a string and calls my character counting function:
count(String) -> dict:to_list( character_count(String, dict:new()) ).
I’m casting the dictionary returned by character_count into a list to make it more readable. Erlang will, however, show integers instead of the actual characters, but if I wanted an even better way to display the results I could do it here instead. However, I’m keeping it simple because I have limited time and this code is only intended to satisfy my own curiosity.
The simplest situation is when I have an empty string:
character_count([], Counts) -> Counts.
Because there are no characters to count I have no updates to make to the supplied dictionary, so I return the accumulated Counts.
Right, that’s the empty case handled, but what if I have a non-empty string? That seems pretty likely for something that will count characters in a string! Well, first off I’m going to have a clause that takes a string and splits it into a Head and a Tail:
character_count([Head|Tail]) ->
My counts are intended to be case insensitive, so first I’m going to normalise my characters by uppercasing the Head:
Character = string:to_upper(Head),
Now I need to check to see if I already have a count for Character in the Counts dictionary. At first glance this looked like the perfect opportunity to use the Erlang if statement, until I realised it didn’t feel right: (note code is untested since it was never used)
if dict:is_key(Character, Counts) -> ...; true -> ... end
The true atom at the end bugs me a little, and others judging from various blog entries, but I understand why it is there and respect the need for it. Even ignoring this, the if expression doesn’t look or feel right. The case expression provides, in my opinion, a much more elegant expression of intent:
case dict:is_key(Character, Counts) of true -> ...; false -> ... end;
Much better!
At this point we can flesh out the code that does the real work of updating the counts:
NewCounts = case dict:is_key(Character, Counts) of true -> dict:update(Character, fun(Value) -> Value + 1 end, Counts); false -> dict:store(Character, 1, Counts) end,
Thus my case statement handles the correct manipulation of the dictionary, depending on whether our Head character was present in the Counts dictionary or not.
Manipulation is probably the wrong word, to be honest. Being immutable, we don’t actually change the dictionary. The dict:store/3 function creates a copy of the Counts dictionary with the addition of a new Character key set to 1 (since it’s the first time we’ve counted the character).
The dict:update/3 does the same thing, it returns a new dictionary which has the value/count of Character incremented. At first glance, it looked a little strange in that it takes a function to map the existing value to a new one. This caught me out first time: you don’t give it the new value! The advantage is that I didn’t need to retrieve the existing value, assign the incremented value to a new value, then call update with the new value. The downside is that a simple replacement of the value still requires a mapping function.
Now I’ve updated my counts, I can make my recursive call to obtains counts for the tail of the string:
character_count(Tail, NewCounts);
The final module looks like this, with the count/1 function being the one called initially:
-module(character_count). -export([count/1]). -import(dict, [is_key/2, new/0, store/3, to_list/1, update/3]). -import(string, [to_upper/1]). count(String) -> dict:to_list( character_count(String, dict:new()) ). character_count([Head|Tail], Counts) -> Character = string:to_upper(Head), NewCounts = case dict:is_key(Character, Counts) of true -> dict:update(Character, fun(Value) -> Value + 1 end, Counts); false -> dict:store(Character, 1, Counts) end, character_count(Tail, NewCounts); character_count([], Counts) -> Counts.
Erlang In My Head (Part 2.1) – Feedback
September 13, 2009
I’d like to thank Angel for my first piece of valuable Erlang-related feedback.
I made a newbie goof with my string reversal code, in that I hit the same limitation I had with my recursive Python example. Python isn’t optimised for tail recursion, but Erlang is: provided you write code that takes advantage of this feature.
For those that don’t know what tail recursion is, and I must admit I was hazy about it until I wrote this post, I’ll use my original code and Angel’s tail recursion optimised code for illustration. First, my attempt:
reverse_string([Head|Tail]) -> reverse_string(Tail) ++ [Head]; reverse_string([]) -> [].
It looks nice and simple, and it’ll work, but there’s one problem: it still consumes stack space. The reason is that my first clause calls itself first, then appends the result of the call to the [Head]. Each time we make a recursive call, we store state on the stack, because there’s code waiting to execute based on the result. We keep adding to the stack until we hit the end of the recursive calls, then start popping the state off as we work our way back until we’ve built our fully reversed string. Fine for short strings, not so good when reversing larger ones.
Angel’s solution is this:
reverse_string(Str) -> reverse_string(Str,[]). reverse_string([Head|Tail],Acc) -> reverse_string(Tail,Head++Acc); reverse_string([],Acc) -> Acc.
The important thing to note is that the recursive calls are always last in each clause. Hence the term “tail recursion” – the recursive call is at the tail of the code. Because there’s no code waiting on the result of subsequent calls, we gain an important optimisation: when we reach the end of the recursive calls, we can jump right back to the beginning. Erlang’s compiler is smart enough to identify this and optimise the code accordingly. Much more efficient.
Okay, let’s run through the code, both for my own benefit and for any other Erlang newbies reading this post.
reverse_string(Str) takes the string and calls reverse_string with two arguments: the string Str and an empty list. What’s the empty list for? Well, take a look at the second clause:
reverse_string([Head|Tail],Acc)
This splits the string into a Head and Tail, the same approach I took on my original version. However, the important thing is the addition of a second argument Acc, which stores our work-in-progress. The clause then makes the recursive call as the only, and thus tail, call: supplying the Tail and the concatenation of Head to Acc. As we recurse, we build up Acc into our reversed string. All good so far.
Finally, the last clause matches the situation when we have an empty string, our end condition, and simply returns whatever we’ve accumulated in Acc. Erlang knows that there is no further work to do with each of the previous calls, and can therefore skip straight back to the beginning with the result. Excellent!
Further reading:
Erlang In My Head (Part 2) – String Reversal, The Educational Way
September 12, 2009
This is the second in an irregular series of musings on Erlang, with references to Python. The main aim is to encourage me to use spare moments in the evening to look at aspects of Erlang. Hopefully my murmurings will be of interest to fellow Pythonistas (and maybe others!) as well as generate feedback from the Erlang community.
Spare time is a scarce commodity at the moment, so I decided the best way to get back into learning and using Erlang is to pick small, often trivial, tasks that can be solved in a few minutes, but which would hopefully yield enlightenment on a facet or two of the language.
One example is that contrived task of reversing a string. Both Python and Erlang have efficient in-built or library methods/functions for reversing lists and strings, but calling something like my_string.reverse() or lists:reverse() isn’t very educational.
In Python, one quick’n'dirty solution would be:
def reverse_string(my_string):
new_string = ""
for character in my_string:
new_string = character + new_strin
return new_string
I could also dust off my computer science skills and approach the problem recursively, without the need for new_string:
def reverse_string(my_string):
if my_string:
return reverse_string(my_string[1:]) + my_string[0]
return ""
It works, but it doesn’t look as tidy as the first function. There’s also a limit on the depth of recursion, the CPython default of which (I believe) would restrict string size to about 1000 characters. Not a scalable solution.
Approaching the problem in Erlang using the first, iterative, approach is hampered by the fact I can only assign a variable to a value once. It might be possible to do it iteratively, but it would probably look as awkward as the recursive approach felt in Python.
Erlang treats strings as lists of numbers. While this means string handling is not as natural as it is in Python, it does offer us some help when approaching the reversal problem. My string is a list, and I can adopt the same approach as I did for the recursive Python function: append the head of the list to a reversed copy of the tail.
reverse_string([Head|Tail]) ->
reverse_string(Tail) ++ [Head];
reverse_string([]) ->
[].
My reverse_string function has two clauses. The first clause takes a list and splits it into a Head (the first element) and a Tail (everything else). It then concatenates the result of reverse_string(Tail) to the Head. Simple!
Or not. My first attempt suffered a gotcha: I was concatenating a list to an integer, which errored. Turning Head into a single item list, [Head], allowed the two to be concatenated.
The second clause deals with the case of an empty string. When you reach the end of the string your Tail will be empty, which matches this clause and ends our recursion. If you supplied an empty string initially, this would also match. Using an empty string “” instead of [] works equally as well, but I decided to keep things consistent with the list notation used in the first clause.
With this exercise complete, I’ll stick to lists:reverse in future.
Update: See
http://metaljoe.wordpress.com/2009/09/13/erlang-in-my-head-part-2-1-feedback/