September 23, 2009
Just a quickie in case anyone isn’t aware.
If you’re an Aquamacs user, you can easily add Erlang support. I’ve currently got Erlang installed via MacPorts, so all I did was modify my Preferences.el file, found at ~/Library/Preferences/Aquamacs Emacs/.
Assuming you have MacPorts installed in the default location, just add these lines:
;; Import Erlang mode (from MacPorts)
(setq load-path (cons "/opt/local/lib/erlang/lib/tools-2.6.2/emacs/" load-path))
(setq erlang-root-dir "/opt/local/lib/erlang")
(setq exec-path (cons "/opt/local/lib/erlang/bin" exec-path))
Obviously, the reference to tools-2.6.2 assumes the version I have installed: your experience may vary, so double-check!
For those who aren’t familiar with Aquamacs, it’s a really nice port of GNU Emacs with a native OS X GUI. You can find it at http://aquamacs.org/
September 16, 2009
This is part 3 of an irregular (honest!) series of posts as I tackle learning and using Erlang whenever I have a few spare minutes. The aim is to pick a small and/or trivial task that can be completed in a short amount of time and aims to shine a little light on different bits of the language. Feedback from both seasoned and novice Erlang programmers is welcome, as are contributions from fellow Pythonistas.
The challenge this time was to take a string and count the number of characters, returning a list of characters with their counts. The character counts are to be case insensitive, and the output doesn’t need to be formatted other than enough to make some sense.
In Python, I would write a few lines of code that would iterate over the string and add the results to a dictionary which stores the counts. As with the previous challenge of reversing a string, there is a recursive solution too, but it’s not very efficient or elegant. I’m deliberately not offering Python code this time in order to concentrate more on thinking in Erlang.
When I first approached the problem, I could already see how the recursive solution would be more elegant in Erlang. You have your string, which is really a list of integers, and split it into a head and a tail. Keeping efficiency in mind, I have to resist the urge to find the counts for the tail and then add the head’s character to the counts. Instead I opt for something that is, hopefully, optimised for tail recursion.
The first thing is to write a wrapper which takes a string and calls my character counting function:
count(String) -> dict:to_list( character_count(String, dict:new()) ).
I’m casting the dictionary returned by character_count into a list to make it more readable. Erlang will, however, show integers instead of the actual characters, but if I wanted an even better way to display the results I could do it here instead. However, I’m keeping it simple because I have limited time and this code is only intended to satisfy my own curiosity.
The simplest situation is when I have an empty string:
character_count(, Counts) -> Counts.
Because there are no characters to count I have no updates to make to the supplied dictionary, so I return the accumulated Counts.
Right, that’s the empty case handled, but what if I have a non-empty string? That seems pretty likely for something that will count characters in a string! Well, first off I’m going to have a clause that takes a string and splits it into a Head and a Tail:
My counts are intended to be case insensitive, so first I’m going to normalise my characters by uppercasing the Head:
Character = string:to_upper(Head),
Now I need to check to see if I already have a count for Character in the Counts dictionary. At first glance this looked like the perfect opportunity to use the Erlang if statement, until I realised it didn’t feel right: (note code is untested since it was never used)
if dict:is_key(Character, Counts) -> ...; true -> ... end
The true atom at the end bugs me a little, and others judging from various blog entries, but I understand why it is there and respect the need for it. Even ignoring this, the if expression doesn’t look or feel right. The case expression provides, in my opinion, a much more elegant expression of intent:
case dict:is_key(Character, Counts) of true -> ...; false -> ... end;
At this point we can flesh out the code that does the real work of updating the counts:
NewCounts = case dict:is_key(Character, Counts) of true -> dict:update(Character, fun(Value) -> Value + 1 end, Counts); false -> dict:store(Character, 1, Counts) end,
Thus my case statement handles the correct manipulation of the dictionary, depending on whether our Head character was present in the Counts dictionary or not.
Manipulation is probably the wrong word, to be honest. Being immutable, we don’t actually change the dictionary. The dict:store/3 function creates a copy of the Counts dictionary with the addition of a new Character key set to 1 (since it’s the first time we’ve counted the character).
The dict:update/3 does the same thing, it returns a new dictionary which has the value/count of Character incremented. At first glance, it looked a little strange in that it takes a function to map the existing value to a new one. This caught me out first time: you don’t give it the new value! The advantage is that I didn’t need to retrieve the existing value, assign the incremented value to a new value, then call update with the new value. The downside is that a simple replacement of the value still requires a mapping function.
Now I’ve updated my counts, I can make my recursive call to obtains counts for the tail of the string:
The final module looks like this, with the count/1 function being the one called initially:
-module(character_count). -export([count/1]). -import(dict, [is_key/2, new/0, store/3, to_list/1, update/3]). -import(string, [to_upper/1]). count(String) -> dict:to_list( character_count(String, dict:new()) ). character_count([Head|Tail], Counts) -> Character = string:to_upper(Head), NewCounts = case dict:is_key(Character, Counts) of true -> dict:update(Character, fun(Value) -> Value + 1 end, Counts); false -> dict:store(Character, 1, Counts) end, character_count(Tail, NewCounts); character_count(, Counts) -> Counts.
September 15, 2009
One of the best phrases I’ve encountered in my professional programming career is “smelly code”. It’s great: it sounds childish enough to make me smile, yet also highlights an important indicator in code quality.
Like many, I first encountered the concept of code smells in Martin Fowler’s essential “Refactoring” book, although the term was coined by Kent Beck and predates the book. We’ve all seen examples, and even contributed our own aroma to code. Code duplication; excessively large classes, methods, functions, modules; unreadable code; tightly-coupled code; fragile code; overly complex code; code that circumvents encapsulation.
Over the years, I’ve picked up a nose for code smells. I get a feel for code that is starting to smell, and usually get that prick of guilt when my own code begins to emit an unpleasant fragrance, even if I don’t always know immediately how to deal with it. Deodorant comments like “# REFACTOR: this feels wrong” or “// REFACTOR: this code is ugly!” start to crop up as a warning sign that things need to be looked at, if immediate action can’t be taken.
The nose kicked in this afternoon while working on a Django view. Many smells start off with the best of intentions, and this was no exception. In order to make my tests pass, I started off with a fairly harmless bit of code that interacted with a form object. Over a period of a few tests, the code began to expand and some duplication kicked in. List comprehensions entered into the equation. The nose began to kick in. I can’t show the code, unfortunately, and I’m too lazy to concoct some example code that illustrates my point, so bear with me…
Once all the functionality I needed had been implemented and the code checked-in, I cast an eye over the code and realised it looked ugly. The warning signs were there:
- two almost-identical list comprehensions
- temporary variables used to make the list comprehensions more readable
- several lines to do something which should be simple
- the need to re-read the code more than once to “see” what it does
*facepalm* What was I thinking when I wrote that??
First step was to extract the code into a utility function, in the hope of making the view code more readable and make it easier to eliminate the duplication. I find extracting code can be an excellent first step in dealing with ugly code, but it has to be used responsibly: otherwise you end up moving the smells into another layer of abstraction.
In the process of creating the function, another smell became apparent:
- borderline intimate knowledge of the form class
This smell instantly pointed out where and how to refactor the code. The extraction was still necessary, but instead of creating a utility function in the view it was apparent that the code could be moved into two methods of the form. A quick change and a re-run of the tests confirmed that the extraction worked and the view instantly looked simpler and more readable. However, the new methods still didn’t quite feel right.
Next step was to take the list comprehensions and break them out into multi-line for loops. It might sound counter-intuitive to take code that occupies one line (albeit with preparatory code) and turn it into four or five lines of code, but the result was for the best. The intent of the code became clearer and the need for those temporary variables was removed.
All that remains is a minor whiff, a single bit of hardcoding used to identify particular fields in the form. My nose is telling me that introspection will lead me down the path of perfumed code…
Also of note:
(I’ve not tried this yet – just found it!)
September 13, 2009
I’d like to thank Angel for my first piece of valuable Erlang-related feedback.
I made a newbie goof with my string reversal code, in that I hit the same limitation I had with my recursive Python example. Python isn’t optimised for tail recursion, but Erlang is: provided you write code that takes advantage of this feature.
For those that don’t know what tail recursion is, and I must admit I was hazy about it until I wrote this post, I’ll use my original code and Angel’s tail recursion optimised code for illustration. First, my attempt:
reverse_string([Head|Tail]) -> reverse_string(Tail) ++ [Head]; reverse_string() -> .
It looks nice and simple, and it’ll work, but there’s one problem: it still consumes stack space. The reason is that my first clause calls itself first, then appends the result of the call to the [Head]. Each time we make a recursive call, we store state on the stack, because there’s code waiting to execute based on the result. We keep adding to the stack until we hit the end of the recursive calls, then start popping the state off as we work our way back until we’ve built our fully reversed string. Fine for short strings, not so good when reversing larger ones.
Angel’s solution is this:
reverse_string(Str) -> reverse_string(Str,). reverse_string([Head|Tail],Acc) -> reverse_string(Tail,Head++Acc); reverse_string(,Acc) -> Acc.
The important thing to note is that the recursive calls are always last in each clause. Hence the term “tail recursion” – the recursive call is at the tail of the code. Because there’s no code waiting on the result of subsequent calls, we gain an important optimisation: when we reach the end of the recursive calls, we can jump right back to the beginning. Erlang’s compiler is smart enough to identify this and optimise the code accordingly. Much more efficient.
Okay, let’s run through the code, both for my own benefit and for any other Erlang newbies reading this post.
reverse_string(Str) takes the string and calls reverse_string with two arguments: the string Str and an empty list. What’s the empty list for? Well, take a look at the second clause:
This splits the string into a Head and Tail, the same approach I took on my original version. However, the important thing is the addition of a second argument Acc, which stores our work-in-progress. The clause then makes the recursive call as the only, and thus tail, call: supplying the Tail and the concatenation of Head to Acc. As we recurse, we build up Acc into our reversed string. All good so far.
Finally, the last clause matches the situation when we have an empty string, our end condition, and simply returns whatever we’ve accumulated in Acc. Erlang knows that there is no further work to do with each of the previous calls, and can therefore skip straight back to the beginning with the result. Excellent!
September 12, 2009
This is the second in an irregular series of musings on Erlang, with references to Python. The main aim is to encourage me to use spare moments in the evening to look at aspects of Erlang. Hopefully my murmurings will be of interest to fellow Pythonistas (and maybe others!) as well as generate feedback from the Erlang community.
Spare time is a scarce commodity at the moment, so I decided the best way to get back into learning and using Erlang is to pick small, often trivial, tasks that can be solved in a few minutes, but which would hopefully yield enlightenment on a facet or two of the language.
One example is that contrived task of reversing a string. Both Python and Erlang have efficient in-built or library methods/functions for reversing lists and strings, but calling something like my_string.reverse() or lists:reverse() isn’t very educational.
In Python, one quick’n'dirty solution would be:
def reverse_string(my_string): new_string = "" for character in my_string: new_string = character + new_strin return new_string
I could also dust off my computer science skills and approach the problem recursively, without the need for new_string:
def reverse_string(my_string): if my_string: return reverse_string(my_string[1:]) + my_string return ""
It works, but it doesn’t look as tidy as the first function. There’s also a limit on the depth of recursion, the CPython default of which (I believe) would restrict string size to about 1000 characters. Not a scalable solution.
Approaching the problem in Erlang using the first, iterative, approach is hampered by the fact I can only assign a variable to a value once. It might be possible to do it iteratively, but it would probably look as awkward as the recursive approach felt in Python.
Erlang treats strings as lists of numbers. While this means string handling is not as natural as it is in Python, it does offer us some help when approaching the reversal problem. My string is a list, and I can adopt the same approach as I did for the recursive Python function: append the head of the list to a reversed copy of the tail.
reverse_string([Head|Tail]) -> reverse_string(Tail) ++ [Head]; reverse_string() -> .
My reverse_string function has two clauses. The first clause takes a list and splits it into a Head (the first element) and a Tail (everything else). It then concatenates the result of reverse_string(Tail) to the Head. Simple!
Or not. My first attempt suffered a gotcha: I was concatenating a list to an integer, which errored. Turning Head into a single item list, [Head], allowed the two to be concatenated.
The second clause deals with the case of an empty string. When you reach the end of the string your Tail will be empty, which matches this clause and ends our recursion. If you supplied an empty string initially, this would also match. Using an empty string “” instead of  works equally as well, but I decided to keep things consistent with the list notation used in the first clause.
With this exercise complete, I’ll stick to lists:reverse in future.
September 8, 2009
Coming from an imperative programming background, one of the earliest things to note about Erlang is that variables don’t actually vary. Erlang allows you to assign a value to a variable once, and that’s it: the assignment is permanent. I like to think of it as going from the unknown to the known, or unbound to bound if you want to be precise.
1> X. * 1: variable 'X' is unbound
Which isn’t any different to Python:
>>> X Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'X' is not defined
In both cases X has an unknown value, it isn’t bound to anything, so Erlang/Python doesn’t know what to do with it.
Thus we need to assign a value to X:
2> X = 42. 42 3> X. 42
Now Erlang is happy, and so is Python:
>>> X = 42 >>> X 42
However, that = sign has a different meaning in the two languages. In Erlang, it’s not an assignment operator but a pattern matching operator. Erlang takes a look at both sides of the “equation” and attempts to make sense of it. Since X is unknown up to that point, the only way it can reconcile the statement is to assign X the value of 42. 42 equals 42, and the world is at one with itself again.
Python takes the value 42 and applies the label X to it. I could easily move the X label to something else:
>>> X = "foo" >>> X 'foo'
What I can’t do is the same with Erlang:
4> X = "foo". ** exception error: no match of right hand side value "foo"
Erlang, quite rightly, complains because it knows that X has the value 42 (since we bound it to that earlier) and this definitely doesn’t match the string “foo”. If I try it against the value 42, I get the value 42 i.e. a match.
5> X = 42. 42
In terms of naming, variables in Erlang must begin with a capital letter, not lowercase:
6> x = 7. ** exception error: no match of right hand side value 7
The reason this failed is that Erlang has something called an “atom”, and atoms start with lowercase letters. An atom is effectively a constant that has a value of itself: it’s pre-bound, so you can’t assign it to anything.
7> x. x
The atom x is equal to x, which is why Erlang raised an exception when x was compared to 7.
Why would you want variables that can’t change? Well, for those of you who have done any concurrent programming you will know how tricky it is dealing with shared data. Functional programming languages like Erlang remove a whole class of concurrency-related problems by making data immutable. If the data is immutable, you can’t have one thread or process changing the contents of a variable while another is looking at it. Nor can a process overwrite the changes another process has just made. That means no need for locks and semaphores to protect data integrity, all of which have performance implications.
Another thing it makes easier is debugging. Once a variable is set, it stays set – defective code can’t update a variable it shouldn’t. You can also more easily prove an algorithm, and its implementation, is correct when you know that code is free of side-effects.
If you’re a Pythonista looking at this and haven’t given Erlang a go, I can recommend it as a way to try something different and expand the mind. You might even gain a fresh perspective of the way you write Python. If you’re already an Erlang programmer, I hope my understanding of the above is correct and any feedback is always welcome!
September 6, 2009
I’ve been attending management training courses the last few months, with one one-day course per month. The most recent one was on Time Management. I wouldn’t say I’m the greatest time manager, in fact a little disorganised, but luckily my public image is much better than my own mental image: people tell me that I get things done on time. Checking back, it’s true – but I always feel I can do better.
So it was great that one thing in particular grabbed me during the course: the Urgent / Important Matrix. The matrix struck me as a great way to evaluate workloads and priorities in a lightweight way, and I’m going to start using it at work to see how it pans out. I believe it originated in Stephen Covey’s “Seven Habits of Highly Effective People”, a book I haven’t read but have just added to my Amazon wish list.
Grab your current To Do list, or whatever else you use to track tasks, and go through each in turn. For each task, you assign an Urgency rating and an Importance rating. To keep it simple, your urgency is either Urgent or Not Urgent, and your importance (yep, you guessed it) is either Important or Not Important. You might find it easier to draw a large 2×2 grid with Importance on the x-axis, and Urgency on the y-axis. Write down the tasks in the appropriate box for their Urgency/Importance combination.
Now take a look at those tasks on the matrix.
At one extreme, we have the tasks that are neither urgent nor important. These are likely to be time-wasters and should be dealt with appropriately. Do, Delegate, Dump or Defer. If they’re quick tasks, or you have nothing else to do, do them: get them completed and off your list as soon as possible. Better yet, delegate the tasks to people who want to tackle thm and are able to get the work done. Otherwise, dump or defer the task. If it doesn’t need to be done, get rid of it from your list or defer until you know it can go. If it becomes important or urgent later, it’ll come back – we all know that from experience!
At the other extreme are the tasks that are both important and urgent. Your website has been Slashdotted, your data centre is currently engulfed by flames, or your new project rollout has failed spectacularly and it needs to be fixed ASAP. This section of the matrix is reserved for crises. You need to focus on dealing with those tasks straight away, but you also need to ensure two things: make sure you don’t end up firefighting the same issue again later on, and make sure other tasks don’t start creeping into this part of the matrix while you deal with a crisis task.
That leaves us two bits…
Urgent but Not Important tasks are ripe for delegation, if possible. If you can’t delegate them, they should be dealt with quickly and efficiently. Focus on resolving only the urgent aspect(s) of the task so you don’t use up valuable time needed for other tasks. Ultimately you’ll end up with a task struck off the To Do list, or one that is now no longer urgent or important.
Important but Not Urgent tasks are the interesting bit: these are usually the main tasks of your job role. I found this rather surprising, because I started off thinking most of my work was in the Urgent + Important category (my customers would certainly say they are), but it isn’t. Suddenly things seemed much clearer and more manageable. You need to keep on top of these tasks though, allocating sufficient time to get tasks done. Otherwise, they will end up rising in urgency and become crisis tasks.
If you’re struggling to find the time to do everything on your To Do list, give the matrix a try and see if it helps. You’ll probably find some tasks that can be delegated, some others that can be dumped and a few that can be struck off the list by finding a spare few minutes during a hectic day (great when you’ve hit a problem and need to take a break and do something else). More importantly, you’ll spot the ones that absolutely need to be dealt with first, and the ones that really matter to your day-to-day work.
September 1, 2009
I’ve been playing with Django for a while now, but today was the first day writing proper production code which will end up on a public-facing website. It will be the first Django application in use by the company, so there’s a lot riding on getting it right.
Django is great for writing web applications quickly and easily – whenever I hit upon a problem, I find something in Django that already does what I need, or I can implement something small in Python that fills in the gap.
One really nice feature is that it’s easy to test Django using both of Python’s in-built testing frameworks: unittest and doctest. I’m not a fan of doctests, though I can see why some people find them useful. I do love unittest though, and so it’s great that Django has provided enhancements to unittest to support things like basic browser client tests, loading test data fixtures and using a test database (sqlite is great for this).
The client doesn’t aim to replace dedicated web test systems such as Selenium, Windmill or Twill. It just offers a lightweight way to test your Django application’s functionality out-of-the-box. You can check response codes, test which template is being rendered, fire test data to forms, or check for text in returned pages. This was very useful today when fleshing out the first part of the application: a form with various fields, buttons, drop-downs and validation logic. I could write my tests first, a bit at a time, then implement the functionality to pass the tests. Supported by the confidence given by the tests, I could check the specified logic was correct, keep refactoring the code, and keep the design supple at the same time.
Of course, I will eventually need to invest in dedicated web testing, probably Selenium, but for the moment the Django TestCase gives me a very quick and flexible way to check functionality while things are changing frequently. Django rocks!