Bit Counts in Python, Erlang and Clojure
August 12, 2010
The subject came up yesterday about counting bits. I remembered the subject from university in the context of error checking, but I couldn’t remember the specifics during the conversation. It was only later that I recalled the subject of Hamming values or weights, and from there I tracked down an algorithm by Brian Kernighan (or Peter Wegner, or Derrick Lehmer). For fun, I translated the C source to Python to yield the following:
def bit_count( v ):
c = 0
while v > 0:
v &= v - 1
c += 1
return c
For a bit of practice, I then opted to convert into Erlang:
bitcount(V) -> bitcount(V, 0). bitcount(0, C) -> C; bitcount(V, C) -> bitcount( (V band (V - 1)), (C + 1)).
I could probably roll the two arity 2 clauses into one, but this is a minor habit from my old Prolog days and I always feel it looks a bit cleaner than opting for a case statement. Feel free to disagree.
Since Clojure is my “language for 2010″, and I’ve been rather slack on keeping up with it lately:
(defn bit-count
([v]
(bit-count v 0) )
([v c]
(if (zero? v)
c
(recur (bit-and v (dec v)) (inc c)))))
This feels like it could be expressed more functionally, dropping the two argument version for a compact one-argument version. This might be an interesting exercise for when I have a spare moment, unless someone out there already has a solution?
Just thought I’d share….
Things to look at in 2010
January 17, 2010
Time permitting, here are some technologies I want to learn / evaluate / play with:
I’ve already started learning Clojure as it’s my new language for the year. And by “started”, I mean I have a copy of the Programming Clojure book, have installed Clojure and written a Hello World program. I realise I have some way to go, but from little acorns…
CouchDB is a late entrant. I’ve yet to be convinced by non-relational databases, with the possible exception of document storage. For my current work, they would be extremely inappropriate except maybe for logging. However, I really want to get experience of a non-relational database under my belt – so I can properly argue where they make sense and where they don’t. CouchDB sparked my interest for three reasons: it was the first of the new generation of non-relational databases I’d heard of, it uses Erlang and it has a RESTful interface.
NLTK would be an interesting additional to my technical tool belt and I can see potential applications in my current job, where we’re finding ourselves increasingly sifting through textual data to determine various facts.
I rarely play computer games, but one game that has had me hooked for the last five years is Hearts of Iron II. I’m an armchair military historian, particular WW2 and other 20th Century warfare, so HOI2 has great appeal to me. I’ve long wanted to develop wargames in a similar style (though perhaps a little more towards the board game format), and even went so far as to writing story cards for various aspects of one game idea I’ve had. I’ve decided this year that I will make a start at coding, eventually releasing the code under an open source licence. I’m contemplating writing the code in Python, and have decided that PyGame would offer me a rich library for support (based on an excellent presentation at a recent London Python Code Dojo).
Twisted has been an interest of mine since I attended a talk at one of the PyConUK events. I even bought the, outdated, O’Reilly book on Twisted, but have never actually had need to take a serious look at it. However, things are picking up and I can see some possible areas which might be appropriate and I want to take this motivational opportunity to investigate Twisted more.
After reading the O’Reilly XMPP book a while back, XMPP is something else I want to investigate further. I actually had a play with XMPP from Python a couple of years back, but the Python library I used (I can’t remember the name offhand) but it was buggy, lacked certain functionality I needed, and wasn’t being actively maintained. Thanks to the book, I’ve discovered there are other libraries out there for Python and they look a whole lot better, as well as still actively developed.
The XMPP and Twisted investigations are sort-of related, since I’m keen to work on web service and messaging systems to support the infrastructure at work. We’ve actually started introducing SQL Server Service Broker, for better or worse, and I’m keen to find ways to interface with Service Broker in more architecture and language neutral ways i.e. without locking in to .NET, SQL Server and Windows. Optimistic?
Aquamacs Erlang Support
September 23, 2009
Just a quickie in case anyone isn’t aware.
If you’re an Aquamacs user, you can easily add Erlang support. I’ve currently got Erlang installed via MacPorts, so all I did was modify my Preferences.el file, found at ~/Library/Preferences/Aquamacs Emacs/.
Assuming you have MacPorts installed in the default location, just add these lines:
;; Import Erlang mode (from MacPorts)
(setq load-path (cons "/opt/local/lib/erlang/lib/tools-2.6.2/emacs/" load-path))
(setq erlang-root-dir "/opt/local/lib/erlang")
(setq exec-path (cons "/opt/local/lib/erlang/bin" exec-path))
(require 'erlang-start)
Obviously, the reference to tools-2.6.2 assumes the version I have installed: your experience may vary, so double-check!
For those who aren’t familiar with Aquamacs, it’s a really nice port of GNU Emacs with a native OS X GUI. You can find it at http://aquamacs.org/
Erlang In My Head (Part 3) – Counting Characters
September 16, 2009
This is part 3 of an irregular (honest!) series of posts as I tackle learning and using Erlang whenever I have a few spare minutes. The aim is to pick a small and/or trivial task that can be completed in a short amount of time and aims to shine a little light on different bits of the language. Feedback from both seasoned and novice Erlang programmers is welcome, as are contributions from fellow Pythonistas.
The challenge this time was to take a string and count the number of characters, returning a list of characters with their counts. The character counts are to be case insensitive, and the output doesn’t need to be formatted other than enough to make some sense.
In Python, I would write a few lines of code that would iterate over the string and add the results to a dictionary which stores the counts. As with the previous challenge of reversing a string, there is a recursive solution too, but it’s not very efficient or elegant. I’m deliberately not offering Python code this time in order to concentrate more on thinking in Erlang.
When I first approached the problem, I could already see how the recursive solution would be more elegant in Erlang. You have your string, which is really a list of integers, and split it into a head and a tail. Keeping efficiency in mind, I have to resist the urge to find the counts for the tail and then add the head’s character to the counts. Instead I opt for something that is, hopefully, optimised for tail recursion.
The first thing is to write a wrapper which takes a string and calls my character counting function:
count(String) -> dict:to_list( character_count(String, dict:new()) ).
I’m casting the dictionary returned by character_count into a list to make it more readable. Erlang will, however, show integers instead of the actual characters, but if I wanted an even better way to display the results I could do it here instead. However, I’m keeping it simple because I have limited time and this code is only intended to satisfy my own curiosity.
The simplest situation is when I have an empty string:
character_count([], Counts) -> Counts.
Because there are no characters to count I have no updates to make to the supplied dictionary, so I return the accumulated Counts.
Right, that’s the empty case handled, but what if I have a non-empty string? That seems pretty likely for something that will count characters in a string! Well, first off I’m going to have a clause that takes a string and splits it into a Head and a Tail:
character_count([Head|Tail]) ->
My counts are intended to be case insensitive, so first I’m going to normalise my characters by uppercasing the Head:
Character = string:to_upper(Head),
Now I need to check to see if I already have a count for Character in the Counts dictionary. At first glance this looked like the perfect opportunity to use the Erlang if statement, until I realised it didn’t feel right: (note code is untested since it was never used)
if dict:is_key(Character, Counts) -> ...; true -> ... end
The true atom at the end bugs me a little, and others judging from various blog entries, but I understand why it is there and respect the need for it. Even ignoring this, the if expression doesn’t look or feel right. The case expression provides, in my opinion, a much more elegant expression of intent:
case dict:is_key(Character, Counts) of true -> ...; false -> ... end;
Much better!
At this point we can flesh out the code that does the real work of updating the counts:
NewCounts = case dict:is_key(Character, Counts) of true -> dict:update(Character, fun(Value) -> Value + 1 end, Counts); false -> dict:store(Character, 1, Counts) end,
Thus my case statement handles the correct manipulation of the dictionary, depending on whether our Head character was present in the Counts dictionary or not.
Manipulation is probably the wrong word, to be honest. Being immutable, we don’t actually change the dictionary. The dict:store/3 function creates a copy of the Counts dictionary with the addition of a new Character key set to 1 (since it’s the first time we’ve counted the character).
The dict:update/3 does the same thing, it returns a new dictionary which has the value/count of Character incremented. At first glance, it looked a little strange in that it takes a function to map the existing value to a new one. This caught me out first time: you don’t give it the new value! The advantage is that I didn’t need to retrieve the existing value, assign the incremented value to a new value, then call update with the new value. The downside is that a simple replacement of the value still requires a mapping function.
Now I’ve updated my counts, I can make my recursive call to obtains counts for the tail of the string:
character_count(Tail, NewCounts);
The final module looks like this, with the count/1 function being the one called initially:
-module(character_count). -export([count/1]). -import(dict, [is_key/2, new/0, store/3, to_list/1, update/3]). -import(string, [to_upper/1]). count(String) -> dict:to_list( character_count(String, dict:new()) ). character_count([Head|Tail], Counts) -> Character = string:to_upper(Head), NewCounts = case dict:is_key(Character, Counts) of true -> dict:update(Character, fun(Value) -> Value + 1 end, Counts); false -> dict:store(Character, 1, Counts) end, character_count(Tail, NewCounts); character_count([], Counts) -> Counts.
Erlang In My Head (Part 2.1) – Feedback
September 13, 2009
I’d like to thank Angel for my first piece of valuable Erlang-related feedback.
I made a newbie goof with my string reversal code, in that I hit the same limitation I had with my recursive Python example. Python isn’t optimised for tail recursion, but Erlang is: provided you write code that takes advantage of this feature.
For those that don’t know what tail recursion is, and I must admit I was hazy about it until I wrote this post, I’ll use my original code and Angel’s tail recursion optimised code for illustration. First, my attempt:
reverse_string([Head|Tail]) -> reverse_string(Tail) ++ [Head]; reverse_string([]) -> [].
It looks nice and simple, and it’ll work, but there’s one problem: it still consumes stack space. The reason is that my first clause calls itself first, then appends the result of the call to the [Head]. Each time we make a recursive call, we store state on the stack, because there’s code waiting to execute based on the result. We keep adding to the stack until we hit the end of the recursive calls, then start popping the state off as we work our way back until we’ve built our fully reversed string. Fine for short strings, not so good when reversing larger ones.
Angel’s solution is this:
reverse_string(Str) -> reverse_string(Str,[]). reverse_string([Head|Tail],Acc) -> reverse_string(Tail,Head++Acc); reverse_string([],Acc) -> Acc.
The important thing to note is that the recursive calls are always last in each clause. Hence the term “tail recursion” – the recursive call is at the tail of the code. Because there’s no code waiting on the result of subsequent calls, we gain an important optimisation: when we reach the end of the recursive calls, we can jump right back to the beginning. Erlang’s compiler is smart enough to identify this and optimise the code accordingly. Much more efficient.
Okay, let’s run through the code, both for my own benefit and for any other Erlang newbies reading this post.
reverse_string(Str) takes the string and calls reverse_string with two arguments: the string Str and an empty list. What’s the empty list for? Well, take a look at the second clause:
reverse_string([Head|Tail],Acc)
This splits the string into a Head and Tail, the same approach I took on my original version. However, the important thing is the addition of a second argument Acc, which stores our work-in-progress. The clause then makes the recursive call as the only, and thus tail, call: supplying the Tail and the concatenation of Head to Acc. As we recurse, we build up Acc into our reversed string. All good so far.
Finally, the last clause matches the situation when we have an empty string, our end condition, and simply returns whatever we’ve accumulated in Acc. Erlang knows that there is no further work to do with each of the previous calls, and can therefore skip straight back to the beginning with the result. Excellent!
Further reading:
Erlang In My Head (Part 2) – String Reversal, The Educational Way
September 12, 2009
This is the second in an irregular series of musings on Erlang, with references to Python. The main aim is to encourage me to use spare moments in the evening to look at aspects of Erlang. Hopefully my murmurings will be of interest to fellow Pythonistas (and maybe others!) as well as generate feedback from the Erlang community.
Spare time is a scarce commodity at the moment, so I decided the best way to get back into learning and using Erlang is to pick small, often trivial, tasks that can be solved in a few minutes, but which would hopefully yield enlightenment on a facet or two of the language.
One example is that contrived task of reversing a string. Both Python and Erlang have efficient in-built or library methods/functions for reversing lists and strings, but calling something like my_string.reverse() or lists:reverse() isn’t very educational.
In Python, one quick’n'dirty solution would be:
def reverse_string(my_string):
new_string = ""
for character in my_string:
new_string = character + new_strin
return new_string
I could also dust off my computer science skills and approach the problem recursively, without the need for new_string:
def reverse_string(my_string):
if my_string:
return reverse_string(my_string[1:]) + my_string[0]
return ""
It works, but it doesn’t look as tidy as the first function. There’s also a limit on the depth of recursion, the CPython default of which (I believe) would restrict string size to about 1000 characters. Not a scalable solution.
Approaching the problem in Erlang using the first, iterative, approach is hampered by the fact I can only assign a variable to a value once. It might be possible to do it iteratively, but it would probably look as awkward as the recursive approach felt in Python.
Erlang treats strings as lists of numbers. While this means string handling is not as natural as it is in Python, it does offer us some help when approaching the reversal problem. My string is a list, and I can adopt the same approach as I did for the recursive Python function: append the head of the list to a reversed copy of the tail.
reverse_string([Head|Tail]) ->
reverse_string(Tail) ++ [Head];
reverse_string([]) ->
[].
My reverse_string function has two clauses. The first clause takes a list and splits it into a Head (the first element) and a Tail (everything else). It then concatenates the result of reverse_string(Tail) to the Head. Simple!
Or not. My first attempt suffered a gotcha: I was concatenating a list to an integer, which errored. Turning Head into a single item list, [Head], allowed the two to be concatenated.
The second clause deals with the case of an empty string. When you reach the end of the string your Tail will be empty, which matches this clause and ends our recursion. If you supplied an empty string initially, this would also match. Using an empty string “” instead of [] works equally as well, but I decided to keep things consistent with the list notation used in the first clause.
With this exercise complete, I’ll stick to lists:reverse in future.
Update: See http://metaljoe.wordpress.com/2009/09/13/erlang-in-my-head-part-2-1-feedback/
Erlang In My Head (Part 1)
September 8, 2009
Coming from an imperative programming background, one of the earliest things to note about Erlang is that variables don’t actually vary. Erlang allows you to assign a value to a variable once, and that’s it: the assignment is permanent. I like to think of it as going from the unknown to the known, or unbound to bound if you want to be precise.
In Erlang:
1> X. * 1: variable 'X' is unbound
Which isn’t any different to Python:
>>> X Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'X' is not defined
In both cases X has an unknown value, it isn’t bound to anything, so Erlang/Python doesn’t know what to do with it.
Thus we need to assign a value to X:
2> X = 42. 42 3> X. 42
Now Erlang is happy, and so is Python:
>>> X = 42 >>> X 42
However, that = sign has a different meaning in the two languages. In Erlang, it’s not an assignment operator but a pattern matching operator. Erlang takes a look at both sides of the “equation” and attempts to make sense of it. Since X is unknown up to that point, the only way it can reconcile the statement is to assign X the value of 42. 42 equals 42, and the world is at one with itself again.
Python takes the value 42 and applies the label X to it. I could easily move the X label to something else:
>>> X = "foo" >>> X 'foo'
What I can’t do is the same with Erlang:
4> X = "foo". ** exception error: no match of right hand side value "foo"
Erlang, quite rightly, complains because it knows that X has the value 42 (since we bound it to that earlier) and this definitely doesn’t match the string “foo”. If I try it against the value 42, I get the value 42 i.e. a match.
5> X = 42. 42
In terms of naming, variables in Erlang must begin with a capital letter, not lowercase:
6> x = 7. ** exception error: no match of right hand side value 7
The reason this failed is that Erlang has something called an “atom”, and atoms start with lowercase letters. An atom is effectively a constant that has a value of itself: it’s pre-bound, so you can’t assign it to anything.
7> x. x
The atom x is equal to x, which is why Erlang raised an exception when x was compared to 7.
Why would you want variables that can’t change? Well, for those of you who have done any concurrent programming you will know how tricky it is dealing with shared data. Functional programming languages like Erlang remove a whole class of concurrency-related problems by making data immutable. If the data is immutable, you can’t have one thread or process changing the contents of a variable while another is looking at it. Nor can a process overwrite the changes another process has just made. That means no need for locks and semaphores to protect data integrity, all of which have performance implications.
Another thing it makes easier is debugging. Once a variable is set, it stays set – defective code can’t update a variable it shouldn’t. You can also more easily prove an algorithm, and its implementation, is correct when you know that code is free of side-effects.
If you’re a Pythonista looking at this and haven’t given Erlang a go, I can recommend it as a way to try something different and expand the mind. You might even gain a fresh perspective of the way you write Python. If you’re already an Erlang programmer, I hope my understanding of the above is correct and any feedback is always welcome!
Erlang
August 26, 2009
If there’s one language right now that I really want to master, it’s Erlang.
A couple of years ago I was, like many, starting to realise the world is multicore. Concurrency is the big area any serious programmer will have to start dealing with sooner or later, if they haven’t already. Now, the problem for me is that multicore is a hardware / operating system thing. As a humble application coder, I shouldn’t need to concern myself with the underlying hardware and whether I’m running on a single processor or a dual quad-core somethingorother. I decided after leaving academia that hardware had pretty much become commodity and I should devote my time more towards the latest in software techniques.
Where am I going with this? Well, I still believe that for many situations, the developer shouldn’t need to know or care about what their code is running: a single core or many, something x86-like, something PowerPC or ARM-like. The hardware should be abstracted away, handled by the OS and the implementation language / compiler. Of course, that presents a problem because that kind of decision isn’t always easy to do behind the scenes.
Even though the concurrency debate has been raging since the 60s (maybe earlier), only now are many developers becoming exposed to the situation. Let’s face it, we’re not all well-prepared and we’re often reinventing the wheel or following dodgy avenues like threads that some consider the 21st Century equivalent of goto.
Meanwhile, the functional programming guys and gals have been sitting on the sidelines looking a bit smug. Side-effect-free programming tackles some of the nasty issues us imperative coders have to deal with, and makes concurrency a much more pleasant prospect. Functional programming is getting a long overdue resurgence in interest. Languages like Erlang, Haskell, F#, R or Scheme seem to be cropping up more and more, which can only be a good thing. I realised it was about time I dusted off my academic knowledge and made a serious effort to understand functional programming and see how I could apply the knowledge to concurrency.
My interest in Erlang came, funnily enough, from the Python community. I heard it mentioned many times in blogs, mailing lists and even in source code, but the big decider was at PyConUK last year. Erlang came up so much in conversation with other delegates and during Mark Shuttleworth’s keynote speech that I decided I really needed to take a look. I bought Joe Armstrong’s excellent “Programming Erlang” book, installed Erlang from MacPorts, and settled down for some quality programming over the Christmas holidays. It wasn’t quite the uninterrupted time I wanted, but it was enough to play around with the language and get a feel for things.
For me, Erlang’s approach to concurrency feels quite natural. I can express concurrency in a way that doesn’t bog me down with the low-level details like creating threads / processes or handling data integrity. The “Actor” model approach fits nicely in my mind and is conceptually analogous to object-oriented programming. Credit to the Wikipedia article on the Actor Model for helping me express that analogy!
The big surprises, however, were in areas outside concurrent programming. With a background in high-reliability systes, it’s no surprise the language has excellent support for writing fault-tolerant code and for inserting new code into a running system. It’s quite elegantly done and a real joy – I’d love to use this capability for projects in my daytime job, but it’s not going to be possible to push Erlang into the company for the foreseeable future.
While not quite CPAN or the Python standard library, Erlang comes (as part of the OTP) with a decent enough standard library and sites like CEAN fills in a few gaps. Someone has even written a library for Python and Erlang to talk to each other! I wasn’t expecting as much library support out-of-the-box, for some reason, so it was an unexpected bonus. I haven’t really explored these much, so I might end up cursing some deficiencies in future!
So anyway, writing this has spurred me into trying to find the spare time to write more Erlang code – the only way I’m going to improve my knowledge and one day master the language is practise, practise, practise!
Further reading: