This is part 3 of an irregular (honest!) series of posts as I tackle learning and using Erlang whenever I have a few spare minutes. The aim is to pick a small and/or trivial task that can be completed in a short amount of time and aims to shine a little light on different bits of the language. Feedback from both seasoned and novice Erlang programmers is welcome, as are contributions from fellow Pythonistas.
The challenge this time was to take a string and count the number of characters, returning a list of characters with their counts. The character counts are to be case insensitive, and the output doesn’t need to be formatted other than enough to make some sense.
In Python, I would write a few lines of code that would iterate over the string and add the results to a dictionary which stores the counts. As with the previous challenge of reversing a string, there is a recursive solution too, but it’s not very efficient or elegant. I’m deliberately not offering Python code this time in order to concentrate more on thinking in Erlang.
When I first approached the problem, I could already see how the recursive solution would be more elegant in Erlang. You have your string, which is really a list of integers, and split it into a head and a tail. Keeping efficiency in mind, I have to resist the urge to find the counts for the tail and then add the head’s character to the counts. Instead I opt for something that is, hopefully, optimised for tail recursion.
The first thing is to write a wrapper which takes a string and calls my character counting function:
count(String) -> dict:to_list( character_count(String, dict:new()) ).
I’m casting the dictionary returned by character_count into a list to make it more readable. Erlang will, however, show integers instead of the actual characters, but if I wanted an even better way to display the results I could do it here instead. However, I’m keeping it simple because I have limited time and this code is only intended to satisfy my own curiosity.
The simplest situation is when I have an empty string:
character_count(, Counts) -> Counts.
Because there are no characters to count I have no updates to make to the supplied dictionary, so I return the accumulated Counts.
Right, that’s the empty case handled, but what if I have a non-empty string? That seems pretty likely for something that will count characters in a string! Well, first off I’m going to have a clause that takes a string and splits it into a Head and a Tail:
My counts are intended to be case insensitive, so first I’m going to normalise my characters by uppercasing the Head:
Character = string:to_upper(Head),
Now I need to check to see if I already have a count for Character in the Counts dictionary. At first glance this looked like the perfect opportunity to use the Erlang if statement, until I realised it didn’t feel right: (note code is untested since it was never used)
if dict:is_key(Character, Counts) -> ...; true -> ... end
The true atom at the end bugs me a little, and others judging from various blog entries, but I understand why it is there and respect the need for it. Even ignoring this, the if expression doesn’t look or feel right. The case expression provides, in my opinion, a much more elegant expression of intent:
case dict:is_key(Character, Counts) of true -> ...; false -> ... end;
At this point we can flesh out the code that does the real work of updating the counts:
NewCounts = case dict:is_key(Character, Counts) of true -> dict:update(Character, fun(Value) -> Value + 1 end, Counts); false -> dict:store(Character, 1, Counts) end,
Thus my case statement handles the correct manipulation of the dictionary, depending on whether our Head character was present in the Counts dictionary or not.
Manipulation is probably the wrong word, to be honest. Being immutable, we don’t actually change the dictionary. The dict:store/3 function creates a copy of the Counts dictionary with the addition of a new Character key set to 1 (since it’s the first time we’ve counted the character).
The dict:update/3 does the same thing, it returns a new dictionary which has the value/count of Character incremented. At first glance, it looked a little strange in that it takes a function to map the existing value to a new one. This caught me out first time: you don’t give it the new value! The advantage is that I didn’t need to retrieve the existing value, assign the incremented value to a new value, then call update with the new value. The downside is that a simple replacement of the value still requires a mapping function.
Now I’ve updated my counts, I can make my recursive call to obtains counts for the tail of the string:
The final module looks like this, with the count/1 function being the one called initially:
-module(character_count). -export([count/1]). -import(dict, [is_key/2, new/0, store/3, to_list/1, update/3]). -import(string, [to_upper/1]). count(String) -> dict:to_list( character_count(String, dict:new()) ). character_count([Head|Tail], Counts) -> Character = string:to_upper(Head), NewCounts = case dict:is_key(Character, Counts) of true -> dict:update(Character, fun(Value) -> Value + 1 end, Counts); false -> dict:store(Character, 1, Counts) end, character_count(Tail, NewCounts); character_count(, Counts) -> Counts.