(This is based on a 15 minute talk for the London Python Code Dojo – slides available from SlideShare)

My interest in FluidDB began earlier this year when I attended a talk by Nicholas Tollervey at the London Clojure Dojo. I was expecting yet another talk about yet another non-relational database, but what I discovered was something different. The idea of a shared database storing “things” which anyone could tag with data seemed to be a rather powerful concept, yet simple and elegant. I thought it was a very cool and interesting idea.

But there was a problem.

How could people actually explore the “Fluidverse”? While people using FluidDB are building up conventions, such as the naming and content of tags, and there are tools out there to drill down through the hierarchies of tags and namespaces… there had to be an easier way to find the tags that were of interest to me. I decided I needed data in order to begin finding ways to explore… but what data?

So, while pondering the idea one evening, I was listening to the band Napalm Death and realised I had the answer. One of the things I love to do is find new bands, particularly extreme metal ones, and one way I do this is follow links between bands on Wikipedia. Wikipedia is a great source of band biographies, the content is under a Creative Commons license, and the band biographies often have lists of related bands and genres.

This seemed like a really good starting point to take data I’m interested in, build relationships between the data and give me something to start exploring with. I hacked together a scraper which used Napalm Death as a starting point and branched outwards in a “six degrees of separation” way, initially dumping the information directly into the FluidDB sandbox.

After a few runs, it made more sense to scrape to an intermediate file, and load that instead – allowing me to clean up typos, adjust names, amend tags and also allow me to regenerate the data in FluidDB’s sandbox without having to keep hitting Wikipedia. An example of the output format is as follows:

band:Burzum
  metaljoe/music/band_name = Burzum
  metaljoe/music/source_url = http://en.wikipedia.org/wiki/Burzum
  metaljoe/music/genre/black_metal -> Black metal
  metaljoe/music/genre/dark_ambient -> Dark ambient
  metaljoe/music/related_bands = ['Darkthrone', 'Mayhem', 'Old Funeral']

I wasn’t planning to release the source code, but have had some interest in it so I’ve decided to release it under the MIT license. You can find the code on my BitBucket account: http://bitbucket.org/metaljoe/fluidinyourear-scraper – note this is just the scraper code, not the loader.

With the data in place, I then needed to build something for exploring the relationships in the data. Enter “Fluid In Your Ear”, a very simple web application built around Python, Django and the excellent FOM (Fluid Object Manager) created by Ali Afshar. Given the nature of the bands, there is also a liberal application of Heavy Metal Umlauts – the power of which, courtesy of a particular Black Metal band, managed to crash the FluidDB sandbox a few times by exposing a unicode bug.

The application is deliberately very simple. I’m not a graphics genius (painting with real acrylic paints is my field), and at the moment it’s a basic core – you can browse genres and bands, and explore relationships between the two. I’ve already discovered some new bands through following the links, and re-discovered some older ones.

Due to the six-degrees nature, there is quite a lot that doesn’t fit into a metal or punk category which is quite cool. I’ve encountered a jazz musician called John Zorn who has crossed into hardcore punk and grindcore, to produce some outstanding music I would probably not have found before.

The source code is pretty grotty and the first casualty was a lack of tests. Shocking. In order to improve my confidence in the code and make it easier to refactor, I added unit tests using Django’s test harness and some functional testing using the Twill web testing framework. An example of the Twill test code is as follows:

# test missing genre
go http://127.0.0.1:8000/genre/progressive_vegetarian_grindcore
code 404

# test with trailing slash
go http://127.0.0.1:8000/genre/jazz/
code 200

# test without trailing slash
go http://127.0.0.1:8000/genre/jazz
code 200

# check page contents
find '<h2>Jazz</h2>'
find '<div id="related_bands">'
find '<li><a href="/band/Frank%20Zappa">Frank Zappa</a></li>'

So where next?

Well, first off is to get the application online so my plan was to port to Google App Engine. Unfortunately, I hit a few snags with the fact my app runs Django 1.2 and App Engine is using 1.1. I considered bundling Django in the app, but it became obvious that I’m not really using much of Django’s functionality – some URL routing and templates. The creator of FOM introduced me to Flask, a lightweight web framework, and it looks perfect for my needs. So I’m going to port to Flask and Google App Engine at the same time.

Another thing I want to do is add a JavaScript “social” layer over the top, allowing some of the richness of FluidDB to shine through and allow the addition of functionality not originally envisaged. I’m also hoping people will tag bands with ratings, annotations and the like with a hope to making recommendations possible.

In a similar way, I want the application code to be reusable and reskinnable so people can customise and create their own starting point. Maybe someone will produce a Classical In Your Ear in the future?

Source code is available from BitBucket, if you fancy a giggle at the clumsy bits: http://bitbucket.org/metaljoe/fluidinyourear – released under the GNU Affero GPL.

Out of the Comfort Zone

September 11, 2010

One of the best ways to learn new skills is to be out of our comfort zone. When we’re out of our comfort zone, the mind and body attempts to deal with the situation – we become more alert and adaptable, we have to think a bit more, we have to pay attention to what’s going on. After all, we can’t rely on the situation being safe and predictable because we just don’t know what might happen.

When I’m not programming, I love snowboarding. I’m fortunate in that I’m just a few minutes away from an indoor snow slope. It might not be a proper mountain, but I can go ride any day of the year. I’ve been snowboarding for years and am pretty good because I’ve taken the plunge and put myself out of my comfort zone on many occasions. I’ve progressed up the instructor levels, I’ve been on performance courses with top pros, I’ve done a bit of proper off-piste, I’ve even raced boardercross (despite being an injury wuss).

One notable occasion was enrolling on an instructor course in Australia a few years ago. I had no intention to teach, just to improve my technique. In the end, I found myself having to improve my skills in front of people who were much better than me, and do something I feared most: public speaking.

All I wanted was the ground to swallow me up, but I didn’t have that luxury – I had to push myself harder than everyone else, and I had to just do it. Stand up in front of a group of strangers and teach them to snowboard. Keep up with people who had many more days on snow than me. Some had been instructors before. Nothing makes you learn fast like trying to match people way above your skill level.

I still get nervous before every lesson, but I found I enjoyed it a lot. It feels like it exercises different parts of my brain to those I use when programming or even snowboarding for my own enjoyment. The brain specialists might disagree with me, but you get what I mean.

However, there is a limit to how far we can go outside our comfort zone before the benefits disappear. And they can disappear very quickly. I’d never dream of taking a complete beginner to the top of a black run. They would be so far out of the zone that fear would overwhelm any opportunity to learn – it would also be incredibly dangerous and probably put them off ever going on a mountain again.

One of the things I love about code dojos, like the Python and Clojure dojos I attend each month in London, is that they provide a safe environment to move out of the comfort zone and learn new skills. Mistakes are all part of the learning process, yet we often don’t embrace them. I’m a firm believer that people learn most when things go wrong, because we’re taken out of that nice, comfortable, step-by-step process. When we make a mistake, we have to start thinking again to correct the mistake.

The trouble is that people don’t like to admit they’ve made a mistake because they feel they’ve lost credibility against their peers. But their peers are in the same situation. If no one wants to make a mistake, no one learns because of that fear of public failure and embarrassment. Dojos are great because everyone is there to learn new things and make mistakes in the process. Any mistakes you make stay in the dojo – it’s not production code, nor is it intended to be.

So where am I going with this? Well, my next out-of-comfort-zone situation is taking the skill I learnt as an instructor and bringing it into my full-time career: public speaking. Teaching beginner snowboarders is, relatively speaking, easy because they mostly don’t know anything about the subject. If I make a mistake, no one will notice. I get a big fear teaching other instructors because they know the subject, they know when you make a mistake.

But that’s a good thing right? They can let you know the mistake and help you fix it. It also forces you to practice and improve so you don’t make those mistakes – pushing that comfort zone further out.

I want to do the same thing with technical talks. The idea of standing up in front of other programmers and talking tech fills me with dread. Yet I know it’s exactly the same situation I was in when I first stood on an Australian mountain and tried to explain part of a lesson to a group of experienced snowboarders. After a few such experiences, I ended up loving it because it taught something new to my audience and because it helped me to improve my own understanding of the subject.

So my intention is to try to present a few talks over the coming months at events like the dojos, where I can make mistakes in safety. My goal is to present at one of the Python conferences. I was talking to one of the organisers of PyCon AU when I was at EuroPython this year, and he mentioned about presenting, which is where the whole process was set in motion. I’d like to attend PyCon AU next year, combining it with a visit to my brother in Sydney. Presenting would be both great fun and terrifyingly scary at the same time, but if the Aussies in the audience are as supportive of a nervous Pom as those Aussie instructors-to-be were, I should be just fine.

Tips, tricks and encouragement would be very welcome.

Python Code Kata 4

April 10, 2010

I’ve started working through Dave Thomas’s Code Katas, as previous readers to this blog will be aware. I tackled the previous kata (Code Kata 2: Karate Chop) in Clojure as a way to help learn Clojure. I was planning to attempt the next kata in either Clojure or Erlang, but decided I would implement a solution using my usual language: Python.

The kata is in three parts, and I had to resist the urge not to read ahead and spoil the process. You can find the details of the kata at http://codekata.pragprog.com/2007/01/kata_four_data_.html

Part 1 involved reading values from a weather data file. The data in the file looked like a plain text conversion of an HTML page and had a bit of surrounding fluff that needed to be ignored in order to parse the tabular data. Through sheer laziness, I opted to use a regular expression to match lines with the data I was looking for and scoop the data out.

Part 2 was a similar task, but with football data. In this case, using the regular expression was more of a pain so I decided to split each line on whitespace and use only those rows that could be split into 12 items. All the rows I was looking for matched this criteria and it was quick to do. Unlike the first program, I ended up with more fields than I was planning to need.

Now comes the interesting bit: part 3. The task was to take the two programs and factor out the common parts.

To what extent did the design decisions you made when writing the original programs make it easier or harder to factor out common code?

In both cases, the code was in two functions. The first function parsed the data file, returning a list of valid rows. The second function took the list and identified the row which had the smallest difference of two values. There was a definite duplication of code in both, with differences in logic. The parsing functions iterated over the files the same, but the data extraction differed sufficiently. The other function was much more similar between the two programs.

Take the two file parser functions. For part 1:

def parse_file( filename ):
    data_file = open( filename, "r" )

    data = []
    for line in data_file:
        match = DATA_LINE_RE.match( line )
        if match:
            dy  = int(match.group(1))
            mxt = int(match.group(2))
            mnt = int(match.group(3))

            data.append( (dy, mxt, mnt) )

    return data

and part 2:

def parse_file( filename ):
    data_file = open( filename, "r" )

    data = []
    for line in data_file:
        columns = WHITESPACE_RE.split( line )
        if len(columns) == 12:
            data.append( columns )

    return data

The core of both functions can be distilled down to this:

def parse_file( filename, extraction_function ):
    data_file = open( filename, "r" )

    data = []
    for line in data_file:
        extracted_data = extraction_function( line )
        if extracted_data:
            data.append( extracted_data )

    return data

The domain-specific logic is now encapsulated in a function passed to parse_file. I can easily substitute new logic for future problems by passing in a different function. The extraction function should return either a tuple/list or a value that equates to false if no data could be extracted. I’ve used None in my code but an empty list would also suffice if it made sense for the problem at hand.

The second function in part 1 looks like this:

def print_day_with_smallest_spread( data ):
    smallest_day    = None
    smallest_spread = None

    for dy, mxt, mnt in data:
        spread = mxt - mnt
        if smallest_spread > spread or smallest_spread is None:
            smallest_day = dy
            smallest_spread = spread

    print smallest_day

In part 2 it looks like this:

def print_team_with_smallest_goal_difference( data ):
    smallest_team      = None
    smallest_goal_diff = None

    for row in data:
        goal_diff = abs( int(row[GOALS_FOR]) - int(row[GOALS_AGAINST]) )
        if goal_diff < smallest_goal_diff or smallest_goal_diff is None:
            smallest_team      = row[TEAM_NAME]
            smallest_goal_diff = goal_diff

    print smallest_team

There are some small differences, but the logic is essentially the same for both. The use of absolute values in the goal difference is a notable deviation from that used in the first procedure. However, while above what is required for the weather data it is also valid. I toyed with the idea of leaving this, but the change would mean improved commonality and make the factoring out easier: without changing the result. This was the deciding point and I made the change.

The extraction of values from the data list was the only other differing point, but both were doing the same thing. Some little tweaks resulted in a procedure suitable for both:

def get_item_with_smallest_difference( data, label_index, value1_index, value2_index ):
    smallest_label = None
    smallest_diff  = None

    for row in data:
        diff = abs( int(row[value1_index]) - int(row[value2_index]) )
        if smallest_diff > diff or smallest_diff is None:
            smallest_label = row[label_index]
            smallest_diff  = diff

    return smallest_label

Was the way you wrote the second program influenced by writing the first?

Definitely. It’s difficult not to be influenced by code you’ve written, or seen, before – especially when you have implemented solutions to two similar problems in close proximity. I was about to approach the second part in the same way as the first, using a regular expression to extract the data I needed. However, I realised that this would yield a large and unwieldy expression, so I decided to go for an approach I had briefly considered for the first part: split each line on whitespace to yield a list of values. I ended up with values I wasn’t going to use, but it was a cleaner approach for that particular problem.

Is factoring out as much common code as possible always a good thing? Did the readability of the programs suffer because of this requirement? How about the maintainability?

It’s usually a good thing, but in software development there are no such things as rigid rules. Readability of the refactored programs has some plus/minus points. The core of the problem was made more obvious from the refactoring, but the naming of the function to determine the smallest difference became more generic. That generic name is perfectly fine, but it might’ve made more sense to wrap the call in procedures named better for the specific problem domain.  The idea of constants, to aid readability, in the second solution (to identify fields we’re interested in) was transferred to the first solution which was a useful touch.

Overall, maintainability improved slightly. The duplicated code is factored out and can easily be reused for writing code to solve other problems of a similar nature. If a bug is found in one program, there’s a good chance it’s in the common code and thus can be fixed in both easily. Likewise, I could make performance improvements that would be applied to both easily. By splitting the domain-specific logic out from the generic core of the solution, I made the code easier to test by providing a discrete function for the logic, as well as a way to inject a fake or mock function into the parsing code.

Kata solutions for this and the previous kata are available under the MIT License from BitBucket: http://bitbucket.org/metaljoe/metaljoe_codekata/

Lying ill in bed probably isn’t the most conducive environment for programming. I’ll spare you the full medical details, but in a strange way it put me in an evil mood… the kind of mood that seemed fitting to give macros a go.

I’ve never really seen the point of macros, but then again my experience of them has been the watered-down variety featured in languages like C. Reading up on macros in Scheme and Clojure more recently, macros come across as predominantly a way to do evil things. I say predominantly, because I accept that there are very rare occasions when a macro makes profound sense. Please note that the fifth solution I am about to present is definitely not an example of a good reason.

Last week, I was re-reading Jon Bentley’s excellent Programming Pearls book. I had been trying to avoid reading anything about binary searches while working on the code katas, so I picked a completely inappropriate book because binary searches are featured quite a bit. With my brain now sullied by a classic binary search algorithm, I went and undertook the fourth solution and tried to be a bit radical. It failed.

For the fifth one, I decided not to come up with my own solution but try a couple of things. The first was to implement Bentley’s algorithm in Clojure, and the second was to try and make it more compact through the use of macros – as a cheeky excuse to give them a try. The algorithm is already pretty elegant anyway, so I wasn’t sure what I could improve.

As ever, the code is on BitBucket at: http://bitbucket.org/metaljoe/metaljoe_codekata/

First thing was to implement the original algorithm in Clojure: (edited from changeset 9b3ef06116fd for clarity)

(defn chop [ item list ]
  (loop [ lower 0
          upper (- (count list) 1) ]
    (let [ mid (quot (+ lower upper) 2) ]
      (if (> lower upper)
        -1
        (if (< (get list mid) item)
          (recur (+ mid 1) upper )
          (if (= (get list mid) item)
            mid
            (recur lower (- mid 1))))))))

Okay, the first thing that stands out is the let, which holds the value of mid. Wouldn’t it be great to get rid of that? However, copying the code for mid back into the references would bloat the code out. I could use a function, but I’d be passing three parameters and it wouldn’t aid readability much. I would probably leave it for production code, but since I have a macro hammer in my hand, this looks like a macro nail waiting to be banged.

(defmacro mid []
  (list 'quot '(+ lower upper) 2) )

This will replace mid with the appropriate code in the macro. I create a list, all Clojure code is a list, and turn off parsing for quot and the lower + upper addition. If I don’t do that, they will both be evaluated in that context and will fail because neither are available in the namespace. I feel bad about these references, but this is just an exercise. I also realised at this point that list already exists in Clojure, and I should’ve used a different name for the argument to chop!

Once I’ve created the macro, I need to adjust the references to mid in the code. The references need to be wrapped in parentheses to avoid an error.

The next observation is that “(get list mid)” is repeated twice. It hasn’t quite reached the three-strikes rule, but it is duplication. I could set a variable to hold the value of the list at mid, or I could write a function to do it. The former would add an extra line to the code, and I would like to reduce the amount of variables if possible, especially as I’ve just gotten rid of one. The latter option wouldn’t offer much advantage because I’d still be passing list and mid to a function, and probably one with a less concise name than “get”. Basically, it’s another contrived situation for me to try a macro – though I would not do this in production code.

(defmacro get-mid []
  (list 'get 'list '(mid)))

This follows the same pattern as the mid macro – and even calls said macro.

(defn chop [ item list ]
  (loop [ lower 0
          upper (- (count list) 1) ]
    (if (> lower upper)
      -1
      (if (< (get-mid) item)
        (recur (+ (mid) 1) upper )
        (if (= (get-mid) item)
          (mid)
          (recur lower (- (mid) 1)))))))

Not a massive improvement to the code readability, and I’m not entirely happy with get-mid as a name. The nicest thing is that the let statement has been removed.

How about dealing with the statement to initialise the upper variable? Shunning a more obvious function…

(defmacro get-upper []
  (list '- '(count list) 1 ))

Okay, time for one more thing… well, two actually. The recur statements are fine, but perhaps we could make them more declarative?

(defmacro look-left []
  (list 'recur 'lower '(- (mid) 1)))

(defmacro look-right []
  (list 'recur '(+ (mid) 1) 'upper ))

This changes the loop code to read:

(loop [ lower 0
        upper (get-upper) ]
  (if (> lower upper)
    -1
    (if (< (get-mid) item)
      (look-right)
      (if (= (get-mid) item)
        (mid)
        (look-left))))))

In fact, while typing this up, it looks odd now because it’s defining two variables that are apparently never updated.

First off, let’s remove the last visible references to lower and upper, using (surprise!) a suitably named macro…

(defmacro search-exhausted []
  (list '> 'lower 'upper))

Now time to deal with that loop statement. Well, my plan was to create a macro like this:

(defmacro do-search [ & body ]
  `(loop [ ~'lower 0
           ~'upper (get-upper) ] ~@body
     {:lower ~'lower :upper ~'upper }))

This macro is based on Stuart Halloway’s evil-bench example from the Programming Clojure book code, which is very evil. But since this kata solution is all about evilness, I only feel slightly guilty.

The aim was to replace the loop with code that looks like this:

(do-search
  (if (search-exhausted)
    -1
    (if (< (get-mid) item)
      (look-right)
      (if (= (get-mid) item)
        (mid)
        (look-left))))))

Unfortunately, this raises the following exception:

Exception in thread "main" java.lang.UnsupportedOperationException: Can only recur from tail position (code_kata_2e.clj:64)

The macroexpand and macroexpand-1 functions weren’t giving me anything useful, probably because they were overwhelmed by nastiness. Oh well, not to matter as I don’t exactly grok macros just yet and I’m happy overall with what I’ve learnt this time. The code is on BitBucket, so if anyone with better macro development skills can point out the problem, please let me know.

“The programmer builds from pure thought-stuff: concepts and very flexible representations thereof. Because the medium is tractable, we expect few difficulties in implementation; hence our pervasive optimism. Because our ideas are faulty, we have bugs; hence our optimism is unjustified.”

- Frederick Brooks, The Mythical Man-Month

I was on an evening train home when, re-reading The Mythical Man Month for the first time in almost ten years, I came across the above words. It summed up my fourth attempt at the Karate Chop code kata: because our ideas are faulty, we have bugs.

My fourth algorithm idea was faulty, I just didn’t quite want to admit it. My ideas were faulty because of lack of mental preparation. I was tired (I’ve not been sleeping well for a while now), distracted, and in a hurry to get past the Karate Chop kata and move onto another challenge. I didn’t think things through when I started, I just fired up Aquamacs and began to type.

The idea was simple enough – strip the algorithm down of a lot of state and just “bounce” from position to position within the list. Starting from the 0th element, we determine an offset which is equal to half the length of the list. The offset is added to the starting point to determine the position to check.

If we don’t match, we halve the offset and either add or subtract the offset from the current position to yield our new position, depending on whether the item we checked was higher or lower in value than the one we’re searching for.

It sounds simple enough, and plausible enough, so I really should’ve run a quick check in my head or on a sticky note to verify it actually worked. The fact I don’t recall seeing this process before should’ve tipped me off to needing to double-check it.

Thing were fine to begin with, except for one failing test. It was the first off-by-one error I had encountered so far during the Karate Chop implementations, I’m rather proud to say, so it seemed to be quite a simple thing to fix. Oh was I wrong: a simple change resulted in multiple failed tests. Hasty debug statements (I’m guessing I’ll need to learn a Java debugging tool for Clojure development) began to highlight much weirdness with each loop through the search. Values didn’t seem to tally to the expected values, and the search position was often wildly off.

A hasty bit of refactoring to eliminate duplication and side-effects as possible causes didn’t offer much improvement, though I learnt to apply some interesting bits of Clojure in the process. In the end, I had a pretty good idea that the algorithm was flawed, but I kept convincing myself that things like padding the list or adding special cases would massage it into place.

Some days you just have to down tools, stand back, learn from your experiences and move on. The idea started off simple but started to grow… and smell.

In the current form, the algorithm fails on three tests:

(assert (=  0 (chop 1 [1 3 5]) ))
(assert (=  2 (chop 5 [1 3 5]) ))
(assert (=  0 (chop 1 [1 3 5 7]) ))

The bit that stood out straight away was that they were all tests for a value on the boundary of the list. Analysis of the code determined that the search algorithm never reaches the first and last elements in the list. And padding the list doesn’t help either.

In the first test case, our initial offset is 1 since our list consists of three elements, and integer division of three by two yields 1.

Position 1 of the list (3) is not what we’re looking for, so we halve the offset (integer halving of 1 yields 0). 3 is greater than the value we’re looking for, so we subtract 0 from the current position i.e. we stay where we are. The algorithm detects the zero offset, and evaluates to not-found. See the problem?

Searching for 5 in the same list has the same problem, except we try to add the zero offset to our current position with identical result.

What they both should do is try an offset of 1 again, but that wouldn’t involving halving the offset each time.

The four element list has an initial offset of 2, which yields the value 5 at list index 2. We know to check the lower half of the list, so we halve the offset to yield 1 and then subtract that from the current position. Our new position yields the value 3, which means we have to search left again. Alas, our offset of 1 is halved to 0 and the search terminates. We never get to check that first element in the list.

Going through the problem on a sticky note diagnosed the problem really easily, and working through for 5, 6 and 7 element lists made me realise that the maths would need a reworking to cover those edge cases. Typing up my experiences just now, I’ve even had a thought about one way to solve the problem without getting too complex… but I will revisit another time.

All-in-all, this was actually the most interesting experience so far during the kata. We often learn a great deal from our mistakes than our successes, failure can be good for us sometimes – it takes us out of our comfort zone, which is where true learning happens.

As ever, source for this and the previous solutions is available from http://bitbucket.org/metaljoe/metaljoe_codekata/ under the MIT license. If you want to know more about the Karate Chop code kata, head over to http://codekata.pragprog.com/2007/01/kata_two_karate.html and take a look.

This is my third Clojure-based solution to the Karate Chop kata detailed at http://codekata.pragprog.com/2007/01/kata_two_karate.html

The write up is going to be a little shorter than before, because I’m pretty tired as I type this. As always, source code to the solution is available from BitBucket at http://bitbucket.org/metaljoe/metaljoe_codekata/ under the MIT license and feedback is definitely always welcome.

For this solution, the aim was to produce a concurrent version of the code. Surprisingly, this proved to be quite tricky to write compared to implementing an equivalent concurrent solution in a language like Python or Erlang. The main problem was that the “Programming Clojure” book apparently missed a few important details about Agents (Agents can’t call Agents, shutdown-agents needs to be called for cleaning up). I got rather frustrated with the documentation and examples available, with the code raising assorted semi-cryptic Agent-related exceptions that weren’t adequately explained anywhere.

After a considerable amount of time on Google, and a drastic scaling back of my intended implementation, I finally created a working solution this evening.

The original aim was to keep dividing the list in two, spawning a new Agent/thread to investigate each new list by repeating the splits. This would continue until a match was found or an empty list reached. Not very efficient or elegant, but a way to incorporate an element of concurrency into the kata.

Since Agents can’t call Agents, the first major hurdle I had, I ended up running just two Agents: one to check the left half of the initial list, the other to check the right. The spawning of further Agents was eliminated in favour of an inefficient recursive check of each half of these lists, and so on. I opted to keep this style of checking, to “simulate” what my original implementation would have operated like.

This worked well, but the script would not quit until killed with Control-C. After another search through the book and docs, I found a reference to calling shutdown-agents. This only worked when called at the end of the script, not within the function calling the agents for various reasons.

I also took the opportunity to try and make a little more legible approach, as well as try out the split-at function.

Now I need sleep.

This is the second attempt at a Clojure-based solution to the Karate Chop kata detailed at http://codekata.pragprog.com/2007/01/kata_two_karate.html

Source code is available from BitBucket at http://bitbucket.org/metaljoe/metaljoe_codekata/ and is offered under the liberal MIT License. Since the aim of a kata is to learn and become a better programmer, feedback is always welcome.

Since the previous solution became a recursive one, my aim for this attempt was to work on an iterative implementation. Concerned about the rather ugly feel to the recursive code, I was also looking to produce something a bit more compact, a bit more elegant.

The code actually came to me pretty quickly compared to the first attempt. One advantage is that the problem domain is clearer on the second attempt, my brain has had time to consider the pitfalls and map out a more concise understanding of the problem. The other factor is that writing my own Clojure code, as opposed to quick REPL exploration or fill-in-the-blank exercises, has become more natural. I have begun to start thinking in Clojure. Note that I said “have begun to” – I still have more work to do on that front.

As far as implementation went, it was pretty smooth until I hit one snag: an infinite loop. It took a while to figure out what was going on, until I spotted a glaring issue. If the slice/partition of the list I’m looking at reaches one element, I check it for a match. If it didn’t match, I would loop around and search it again. An embarrassing newbie error, but a useful mistake to make. I added the appropriate check and all the tests passed.

After completing the implementation I spotted the second mistake when I reviewed the code later on. I was testing for an empty list on each iteration. While it has no impact on the output, it’s an unnecessary check for the implementation – if the list does suddenly become empty, we have bigger problems!

One thing that did intrigue me was that with almost no effort, the solution was much smaller and more readable than the first approach. I usually find the opposite with recursive versus iterative code, but in the case it’s probably because the problem seems more natural expressed iteratively. Or maybe my recursive approach just sucked :-)

Here’s the completed algorithm, with tests removed and the code tidied for clarity. See the Mercurial repository at BitBucket for the original source.

(defn chop [item list]
  (if (empty? list)
    -1
    (loop [ min 0 mid (quot (count list) 2) max (count list)]
      (if (= item (get list mid))
        mid
        (if (or (= mid min) (= mid max))
          -1
          (if (< item (get list mid))
            (recur min (quot (+ mid min) 2) mid)       ; look left
            (recur mid (quot (+ mid max) 2) max))))))) ; look right

If I broke my rule to revisit the code, I’d change a few minor things for readability. For example: why did I put those two comments there, and not elsewhere? The loop initialisation looks a bit of a blur at first glance, the or statement might need to be wrapped in an intention-revealing function name. However, the code is good enough for the task at hand and that’s all that really matters.

I thought it might be interesting to work through the code katas created by Pragmatic Programmer Dave Thomas to assist with my Clojure learning. If you’re looking for Kata One, don’t worry about searching – I started with Kata Two as I needed an actual coding exercise.

The instructions for the Kata can be found at: http://codekata.pragprog.com/2007/01/kata_two_karate.html

The task is to implement five different binary search algorithms, one per day. I don’t actually have five consecutive days to do this, so don’t expect the five parts to be written about each day. There will be gaps.

It’s been about 15 years since I wrote a binary search algorithm, using the language Modula-2. I was toying with the idea of going off and reading up on binary searches, but I decided the best way to approach it was from first principles – only allowing myself access to a Clojure reference manual. After all, I go on about the importance of learning basic algorithms even in this day and age… so about time I put my money where my mouth is!

The trickiest part of the kata wasn’t so much coming up with the algorithm, an iterative binary search is easy, but expressing the algorithm using Clojure’s syntax. That’s mainly the inexperience with the language showing, but the resulting code didn’t feel too elegant. In particular, the parentheses became more of a hindrance than a help – they start to look plain ugly, get in the way and I don’t think I’ve ever relied on my IDE’s parenthesis matching so much. I still don’t agree with Neal Stephenson that LISP is the only language which can truly be described as beautiful… despite all the power and flexibility it can yield in the right hands.

I may not have reached Clojure enlightenment yet, but gradually I built up a rhythm with the coding. Once the tests passed, I decided to stop rather than try to rework the code into something that looked and felt better. Take the lessons learnt and apply to the next algorithm – the work for this one has already been done.

First step with the coding was to create the tests. Although not a fan of Python’s doc tests, I opted to use Clojure’s equivalent for quickness – and I’ve decided I’m not a massive fan of them either. Moan, moan, moan. ;-)

(defn
 #^{:test (fn []
 (assert (= -1 (chop 3 []) ))
 (assert (= -1 (chop 3 [1]) ))
 (assert (=  0 (chop 1 [1]) ))

 (assert (=  0 (chop 1 [1 3 5]) ))
 (assert (=  1 (chop 3 [1 3 5]) ))
 (assert (=  2 (chop 5 [1 3 5]) ))
 (assert (= -1 (chop 0 [1 3 5]) ))
 (assert (= -1 (chop 2 [1 3 5]) ))
 (assert (= -1 (chop 4 [1 3 5]) ))
 (assert (= -1 (chop 6 [1 3 5]) ))

 (assert (=  0 (chop 1 [1 3 5 7]) ))
 (assert (=  1 (chop 3 [1 3 5 7]) ))
 (assert (=  2 (chop 5 [1 3 5 7]) ))
 (assert (=  3 (chop 7 [1 3 5 7]) ))
 (assert (= -1 (chop 0 [1 3 5 7]) ))
 (assert (= -1 (chop 2 [1 3 5 7]) ))
 (assert (= -1 (chop 4 [1 3 5 7]) ))
 (assert (= -1 (chop 6 [1 3 5 7]) ))
 (assert (= -1 (chop 8 [1 3 5 7]) )) ) }
 chop
 [item list]
 ...

I made the chop function a wrapper around the real action, meaning I could reuse it for different algorithms easily as well as keep the implementation separate. I opted to call a function called bsearch, passing the list of values, the value to search for, and a start and end index to define the “partition” under examination. The initial start and end indexes set the partition to be the whole of the list – since we have no idea where the value is to begin with.

(bsearch list item 0 (count list))

The initial check in bsearch is for an empty list. If we have nothing to search, we know for sure that the value doesn’t exist in it.

Next, we take our partition, or slice as I called it in the code, for examination. If the slice contains one item, we have a simple case to check: either that item is the one we’re looking for, or it isn’t.

If we have more than one item, our proper binary search comes into play. We find the mid-point and check the value of that item. As with the single item list, it’s either the value we want, or it isn’t. If it isn’t, we split the slice on that point and search the “left half” (values lower than the mid value) or the “right half” (values higher than the mid value) accordingly.

Although my intention was to do an iterative algorithm, Clojure’s syntax nudged me into a more recursive approach. For the tests supplied, blowing the stack is unlikely but for larger lists the algorithm won’t be the most efficient approach.

A couple of helper functions first:

(defn is-single-item? [ list ]
 (= 1 (count list)))

(defn mid-index [ list ]
 (quot (count list) 2))

These were just to make a couple of spots in the code slightly more readable at a glance.

And the bsearch function in all its nested glory:

(defn bsearch [ list value slice-start slice-end ]
 (if (empty? list)
  -1
  (let [ slice (subvec list slice-start slice-end)]
   (if (is-single-item? slice)
    (if (= value (first slice) )
     slice-start
     -1)
    (let [ mid (mid-index slice) ]
     (let [ mid-value (get slice mid) ]
      (if (= value mid-value)
       (+ slice-start mid)
       (if (< value mid-value)
        (bsearch list value slice-start (- slice-end mid))
        (bsearch list value (+ slice-start mid) slice-end)))))))))

Check out the multiple closing parentheses. That just looks wrong, but I see lots of example Clojure and LISP code like that. When writing the code, I ended up writing the parentheses in alignment for readability, then when I finish coding I reposition the closing parentheses to match the style above. I can’t help but think it’s a code smell indicator.

Certainly an interesting exercise and my first go at writing Clojure code from scratch, not following along with the Programming Clojure book. Definitely has room for improvement.

Now, how am I going to write my second binary search algorithm…?

London Python Code Dojo #6

February 5, 2010

Last night was the sixth London Python Code Dojo, and the third I have attended. Organised, as ever, by Nicholas Tollervey, hosted by Fry-IT and attended by many.

Unlike the previous dojo, the evening was run with one laptop hooked up to a projector and with a pair of programmers “on stage” for ten minutes to undertake a task. The original plan was to code for a maximum of ten minutes or until a unit test passed, then the driver-programmer would step down. Somewhere along the line we lost sight of testing aims. We also lost half the group to a discussion at the back of the room, while the front rows were engrossed in the problems at hand. Volunteers to join the pair on stage were a little sparse, and I must admit I held back a bit longer than I should have – once I was up, I had a throughly enjoyable ten minutes as Ciaran’s co-pilot, and was then very sorry when my own ten minutes in the hot seat were up!

So, what happened?

Firstly, the tasks of the evening were based around integrating the best core components of the previous dojo’s many solutions. A data file format and parser from one solution, the Cmd-based command line code from another, that sort of thing. While an interesting exercise in itself, it perhaps lacked the need for the creative approaches used last month. Work was primarily a copy and paste job, followed by some smoothing of the edges, with little need to sit down and think through a higher plan of action. The test-driven development idea got lost along the way too, and I admit furthering that loss once it had happened.

Secondly, group size had its effect. I don’t have a head count, but we filled the room as usual so probably about 20 people. It was interesting that those at the back gradually drifted into their own discussions and even ended up on laptops not looking at what was going on at the front. Even with a projector, it can be difficult to see what was going on and listen in easily to the conversations at the front between driver, co-pilot and hecklers. We use the term “hecklers” in an affectionate way in the dojo – there’s no malice or negativity involved with comments from the audience. I wonder if those who wrote the code being transplanted at a given time lost interest, since they knew the code already?

Extending from group size was group dynamics. With smaller teams like last week, it was easier for people to contribute: less intimidating than being in front of everyone, easier to have your say, easier to be engaged more. However, the single laptop approach does prevent hogging of the keyboard, and we’re all working on the same thing. Swings and roundabouts.

We finished with a wrap up of the situation and where we can progress for the next dojo. The team-based approach is going to make a return, and the delegation of different tasks to different teams sounds like fun – we’re still all working on the same final goal (an adventure game), the same code base, but we’re breaking up the chunks of work. That in itself will bring in some interesting extra skills such as integration and inter-team communication.

I think it was an important learning process, and one we needed to make this early in the project. Sometimes you need a setback to make you understand the dynamics of a situation and achieve a breakthrough. When I teach snowboarding, I often find those who go through a bit of a regression phase, a bit of a setback, come out with a much better technique and understanding than those who breeze through the lesson. We learn more from our mistakes than our successes.

Despite the problems, it was still a good evening – I had fun, met loads of cool people, got taken out of my comfort zone, learnt some new things. That’s a positive result in my opinion. I was rather excitable and involved in what was going on, so didn’t really notice what was happening until I was in the hotseat and could look out on the group.

The next London Python Code Dojo will be on Thursday 4th March. If you’re a Pythonista in London, come along and take a look – there’s free pizza and beer/Coke too!

A Night At The Dojo

January 8, 2010

Yesterday evening I attended my second London Python Code Dojo, and the first where any coding was involved.

After devouring the free pizza and beer (or Coke in my case, being a teetotaller), we opted to split into equal-sized groups and tackle the rather interesting problem of building a text adventure game. Now, building such a game in the space of an hour is a little unrealistic so the decision was made to attempt the simplest thing that would work: namely, navigating around our game world using the four directions of the compass.

Despite the fact my team didn’t complete the task, due to some hasty debugging in the final few minutes of the hour, I must admit that we still produced something to be proud of. We took a couple of different approaches to the other teams, which contributed to holding us back but at the same time yielded a rather satisfying result.

Whereas the others were opting for pre-generated game maps, some teams even splitting into dedicated data and game engine teams, we decided to go with the idea of creating our map on the fly. This meant we didn’t have to bother creating any initial game data other than a starting room in our dungeon, with random doors leading off this room. As our hero progressed through the doors, we would create new rooms leading off the door – taking care to ensure the rooms matched up if we backtracked or went in a circle. Okay, a little more complex than it needed to be, but a rather bold and enjoyable solution.

Despite having pair programmed before, I did have a sudden brain failure when I found myself in the hot seat. It was the first time I’d paired up with people I didn’t know before (even though a colleague of mine was also in the same team as me), and more importantly it wasn’t so much a pair programming session as a group programming session. It was also interesting being in front of an unfamiliar development environment – I’m not a fan of VI at the best of times, and my brain totally gave up on me with the key bindings. Embarrassing! I was definitely out of my comfort zone… which is actually no bad thing, because it forces me to think and deal with the situation.

It was great mixing with different developers, bouncing ideas off new people, being taken out of my comfort zone and tackling something I wouldn’t ordinarily have thought about working on. At the end of the evening we all demoed our work (or traceback) and ran through the code really quickly. You get to see all the different approaches, pick up interesting ideas and learn about other people’s coding styles. I also learnt about the Python Cmd module! (And was a bit surprised to find a lot of people there knew it already – I felt a bit of a n00b)

Code dojos are great fun, and an excellent way to meet fellow developers, hone your skills, learn new ones and get yourself out of the comfort zone / rut.

My thanks to organiser Nicholas Tollervey, the guys at Fry-IT for providing the room (and essential pizza), and everyone who attended. Hopefully see you all at the next one! (Well, probably the next London Pyssup first)

Further reading:

Follow

Get every new post delivered to your Inbox.

Join 428 other followers