Monday, November 14, 2011

Centroids

Here's another one from my clustering code, slightly modified for explanatory purposes. This post was spurred by a conversation with a couple of friends when we were talking about programming, who said that programming languages mostly had the same things (i.e. everything old is new again). While I'm not completely in opposition to that, I think that some of the newer languages (read Ruby) do have a lot to offer when it comes to compactness.

Here the problem is to find the centroid of a number of points. Basically, we want the averages of all the first values of the points, the second values of the points, etc. Let's show an example ...

If we have [[x0, y0, z0], [x1, y1, z1], ... [xn, yn, zn]] as our points then the centroid would be the point given by [sum(x's) / n, sum(y's)/n, sum(z's) / n] (and here I'm too lazy to figure out how to create the capital Sigma for sum). Think about how you might solve this in your favorite programming language. In C/C++ anyway, you'd need to know in advance how many points and how many dimensions (the example above uses three, but it could be more or less). In Java, you don't need that, but I don't think you could do something like this ...



def centroid(x)
x[0].zip(*x[1..x.length-1]).map { |v| v.inject(:+) / v.length.to_f }
end


OK, so what's going on here? First up we're going to take the first element of the array (above the [x0, y0, z0]) and zip it with the rest of the array with *x[1..x.length-1]. This should give us, once again from the example above [[x0, x1, ...xn], [y0, y1, ... yn], [z0, z1 ... zn]]. So basically, a new array with all the x's gathered together, the y's, etc. Now comes map. Here, we'll take each element, which is itself an array and then use inject to run through each of the items, say [x0, x1, ... xn] and sum them together. Finally, we'll divide by the length of the element to get an average. If it's hard to see this, break it into pieces and then run it through using irb. It should be a bit easier to see there.

So, what's the point here. Well, essentially you could this in any programming language or in fact any Turing machine, but the language itself can make it easier freeing you to do other things or harder.

Let me know if you have any questions or comments.

Friday, November 11, 2011

Monty Hall Problem

A friend of mine got asked this question in an interview yesterday and I thought it'd make an interesting programming problem. It goes something like this ... Suppose you're on a game show and are offered the choice of one of three doors. Behind one of the doors is a car and behind the other two are goats. Let's say you choose door number three (see Jimmy Buffet song). At this point Monty Hall shows you that behind door number one is a goat and offers to let you switch to door number two. The question is, should you do it? The easy (and incorrect answer) is that it doesn't matter. The correct answer is that, yes you should. Why? I'll let you look it up or you can just run the following program ...


possible_doors = [:goat, :goat, :car].permutation.to_a
bad_choice = 0
tries = 100000
1.upto tries do |mc|
test_doors = possible_doors[Random.rand(possible_doors.length)]
bad_choice += 1 if test_doors[Random.rand(test_doors.length)] == :car
end
puts "bad_choice percent = #{bad_choice.to_f / tries.to_f * 100} good_choice percent = #{100.0 - (bad_choice.to_f / tries.to_f * 100)}"


First up, we get a list of all the possible ways the doors could be arranged and set the number of times that switching would be a bad choice to 0. We'll run this 100000 times in a Monte (not Monty) Carlo loop. In the loop we get one of the possible arrangements of test doors randomly selected and then increment the bad_choice variable if the door we pick contains the car (in other words switching from where we are would be a "bad choice"). When we run this program we see that switching is a bad choice only about 33% of the time and a good choice about 66% of the time (one might guess that these are really 33.333333...% and 66.6666666%). So, then answer in this case is that yes you should switch.

Let me know if you have questions or comments.

Tuesday, November 1, 2011

Ruby Distance Calculation

I was just looking at a clustering algorithm and needed a distance calculation between two points in n-dimensional space. If you remember your high school geometry (or even earlier) the distance between two points in the xy plane is the sqrt((x1 - x2)**2 + (y1-y2)**2). Where the two points are represented by (x1, y1) and (x2, y2). For more dimensions, just add values inside the sqrt as in (z1-z2)**2 for a third dimension. Given that here's a one line function that I came up with


def distance(a, b)
Math.sqrt(a.zip(b).inject(0) { |d, c| d + (c[0] - c[1]) ** 2 })
end


It's a bit tricky, so let's go through it. First we have the Math.sqrt which will just take the square root of whatever is inside it. Next the a.zip(b) will take two arrays (here the input parameters) and take the first values of each of the arrays, create a new array from them and then add that to the output array (see also this post for more information on zip). Same with the second values of the arrays and so on. For example if we have [x1, y1, z1].zip([x2, y2, z2]) this will return [[x1, x2], [y1, y2], [z1, z2]] (hopefully, this looks like something we can use to you). Next we're going to use inject to run through this array with a starting value of 0 (d) and add to it the square of the difference of the two values in the array. Finally as noted before, take the square root and return.

If you have any problems with this, break it up into multiple lines and check the intermediate values. As always, let me know if you have any questions.