Thursday, August 11, 2011

Hett's Problem

Sorry for not writing for a while, I've been busy looking for a new job. If you've got one, you can contact me here or at slabounty at large search company that starts with "g".

Anyway ... over at Programming Praxis there's a problem via PrologSite concerning lists. Here's the problem statement ...
1.28 (**) Sorting a list of lists according to length of sublists
a) We suppose that a list (InList) contains elements that are lists themselves. The objective is to sort the elements of InList according to their length. E.g. short lists first, longer lists later, or vice versa.

Example:
?- lsort([[a,b,c],[d,e],[f,g,h],[d,e],[i,j,k,l],[m,n],[o]],L).
L = [[o], [d, e], [d, e], [m, n], [a, b, c], [f, g, h], [i, j, k, l]]

b) Again, we suppose that a list (InList) contains elements that are lists themselves. But this time the objective is to sort the elements of InList according to their length frequency; i.e. in the default, where sorting is done ascendingly, lists with rare lengths are placed first, others with a more frequent length come later.

Example:
?- lfsort([[a,b,c],[d,e],[f,g,h],[d,e],[i,j,k,l],[m,n],[o]],L).
L = [[i, j, k, l], [o], [a, b, c], [f, g, h], [d, e], [d, e], [m, n]]

Note that in the above example, the first two lists in the result L have length 4 and 1, both lengths appear just once. The third and forth list have length 3; there are two list of this length. And finally, the last three lists have length 2. This is the most frequent length.


So how can we solve these two problems in Ruby? The first one is pretty much trivial. Here's the code ...


list = [%w[a, b, c], %w[d], %w[e, f], %w[g, h, i, j, k], %w[l], %w[m, n, o]]
list_sort_length = list.sort {|a, b| a.length <=> b.length}
p list_sort_length


All we're going to do is use the option to sort that takes a block. The block will get two values and instead of the default, we're going to use the values length. Run this and you should see ...
[["l"], ["d"], ["e,", "f"], ["a,", "b,", "c"], ["m,", "n,", "o"], ["g,", "h,", "i,", "j,", "k"]].

The next piece is a bit trickier. Here, we can't do just a one-liner (that I could see anyway). Here's the code ...


hist = Hash.new{|h, k| h[k] = []}
list.each { |l| hist[l.length] << l }
list_sort_hist = []
hist.sort {|a,b| a.length <=> b.length}.each { |key, value| value.each {|e| list_sort_hist << e } }
p list_sort_hist


We start out creating the histogram hash and passing a block so that each element is initialized with an empty array. Then, we work through the list and add each element to an array at the appropriate histogram hash location. Next, create the empty sorted histogram array. Finally, we're going to sort the histogram the same way that we did in the earlier problem (we can do this because they both are Enumerable. We take the result of that and do and each for every item in the histogram. For each of the values (which remember are arrays), we add them to the list_sort_hist array. Finally, we print that out. If it makes it easier to see, split that long line in two. First create a sorted histogram array and then for each value loop through the value and add it to the list_sort_hist array.

Let me know if you have any comments, questions, or jobs.

3 comments:

  1. There is a sub-optimal one-liner. If you want to sort only by the given element’s length’s frequency, you can do

    list.map { |e| [list.map(&:size).count(e.size), e] }.sort.map(&:last)

    – this maps every element of the list to a duple consisting of its size’s frequency and itself, then sorts (on frequency, as it’s the first part of the duple) and finally fetches the second parts of the duples (the elements we’re interested in).

    If you also want to group same-length elements next to each other, you need to do the same with triples of [frequency, length, element]:

    list.map { |e| [list.map(&:size).count(e.size), e.size, e] }.sort.map(&:last)

    To optimise it (and lose the one-linerness) you’d need to factor out the size histogram.

    ReplyDelete
  2. …or, obviously (stupid me!), just use #sort_by:

    # just length’s frequency
    list.sort_by { |e| list.map(&:size).count(e.size) }

    # also length (as secondary sorting)
    list.sort_by { |e| [list.map(&:size).count(e.size), e.size] }

    ReplyDelete
  3. Nice! Thanks for showing a nicer way to do this. I love these one-liners and wasn't really seeing how to do this one in one line.

    ReplyDelete