Sunday, October 20, 2013

Letter Frequency in Text and Rails

The other day a coworker and I were discussing keyboard layouts, which led to the question of whether the letter frequency (how often each letter is used) in a Rails application would match the letter frequency of ordinary English text. Code obviously has many more special characters, and a wider variety of them, so I only counted the letters A to Z. I thought the frequencies would be about the same, but he didn't. Since I had a bit of free time this weekend, I wrote a little code to find out. For the text, I used Project Gutenberg's copy of Moby Dick, and for the code, I used a smallish Rails project of mine.

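Here's the code and the results. Each * stands for one percentage point, rounded down, so letters that make up less than 1% of the total show no stars.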

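# Print one '*' per whole percentage point for each letter.
def histogram(frequency)
  frequency.each do |letter, fraction|
    puts "#{letter}: #{'*' * (fraction * 100).to_i}"
  end
end

File.open(ARGV[0], 'r') do |f|
  # Start every letter at zero so letters that never appear still get a row.
  frequency = {}
  ('A'..'Z').each { |letter| frequency[letter] = 0 }
  total_characters = 0

  f.each_char do |c|
    # Count only the letters A-Z (either case); skip digits, punctuation, etc.
    next unless c =~ /\A[A-Za-z]\z/
    total_characters += 1
    frequency[c.upcase] += 1
  end

  abort "No letters found in #{ARGV[0]}" if total_characters.zero?

  # Convert raw counts into each letter's share of all letters counted.
  percentage_frequency = frequency.map { |letter, count| [letter, count.to_f / total_characters] }
  histogram(percentage_frequency)
end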
Moby Dick:
A: ********
B: *
C: **
D: ****
E: ************
F: **
G: **
H: ******
I: ******
J:
K:
L: ****
M: **
N: ******
O: *******
P: *
Q:
R: *****
S: ******
T: *********
U: **
V:
W: **
X:
Y: *
Z:

Rails project:
A: ******
B: *
C: ***
D: *****
E: *************
F: **
G: *
H: **
I: ******
J:
K:
L: *****
M: **
N: ******
O: *******
P: **
Q:
R: *******
S: *******
T: ********
U: ***
V:
W: *
X:
Y:
Z:


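As you can see, the histograms are pretty similar: E and T are the two most common letters in both, and E, T, O, N and I each differ by at most one percentage point between the two. The most visible difference is H, at about 6% in Moby Dick versus about 2% in the Rails code. For this particular Rails project I was using HAML, so I'm not sure whether a project using ERB would give different results. Also, as I noted, this was a fairly small project.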
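Eyeballing two sets of stars only goes so far. One way to put a number on "pretty similar" is to compare the two distributions letter by letter. The sketch below is not from the original script: letter_fractions, total_variation and biggest_differences are hypothetical helper names, and total variation distance (half the sum of the absolute differences, 0 for identical distributions and 1 for no overlap) is just one reasonable summary among several.

```ruby
# Hypothetical helpers (not part of the original script) for comparing two
# letter-frequency tables, each a Hash of 'A'..'Z' => fraction of all letters.

# Build a frequency table from a string, counting ASCII letters only.
def letter_fractions(text)
  counts = Hash.new(0)
  text.each_char { |c| counts[c.upcase] += 1 if c =~ /\A[A-Za-z]\z/ }
  total = counts.values.sum
  ('A'..'Z').map { |letter| [letter, total.zero? ? 0.0 : counts[letter].to_f / total] }.to_h
end

# Total variation distance: half the summed absolute differences.
# 0.0 means identical distributions; 1.0 means no letters in common.
def total_variation(a, b)
  ('A'..'Z').sum { |letter| (a[letter] - b[letter]).abs } / 2.0
end

# The n letters whose shares differ most, as signed percentage points (a - b).
def biggest_differences(a, b, n = 5)
  ('A'..'Z').map { |letter| [letter, ((a[letter] - b[letter]) * 100).round(1)] }
            .sort_by { |_, diff| -diff.abs }
            .first(n)
end

# Usage (hypothetical file name): ruby compare_frequencies.rb moby10b.txt rails_files.txt
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  text = letter_fractions(File.read(ARGV[0]))
  code = letter_fractions(File.read(ARGV[1]))
  puts format('Total variation distance: %.3f', total_variation(text, code))
  biggest_differences(text, code).each { |letter, diff| puts "#{letter}: #{diff} points" }
end
```

A positive difference means the letter is more common in the first file; with Moby Dick passed first, H would be expected near the top of the list given the histograms above.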

The script runs against a single file passed on the command line, so you run it like this, for example:

ruby letter_frequency.rb moby10b.txt > moby_results.txt

For the Rails project, I concatenated all the Ruby and HAML files into one file with cat and then ran the script on that:

cat `find . -iname \*.rb -or -iname \*.haml` > rails_files.txt
ruby letter_frequency.rb rails_files.txt > rails_results.txt
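If you'd rather skip the intermediate rails_files.txt, Ruby can gather the files itself. This is a sketch, not the script used for the results above: letter_counts_for is a hypothetical helper, and it assumes the same .rb and .haml patterns as the find command, using Dir.glob with ** to search subdirectories.

```ruby
# Hypothetical alternative to the cat/find pipeline: walk a project directory
# with Dir.glob and count letters across every .rb and .haml file directly.
def letter_counts_for(root, patterns = ['**/*.rb', '**/*.haml'])
  counts = Hash.new(0)
  files = patterns.flat_map { |pattern| Dir.glob(File.join(root, pattern)) }
                  .uniq
                  .select { |path| File.file?(path) } # skip directories named like foo.rb
  files.each do |path|
    # Read line by line so large files are never loaded all at once.
    File.foreach(path) do |line|
      line.each_char { |c| counts[c.upcase] += 1 if c =~ /\A[A-Za-z]\z/ }
    end
  end
  counts
end

# Usage (hypothetical file name): ruby letter_counts.rb path/to/rails_app
if __FILE__ == $PROGRAM_NAME && ARGV.size == 1
  counts = letter_counts_for(ARGV[0])
  total = counts.values.sum
  ('A'..'Z').each do |letter|
    fraction = total.zero? ? 0.0 : counts[letter].to_f / total
    # Same scale as the original histogram: one '*' per whole percentage point.
    puts "#{letter}: #{'*' * (fraction * 100).to_i}"
  end
end
```

Two behavioral differences from the find command are worth knowing: Dir.glob matches file names case-sensitively by default (find's -iname does not), and it skips dotfiles and hidden directories unless File::FNM_DOTMATCH is passed.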

Let me know if you try this on one of your projects and post the results in the comments.
