Tuesday, February 11, 2014

What If The Romans Had Vim?


Note This is much more about VimL than MDCLXVI strings; if this code doesn’t rape your cat, you’re welcome.

Call it a kata if you will; I decided over breakfast to have a go at writing a Roman Numeral to Arabic Number converter in VimL. It’d been sufficiently long since I’d written such a converter in any language, and I’d never done one in VimL, so I thought, why not?

I prefer to think away from the console, so I grabbed a notepad and pencil and started doodling a solution based on how I think about parsing Roman Numerals in wet-ware. Roughly, that looked a bit like:

to_arabic(str)
  new stack
  for each roman_letter in str
    val = arabic_value(roman_letter)
    if val < stack.tos()
      stack.push(val - pop())
    elseif val == stack.tos()
      stack.push(val + pop())
    else
      stack.push(val)
    end
  end
  return sum(stack)
end

Note Yes, I am aware that might expose potentially embarrassing belief systems I might be operating under regarding life, the universe and everything. Meh.

I did some mental as well as paper-based run-throughs of the algorithm to convince myself that it would work and then implemented it in VimL. I don’t have the VimL solution to show you because after getting a working version I began refactoring it as I saw opportunities and generalities lurking within the code.

Here is what I ended up with:


function! Roman()
  let obj = {}
  let obj.values = {'M':1000, 'D':500, 'C':100, 'L':50, 'X':10, 'V':5, 'I':1 }

  func obj.to_arabic(str) dict
    let self.prior = 0
    return eval(join(map(
          \ reverse(map(split(a:str, '\zs'), 'self.values[v:val]'))
          \, 'self.calc(v:val)'), '+'))
  endfunc

  func obj.calc(v) dict
    let [n, self.prior] = [(a:v < self.prior ? -a:v : a:v), a:v]
    return n
  endfunc

  return obj
endfunction

So while my pseudo-code used a stack and processed the string in original order, the final algorithm reversed the string and just tracked the prior value each time through the loop (map()).

Some explanations of the weirder VimL:
  • I’m using VimL’s Object Oriented approach which is more like javascript’s than C++'s. The Roman() function is actually an object generator. Each such object then has a to_arabic(roman_string) method.
  • I’m also using VimL’s functional-ish approach to processing data lists. Reading such constructs is usually better from the inside out, which begins with:
    • The split(a:str, '\zs') yields a LIST OF individual LETTERS. Unfortunately, the explanation at :help /\zs doesn’t include the idiomatic use shown here of exploding strings out to a list of their component characters.
    • The map( LIST OF LETTERS , 'self.values[v:val]') converts individual Roman Numerals to equivalent Arabic numbers. This, therefore, returns a LIST OF NUMBERS. VimL’s map() function takes the list first and an expression to be evaluated against each element in turn (the resulting list returned by map() is the modification of each element through this evaluated expression). A simple example might help; this generates squares:

        echo map(range(1,10), 'v:val * v:val')
      The archaic looking `v:val` is VimL's Way of exposing access to the value of
      the element. If it helps, this would be equivalent to the Ruby code:

        (1..10).map {|val| val * val}
      Note map() in VimL is destructive, not that that matters here, though.
    • The map( LIST OF NUMBERS , 'self.calc(v:val)') inverts numbers in the list if they precede bigger numbers. Remember the original string order of letters is reversed in this algorithm.
    • The eval(join( LIST OF NUMBERS , '+')) sums the list.
All in all… I think this algorithm does what it’s supposed to do, and it does so with less aggression than solutions I googled for afterwards. o_O

At least the first comment in that last one tries to head the complainant in the right direction early.

The other thing I wanted to show today is micro-tests. Of course, being able to refactor sloppy code into a better solution is made practical with tests.

We all know and love testing, and Vim has decent plugins for doing it properly when you have to (i.e. writing your own plugin.) However, for quick’n'dirty jobs like this, I usually just drop a small inline test at the end of the script:


if expand('%:p') == expand('<sfile>:p')
  let fail = 0
  for t in [
        \  [1, 'I']
        \, [2, 'II']
        \, [3, 'III']
        \, [1992, 'MCMXCII']
        \, [1999, 'MCMXCIX']
        \]
    if t[0] != Roman().to_arabic(t[1])
      echo 'Fail: ' . t[0]
      let fail = 1
    endif
  endfor
  if ! fail
    echo "Ok"
  endif
endif

Note
  • The if expand('%:p') == expand('<sfile>:p') trick is to allow this script to be `:source`d from another script without triggering the unit-tests. I didn’t need this here because roman numeracy has been out of fashion for a while now, but I thought I would show this here as an added bonus.
  • I’ve shown few tests here to keep the post brief. My real tests for this little algorithm number in the fifties. I bet even then I’ve overlooked some poignant design aspect of Roman Numerals. This was meant to be an exercise in the general approach rather than aiming to provide an exhaustively accurate turn-key solution. One notable absence in that vein is the lack of data validation to ensure the roman numeral string is not of Celtic descent.
  • For those with the inability to read between the lines (osse), that last paragraph means: if I have flubbed some VimL, I’d love to hear from you; if I’ve merely upset dead Romans and their counting system, I don’t so much care.
Update: The testing code has been extracted out into a tiny little plugin called vim-u-test if you're interested in experimenting with it.