Monday, April 16, 2012

Lengthly

It's time again for another comparison of approaches in Vim. This time, we're looking at the question:

How do you sort a file by line length?

Our Dataset

  one
  two
  three
  four
  five
  six
  seven
  eight
  nine
  ten
  a line with twenty nine chars
  one with 11

The Clumsy Ex

Optionally, restrict the commands to just the range you want to work on, either via a visual selection or an explicit line range (in place of the   %   below). Here, I will assume the whole file is to be line-lengthly sorted.

  :%s/^/\=len(getline('.')) . " "
  :sort n
  :%s/^\d\+ //

Yielding

  one
  two
  six
  ten
  four
  five
  nine
  three
  seven
  eight
  one with 11
  a line with twenty nine chars

NOTE: This is clumsy for two reasons:
  1. It requires messing with the buffer text: three separate operations are needed to complete this single logical task.
  2. :sort n considers only the numeric prefix and not the text that follows it. The lines do end up grouped by length, but within each group they simply keep their original relative order instead of being alphabetized.
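If you do find yourself reaching for the Ex route, the three steps can at least be bundled behind a single command. Here's a minimal sketch. The   LengthSortEx   name is made up for illustration. It defaults to the whole file and also accepts a range, as in   :'<,'>LengthSortEx  . It still rewrites the buffer text behind the scenes and leaves each length group in its original order, so it only papers over the first complaint:

```vim
" Hypothetical helper: the three Ex steps above, run over lines
" first..last as one logical operation.
function! LengthSortEx(first, last) abort
  let span = a:first . ',' . a:last
  " 1. Prefix each line with its length (len() counts bytes) and a space.
  execute span . 's/^/\=len(getline(".")) . " "/'
  " 2. Sort numerically on that leading number.
  execute span . 'sort n'
  " 3. Strip the numeric prefix again.
  execute span . 's/^\d\+ //'
endfunction

" Hypothetical :LengthSortEx command; -range=% makes the whole file the default.
command! -range=% LengthSortEx call LengthSortEx(<line1>, <line2>)
```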
Can we have it all? Can we sort by line length, and then alphabetically within each length, all in one operation?

Let VimL Light Your Way

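First, we'll need a comparator. By default, sort() compares items by their string representation. This can be overridden by passing the name of a function as the second argument. Vim calls that function with two items and expects it to return 0 if they are equal, a positive number if the first sorts after the second, and a negative number if it sorts before. Such a named function looks like this: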

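  function! Lengthly(i1, i2)
    " Order by byte length (len()), then case-sensitively by text.
    " ==# and ># ignore 'ignorecase'; plain == and > on strings would obey it.
    let li1 = len(a:i1)
    let li2 = len(a:i2)
    return li1 == li2 ? (a:i1 ==# a:i2 ? 0 : (a:i1 ># a:i2 ? 1 : -1)) : li1 > li2 ? 1 : -1
  endfunction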

Or the more verbose equivalent (without the eye-bleeding ?: chains):

  function! Lengthly2(i1, i2)
    let i1 = a:i1
    let i2 = a:i2
    let len_i1 = len(i1)
    let len_i2 = len(i2)
    if len_i1 == len_i2
      " Same length: compare the text case-sensitively, regardless of 'ignorecase'.
      if i1 ==# i2
        return 0
      else
        if i1 ># i2
          return 1
        else
          return -1
        endif
      endif
    else
      if len_i1 > len_i2
        return 1
      else
        return -1
      endif
    endif
  endfunction

*shudder* - while the ?: mess above is arguably hard to read, I find the more verbose form even more distracting.

NOTE: I have a penchant for naming my sort comparators as adverbs like that - I like how they read in the subsequent   sort([], 'Lengthly')   call.

So, armed with our new comparator, let's sort some lines.

Across the whole file (as in the earlier example):

  :call setline(1, sort(getline(1, '$'), 'Lengthly'))

Over a visually selected range (using explicit range end markers):

  :<c-u>call setline("'<", sort(getline("'<", "'>"), 'Lengthly'))

NOTE: The   <c-u>   (Ctrl-U) clears the   '<,'>   range that Vim helpfully (though unnecessarily in this case) inserts on the command line when you press   :   with a visual selection active.

Over a visually selected range (using the selection register):

  :<c-u>call setline("'<", sort(split(@*, "\n"), 'Lengthly'))

NOTE: You will need   :set clipboard=autoselect   (in a Vim built with clipboard support) to have the @* register auto-populated with the current visual selection. Use a linewise (V) selection so that @* holds whole lines.
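On Vim 8.0 or later, which supports lambdas, the comparator can also be written inline. Wrapping the call in a user command with   -range=%   covers both the whole-file and the visual-selection cases without any   <c-u>   juggling. Here's a minimal sketch. The   :SortLengthly   name is made up for illustration, and the lambda is just Lengthly()'s logic rewritten as a single expression:

```vim
" Hypothetical :SortLengthly command (needs Vim 8.0+ for the {a, b -> ...} lambda).
" Sorts [range], defaulting to the whole file, by length and then by text.
command! -range=% SortLengthly
      \ call setline(<line1>, sort(getline(<line1>, <line2>),
      \   {a, b -> len(a) != len(b) ? len(a) - len(b)
      \            : a ==# b ? 0 : a ># b ? 1 : -1}))
```

Then   :SortLengthly   sorts the whole file, and   :'<,'>SortLengthly   sorts just the selected lines.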

The Result

  one
  six
  ten
  two
  five
  four
  nine
  eight
  seven
  three
  one with 11
  a line with twenty nine chars

Beautiful! There's no messy manipulation of the lines (inserting and then deleting line lengths) before and after sorting. AND not only are the lines sorted by length, they're also in alphabetical order within each length group. VimL, I <3 you.

Technically, I could have cobbled together a long-winded vanilla VimL function, but 30 minutes into this post, I suddenly got very bored.

Gotchas for the weary

Before I thought about blogging this, I had pasted an (admittedly utterly hideous) version of the VimL solution to #vim. It was an overly complicated spaghetti mess of map() and sort() calls and other assorted sins. Beyond its debatable beauty, it was just plain WRONG. It ignored the fact that sort() compares items as text by default, so files came out with line lengths of 1, then 11, then 2, in a decidedly unappealing jagged edge.
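The trap is easy to reproduce. With no comparator, sort() orders items by their string form, so numbers sort like words. Newer Vims also accept 'n' as the second argument for a numeric sort. On older ones you'd pass a comparator function instead, much like Lengthly():

```vim
" No comparator: items are compared as strings, so 11 lands before 2.
echo sort([2, 11, 1])
" → [1, 11, 2]

" 'n' compares numerically instead (not available in older Vims).
echo sort([2, 11, 1], 'n')
" → [1, 2, 11]
```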

Lesson Learned: Test on larger data sets than the English numbers one to ten. :-p

As always with these sorts of posts, it's not the destination but the journey. Hope you enjoyed the ride. Now... you can walk home.

Before You Go

I generate my test data with a little map I keep handy for just this purpose:

  :nnoremap <silent> <leader>T :<c-u>call append('.', map(range(v:count1), 'NumToNumber(v:val + 1)'))<CR>

With the keychord   25<leader>T   I generate 25 lines of test data beneath my cursor. Sure, as mentioned just above, testing on textual representations of English numbers can sometimes be limiting, but this is the first time I've felt the bite. The magical function   NumToNumber()   comes from my firstly plugin, which contains such gems as:
  • NumToNumber(num)        " 1 -> one
  • NumToOrd(num)           " 1 -> 1st
  • NumToOrdinal(num)       " 1 -> first
  • NumberToNum(engnum)     " one -> 1
  • NumberToOrd(engnum)     " one -> 1st
  • NumberToOrdinal(engnum) " one -> first
  • OrdToNum(ord)           " 1st -> 1
  • OrdToNumber(ord)        " 1st -> one
  • OrdToOrdinal(ord)       " 1st -> first
  • OrdinalToNum(engnum)    " first -> 1
  • OrdinalToNumber(engnum) " first -> one
  • OrdinalToOrd(engnum)    " first -> 1st
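If you don't have firstly installed, the map above needs something else to call. Here's a minimal stand-in for trying it out. It is purely hypothetical and is not the plugin's actual implementation. It spells out 0 to 99 and falls back to digits beyond that. It only defines NumToNumber() if no function by that name already exists:

```vim
" Hypothetical stand-in for firstly's NumToNumber(); covers 0-99 only.
if !exists('*NumToNumber')
  function NumToNumber(num) abort
    let ones = ['zero', 'one', 'two', 'three', 'four', 'five', 'six',
          \ 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen',
          \ 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen']
    let tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty',
          \ 'seventy', 'eighty', 'ninety']
    if a:num >= 0 && a:num < 20
      return ones[a:num]
    elseif a:num >= 20 && a:num < 100
      " e.g. 25 -> 'twenty-five'; exact multiples of ten get no suffix.
      return tens[a:num / 10] . (a:num % 10 ? '-' . ones[a:num % 10] : '')
    endif
    " Out of range: fall back to plain digits rather than guessing.
    return string(a:num)
  endfunction
endif
```

The hyphenated "twenty-five" spelling is an arbitrary choice here; the real plugin may spell such numbers differently.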
Enjoy. :-)