How do you sort a file by line length?
Our Dataset
one
two
three
four
five
six
seven
eight
nine
ten
a line with twenty nine chars
one with 11
The Clumsy Ex
Optionally select just the range to work on, either visually or explicitly. Here, I will assume the whole file is to be line-lengthly sorted.
:%s/^/\=len(getline('.')) . " "
:sort n
:%s/^\d\+ //
Yielding
one
two
six
ten
four
five
nine
three
seven
eight
one with 11
a line with twenty nine chars
NOTE: This is clumsy for two reasons:
- It requires messing with the buffer text. Three separate operations are needed to complete this single logical task.
- The sort considers only the leading numeric value and not the text that follows it. The lines do end up sorted by length, but within each length group they are not alphabetized; they simply keep whatever order they had before.
Let VimL Light Your Way
First, we'll need a comparator. By default, the sort() function compares items by their textual representation. This can be overridden by passing the name of a function as the second argument. Such a named function looks like this:
function! Lengthly(i1, i2)
  let li1 = len(a:i1)
  let li2 = len(a:i2)
  return li1 == li2 ? (a:i1 == a:i2 ? 0 : (a:i1 > a:i2 ? 1 : -1)) : li1 > li2 ? 1 : -1
endfunction
Or the more verbose equivalent (with less eye-bleeding ?: statements):
function! Lengthly2(i1, i2)
  let i1 = a:i1
  let i2 = a:i2
  let len_i1 = len(i1)
  let len_i2 = len(i2)
  if len_i1 == len_i2
    if i1 == i2
      return 0
    else
      if i1 > i2
        return 1
      else
        return -1
      endif
    endif
  else
    if len_i1 > len_i2
      return 1
    else
      return -1
    endif
  endif
endfunction
*shudder* - while the ?: mess above is arguably hard to read, I find the more verbose form even more distracting.
NOTE: I have a penchant for naming my sort comparators as adverbs like that - I like how they read in the subsequent sort([], 'Lengthly') call.
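One caveat worth sketching: len() on a String counts bytes, so with multibyte text (UTF-8 accented characters, say) the byte length and the number of visible characters can differ. The variant below is a hypothetical CharLengthly (the name is an assumption, not something from this post) that counts characters with strchars() and uses ==# and ># so the tie-break is case-sensitive regardless of the 'ignorecase' setting. Treat it as a sketch, assuming 'encoding' is utf-8.

```vim
" Hypothetical variant of Lengthly: compare by character count, not bytes.
function! CharLengthly(i1, i2) abort
  let l:n1 = strchars(a:i1)
  let l:n2 = strchars(a:i2)
  if l:n1 != l:n2
    return l:n1 > l:n2 ? 1 : -1
  endif
  " Same length: fall back to a case-sensitive string comparison.
  return a:i1 ==# a:i2 ? 0 : (a:i1 ># a:i2 ? 1 : -1)
endfunction

" 'naïve' is 5 characters but 6 bytes in UTF-8, so it groups with 'plain' here.
echo sort(['plain', 'naïve', 'four'], 'CharLengthly')
```

With the byte-counting Lengthly, 'naïve' would instead sort after 'plain' as a six-byte line.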
So, armed with our new comparator, let's sort some lines.
Across the whole file (as in the earlier example):
:call setline(1, sort(getline(1, '$'), 'Lengthly'))
Over a visually selected range (using explicit range end markers):
:<c-u>call setline("'<", sort(getline("'<", "'>"), 'Lengthly'))
NOTE: The <c-u> is there to clear the '<,'> visual range markers that Vim helpfully (though unnecessarily in this case) inserts for us when pressing : while a visual selection is in effect.
Over a visually selected range (using the selection register):
:<c-u>call setline("'<", sort(split(@*, "\n"), 'Lengthly'))
NOTE: You will need :set clipboard=autoselect to have the @* register auto-populated with the current visual selection.
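The invocations above can also be folded into a single user command. The sketch below defines a hypothetical :SortWith command (the name is an assumption, not part of this post) that takes a comparator name as its argument and defaults to the whole buffer when no range is given.

```vim
" Hypothetical command: sort a range of lines with a named comparator.
" -range=% makes the whole buffer the default range;
" <q-args> passes the comparator name to sort() as a quoted string.
command! -range=% -nargs=1 -complete=function SortWith call setline(<line1>, sort(getline(<line1>, <line2>), <q-args>))
```

With this in place, :SortWith Lengthly sorts the whole file, and :'<,'>SortWith Lengthly sorts just the visual selection, so the range Vim inserts when you press : in visual mode becomes useful rather than something to clear.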
The Result
one
six
ten
two
five
four
nine
eight
seven
three
one with 11
a line with twenty nine chars
Beautiful! No messy manipulations of the lines (inserting and deleting line lengths on each line) before and after sorting AND not only are the lines sorted by line length but they're in alphabetical order within their length groups too. VimL, I <3 you.
Technically, I could have cobbled together a long-winded vanilla VimL function, but 30 minutes into this post, I suddenly got very bored.
Gotchas for the weary
I had pasted (an admittedly utterly goatse) version of the VimL solution up to #vim before I thought about blogging this. It was an overly complicated spaghetti mess of maps and sort and other assorted sins. Despite its debatable beauty, it was just plain WRONG. It failed to account for sort()'s default of comparing textually, resulting in files whose line lengths ran 1, then 11, then 2, leaving a decidedly unappealing jagged edge.
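The gotcha is easy to reproduce on a tiny list. This sketch puts the default string-wise ordering next to sort()'s numeric 'n' mode; older Vim builds may not support the 'n' argument, so take that part as an assumption about your version.

```vim
" Default: items are compared by their string representation,
" so '11' lands between '1' and '2'.
echo sort([1, 11, 2])
" → [1, 11, 2]

" Numeric mode compares the values as numbers instead.
echo sort([1, 11, 2], 'n')
" → [1, 2, 11]
```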
Lesson Learned: Test on larger data sets than the English numbers one to ten. :-p
As always, it's not the destination with these sorts of posts, but rather the journey. Hope you enjoyed the ride. Now... you can walk home.
Before You Go
I generate my test data with a little map I keep handy for just this purpose:
:nnoremap <silent> <leader>T :<c-u>call append('.', map(range(v:count1), 'NumToNumber((1+v:val))'))<CR>
With the keychord 25<leader>T I generate 25 lines of test data beneath my cursor. Sure, as mentioned just above, testing on textual representations of English numbers can sometimes be limiting, but this is the first time I've noticed the bite. The magical function NumToNumber() comes from my firstly plugin which contains such gems as:
- NumToNumber(num) " 1 -> one
- NumToOrd(num) " 1 -> 1st
- NumToOrdinal(num) " 1 -> first
- NumberToNum(engnum) " one -> 1
- NumberToOrd(engnum) " one -> 1st
- NumberToOrdinal(engnum) " one -> first
- OrdToNum(ord) " 1st -> 1
- OrdToNumber(ord) " 1st -> one
- OrdToOrdinal(ord) " 1st -> first
- OrdinalToNum(engnum) " first -> 1
- OrdinalToNumber(engnum) " first -> one
- OrdinalToOrd(engnum) " first -> 1st
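Without the plugin, a rough stand-in for length-focused test data could look like the sketch below. The <leader>X mapping and the arithmetic are assumptions chosen purely for illustration: it appends v:count1 lines of x characters whose lengths cycle through 1 to 13 in a scrambled order, which covers the 1, 2 and 11 cases that bit me above.

```vim
" Hypothetical stand-in for the NumToNumber() map: append [count] lines of
" 'x' characters with lengths 1, 8, 2, 9, 3, 10, ... (scrambled 1..13).
nnoremap <silent> <leader>X :<c-u>call append('.', map(range(v:count1), 'repeat("x", (v:val * 7) % 13 + 1)'))<CR>
```

Because every line is made of the same character, this only exercises the length ordering, not the alphabetical tie-break.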