Monday, January 28, 2013

Say it with VimL

I like Ruby. She’s a beautiful language. So expressive and elegant and yummy. VimL is Vim’s scripting language. While she may not win the sort of aesthetic awards Ruby deserves, she certainly is expressive and elegant in her own way.
Gregory Brown recently showed a handful of idioms for elegantly working with text and files in Ruby. I thought I’d show their VimL analogues here.

idioms for text processing

1. Multiline Matches

Vim has its own regular expression flavour. It’s a bit shocking to PCRE lovers at first: it uses an older, more arcane syntax that can reduce the most ardent Perler to pitiful puling. Sulk as they may, though, Vim’s regex flavour got here before PCRE and isn’t going anywhere fast. The good news is that Vim’s regex flavour is quite strong, a match for PCRE in almost all respects (and certainly in all that count when editing text). I digress; this is not the place to wage that war.
Gregory showed the PCRE idiom of using the /s flag<*> to enable DOTALL mode, which allows the . atom to match newlines in addition to its default match-any-character behaviour. Vim uses \_. to achieve this result:

echo matchlist("foo\nbar\nbaz\nquux", 'foo\n\(\_.*\)quux')[1]

<*> The astute reader will have noticed my sleight of hand there. Ruby’s regex flavour uses the /m flag to mean what the rest of the PCRE-speaking world knows as /s. I must admit, I was scratching my head when I first read Gregory’s article, thinking he’d given the wrong example to suit the /m flag. Thanks go to kotigid on #regex for pointing me at the Ruby regex page.

2. matchlist()

While Vim doesn’t have the global match variables ($1 et al.) that Gregory recommends avoiding, it does have his preferred method baked right in. The matchlist() function returns a list containing the whole match as the zeroth element and any submatches from index 1 onwards.

echo matchlist("---\na\nb\nc\n---\n", '^\(---\s*\n\_.\{-}\n\?\)\(---\s*\n\?\)')
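
Because matchlist() hands back an ordinary list, the captures can be unpacked straight into named variables with :let. Here is a minimal sketch of that idea. The function name SplitBlock() and the dictionary keys are my own inventions for illustration, and the empty() check relies on matchlist() returning an empty list when nothing matches.

```vim
" Hypothetical helper: split a '---' delimited block into its two captures.
function! SplitBlock(text)
  let m = matchlist(a:text, '^\(---\s*\n\_.\{-}\n\?\)\(---\s*\n\?\)')
  if empty(m)
    " No match: matchlist() gave us an empty list.
    return {}
  endif
  " Index 0 is the whole match, 1 and 2 are the submatches;
  " the trailing '; rest' soaks up the unused submatch slots.
  let [whole, body, terminator; rest] = m
  return {'body': body, 'terminator': terminator}
endfunction

echo SplitBlock("---\na\nb\nc\n---\n")
```

Naming the captures at the point of the match keeps the index arithmetic in one place, which is much the same benefit Gregory was after.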

3. Extended Regular Expression Syntax

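The /x flag in PCRE allows complex regular expressions to be spread out over multiple lines with embedded comments for easier readability and clarity. We can approximate that in VimL by writing the pattern across continuation lines and stripping the decoration with two substitute() calls. The inner call removes each # comment (a single word, hence the underscores) and the outer call removes every space that is not preceded by a backslash, so literal spaces must be escaped with a backslash: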

let PHONE_NUMBER_PATTERN = substitute(substitute('
      \ ^
      \ \%(
      \   \(\d\)           # prefix_digit
      \   [\ \-\.]\?       # optional_separator
      \ \)\?
      \ \%(
      \   (\?\(\d\{3}\))\? # area_code
      \ [\ \-\.]           # separator
      \ \)\?
      \ \(\d\{3}\)         # trunk
      \ [\ \-\.]           # separator
      \ \(\d\{4}\)         # line
      \ \%(:\ \?x\?        # optional_space_or_x
      \   \(\d\+\)         # extension
      \ \)\?
      \ $', '# \S\+', '', 'g'), '\\\@<! ', '', 'g')

echo string(PHONE_NUMBER_PATTERN)
let a_phone_number = '1-234-567-0987:1234'

echo matchlist(a_phone_number, PHONE_NUMBER_PATTERN)
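
If you find yourself doing this more than once, the two substitute() calls can be tucked away in a small helper. The following is only a sketch: ExtendedPattern() and date_pattern are hypothetical names of mine, and I have chosen to pass the pattern as a list of lines rather than one long continued string. It needs to live in a sourced script file, since line continuation doesn’t work on the command line.

```vim
" Hypothetical helper: build a pattern from commented, spaced-out pieces.
function! ExtendedPattern(lines)
  let pat = join(a:lines, ' ')
  " Strip '# word' comments (one word only, so use underscores).
  let pat = substitute(pat, '# \S\+', '', 'g')
  " Strip every space that is not escaped with a backslash.
  return substitute(pat, '\\\@<! ', '', 'g')
endfunction

let date_pattern = ExtendedPattern([
      \ '^',
      \ '\(\d\{4}\)  # year',
      \ '-',
      \ '\(\d\{2}\)  # month',
      \ '-',
      \ '\(\d\{2}\)  # day',
      \ '$',
      \ ])

echo date_pattern
" ^\(\d\{4}\)-\(\d\{2}\)-\(\d\{2}\)$

echo matchlist('2013-01-28', date_pattern)[1:3]
" ['2013', '01', '28']
```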

4. Using join()

VimL doesn’t have string interpolation like Ruby. Given a dictionary (a.k.a. associative array, or hash) such as:

let filedata = {'year' : 2013, 'month' : 1, 'day' : 28}

we have to use concatenation to include variable values in a string:

echo filedata["year"] . '/' . filedata["month"] . '/' . filedata["day"]

Of course, the join() trick Gregory showed also works in VimL:

echo join([filedata["year"], filedata["month"], filedata["day"] ], "/")

Another approach in both languages would be to use a printf string:

echo printf("%d/%d/%d", filedata["year"], filedata["month"], filedata["day"])

On the downside, this only works when you know how many fields you need to print, and you’re forced to insert the / characters manually. On the upside, you can easily format the values to show, for example, leading zeros in the day and month fields:

echo printf("%d/%02d/%02d", filedata["year"], filedata["month"], filedata["day"])
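
The two approaches combine nicely when the number of fields isn’t known up front: format each value with printf() inside map(), then let join() supply the separators. This is a sketch of mine rather than anything from Gregory’s article, and the fields variable is just an illustrative name.

```vim
let filedata = {'year' : 2013, 'month' : 1, 'day' : 28}
let fields = [filedata['year'], filedata['month'], filedata['day']]

" Pad every field to at least two digits, then glue them together.
echo join(map(fields, 'printf("%02d", v:val)'), '/')
" 2013/01/28
```

Note that map() changes the list in place, so wrap the list in copy() first if you still need the original numbers afterwards.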

idioms for working with files and folders

1. Filenames

Ruby has File.dirname, File.basename, and File.extname for munging filenames. Vim uses the expand() function to do this. Ruby’s __FILE__ (available in Vim as % and in VimL as expand('%')) is expressed in Vim as:

echo expand('%:p')

The :p there is called a modifier in the :help expand() docs (the full list lives under :help filename-modifiers). The :p modifier means full path.

To get the dirname only (called the head in vimspeak):

echo expand('%:p:h')

To get the basename (called tail):

echo expand('%:p:t')

This returns the basename.extension form of the filename. To get just the basename with no extension, use the :r (root) modifier to strip off one level of extension:

echo expand('%:p:t:r')

To get the extension (mnemonically equivalent in vimspeak):

echo expand('%:p:e')

2. Pathname Objects

The closest analogue in Vim to Pathname objects is the fnamemodify() function which provides the same filename manipulations as the expand() function above. You can find more functions like this in :help file-functions.
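
Unlike expand('%'), fnamemodify() works on any string you hand it, so it is easy to try the modifiers without opening a file. Here is a small sketch using a made-up path (the file doesn’t need to exist for these particular modifiers). The expected results are shown as comments on their own lines, because :echo won’t accept a trailing comment.

```vim
" Hypothetical path, used purely for illustration.
let path = '/home/user/project/archive.tar.gz'

echo fnamemodify(path, ':h')
" /home/user/project

echo fnamemodify(path, ':t')
" archive.tar.gz

echo fnamemodify(path, ':t:r')
" archive.tar

echo fnamemodify(path, ':t:r:r')
" archive

echo fnamemodify(path, ':e')
" gz

echo fnamemodify(path, ':e:e')
" tar.gz
```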

3. Reading and Writing Files

Vim has two builtin functions for reading and writing files: readfile(fname) (which returns a list of lines) and writefile(list, fname). Semantically simple interfaces.
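
As a quick illustration of how the pair fit together, here is a sketch that writes a list of lines, reads them back, transforms them and writes them out again. It uses a scratch file from tempname() so that nothing of yours gets clobbered; the variable names are mine.

```vim
" A scratch file name that Vim guarantees doesn't exist yet.
let path = tempname()

call writefile(['alpha', 'beta', 'gamma'], path)

" readfile() gives back one list item per line.
let lines = readfile(path)
call map(lines, 'toupper(v:val)')
call writefile(lines, path)

echo readfile(path)
" ['ALPHA', 'BETA', 'GAMMA']

call delete(path)
```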

4. Dir.mktmpdir

Vim doesn’t have a mktmpdir() function but, more importantly, VimL doesn’t have the beautiful code blocks of Ruby. As such, we have to use a more procedural idiom of manually creating a temporary directory (with a name from :help tempname() and a call to :help mkdir()), doing what we want in it and finally remembering to clean up after ourselves. Ruby wins here.
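
For the curious, the procedural idiom might look something like the sketch below, with a Funcref standing in for Ruby’s block. WithTempDir() and WriteAndReadNote() are hypothetical names of mine, and the recursive delete(dir, 'rf') call assumes a Vim recent enough to support the flags argument of delete(); on an older Vim you would have to remove the files yourself.

```vim
" Hypothetical helper: run Callback inside a fresh temporary directory,
" then remove the directory whether or not the callback succeeded.
function! WithTempDir(Callback)
  let dir = tempname()
  call mkdir(dir, 'p')
  try
    return a:Callback(dir)
  finally
    " Assumes delete() accepts the 'rf' (recursive, force) flags.
    call delete(dir, 'rf')
  endtry
endfunction

" Hypothetical callback: write a file in the directory and read it back.
function! WriteAndReadNote(dir)
  let path = a:dir . '/note.txt'
  call writefile(['hello'], path)
  return readfile(path)
endfunction

echo WithTempDir(function('WriteAndReadNote'))
" ['hello']
```

The try/finally pair does the job that the end of a Ruby block does for Dir.mktmpdir, which rather proves the point: it works, but Ruby still wins.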

Reflections

Gregory’s intent behind his article was to lead the misguided Rubyist away from using needlessly laborious low-level functions for achieving what can be more beautifully expressed using idiomatic Ruby and elegant thinking. My intent with this article is twofold: to show that VimL can easily do the sort of text and file manipulations Gregory showed, and that in most cases it does so just as elegantly (read: with semantically simple interfaces). Sure, VimL’s actual syntax in places might make your skin crawl, but once you overcome that and appreciate the deeper aesthetics, you’ll see that VimL doesn’t deserve the derision it receives as a gnarled and impotent language.
