Tuesday, May 8, 2012

Vim Can Haz Math

I got asked today how I might tackle the problem of calculating the arithmetic difference between values in a series. Take for example this mythical data file:

date: 2012 01 01
weight: 90kg
date: 2012 02 01
weight: 87.5kg
date: 2012 03 01
weight: 85.4kg
date: 2012 04 01
weight: 83.2kg
date: 2012 05 01
weight: 80kg

With the desired outcome of:

date: 2012 01 01
weight: 90kg
date: 2012 02 01
weight: 87.5kg (-2.5)
date: 2012 03 01
weight: 85.4kg (-2.1)
date: 2012 04 01
weight: 83.2kg (-2.2)
date: 2012 05 01
weight: 80kg (-3.2)

This is what I used:

:g/weight:/ copy . | silent! ?weight:??? copy . | - join | s/// | s//-/ | s/kg//g | s/.*/\='('.string(eval(submatch(0))).')'/
:g/weight:/ join

Slow Motion Replay
If that makes perfect sense to you and you're even wondering why I was so verbose in some places, then I have nothing left to teach you here. Otherwise, let's break this down to show what's going on:

Note:

  • Vim uses the | character as a command separator, like the ; in C.
  • Vim's Ex command line has a notion of the current line, which many commands use as their default source or target address.

1. :g// is Vim's global command. It first marks every line in the buffer matching the specified pattern, in this case /weight:/. Then, for each marked line in turn, it sets the current line to that line and runs the series of |-separated commands given to it. Lines added while it runs are not marked, so our duplicates are never processed as matches themselves.

Note: Some of the internal commands within this sequence can alter the current line, affecting the subsequent command's notion of where the current line is.

2. copy {address} copies the given range of lines to below {address}; here the destination is ., the current line. With no explicit range given, the implied source is the current line too. This effectively duplicates the weight: line found by the :g// command directly below itself. It also sets the current line to the destination, so the next command in our chain will implicitly operate on the duplicated line as its current line.

3. silent! is used here in case the user has :set nowrapscan, which would cause the ?weight:??? search to fail on the first matching line in the file, prematurely terminating our :g// command. If the user has :set wrapscan enabled, then the last weight: in the file will be found here instead. Either way, cleaning up the first line is a manual exercise for our hapless user in this tutorial.

4. ?weight:? searches back from the current line to the prior match of weight:. Because the copy command reset our current line to the duplicated line, a single backward search merely finds the line we copied from. That's not good enough: we want the one before that. When searches are chained in an address, each one starts from the line the previous one found, so a ?? immediately after finds the next prior weight: line. When a search (either forward with // or backward with ??) is used without an explicit search term, Vim reuses the prior search term, so ?? means search backwards for... weight: (because that's what we last searched for, in the ?weight:? that starts the address).
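
The chained search can be tried on its own before trusting it inside the :g// command. This is only a sketch: it assumes the original, unedited sample data is in the buffer, so that line 6 is weight: 85.4kg, and it uses :print just to show which line each address lands on.

```vim
" Assumption: the buffer holds the untouched sample data from the top of
" this article, so line 6 is 'weight: 85.4kg'.

" Start on line 6, then print the nearest earlier weight: line.
:6
:?weight:?print

" Start on line 6 again (the :print above moved the cursor), then chain an
" empty ?? onto the address. It reuses the pattern weight: and searches
" from where the first search landed, one weight: line further back.
:6
:?weight:???print
```

With that data, the first :print should show weight: 87.5kg and the second weight: 90kg. In the real command the search starts from the duplicated line rather than from an original one, which is why the first hit there is only the line we copied from.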

Question for the attentive: Why did I give the explicit pattern ?weight:? in that command and not rely on the prior implicit search pattern?

5. Remember that copy . copies the given address range to below the current line. The given address in this case was explicitly provided by the search in #4, which points at the actual prior weight: line. The current line is the duplicated weight line from step #2, so this command will copy the second prior weight line to below the duplicated weight line. Also remember that the copy command resets the current line to that of the destination address. It might be instructive to see what a sample of the file would look like if we halted our command here. That is, if we were to run the command:

:g/weight:/ copy . | silent! ?weight:??? copy .

We would get:


... 
date: 2012 02 01
weight: 87.5kg
weight: 87.5kg
weight: 90kg               <-- current line
date: 2012 03 01
...

That is assuming you have :set nowrapscan. However, if you have :set wrapscan, you will actually see:

... 
date: 2012 02 01
weight: 87.5kg
weight: 87.5kg
weight: 80kg               <-- current line
date: 2012 03 01

...

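Note: That rogue 80kg comes from the fact that the first match of the :g// command is at the top of the file, so the subsequent ?weight:??? wrapped backwards around the file, finding the bottom-most entry instead, which is 80kg in our sample data file. That stray copy is left sitting just above the second entry, where the next ?weight:??? finds it in place of the real prior weight.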

But we're not done, so let's continue on. Here's the whole :g// command again:

:g/weight:/ copy . | silent! ?weight:??? copy . | - join | s/// | s//-/ | s/kg//g | s/.*/\='('.string(eval(submatch(0))).')'/

Remember that our current line is as indicated in the samples above.

6. - join moves back a line (from the current line!) and joins the following line to the current line. The result of step #6 from the three weight: lines shown in step #5 is:

weight: 87.5kg
weight: 87.5kg weight: 90kg     <-- current line

It's really instructive for you to run this partial command up to this point yourself to see the goblins I'm ignoring by deliberately choosing the second weight: match within the file. It's only a white lie that will all be cleared up later anyway, so I don't feel too bad about it.

The remaining 4 commands in the chain are substitutions which strip unwanted non-numeric pieces from the line, shape it into an arithmetic subtraction expression and evaluate it to produce the arithmetic difference between the two numbers. The following lines show the result of each command in turn applied to the result of step #6.

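The s/// has an empty pattern, so it reuses the last search pattern (weight:), and an empty replacement, so it deletes the first match. It results in: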

 87.5kg weight: 90kg

The s//-/ reuses the same pattern again and replaces the remaining weight: with a minus sign. It results in:

 87.5kg - 90kg

the s/kg//g results in:

 87.5 - 90

and, finally, the s/.*/\='('.string(eval(submatch(0))).')'/ results in:

(-2.5)
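
The final substitution is the least obvious of the four, so here is its expression part run in isolation. A minimal sketch, assuming a Vim built with Float support: \= makes the replacement a Vim expression, submatch(0) is the whole matched line, and the literal string below stands in for it.

```vim
" The literal ' 87.5 - 90' is a stand-in for submatch(0), i.e. the text of
" the line after the first three substitutions (leading space included).
" eval() parses that text as a Vim expression and returns a Float;
" string() turns the Float back into text so it can be concatenated.
echo '(' . string(eval(' 87.5 - 90')) . ')'
```

This should echo (-2.5), the same text the substitution leaves on the line.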

At this stage, the file looks like this:

date: 2012 01 01
weight: 90kg
weight: 90kg
date: 2012 02 01
weight: 87.5kg
(-2.5)
date: 2012 03 01
weight: 85.4kg
(-2.1)
date: 2012 04 01
weight: 83.2kg
(-2.2)
date: 2012 05 01
weight: 80kg
(-3.2)

We want the differences at the end of the preceding weight: lines, so we'll use another :g// command for that:

:g/weight:/ join

Which leaves us almost done. The only bugbear remaining is the duplicated first weight in the file:

weight: 90kg weight: 90kg

Clean that up manually.

Reflection

I could have approached this in a number of different ways, but to my mind, this seemed to be the quickest and easiest approach. Other approaches might include using a macro instead of a global command, or writing a full-blown VimL script.
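
For comparison, here is roughly what the script approach might look like. This is a sketch I did not use for the original request: the function name AnnotateWeightDiffs is made up for illustration, and it assumes every weight line looks exactly like weight: <number>kg with nothing after it.

```vim
" Hypothetical helper (not part of the original solution): append the
" difference from the previous weight to every weight: line but the first.
function! AnnotateWeightDiffs() abort
  let l:prev = ''
  for l:lnum in range(1, line('$'))
    let l:text = getline(l:lnum)
    " Capture the number that sits between 'weight:' and 'kg'.
    let l:num = matchstr(l:text, '^weight:\s*\zs[0-9.]\+\zekg')
    if l:num ==# ''
      " Not a weight line (a date: line, for example), so skip it.
      continue
    endif
    if l:prev !=# ''
      " Subtract as Floats; %g trims the result to a short form.
      let l:diff = str2float(l:num) - str2float(l:prev)
      call setline(l:lnum, l:text . ' (' . printf('%g', l:diff) . ')')
    endif
    let l:prev = l:num
  endfor
endfunction
```

After sourcing it, :call AnnotateWeightDiffs() annotates the current buffer in one pass and leaves the first weight line alone, so there is nothing to clean up by hand. It is not idempotent, though: running it twice appends a second difference to each line. It is also a good deal more typing than the one-liner.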

I tend to build these things up piecemeal, testing as I go. I follow the same methodology when constructing SQL commands. Run a partial to prove to yourself that it's good so far. Press the u key to undo and add your next chunk. Repeat until you're done.



Writing up this article to explain the solution took twenty times longer than scratching the solution out for the requester in the first place.

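Oh, the answer to my earlier question? Did you figure it out? The explicit pattern given in s/kg//g (and again in the final s/.*/ substitution) replaces the last-used search pattern, so by the time :g// moves on to the next match, a bare ?? would no longer mean weight:. That forces me to be explicit again at the start of the chained commands within the :g// command.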