Saturday, March 17, 2012

Paste Me A Column

Have you ever needed to paste one chunk of text linewise beside another column? The *nix tool, paste, does this nicely and is the recommended way to solve this problem (it is demonstrated in solution #2). In this article, I compare various approaches to solving this problem in Vim.

Two Data Sets

Data Set 1: (possibly in a file called data_set_1.txt)

  if
  don't
  say

Data Set 2: (possibly in a file called data_set_2.txt)

  you
  know
  so

(goal): (possibly clobbering data_set_1.txt)

  if    you
  don't know
  say   so

(actual tab separators are not necessary — a single space is sufficient)

NOTE: I am assuming that all data-sets are equal in length (number of lines)

Solutions

1. (insert mode) — hand editing the column of data

Certainly the lowest-tech solution considered here, and for small data sets, probably the fastest solution. For larger or complicated data sets, though, this solution is just not workable. It's mentioned here only for completeness.

The following solutions all assume that the current buffer holds data_set_1.txt and the alternate buffer holds data_set_2.txt.

2. (external command) — :!paste

  :%!paste - #

NOTE: The - tells paste to read data set 1 from standard input, which Vim supplies through the :% part of that command line (as a range, % means every line in the buffer). The # is expanded by Vim to the name of the alternate file.

If you don't want to load data_set_2.txt into the alternate buffer, you could instead use:

  :%!paste - data_set_2.txt
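
By default paste joins the columns with a tab. Since a single space is enough here, the standard -d option can be used to pick the delimiter. A minimal sketch, assuming a POSIX-style paste on your system:

```vim
" Same filter as above, but -d ' ' makes paste join the columns with one space.
:%!paste -d ' ' - #
```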

You'd want a really good reason not to be using this solution. It's the shortest, fastest, most reliable solution presented here. If your operating system doesn't offer you a paste command, then you can download yourself a real one for free.

This solution requires at least the second data set to be saved in a file and not just represented as a clump of lines further down in the same file. In that situation, you can either cut them out to a separate file and use the :!paste command, or look at some of the following approaches.

3. (normal mode commands only) — using blockwise cut & paste

  ctrl-6                  " switch to the alternate buffer (data set 2)
  1G 0                    " move to line 1, column 1
  ctrl-v G $ y            " blockwise yank data set 2
  ctrl-6                  " switch back to buffer holding data set 1
  1G                      " move to first line
  A<space><esc>           " append as many spaces as necessary past
                          " longest entry in left column
  p                       " (blockwise) paste

Next to hand-typing the data in, this is the most labour-intensive approach, especially if you have many paste operations to do. Some of the later examples have a costlier setup (having to type more initially), but will execute faster over anything more than a very few paste jobs. If you only have to do one, this solution might not be too bad. Another problem with this solution is having to ensure that you paste past the longest entry in the left column by padding with spaces beforehand.  This is clumsy, frustrating and prone to cause blunders if not boredom.

Data Set 2 in a Register or Variable


The following examples assume that data set 2 has been read into a register or variable, as in:

  let data = readfile('data_set_2.txt')

OR, if the data was in a chunk in the original file, like:

  you
  know
  so

you could:

  "{visually select the lines using shift-v}
  y
  :let data = split(@", "\n")

4. (ex) - substitute

  :%s/$/\="\t".data[line('.')-1]

This solution requires that the data sets are the only things in the buffers and that they start at the first line.

A slight modification to the range allows an alternate start of the data set. In all the cases below, the range starts on line 6:

Using either a visual selection over the range:

  :'<,'>s/$/\="\t".data[line('.')-6]

Or an explicit range:

  :6,$s/$/\="\t".data[line('.')-6]
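
If hard-coding the 6 feels fragile, one possible variation is to subtract the line number of the '< mark (the first line of the last visual selection) instead. The sketch below is not one of the numbered solutions. It builds a throwaway buffer with a made-up header line so that it can be sourced as-is; interactively you would only type the final :s line.

```vim
" Scratch buffer: one header line (an assumption for this demo), then data set 1.
new
call setline(1, ['# header', 'if', "don't", 'say'])
let data = ['you', 'know', 'so']
" Select lines 2 to 4 linewise, then leave visual mode so '< and '> are set.
execute "normal! 2GVG\<Esc>"
" line("'<") is the first selected line, so the index starts at 0
" wherever the selection begins.
'<,'>s/$/\="\t" . data[line('.') - line("'<")]
```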

5. (ex) — global substitute

  :let x = 0 | .,$g/./s/$/\="\t".data[x]/ | let x += 1

The beauty of the :global command over the plain :substitute command is that it handles the | (bar, the command separator in Vim) differently. The :global command sees the |-separated pieces as part of its argument and executes the whole series of commands on each line matched by the global pattern. The :substitute command, by contrast, does not treat | as part of its argument and is therefore terminated by it, so the next command runs only once, after the :substitute has finished across all of the lines in its range.

In this example, a variable (x) is used to index the data array instead of the line('.') kludge used in the :%s/// solution #4.
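
The difference in how the two commands treat the bar is easy to see in a small experiment. This is only a sketch on a throwaway buffer; the counter names n and m and the sample lines are made up for the demonstration.

```vim
" Scratch buffer with three lines (demo data only).
new
call setline(1, ['a', 'b', 'c'])

" :substitute ends at the bar, so the :let runs once, after all three
" lines have been changed.
let n = 0
%s/$/!/ | let n += 1

" :global takes everything after the pattern as its command, bar included,
" so the :let runs once per matching line.
let m = 0
g/^/s/$/?/ | let m += 1

" Show both counters.
echo n m
```

Here n ends up as 1 and m as 3, which is why the counter trick needs :global rather than a bare :substitute.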

Data Set 2 below Data Set 1 in same file


The following examples assume that the two data-sets are in the same file, separated by a single blank line.

6. (ex) — global move

  "{position the cursor on the first line of data set 2}
  :let x = 1 | .,$g/^/exe "move " . x | let x += 2
  :g/./normal J

This example assumes that data set 1 starts on line 1. Change the let x = 1 to the line number starting data set 1 if that's not the case. Also, I am assuming that the two data sets are the only thing in the file — you can adjust the ranges or global patterns to suit otherwise.

7. (macro) — normal commands on meth

  let @m="}jdd\<c-o>pkJj"

and then:

  "{navigate to the first line of the top-most data-set}
  3@m                     " execute the macro 3 times

Of course, you'd replace 3 there with the actual number of times you needed. If you don't know the exact number, or you're too lazy to type it, you can fudge it with an obviously oversized alternative, as in:

  999@m                   " automatically stops on error

NOTE: My choice of m as the macro register there was totally arbitrary. You have 26 named registers (a-z) to do with as you choose.

Also, although I show the macro assignment as a let statement here, it is typically not crafted as such. The more usual approach when creating macros is to bang away at the set of normal commands needed like a monkey chained to a typewriter. Eventually you will mash out your Shakespearean macro masterpiece. The q command in Vim is used to record macros for later playback with the @ command. A typical way to record a macro into register m might look like:


  qmq
  qm}jdd^OpkJjq

Where the first line clears register m and the second begins, sets and ends the recording (with q, stuff and q respectively). That hideous ^O in there represents the ctrl-o needed to jump backwards in the jump list, and is an actual control character. For this reason, explicitly let-ting macro expressions as shown above is clearer and more portable.

Thanks to osse for a cleaner macro example than the one I'd originally chosen.

So much for insert, external, normal and ex vim approaches. How about VimL?

8. (vanilla viml) - procedural approach

Vim doesn't have an explicit zip function that merges successive source elements into a result array, as in [a,b,c].zip([1,2,3]) -> [[a,1],[b,2],[c,3]]. If it did, we could use it to solve our problem here. So, let's write one:

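  " Pair up the items of lists a:a and a:b, joining each pair with a:sep.
  function! Zip(a, b, sep)
    let i = 0
    let r = []
    " Stop at the end of the shorter list.
    let n = min([len(a:a), len(a:b)])
    while i < n
      call add(r, join([a:a[i], a:b[i]], a:sep))
      let i += 1
    endwhile
    return r
  endfunction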

Assuming that the current buffer holds only data set 1 and the alternate buffer holds only data set 2, then:

  :call setline(1, Zip(getline(1,'$'), readfile(expand('#')), "\t"))
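
For comparison, the same pairing can be sketched without the explicit loop by mapping over a list of indexes. ZipMap is a hypothetical name chosen for this illustration (it is not a Vim built-in), and it leans on map(), which solution 9 looks at more closely.

```vim
" Sketch only: ZipMap is a made-up name, not part of Vim.
" Joins a:a[i] and a:b[i] with a:sep, stopping at the shorter list.
function! ZipMap(a, b, sep)
  let n = min([len(a:a), len(a:b)])
  " range(n) yields the indexes 0 .. n-1; each index is replaced by the
  " joined pair of items found at that position.
  return map(range(n), 'a:a[v:val] . a:sep . a:b[v:val]')
endfunction
```

It would be called exactly like Zip, for example :call setline(1, ZipMap(getline(1,'$'), readfile(expand('#')), "\t")).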

9. (newVimL) — because you just knew this had to be here :-)

It turns out, when you're dealing with functionalish languages, they do provide more wow-factor than the first glance over the docs would suggest. Awesomeness is lurking beneath innocuous looking, humble little function names that you'd just glaze right over if they weren't pointed out to you. We're going to look at the functional power-house, map().

  "{navigate to the blank line separating the two data sets}
  :call setline(1, map(getline(1,line('.')-1), 'v:val."\t".getline(line(".")+1,"$")[v:key]'))

Again, I assume here that our file contains only the two data sets and that the first one starts on line 1 and the second set starts after a blank line after the first set.

This code will not delete the second data set; effectively, a copy of it is made beside the first data set. In some cases it might be desirable for this unaltered copy of the second set to remain in the file, but if not, just delete it after the column paste is done.

This map has an ugly wart at the moment: it's inefficient. The replacement expression re-calculates getline(line('.')+1,"$") for every line in data set 1. Yuck. We can optimise that away using the same data array trick we did earlier (with the current buffer holding only data set 1), yielding the cleaner and more efficient:

  :call setline(1, map(getline(1,'$'), 'v:val."\t".data[v:key]'))

NOTE: This version is not affected by the location of the cursor as the previous version was.

If you delete data set 2 into that data array (rather than yanking it), you avoid the duplicate-copy problem from the outset.
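
One way this might look when both data sets live in the same buffer is sketched below. It is scripted against a throwaway buffer so that it can be sourced as-is, and the choice of register d is arbitrary. Interactively you would put the cursor on the blank separator line yourself and type only the last three commands.

```vim
" Scratch buffer holding both data sets, separated by a blank line (demo data).
new
call setline(1, ['if', "don't", 'say', '', 'you', 'know', 'so'])
" Put the cursor on the blank separator line.
call cursor(4, 1)
" Delete the separator and data set 2 into register d.
.,$delete d
" split() drops the empty first item left behind by the blank line.
let data = split(@d, "\n")
" The buffer now holds only data set 1, so getline(1, '$') is safe.
call setline(1, map(getline(1, '$'), 'v:val . "\t" . data[v:key]'))
```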

Price of Tea

If we knew from the outset that shelling out to the external paste command was the best way to do this, why did we bother with all of the other approaches? For one thing, we have the paste command for this example — your next hairball might not have a *nix command lurking beneath the shell to help you. Secondly, it's the journey, not the destination. Having walked this way and picked at the various shrubs along the path we have collected a nice potpourri of Vim buds to burn at our leisure on a quiet Sunday afternoon, or torch in anger when faced with the next gnarly editing requirement our tempestuous day-jobs throw at us.

Prior Preparation Prevents Piss-Poor Performance.

Vim on.