Friday, November 12, 2010

Fun with process substitution

I was recently introduced to process substitution at work. It's a great way to avoid temporary files in some simple cases.

For example, say I wanted to know how the column headers changed between two files (same data, extracted by different code). The files are tab-delimited, and I frequently use a script, columnHeaders.sh, that prints the index and name of each header in an input file.

In the past, to see what was different between two such files, I'd save the columnHeaders.sh output for each one to a file and then compare the two with diff, kompare, or comm.

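You can eliminate the temporary files with a bash feature called process substitution. Bash replaces each <(command) with a filename (typically something like /dev/fd/63) that yields the command's output when read, so diff just sees two ordinary file arguments.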
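To see what bash actually hands to the command, here's a small sketch. The printed path is system-dependent, and unique_lines is a hypothetical helper showing the same trick with comm, which expects sorted input.

```bash
#!/usr/bin/env bash
# Each <(command) is replaced by a filename that yields the command's output
# when read. The exact name varies by system (often /dev/fd/NN).
printf 'bash substituted: %s\n' <(true)

# comm needs sorted input, so sort both files on the fly instead of writing
# sorted copies to disk. Column 1: lines only in $1; column 2: only in $2.
unique_lines() {
  comm -3 <(sort -- "$1") <(sort -- "$2")
}
```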

$ diff <(columnHeaders.sh file1) <(columnHeaders.sh file2)
1,2c1,2
< 0 blorg
< 1 blarg
---
> 0 blarg
> 1 blorg

In this case, we can see that two columns were swapped between file1 and file2.

Take a look at the Advanced Bash Scripting Guide for more examples.

The columnHeaders.sh script basically does this (but allows user-specified delimiters):
$ head -n 1 <input file> | perl -F'\t' -lane 'print $n++,"\t$_" for @F'
