Friday, January 25, 2013

commandlinefu.com

I just rediscovered this excellent site. It's well worth the occasional browse.

There are some real gems in there! For example:

> python -m SimpleHTTPServer
Serve current directory tree at http://$HOSTNAME:8000/ (this is the Python 2 module; on Python 3 the equivalent is python3 -m http.server)
Useful for tidying up your workspace whilst keeping jobs running:
> disown -a && exit
Close shell keeping all subprocesses running
I do love a bit of process substitution:
> diff <(sort file1) <(sort file2)
diff two unsorted files without creating temporary files
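
Here's a quick self-contained sketch of the same idea, if you want to try it without touching real files. The file names and contents are made up for illustration, and because process substitution is a bash (and zsh/ksh) feature rather than plain sh, the sketch calls bash explicitly:

```shell
# Scratch directory with two hypothetical unsorted files.
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/file1"
printf 'fig\npear\nkiwi\n'  > "$workdir/file2"

# <(...) hands diff a readable path fed by each sort,
# so no temporary sorted copies are written to disk.
# diff exits with status 1 when the inputs differ, hence the '|| true'.
result=$(bash -c 'diff <(sort "$1") <(sort "$2")' _ "$workdir/file1" "$workdir/file2" || true)

printf '%s\n' "$result"
rm -r "$workdir"
```

The diff should flag apple as present only in the first file and kiwi only in the second, even though neither file was sorted on disk.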
Handy:
> rm !(*.foo|*.bar|*.baz)
Delete all files in a folder that don't match a certain file extension (in bash this needs extended globbing switched on first: shopt -s extglob)
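
Since !(...) will happily delete things, here's a hedged sketch that previews the match with echo before running rm. The file names are invented, and I'm assuming bash is installed; starting it with -O extglob turns the option on before the pattern is parsed:

```shell
# Scratch directory with a few hypothetical files.
workdir=$(mktemp -d)
touch "$workdir/keep.foo" "$workdir/keep.bar" "$workdir/junk.tmp" "$workdir/notes.txt"

# Dry run: echo expands the same pattern that rm would receive.
doomed=$(cd "$workdir" && bash -O extglob -c 'echo !(*.foo|*.bar|*.baz)')
echo "would delete: $doomed"

# The real thing; '--' protects against file names that start with a dash.
(cd "$workdir" && bash -O extglob -c 'rm -- !(*.foo|*.bar|*.baz)')

# List what survived, then clean up the scratch directory.
left=$(ls "$workdir" | tr '\n' ' ')
echo "left behind: $left"
rm -r "$workdir"
```

Only keep.foo and keep.bar should be left; swapping rm for echo like this is a cheap habit before any destructive glob.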
And an "I should have thought of this; it's so obvious now!" trick:
> some_very_long_and_complex_command # label

Easy and fast access to often executed commands that are very long
and complex. When using reverse-i-search you have to type some 
part of the command that you want to retrieve. However, if the
command is very complex it might be difficult to recall the parts
that will uniquely identify this command. Using the above trick
it's possible to label your commands and access them easily by 
pressing ^R and typing the label (should be short and descriptive).

Thursday, January 24, 2013

R: formatting numbers for output

I often produce tables of numbers with a lot of significant digits after the decimal point. It's confusing to look at 20 sig [fd]igs, especially in a large table of results, so I tend to format my output to make it more concise and easier to read.

The function 'format' is pretty good for this. Note that the 'format' call returns a character vector (which is fine if you're only going to write the number to file or the console).

Here's a simple example just using a randomly generated number:

> num <- rnorm(1, mean=10)
> num
[1] 10.24339
We call format with digits=4 (show 4 significant digits) and nsmall=4 (display at least 4 digits after the decimal point, for real/complex numbers in non-scientific format):
> format(num, digits=4, nsmall=4)
[1] "10.2434"
You can see that the format command rounds the number. Ties follow the IEC 60559 convention of 'go to the even digit', so 0.5 rounds to 0 and 1.5 rounds to 2.

Of course, if you're used to sprintf style commands then you can also use the sprintf function for this:

> sprintf("%.4f", num)
[1] "10.2434"
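
R's sprintf is a wrapper around the C function of the same name, and the shell's printf accepts the same format strings, so the formatting and the tie rounding are easy to poke at outside R. A small sketch, assuming a C library that rounds exact ties to even (glibc does; I haven't checked every platform) and forcing the C locale so that '.' is the decimal separator:

```shell
# Force '.' as the decimal separator regardless of the user's locale.
LC_ALL=C
export LC_ALL

# Same format string as the R sprintf call, applied to the example value.
four=$(printf '%.4f' 10.24339)
echo "$four"

# 0.5, 1.5 and 2.5 are exactly representable in binary, so these are true ties.
ties=$(printf '%.0f %.0f %.0f' 0.5 1.5 2.5)
echo "$ties"
```

The first line matches the R output above (10.2434); with round-half-to-even the ties come out as 0 2 2 rather than 1 2 3.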

Wednesday, January 2, 2013

Simple parallel processing with xargs

We've all been there - looking in a nicely tidied up directory, full of archived data - hundreds of lovely data files all gzipped or (better) bzip2'ed; but now you want to use them and you have to uncompress them all... "if only I could use all n CPUs on my local machine to do this!": enter 'xargs'!

It's as simple as:
> ls *.bz2 | xargs -n 1 -P 6 bunzip2
This will set off bunzip2 on all bz2 files in the current directory (as long as none of the filenames contain whitespace, which xargs would split on).

The '-n 1' flag tells xargs to pass only one argument (file) per command line; the '-P 6' flag tells xargs to run up to 6 processes concurrently.
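
Filenames with spaces will trip up the ls pipeline, since xargs splits its input on whitespace. Here's a sketch of a more defensive variant using NUL-delimited names and one worker per CPU. To keep it self-contained it builds its own scratch files and uses gzip/gunzip rather than bzip2 (purely my assumption that gzip is more likely to be installed; swap in bunzip2 and '*.bz2' for the case above). The -print0, -0 and -P options are GNU/BSD extensions rather than POSIX:

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create three small files, one with a space in its name, and compress them.
printf 'alpha\n' > a.txt
printf 'beta\n'  > b.txt
printf 'gamma\n' > 'c d.txt'
gzip a.txt b.txt 'c d.txt'

# One worker per online CPU, falling back to 2 if getconf cannot say.
njobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

# NUL-delimited names survive spaces; -n 1 gives each gunzip a single file.
find . -maxdepth 1 -name '*.gz' -print0 | xargs -0 -n 1 -P "$njobs" gunzip

# Check that nothing is left compressed and the awkward file came back intact.
remaining=$(find . -maxdepth 1 -name '*.gz' | wc -l | tr -d ' ')
restored=$(cat 'c d.txt')
echo "still compressed: $remaining, contents of 'c d.txt': $restored"
```

With only three tiny files the parallelism buys nothing, of course; the gain shows up on hundreds of large archives, where each decompression keeps a core busy.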