Wednesday, July 29, 2009

Perl subroutine references

I just found this useful for generating a dispatch/lookup table. You can create a hash table with the values being references to subroutines.

Here's a simple example:

> perl -le '$a = sub { 5 * $_[0]};print &{$a}(23);'
> 115

A better use for this is a look-up table. Here's another simple example:

%table = (
"+" => sub { $_[0] + $_[1] },
"-" => sub { $_[0] - $_[1] },
"*" => sub { $_[0] * $_[1] },
"/" => sub { $_[0] / $_[1] },
);

print &{$table{"+"}}(12,24), "\n";
print &{$table{"-"}}(24,12), "\n";

This would output 36 and 12 respectively. This is very handy for more complex processing and parsing (which is what I'm using it for).
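As a rough sketch of how the table might be driven from input rather than hard-coded keys, the snippet below wraps it in a shell function and takes the operator from the command line. The name calc and the argument order are made up for illustration. Looking the key up before calling it means an unknown operator dies with a message instead of trying to call undef.

```shell
# calc is a hypothetical helper name; it wraps the Perl dispatch table.
calc() {
  perl -le '
    my %table = (
      "+" => sub { $_[0] + $_[1] },
      "-" => sub { $_[0] - $_[1] },
      "*" => sub { $_[0] * $_[1] },
      "/" => sub { $_[0] / $_[1] },
    );
    my ($x, $op, $y) = @ARGV;
    # Look the operator up first so an unknown one fails cleanly.
    my $f = $table{$op} or die "unknown operator: $op\n";
    print $f->($x, $y);
  ' -- "$@"
}

calc 12 + 24     # prints 36
calc 24 - 12     # prints 12
calc 6 '*' 7     # prints 42 (quote the * so the shell does not expand it)
```

The $f->(...) form is just another way of writing &{$f}(...).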

Tuesday, July 21, 2009

Command line R

Sometimes I just want to print out a histogram from a file, or create a simple summary of some numeric data. It's good to be able to just bash off an R script from the command line to do this for you.

There are a couple of easy ways to invoke R to run in 'batch' mode (i.e. non-interactive):
1. R CMD BATCH <scriptname>
2. R --vanilla --slave < <scriptname>

With 1, the output is saved in a file named <scriptname>.Rout; with 2, the output goes to STDOUT, which, for me, is much more useful.

Also, you can use the standard shell 'tricks' to create quick scripts without having to save a script file:

e.g.

1. Create a numeric summary of the input data (in this case for a file with the format - "name,value"):

R --vanilla --slave <<< "d=read.table('data.scores', sep=',');summary(d);q()"
V1
Min. :1.333
1st Qu.:4.037
Median :4.651
Mean :4.634
3rd Qu.:5.282
Max. :8.000


2. Create a histogram for the data and save to a PNG (don't forget to escape any special shell characters).

R --vanilla --slave <<< "d=read.table('data.scores', sep=',');png('data.png');hist(d\$V1);q()"

Thursday, July 16, 2009

Wednesday, July 15, 2009

converting postscript to an image

This is something I've meant to learn how to do for ages. I needed to do this to produce summaries for a shit load of PDFs (containing related molecules) and wanted to have an image of a structure to represent each PDF. Anyway, I print structures as postscript files, so I needed a way to create a PNG of the first structure for each PDF.

I saw a few solutions online which used ghostview to first output a jpeg and then 'mogrify' to cut down the hunormous result to something useful. It turns out you can just do this:

convert molecule.ps -trim molecule.png

This creates a png from the ps file. Blimey! Really simple!

The -trim option is a great addition as well, since a direct conversion results in a lot of extra whitespace.

running command line perl within a bash script

Maybe it's just me, but this is something I've struggled with on a couple of occasions.

Since I can't give any of the examples I'm actually working on, I'll have to use something which is less obviously useful.

The simple version is to use bash to loop through some files and use Perl to print out the filename (as stored by the bash script):


for file in *.label;
do perl -le "\$a=1;print \"${file} \$a\"";
done


The important things to make this work are:
1. The double quotes used for the Perl script. These let bash interpolate its own variables (here, ${file}).
2. Escape sequences for the double quotes used in the Perl script - the double quotes are necessary to make Perl interpret the Perl variables.
3. Escape sequences for the Perl variables - this distinguishes the Perl variables from the bash ones.
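For comparison, here is a minimal sketch of the opposite approach: single-quote the Perl so bash leaves it alone, and pass the bash value in as an argument, which Perl sees in @ARGV. The function name print_labels is made up for illustration. No backslashes are needed, at the cost of not being able to drop bash variables straight into the Perl code.

```shell
# print_labels is a hypothetical name; $1 is the directory to scan.
print_labels() {
  for file in "$1"/*.label; do
    # Skip the literal pattern if no .label files exist.
    [ -e "$file" ] || continue
    # Single quotes: bash does not touch $a or $ARGV[0], so no escaping.
    perl -le '$a = 1; print "$ARGV[0] $a"' "$file"
  done
}

# Scan the current directory.
print_labels .
```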

The final script I used was a lot more complicated than this and was used to generate a series of files. I guess I can include it here since, with no context, it means very little.


for si in `seq 38 47`;
do perl -F',' -lane "if(@F==4)
{
print \"\$F[0]\\t\$h{\$F[0]}:\$F[1],\$F[2]\"
if \$F[3] == $si and not \$i{\$F[0]}++;
}
else{
@F=split/\t/;
\$h{\$F[1]} = \$F[0];
}" file1 file2 > output_${si}.tsv;
done


It would have been nicer just to write a separate Perl script and call that, really. But there you go.
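A halfway house, short of a separate script file, might be to single-quote the Perl and hand the loop value over through the environment. This is only a sketch. The function name filter_by_score and the SI variable are made up for illustration, and the inputs are assumed to be a tab-separated lookup file followed by a comma-separated four-column file, as the loop above implies.

```shell
# filter_by_score is a hypothetical name.
# $1 = value to match in column 4, $2 = tab-separated lookup file,
# $3 = comma-separated data file.
filter_by_score() {
  SI="$1" perl -F',' -lane '
    if (@F == 4) {
      # Data line: print once per key when column 4 matches.
      print "$F[0]\t$h{$F[0]}:$F[1],$F[2]"
        if $F[3] == $ENV{SI} and not $i{$F[0]}++;
    }
    else {
      # Lookup line: remember the label for this key.
      @F = split /\t/;
      $h{$F[1]} = $F[0];
    }
  ' "$2" "$3"
}

# Usage, mirroring the loop above:
# for si in $(seq 38 47); do
#   filter_by_score "$si" file1 file2 > "output_${si}.tsv"
# done
```

With the Perl in single quotes, none of the Perl variables need a backslash, and the only value crossing over from bash is the one read from %ENV.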