Monday, February 18, 2013

OpenBabel: Convert SDF to SMILES and keep the data!

This seems like it should be the default option when converting from an SDF to a SMILES file - keep the damned data! Well, in OpenBabel (at least the version I'm running) this is not the default. If you want to keep the SD data you have to list each tag in the '--append' argument, e.g.
> babel test1.sdf --append "cLogD7.4 cLogP model_score1 model_score2 some_other_property" test1.smi
To end up with a tab-delimited file (my favourite), you have to prefix the argument to '--append' with the desired delimiter character. I used "Ctrl-v <tab>" to type a literal tab into the string. It seems odd that a tab isn't the default delimiter, since a tab is still used to separate the SMILES string from the molecule name in the standard conversion.
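
If typing a literal tab with Ctrl-v feels fiddly, the argument can also be built in a variable first. The following is only a sketch: the file names and tags are the ones from the example above, the variable names `tab` and `append_arg` are made up for illustration, and the babel call is skipped when babel or test1.sdf is not available.

```shell
# Build the --append argument with a leading tab as the delimiter.
# (In bash, $'\t' would also work; printf keeps this portable.)
tab=$(printf '\t')
append_arg="${tab}cLogD7.4 cLogP model_score1 model_score2 some_other_property"

# Only attempt the conversion when babel and the input file are present.
if command -v babel >/dev/null 2>&1 && [ -f test1.sdf ]; then
  babel test1.sdf --append "$append_arg" test1.smi
fi
```

Quoting "$append_arg" matters here: unquoted, the shell would strip the leading tab and split the tag list into separate arguments.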

Friday, February 15, 2013

simple parallel processing with make

I've used xargs a fair bit for some simple, local, parallel processing. I was just recently reminded of another clever and simple solution that a friend came up with.

I'll let the code do the talking; here's the basic bash script (stored as an executable):

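#!/bin/bash
# Run the commands read from stdin in parallel, at most <num processes> at a time.
if [[ $# -ne 1 ]]; then
   echo "Usage: cat commands.txt | $(basename "$0") <num processes>"
   exit 1
fi

# Turn each input line into a numbered make target, then add an "all" target
# that depends on every one of them; make -j runs the targets in parallel.
(while read line; do
  echo -e "$((++i)):\n\t$line";
done;
echo "all:" $(seq 1 $i)) | make -B -j $1 -f <(cat -) all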
This uses a couple of clever tricks. I especially like the use of process substitution in the make command (substituting the 'cat -' for the input makefile).

This approach allows the commands in commands.txt to redirect their own output as they need to (using '>', '2>', '&>', etc.)
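
To see what make actually receives, here is a rough sketch that builds the same kind of makefile from three placeholder commands and prints it instead of running it. The `echo` commands and the `makefile` variable are stand-ins invented for this illustration; a real commands.txt would hold one job per line.

```shell
# Generate the makefile text for three placeholder commands.
makefile=$(
  i=0
  while read -r line; do
    i=$((i + 1))
    # One numbered target per command; the recipe line must start with a tab.
    printf '%s:\n\t%s\n' "$i" "$line"
  done <<'EOF'
echo one
echo two
echo three
EOF
  # The "all" target depends on every numbered target.
  echo "all:" $(seq 1 "$i")
)

printf '%s\n' "$makefile"
```

This prints targets 1, 2 and 3, each followed by its tab-indented command, and then the line "all: 1 2 3". One caveat that follows from using make: a '$' in a command is expanded by make before the shell sees it, so it has to be written as '$$' in commands.txt.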

Monday, February 11, 2013

Bash: while [[ -e /proc/$process_id ]]

Sometimes you need to keep track of the amount of memory a process is taking up (or CPU, or something else). The /proc filesystem contains a subdirectory for each running process; the directory is named with the pid of the process. You can use this fact along with a basic file test and a while loop to track a process for as long as it lives.
> some_interesting_job &

> process_id=$(ps -o "%p %c" | grep "some_interesting_job" | awk '{print $1}');\
 while [[ -e /proc/$process_id ]];\
 do ps -o "%z" -p $process_id;\
 sleep 5;\
 done
This will report on the virtual memory size (in KiB - see 'ps' manpage for more details) that the process is taking up. The while loop will terminate when the process completes (or is killed).
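
When the job is started from the same shell, the pid is also available in `$!`, which avoids the ps/grep lookup. The following is a minimal sketch of that variation rather than the loop above: `sleep 2` is a stand-in for the real job, the `samples` counter is added purely for illustration, and it assumes a Linux-style /proc.

```shell
# Stand-in for the real job, started in the background.
sleep 2 &
process_id=$!   # pid of the most recent background job

samples=0
# Poll for as long as the process has an entry under /proc.
while [ -e "/proc/$process_id" ]; do
  # vsz= prints the virtual memory size without a header line.
  ps -o vsz= -p "$process_id"
  samples=$((samples + 1))
  sleep 1
done
echo "process $process_id finished after $samples samples"
```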