Wednesday, April 6, 2011

View a summary of filetypes in a dir

Sometimes I fill up directories with lots of files of lots of types (different extensions). It's informative to be able to see what the distribution of types is in a dir. I tend to use this view during project cleanups to identify files that can be deleted or moved/consolidated. Anyway, here's a quick way of doing it:

for file in *.*; do echo "${file##*.}"; done | sort | uniq -c | sort -nr
  37 tsv
  18 pdf
  17 txt
   7 err
   5 log
   5 png

Here I see that there are some log files that can be removed (if no longer needed).
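The glob above only looks at the current directory. A sketch of a find-based variant that also descends into subdirectories (the /tmp/ext_demo paths and file names here are invented purely for the demo):

```shell
# Build a small demo tree, then count extensions for every file under it.
mkdir -p /tmp/ext_demo/sub
touch /tmp/ext_demo/a.tsv /tmp/ext_demo/b.tsv /tmp/ext_demo/c.log
touch /tmp/ext_demo/sub/d.log /tmp/ext_demo/sub/e.png
# Strip everything up to the last dot, then count occurrences.
find /tmp/ext_demo -type f -name '*.*' | sed 's/.*\.//' | sort | uniq -c | sort -nr
# prints one count per extension, largest first
```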

Sunday, February 13, 2011

Reading stdin from a pipe for command-line R shenanigans

I think I rely on command-line tomfoolery a little bit too much. Today I wanted to pipe some data into R and have the commands to run defined on the command line as well. This is the kind of thing I do with Perl or bash to get some quick answers and I'd love to add R to the repertoire.

So, I tried a number of things all of which failed. This is how I got it to work for me:
> perl -le 'printf "%.4f\t%.4f\n", rand(), rand() for 1 .. 20' \
 | R --vanilla --slave -e\
 "data=read.delim(pipe('cat /dev/stdin'), header=F);\
  cor.test(data\$V1, data\$V2)"

You have to remember to escape any special characters in the R script ($ in this case).
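A slightly shorter variant that should behave the same way (an assumption on my part: R's file("stdin") connection reads the piped data directly, so the cat /dev/stdin trick isn't needed, and single-quoting the R code for Rscript sidesteps the $-escaping issue entirely):

```shell
# Pipe random number pairs into R and run a correlation test;
# single quotes mean $ needs no escaping in the R code.
perl -le 'printf "%.4f\t%.4f\n", rand(), rand() for 1 .. 20' \
 | Rscript -e 'data <- read.delim(file("stdin"), header=FALSE); print(cor.test(data$V1, data$V2))'
```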

Wednesday, January 12, 2011

R: using lattice graphs in functions

I like the lattice graphics package; it produces nice-looking graphs with very little code. However, I was bitten the other day when a function I'd written to generate a load of density plots wasn't creating the output files I was expecting. They just weren't there at all. After a little unsuccessful poking around, I gave up and generated them manually (it was a stressful day with tight deadlines).

I went back to this issue today and it turns out that the lattice functions 'do not create plots'. You have to print the object created from the lattice function in order to see/output the graph.

This happens automatically on the command line, which is why I was so confused since the code would work fine outside of a function. If you want your graph to be outputted within a function you have to explicitly print it.

e.g.:

write_density_plot <- function(data, name)
{
    filename = paste(name, '_density_distribution.png', sep="")
    png(filename)
    print(densityplot(data, main=paste(name, 'score distribution')))
    dev.off()
}

filedata = read.delim('myfile', header=T)
write_density_plot(filedata$blorg, 'blorg')
write_density_plot(filedata$blarg, 'blarg')
write_density_plot(filedata$wib, 'wib')

Sunday, November 28, 2010

EMC2, G64 and sharp corners

I've cut a number of things on my CNC machine using EMC2. I've been perplexed by the fact that some of the corners in my test shapes were not sharp, but were rounded off.

I've just been designing and cutting some new leadnut holders and found that the hexagonal depression I was cutting was also plagued with these curved corners (not good when you need a hex nut to fit in them). It turns out this is to do with the G64 command in the G-Code which tells the trajectory planner to sacrifice path following accuracy in order to keep the feed rate up (i.e. cut/round corners). A good description is available in the EMC2 documentation section covering trajectory control.

I manually edited my G-code (this one coming from CamBam, but past ones were from Inkscape and dxf2gcode) so that the G64 command now read "G64 P0.001" and the hexagon cut with much sharper corners.

Friday, November 12, 2010

Fun with process substitution

I was introduced to process substitution recently at work; it's a great way to avoid temporary files in some simple cases.

For example, say I wanted to know what the column header changes were between two files (same data, different code used to extract them). The files are tab delimited and I have a script I use (frequently) that prints out the index and name for the headers in an input file - columnHeaders.sh.

So, if I want to see what's different between two files, in the past I'd create two output files using my columnHeaders.sh script and then use diff, kompare or comm to compare them.

You can eliminate the temporary files using a technique called process substitution.

> diff <(columnHeaders.sh file1) <(columnHeaders.sh file2)
2,3c2,3
< 0 blorg
< 1 blarg
---
> 0 blarg
> 1 blorg
In this case we see two columns have been swapped between file1 and file2.

Take a look at the Advanced Bash Scripting Guide for more examples.
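The same trick works with any command that expects filenames. For instance comm, which needs sorted input, can have each stream sorted inline (the /tmp file names below are invented for the demo):

```shell
# Find lines common to both files, without writing sorted temp copies.
printf 'b\na\nc\n' > /tmp/ps_demo1
printf 'd\nc\nb\n' > /tmp/ps_demo2
# -12 suppresses columns 1 and 2, leaving only lines in both files.
comm -12 <(sort /tmp/ps_demo1) <(sort /tmp/ps_demo2)
# prints:
# b
# c
```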

The columnHeaders.sh script basically does this (but allows user specified delimiters):
> head -1 <input file> | perl -F'\t' -lane 'print $n++,"\t$_" for @F'
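An awk version of the same idea, sketched with an invented demo file (the real columnHeaders.sh also handles user-specified delimiters, which this one-liner does not):

```shell
# Print "index<TAB>name" for each column header of a tab-delimited file.
printf 'id\tblorg\tblarg\n1\t2\t3\n' > /tmp/demo.tsv
head -1 /tmp/demo.tsv | awk -F'\t' '{for (i = 1; i <= NF; i++) print i-1 "\t" $i}'
# prints (tab-separated):
# 0  id
# 1  blorg
# 2  blarg
```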

Friday, September 17, 2010

Bash command line parameter parsing (getopts)

If I only need a single optional parameter I'll often just check to see if the input positional parameter of interest is set (e.g. script that accepts one required field and one optional field):

if [[ -n "$2" ]];
then
    echo "I received the second parameter: $2"
fi

But if you want to do something a bit more complex, getopts is your friend.

For example, say you want to have the user input their first name, last name and a "keep my data private" flag. You could do something like this:

while getopts f:l:p flag
do
    case $flag in
        f)
            FIRSTNAME=$OPTARG;;
        l)
            LASTNAME=$OPTARG;;
        p)
            PRIVATE=1;;
        \?)
            echo "$0 -f <first name> -l <last name> -p"
            echo -e "\t-p [flag] keep my data private"
            exit 1;;
    esac
done

The getopts command is fairly straightforward (it's a bash builtin, so run help getopts for more details). If an option requires an argument then a colon is placed after its letter designation ('f' and 'l' in the above example).

You can check for required parameters by looking at which variables were set:

if [[ -z "$FIRSTNAME" || -z "$LASTNAME" ]];
then
echo "missing required parameter"
fi

Wrap that all up into a neat script with a subroutine that outputs a usage statement and you're home free:

function usage_and_exit()
{
echo "$0 -f <first name> -l <last name> -p"
echo -e "\t-p [flag] keep my data private"
exit
}

PRIVATE=0

while getopts f:l:p flag
do
    case $flag in
        f)
            FIRSTNAME=$OPTARG;;
        l)
            LASTNAME=$OPTARG;;
        p)
            PRIVATE=1;;
        \?)
            usage_and_exit;;
    esac
done

if [[ -z "$FIRSTNAME" || -z "$LASTNAME" ]];
then
echo "missing a required parameter (firstname and lastname are required)"
usage_and_exit
fi

if [[ $PRIVATE -ne 0 ]];
then
echo "protecting private data"
fi
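One detail worth remembering: getopts leaves $OPTIND pointing just past the parsed options, so you can shift them away to reach any remaining positional arguments. A quick sketch (the option letters mirror the example above; the argument values are made up):

```shell
# Simulate command-line arguments, parse the options, then shift
# to expose the remaining positional argument.
set -- -f Ada -l Lovelace -p infile.txt
OPTIND=1   # reset in case the current shell ran getopts before
while getopts f:l:p flag
do
    case $flag in
        f) FIRSTNAME=$OPTARG;;
        l) LASTNAME=$OPTARG;;
        p) PRIVATE=1;;
    esac
done
shift $((OPTIND - 1))
echo "$FIRSTNAME $LASTNAME private=${PRIVATE:-0} remaining=$1"
# prints: Ada Lovelace private=1 remaining=infile.txt
```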

Wednesday, August 18, 2010

MySQL: the mystery of unsettable global variables

I just tried updating our mysql server to accept very long connections. I have a ton of jobs running, so I wanted to set the wait_timeout variable (via the mysql shell) to something reasonable for these jobs. The default of 8 hours is not sufficient in some rare cases so I tried to set the timeout higher:

mysql> SHOW VARIABLES LIKE 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.00 sec)

mysql> SET GLOBAL wait_timeout=86400;
mysql> SHOW VARIABLES LIKE 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.00 sec)

What? Why wasn't it set?

Well, the reason is that the SHOW VARIABLES command defaults to the session variables. So the local session wait_timeout is still 28800, but the global wait_timeout was actually updated correctly:

mysql> SHOW GLOBAL VARIABLES LIKE 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 86400 |
+---------------+-------+
1 row in set (0.00 sec)
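A way to sidestep the confusion entirely (sketched as a continuation of the session above) is the @@ syntax, which names the scope explicitly:

```sql
mysql> SELECT @@global.wait_timeout, @@session.wait_timeout;
+-----------------------+------------------------+
| @@global.wait_timeout | @@session.wait_timeout |
+-----------------------+------------------------+
|                 86400 |                  28800 |
+-----------------------+------------------------+
```

Also note that SET GLOBAL only affects connections opened after the change; an existing session keeps its old value unless you also run SET SESSION wait_timeout=86400.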