GeekBrainDump: 2009

Thursday, December 24, 2009

updating Windows Vista file permissions with Icacls

I'm not much of a Windows guy, but I have a Vista machine at home as the main PC. I recently bought a NAS to back up important files and found that my backup jobs were failing due to file permission issues (there are multiple accounts on the PC). I wanted to do the equivalent of chmod +r (or even 777) on a few files, but didn't want the hassle of using Windows Explorer to adjust the permissions one file at a time. (I'd already tried changing the permissions on the root folder and applying them to the contained files and subdirectories, but it didn't work. I guess that was because the permissions weren't uniform across the files in those directories.)

Anyway, it seems you use Icacls for updating file permissions, and you can apply the change recursively with the /T flag. So for a single file you do this:


> Icacls <filename> /grant <user>:<perm>

e.g. this grants full control (F) to user 'Paul' on file 'test.txt':


c:\> Icacls test.txt /grant Paul:F

and for a directory (plus everything underneath it) you add /T:


c:\> Icacls <dirname> /grant <user>:<perm> /T

e.g.


c:\> Icacls c:\testdir /grant Paul:F /T

Monday, December 21, 2009

MySQL multiple counts on one line (reports)

I like to poll the contents of multiple tables over time so I can track the progress of certain processing tasks. I used to run a separate count for each table and kept doing that out of habit, but there's a much nicer way that gives the whole report as a single row. I keep track of the elapsed time using the UNIX_TIMESTAMP() function (called with no argument, it returns the number of seconds elapsed since 1970-01-01 00:00:00 UTC as an unsigned integer).

First, set up the initial variables to compare against (this is time 0):


PaulBo@test_db:SELECT @time:=UNIX_TIMESTAMP(), @apples:=(SELECT COUNT(*) FROM apples), @oranges:=(SELECT COUNT(*) FROM oranges);

Then, let some time pass and poll the tables for changes:


PaulBo@test_db:SELECT UNIX_TIMESTAMP() - @time as elapsed_time, (SELECT COUNT(*) FROM apples) - @apples as d_apples, (SELECT COUNT(*) FROM oranges) - @oranges as d_oranges;
+--------------+----------+-----------+
| elapsed_time | d_apples | d_oranges |
+--------------+----------+-----------+
|          435 |      230 |     12887 |
+--------------+----------+-----------+
1 row in set (0.00 sec)
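For comparison, here's a rough Python sketch of the same snapshot-then-delta idea. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the MySQL server, which is an assumption for illustration. The helper names (snapshot, report) are made up; only the apples/oranges table names come from the example above.

```python
import sqlite3
import time

# Tables to track, as in the MySQL example above.
TABLES = ("apples", "oranges")


def snapshot(conn, tables=TABLES):
    """Take 'time 0': the current Unix time plus a row count for each table."""
    # Table names can't be bound parameters, so they are formatted into the SQL;
    # only use trusted, hard-coded names here.
    counts = {
        t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables
    }
    return time.time(), counts


def report(conn, baseline, tables=TABLES):
    """Elapsed whole seconds and the change in each table's row count since baseline."""
    t0, counts0 = baseline
    now, counts = snapshot(conn, tables)
    row = {"elapsed_time": int(now - t0)}
    for t in tables:
        row[f"d_{t}"] = counts[t] - counts0[t]
    return row


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    for t in TABLES:
        conn.execute(f"CREATE TABLE {t} (id INTEGER)")
    base = snapshot(conn)
    # ... the processing job adds rows, then we poll:
    conn.executemany("INSERT INTO apples VALUES (?)", [(i,) for i in range(230)])
    print(report(conn, base))
```

As with the SQL version, you take one snapshot and then call report() as often as you like.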

Friday, December 11, 2009

Redirecting the output from a "here document"

This had me confused for a few minutes, so I thought I'd post.

It's common to use "here documents" to simplify input to a program in a script.

e.g.


> cat <<EOF
> This is a random number:
> $RANDOM
> EOF
This is a random number:
9948

But what if you want to capture the output of this? The naive attempt would be to put the redirection after the closing EOF, but that doesn't work: the terminating string has to be on a line all by itself, so the redirection belongs on the first line, after <<EOF.

This is the answer:


> cat <<EOF > /tmp/data.txt
> This is a random number:
> $RANDOM
> EOF
> cat /tmp/data.txt
This is a random number:
24669

samples from large datasets in R

I have a dataset I want to plot (say 5,000,000 data points). Plotting that many points in R can be very slow, so it's better to plot a random sample of the data instead.

Say I have a tab-delimited file with two columns, 'time' and 'count', with those names as headers. There are 5M rows and I'd like a simple overview of the count over time.


> data = read.delim('filename', header=T) #read in the tsv (tab separated value) data file
> s = length(data$time) # calculate the number of data-points
> n = 1000 # this is my sample size
> N = sort(sample(1:s, n)) #create a set of indices sampled from the vector 1:s
> plot(data$time[N], data$count[N]) # use the indices to sample from the set

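The magic is in sort(sample(1:s, n)). This draws n indices from 1:s without replacement (sample()'s default), and unless a probability vector is provided, each element of the input vector (1:s) has an equal probability of being selected. We sort the output of sample so that the indices are back in their original order. Actually, I just tried this without the sort and the plot looked the same, but that's not because plot() sorts anything: with the default points plot (type='p') the drawing order doesn't matter. The sort does matter for a line plot (type='l'), where unsorted indices would join the points up out of order.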
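The same trick looks like this as a minimal Python sketch, with random.sample standing in for R's sample() (both sample without replacement). The function name sample_indices and the seed argument are my own additions for illustration. The indices are 0-based here, unlike R's 1-based 1:s.

```python
import random


def sample_indices(s, n, seed=None):
    """Pick n distinct indices from 0..s-1 (no replacement) and return them sorted,
    the 0-based equivalent of sort(sample(1:s, n)) in R."""
    rng = random.Random(seed)           # seeded generator for repeatable samples
    return sorted(rng.sample(range(s), n))


if __name__ == "__main__":
    s = 5_000_000                       # number of data points
    idx = sample_indices(s, 1000, seed=42)
    print(len(idx), idx[:5])
```

With real columns, you would then plot [time[i] for i in idx] against [count[i] for i in idx].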

Thursday, December 3, 2009

Infinite loop in bash

I needed an infinite loop for polling a database table for a while (killed with Ctrl-C).


while true;
 do mysql -u paul -h test_database -ppass test_paul <<< "show processlist";
 sleep 10;
done

Friday, November 20, 2009

Visual Diff

I just came across 'kompare'. It's a great way to track minor differences between plain text files, with a very pretty side-by-side visualization of the changes.

The display uses 3 colours:

Green - new text on the left
Blue - new text on the right
Red - changed text/area

See the Wikipedia page for more details.

Friday, October 23, 2009

Using curl to POST data

I wanted to automate some load testing at work, which required sending POST requests to a running server to increase the amount of 'work' being done. I wanted to use 'curl', as it's a Linux command-line tool and so very easy to incorporate in a script.

Anyway, this is how you post data with curl (-d sends the data as the request body, form-encoded, and makes curl use POST):


curl -d "param1=value1&param2=value2" http://myhost.com/server.cgi
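If you'd rather drive the load test from a script, here's a minimal Python sketch that builds the equivalent form-encoded POST with only the standard library (urllib). The function name build_post is hypothetical, and the host and parameters are the placeholders from the curl line. The sketch only builds the request; actually sending it needs a reachable server.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post(url, params):
    """Build a form-encoded POST request, roughly what curl -d "a=1&b=2" URL sends."""
    body = urlencode(params).encode("ascii")  # e.g. b"param1=value1&param2=value2"
    req = Request(url, data=body)             # a request with a body defaults to POST
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


if __name__ == "__main__":
    req = build_post("http://myhost.com/server.cgi",
                     {"param1": "value1", "param2": "value2"})
    print(req.get_method(), req.data)
    # Sending it needs a reachable server, e.g.:
    #   with urllib.request.urlopen(req) as resp:
    #       print(resp.status, resp.read())
```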

Friday, October 2, 2009

Inserting binary values in the mysql shell

I needed to insert some dummy values into a table which had a BIT column as well as a BLOB column, both declared NOT NULL. You just need to write the values as bit-value literals, with the 'b' prefix, like so:


INSERT INTO faketable(blob1, name, bitfield) values(b'001011101', "Beeblebrox", b'1');

You can also print out the binary data in a way that won't ruin your terminal by using the BIN(), OCT() and HEX() functions:


SELECT name, HEX(blob1), BIN(bitfield) from faketable;
+--------------------+--------------+----------------+
| name               | HEX(blob1)   | BIN(bitfield)  |
+--------------------+--------------+----------------+
| Beeblebrox         | 005D         | 1              |
+--------------------+--------------+----------------+
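To see why b'001011101' comes back as 005D, here's a small Python sketch. It reads the digits as an unsigned binary number and left-pads to whole bytes, based on the digit count (9 digits give 2 bytes). This mirrors the HEX(blob1) output above. The function name is hypothetical, and this is an illustration, not MySQL's own code.

```python
def bit_literal_to_bytes(bits):
    """Read a string of 0/1 digits as an unsigned integer and left-pad it
    with zero bits to a whole number of bytes (big-endian)."""
    nbytes = (len(bits) + 7) // 8      # 9 digits -> 2 bytes
    return int(bits, 2).to_bytes(nbytes, "big")


if __name__ == "__main__":
    print(bit_literal_to_bytes("001011101").hex().upper())  # 005D
    print(bit_literal_to_bytes("1").hex().upper())          # 01
```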

Monday, September 14, 2009

R - percentiles

I just needed to find some percentile thresholds and thought that it'd be easy with R. Well, it is!

So, create some data:


data = seq(0,1,0.01);

Then you can get quantiles like so:


quantile(data);

  0%  25%  50%  75% 100%
0.00 0.25 0.50 0.75 1.00

and percentiles by passing a vector of probabilities as the second argument (probs) to quantile (in this case I'm looking at the 0th to 1st percentile):


quantile(data, seq(0, 0.01, 0.001));

   0%  0.1%  0.2%  0.3%  0.4%  0.5%  0.6%  0.7%  0.8%  0.9%    1%
0.000 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.010
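For the curious, R's default quantile() (type 7) just interpolates linearly between order statistics. Here's a minimal Python sketch of that rule, applied to the same data as seq(0, 1, 0.01). The function name quantile_type7 is my own, and the sketch assumes non-empty input and probabilities between 0 and 1.

```python
import math


def quantile_type7(data, probs):
    """Sample quantiles by linear interpolation between order statistics,
    the rule R's quantile() uses by default (type = 7)."""
    x = sorted(data)
    n = len(x)
    result = []
    for p in probs:
        h = (n - 1) * p            # 0-based fractional position of the quantile
        lo = math.floor(h)
        hi = min(lo + 1, n - 1)    # stay in range when p == 1
        result.append(x[lo] + (h - lo) * (x[hi] - x[lo]))
    return result


if __name__ == "__main__":
    data = [i / 100 for i in range(101)]                        # like seq(0, 1, 0.01)
    print(quantile_type7(data, [0, 0.25, 0.5, 0.75, 1]))
    print(quantile_type7(data, [i / 1000 for i in range(11)]))  # 0th..1st percentile
```

Expect tiny floating-point noise in the printed values compared with R's rounded display.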

Friday, September 11, 2009

screen

Ever lost a terminal which had an active job running? Ever wanted to check in on what you'd been doing from home *and* wanted or needed the session history as well?

'screen' is your friend. If you're starting work that's important or that you may want to check from another machine it's a real time-saver. Have a look at the man page for more details than you can shake a stick at.

I've only been using it for a little while, so the only caveat I've found so far is that I have to remember to give sessions a sensible name when I start them. Otherwise, if you've got a lot of sessions, it's a waste of time trying to find the right one. The '-t' flag sets the title for the default shell; more importantly, the '-S' flag sets the session name. I've been using the same string for both (I'll probably set up an alias to deal with the duplication soon).

e.g.


shell1> screen -t data_loading -S data_loading
shell2> screen -ls
There are screens on:
        21802.data_loading    (Attached)
        20237.pts-10.hamilton   (Attached)
        1376.variance_analysis  (Attached)
        6639.pts-8.hamilton     (Attached)
        9881.cluster_jobs        (Attached)
5 Sockets in /var/run/screen/S-paul.

You can see from the above listing that there are 5 sessions (all currently attached). Two of them have no session name (just the default tty.host name) and I have no idea what I'm doing in them... but for the other three it's obvious (at least to me).

Monday, August 31, 2009

Code formatting for Blogger

This post is very out of date now - I recommend using GitHub and Gists for adding code to blog posts.

... older post below ...
I've been meaning to look into this for a while. I'd like to be able to insert nicely formatted code into posts.

I found the following blog post, getting-code-formatting-with-syntax-highlighting-to-work-on-blogger, and figured I'd give it a try.

Unfortunately, following the instructions in that post didn't work for me, and I had to spend some time on the project site to find out what to do. This post on puffyandmishu was very helpful. I ended up with the following code in my Blogger template:

1. In the <head> tag:

<script language='javascript' src='http://alexgorbatchev.com/pub/sh/current/scripts/shCore.js'/>  
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushCpp.js'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushBash.js'/>  
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushJava.js'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushSql.js'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushPerl.js'/>
<link href='http://alexgorbatchev.com/pub/sh/current/styles/shCore.css' rel='stylesheet' type='text/css'/>
<link href='http://alexgorbatchev.com/pub/sh/current/styles/shThemeDefault.css' rel='stylesheet' type='text/css'/>

2. At the bottom of the <body> tag:

<script language='javascript'>  
SyntaxHighlighter.config.bloggerMode = true;
SyntaxHighlighter.all();
</script>

Let's test it out:

C - <pre class="brush:c">

#include <stdint.h>

int main(void)
{
    uint8_t i = 0;
    char c = 'a';
    while(1)
    {
        i++;
    }
}

Bash - <pre class="brush:bash">

for i in *.c;
  do echo "$i";
done

Sql - <pre class="brush:sql">

SELECT name FROM mydb.test_table WHERE id IN (1,2,3,4,5);
INSERT INTO mydb.test_table (id, name) VALUES (1, "test");

Java - <pre class="brush:java">

private static final class MyClass extends YourClass implements Cloneable
{
    public int mMyInt = 145;
    private String mString = "blarg";
    protected double mDouble = 23.45;

    public static void main(String[] pArgs) throws Exception
    {
        System.out.println("Hi there");
    }
}

Perl - <pre class="brush:perl">

use warnings;
use strict;

my $i = 10;
for my $j (0 .. $i)
{
    print "This is line $j.  How lovely!\n";
}
warn "this does nothing.";

Line wrapping off <pre class="brush: plain; wrap-lines: false">

This is an example of a long line that shouldn't wrap.  Dwi eisio ddweud rhywbeth yn Cymraeg ond dwi wedi anghofio popeth (I want to say something in Welsh but I've forgotten everything).  How long can we make these bars?

Now let's try it with Perl.

perl -F'\t' -lane 'print if $h{$F[0]}++;END{print "$_\t$h{$_}" for keys %h}' averyververververlongfilename.tsv

Friday, August 21, 2009

AVR delay_ms

When programming an AVR microcontroller, you have to remember that the _delay_ms() and _delay_us() functions (avr-libc's <util/delay.h>) will pull in the whole floating-point library unless they are passed compile-time constants. That makes the code too bloated to fit in anything with less than about 3K of flash, so it's right out for the ATtiny13 (1K of flash memory). I've been bitten by this twice now, and the incidents were far enough apart that it took me 45 minutes the second time to trace what was going on...

I was trying this:


_delay_ms(delay);   /* variable argument: drags in the floating-point code */
if(i==0)
    delay--;

but I should have kept the argument to the delay call constant and varied the total delay time by wrapping the call in a loop with a variable number of iterations instead.

Legends in R plots

I've just spent about 45 minutes trying to get a decent results plot from R. I wanted to include a legend for clarity, and this gave me some trouble: it turns out legend() positions are given in the plot's own x/y data coordinates rather than pixels... weird. Happily, for simple plots, there's an easy way to position the legend. From the R help page (?legend):

The location may also be specified by setting 'x' to a single
keyword from the list '"bottomright"', '"bottom"', '"bottomleft"',
'"left"', '"topleft"', '"top"', '"topright"', '"right"' and
'"center"'. This places the legend on the inside of the plot frame
at the given location.

So, I ended up with something like this:

R --vanilla --slave <<<"d=read.table('data1.csv', sep=',', header=F);e=read.table('data2.csv', header=F, sep=',');png('results.png');dh=hist(d\$V2, plot=F);eh=hist(e\$V2, plot=F);xlim=range(dh\$breaks, eh\$breaks);ylim=range(0, dh\$counts, eh\$counts);plot(dh, main='', xlab='value', col='orange', xlim=xlim, ylim=ylim);plot(eh, col='lightblue', add=T);legend('topright', legend=c('label1', 'label2'), fill=c('orange', 'lightblue'));invisible(dev.off());q()"

Pre-calculating the x and y ranges across both histograms (with range() over the bin breaks and the counts) before the first plot() call stops the axes from cutting off part of the second histogram.

Thursday, August 6, 2009

MySQL bash shell one-liners

This is a useful thing to know if you're ever trying to use data in files to grab data from a MySQL database: you can feed SQL query strings to the mysql client on standard input, using a bash 'here string' (<<<):


bash> mysql -u paul -h myhost -pmypass test_paul <<< "SELECT id from a_table where a_name='boo-yah';"

This means you can put it in a shell script (or a Perl script, or whatever). For example, to print out the addresses associated with a list of names (-N stops mysql printing the column name above each result):


while read -r name;
   do address=$(mysql -N -u paul -h myhost -pmypass test_paul <<< "SELECT address from a_table where a_name='$name';");
   echo "$name -> $address";
done < names.list
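One catch with pasting $name straight into the SQL string is that a name containing a quote (say o'brien) breaks the query. Here's a minimal Python sketch of the same per-name lookup that passes the name as a bound parameter instead. It uses sqlite3 as a stand-in for MySQL, which is an assumption: MySQL drivers use a different placeholder style. The function addresses_for and the sample row are hypothetical.

```python
import sqlite3


def addresses_for(conn, names):
    """For each name, look up its address with the name passed as a bound parameter.

    Yields (name, address) pairs; address is None when there is no match.
    """
    for name in names:
        row = conn.execute(
            "SELECT address FROM a_table WHERE a_name = ?", (name,)
        ).fetchone()
        yield name, (row[0] if row else None)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    conn.execute("CREATE TABLE a_table (a_name TEXT, address TEXT)")
    conn.execute("INSERT INTO a_table VALUES ('boo-yah', '1 Example Street')")
    for name, address in addresses_for(conn, ["boo-yah", "o'brien"]):
        print(f"{name} -> {address}")
```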

Wednesday, July 29, 2009

Perl subroutine references

I just found this useful for generating a dispatch/lookup table. You can create a hash table with the values being references to subroutines.

Here's a simple example:


> perl -le '$a = sub { 5 * $_[0]};print &{$a}(23);'
115

A better example is to use this as a look-up table. Here's another simple example:


%table = (
 "+" => sub { $_[0] + $_[1] },
 "-" => sub { $_[0] - $_[1] },
 "*" => sub { $_[0] * $_[1] },
 "/" => sub { $_[0] / $_[1] },
);

print &{$table{"+"}}(12,24), "\n";
print &{$table{"-"}}(24,12), "\n";

This would output 36 and 12 respectively. This is very handy for more complex processing and parsing (which is what I'm using it for).
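The same dispatch-table idea in Python is just a dict mapping each symbol to a function. Here's a minimal sketch using the standard operator module in place of the anonymous subs. The names TABLE and apply_op are made up for illustration.

```python
import operator

# Dispatch table: operator symbol -> function taking two arguments.
TABLE = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def apply_op(symbol, a, b):
    """Look up the function for symbol and call it; unknown symbols raise KeyError."""
    return TABLE[symbol](a, b)


if __name__ == "__main__":
    print(apply_op("+", 12, 24))  # 36
    print(apply_op("-", 24, 12))  # 12
```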

Tuesday, July 21, 2009

Command line R

Sometimes I just want to print out a histogram from a file, or create a simple summary of some numeric data. It's good to be able to just bash off an R script from the command line to do this for you.

There are a couple of easy ways to invoke R to run in 'batch' mode (i.e. non-interactive):
1. R CMD BATCH <scriptname>
2. R --vanilla --slave <scriptname>

For 1., the output is saved in a file named after the script, with any .R extension replaced by .Rout (e.g. myscript.R -> myscript.Rout); for 2., the output goes to STDOUT, which for me is much more useful.

Also, you can use the standard shell 'tricks' to create quick scripts without having to save a script file:

e.g.

1. Create a numeric summary of the input data (in this case for a file with the format - "name,value"):


R --vanilla --slave <<< "d=read.table('data.scores', sep=',');summary(d);q()"
       V1       
 Min.   :1.333  
 1st Qu.:4.037  
 Median :4.651  
 Mean   :4.634  
 3rd Qu.:5.282  
 Max.   :8.000

2. Create a histogram of the data and save it to a PNG (don't forget to escape any characters that are special to the shell, e.g. $ must be written as \$ inside the double-quoted string):


 R --vanilla --slave <<< "d=read.table('data.scores', sep=',');png('data.png');hist(d\$V1);q()"

Thursday, July 16, 2009

Never tried SMS blogging before... Worse than Twitter?

Wednesday, July 15, 2009

converting postscript to an image

This is something I've meant to learn how to do for ages. I needed to do this to produce summaries for a shit load of PDFs (containing related molecules) and wanted to have an image of a structure to represent each PDF. Anyway, I print structures as postscript files, so I needed a way to create a PNG of the first structure for each PDF.

I saw a few solutions online which used ghostview to first output a JPEG and then 'mogrify' to cut the humongous result down to something useful. It turns out you can just use ImageMagick's convert:

convert molecule.ps -trim molecule.png

This creates a PNG from the PS file. Blimey! Really simple!

The -trim option is a great addition as well, since a direct conversion leaves a lot of extra whitespace around the image.

running command line perl within a bash script

Maybe it's just me, but this is something I've struggled with on a couple of occasions.

Since I can't give any of the examples I'm actually working on, I'll have to use something which is less obviously useful.

The simple version is to use bash to loop through some files and use Perl to print out the filename (as stored by the bash script):


for file in *.label;
   do perl -le "\$a=1;print \"${file} \$a\"";
done

The important things to make this work are:
1. The double quotes around the Perl script. These let bash interpolate its own variables (here ${file}).
2. Escaped double quotes inside the Perl script: Perl needs double quotes to interpolate its own variables.
3. Escape sequences for the Perl variables - this distinguishes the Perl variables from the bash ones.

The final script I used was a lot more complicated than this and was used to generate a series of files. I guess I can include it since, without the context, it gives nothing away.


for si in `seq 38 47`;
 do perl -F',' -lane "if(@F==4)
   {
      print \"\$F[0]\\t\$h{\$F[0]}:\$F[1],\$F[2]\"
        if \$F[3] == $si and not \$i{\$F[0]}++;
   }
   else{
      @F=split/\t/;
      \$h{\$F[1]} = \$F[0];
   }" file1 file2 > output_${si}.tsv;
done

It would have been nicer just to write a separate Perl script and call that, really. But there you go.

Thursday, June 18, 2009

using find to catch up on what's changed

I came back from a two week vacation and needed to find out what had been updated/added/changed in that time in some shared project dirs. "find" was my friend. The following command gives a list of all files changed within the last two weeks.

find . -ctime -14

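'-ctime -14' will be true for all files/dirs whose status has changed in the last 14*24 hours. Note that ctime is the inode change time: it's updated when the contents change, but also by things like chmod and chown. Use -mtime instead if you only care about changes to file contents. You can alter the behaviour by changing the modifier on the numerical input: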


+n     for greater than n,
-n     for less than n,
 n     for exactly n.
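If you need the same check inside a script, here's a rough Python sketch of "modified less than N days ago" using os.walk. It mirrors find -mtime -N rather than -ctime, because Python's st_ctime means inode-change time on Unix but creation time on Windows. The function name changed_within is hypothetical. Unlike find, the sketch doesn't list the starting directory itself, and it compares exact times rather than find's whole-day rounding.

```python
import os
import time


def changed_within(root, days):
    """Paths under root (not root itself) modified less than days*24 hours ago.

    Uses os.lstat so symlinks are not followed, like find's default behaviour.
    """
    cutoff = time.time() - days * 24 * 60 * 60
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_mtime > cutoff:
                hits.append(path)
    return sorted(hits)
```

Usage would be something like: for path in changed_within(".", 14): print(path)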

Monday, May 18, 2009

jvisualvm instead of kill -3

Whenever I need to see what's going on under the hood of a Java process, I issue a kill -3 to get a stack trace. Well, I was just introduced to jvisualvm! It's a great way to see what's going on inside a running Java process.

Changing the mysql prompt

This is such an obvious thing to do, why didn't I do it before?

I'm constantly jumping between multiple databases as multiple users (test, standard, mysql). That gets a little dangerous, and I find myself checking where I am and who I am:
SELECT user();
SELECT database();

but you can change the standard mysql prompt to display the current user and database! Just edit the my.cnf file and place the following under the [mysql] section:

prompt=\\u@\\d:

Now my prompt looks like this:
paul@test:
or
root@main:

If you don't have permission to alter my.cnf directly, you could add something like this to your personal .bashrc (or equivalent):

alias mysql='mysql --prompt="\u@\d:"'

This has saved me a lot of paranoia (and probably some headaches).

Friday, May 1, 2009

bash substring substitution

I just came across this today. It's something I wish I'd known years ago (I think it just shows that I need to read more manuals); I've been emulating this feature with Perl and regexes, or with the basename/dirname commands.

It's very simple; today I wanted to run through a set of files with a particular file extension, do some processing and then print the output to another file named the same as the original but with a different extension.

So, say the files are .tsv (tab-delimited) and I want to create a .csv (comma-delimited) file from the first two columns of each; you can do it like this:


# ${var/%pattern/repl} only replaces pattern at the end of $var
for file in *.tsv;
  do cut -f1,2 --output-delimiter=, \
  "$file" > "${file/%.tsv/.csv}";
done

Sweet, eh?
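One subtlety: bash's ${file/.tsv/.csv} replaces the first match anywhere in the name, whereas ${file/%.tsv/.csv} only matches at the end. Here's a small Python sketch of the two behaviours, with a literal (non-glob) pattern. Both function names are hypothetical.

```python
def first_match(name, old, new):
    """Like bash ${name/old/new} with a literal old: replace the first occurrence, anywhere."""
    return name.replace(old, new, 1)


def suffix_match(name, old, new):
    """Like bash ${name/%old/new}: replace old only when name ends with it."""
    if old and name.endswith(old):
        return name[: -len(old)] + new
    return name


if __name__ == "__main__":
    print(first_match("x.tsv.old.tsv", ".tsv", ".csv"))   # x.csv.old.tsv
    print(suffix_match("x.tsv.old.tsv", ".tsv", ".csv"))  # x.tsv.old.csv
```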

Friday, April 24, 2009

Histograms in gnuplot

I'm pretty new to gnuplot, but have found it to be a handy tool for simple plots.

Today I needed to generate a simple histogram. Rather than creating a permanent script, I've just been piping a string into gnuplot with echo.

echo "set terminal png;\
 set output 'timings.png';\
 set style fill solid 0.5 border -1;\
 set xlabel 'chunks';\
 set ylabel 'time to complete (m)';\
 set key autotitle columnhead;\
 set title 'timings on 20 nodes';\
 set auto x;\
 plot 'timings.tsv' u 2:xticlabels(1) with boxes lt 2;"\
 |  gnuplot

and out pops a nice histogram (sorry, can't post this example as it's work related).

The input file had column headers (used for the key, thanks to 'set key autotitle columnhead'), and the first column was used for the x-axis tick labels (that's what xticlabels(1) does).

Thursday, April 23, 2009

Stacktrace from a running process

There have been a few occasions where I've been very confused as to why a process (usually a Java one) is taking so long to run. It's often handy to get a stack trace from the running process, and for a JVM you can do this without terminating it by issuing 'kill -3 <pid>' (SIGQUIT). The thread dump is written to the Java process's own standard output, not to the terminal you ran kill from. Good, eh?

Unix Join with tabs

This one had me scratching my head for a while yesterday. When you use the unix 'join' command with the default separators (whitespace) then you can join tab delimited files, but the output is space delimited.

I tried specifying the separator with -t "\t" or -t '\t' and even the desperate -t\t, but none of those work. It turns out you have to use an *actual* quoted tab character (I found this through a quick Google search; the solution was here, thank you JJinuxLand).

You can insert a tab on the command line with the following key combo "ctrl-v <tab>".

Update: the easier way of doing this is to use the special quoting construct $'string' e.g. join -t $'\t'
See the "QUOTING" section of the bash man page for details.
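To make the output layout concrete, here's a rough Python sketch of what join -t $'\t' produces. It joins on the first field and writes the join field, then the remaining fields from file 1, then those from file 2, all separated by tabs. This emulates the output only: unlike join(1) it doesn't need sorted input, and it drops unpaired lines. The function name join_tab is hypothetical.

```python
def join_tab(lines1, lines2):
    """Inner-join two lists of tab-delimited lines on their first field.

    Output lines are tab-delimited: join field, other fields of file 1,
    then other fields of file 2. Spaces inside fields are kept as-is.
    """
    index = {}
    for line in lines2:
        key, *rest = line.rstrip("\n").split("\t")
        index.setdefault(key, []).append(rest)
    out = []
    for line in lines1:
        key, *rest1 = line.rstrip("\n").split("\t")
        for rest2 in index.get(key, []):   # every match, like join's cross product
            out.append("\t".join([key, *rest1, *rest2]))
    return out


if __name__ == "__main__":
    file1 = ["id1\tx\ty", "id2\tz"]
    file2 = ["id1\tfoo bar", "id3\tq"]
    for line in join_tab(file1, file2):
        print(line)
```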

print "Hello World!"

Hi,

I'm intending for this blog to just be a brain export of any useful/interesting or confusing things I come across in the course of my work. It'll probably contain items on MySQL, relational databases in general, Unix/Linux, Perl, Java, bash, svn, Eclipse and the like. It may also end up having some interesting or surprising biology/biochemistry/bioinformatics items as well.

Paul