Sunday, November 28, 2010

EMC2, G64 and sharp corners

I've cut a number of things on my CNC machine using EMC2. I've been perplexed by the fact that some of the corners in my test shapes were not sharp, but were rounded off.

I've just been designing and cutting some new leadnut holders and found that the hexagonal depression I was cutting was also plagued with these curved corners (not good when you need a hex nut to fit in them). It turns out this is to do with the G64 command in the G-Code which tells the trajectory planner to sacrifice path following accuracy in order to keep the feed rate up (i.e. cut/round corners). A good description is available in the EMC2 documentation section covering trajectory control.

I manually edited my G-code (this one coming from CamBam, but past ones were from Inkscape and dxf2gcode) so that the G64 command now read "G64 P0.001" and the hexagon cut with much sharper corners.
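For reference, the edited line in the G-code header ends up looking something like the following (a minimal sketch; P is the maximum allowed deviation from the programmed path, in the program's current units):

G64 P0.001 (blend moves, but stay within 0.001 of the programmed path)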

Friday, November 12, 2010

Fun with process substitution

I was introduced to process substitution recently at work; it's a great way to avoid temporary files in some simple cases.

For example, say I wanted to know what the column header changes were between two files (same data, different code used to extract them). The files are tab delimited and I have a script I use (frequently) that prints out the index and name for the headers in an input file - columnHeaders.sh.

So, if I want to see what's different between two files, in the past I'd create two output files using my columnHeaders.sh script and then use diff, kompare or comm to compare them.

You can eliminate the temporary files using a technique called process substitution.

> diff <(columnHeaders.sh file1) <(columnHeaders.sh file2)
2,3c2,3
< 0 blorg
< 1 blarg
---
> 0 blarg
> 1 blorg
In this case we see two columns have been swapped between file1 and file2.

Take a look at the Advanced Bash Scripting Guide for more examples.

The columnHeaders.sh script basically does this (but allows user specified delimiters):
> head -1 <input file> | perl -F'\t' -lane 'print $n++,"\t$_" for @F'
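For the curious, a minimal sketch of such a script could look like the one below (hypothetical; my actual columnHeaders.sh differs, but the idea is the same, with -d setting the delimiter and a tab as the default):

#!/bin/bash
# columnHeaders.sh (sketch): print "index<TAB>name" for each column header on line 1
DELIM=$'\t'
while getopts d: flag
do
case $flag in
d) DELIM=$OPTARG;;
esac
done
shift $((OPTIND - 1))
head -1 "$1" | perl -F"$DELIM" -lane 'print $n++,"\t$_" for @F'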

Friday, September 17, 2010

Bash command line parameter parsing (getopts)

If I only need a single optional parameter I'll often just check to see if the input positional parameter of interest is set (e.g. script that accepts one required field and one optional field):

if [[ -n "$2" ]];
then
echo "I received the second parameter:$2"
fi

But if you want to do something a bit more complex, getopts is your friend.

For example, say you want the user to supply their first name, their last name and a "keep my data private" flag; you could do something like this:

while getopts f:l:p flag
do
case $flag in
f)
FIRSTNAME=$OPTARG;;
l)
LASTNAME=$OPTARG;;
p)
PRIVATE=1;;
?)
echo "$0 -f <first name> -l <last name> -p"
echo -e "\t-p [flag] keep my data private"
exit;;
esac
done

The getopts command is fairly straightforward (man getopts for more details). If an option requires an argument then a colon is placed after its letter designation ('f' and 'l' in the above example).

You can check for required parameters by looking at which variables were set:

if [[ -z "$FIRSTNAME" || -z "$LASTNAME" ]];
then
echo "missing required parameter"
fi

Wrap that all up into a neat script with a subroutine that outputs a usage statement and you're home free:

function usage_and_exit()
{
echo "$0 -f <first name> -l <last name> -p"
echo -e "\t-p [flag] keep my data private"
exit
}

while getopts f:l:p flag
do
case $flag in
f)
FIRSTNAME=$OPTARG;;
l)
LASTNAME=$OPTARG;;
p)
PRIVATE=1;;
?)
usage_and_exit;;
esac
done

if [[ -z "$FIRSTNAME" || -z "$LASTNAME" ]];
then
echo "missing a required parameter (firstname and lastname are required)"
usage_and_exit
fi

if [[ $PRIVATE -ne 0 ]];
then
echo "protecting private data"
fi
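For what it's worth, here's roughly how a run of that script behaves if you save it as, say, register.sh (a hypothetical name):

> ./register.sh -f Ada -l Lovelace -p
protecting private data
> ./register.sh -f Ada
missing a required parameter (firstname and lastname are required)
./register.sh -f <first name> -l <last name> -p
	-p [flag] keep my data private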

Wednesday, August 18, 2010

MySQL: the mystery of unsetable global variables

I just tried updating our mysql server to accept very long connections. I have a ton of jobs running, so I wanted to set the wait_timeout variable (via the mysql shell) to something reasonable for these jobs. The default of 8 hours is not sufficient in some rare cases so I tried to set the timeout higher:

mysql> SHOW VARIABLES LIKE 'wait_timeout';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| wait_timeout | 28800 |
+--------------------------+-------+
1 row in set (0.00 sec)

mysql> SET GLOBAL wait_timeout=86400;
mysql> SHOW VARIABLES LIKE 'wait_timeout';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| wait_timeout | 28800 |
+--------------------------+-------+
1 row in set (0.00 sec)

What? Why wasn't it set?

Well, the reason is that the SHOW VARIABLES command defaults to the session variables. So, the local session wait_timeout is still 28800, but the global wait_timeout was actually updated correctly:

mysql> SHOW GLOBAL VARIABLES LIKE 'wait_timeout';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| wait_timeout | 86400 |
+--------------------------+-------+
1 row in set (0.00 sec)
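The flip side is that an already-open session keeps its old value (new connections pick up the global one). If you want the current session to use the longer timeout too, set it explicitly:

mysql> SET SESSION wait_timeout=86400;
mysql> SHOW SESSION VARIABLES LIKE 'wait_timeout';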

Wednesday, July 14, 2010

Basic MySQL 'top' command

Ever wanted to keep an eye on the running processes in a MySQL database? How about something that works a little like the top command, but without any of the bells and whistles?

Well, here you go:

> while [[ 1 ]];
do
clear;
mysql -u <username> -p<password> -h <host> -e "show full processlist";
sleep 1;
done


Remember to replace the <username>, <password> and <host> placeholders with the values for your database. Also, if you don't like the bounding box on the mysql output, you can get a cleaner output by using redirection (a here-string) instead of the -e flag:

mysql -u <username> -p<password> -h <host> <<< "show full processlist";
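If you have watch installed, roughly the same thing can be done in a single line (same placeholders as above):

> watch -n 1 "mysql -u <username> -p<password> -h <host> -e 'show full processlist'"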

Wednesday, June 16, 2010

Case insensitive regex in bash

By default regexes in [[ ]] are case sensitive. If you want to match in a case insensitive way you have to set the shell option nocasematch.


mytext="eXistenZ"
if [[ $mytext =~ existenz ]];
then
echo "yep"
else
echo "nope"
fi

If you run the above script you should get "nope" as the output. For case insensitive matching just insert this into the script prior to the regex:

shopt -s nocasematch;


You can unset the nocasematch shell option using the following: shopt -u nocasematch

Here's a more complete example:

mytext="eXistenZ"
function testText
{
if [[ $mytext =~ existenz ]];
then
echo "yep"
else
echo "nope"
fi;
}
testText
shopt -s nocasematch
testText


If you run that script then the output is:

nope
yep
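As an aside, if you'd rather not flip shell options at all, bash 4 can lower-case the variable at the point of comparison instead (a small sketch):

mytext="eXistenZ"
if [[ ${mytext,,} =~ existenz ]];
then
echo "yep"
fi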

Sunday, June 13, 2010

R related links

These are just a few R links I think are interesting and worth a further read:

  1. Colour palettes in R.

  2. Using R for introductory statistics

  3. Web-friendly visualisations in R

  4. A different way to view probability densities

  5. Thoughts on Making Data Work



Also of note is Google's announcement (on June 2nd) that they were working with the USPTO to make all granted patents, trademarks and published applications freely available for download - see this post.

Tuesday, June 1, 2010

I love subshells

Sometimes it's the little things in life that give us the greatest pleasures. I love subshells. There, I admitted it.

Say you want to do something with the output of a program but would like to prepend some output/text that also needs to be operated on. The standard idiom would be to create a file with the prepended output, append your output to this and then operate on the file, but you can remove the intermediate files using a subshell. Here's a contrived example:


> (echo -e "some\toutput\tto\tappend"; perl -lane 'print join("\t", log($F[0])/log(10), @F[1 .. $#F])' ;) | a2ps


Something else I use subshells for a lot is to launch programs in a different directory without having to cd there and then back again:


> (cd ~/workspace/; eclipse &)


Like I said: it's the little things.

Tuesday, May 18, 2010

delete all empty files in a directory

Automatically generating files? Annoyed at all the empty ones? Here's how to purge them:


> for file in $( ls . ); do if [ ! -s "$file" ]; then rm -f "$file"; fi; done


Of course you could just get over your fear of find and take a look at that man page. Perfect for a recursive search and delete all in one:


> find . -empty -delete


simple, no?
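One caveat: -empty also matches empty directories, so if you only want to delete zero-length regular files, add a -type test:

> find . -type f -empty -delete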

Friday, May 14, 2010

escaping quotes in bash variables

I often need to escape quotation characters or other special characters that are being piped into a bash process:

Say you have a mysql table which contains protein names, some of which have the ' character in them (e.g. "Inosine 5' monophosphate dehydrogenase"). If you want to do some bulk processing on these names, you could do something like this:


> mysql -u paulbo -N <<< "SELECT name from proteins" | while read protein_name; do mysql -u paulbo -N <<< "SELECT count(*) FROM data INNER JOIN proteins ON data.protein_id = proteins.protein_id where proteins.name = '${protein_name//\'/\\\'}'"; done


Ah, three backslashes. Why didn't I think of that?

And sometimes you just want to output the text with the special characters all converted into something a bit more amenable (like the old and trusted '_').


> mysql -u paulbo -N <<< "SELECT name from proteins" | while read protein_name; do echo ${protein_name//[-\/\:\'\"\(\) ]/_};done

Thursday, May 6, 2010

Excluding certain files from a directory listing

Ok, well, this will be obvious to anyone who's really read the ls man page, but I only came across it a couple of days ago.

Say you have a directory with tons of files in it, mostly with a single extension, and you want to see what else is in there. Sure, you can use grep, but you can also use ls's built-in --hide flag.


> ls
arg.txt
blarg.txt
foo.txt
...
hidden.csv
interesting.sh
... # and a ton more *.txt files
zlargyblorg.txt
> ls --hide='*.txt'
hidden.csv
interesting.sh


Isn't that useful?
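There's also the closely related -I/--ignore flag; the difference is that --hide is switched off when -a/-A is used, whereas --ignore always applies:

> ls --ignore='*.txt'
hidden.csv
interesting.sh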

Friday, April 30, 2010

batch renaming files with spaces

Why do people keep giving me tons of files with spaces in the filename?

Anyway, here's a good way to get rid of those pesky spaces:


ls * | while read file; do mv "$file" "${file// /_}"; done


First I tried using "for file in `ls *`" but of course the whitespace came back to bite me... The same goes for the mv command: you have to quote "$file" so that the whitespace-ridden filename is treated as a single argument rather than several.
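As an alternative, if the Perl rename utility happens to be installed (sometimes packaged as prename), the whole thing is a one-liner - just check which rename you have first, since the util-linux version takes a different syntax:

> rename 's/ /_/g' *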

Tuesday, April 27, 2010

bash array size

Strangely this caught me off guard. If you use ${#array[@]} to get the 'size' of an array, it actually only returns the number of assigned elements in the array.

e.g.

> array[23]=123
> echo ${#array[@]}
1


Hmm... only 1? Not 24?

As far as I can tell, there's no way around this. Just don't expect this behaviour and fill your arrays wisely.
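You can at least get at the list of assigned indices with ${!array[@]}, which tells you which slots are filled:

> array[23]=123
> echo ${!array[@]}
23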

While we're on bash arrays, remember that you can change the 'join' character for naive printing of arrays by manipulating the IFS (Internal Field Separator) variable. Below I also show that the quotation context is important for this:


> array[0]=1;array[1]=2;array[2]=3;
> echo ${array[*]}
1 2 3
> echo "${array[*]}"
1 2 3
> IFS=","
> echo ${array[*]}
1 2 3
> echo "${array[*]}"
1,2,3
>


Note: It's best practice to store and then restore the original IFS variable.

> ORIG_IFS=$IFS
... do stuff
> IFS=$ORIG_IFS

Thursday, April 15, 2010

Java: Heap Dump on OutOfMemoryError

You can request that the JVM create a heap dump when an OutOfMemoryError is thrown. This is handy if you have a process that consumes a ton of RAM and you don't know why. Set the max heap size to something around 500M or less (it needs to be fairly small if you're going to inspect the heap with 'jhat'). Use the -XX:+HeapDumpOnOutOfMemoryError flag to request the heap dump. This will output to java_pid<pid>.hprof by default. You can set the output filename manually using -XX:HeapDumpPath=<filename>.

e.g.

> java -Xmx100m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dump.hprof com.geekbraindump.MyMemoryHoggingClass

Tuesday, March 23, 2010

Linux: command line cut and paste

It's always annoyed me that I have to open up a file in order to cut and paste the contents into a web browser (I use this a lot for capturing information on an internal wiki). As of 2010-03-23 there are no command line clipboard utilities installed by default (on CentOS anyway).

However, download and install xclip and the clipboard is yours to command.

xclip allows access to both the PRIMARY selection (middle mouse button paste) and the CLIPBOARD selection (standard copy/paste).

By default, piping into xclip puts the text in the PRIMARY selection (middle mouse button).

> echo $RANDOM | xclip
> xclip -o
4807


You can define which selection to input to. Say you want to store text in the CLIPBOARD selection (accessed using the standard copy and paste commands):

> echo $RANDOM | xclip -sel 'clipboard'
> xclip -o -sel 'clipboard'
4807


You can now use edit->paste to output the text. Note that the random number was the same as before; $RANDOM is expanded in the pipeline's subshell, which inherits the parent shell's random seed, so repeated pipelines like this keep handing out the same value. For a new random number each time you have to use a new shell:

> (echo $RANDOM) | cat
1297

Sunday, February 28, 2010

AVR: storing a 2d array in PROGMEM

I've just been fangling with a UV Painter project (a row of LEDs to 'paint' on a glow-in-the-dark wall). I quickly ran out of RAM when adding patterns to the system and learned how to put variables in the Flash program memory instead. Pretty simple and very useful:


#include <avr/pgmspace.h>

const uint8_t mCylonScan[10][N_LED] PROGMEM = {
{255,0,0,0,0,0},
{0,255,0,0,0,0},
{0,0,255,0,0,0},
{0,0,0,255,0,0},
{0,0,0,0,255,0},
{0,0,0,0,0,255},
{0,0,0,0,255,0},
{0,0,0,255,0,0},
{0,0,255,0,0,0},
{0,255,0,0,0,0}
};

Then to access the data you just do the following (where 'i' and 'j' are loop variables):

data = pgm_read_byte(&(mCylonScan[i][j]));
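Putting that together, the read-out loop might look something like this (a sketch assuming N_LED is 6, as in the pattern above, and a hypothetical setLED() routine that drives the hardware):

for (uint8_t i = 0; i < 10; i++) {
  for (uint8_t j = 0; j < N_LED; j++) {
    uint8_t data = pgm_read_byte(&(mCylonScan[i][j]));
    setLED(j, data); /* hypothetical: drive LED j with this value */
  }
}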


See the AVR libc docs for more details.

Friday, February 26, 2010

Manipulating Bash Strings

I'm finding myself looking these up quite a lot, so here's a little cheat sheet of the basics.

Using the string "ABCDEFG12345" as an example:

> string="ABCDEFG12345"
> echo $string
ABCDEFG12345

> #Replacement:
> echo ${string/ABC/___}
___DEFG12345

> #Replacement with character class
> echo ${string/[ABC4]/_}
_BCDEFG12345

> #Replace all occurrences:
> echo ${string//[ABC4]/_}
___DEFG123_5

> #Extract from a defined position in the string
> echo ${string:7}
12345
> echo ${string:7:3}
123

> #substring removal (from the front of the string)
> echo ${string#ABC}
DEFG12345
> echo ${string##ABC} #strips the longest substring match
DEFG12345
> string2="abcABCabc123ABCabc"
> echo ${string2#a*C}
abc123ABCabc
> echo ${string2##a*C}
abc
> # use the % sign to match from the end
> # % for shortest and %% for longest substring match
> echo ${string%45}
ABCDEFG123

Thursday, February 11, 2010

Use csplit to split SDF files (or contextually split any file)

Say you want to split an SDF into individual entities: you could write a Perl script/one-liner (which is what I've been doing for a long time), or you could just use csplit. Thanks to Pat and Jessen for pointing this one out.

e.g. say you had an SDF, test_mols.sdf, with 8 molecules in it and you wanted individual mol files:

> csplit -kzsf "test_mols" -b %02d.mol test_mols.sdf /\$\$\$\$/+1 {*}


This would result in 8 files called test_mols00.mol through test_mols07.mol. Unfortunately these would still contain the SDF delimiter at the end of the file (so, technically these are still SDFs). That's pretty easy to clean up with something like:

> perl -ni -e 'print unless /\$\$\$\$/' *.mol


See the csplit manpage for more details.

Tuesday, February 2, 2010

autolinkification in bugzilla

I often refer to other bugs in a Bugzilla comment. These are 'autolinkified' by bugzilla. I've only just learned that you can refer to a comment in a bug as well and have this 'autolinkified'.

See the Bugzilla hintandtips page.

Short answer:

Bug autolink: "bug 1234"
Comment autolink: "bug 1234, comment 12"
Attachment autolink: "bug 1234, attachment 4"

Friday, January 29, 2010

Regular expressions in bash

You can perform regular expression matching on a variable within an extended test command (see the Conditional Constructs part of the bash manual).

e.g.

prompt> name=foobar.blarg; if [[ $name =~ foo ]]; then echo yep; else echo nope; fi
yep
prompt> name=foobar.blarg; if [[ $name =~ foo[a-c] ]]; then echo yep; else echo nope; fi
yep
prompt> name=foobar.blarg; if [[ $name =~ foo[d-z] ]]; then echo yep; else echo nope; fi
nope
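As a bonus, any capture groups in the regex end up in the BASH_REMATCH array (index 0 holds the whole match):

prompt> name=foobar.blarg; if [[ $name =~ (foo)(bar) ]]; then echo ${BASH_REMATCH[1]}; fi
foo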

Tuesday, January 26, 2010

Convert tab data into HTML tables with a Perl one-liner

A quick one-liner for generating an HTML table from tab-delimited input. Either pipe in your data or include the file as a command line argument.


perl -F'\t' -lane 'BEGIN{print "<table border=1 cellpadding=3 cellspacing=0>"}print "<tr>", (map {"<td>$_</td>"} @F), "</tr>";END{print "</table>"}'


The map is in parentheses so that the closing '</tr>' tag is not slurped in as part of its input list.

Saturday, January 16, 2010

Setting AVR Clock Speed

I'm using gcc/WinAVR to program my AVRs and have just been discovering how to program the clock speed. I'm playing with an ATtiny13 and an ATmega8. Both of these ship with their clocks set to 1MHz by default but both can be clocked to a higher speed.

There are a few things to note:

1. F_CPU is used by the compiler for calculating timings (the most obvious example is the delay.h routines). Setting it has no effect on the actual clock speed, so it needs to match whatever the clock is really running at.

2. The easiest way (for me) to set the clock speed is to program the relevant fuse bits. This was different for the two chips I've been using (ATtiny13 and ATmega8). Note: for fuse bits 1 = unprogrammed and 0 = programmed (this is due to the nature of EEPROM).

This step is made easy when using the Eclipse AVR plugin: there's a GUI/wizard for setting them under Project->Properties->AVR->AVRDude->Fuses. Select the "direct hex values" radio option and then click the "start editor" button.

3. For some MCUs you can dynamically adjust the clock speed in software (I know this is true for the ATtiny13 at least). However, this has to be done within 4 clock cycles of setting the CLKPCE bit (again, this is for the ATtiny13) - see the sketch below, this forum post on avrfreaks.net and pg. 28 of the datasheet.
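As a rough sketch (for the ATtiny13; check your own datasheet for the exact register and bit names), the software clock change looks something like this:

#include <avr/io.h>
#include <avr/interrupt.h>

/* Sketch: switch the system clock prescaler to divide-by-1 at run time.
 * CLKPR must first be written with only CLKPCE set, then with the new
 * CLKPS bits within four clock cycles, so keep interrupts out of the way. */
void set_clock_div1(void)
{
    cli();
    CLKPR = (1 << CLKPCE); /* enable a prescaler change */
    CLKPR = 0;             /* CLKPS3..0 = 0000 -> prescaler of 1 */
    sei();
}

Newer avr-libc versions also provide clock_prescale_set() in <avr/power.h>, which does the same dance for you.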

Here's a good overview of setting the clock for the ATmega8: Electrons - AVR Fuses HOWTO Guide.

Tuesday, January 12, 2010

Bash: reading lines from a file

I guess I don't do this enough to remember it:

When reading a file in a for loop in bash, the following idiom will read each word (whitespace delimited):
for word in `cat file.txt`; do echo $word; done

If you want to grab the whole line then you can do this:
while read line; do echo $line; done < file.txt

or
cat file.txt | while read line; do echo $line; done
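Two caveats with the piped version: the while loop runs in a subshell, so any variables you set inside it are gone once the loop ends, and a bare read strips leading whitespace and mangles backslashes. The usual belt-and-braces form is:
while IFS= read -r line; do echo "$line"; done < file.txt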

Friday, January 8, 2010

Essential Eclipse Keyboard Shortcuts (navigation)

There are a few shortcuts I use all the time for navigating in Eclipse and I've just learned a few more useful ones (I was looking for quick ways to jump between editor windows).

Here's the essential list (IMO):

Ctrl+E (go to other open editors - opens selection box)
Ctrl+Q (jump to last edit location)
Ctrl+O (jump to any member/method/inner-class in the current editor)
Ctrl+shift+T (open any type)
Ctrl+shift+R (open any file)
Ctrl+L (jump to a particular line number)
Ctrl+T (go to a supertype/subtype - multiple presses toggle between super/sub)
Alt+left/right arrow (jump through visited files)
Ctrl+. Ctrl+, (navigate up and down through error/warning locations)

Sunday, January 3, 2010

C rand() and random() functions

There are two primary random functions to be aware of in stdlib.h: random() and rand(). The main difference is in the range of values returned by the two functions.

random() returns a pseudo random number in the range 0 -> 0x7FFFFFFF = 0 -> 2,147,483,647 (RANDOM_MAX).
rand() returns values in the range 0 -> 0x7FFF = 0 -> 32,767 (RAND_MAX).

I had mistakenly been using one in place of the other in some micro-controller code and had spent some time wondering why it wasn't behaving as expected...
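To make the difference concrete, here's a small sketch (not from the original project) of the kind of scaling that goes wrong when the two get mixed up - on avr-libc RAND_MAX is 0x7FFF while random() runs all the way up to RANDOM_MAX (0x7FFFFFFF):

#include <stdint.h>
#include <stdlib.h>

/* Map a random value onto 6 LEDs.
 * On avr-libc RAND_MAX is 0x7FFF, but RANDOM_MAX (for random()) is 0x7FFFFFFF,
 * so scaling by the wrong maximum - e.g. rand() / (RANDOM_MAX / 6) - would
 * almost always land on LED 0. */
uint8_t pick_led(void)
{
    return (uint8_t)(rand() % 6);   /* rand() paired with its own range */
}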