Wednesday, November 21, 2012

hg over ssh

I'm working with a repo that I can't access over HTTP, so it's fortunate that Mercurial also works well over ssh. You just need to tweak the basic configuration a little to get things working smoothly.

Add the following to the client .hgrc:
[ui]
remotecmd=<path_to_hg>
Where <path_to_hg> is the path to the hg executable on the main repo machine.

This was from StackOverflow: cloning-a-mercurial-repository-over-ssh

Now, I'm working very remotely from the main repo (6000 miles) so it is worth using compression. Just add the following to the [ui] section of .hgrc:
ssh = ssh -C
You could also configure ssh to use compression by default (see details in link below).

This page has a good overview of using ssh with mercurial: collaborating-with-other-people
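
For reference, here is a minimal sketch of how the two settings above sit together in one [ui] section. The hg path, host name and repository path are hypothetical placeholders, and the snippet writes to a scratch file rather than to your real ~/.hgrc, so it is safe to run as-is.

```shell
#!/usr/bin/env bash
# Hypothetical value: the location of hg on the machine hosting the main repo.
remote_hg=/usr/local/bin/hg

# Scratch file for the demo; in practice these lines go in ~/.hgrc on the client.
hgrc=$(mktemp)

cat > "$hgrc" <<EOF
[ui]
remotecmd = $remote_hg
ssh = ssh -C
EOF

cat "$hgrc"

# With the settings in the real ~/.hgrc, a clone would look something like this
# (hypothetical user, host and path; the double slash means an absolute path):
#   hg clone ssh://user@repo.example.com//srv/hg/project
```

If you copy the result into ~/.hgrc, check first whether a [ui] section already exists and merge the two lines into it.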

Thursday, October 18, 2012

bash while loops and pipes

In bash, if you pipe into a while loop, the while loop is run in a subshell. This means that you're going to be very disappointed if you were hoping to capture data/variables within the loop.

The workaround is to remove the pipe, which is possible through process substitution.

e.g.
echo -e "one\ntwo\nthree" | \
 while read name;
  do val=$name;
  echo $val;
 done;
echo $val
This outputs:
one
two
three

Notice that the final value "three" is not printed twice: the last echo prints only an empty line, because 'val' was set inside the subshell and is still empty in the parent shell.

And the workaround:
while read name; 
 do val=$name;
 echo $val;
done < <(echo -e "one\ntwo\nthree")
echo $val
which now outputs:
one
two
three
three
Success! (or, if you're a "Bill and Ted's Bogus Journey" fan - Station!).
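
The same effect is easy to see with a counter. This is a small sketch that assumes bash with its default settings (in particular, the lastpipe option switched off); the variable names are made up for the illustration.

```shell
#!/usr/bin/env bash
# Piped version: the loop body runs in a subshell, so the changes are lost.
count_piped=0
printf 'one\ntwo\nthree\n' | while read -r name; do
  count_piped=$((count_piped + 1))
  last_piped=$name
done

# Process substitution: the loop runs in the current shell, so the changes stick.
count_subst=0
while read -r name; do
  count_subst=$((count_subst + 1))
  last_subst=$name
done < <(printf 'one\ntwo\nthree\n')

echo "piped:                count=$count_piped last='${last_piped:-}'"
echo "process substitution: count=$count_subst last='${last_subst:-}'"
```

The piped loop leaves the counter at 0, while the process substitution loop leaves it at 3 with "three" as the last value read.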

Monday, October 8, 2012

Mercurial file patterns

I wanted to add all the scripts in a directory tree to the local Mercurial repository. I was going to do something like this:
> find . -name \*.sh | xargs hg add
This is nice enough (find and xargs go well together), but you can also do it with Mercurial alone, using patterns, e.g.
> hg add 'glob:**.sh'
The two most useful patterns (for me) are:
  • '*' matches any text in the current directory only
  • '**' matches anything in the entire tree
The patterns are much richer than this though; see hg help patterns for more on what's available (including regexes).

Tuesday, June 5, 2012

Removing VMWare Player - blank grey dialogue box

I've just spent way too long trying to upgrade VMWare Player on my laptop.  The issue was that, when I tried to uninstall the incumbent version, I was presented with an unhelpful blank grey dialogue box...  Notice the lack of, well, anything.



The box didn't go away, even after a couple of hours of waiting. I left it because I was wondering if something was being unpacked in the background. This kind of information is frustratingly difficult for me to get at on a Windows machine... Eventually, I had to use Task Manager to kill it. I went through a few iterations of this, trying different ways of uninstalling or running the newer installer (even setting different default browsers, since the contents of the box turned out to be HTML and I thought it might be a compatibility issue, but computer says "no").


I looked around for solutions and came across the following, which worked for me:


http://superuser.com/questions/245424/vmware-workstation-install-problem


The most pertinent advice was:

  • To uninstall any old version, go to C:\Windows\Installer
  • Add the "Authors" column and sort by it
  • One of the .msi files will have a "VMware" author
  • Double-click it and follow through with the uninstall steps
After uninstalling the older VMWare Player using this method, I was then able to install the latest version and get playing with my brand spanking new ACE image.  Success!

Friday, March 9, 2012

Timing your R code

Ever wanted to find out which of a set of methods is faster in R?   Well, there's a very easy way to time your code: system.time.

For example: I wanted to compare the speed of using subset's "select" option against restricting the full returned data.frame afterwards.

Here are examples showing the comparison I mean. Assume that "molecule_data" is a data.frame with at least one field (name) and that name_list is a vector of molecule names that I'm interested in.

Here is an example of using subset's "select" restriction mechanism:
mol_names <-
   unique( subset(molecule_data, name %in% name_list, select="name") )
Here is an example of restricting to a single column after subsetting:
mol_names <-
   unique( subset(molecule_data, name %in% name_list)$name )

I found out that, for my data, using subset's select option was ~50x faster.

system.time(
   mol_names <-
      unique(subset(molecule_data, name %in% name_list, name)))

 user  system elapsed
0.001  0.000  0.001

system.time(
   mol_names <- 
      unique(subset(molecule_data, name %in% name_list)$name))

 user  system elapsed
0.055  0.000  0.056
These timings are unreliable given how small they are (especially the first one), so let's run the operation a hundred times to get a better estimate:
system.time(
   for(i in 1:100){
     mol_names <- unique(subset(molecule_data, name %in% name_list, name))
   }
)

 user  system elapsed
0.131  0.000  0.135

system.time(
   for(i in 1:100){
      mol_names <- unique(subset(molecule_data, name %in% name_list)$name)
   }
)

 user  system elapsed
5.607  0.161  5.802

You can see that the time difference holds up over multiple runs.  Subset's "select" is the clear winner!