GeekBrainDump: March 2012

Friday, March 9, 2012

Timing your R code

Ever wanted to find out which of a set of methods is faster in R? Well, there's a very easy way to time your code: system.time.
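As a quick sketch of what system.time hands back: it returns a proc_time object, and the individual timings can be pulled out by name. The expression being timed here is an arbitrary stand-in chosen for illustration, not something from my data.

```r
# Time an arbitrary expression (a stand-in chosen only for illustration).
timing <- system.time(
  squares <- sapply(1:100000, function(x) x^2)
)

# Printing shows the user, system, and elapsed times in seconds.
print(timing)

# Individual values can be extracted by name.
elapsed_seconds <- timing[["elapsed"]]
```

The assignment inside system.time still takes effect, so squares is available afterwards.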

For example, I wanted to compare the speed of using subset's "select" option against subsetting first and then restricting the returned data.frame to a single column.

Here are examples showing the comparison I mean. Assume that "molecule_data" is a data.frame with at least one field (name) and that name_list is a vector of molecule names that I'm interested in.

Here is an example of using subset's "select" restriction mechanism:

mol_names <-
   unique( subset(molecule_data, name %in% name_list, select="name") )

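Here is an example of restricting to a single column after subsetting (note that this version returns a vector, while the "select" version returns a one-column data.frame):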

mol_names <-
   unique( subset(molecule_data, name %in% name_list)$name )

I found out that, for my data, using subset's select option was ~50x faster.

system.time(
   mol_names <-
      unique(subset(molecule_data, name %in% name_list, name)))

 user  system elapsed
0.001  0.000  0.001

system.time(
   mol_names <-
      unique(subset(molecule_data, name %in% name_list)$name))

 user  system elapsed
0.055  0.000  0.056

These timings are too small to be reliable (especially the first one), so let's run each operation a hundred times to get a better estimate:

system.time(
   for(i in 1:100){
     mol_names <- unique(subset(molecule_data, name %in% name_list, name))
   }
)

 user  system elapsed
0.131  0.000  0.135

system.time(
   for(i in 1:100){
      mol_names <- unique(subset(molecule_data, name %in% name_list)$name)
   }
)

 user  system elapsed
5.607  0.161  5.802

You can see that the time difference holds up over multiple runs: 0.135 vs 5.802 seconds elapsed, or roughly 43x. Subset's "select" is the clear winner!
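If you want to repeat this kind of comparison without retyping the loop, here is a minimal sketch of a reusable wrapper. The helper time_reps, the two wrapper functions, and the stand-in data (its size and the extra weight column) are all made up for illustration, so the numbers you get will not match mine.

```r
# Hypothetical stand-in data; the sizes and the extra column are invented.
set.seed(1)
molecule_data <- data.frame(
  name   = sample(paste0("mol_", 1:500), 20000, replace = TRUE),
  weight = runif(20000),
  stringsAsFactors = FALSE
)
name_list <- paste0("mol_", 1:20)

# Hypothetical helper: call f() n times and return the total elapsed seconds.
time_reps <- function(f, n = 100) {
  timing <- system.time(for (i in seq_len(n)) f())
  timing[["elapsed"]]
}

# The two approaches being compared, wrapped as functions.
with_select <- function() {
  unique(subset(molecule_data, name %in% name_list, select = name))
}
post_subset <- function() {
  unique(subset(molecule_data, name %in% name_list)$name)
}

# Total elapsed time for 100 runs of each approach.
elapsed <- c(
  with_select = time_reps(with_select),
  post_subset = time_reps(post_subset)
)
print(elapsed)
```

Wrapping each approach in a function keeps the timing loop in one place, so adding a third candidate is just one more entry in the vector.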
