Ever wanted to find out which of several approaches is faster in R? Well, there's a very easy way to time your code: system.time.
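As a quick sketch of what system.time gives back: the Sys.sleep call here is only a stand-in workload I picked for illustration, and elapsed_seconds is just a name I made up for the extracted value.

```r
# Stand-in workload (an assumption for illustration): sleep for 0.2 seconds.
timing <- system.time(Sys.sleep(0.2))
print(timing)

# "user" and "system" are CPU seconds charged to the R process
# (in user code and in the kernel, respectively); "elapsed" is wall-clock time.
# Sleeping uses almost no CPU, so only "elapsed" should be noticeably above zero.
elapsed_seconds <- timing[["elapsed"]]
```

The result is a small named numeric vector, so individual fields can be pulled out by name when you want to compare runs programmatically.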
For example, I wanted to compare two ways of pulling a single column out of a subset: using subset's "select" argument, versus subsetting the full data.frame and extracting the column from the result afterwards.
Here are the two versions I mean. Assume that "molecule_data" is a data.frame with at least one column (name) and that name_list is a vector of the molecule names I'm interested in.
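If you want to try this yourself, here is a minimal sketch of stand-in data. The real molecule_data is not shown in this post, so the column "mass", the row counts, and the "mol_0001"-style names are all assumptions made up for illustration.

```r
# Hypothetical stand-in data; sizes and the extra "mass" column are invented.
set.seed(42)
n_rows <- 100000
molecule_data <- data.frame(
  name = sample(sprintf("mol_%04d", 1:5000), n_rows, replace = TRUE),
  mass = runif(n_rows, min = 50, max = 500),
  stringsAsFactors = FALSE
)

# The molecule names we care about: here simply the first 50.
name_list <- sprintf("mol_%04d", 1:50)
```

Your own timings will depend heavily on how many rows and columns the data.frame has, so treat any numbers from this toy data as indicative only.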
Here is an example using subset's "select" argument:
mol_names <-
unique( subset(molecule_data, name %in% name_list, select="name") )
And here is an example of restricting to a single column after subsetting:
mol_names <-
unique( subset(molecule_data, name %in% name_list)$name )
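One thing worth knowing before comparing the two: they do not return quite the same kind of object. With "select", subset keeps the result as a one-column data.frame, while the $name version gives a plain vector. A small sketch on made-up data (the molecule names, masses, and the by_select / by_dollar variable names are mine, purely for illustration):

```r
# Tiny invented data set, just to inspect the return types.
molecule_data <- data.frame(
  name = c("water", "ethanol", "water", "benzene", "ethanol"),
  mass = c(18.02, 46.07, 18.02, 78.11, 46.07),
  stringsAsFactors = FALSE
)
name_list <- c("water", "ethanol")

by_select <- unique(subset(molecule_data, name %in% name_list, select = "name"))
by_dollar <- unique(subset(molecule_data, name %in% name_list)$name)

class(by_select)  # "data.frame"
class(by_dollar)  # "character"

# The names themselves are the same once the column is extracted.
same_names <- identical(by_select$name, by_dollar)
```

So if downstream code expects a vector, the "select" version needs one extra step (by_select$name) to get there.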
I found out that, for my data, using subset's select option was ~50x faster.
system.time(
mol_names <-
unique(subset(molecule_data, name %in% name_list, name)))
user system elapsed
0.001 0.000 0.001
system.time(
mol_names <-
unique(subset(molecule_data, name %in% name_list)$name))
user system elapsed
0.055 0.000 0.056
These timings are too small to be reliable (especially the first one), so let's run each operation a hundred times to get a better estimate:
system.time(
for(i in 1:100){
mol_names <- unique(subset(molecule_data, name %in% name_list, name))
}
)
user system elapsed
0.131 0.000 0.135
system.time(
for(i in 1:100){
mol_names <- unique(subset(molecule_data, name %in% name_list)$name)
}
)
user system elapsed
5.607 0.161 5.802
You can see that the time difference holds up over multiple runs: 0.135 versus 5.802 seconds elapsed, roughly a 43x gap. Subset's "select" is the clear winner!
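If you find yourself doing this often, the loop-inside-system.time pattern can be wrapped up in a small function. This is only a sketch: time_runs is a hypothetical helper of my own, not something from base R, and the data below is invented so the snippet runs on its own.

```r
# Hypothetical helper: total elapsed seconds for n calls of f.
time_runs <- function(f, n = 100) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

# Invented stand-in data so the example is self-contained.
set.seed(1)
molecule_data <- data.frame(
  name = sample(sprintf("mol_%04d", 1:2000), 20000, replace = TRUE),
  mass = runif(20000, min = 50, max = 500),
  stringsAsFactors = FALSE
)
name_list <- sprintf("mol_%04d", 1:50)

# Time both variants with the same number of repetitions.
t_select <- time_runs(function() unique(subset(molecule_data, name %in% name_list, name)))
t_dollar <- time_runs(function() unique(subset(molecule_data, name %in% name_list)$name))

print(c(select = t_select, dollar = t_dollar))
```

The size of the gap will vary with your data and R version, so it is worth rerunning this on your own data.frame rather than trusting my numbers.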