For example, I wanted to compare the speed of using subset's "select" argument against restricting the full returned data.frame to a single column afterwards.
Here are examples showing the comparison I mean. Assume that "molecule_data" is a data.frame with at least one column (name) and that name_list is a vector of the molecule names I'm interested in.
Here is an example of using subset's "select" restriction mechanism:
mol_names <- unique( subset(molecule_data, name %in% name_list, select="name") )
Here is an example of restricting to a single column after subsetting:
mol_names <- unique( subset(molecule_data, name %in% name_list)$name )
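To make the two forms easy to try, here is a minimal sketch on made-up data. The contents of molecule_data and name_list below are hypothetical stand-ins (the real data is not shown here), and the weight column is only there so the data.frame has more than one column. It also shows that the two forms do not return the same kind of object: the "select" version gives a one-column data.frame, while the $name version gives a plain vector.

```r
# Hypothetical stand-in data; the real molecule_data is not shown in this post.
set.seed(1)
molecule_data <- data.frame(
  name   = sample(c("water", "ethanol", "glucose", "benzene"), 1000, replace = TRUE),
  weight = runif(1000),
  stringsAsFactors = FALSE
)
name_list <- c("water", "glucose")

# Restrict to one column inside subset(): the result is a one-column data.frame.
via_select <- unique(subset(molecule_data, name %in% name_list, select = "name"))

# Restrict after subsetting: the result is a plain character vector.
via_dollar <- unique(subset(molecule_data, name %in% name_list)$name)

# Same set of names either way, but a different container.
print(is.data.frame(via_select))
print(is.character(via_dollar))
print(setequal(via_select$name, via_dollar))
```

If you need a vector from the "select" version, pulling out via_select$name at the end should be enough.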
I found out that, for my data, using subset's select option was ~50x faster.
system.time(
   mol_names <-
      unique(subset(molecule_data, name %in% name_list, name)))
 user  system elapsed
0.001  0.000  0.001
system.time(
   mol_names <-
      unique(subset(molecule_data, name %in% name_list)$name))
 user  system elapsed
0.055  0.000  0.056
These timings are unreliable given how small they are (especially the first one), so let's run each operation a hundred times to get a better estimate:
system.time(
   for(i in 1:100){
     mol_names <- unique(subset(molecule_data, name %in% name_list, name))
   }
)
 user  system elapsed
0.131  0.000  0.135
system.time(
   for(i in 1:100){
      mol_names <- unique(subset(molecule_data, name %in% name_list)$name)
   }
)
 user  system elapsed
5.607  0.161  5.802
You can see that the time difference holds up over multiple runs. Subset's "select" is the clear winner!
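The repeat-and-time pattern above can be wrapped in a small helper so both variants are measured the same way. This is only a sketch: time_reps, with_select and post_subset are names I made up for illustration, and the synthetic data is an assumption, so the ratio it prints will depend on your machine and on how many rows and columns your own data has.

```r
# Hypothetical helper: total elapsed seconds for n evaluations of f().
time_reps <- function(f, n = 100) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

# Synthetic stand-in data (assumption: the real molecule_data is larger and wider).
set.seed(1)
molecule_data <- data.frame(
  name   = sample(sprintf("mol_%03d", 1:500), 20000, replace = TRUE),
  mass   = runif(20000),
  charge = rnorm(20000)
)
name_list <- sprintf("mol_%03d", 1:50)

# The two variants compared in this post, wrapped as functions.
with_select <- function() unique(subset(molecule_data, name %in% name_list, name))
post_subset <- function() unique(subset(molecule_data, name %in% name_list)$name)

timings <- c(select = time_reps(with_select), post = time_reps(post_subset))
print(timings)

# Ratio of post-subsetting time to "select" time (may be Inf if a timing rounds to 0).
print(timings[["post"]] / timings[["select"]])
```

Raising n gives steadier numbers when a single call is too fast to measure.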
 
 
 