For example, I wanted to compare the speed of using subset's "select" argument against restricting the full returned data.frame to a single column afterwards.
Here are examples showing the comparison I mean. Assume that "molecule_data" is a data.frame with at least one field (name) and that name_list is a vector of molecule names that I'm interested in.
Here is an example of using subset's "select" restriction mechanism:
mol_names <- unique( subset(molecule_data, name %in% name_list, select="name") )
Here is an example of restricting to a single column after subsetting:
mol_names <- unique( subset(molecule_data, name %in% name_list)$name )
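One thing worth checking is that the two forms do not return quite the same kind of object. The sketch below uses a small made-up data.frame (the real molecule_data isn't shown here, so the column "mass" and all values are hypothetical) to show that the "select" form gives back a one-column data.frame, while the $name form gives back a plain vector holding the same names.

```r
# Hypothetical toy data; the real molecule_data is not shown in this post.
molecule_data <- data.frame(
  name = c("water", "ethanol", "water", "benzene", "ethanol"),
  mass = c(18.02, 46.07, 18.02, 78.11, 46.07),
  stringsAsFactors = FALSE
)
name_list <- c("water", "ethanol")

# "select" form: subset() keeps only the name column,
# so unique() works on a one-column data.frame.
by_select <- unique(subset(molecule_data, name %in% name_list, select = "name"))

# Post-restriction form: subset() keeps every column,
# then $name pulls out a vector before unique() runs.
by_dollar <- unique(subset(molecule_data, name %in% name_list)$name)

str(by_select)  # a data.frame with a single column
str(by_dollar)  # a character vector

# The names themselves agree once the column is extracted.
same_names <- identical(by_select$name, by_dollar)
```

If a vector is what you need downstream, by_select$name gets you there from the "select" form.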
I found out that, for my data, using subset's select option was ~50x faster.
system.time( mol_names <- unique(subset(molecule_data, name %in% name_list, name)) )
   user  system elapsed
  0.001   0.000   0.001
system.time( mol_names <- unique(subset(molecule_data, name %in% name_list)$name) )
   user  system elapsed
  0.055   0.000   0.056
These timings are unreliable given how small they are (especially the first one), so let's run the operation a hundred times to get a better estimate:
system.time( for(i in 1:100){ mol_names <- unique(subset(molecule_data, name %in% name_list, name)) } )
   user  system elapsed
  0.131   0.000   0.135
system.time( for(i in 1:100){ mol_names <- unique(subset(molecule_data, name %in% name_list)$name) } )
   user  system elapsed
  5.607   0.161   5.802
You can see that the time difference holds up over multiple runs. Subset's "select" is the clear winner!
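If you want to try the same comparison yourself, here is a rough sketch on synthetic data. Everything about the data is an assumption (the row count, the 30 extra numeric columns, the "mol1" style names), so the numbers you get will not match mine. My guess is that the gap depends mostly on how many columns subset has to carry along when "select" is not given, so a narrow table may show far less difference.

```r
set.seed(1)
n_rows <- 20000

# Hypothetical wide table: a name column plus 30 numeric columns (X1..X30).
molecule_data <- data.frame(
  name = sample(paste0("mol", 1:500), n_rows, replace = TRUE),
  matrix(rnorm(n_rows * 30), nrow = n_rows),
  stringsAsFactors = FALSE
)
name_list <- paste0("mol", 1:50)

# Time 100 repetitions of the "select" form.
time_select <- system.time(
  for (i in 1:100) {
    a <- unique(subset(molecule_data, name %in% name_list, select = "name"))
  }
)

# Time 100 repetitions of the post-restriction form.
time_dollar <- system.time(
  for (i in 1:100) {
    b <- unique(subset(molecule_data, name %in% name_list)$name)
  }
)

print(time_select["elapsed"])
print(time_dollar["elapsed"])
```

Changing the number of extra columns in the matrix() call is an easy way to see how much the table width matters on your machine.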