There's a handy R built-in function for calculating the empirical cdf called ecdf. I had 40 separate distributions (molecular weight for different series of molecules) and wanted to look at the cdf for this. In R it's really straightforward. Here's an example where we simulate some molecular weight data and plot the CDF for the whole set:
# create the data
R> data <- data.frame(mw=rnorm(10000, mean=400, sd=50), name=1:10000, group=sample(c('a','b','c','d'), 1000, replace=T))
R> mw_cdf <- data.frame(cdf=ecdf(data$mw)(data$mw)*100, mw=data$mw)
R> qplot(mw, cdf, data=mw_cdf, geom="step", xlab="Molecular weight", ylab="Total (%)", main="CDF of MW")
The ecdf() call returns a function (of class "ecdf") which you can call with a MW to calculate the probability of seeing a molecule of this, or smaller, MW in your set.
The R code above produces the following plot:
I found this useful for looking at the MW distributions for disparate groups of molecules, both together and separately. If you're going to apply a molecular weight cutoff when analysing a set of molecules, it's good to get an idea of how many you'll be excluding and this provides a very quick way of seeing that information.
Tuesday, December 6, 2011
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment