Sunday, February 13, 2011

Reading stdin from a pipe for command-line R shenanigans

I think I rely on command-line tomfoolery a little bit too much. Today I wanted to pipe some data into R and have the commands to run defined on the command line as well. This is the kind of thing I do with Perl or bash to get some quick answers and I'd love to add R to the repertoire.

So I tried a number of things, all of which failed. This is how I got it to work for me:
> perl -le 'printf "%.4f\t%.4f\n", rand(), rand() for 1 .. 20' \
 | R --vanilla --slave -e\
 "data=read.delim(pipe('cat /dev/stdin'), header=F);\
  cor.test(data\$V1, data\$V2)"

You have to remember to escape any shell-special characters in the R script (the $ in data$V1 and data$V2, in this case), since the whole R expression is wrapped in double quotes.

3 comments:

  1. Thanks, this is nice, but how do I read LINE BY LINE, run a check on each line, and, IF it passes that check, add it to the data? I don't want to read the whole table into R at once, since my data is big.

    ReplyDelete
  2. Depends on the complexity of your check. If it's a simple text match then piping the data through 'grep' before piping into R would work. For a more complex check I'd pre-process the data with perl/awk or similar.

    But you can do it in R. Take a look at the following StackOverflow thread for a couple of examples:

    http://stackoverflow.com/questions/4106764/what-is-a-good-way-to-read-line-by-line-in-r

    ReplyDelete
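A pre-filter of the kind suggested above might look like this (a sketch, not from the original discussion; the first-column-greater-than-0.5 condition is made up for illustration):

```shell
# Generate two random columns, keep only rows whose first field exceeds 0.5,
# then hand the survivors to R. The filtering happens in awk, outside R,
# so R never loads the rejected lines.
perl -le 'printf "%.4f\t%.4f\n", rand(), rand() for 1 .. 20' \
  | awk -F'\t' '$1 > 0.5' \
  | R --vanilla --slave -e \
    "data=read.delim(pipe('cat /dev/stdin'), header=F); summary(data)"
```

Any condition awk can express on the fields works in place of `$1 > 0.5`, and for plain text matches a `grep` stage is even simpler.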
  3. I had read that page, and it does have good examples. But my problem is:
    I want to use the R code in a Pig script. The R code I have written works when I do cat Myexampl.txt | R --vanilla --slave -f MyCode.R
    However, when I call it in Pig (using DEFINE and then STREAM ... THROUGH), it returns all NA values. I am confused as to why.

    ReplyDelete