So, I tried a number of things, all of which failed. This is how I got it to work for me:
> perl -le 'printf "%.4f\t%.4f\n", rand(), rand() for 1 .. 20' \
    | R --vanilla --slave -e \
    "data=read.delim(pipe('cat /dev/stdin'), header=F); cor.test(data\$V1, data\$V2)"
You have to remember to escape any special characters in the R script ($ in this case).
Thanks, it is nice, but how do I read LINE BY LINE and run a check on each line, and only IF it passes that check, add it to the data? I don't want to read the whole table into R at once, since my data is big.
Depends on the complexity of your check. If it's a simple text match, then piping the data through 'grep' before piping into R would work. For a more complex check I'd pre-process the data with perl/awk or similar.
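For the simple case, a sketch of that pre-filtering idea (the "first column > 0.5" check is purely illustrative; substitute your own test, and pipe the survivors into R exactly as in the one-liner above):

```shell
# Generate 20 random tab-separated pairs, then keep only the lines
# whose first column exceeds 0.5. Everything else never reaches R.
perl -le 'printf "%.4f\t%.4f\n", rand(), rand() for 1 .. 20' \
  | awk '$1 > 0.5'
```

Because the filter runs in the pipe, R only allocates memory for rows that pass.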
But you can also do it in R. Take a look at the following StackOverflow thread for a couple of examples:
http://stackoverflow.com/questions/4106764/what-is-a-good-way-to-read-line-by-line-in-r
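If you'd rather keep the check inside R, a minimal sketch along the lines of the answers in that thread (the tab separator and the "first column > 0.5" check are placeholders for your own format and test):

```r
# Read stdin one line at a time, keeping only rows that pass a check.
con <- file("stdin", open = "r")
kept <- list()
while (length(line <- readLines(con, n = 1)) > 0) {
  fields <- as.numeric(strsplit(line, "\t")[[1]])
  if (!any(is.na(fields)) && fields[1] > 0.5) {  # your check goes here
    kept[[length(kept) + 1]] <- fields
  }
}
close(con)
data <- do.call(rbind, kept)  # matrix of only the passing rows
```

Note this is slower than a vectorised read.delim, so it's only worth it when the data really won't fit in memory.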
I had read that page, and indeed it has good examples. But my problem is:
I want to use the R code in a PIG script. The R code that I have written works when I do: cat Myexampl.txt | R --vanilla --slave -f MyCode.R
However, when I call it in PIG (using DEFINE and then STREAM ... THROUGH), it returns all NA values. I am confused why.