Thursday, March 7, 2019

R: use purrr::map_dfc to update all cells in a data.frame

I've just loaded a large data.frame generated from a set of SDFs from a collaborator. The 'empty' cell value is set as a period ('.') which isn't great for general analysis.

A quick way of converting these to NA is to apply stringr::str_replace to every column using the map_dfc function from the purrr package.


# replace all instances of '.' in all columns without changing anything else.
library(purrr)    # map_dfc and the %>% pipe
library(stringr)  # str_replace
dt <- data.frame(
  a=sample(LETTERS, 10),
  b=sample(c('.', 1, 2, 3), 10, replace=TRUE),
  c=sample(c('.', 'rick', 'morty', 'summer'), 10, replace=TRUE))
# print the data.frame
dt
# use purrr::map_dfc to loop through the columns of the data.frame;
# the pattern '^\\.$' only matches cells that are exactly '.'
dt %>%
  purrr::map_dfc(
    function(x) str_replace(x, '^\\.$', replacement=NA_character_))
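
This should output something like the following: first the original data.frame, then the result of map_dfc, in which all occurrences of '.' have been replaced with NA. The values are sampled at random, so your rows will differ.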
   a b      c
1  T 2  morty
2  D 1 summer
3  P 1   rick
4  O . summer
5  V 3 summer
6  M 3 summer
7  Z .   rick
8  F 2      .
9  J 2   rick
10 E 3   rick

  a     b     c
 1 T     2     morty
 2 D     1     summer
 3 P     1     rick
 4 O     NA    summer
 5 V     3     summer
 6 M     3     summer
 7 Z     NA    rick
 8 F     2     NA
 9 J     2     rick
10 E     3     rick
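
For comparison, here is a minimal sketch of the same replacement in base R, without purrr or stringr. It is not the approach used above, just an alternative, and it assumes every column is character (hence stringsAsFactors=FALSE). The data.frame dt2 is a small made-up example with fixed values so the result is reproducible.

```r
# small fixed example (hypothetical values, not the collaborator's data)
dt2 <- data.frame(
  a=c('T', 'D', 'P'),
  b=c('2', '.', '1'),
  c=c('.', 'rick', 'morty'),
  stringsAsFactors=FALSE)

# dt2 == '.' is a logical matrix marking the cells that are exactly '.';
# using it as an index assigns NA to just those cells
dt2[dt2 == '.'] <- NA

# count the NA values per column to check the replacement
na_counts <- colSums(is.na(dt2))
na_counts
```

Column b is still character after the replacement; as.numeric(dt2$b) would convert it if numbers are needed.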
