3 votes

R dplyr: Trouble with non-standard evaluation. I would like to use dynamic variable names in filter and mutate.

I have created a reproducible example to illustrate the problem I am having with non-standard evaluation in R (dplyr). I would like to use dynamic variable names in the scenario below:

# Given a data frame of patient data, I need to find records containing date logic errors.
# My datasets are enormous but here is a tiny example

library(dplyr)

patientData <- data.frame(
  patientID = 1:20,
  birth_d = seq(as.Date("2010-01-01"), by = 90, length.out = 20),
  treat_d = seq(as.Date("2011-01-01"), by = 90, length.out = 20),
  death_d = seq(as.Date("2012-01-01"), by = 90, length.out = 20)
)

# To create some random records that will be in error (death_d before birth_d, birth_d after treat_d, etc):

patientData$birth_d[5] <- as.Date("2017-01-01")
patientData$death_d[7] <- as.Date("2001-01-01")
patientData$treat_d[10] <- as.Date("2018-01-01")
patientData$birth_d[12] <- as.Date("2018-05-05")

# To determine which records have birth_d after death_d I could do the following:

badRecords <- patientData %>% filter(death_d < birth_d)

# OR

badRecords <- patientData %>% mutate(dateDiff = death_d - birth_d) %>% filter(dateDiff < 0)

# But in my large application (with lots and lots of date variables) 
# I want to be able to use the date field names as *variables* and, using one date pair at a time,
# determine which records have dates out of sequence. For example,

firstDateName <- "birth_d"
secondDateName <- "death_d"

# I would like to do this, but it doesn't work
badRecords <- patientData %>% filter(!!firstDateName > !!secondDateName)

# This doesn't work... 
badRecords <- patientData %>% mutate(dateDiff = !!secondDateName - !!firstDateName) %>% filter(dateDiff < 0)

# Neither does this... it creates a dateDiff data frame.. with 20 duplicate records
badRecords <- patientData %>% mutate(dateDiff = .[secondDateName] - .[firstDateName]) %>% filter(dateDiff < 0)


3 votes

G. Grothendieck Points 40825

1) rlang Use sym like this:

library(dplyr)
library(rlang)

firstDateName <- sym("birth_d")
secondDateName <- sym("death_d")
badRecords <- patientData %>% filter(!!firstDateName > !!secondDateName)

giving:

> badRecords
  patientID    birth_d    treat_d    death_d
1         5 2017-01-01 2011-12-27 2012-12-26
2         7 2011-06-25 2012-06-24 2001-01-01
3        12 2018-05-05 2013-09-17 2014-09-17
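
In the question the names were plain strings, so !! injected the string itself rather than a reference to the column. Once the names are symbols, the same unquoting should also work inside mutate, which the question asked about. A minimal sketch, rebuilding the question's patientData so the block runs on its own; the parentheses around each !! are only there to make the operator precedence explicit:

```r
library(dplyr)
library(rlang)

# Rebuild the example data from the question so this block is self-contained
patientData <- data.frame(
  patientID = 1:20,
  birth_d = seq(as.Date("2010-01-01"), by = 90, length.out = 20),
  treat_d = seq(as.Date("2011-01-01"), by = 90, length.out = 20),
  death_d = seq(as.Date("2012-01-01"), by = 90, length.out = 20)
)
patientData$birth_d[5] <- as.Date("2017-01-01")
patientData$death_d[7] <- as.Date("2001-01-01")
patientData$treat_d[10] <- as.Date("2018-01-01")
patientData$birth_d[12] <- as.Date("2018-05-05")

# Convert the column names from strings to symbols
firstDateName <- sym("birth_d")
secondDateName <- sym("death_d")

# Unquote the symbols inside mutate(); dateDiff is a difftime,
# negative when the second date precedes the first
badRecords <- patientData %>%
  mutate(dateDiff = (!!secondDateName) - (!!firstDateName)) %>%
  filter(dateDiff < 0)
```

This should return the same three patients (5, 7 and 12) as the filter version, with an extra dateDiff column.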

2) Base R or in base R:

firstDateName <- "birth_d"
secondDateName <- "death_d"
is.bad <- patientData[[firstDateName]] > patientData[[secondDateName]]
badRecords <- patientData[is.bad, ]

2a) subset Another base solution would be to replace the last two lines above with:

subset(patientData, get(firstDateName) > get(secondDateName))
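
Since the question mentions many date variables checked one pair at a time, the [[ approach from 2) can be looped over a list of pairs. This is only a sketch: datePairs and findBad are hypothetical names introduced for illustration, and the data is rebuilt from the question so the block is self-contained.

```r
# Rebuild the example data from the question
patientData <- data.frame(
  patientID = 1:20,
  birth_d = seq(as.Date("2010-01-01"), by = 90, length.out = 20),
  treat_d = seq(as.Date("2011-01-01"), by = 90, length.out = 20),
  death_d = seq(as.Date("2012-01-01"), by = 90, length.out = 20)
)
patientData$birth_d[5] <- as.Date("2017-01-01")
patientData$death_d[7] <- as.Date("2001-01-01")
patientData$treat_d[10] <- as.Date("2018-01-01")
patientData$birth_d[12] <- as.Date("2018-05-05")

# Hypothetical list of (earlier, later) column-name pairs to check
datePairs <- list(
  c("birth_d", "treat_d"),
  c("birth_d", "death_d"),
  c("treat_d", "death_d")
)

# Hypothetical helper: rows where the first date falls after the second
findBad <- function(data, first, second) {
  is.bad <- data[[first]] > data[[second]]
  data[is.bad, ]
}

# One data frame of bad records per pair, named after the failed check
badByPair <- lapply(datePairs, function(p) findBad(patientData, p[1], p[2]))
names(badByPair) <- vapply(datePairs, paste, character(1), collapse = " > ")
```

For example, badByPair[["treat_d > death_d"]] holds the records whose treatment date is after the death date. Note that the helper assumes the date columns contain no NA values; with missing dates the logical index would need which() or an explicit NA check.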

1 vote

akrun Points 148302

Here is an option with parse_expr from rlang

library(rlang)
library(dplyr)
patientData %>%
        filter(!! parse_expr(paste(firstDateName, ">", secondDateName)))
#   patientID    birth_d    treat_d    death_d
#1         5 2017-01-01 2011-12-27 2012-12-26
#2         7 2011-06-25 2012-06-24 2001-01-01
#3        12 2018-05-05 2013-09-17 2014-09-17
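
The same idea should carry over to the mutate case from the question. A minimal sketch, keeping the names as plain strings and rebuilding patientData so the block runs on its own; diffExpr is a hypothetical intermediate name. Because the expression is built by pasting text, this assumes the column names are syntactically valid R names.

```r
library(rlang)
library(dplyr)

# Rebuild the example data from the question
patientData <- data.frame(
  patientID = 1:20,
  birth_d = seq(as.Date("2010-01-01"), by = 90, length.out = 20),
  treat_d = seq(as.Date("2011-01-01"), by = 90, length.out = 20),
  death_d = seq(as.Date("2012-01-01"), by = 90, length.out = 20)
)
patientData$birth_d[5] <- as.Date("2017-01-01")
patientData$death_d[7] <- as.Date("2001-01-01")
patientData$treat_d[10] <- as.Date("2018-01-01")
patientData$birth_d[12] <- as.Date("2018-05-05")

firstDateName <- "birth_d"
secondDateName <- "death_d"

# Parse the text "death_d - birth_d" into an R expression
diffExpr <- parse_expr(paste(secondDateName, "-", firstDateName))

# Unquote the parsed expression inside mutate(), then keep negative differences
badRecords <- patientData %>%
  mutate(dateDiff = !!diffExpr) %>%
  filter(dateDiff < 0)
```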
