I've created a reproducible example to illustrate the problem I'm having with non-standard evaluation in R (dplyr). I would like to use dynamic variable names in the scenario below:
library(dplyr)

# Given a data frame of patient data, I need to find records containing date logic errors.
# My datasets are enormous, but here is a tiny example:
patientData <- data.frame(
  patientID = 1:20,
  birth_d = seq(as.Date("2010-01-01"), by = 90, length.out = 20),
  treat_d = seq(as.Date("2011-01-01"), by = 90, length.out = 20),
  death_d = seq(as.Date("2012-01-01"), by = 90, length.out = 20)
)
# Introduce a few records that are deliberately in error (death_d before birth_d, birth_d after treat_d, etc.):
patientData$birth_d[5] <- as.Date("2017-01-01")
patientData$death_d[7] <- as.Date("2001-01-01")
patientData$treat_d[10] <- as.Date("2018-01-01")
patientData$birth_d[12] <- as.Date("2018-05-05")
# To determine which records have birth_d after death_d, I could do the following:
badRecords <- patientData %>% filter(death_d < birth_d)
# or
badRecords <- patientData %>% mutate(dateDiff = death_d - birth_d) %>% filter(dateDiff < 0)
# But in my large application (with lots and lots of date variables)
# I want to be able to use the date field names as *variables* and, using one date pair at a time,
# determine which records have dates out of sequence. For example,
firstDateName <- "birth_d"
secondDateName <- "death_d"
# I would like to do this, but it doesn't work
badRecords <- patientData %>% filter(!!firstDateName > !!secondDateName)
# This doesn't work...
badRecords <- patientData %>% mutate(dateDiff = !!secondDateName - !!firstDateName) %>% filter(dateDiff < 0)
# Neither does this; it creates dateDiff as a data frame, with 20 duplicate records
badRecords <- patientData %>% mutate(dateDiff = .[secondDateName] - .[firstDateName]) %>% filter(dateDiff < 0)
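For context on why the attempts above misbehave: `!!firstDateName` unquotes the character string itself, so `filter()` ends up comparing the strings `"birth_d"` and `"death_d"` rather than the two columns. Below is a minimal sketch of one approach that seems to fit this scenario, using the `.data` pronoun to look columns up by name. It assumes a reasonably recent dplyr and rebuilds the example data so it runs on its own. `!!rlang::sym(firstDateName)` is another commonly used route, not shown here.

```r
library(dplyr)

# Rebuild the example data from the question so this block is self-contained
patientData <- data.frame(
  patientID = 1:20,
  birth_d = seq(as.Date("2010-01-01"), by = 90, length.out = 20),
  treat_d = seq(as.Date("2011-01-01"), by = 90, length.out = 20),
  death_d = seq(as.Date("2012-01-01"), by = 90, length.out = 20)
)
patientData$birth_d[5] <- as.Date("2017-01-01")
patientData$death_d[7] <- as.Date("2001-01-01")
patientData$treat_d[10] <- as.Date("2018-01-01")
patientData$birth_d[12] <- as.Date("2018-05-05")

firstDateName <- "birth_d"
secondDateName <- "death_d"

# .data[[name]] pulls the column whose name is stored in the string
badRecords <- patientData %>%
  filter(.data[[firstDateName]] > .data[[secondDateName]])

# The same lookup works inside mutate()
badRecordsDiff <- patientData %>%
  mutate(dateDiff = .data[[secondDateName]] - .data[[firstDateName]]) %>%
  filter(dateDiff < 0)

print(badRecords$patientID)
```

With the example data, both pipelines should pick out patients 5, 7 and 12. Changing the two name strings (for example to `"birth_d"` and `"treat_d"`) would check a different date pair without touching the pipeline.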