I am using R to impute random missing data. I ran into a problem when attempting to account for conditional, or structured, NAs in a dataset.
A simple dataset to illustrate the problem:
TestData <- data.frame(
  Condition     = c(1, 1, 1, 1, 2, NA, 2, 2),
  Dependent1    = c(1, NA, 2, 3, NA, NA, NA, NA),
  Dependent2    = c(1, 12, 44, 1, NA, NA, NA, NA),
  Dependent3    = c(NA, 2, 3, 5, NA, NA, NA, NA),
  UnaffiliatedQ = c(1, NA, 3, 2, 27, NA, 32, 35)
)
TestData$Condition <- factor(TestData$Condition, levels = c(1, 2), labels = c("Yes", "No"))
In this example, the variable Condition is a gatekeeper question that determines whether a respondent needs to fill in the next three questions, Dependent#. If a respondent answers "No", he/she does not see the next three questions, and they are recorded as NAs, though they are not technically missing.
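To make the distinction concrete, here is a minimal base-R sketch that separates the two kinds of NA in the example data. The helper names (`dep_cols`, `skipped`, `answered_yes`, `structural`, `genuine`) are my own and purely illustrative. Row 6, where Condition itself is NA, falls into neither group, which is exactly the ambiguous case.

```r
# Rebuild the example data so the sketch is self-contained.
TestData <- data.frame(
  Condition     = c(1, 1, 1, 1, 2, NA, 2, 2),
  Dependent1    = c(1, NA, 2, 3, NA, NA, NA, NA),
  Dependent2    = c(1, 12, 44, 1, NA, NA, NA, NA),
  Dependent3    = c(NA, 2, 3, 5, NA, NA, NA, NA),
  UnaffiliatedQ = c(1, NA, 3, 2, 27, NA, 32, 35)
)
TestData$Condition <- factor(TestData$Condition, levels = c(1, 2), labels = c("Yes", "No"))

# Illustrative names, not part of any package.
dep_cols     <- c("Dependent1", "Dependent2", "Dependent3")
skipped      <- !is.na(TestData$Condition) & TestData$Condition == "No"
answered_yes <- !is.na(TestData$Condition) & TestData$Condition == "Yes"

# is.na() on a data frame returns a logical matrix; the row-level
# vectors are recycled down each column.
structural <- is.na(TestData[dep_cols]) & skipped       # never asked
genuine    <- is.na(TestData[dep_cols]) & answered_yes  # asked, not answered

colSums(structural)  # structural NAs per Dependent# column
colSums(genuine)     # genuinely missing values per Dependent# column
```

Only the cells flagged in `genuine` (plus whatever is decided for row 6) are the ones I would actually want imputed.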
What can I do in this type of situation? If I impute the NA value in the Condition variable, along with those in Dependent3, how would I ensure that I don't end up with values in Dependent# that don't make sense?
I've thought of possible solutions, but none that I think would be valid or a good idea, e.g., creating a structured missing value like -999, or subsetting the data frame based on conditional answers.
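For reference, this is roughly what those two workarounds would look like in base R. It is only a sketch of the ideas I am doubtful about, not a recommended approach, and the names `coded`, `yes_rows`, and `no_rows` are made up for illustration.

```r
# Rebuild the example data so the sketch is self-contained.
TestData <- data.frame(
  Condition     = c(1, 1, 1, 1, 2, NA, 2, 2),
  Dependent1    = c(1, NA, 2, 3, NA, NA, NA, NA),
  Dependent2    = c(1, 12, 44, 1, NA, NA, NA, NA),
  Dependent3    = c(NA, 2, 3, 5, NA, NA, NA, NA),
  UnaffiliatedQ = c(1, NA, 3, 2, 27, NA, 32, 35)
)
TestData$Condition <- factor(TestData$Condition, levels = c(1, 2), labels = c("Yes", "No"))

dep_cols <- c("Dependent1", "Dependent2", "Dependent3")
skipped  <- !is.na(TestData$Condition) & TestData$Condition == "No"

# Workaround 1: mark structural NAs with a sentinel value.
# Drawback: -999 is then treated as a real number by anything numeric.
coded <- TestData
coded[skipped, dep_cols] <- -999
mean(coded$Dependent2, na.rm = TRUE)  # heavily distorted by the sentinel

# Workaround 2: split by the gatekeeper answer and handle each part separately.
# Drawback: rows where Condition is NA (row 6) belong to neither subset.
yes_rows <- TestData[!is.na(TestData$Condition) & TestData$Condition == "Yes", ]
no_rows  <- TestData[skipped, ]
```

Either way, the row with a missing Condition has no obvious home, which is the part I don't know how to handle.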
In reading through the documentation and the paper by mice's authors, I don't see any arguments in mice for this type of situation. The other possibility is that I've simply been running down the rabbit hole of multiple imputation and this is not the correct use of it.