HMS777: Missing Data

Monday, April 11, 2011

Missing Data

I'd like to ask about some comments the study guide makes (following de Vaus) about the effects of using statistical imputation for missing data.

For example, on p. 16 the SG says that the group means approach ... can inflate the correlation between variables.

I'm not sure if my method is valid, but what I did was take the "cars" dataset (from the R package datasets). This dataset has 50 cases and two variables: speed and distance to stop.

I deleted 4 values from the distance variable and replaced the missing data using each of the 3 statistical imputation methods.

The correlation between the 2 variables without missing data was 0.806, and the correlation for the dataset with missing values replaced by group means was 0.785. I would have thought that to inflate the correlation means to produce a higher correlation; that's not the case here.
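A check like the one described here can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the data are synthetic (they are not the real cars data, and the slope, intercept and noise level are arbitrary), the helper name pearson is made up for this example, and the sample mean is used in place of group means because this toy data has no grouping variable.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)  # arbitrary seed, only so the run is repeatable

# Synthetic stand-in for a speed / stopping-distance dataset (assumption, not the cars data).
speed = [random.uniform(4, 25) for _ in range(50)]
dist = [3.9 * s - 17 + random.gauss(0, 15) for s in speed]

# Delete 4 distance values at random, then fill them with the mean of the observed values.
missing = set(random.sample(range(50), 4))
observed = [d for i, d in enumerate(dist) if i not in missing]
obs_mean = statistics.fmean(observed)
dist_imputed = [obs_mean if i in missing else d for i, d in enumerate(dist)]

# Three correlations: full data, complete cases only, and mean-imputed data.
speed_complete = [s for i, s in enumerate(speed) if i not in missing]
r_full = pearson(speed, dist)
r_complete = pearson(speed_complete, observed)
r_imputed = pearson(speed, dist_imputed)

print(f"r, full data:       {r_full:.3f}")
print(f"r, complete cases:  {r_complete:.3f}")
print(f"r, mean-imputed:    {r_imputed:.3f}")
print(f"sd observed vs imputed: {statistics.stdev(observed):.2f} vs {statistics.stdev(dist_imputed):.2f}")
```

With sample mean replacement, the imputed cases add nothing to the covariance but do add to the spread of speed, so the imputed correlation cannot exceed the complete-case correlation in absolute value. Group means could behave differently, depending on how the groups are defined.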

I also got a result that was not consistent with the SG for the random assignment method. The SG says variability is not affected by this method, whereas I got a different level of variability.

Is this a valid way to approach the issue, and can I state that the SG / de Vaus is not correct on this matter?

--------------

Theoretically, like the sample mean approach, the group mean approach can lead to reduced variability and a smaller correlation between the variables. Hence this seems to be an error in the textbook and, correspondingly, in the Topic Notes, unless we are missing something in de Vaus' reasoning. The random assignment method should not, theoretically, affect variability. If you want to use replacement methods, my advice is to analyse the data both with and without the missing values replaced. If the replacement method produces very different results, don't use that method.
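The variability point can be illustrated with a small simulation. This is a rough sketch, not a definitive implementation: the data are synthetic, the function names impute_mean and impute_random are invented for this example, and "random assignment" is taken here to mean replacing each missing value with a randomly chosen observed value, which may differ in detail from de Vaus' procedure.

```python
import random
import statistics

def impute_mean(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = statistics.fmean(observed)
    return [m if v is None else v for v in values]

def impute_random(values, rng):
    """Replace each None with a value drawn at random from the observed values."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]

rng = random.Random(2011)  # arbitrary seed, only so the run is repeatable

# Synthetic variable with 50 cases (assumption), 4 of which are set to missing.
dist = [rng.gauss(43, 26) for _ in range(50)]
for i in rng.sample(range(50), 4):
    dist[i] = None

observed = [v for v in dist if v is not None]
sd_observed = statistics.stdev(observed)
sd_mean = statistics.stdev(impute_mean(dist))

# A single random imputation is noisy, so average the result over many repeats.
sd_random_runs = [statistics.stdev(impute_random(dist, rng)) for _ in range(2000)]
sd_random_avg = statistics.fmean(sd_random_runs)

print(f"sd, observed cases only:        {sd_observed:.2f}")
print(f"sd, mean replacement:           {sd_mean:.2f}")
print(f"sd, random assignment (avg):    {sd_random_avg:.2f}")
```

Mean replacement always shrinks the standard deviation, because the filled-in cases sit exactly at the mean. Random assignment preserves it only on average, so any single run, such as one with 4 replaced values out of 50, can still show a somewhat different level of variability.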
