ERIC Number: ED663540
Record Type: Non-Journal
Publication Date: 2024-Sep-19
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Latent Factor Score Estimation with Missing Data
Ehri Ryu
Society for Research on Educational Effectiveness
Background/Context: The confirmatory factor analysis (CFA) model is a commonly adopted framework for estimating and testing a measurement model. Once a well-fitting final CFA model is selected, it may be used to test structural relationships of the latent constructs with other variables, to construct a test with desired reliability and validity, or to save the estimated latent factor scores for use in subsequent analyses. Under the missing-at-random assumption for the missing data mechanism, full information maximum likelihood (FIML) has been shown to produce unbiased estimates. Unless a separate exclusion criterion is applied, all observations with at least one non-missing value contribute to the FIML estimation. In CFA models with two or more latent factors, a factor score can therefore be estimated for observations with no observed indicator score for that factor, provided they have non-missing data on the indicators that load on the other factors. In this case, the factor score is estimated in the complete absence of observed data measuring the construct represented by the latent factor. Purpose/Objective/Research Question: This study investigates the statistical properties of the estimated factor scores when the CFA model is estimated with missing data, particularly the impact of sparse data due to missingness. The estimated factor scores are calculated from the estimated parameters of the CFA model (parameter estimates) and the observed indicator values (data). This study focuses on the role of the data when the parameter estimates hold desirable statistical properties. Population/Participation/Subjects: A simulation study is conducted to empirically examine the statistical properties of the estimated factor scores. This study does not involve human or animal subjects. Research Design: A two-factor CFA model with six indicators (three indicators per factor with no cross-loadings) is used.
The following conditions are varied: missing data mechanism (missing completely at random, MCAR, or missing at random, MAR), missing data patterns, proportion of missingness, communality of the indicators, correlation between the two factors, and sample size. Missing values are generated in the set of three indicators (X1, X2, X3) for the first factor (F1). The other three indicators (X4, X5, X6) always have complete data. Under the MAR mechanism, the probability of missingness is manipulated as a function of the other three indicators (X4, X5, X6) that load on the second factor (F2), so that the analysis model always meets the MAR assumption. Factor scores are estimated in the correctly specified model. The following statistics are examined: means, standard deviations, skewness, and kurtosis of the saved factor scores; correlations between the true factor scores and the saved factor scores; the correlation between the two saved factor scores; and correlations between the saved factor scores and an external criterion variable. Finding/Results: Selected preliminary findings are presented in Figures 1 to 3. In Figure 1, the correlation between the true factor score Ksi2 and the saved factor score F2 (i.e., the factor whose indicators have no missing values) was not affected by missing values. For the first factor (Ksi1, F1), the correlation was more severely biased, relative to the correlation with no missing values, as the proportion of missing values increased. In Figure 2, the correlation between the saved factor scores (F1, F2) was more biased with a higher proportion of missing values when the two factors were correlated. In Figure 3, the correlation of the saved factor score with an external criterion variable was more severely biased with a higher proportion of missingness. Missing values also affected the correlation of the second factor (whose indicators did not have any missing values) with the external criterion, to a lesser degree, when the two factors were correlated in the population.
Conclusions: The preliminary findings showed that the statistics associated with the estimated factor scores are affected by missing data, even when the model parameter estimates are unbiased. This study also compares these statistics to those obtained using two alternative methods: one estimated in latent variable models, and the other estimated with selected factor scores after excluding observations with extremely sparse data. These comparisons will provide practical guidelines for reducing the problems caused by missing data.
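The design described in the abstract can be illustrated with a small sketch. It is not the author's simulation code. It assumes regression-method factor scores (the abstract does not name the scoring method), uses the true population parameters in place of FIML estimates (in line with the study's focus on the role of data when parameter estimates are well behaved), and deletes X1, X2, and X3 together for selected rows. The loading (0.7), factor correlation (0.5), sample size (2000), and logistic missingness rule are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical population values chosen for illustration only.
LOADING, PHI, N = 0.7, 0.5, 2000

def simulate(n, loading, phi, rng):
    """Draw true factor scores and six indicators from a two-factor CFA model."""
    lam = np.zeros((6, 2))
    lam[:3, 0] = loading                      # X1-X3 load on F1
    lam[3:, 1] = loading                      # X4-X6 load on F2
    phi_mat = np.array([[1.0, phi], [phi, 1.0]])
    theta = np.eye(6) * (1.0 - loading**2)    # unit-variance indicators
    eta = rng.multivariate_normal(np.zeros(2), phi_mat, size=n)
    x = eta @ lam.T + rng.normal(size=(n, 6)) * np.sqrt(1.0 - loading**2)
    return x, eta, lam, phi_mat, theta

def impose_mar(x, rng):
    """Delete X1-X3 with a probability that depends only on the observed X4-X6."""
    driver = x[:, 3:].mean(axis=1)
    prob = 1.0 / (1.0 + np.exp(-driver))      # assumed logistic rule, about 50% missing
    miss = rng.random(x.shape[0]) < prob
    x_mis = x.copy()
    x_mis[miss, :3] = np.nan
    return x_mis, miss

def factor_scores(x, lam, phi_mat, theta):
    """Regression-method scores using only each row's observed indicators."""
    sigma = lam @ phi_mat @ lam.T + theta     # model-implied covariance matrix
    observed = ~np.isnan(x)
    scores = np.zeros((x.shape[0], lam.shape[1]))
    patterns = np.unique(observed.astype(np.int8), axis=0).astype(bool)
    for pattern in patterns:
        if not pattern.any():
            continue                          # no data at all: score stays at the mean
        rows = (observed == pattern).all(axis=1)
        sigma_oo = sigma[np.ix_(pattern, pattern)]
        weights = phi_mat @ lam[pattern].T @ np.linalg.inv(sigma_oo)
        scores[rows] = x[rows][:, pattern] @ weights.T
    return scores

rng = np.random.default_rng(0)
x, eta, lam, phi_mat, theta = simulate(N, LOADING, PHI, rng)
x_mis, miss = impose_mar(x, rng)
fs_full = factor_scores(x, lam, phi_mat, theta)
fs_mis = factor_scores(x_mis, lam, phi_mat, theta)

# Correlation between the true F1 score and the saved F1 score.
r_full = np.corrcoef(eta[:, 0], fs_full[:, 0])[0, 1]
r_mis = np.corrcoef(eta[:, 0], fs_mis[:, 0])[0, 1]
print(f"corr(true F1, saved F1), complete data: {r_full:.3f}")
print(f"corr(true F1, saved F1), MAR missing:   {r_mis:.3f}")
```

In this sketch, a row with no observed F1 indicators receives an F1 score equal to the factor correlation times its F2 score, so the saved F1 score for that row carries no information beyond what X4 to X6 already provide. This is one way to see why statistics based on saved factor scores can be distorted even when the model parameters themselves are accurate.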
Descriptors: Research Problems, Factor Analysis, Scores, Computation, Test Construction, Test Reliability, Test Validity, Statistical Bias, Correlation
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A