# Swiss cheese and MICE: Harmonizing instruments with multiple imputation

*This time, we apply multiple imputation to harmonize data for the same construct measured with different instruments. We will treat data as Swiss cheese and then unleash mice, sorry, MICE, on it. The approach poses some hurdles regarding the required data and the complexity of the analysis. However, if those hurdles are overcome, it can be a flexible and powerful tool for ex-post harmonization.*

DOI: 10.34879/gesisblog.2021.38

In this post of our series on harmonization, we will look at a completely different way of harmonizing surveys and their measurement instruments. We will be seeing our data as Swiss cheese (i.e., full of holes) and then applying, you guessed it, MICE. Or, more generally: multiple imputation! (MICE, short for Multiple Imputation by Chained Equations ^{1}, is a popular approach to multiple imputation, but it is not the only one. It just lends itself very well to my cheesy pun. 😉)

To understand the basic idea and its application in ex-post harmonization, we will first look at the “imputation” part of multiple imputation. We will then talk about the “multiple” part to understand why single imputation is insufficient and why multiple imputation means running analyses quite differently than we are used to.

### The basic idea and research design

Imputation procedures are, as you probably know, methods to fill in missing values in datasets. That is a controversial idea to some, but also an attractive idea to many researchers, especially in fields with costly data collection and frequent missing values, such as medicine ^{2}. In this post, we will not discuss imputation in general, but rather its application for a specific case in ex-post harmonization: harmonizing data from **two measurement instruments (A and B) for the same concept**. Imagine a dataset combined from different sources, where some participants only answered instrument A and some participants only answered instrument B. Now visualize the combined dataset with separate variables for A and for B. In that case, we are left with (structurally) missing values for the instrument that the respective respondent did not answer.

Once we conceptualize this harmonization problem as a problem of missing values, it is no leap to think of imputation. However, there is a snag. Imputation draws its “best guesses” for what to impute from the relationship between variables in cases where there are valid values for both variables. If respondents only answered instrument A or instrument B, but no respondent answered both, then the imputation procedure cannot establish a direct relationship between A and B. Sure, it can still impute missing values of A and B based on other variables in the dataset. But it would completely ignore the values of A to impute B, and values of B to impute A. That is hardly desirable.

Consequently, harmonizing instruments with imputation requires a **special data structure**. To harmonize data captured with instruments A and B, we also need some cases where respondents answered both instruments ^{3}. Such data are not unheard of in the social sciences, but they are not the norm either. Sometimes, there were methodological experiments. At other times, data for two survey programs are collected together (e.g., the ALLBUS and the ISSP in Germany). Or we could use panel data if the waves are frequent or the construct is very stable over time. However, in most cases, we will have to collect data where respondents saw both instruments ourselves. Such data collected for the express purpose of ex-post harmonization of instruments are called a “**calibration sample**.”^{4} In the broader literature on linking, scaling, and equating, “calibration” is another term to describe making the numerical format of different instruments comparable ^{5}. If you do collect a calibration sample, please be mindful of question order effects. To mitigate such effects, space the two instruments further apart and randomize the order in which they appear.
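Randomizing the instrument order in a calibration sample can be as simple as the following sketch. The respondent IDs, the seed, and the choice of a balanced split (half of the respondents see A first, half see B first) are assumptions for illustration, not a prescribed design.

```python
import random

rng = random.Random(7)  # fixed seed so the assignment is reproducible

# Hypothetical respondent IDs of a small calibration sample.
respondent_ids = list(range(1, 11))

# Shuffle the IDs, then give the first half the order A-then-B
# and the second half the order B-then-A (balanced randomization).
shuffled = respondent_ids[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2

instrument_order = {rid: ("A", "B") for rid in shuffled[:half]}
instrument_order.update({rid: ("B", "A") for rid in shuffled[half:]})

for rid in respondent_ids:
    print(rid, instrument_order[rid])
```

Recording the assigned order as a variable in the calibration data also makes it possible to check for order effects later on.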

#### The data structure illustrated

In the image below, we see a schematic view of the data structure. Cases 1 to 3 represent survey data where only instrument A was used. Cases 7 to 9 represent survey data where only instrument B was used. These are the data points we want to harmonize. Cases 4 to 6 in the middle, meanwhile, represent data where both instruments were applied to the same respondents. These cases are the calibration data that bridge the two instruments. The two right columns are other variables (covariates) that are correlated with the instruments and may help supplement the imputation with additional information.
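The same layout can be written down as a small toy dataset. This is only a sketch: the values, the two covariate names (`age`, `edu`), and the use of `None` for structurally missing answers are illustrative assumptions.

```python
from collections import Counter

# Nine hypothetical cases mirroring the layout described above.
# None marks a structurally missing value (instrument not administered).
rows = [
    {"case": 1, "A": 3,    "B": None, "age": 34, "edu": 2},
    {"case": 2, "A": 5,    "B": None, "age": 51, "edu": 3},
    {"case": 3, "A": 2,    "B": None, "age": 27, "edu": 1},
    {"case": 4, "A": 4,    "B": 7,    "age": 45, "edu": 3},
    {"case": 5, "A": 1,    "B": 2,    "age": 62, "edu": 1},
    {"case": 6, "A": 3,    "B": 6,    "age": 38, "edu": 2},
    {"case": 7, "A": None, "B": 8,    "age": 29, "edu": 3},
    {"case": 8, "A": None, "B": 4,    "age": 56, "edu": 2},
    {"case": 9, "A": None, "B": 5,    "age": 41, "edu": 1},
]

def pattern(row):
    """Classify a case by which instruments were observed."""
    has_a = row["A"] is not None
    has_b = row["B"] is not None
    if has_a and has_b:
        return "calibration"
    if has_a:
        return "A only"
    if has_b:
        return "B only"
    return "neither"

counts = Counter(pattern(r) for r in rows)
bridge_cases = [r["case"] for r in rows if pattern(r) == "calibration"]

print(dict(counts))
print("Cases linking A and B:", bridge_cases)
```

If `bridge_cases` were empty, no case would carry information about how A and B relate, which is exactly the snag described earlier.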

Whether this is a suitable approach for your harmonization project depends on how much time and resources you have. Please note, however, that the calibration sample does not have to be a probabilistic sample of your target population. Instead, less expensive samples (and survey modes) might be a very serviceable alternative. Think of non-probabilistic online access panels, for example. Even so, a word of caution: If the calibration sample has been drawn from a population in which the instruments are interpreted quite differently than in the populations we want to harmonize, then the imputation may be biased. This bias may be mitigated by taking other covariates into account. These covariates may allow us to model and thus address some population differences.

### Multiple multiples

So far, so good! However, if you were not previously familiar with multiple imputation, you might still wonder about the “multiple” part of multiple imputation. This aspect is quite important because multiple imputation cannot be used to impute the dataset once and then analyze the data as we would with conventional, complete data. Instead, multiple imputation changes the whole analysis process.

Let us perhaps start with a basic problem and an ingenious solution. The **basic problem** of traditional “single” imputation methods was that imputation only represents a best guess of which value to impute in a certain variable for a certain respondent. However, since we only impute a single value, the uncertainty behind that best guess is lost ^{6}. Imagine imputing a dichotomous variable. Now, imagine that nearly everyone chose “yes” in one sociodemographic group, while in another sociodemographic group, only a bit over half of the respondents chose “yes.” In both cases, single imputation might impute “yes” for some respondents. However, if the respondent is from the first group, then this “yes” is a far more certain guess than the “yes” of someone from the other group.

**Multiple imputation** circumvents this problem by not deciding on a single value. Instead, it performs several imputations so that multiple values are calculated for each empty cell. (I am simplifying enormously here, of course. ^{7}) In this set of values, more likely imputation candidates are more frequent, and less likely imputation candidates are less frequent. The information on the uncertainty of our imputation choices is preserved.
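To make the dichotomous example concrete, here is a minimal sketch that contrasts a single "best guess" with repeated random draws. The two "yes" shares (0.95 and 0.55) and the large number of draws are assumptions chosen for illustration. The sketch also treats the shares as known, whereas a full multiple imputation procedure would account for the uncertainty in estimating them as well.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Assumed shares of "yes" answers among observed respondents in two groups.
p_yes = {"group_1": 0.95, "group_2": 0.55}
m = 1000  # number of draws; far more than usual, only to show the pattern

def single_imputation(p):
    """Impute the single most likely answer; the uncertainty is discarded."""
    return "yes" if p >= 0.5 else "no"

def repeated_draws(p, m, rng):
    """Draw m candidate answers, each 'yes' with probability p."""
    return ["yes" if rng.random() < p else "no" for _ in range(m)]

shares = {}
for group, p in p_yes.items():
    draws = repeated_draws(p, m, rng)
    shares[group] = draws.count("yes") / m
    print(group, "single guess:", single_imputation(p),
          "| share of 'yes' across draws:", round(shares[group], 2))
```

Both groups receive the same single guess, but the share of "yes" across the repeated draws differs clearly between them. That difference is the uncertainty that single imputation throws away.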

Of course, we cannot just write several values into each cell and call it a day. How would we even analyze something like that? To solve this, multiple imputation does everything multiple times. First, we take our incomplete dataset and impute its missing values multiple times by repeating the imputation with a random component. The result is multiple copies of the dataset, each with different imputed values. How much the imputed values for a cell vary across the copies reflects how uncertain the imputation is. Then, we perform our analysis multiple times in parallel: once for each imputed dataset. This creates multiple analysis results, that is, one set of coefficients for each imputed dataset. In the last step, the results of all parallel analyses are aggregated into a single analysis result. The schematic figure below summarizes the general idea ^{8}:
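The three steps (impute, analyze, pool) can also be sketched in code. The example below is a deliberately simplified stand-in, not the chained-equations algorithm itself: the data are simulated, B is assumed to depend linearly on A, and each imputation refits a regression on a bootstrap resample of the calibration cases to mimic the random component. All names and numbers are hypothetical.

```python
import random
import statistics

rng = random.Random(2021)

def simulate_case(rng):
    """Hypothetical data-generating process: B is a noisy linear function of A."""
    a = rng.gauss(5.0, 1.5)
    b = 1.0 + 0.8 * a + rng.gauss(0.0, 0.5)
    return a, b

calibration = [simulate_case(rng) for _ in range(200)]  # A and B observed
a_only = [simulate_case(rng)[0] for _ in range(300)]    # B is missing
b_only = [simulate_case(rng)[1] for _ in range(300)]    # B observed, A missing

def fit_line(pairs):
    """Least-squares regression of B on A: intercept, slope, residual SD."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in pairs]
    sd = (sum(e ** 2 for e in resid) / (len(pairs) - 2)) ** 0.5
    return intercept, slope, sd

m = 20  # number of imputed datasets
estimates = []
for _ in range(m):
    # Step 1, impute: refit on a bootstrap resample, then add random noise.
    boot = rng.choices(calibration, k=len(calibration))
    intercept, slope, sd = fit_line(boot)
    imputed_b = [intercept + slope * a + rng.gauss(0.0, sd) for a in a_only]

    # Step 2, analyze: here simply the mean of B over all cases.
    all_b = [b for _, b in calibration] + b_only + imputed_b
    estimates.append(statistics.fmean(all_b))

# Step 3, pool: the pooled point estimate is the average over the m analyses.
pooled_mean = statistics.fmean(estimates)
print("Estimates per imputed dataset:", [round(e, 3) for e in estimates[:5]], "...")
print("Pooled mean of B:", round(pooled_mean, 3))
```

The analysis step here is just a mean. In practice, it would be whatever model the research question calls for, run once per imputed dataset.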

Of course, we do not have to perform each step by hand. **Statistical software** and specialized packages take over some of the work. One example is the mice package for R, which, as its name suggests, follows the MICE approach to multiple imputation ^{9}. Still, multiple imputation requires a bit of art and not just science. For one, getting the imputation algorithm to converge can require some fine-tuning, as with most iterative algorithms that involve randomness. Furthermore, if you want to calculate complex analyses, packages might not automatically combine the results. This means manually aggregating the respective coefficients, which can be an involved process and requires an advanced level of statistical expertise.
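For a single coefficient, manual aggregation usually follows the combining rules commonly known as Rubin's rules. The pooled estimate is the mean of the per-dataset estimates, and the total variance is T = W + (1 + 1/m) · B, where W is the average squared standard error and B the variance of the estimates between datasets. The sketch below assumes five hypothetical coefficients and standard errors, and it leaves out the degrees-of-freedom adjustment needed for confidence intervals and tests.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)      # pooled point estimate
    within = statistics.fmean(variances)      # W: average within-imputation variance
    between = statistics.variance(estimates)  # B: between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between    # T: total variance
    return pooled, total ** 0.5

# Hypothetical coefficients and standard errors from m = 5 imputed datasets.
coefs = [0.42, 0.47, 0.39, 0.45, 0.44]
ses = [0.10, 0.11, 0.10, 0.12, 0.10]

pooled_coef, pooled_se = pool_rubin(coefs, [se ** 2 for se in ses])
print("Pooled coefficient:", round(pooled_coef, 3))
print("Pooled standard error:", round(pooled_se, 3))
```

The pooled standard error is larger than the typical per-dataset standard error because it also carries the disagreement between the imputed datasets.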

### To impute or not impute…

Multiple imputation is certainly an interesting approach to the ex-post harmonization of survey instruments. It is very flexible and can address nominal, ordinal, and continuous variables with the same ease. Still, there are three hurdles: First, we need at least some cases where respondents answered both instruments. Often, this means collecting a calibration sample. Second, multiple imputation can make the analysis process far more involved. This also implies that the approach is not very suitable for providing an ex-post harmonized dataset. And third, some researchers may not be comfortable with the idea of imputation in general.

Nonetheless, if you have the appropriate data structure or the resources to collect a calibration sample, multiple imputation should at least be considered. And if your project already involves multiple imputation to solve other missing-value problems, then the choice seems obvious. Lastly, approaches like equating and multiple imputation are not mutually exclusive. Instead, different concepts can be harmonized with different approaches. For example, equate where you have equating data and impute where you can get imputation data.

### References

1. Buuren, S. van, & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. *Journal of Statistical Software*, *45*(3). https://doi.org/10.18637/jss.v045.i03
2. Quartagno, M., & Carpenter, J. R. (2016). Multiple imputation for IPD meta-analysis: Allowing for heterogeneity and studies with missing covariates. *Statistics in Medicine*, *35*(17), 2938–2954. https://doi.org/10.1002/sim.6837
3. Siddique, J., Reiter, J. P., Brincks, A., Gibbons, R. D., Crespi, C. M., & Brown, C. H. (2015). Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis. *Statistics in Medicine*, *34*(26), 3399–3414. https://doi.org/10.1002/sim.6562
4. Siddique, J., Reiter, J. P., Brincks, A., Gibbons, R. D., Crespi, C. M., & Brown, C. H. (2015). Multiple imputation for harmonizing longitudinal non-commensurate measures in individual participant data meta-analysis. *Statistics in Medicine*, *34*(26), 3399–3414. https://doi.org/10.1002/sim.6562
5. Kolen, M. J., & Brennan, R. L. (2014). *Test Equating, Scaling, and Linking* (3rd ed.). Springer. https://doi.org/10.1007/978-1-4939-0317-7
6. Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: What is it and how does it work? *International Journal of Methods in Psychiatric Research*, *20*(1), 40–49. https://doi.org/10.1002/mpr.329
7. Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: What is it and how does it work? *International Journal of Methods in Psychiatric Research*, *20*(1), 40–49. https://doi.org/10.1002/mpr.329
8. Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: What is it and how does it work? *International Journal of Methods in Psychiatric Research*, *20*(1), 40–49. https://doi.org/10.1002/mpr.329
9. Buuren, S. van, & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. *Journal of Statistical Software*, *45*(3). https://doi.org/10.18637/jss.v045.i03
