The new normal: Linear equating of different instruments

As an alternative to linear stretching, we look into the observed score equating approach in this post and the next. The easiest form of observed score equating is linear equating: a powerful approach that corrects for biases in the mean and standard deviation while harmonizing different instruments.


DOI: 10.34879/gesisblog.2021.33


Harmonizing single questions measuring latent constructs in surveys is no easy task. The December post illustrated the challenge, and the January post showed that easy solutions like linear stretching are not always best. If you recall, we want respondents who are identical with regard to the measured concept to be represented (on average) by the same numerical value in a harmonized variable, regardless of the source instrument or source survey.

To achieve that, ex-post harmonization must solve a tough problem. How can we ensure that people who are similar in reality get the same numerical value if we do not know which people are similar? The true latent construct intensity of our respondents is, after all, hidden from us. Instead, we only have the observed responses, which are discrete, ordinal projections of the underlying latent reality 1. And, frustratingly, different measurement instruments would project the same respondents onto different response options. Below, we see a simplified example of two instruments, A and B, with seven and five scale points respectively. Through differences in question wording or response category labels, average participants choose different responses: a 4 in A or a 2 in B. The visual layout already shows that linear stretching would not be enough to align the average responses. An average response of “2” in instrument B would become a “2.5” after stretching; still a far cry from the corresponding average response of “4” in instrument A.
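The mismatch is easy to reproduce. Here is a minimal sketch of linear stretching in Python (the scale points follow the example above; the function name is my own):

```python
def stretch(x, src_min=1, src_max=5, dst_min=1, dst_max=7):
    """Linearly stretch a response x from a source scale (here 1-5,
    instrument B) onto a target scale (here 1-7, instrument A)."""
    return dst_min + (x - src_min) * (dst_max - dst_min) / (src_max - src_min)

# The average response "2" in instrument B becomes 2.5, not the
# corresponding average response "4" in instrument A:
print(stretch(2))  # -> 2.5
```

The endpoints are mapped onto each other (1 stays 1, 5 becomes 7), but everything in between is shifted purely by scale arithmetic, not by how respondents actually used the scales.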

So, in sum, the responses in our datasets are a mix of measurement and reality. Harmonization across different instruments then requires that we disentangle latent reality and observed measurement.

Observed Score Equating

This is where observed score equating comes into play. Equating in general is a family of psychometric approaches to align the numerical format of different measurement instruments 2. Some of those methods require instruments with multiple items. Observed score equating, however, works with single-question instruments as well 3.

Observed Score Equating step-by-step

The initial problem is that in ex-post harmonization we only have the response distributions for the two instruments we want to harmonize: i.e., the response frequencies for instrument A and instrument B. What we do not have is insight into the actual latent construct intensity of our respondents. And this makes the responses of the two instruments hard to compare. We just do not know how respondents of different instruments relate to each other in terms of their true construct intensity. Consider again the example above, where average respondents would get completely different values in our harmonized dataset depending on the instrument used. And that holds even if the different numbers of scale points have been aligned with linear stretching.

1. Random Groups Design

Observed score equating solves the problem of aligning respondents’ true latent construct intensities without knowing them by using a special research design. The random groups design means that we need data for both instruments randomly drawn from the same population 4. We still do not know respondents’ true latent construct intensities. We also do not know the shape of the distribution of those latent intensities. However, through the random groups design we know that the latent distribution is the same for both instruments (barring random error).

In essence, we set the latent level as equal with a random experiment. (If you recall: this is basically the “population link” from December.) In fact, such an anchoring of instrument scores to a common population is nothing new. An IQ score, for example, is nothing more than a raw test score anchored to a population so that 100 represents the average score and 15 IQ points usually represent one standard deviation.
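The IQ analogy can be written out directly. A sketch in Python (the raw-score mean and standard deviation below are made-up numbers for illustration):

```python
def iq_score(raw, raw_mean, raw_sd):
    """Anchor a raw test score to its population: the population mean
    maps to 100, and one standard deviation maps to 15 IQ points."""
    return 100 + 15 * (raw - raw_mean) / raw_sd

# A raw score one SD above a (hypothetical) population mean of 40 (SD 8):
print(iq_score(48, raw_mean=40, raw_sd=8))  # -> 115.0
```

The raw score itself is meaningless across tests; only its position in the population distribution carries over, which is exactly the idea behind the random groups design.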

2. Align observed response distributions

In the random groups design above, we then just have to align the shapes of the observed response distributions 5. After all, the random groups design controls for differences in the latent distribution, so differences in the response distributions now represent only differences in measurement. Removing these differences by aligning the distributions thus makes harmonized scores better match respondents’ latent values. The logic behind aligning response distributions based on random samples of the same population is that we position response options along the population distribution. We then match participants who hold similar positions in the population with regard to their construct intensity.

Linear Equating

Now all that remains is to figure out how to align the shape of different response distributions. So-called linear equating is the easiest form of observed score equating. In a nutshell: If the response distributions are approximately normally distributed, then the job is easy. A normal distribution is defined by only two parameters: its mean (i.e., its position along the x-axis) and its standard deviation (i.e., how narrow or broad it is). To transform the response distribution of instrument B into that of instrument A, we only have to add or subtract to align the mean and multiply or divide to align the standard deviation. Or in other words: We perform a linear transformation.

Below is a short animation that illustrates the steps: (0) Start, (1) align the means, (2) align the SDs, and (3) the resulting recoding table that we want. Please note that the normal distributions symbolize the observed response distributions we want to align.

In essence, we can think of linear equating as aligning respondents of the two instruments by their relative position in our study population. Participants with above average (+1 SD), average (mean), or below average (-1 SD) construct intensity are aligned. Based on this we can derive recoding information so that we can transform scores on one instrument into the format of the other and vice versa.
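A recoding table of the kind mentioned above can be derived by equating every response option of instrument B and snapping the result to the nearest valid scale point of A. A sketch with made-up toy samples (real applications would use the observed survey distributions):

```python
import statistics

def recoding_table(options_b, scores_b, scores_a, options_a=range(1, 8)):
    """Equate each response option of instrument B onto instrument A's
    1-7 scale and round to the nearest valid scale point of A."""
    mean_a, sd_a = statistics.mean(scores_a), statistics.pstdev(scores_a)
    mean_b, sd_b = statistics.mean(scores_b), statistics.pstdev(scores_b)
    table = {}
    for x in options_b:
        eq = mean_a + (sd_a / sd_b) * (x - mean_b)
        # snap to the closest scale point of A (this also clamps values
        # that fall outside A's scale range)
        table[x] = min(options_a, key=lambda a: abs(a - eq))
    return table

print(recoding_table([1, 2, 3, 4, 5], scores_b=[1, 2, 3], scores_a=[2, 4, 6]))
```

Note that rounding to discrete scale points discards information; it is shown here only to make the recoding-table idea concrete.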

(For a more formal explanation, see Kolen and Brennan’s brilliant book 6. And if you want to try it out yourself, get the “equate” package for R 7.)

Summary and outlook

Observed score equating aligns the numerical format of two instruments so that participants with the same construct intensity get assigned the same number on average. This form of equating requires a special research design: Data for both instruments has to be drawn from the same population. However, please note that equating harmonizes instruments and not just data. Equating can be performed in one dataset, with data for both instruments randomly sampled from the same population, to derive a recoding table for both instruments. Then, this recoding table can be applied to other instances where the same instruments were used. 

Linear equating is a special form of observed score equating. It aligns response distributions by treating them as essentially normally distributed and then aligning the parameters mean and standard deviation. This approach solves two of the three shortcomings of linear stretching discussed in last month’s blog post. First, it accounts for questions that are easier or harder to agree to by aligning the means. Second, it accounts for whether respondents use the full range of a scale or only part of it by aligning the standard deviations. The latter also elegantly solves the problem of differing numbers of response options.

Observed score equating should thus, in my mind, become the new normal (pardon the pun). However, linear equating works less well if response distributions are not normally distributed. And response distributions are, in fact, often skewed or sometimes even bimodal. So, in next month’s blog post we will look into another variant of observed score equating which can accommodate different response distribution shapes: Equipercentile Equating.


  1. Raykov, T., & Marcoulides, G. A. (2011). Introduction to Psychometric Theory. Routledge.
  2. Kolen, M. J., & Brennan, R. L. (2014). Test Equating, Scaling, and Linking (3rd ed.). Springer.
  3. Singh, R. K. (2020). Harmonizing Instruments with Equating. Harmonization: Newsletter on Survey Data Harmonization in the Social Sciences, 6(1).
  4. Singh, R. K. (2020). Harmonizing Instruments with Equating. Harmonization: Newsletter on Survey Data Harmonization in the Social Sciences, 6(1).
  5. Kolen, M. J., & Brennan, R. L. (2014). Test Equating, Scaling, and Linking (3rd ed.). Springer.
  6. Kolen, M. J., & Brennan, R. L. (2014). Test Equating, Scaling, and Linking (3rd ed.). Springer.
  7. Albano, A. D. (2016). equate: An R Package for Observed-Score Linking and Equating. Journal of Statistical Software, 74(8).
