Split Questionnaire Designs as a Clever Way to Make Surveys Shorter

Long questionnaires are a challenge for respondents and researchers alike. Split Questionnaire Designs (SQDs; Raghunathan & Grizzle, 1995) offer a clever way to make surveys shorter: instead of answering every question, respondents receive only parts of the full questionnaire. But how these parts—or “modules”—are constructed makes a difference for data quality. Our study tests strategies for reconciling the questionnaire design perspective with accurate statistical estimation.
DOI: 10.34879/gesisblog.2025.111
When Shorter Surveys Mean More Missing Data
Imagine you’re designing a large-scale survey. You want to capture opinions on a wide variety of topics—politics, social values, digital behavior—but every additional question increases respondent burden. This can have negative consequences not only for the respondent experience, but also for response rates, survey costs, and measurement error. Especially when conducting online surveys, you quickly face the need to shorten your questionnaire.
Split Questionnaire Designs seem to offer a way out: instead of giving every participant the full questionnaire, you distribute its parts across different modules and randomly assign some of them to each respondent. As a result, each respondent answers fewer questions, but together, the responses cover the entire questionnaire.
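To illustrate the mechanics, here is a minimal sketch assuming a hypothetical questionnaire with a three-item core and three topic modules, of which each respondent is randomly assigned two (all variable names are made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical questionnaire: a core answered by everyone plus three modules.
core = ["age", "gender", "education"]
modules = {
    "A": ["pol_1", "pol_2", "pol_3"],  # politics
    "B": ["val_1", "val_2", "val_3"],  # social values
    "C": ["dig_1", "dig_2", "dig_3"],  # digital behavior
}

n = 1_000
items = core + [q for qs in modules.values() for q in qs]
# Pretend complete data existed; in a real SQD, unassigned items are never asked.
full = pd.DataFrame(rng.normal(size=(n, len(items))), columns=items)

# Each respondent answers the core plus two of the three modules.
observed = full.copy()
for i in range(n):
    not_assigned = rng.choice(list(modules))         # one module is dropped
    observed.loc[i, modules[not_assigned]] = np.nan  # planned missingness

print(observed.isna().mean().round(2))  # core items 0% missing, module items ~33%
```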
This convenience, however, comes at a price: many data points are now intentionally missing. To make the data analyzable—especially for multivariate analyses—these planned missing values must later be imputed statistically. And here begins the core challenge: how to split the questionnaire in a way that keeps both respondents and data analysts happy.
The Dilemma: Questionnaire Perspective vs. Imputation Perspective
From a statistical perspective, highly correlated variables should be separated into different modules [1]. In practice, this can be approximated, for example, by randomly allocating questions to different modules [2]. This helps imputation algorithms reconstruct the missing data more accurately.
This can be explained by a short example: think of two variables that are strongly associated with each other—for example, concerns about climate change and support for environmental policies. If the two are placed in different modules, one of them will be observed for many respondents for whom the other is missing, providing useful information for imputing it (assuming enough pairwise-complete observations are realized). But if they are placed in the same module, they will always be observed or missing together, leaving the imputation algorithm with little information to work with.
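To make this concrete, here is a minimal sketch under simplifying assumptions: simulated variables, a two-of-three-module assignment, and simple deterministic regression imputation rather than the multiple-imputation machinery used in practice.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30_000

# Two strongly associated variables (true r = 0.8), standing in for, say,
# climate concern (x) and support for environmental policies (y).
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)

# Each respondent is assigned two of three modules (one dropped at random).
dropped = rng.integers(0, 3, size=n)  # 0 = module A, 1 = B, 2 = C

def rmse_of_imputed_x(x_observed, y_observed):
    """Impute missing x (regression on y where y is observed, otherwise the
    observed mean) and return the RMSE against the true simulated values."""
    both = x_observed & y_observed                 # pairwise-complete cases
    slope, intercept = np.polyfit(y[both], x[both], 1)
    miss = ~x_observed
    x_hat = np.where(y_observed[miss],
                     slope * y[miss] + intercept,  # partner variable observed
                     x[x_observed].mean())         # no partner information
    return np.sqrt(np.mean((x_hat - x[miss]) ** 2))

# Same module: x and y both sit in module A, observed or missing together.
print(rmse_of_imputed_x(dropped != 0, dropped != 0))  # ~1.0 (mean imputation only)
# Different modules: x in A, y in B, so y is observed whenever x is missing.
print(rmse_of_imputed_x(dropped != 0, dropped != 1))  # ~0.6 (regression on y)
```

In this toy setup, the reconstruction error for the missing values shrinks from roughly the full standard deviation of x (mean imputation) to the residual standard deviation (regression on the observed partner variable).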
From a questionnaire design perspective, though, this idea can seem odd. As survey methodologists Smyth [3] and Krosnick and Presser [4] note, grouping thematically related questions can help respondents recall relevant information and maintain attention. A faster pace of topic changes, with some questions left out within each topic, might therefore make the questionnaire appear inconsistent or confusing to respondents, calling the effectiveness of the measurement instrument into question. Thus, questionnaire designers may prefer to allocate all questions on a given topic to the same module.
In other words, what’s best for the quality of the imputed data might not be best for the questionnaire. The art of split questionnaire design lies in this tension between statistical efficiency and questionnaire coherence.
A Core to Hold It All Together
To reconcile these perspectives, researchers sometimes include an extended core module in their Split Questionnaire Design—a set of questions that everyone receives. The remaining questions are distributed across different modules, of which only a random subset is administered to each respondent. Such an extended core module strategy, combined with single-topic modules, has been implemented in the European Values Study [5], for example.
Our study examined whether extending this core with the most informative variables from the different survey topics could help single-topic modules perform as well as random modules. The idea is simple: if the core includes key variables that are strongly correlated with the others, perhaps we can enjoy both thematic coherence and reliable imputation.
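One conceivable selection heuristic is sketched below. It assumes that pilot or previous-wave data are available; the function `build_extended_core` and the topic layout are hypothetical, not the selection procedure from our study.

```python
import pandas as pd

def build_extended_core(data: pd.DataFrame, topics: dict, per_topic: int = 1):
    """For each topic, pick the item(s) with the highest mean absolute
    correlation with all other items -- candidates for an extended core."""
    corr = data.corr().abs()
    core_items = []
    for topic_items in topics.values():
        # Mean |correlation| of each topic item with every other item
        # (subtracting the self-correlation of 1.0 from the row sum).
        informativeness = (corr.loc[topic_items].sum(axis=1) - 1.0) / (len(corr) - 1)
        core_items += informativeness.nlargest(per_topic).index.tolist()
    return core_items

# Hypothetical usage with pilot data and the module layout sketched earlier:
# extended_core = ["age", "gender", "education"] + build_extended_core(
#     pilot_data, {"politics": ["pol_1", "pol_2", "pol_3"], ...})
```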
We tested this hypothesis through a Monte Carlo simulation based on data from the German Internet Panel [6, 7], comparing four strategies (a bare-bones simulation skeleton follows the list):
- Single-topic modules (STM) with a simple core (only sociodemographics)
- Single-topic modules with an extended core (including correlated variables from the different survey topics)
- Random modules (RM) with a simple core
- Random modules with an extended core
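The sketch below shows the bare bones of such a simulation, not the authors' actual code; the design functions imposing each strategy's missingness pattern, the imputation routine, and the estimator of interest are placeholders to be supplied.

```python
import numpy as np

def simulate(full_data, designs, impute, estimate, n_reps=500, seed=1):
    """Monte Carlo skeleton: repeatedly impose each design's planned
    missingness on complete data, impute, and record the deviation of the
    estimate from the complete-data benchmark."""
    rng = np.random.default_rng(seed)
    benchmark = estimate(full_data)
    deviations = {name: [] for name in designs}
    for _ in range(n_reps):
        for name, apply_design in designs.items():
            observed = apply_design(full_data, rng)  # plant the missing values
            completed = impute(observed, rng)        # e.g. multiple imputation
            deviations[name].append(estimate(completed) - benchmark)
    # Mean absolute deviation per strategy, e.g. for the four designs:
    # STM/simple core, STM/extended core, RM/simple core, RM/extended core.
    return {name: float(np.mean(np.abs(d))) for name, d in deviations.items()}
```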
What the Simulations Reveal
The results were clear. For single-topic modules, extending the core improved the quality of imputed estimates—the deviations from the benchmark were smaller. However, even with this improvement, single-topic modules did not outperform random module allocation. For random modules, the extended core made little to no difference.
In short:
- Extended cores help single-topic modules, making their estimates more reliable.
- Random modules still perform best overall, irrespective of whether an extended core is included.
In other words, while an extended core can make single-topic modules less problematic, random allocation remains the most robust solution from an imputation perspective.

Source: Axenfeld, J. B., Bruch, C., & Wolf, C. (2025). Composition of core modules and item allocation in split questionnaire designs: impact on estimates from imputed data. International Journal of Social Research Methodology, 1–24. https://doi.org/10.1080/13645579.2025.2561653
Balancing Questionnaire Consistency and Data Quality
For practitioners designing large-scale surveys, these results offer guidance on how to navigate the crucial trade-off between questionnaire consistency and the quality of the imputed data. From a purely statistical perspective, random modules are clearly the best option available. This strategy also has the advantage that no extended core module is needed, which helps keep the questionnaire short.
Yet, when priority is given to a slower pace of topic changes and to avoiding the potential disruption of questions being left out within a topic, single-topic modules with an extended core may offer a compromise. This strategy, however, still falls behind random modules in terms of imputed data quality, and it results in longer questionnaires because the extended core module is presented to every respondent.
It should also be noted that distributing survey topics across different modules does not necessarily mean that topics are spread throughout the entire questionnaire or that the question order changes. Even with random modules, questions can be presented in their original order; only those from non-assigned modules are omitted, as the sketch below illustrates. Hence, the potential advantages of single-topic modules in terms of questionnaire consistency mainly lie in the slower pace of topic changes and the absence of skipped questions within a topic.
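A tiny sketch of this point, with a hypothetical item-to-module mapping:

```python
def administered_items(question_order, item_module, assigned_modules):
    """Present questions in their original order, omitting only those
    belonging to modules the respondent was not assigned."""
    return [q for q in question_order if item_module[q] in assigned_modules]

# Adjacent items may belong to different random modules, but the relative
# order of whatever remains is unchanged.
item_module = {"pol_1": "A", "val_1": "B", "pol_2": "A", "dig_1": "C"}
order = ["pol_1", "val_1", "pol_2", "dig_1"]
print(administered_items(order, item_module, {"A", "B"}))
# -> ['pol_1', 'val_1', 'pol_2']
```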
Looking Ahead
Future research could explore how single-topic modules versus random modules—as well as extended core modules—affect response behavior in practice. Can the statistical drawbacks of single-topic modules be offset by higher response quality due to their thematic coherence? Do the more abrupt topic changes in random modules irritate respondents, or does their diversity help prevent boredom? And perhaps most importantly, how do these strategies affect respondent burden and response rates?
Main Reference
Axenfeld, J. B., Bruch, C., & Wolf, C. (2025). Composition of core modules and item allocation in split questionnaire designs: impact on estimates from imputed data. International Journal of Social Research Methodology, 1–24. https://doi.org/10.1080/13645579.2025.2561653
References
1. Raghunathan, T. E., & Grizzle, J. E. (1995). A split questionnaire survey design. Journal of the American Statistical Association, 90(429), 54–63. https://doi.org/10.1080/01621459.1995.10476488
2. Axenfeld, J. B., Blom, A. G., Bruch, C., & Wolf, C. (2022). Split questionnaire designs for online surveys: The impact of module construction on imputation quality. Journal of Survey Statistics and Methodology, 10(5), 1236–1262. https://doi.org/10.1093/jssam/smab055
3. Smyth, J. D. (2016). Designing questions and questionnaires. In C. Wolf, D. Joye, T. W. Smith, & Y.-C. Fu (Eds.), The SAGE handbook of survey methodology (pp. 218–235). SAGE Publications.
4. Krosnick, J. A., & Presser, S. (2010). Question and questionnaire design. In P. V. Marsden & J. D. Wright (Eds.), Handbook of survey research (pp. 263–314). Emerald Group Publishing Limited.
5. Luijkx, R., Jónsdóttir, G. A., Gummer, T., Ernst Stähli, M., Frederiksen, M., Ketola, K., Wolf, C., Brislinger, E., Christmann, P., Gunnarsson, S. Þ., Hjaltason, Á. B., Joye, D., Lomazzi, V., Maineri, A. M., Milbert, P., Ochsner, M., Pollien, A., Sapin, M., Solanes, I., … Wolf, C. (2021). The European Values Study 2017: On the way to the future using mixed-modes. European Sociological Review, 37(2), 330–346. https://doi.org/10.1093/esr/jcaa049
6. Blom, A. G., Gathmann, C., & Krieger, U. (2015). Setting up an online panel representative of the general population: The German Internet Panel. Field Methods, 27(4), 391–408. https://doi.org/10.1177/1525822X15574494
7. Cornesse, C., Felderer, B., Fikel, M., Krieger, U., & Blom, A. G. (2021). Recruiting a probability-based online panel via postal mail: Experimental evidence. Social Science Computer Review, 40(5), 1259–1284. https://doi.org/10.1177/08944393211006059