Standardizing Survey Data Collection to Enhance Reproducibility: Development and Comparative Evaluation of the ReproSchema Ecosystem
Chen Y, Jarecka D, Abraham SA, Gau R, Ng E, Low DM, Bevers I, Johnson A, Keshavan A, Klein A, Clucas J, Rosli Z, Hodge SM, Linkersdörfer J, Bartsch H, Das S, Fair D, Kennedy D, Ghosh SS
Identifiers and access
- DOI
- 10.2196/63343
- PubMed
- 40644691
- PMC
- PMC12299943
- Cited by
- 1
Key findings
ReproSchema is a schema-centric ecosystem for survey-based data collection that met all 14 FAIR criteria and supported six of eight key survey functionalities in a 12-platform comparison; three use cases (NIMH-Minimal, ABCD/HBCD longitudinal tracking, and a neuroimaging best-practices checklist) illustrate its reproducibility benefits.
Abstract
Source: pubmed
BACKGROUND: Inconsistencies in survey-based (eg, questionnaire) data collection across biomedical, clinical, behavioral, and social sciences pose challenges to research reproducibility. ReproSchema is an ecosystem that standardizes survey design and facilitates reproducible data collection through a schema-centric framework, a library of reusable assessments, and computational tools for validation and conversion. Unlike conventional survey platforms that primarily offer graphical user interface-based survey creation, ReproSchema provides a structured, modular approach for defining and managing survey components, enabling interoperability and adaptability across diverse research settings.

OBJECTIVE: This study examines ReproSchema's role in enhancing research reproducibility and reliability. We introduce its conceptual and practical foundations, compare it against 12 platforms to assess its effectiveness in addressing inconsistencies in data collection, and demonstrate its application through 3 use cases: standardizing required mental health survey common data elements, tracking changes in longitudinal data collection, and creating interactive checklists for neuroimaging research.

METHODS: We describe ReproSchema's core components, including its schema-based design; reusable assessment library with >90 assessments; and tools to validate data, convert survey formats (eg, REDCap [Research Electronic Data Capture] and Fast Healthcare Interoperability Resources), and build protocols. We compared 12 platforms (Center for Expanded Data Annotation and Retrieval, formr, KoboToolbox, Longitudinal Online Research and Imaging System, MindLogger, OpenClinica, Pavlovia, PsyToolkit, Qualtrics, REDCap, SurveyCTO, and SurveyMonkey) against 14 findability, accessibility, interoperability, and reusability (FAIR) principles and assessed their support of 8 survey functionalities (eg, multilingual support and automated scoring). Finally, we applied ReproSchema to 3 use cases (NIMH-Minimal; the Adolescent Brain Cognitive Development and HEALthy Brain and Child Development Studies; and the Committee on Best Practices in Data Analysis and Sharing Checklist) to illustrate ReproSchema's versatility.

RESULTS: ReproSchema provides a structured framework for standardizing survey-based data collection while ensuring compatibility with existing survey tools. Our comparison showed that ReproSchema met 14 of 14 FAIR criteria and supported 6 of 8 key survey functionalities: provision of standardized assessments, multilingual support, multimedia integration, data validation, advanced branching logic, and automated scoring. Three use cases illustrate ReproSchema's flexibility: standardizing essential mental health assessments (NIMH-Minimal), systematically tracking changes in longitudinal studies (Adolescent Brain Cognitive Development and HEALthy Brain and Child Development), and converting a 71-page neuroimaging best practices guide into an interactive checklist (Committee on Best Practices in Data Analysis and Sharing).

CONCLUSIONS: ReproSchema enhances reproducibility by organizing survey-based data collection through a schema-driven approach. It integrates version control, manages metadata, and ensures interoperability, maintaining consistency across studies and compatibility with common survey tools. Planned developments, including ontology mappings and semantic search, will broaden its use, supporting transparent, scalable, and reproducible research across disciplines.
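To give a concrete feel for the schema-centric approach the abstract describes, here is a hypothetical sketch of a single survey item with a minimal structural check. The key names (`@type`, `prefLabel`, `ui`, `responseOptions`) are assumptions modeled loosely on the JSON-LD style of the ecosystem, not copied from the project's authoritative schemas, and `validate_item` is an illustrative helper, not ReproSchema's actual validator.

```python
# Illustrative sketch only: a ReproSchema-style survey item expressed as a
# JSON-LD-like Python dict, plus a minimal structural check. Key names are
# assumptions, not the project's authoritative schema.

phq9_item1 = {
    "@type": "reproschema:Item",   # assumed type name
    "@id": "phq9_q1",
    "prefLabel": {"en": "PHQ-9 Question 1"},
    "question": {
        "en": "Little interest or pleasure in doing things?",
        "es": "¿Poco interés o placer en hacer las cosas?",  # multilingual support
    },
    "ui": {"inputType": "radio"},
    "responseOptions": {
        "choices": [
            {"name": {"en": "Not at all"}, "value": 0},
            {"name": {"en": "Several days"}, "value": 1},
            {"name": {"en": "More than half the days"}, "value": 2},
            {"name": {"en": "Nearly every day"}, "value": 3},
        ]
    },
}

def validate_item(item: dict) -> list:
    """Return a list of structural problems; an empty list means the item passes."""
    errors = []
    for key in ("@type", "@id", "question", "responseOptions"):
        if key not in item:
            errors.append("missing required key: " + key)
    choices = item.get("responseOptions", {}).get("choices", [])
    if not choices:
        errors.append("responseOptions must define at least one choice")
    return errors

print(validate_item(phq9_item1))  # → []
```

Because each item is plain structured data rather than a GUI configuration, items like this can be versioned, validated, translated, and converted to other formats (eg, REDCap) by tooling, which is the reproducibility benefit the paper emphasizes.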
Topics
- reproducibility-tooling
- open-data-standards
Associated projects
Preprint precursor
Earlier versions of this work that have been superseded by the published record above.
- 10.2196/preprints.63343 (2024)
Lab authors
This record was curated from the lab's CV, NCBI MyBibliography, and OpenAlex. See PROJECTS.md for how to add or correct an entry via a pull request.