How to Conduct Ethical and Statistically Rigorous Primary Data Collection for Clinical and Epidemiological PhD Research
- 1. Foundations of Primary Data Collection
- 2. Study Design and Statistical Rigour: Lessons from the ConcePTION Project
- 3. Ethical Compliance in Real-World Contexts
- 4. Technology-Driven Data Capture: German EHR-EDC Integration Case
- 5. Ethical Data in Medical Imaging Research
- 6. Data Management and Validation: Case of the All of Us Research Program
- 7. Interlinking Ethics, Technology, and Statistics
- 8. Practical Guidelines for Doctoral Researchers
- 9. The Value of Research Support Services
- Conclusion
Introduction
In clinical and epidemiological studies, the quality of primary data underpins the dependability of findings. Digital repositories and secondary datasets are increasingly accessible, but they often lack the specificity, accuracy, and contextual detail needed for causal analysis. As Mazhar et al. (2021) argue, primary data collection is a fundamental tool of health care research because it allows investigators to define variables precisely and to meet empirical norms of methodological transparency.
However, collecting primary data requires more than fieldwork; it demands ethical sensitivity, statistical rigour, and technical precision. This article examines each of these elements through case studies and methodological examples that illustrate best practice for health researchers and PhD scholars.
1. Foundations of Primary Data Collection
Primary data are gathered directly from research participants and settings, under controlled conditions, through interviews, surveys, observation, or randomised trials (Mazhar et al., 2021); the choice of method depends on the research question and on feasibility. As Ganesha and Aithal (2022) make clear, aligning data collection tools with the research design ensures that the variables measured truly reflect the conceptual constructs under study.
For example, a previous doctoral project on stress management among oncology nurses combined self-administered questionnaires with cortisol testing. Pairing an objective biomarker with self-reported data was intended to strengthen both objectivity and ecological validity.
2. Study Design and Statistical Rigour: Lessons from the ConcePTION Project
The IMI ConcePTION Project is a major European initiative that supports the standardisation of data collection in pregnancy safety studies (Richardson et al., 2023; Favre et al., 2024). The first phase of its data collection work involved harmonising over 100 clinical variables (e.g., maternal age) and the timing of drug exposures across countries.
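To make harmonisation concrete, the sketch below maps two sites' differently named fields onto a shared structure and derives the gestational week of first drug exposure from the last menstrual period (LMP). It is a minimal Python illustration, not the ConcePTION data model: the site names, field names, and the two common data elements are hypothetical assumptions.

```python
from datetime import date

# Hypothetical site-specific field names mapped onto two illustrative common
# data elements (CDEs): maternal age and gestational week at first exposure.
SITE_FIELD_MAPS = {
    "site_A": {"maternal_age": "age_mother", "lmp": "last_menstrual_period",
               "exposure_start": "drug_start"},
    "site_B": {"maternal_age": "mat_age_yrs", "lmp": "lmp_date",
               "exposure_start": "first_dose_date"},
}


def gestational_week(lmp: date, event: date) -> int:
    """Completed gestational weeks at an event, counted from the last menstrual period."""
    return (event - lmp).days // 7


def harmonise(site: str, record: dict) -> dict:
    """Map one site-specific record onto the shared CDE structure."""
    fields = SITE_FIELD_MAPS[site]
    return {
        "site": site,
        "maternal_age_years": int(record[fields["maternal_age"]]),
        "exposure_gestational_week": gestational_week(
            record[fields["lmp"]], record[fields["exposure_start"]]
        ),
    }


# Two sites store the same information under different names and types.
a = harmonise("site_A", {"age_mother": "31", "last_menstrual_period": date(2024, 1, 1),
                         "drug_start": date(2024, 2, 12)})
b = harmonise("site_B", {"mat_age_yrs": 28, "lmp_date": date(2024, 3, 4),
                         "first_dose_date": date(2024, 3, 4)})
```

Keeping the mapping in a table rather than in code branches means a new site can be added by declaring its field names, without changing the harmonisation logic.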
Technical Insights:
– Data were collected using structured electronic case report forms (eCRFs) with built-in validation rules that limited human error.
– Researchers used a distributed data network framework, which allowed institutions to retain ownership of the data and submit de-identified data for analysis at a central location.
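Validation rules of the kind described can be expressed as small, testable checks. The following is a minimal sketch assuming each record arrives as a Python dictionary; the field names, the 12 to 60 age range, and the rule structure are hypothetical and do not reflect any particular eCRF platform.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class Rule:
    field: str
    message: str
    passes: Callable[[dict], bool]  # returns True when the record satisfies the rule


# Illustrative rules and thresholds only; real eCRF platforms use their own rule syntax.
RULES = [
    Rule("maternal_age_years", "required value is missing",
         lambda r: r.get("maternal_age_years") is not None),
    Rule("maternal_age_years", "outside plausible range 12 to 60",
         lambda r: r.get("maternal_age_years") is None
         or 12 <= r["maternal_age_years"] <= 60),
    # Cross-field check: a delivery cannot precede the last menstrual period.
    Rule("delivery_date", "delivery recorded before LMP",
         lambda r: r.get("delivery_date") is None or r.get("lmp_date") is None
         or r["delivery_date"] > r["lmp_date"]),
]


def validate(record: dict) -> list[str]:
    """Return a human-readable message for every rule the record fails."""
    return [f"{rule.field}: {rule.message}" for rule in RULES if not rule.passes(record)]


issues = validate({"maternal_age_years": 70,
                   "lmp_date": date(2024, 1, 1),
                   "delivery_date": date(2023, 12, 1)})
```

Running the checks at the point of entry lets the form reject or query an implausible value while the source document is still at hand.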
PhD researchers conducting multi-site or multi-phase data collection should consider this kind of standardisation, because statistical validity depends on variables being defined and measured in the same way across all sources.
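The distributed network described above lets sites keep ownership of their data. A related and stricter variant, shown here as an assumption rather than the ConcePTION method, shares only aggregate statistics: each site computes counts, sums, and sums of squares locally, and the coordinating centre pools them into an overall mean and standard deviation. The ages and site labels are invented for illustration.

```python
import math


def local_summary(values: list[float]) -> dict:
    """Computed inside each institution: only these aggregates leave the site."""
    return {"n": len(values), "sum": sum(values), "sumsq": sum(v * v for v in values)}


def pool(summaries: list[dict]) -> dict:
    """Computed at the coordinating centre from site summaries alone."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    sumsq = sum(s["sumsq"] for s in summaries)
    mean = total / n
    variance = (sumsq - n * mean ** 2) / (n - 1)  # sample variance from sufficient statistics
    return {"n": n, "mean": mean, "sd": math.sqrt(variance)}


# Hypothetical maternal ages held at two sites.
site_a = local_summary([30, 32, 28])
site_b = local_summary([25, 35])
overall = pool([site_a, site_b])
```

The one-pass sum-of-squares formula is adequate for modest values; for large datasets, numerically stabler pooling formulas based on per-site means and squared deviations are preferable.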
3. Ethical Compliance in Real-World Contexts
Ethical compliance protects participants and also builds professional integrity in researchers. Qualitative researchers often face ethical dilemmas arising from consent, confidentiality, and participants' emotional vulnerability (Nii Laryeafio & Ogbewe, 2023).
Case Example: WHO COVID-19 Solidarity Trial (2020-2021).
During the COVID-19 pandemic, the WHO launched a global randomised controlled trial (RCT) across approximately 400 hospitals. Collecting primary data in real time during a crisis posed serious ethical challenges, particularly in obtaining valid informed consent from critically ill patients.
Dron et al. (2022) also describe shortcuts in data capture and sharing that may have put patient confidentiality at risk. Even under emergency conditions, researchers should consider tiered consent models and real-time, dashboard-based oversight by ethics boards to monitor compliance. In well-designed clinical studies, PhD scholars can also use dynamic consent systems: digital platforms that let participants review and change their consent in near real time as the study evolves.
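A dynamic, tiered consent system can be modelled as an append-only log in which the latest decision for each tier applies, which also gives ethics boards an audit trail. This is a minimal sketch under stated assumptions: the tier names and the ConsentLedger class are hypothetical, and a production system would add authentication, persistence, and participant-facing interfaces.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical consent tiers; a real study defines these in its protocol.
TIERS = ("core_study", "data_sharing", "future_research")


@dataclass
class ConsentLedger:
    """Append-only log of consent decisions; the most recent decision per tier applies."""
    events: list = field(default_factory=list)

    def record(self, participant_id: str, tier: str, granted: bool,
               when: Optional[datetime] = None) -> None:
        if tier not in TIERS:
            raise ValueError(f"unknown consent tier: {tier}")
        self.events.append((when or datetime.now(timezone.utc), participant_id, tier, granted))

    def has_consent(self, participant_id: str, tier: str) -> bool:
        """Default to no consent if the participant never answered for this tier."""
        latest = None
        for when, pid, t, granted in self.events:
            if pid == participant_id and t == tier and (latest is None or when >= latest[0]):
                latest = (when, granted)
        return latest is not None and latest[1]


ledger = ConsentLedger()
t0 = datetime(2024, 1, 10, tzinfo=timezone.utc)
t1 = datetime(2024, 6, 3, tzinfo=timezone.utc)
ledger.record("P001", "core_study", True, t0)
ledger.record("P001", "data_sharing", True, t0)
ledger.record("P001", "data_sharing", False, t1)  # later withdrawal via a participant portal
```

Because every change is appended rather than overwritten, the full consent history remains available for oversight dashboards.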
4. Technology-Driven Data Capture: German EHR-EDC Integration Case
Mueller et al. (2023) conducted a feasibility study in Germany on the fully automated transfer of data from electronic health records (EHRs) to electronic data capture (EDC) systems. Clinical research staff usually re-enter data manually from hospital records into the study database, a process that is prone to human error and causes delays.
Technical Outcomes:
– Automation reduced the time required for data entry by 67%.
– Automated consistency cross-checks allowed errors to be identified immediately in a sample of the study database.
– Data transfer complied with Good Clinical Practice (GCP) and General Data Protection Regulation (GDPR) requirements.
Relevance: this model shows PhD students conducting hospital-based research how API-based EHR integration can make data capture faster and more accurate while remaining compliant.
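The mapping and cross-checking step of an EHR-to-EDC transfer can be sketched as follows. The example assumes EHR data arrive as simplified FHIR-style Observation dictionaries coded with LOINC (8480-6 for systolic and 8462-4 for diastolic blood pressure); the EDC field names and the bp_observation helper are hypothetical, and real FHIR feeds often deliver blood pressure as a panel with components rather than as separate observations.

```python
# LOINC codes for systolic and diastolic blood pressure; EDC field names are hypothetical.
LOINC_TO_EDC = {"8480-6": "sbp_mmhg", "8462-4": "dbp_mmhg"}


def ehr_to_edc(observations: list[dict]) -> tuple[dict, list[str]]:
    """Map FHIR-style Observation dicts to a flat EDC record and cross-check it."""
    record, issues = {}, []
    for obs in observations:
        code = obs["code"]["coding"][0]["code"]
        edc_field = LOINC_TO_EDC.get(code)
        if edc_field is None:
            continue  # not collected on this study's eCRF
        quantity = obs["valueQuantity"]
        if quantity.get("unit") != "mmHg":
            issues.append(f"{edc_field}: unexpected unit {quantity.get('unit')!r}")
            continue
        record[edc_field] = quantity["value"]
    # Consistency check: systolic pressure should exceed diastolic pressure.
    if "sbp_mmhg" in record and "dbp_mmhg" in record and record["sbp_mmhg"] <= record["dbp_mmhg"]:
        issues.append("sbp_mmhg <= dbp_mmhg: check source record")
    return record, issues


def bp_observation(loinc_code: str, value: float, unit: str = "mmHg") -> dict:
    """Build a minimal FHIR-style Observation dict for the example (hypothetical helper)."""
    return {"code": {"coding": [{"system": "http://loinc.org", "code": loinc_code}]},
            "valueQuantity": {"value": value, "unit": unit}}


record, issues = ehr_to_edc([bp_observation("8480-6", 128), bp_observation("8462-4", 84)])
```

Flagged records are returned alongside the mapped values rather than silently dropped, so study staff can query the source document.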
5. Ethical Data in Medical Imaging Research
Gathering medical image data has its own ethical and technical considerations. Padmapriya and Parthasarathy (2024) note that medical imaging datasets used for training AI models must adopt a consistent approach to de-identification to avoid the possible re-identification of patients.
Example: The UK Biobank Imaging Study
This large-scale study collects MRI scans and other imaging data from up to 100,000 participants. All images are anonymised through an automated process that strips identifying metadata before transfer to secure cloud servers.
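Automated metadata stripping of the kind described can be sketched on a DICOM-style header held as a Python dictionary keyed by DICOM attribute keywords. This is a hypothetical illustration covering only a few identifying attributes; a real pipeline would read files with a DICOM library such as pydicom and follow the attribute confidentiality profiles in DICOM PS3.15 Annex E. The keyed-hash pseudonym and the audit-log format are assumptions, not the UK Biobank procedure.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# A small subset of identifying DICOM attributes, by keyword (illustrative only).
IDENTIFYING_KEYWORDS = {"PatientName", "PatientID", "PatientBirthDate",
                        "PatientAddress", "InstitutionName", "ReferringPhysicianName"}


def deidentify(header: dict, key: bytes, audit_log: list) -> dict:
    """Strip identifying attributes from a DICOM-style header and log what was removed."""
    removed = sorted(k for k in header if k in IDENTIFYING_KEYWORDS)
    clean = {k: v for k, v in header.items() if k not in IDENTIFYING_KEYWORDS}
    # Keyed hash (HMAC-SHA256): the same patient always gets the same pseudonym,
    # and reversing it is impractical without the secret key.
    pseudonym = hmac.new(key, str(header.get("PatientID", "")).encode(),
                         hashlib.sha256).hexdigest()[:16]
    clean["PatientID"] = pseudonym
    clean["PatientIdentityRemoved"] = "YES"
    # The audit entry records which attributes were removed, never their values.
    audit_log.append({"time": datetime.now(timezone.utc).isoformat(),
                      "pseudonym": pseudonym, "removed": removed})
    return clean


log = []
header = {"PatientName": "Doe^Jane", "PatientID": "12345", "PatientBirthDate": "19800101",
          "Modality": "MR", "StudyDescription": "Brain MRI"}
clean = deidentify(header, b"site-secret-key", log)
```

A stable pseudonym lets repeat scans from the same participant be linked for longitudinal analysis without exposing the original identifier.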
Technical Perspective:
- Images were processed utilising DICOM anonymisation tools with supporting audit logs.
- Researchers applied differential privacy methods, adding calibrated statistical "noise" to released data to protect individuals while aiming to preserve research quality.
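Differential privacy is usually applied to released statistics rather than to raw images. The sketch below shows the standard Laplace mechanism for a count, offered as a generic example rather than the method used in the study above: a counting query has sensitivity 1, so adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy. The scan records and the epsilon value are illustrative assumptions.

```python
import random


def dp_count(records: list, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one participant changes
    it by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two i.i.d. Exponential(rate=epsilon) draws is Laplace(0, 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical scan-level findings: 40 lesions among 100 scans.
scans = [{"finding": "lesion"}] * 40 + [{"finding": "normal"}] * 60
noisy = dp_count(scans, lambda r: r["finding"] == "lesion", epsilon=1.0, rng=random.Random(42))
```

Smaller values of epsilon give stronger privacy but noisier counts, which is the privacy-utility trade-off researchers must justify.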
PhD researchers developing diagnostic AI tools can apply comparable ethical and technical safeguards to protect patient confidentiality while limiting any loss of analytic validity.
6. Data Management and Validation: Case of the All of Us Research Program
The U.S. National Institutes of Health (NIH) established the All of Us Research Program to gather primary data from one million or more participants and study genetic and environmental influences on health. A further strength of the program is its systematic data cleaning and validation process (Gardner et al., 2022). Technical processes include:
- Continuous data quality checks utilizing Python-based validation scripts.
- Automated alerts for missing demographic or lab variables.
- Participant updates through a patient portal.
This example shows how automated data validation can safeguard data integrity and quality in a large-scale project; the same approach can benefit PhD students using survey and longitudinal designs.
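Automated alerts for missing demographic or laboratory variables can be sketched as a short Python validation script. The required field names, the Alert class, and the sample batch are hypothetical; they are not taken from the All of Us data dictionary.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical required fields; a real program defines these in its data dictionary.
REQUIRED = {
    "demographic": ("age", "sex"),
    "lab": ("hba1c_percent", "ldl_mg_dl"),
}


@dataclass(frozen=True)
class Alert:
    participant_id: str
    field: str
    category: str  # "demographic" or "lab"


def check_completeness(rows: list[dict]) -> list[Alert]:
    """Flag every required demographic or lab value that is absent or blank."""
    alerts = []
    for row in rows:
        pid = row.get("participant_id", "<unknown>")
        for category, names in REQUIRED.items():
            for name in names:
                if row.get(name) in (None, ""):
                    alerts.append(Alert(pid, name, category))
    return alerts


def missing_by_field(alerts: list[Alert]) -> Counter:
    """Summarise alerts per field, e.g. to trigger a batch-level warning."""
    return Counter(a.field for a in alerts)


batch = [
    {"participant_id": "P001", "age": 54, "sex": "F", "hba1c_percent": 6.1, "ldl_mg_dl": 130},
    {"participant_id": "P002", "age": 61, "sex": "", "hba1c_percent": None, "ldl_mg_dl": 98},
    {"participant_id": "P003", "sex": "M", "hba1c_percent": 5.4},
]
alerts = check_completeness(batch)
```

Scheduling such a check on every incoming batch turns data cleaning from a one-off task into continuous monitoring.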
7. Interlinking Ethics, Technology, and Statistics
Statistical rigour and ethical integrity are closely intertwined. Research that excludes marginalized populations or lacks transparency raises ethical concerns and introduces sampling bias. Padmapriya and Parthasarathy (2024) emphasize this point for low-income populations in digital health research, arguing that their exclusion is not only unethical but also reduces the generalizability of results.
For example, a public health study of hypertension that collects data only through smartphone-based surveys will introduce selection bias by excluding rural populations without digital access. Researchers could mitigate this by using multi-channel data collection, combining app-based reports with field interviews to secure adequate rural representation.
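The bias in the smartphone-only scenario, and one statistical remedy, can be illustrated with post-stratification weighting, which reweights stratum-level estimates by known population shares. The numbers are invented: assume 70% of the population is urban and 30% rural, and that the app reaches mostly urban respondents. The weighting corrects the estimate only if every stratum has some respondents, which is why multi-channel collection still matters.

```python
def naive_prevalence(sample: dict) -> float:
    """Unweighted prevalence: treats the achieved sample as if it mirrored the population."""
    n = sum(s["n"] for s in sample.values())
    cases = sum(s["cases"] for s in sample.values())
    return cases / n


def poststratified_prevalence(sample: dict, population_shares: dict) -> float:
    """Weight each stratum's prevalence by its known share of the target population."""
    total = 0.0
    for stratum, share in population_shares.items():
        s = sample.get(stratum)
        if not s or s["n"] == 0:
            raise ValueError(f"no respondents in stratum {stratum!r}; add another collection channel")
        total += share * s["cases"] / s["n"]
    return total


# Invented figures: 70% urban, 30% rural population; the app reaches mostly urban users.
population_shares = {"urban": 0.7, "rural": 0.3}
app_sample = {"urban": {"n": 900, "cases": 225}, "rural": {"n": 100, "cases": 40}}
naive = naive_prevalence(app_sample)
weighted = poststratified_prevalence(app_sample, population_shares)
```

Here the unweighted estimate (0.265) understates the weighted one (0.295) because rural respondents, who have higher prevalence in this example, are under-represented.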
8. Practical Guidelines for Doctoral Researchers
The cases described yield practical recommendations for PhD researchers who wish to design studies that are both ethically and statistically sound.
- Design the ethical and technical (statistical) aspects together: when seeking IRB approval, ensure the design also meets data protection requirements (e.g., GDPR).
- Pilot test instruments with a small, representative sample so that design flaws surface before full data collection.
- Use mixed-mode data collection (online + in-person) to ensure inclusivity.
- Route incoming data through automated validation tools to flag inconsistencies.
- Record metadata (when, where, and how data were collected) throughout the study to support reproducibility.
- Train field staff in ethical methods of sensitive data collection, especially in healthcare settings.
For example, a doctoral project on maternal nutrition could combine a food recall app with local community surveys, both enriching triangulated data and upholding ethical standards of recruitment and participation.
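Recording when, where, and how each value was collected is easiest if provenance travels with the data. A minimal sketch, assuming a JSON Lines export; the Observation fields, mode labels, and example values are hypothetical and chosen to echo the maternal nutrition example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Union

ALLOWED_MODES = {"online", "in_person"}  # mixed-mode collection channels (hypothetical labels)


@dataclass(frozen=True)
class Observation:
    participant_id: str
    variable: str
    value: Union[float, int, str]
    collected_at: str        # when: ISO 8601 timestamp
    site: str                # where
    mode: str                # how: collection channel
    instrument_version: str  # which version of the questionnaire or device


def to_jsonl(observations: list[Observation]) -> str:
    """Serialise observations, with their provenance, as JSON Lines."""
    lines = []
    for obs in observations:
        if obs.mode not in ALLOWED_MODES:
            raise ValueError(f"unrecognised collection mode: {obs.mode!r}")
        lines.append(json.dumps(asdict(obs), sort_keys=True))
    return "\n".join(lines)


obs = Observation("P001", "daily_energy_kcal", 1850, "2024-05-02T09:30:00+00:00",
                  "community_centre_3", "in_person", "food_recall_v1.2")
exported = to_jsonl([obs])
```

Storing the instrument version with each value makes it possible to separate genuine changes from changes caused by a revised questionnaire.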
9. The Value of Research Support Services
Coordinating primary data collection can be both burdensome and professionally demanding for independent researchers. Support from a research services organization can improve the quality of the data collected while upholding ethical and confidentiality agreements (Ganesha & Aithal, 2022). Research service personnel can assist with survey construction, instrument testing, and field supervision, and can help ensure that the study meets regulatory and ethical principles for human subjects research.
Example: in a multi-state health survey, participant recruitment might be outsourced to a certified research lab, such as PhD Assistance or a similar organization, whose GDPR-trained personnel improve efficiency while complying with regulatory and ethical requirements.
Conclusion
Primary data collection is fundamental to producing credible, reproducible, and ethically defensible research across disciplines. Case studies such as the IMI ConcePTION Project and recent advances in EHR automation in Germany show that successful primary data collection requires a combination of standardization, digitalization, and appropriate ethical governance.
For doctoral researchers, the challenge lies where technical sophistication meets ethically responsible conduct. As Mueller et al. (2023) and Richardson et al. (2023) describe, automation and standardization can enhance data quality, provided they are guided by rigorous ethical frameworks (Padmapriya & Parthasarathy, 2024).
In the current era of artificial intelligence (AI)-enhanced health data analytics, researchers cannot treat primary data collection as merely one step in the process; they must recognize it as the strategic, ethical, and technical foundation of credible science. By upholding transparency, standardization, and respect for participants, scholars can ensure that their research not only meets academic standards of rigor but also honors the social contract of trust.
Are you ready to collect primary data for your clinical and epidemiological studies?
At the PhD Assistance Research Lab, we specialize in guiding medical professionals and researchers through every stage of this process. Our experts will help you design and conduct data collection that effectively addresses your research gap.
Contact the PhD Assistance Research Lab to complete your PhD research successfully.
References
- Dron, L., Kalatharan, V., Gupta, A., Haggstrom, J., Zariffa, N., Morris, A. D., … & Park, J. (2022). Data capture and sharing in the COVID-19 pandemic: A cause for concern. The Lancet Digital Health, 4(10), e748–e756. https://doi.org/10.1016/S2589-7500(22)00143-3
- Favre, G., Richardson, J. L., Moore, A., Geissbühler, Y., Jehl, V., Oliver, A., … & Winterfeld, U. (2024). Improving data collection in pregnancy safety studies: Towards standardisation of data elements in pregnancy reports from public and private partners, a contribution from the ConcePTION Project. Drug Safety, 47(3), 227–236. https://doi.org/10.1007/s40264-023-01364-z
- Ganesha, H. R., & Aithal, P. S. (2022). How to choose an appropriate research data collection method and method choice among various research data collection methods and method choices during a PhD program in India. International Journal of Management, Technology, and Social Sciences, 7(2), 455–489. https://doi.org/10.5281/zenodo.7102113
- Gardner, H., Elfeky, A., Pickles, D., Dawson, A., Gillies, K., Warwick, V., & Treweek, S. (2022). A good use of time? Providing evidence for how effort is invested in primary and secondary outcome data collection in trials. Trials, 23(1), 1047. https://doi.org/10.1186/s13063-022-06939-8
- Mazhar, S. A., Anjum, R., Anwar, A. I., & Khan, A. A. (2021). Methods of data collection: A fundamental tool of research. Journal of Integrated Community Health, 10(1), 6–10.
- Mueller, C., Herrmann, P., Cichos, S., Remes, B., Junker, E., Hastenteufel, T., & Mundhenke, M. (2023). Automated electronic health record to electronic data capture transfer in clinical studies in the German health care system: Feasibility study and gap analysis. Journal of Medical Internet Research, 25, e47958. https://doi.org/10.2196/47958
- Nii Laryeafio, M., & Ogbewe, O. C. (2023). Ethical consideration dilemma: Systematic review of ethics in qualitative data collection through interviews. Journal of Ethics in Entrepreneurship and Technology, 3(2), 94–110.
- Padmapriya, S. T., & Parthasarathy, S. (2024). Ethical data collection for medical image analysis: A structured approach. Asian Bioethics Review, 16(1), 95–108. https://doi.org/10.1007/s41649-023-00247-3
- Richardson, J. L., Moore, A., Bromley, R. L., Stellfeld, M., Geissbühler, Y., Bluett-Duncan, M., … & Yates, L. M. (2023). Core data elements for pregnancy pharmacovigilance studies using primary source data collection methods: Recommendations from the IMI ConcePTION Project. Drug Safety, 46(5), 479–491. https://doi.org/10.1007/s40264-023-01278-w

