From Measurements to Patients: Data Aggregation in Supervised Classification of X-Ray Diffraction Datasets
Date: May 15, 2026
Category: Medical Applications

Authors: Alexander Alekseev, Keith Rogers, Lev Mourokh, Pavel Lazarev
This article examines how aggregation strategies can improve supervised machine learning for medical diagnostics using X-ray diffraction datasets. The work focuses on the transition from individual measurements to patient-level diagnosis, a critical step when multiple measurements are collected from the same patient or sample.
The authors applied aggregation at two points, either to the measurements before machine learning modeling or to the model outputs afterward, on two XRD datasets: human breast biopsy samples and canine claw samples. Random Forest and Logistic Regression classifiers were evaluated using ROC-AUC and balanced accuracy.
Across both datasets, aggregation improved classification performance, with post-model aggregation generally providing stronger results. For human breast samples, Random Forest with logit aggregation achieved an ROC-AUC above 0.9. For canine samples, both Random Forest with logit aggregation and Logistic Regression using the median cancer probability reached an ROC-AUC of about 0.85.
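As a rough illustration of post-model aggregation, the sketch below collapses per-measurement cancer probabilities into one score per patient. It assumes that "logit aggregation" means averaging the log-odds of the measurement-level probabilities and mapping the mean back to a probability; the article does not spell out the exact formula, so this is only one plausible reading. The function names, the clipping constant `eps`, and the example data are hypothetical and do not come from the study.

```python
import math
from collections import defaultdict
from statistics import mean, median


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1 (eps is an assumption)."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of the logit: maps log-odds back to a probability."""
    return 1 / (1 + math.exp(-z))


def aggregate_by_patient(patient_ids, probs, method="logit"):
    """Combine measurement-level probabilities into one score per patient.

    method="logit":  mean of log-odds, mapped back to a probability (assumed definition)
    method="median": median of the measurement-level probabilities
    """
    groups = defaultdict(list)
    for pid, p in zip(patient_ids, probs):
        groups[pid].append(p)

    if method == "logit":
        return {pid: sigmoid(mean([logit(p) for p in ps])) for pid, ps in groups.items()}
    if method == "median":
        return {pid: median(ps) for pid, ps in groups.items()}
    raise ValueError(f"unknown method: {method}")


# Hypothetical classifier outputs: three measurements for patient A, two for patient B.
patient_ids = ["A", "A", "A", "B", "B"]
probs = [0.9, 0.8, 0.4, 0.2, 0.3]

logit_scores = aggregate_by_patient(patient_ids, probs, method="logit")
median_scores = aggregate_by_patient(patient_ids, probs, method="median")
```

The resulting patient-level scores could then be passed to a metric such as ROC-AUC alongside one label per patient. Averaging in log-odds space gives confident measurements more weight than averaging raw probabilities would, whereas the median is less sensitive to a single outlying measurement.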
The study demonstrates that simple, interpretable aggregation methods can significantly improve patient-level classification in X-ray diffraction diagnostics, supporting future clinical applications of XRD-based structural biomarkers.
Keywords: machine learning; aggregation; supervised classification; X-ray diffraction

