From Measurements to Patients: Data Aggregation in Supervised Classification of X-Ray Diffraction Datasets

Date: May 15, 2026

Category: Medical Applications

Authors: Alexander Alekseev, Keith Rogers, Lev Mourokh, Pavel Lazarev

This article examines how aggregation strategies can improve supervised machine learning for medical diagnostics using X-ray diffraction datasets. The work focuses on the transition from individual measurements to patient-level diagnosis, a critical step when multiple measurements are collected from the same patient or sample.

The authors applied aggregation approaches before and after machine learning modeling to two XRD datasets: human breast biopsy samples and canine claw samples. Random Forest and Logistic Regression classifiers were evaluated using ROC-AUC and balanced accuracy.
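As an illustration of aggregation before modeling, the sketch below averages all measurements that share a patient identifier into one feature vector per patient, which a classifier could then be trained on. This is a minimal sketch under stated assumptions: the function name `mean_profile_by_patient`, the list-of-floats representation of a diffraction profile, and the choice of a plain mean are all hypothetical and are not taken from the article.

```python
from collections import defaultdict


def mean_profile_by_patient(patient_ids, profiles):
    """Average measurement-level feature vectors into one vector per patient.

    patient_ids: one identifier per measurement (hypothetical labels).
    profiles: one equal-length list of floats per measurement.
    Returns a dict mapping each patient id to its element-wise mean profile.
    """
    grouped = defaultdict(list)
    for pid, profile in zip(patient_ids, profiles):
        grouped[pid].append(profile)

    aggregated = {}
    for pid, rows in grouped.items():
        # zip(*rows) walks the profiles column by column (feature by feature).
        aggregated[pid] = [sum(column) / len(column) for column in zip(*rows)]
    return aggregated


# Illustrative values only, not data from the study.
ids = ["p1", "p1", "p2"]
measurements = [[1.0, 3.0], [3.0, 5.0], [2.0, 2.0]]
patient_features = mean_profile_by_patient(ids, measurements)
```

Averaging is only one possible choice here; a median or another summary statistic could be swapped in at the same point in the loop.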

Across both datasets, aggregation improved classification performance, with post-model aggregation generally providing stronger results. For human breast samples, Random Forest with logit aggregation achieved an ROC-AUC above 0.9. For canine samples, both Random Forest with logit aggregation and Logistic Regression using the median cancer probability reached an ROC-AUC of about 0.85.
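The two post-model strategies named above can be sketched as follows. The code assumes that "logit aggregation" means averaging the log-odds of the measurement-level cancer probabilities for each patient and mapping the mean back to a probability; the article summary does not spell out the exact formula, so this interpretation, the function name `aggregate_by_patient`, and the clipping constant `eps` are assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import median


def _logit(p, eps=1e-6):
    # Clip to avoid infinite log-odds when a model outputs exactly 0 or 1
    # (eps is an assumed value, not one from the article).
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def aggregate_by_patient(patient_ids, probs, method="logit"):
    """Combine measurement-level probabilities into one score per patient.

    method="logit": mean of the log-odds, mapped back to a probability.
    method="median": median of the measurement-level probabilities.
    """
    grouped = defaultdict(list)
    for pid, p in zip(patient_ids, probs):
        grouped[pid].append(p)

    scores = {}
    for pid, ps in grouped.items():
        if method == "logit":
            mean_logit = sum(_logit(p) for p in ps) / len(ps)
            scores[pid] = _sigmoid(mean_logit)
        elif method == "median":
            scores[pid] = median(ps)
        else:
            raise ValueError(f"unknown method: {method!r}")
    return scores


# Illustrative probabilities only, not results from the study.
ids = ["A", "A", "B", "B", "B"]
probs = [0.2, 0.8, 0.9, 0.9, 0.1]
logit_scores = aggregate_by_patient(ids, probs, method="logit")
median_scores = aggregate_by_patient(ids, probs, method="median")
```

The resulting patient-level scores are what a metric such as ROC-AUC would be computed on, with one score and one label per patient rather than per measurement.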

The study demonstrates that simple, interpretable aggregation methods can markedly improve patient-level classification in X-ray diffraction diagnostics, supporting future clinical applications of XRD-based structural biomarkers.

Keywords: machine learning; aggregation; supervised classification; X-ray diffraction

https://doi.org/10.3390/ijtm6020022