TY - JOUR
T1 - Accurate segmentation of head and neck radiotherapy CT scans with 3D CNNs
T2 - consistency is key
AU - Henderson, Edward G. A.
AU - Vasquez Osorio, Eliana M.
AU - van Herk, Marcel
AU - Brouwer, Charlotte L.
AU - Steenbakkers, Roel J. H. M.
AU - Green, Andrew F.
N1 - Funding Information:
Marcel van Herk was supported by NIHR Manchester Biomedical Research Centre. This work was also supported by Cancer Research UK via funding to the Cancer Research Manchester Centre [C147/A25254] and by Cancer Research UK RadNet Manchester [C1994/A28701].
Publisher Copyright:
© 2023 The Author(s). Published on behalf of Institute of Physics and Engineering in Medicine by IOP Publishing Ltd.
PY - 2023/4/21
Y1 - 2023/4/21
N2 - Objective. Automatic segmentation of organs-at-risk in radiotherapy planning computed tomography (CT) scans using convolutional neural networks (CNNs) is an active research area. Very large datasets are usually required to train such CNN models. In radiotherapy, large, high-quality datasets are scarce and combining data from several sources can reduce the consistency of training segmentations. It is therefore important to understand the impact of training data quality on the performance of auto-segmentation models for radiotherapy. Approach. In this study, we took an existing 3D CNN architecture for head and neck CT auto-segmentation and compared the performance of models trained with a small, well-curated dataset (n = 34) and with a far larger dataset (n = 185) containing less consistent training segmentations. We performed 5-fold cross-validation on each dataset and tested segmentation performance using the 95th percentile Hausdorff distance and mean distance-to-agreement metrics. Finally, we validated the generalisability of our models with an external cohort of patient data (n = 12) with five expert annotators. Main results. The models trained with the large dataset were greatly outperformed by models (of identical architecture) trained with a smaller but more consistent set of training samples. Our models trained with the small dataset produced segmentations of similar accuracy to those of expert human observers and generalised well to new data, performing within inter-observer variation. Significance. We empirically demonstrate the importance of highly consistent training samples when training a 3D auto-segmentation model for use in radiotherapy. Crucially, the consistency of the training segmentations had a greater impact on model performance than the size of the dataset used.
AB - Objective. Automatic segmentation of organs-at-risk in radiotherapy planning computed tomography (CT) scans using convolutional neural networks (CNNs) is an active research area. Very large datasets are usually required to train such CNN models. In radiotherapy, large, high-quality datasets are scarce and combining data from several sources can reduce the consistency of training segmentations. It is therefore important to understand the impact of training data quality on the performance of auto-segmentation models for radiotherapy. Approach. In this study, we took an existing 3D CNN architecture for head and neck CT auto-segmentation and compared the performance of models trained with a small, well-curated dataset (n = 34) and with a far larger dataset (n = 185) containing less consistent training segmentations. We performed 5-fold cross-validation on each dataset and tested segmentation performance using the 95th percentile Hausdorff distance and mean distance-to-agreement metrics. Finally, we validated the generalisability of our models with an external cohort of patient data (n = 12) with five expert annotators. Main results. The models trained with the large dataset were greatly outperformed by models (of identical architecture) trained with a smaller but more consistent set of training samples. Our models trained with the small dataset produced segmentations of similar accuracy to those of expert human observers and generalised well to new data, performing within inter-observer variation. Significance. We empirically demonstrate the importance of highly consistent training samples when training a 3D auto-segmentation model for use in radiotherapy. Crucially, the consistency of the training segmentations had a greater impact on model performance than the size of the dataset used.
KW - 3D auto-segmentation
KW - convolutional neural network
KW - effective supervised learning
KW - medical image analysis
KW - small dataset
KW - training annotation consistency
UR - https://www.scopus.com/pages/publications/85151563399
U2 - 10.1088/1361-6560/acc309
DO - 10.1088/1361-6560/acc309
M3 - Article
C2 - 36893469
SN - 0031-9155
VL - 68
JO - Physics in Medicine and Biology
JF - Physics in Medicine and Biology
IS - 8
M1 - 085003
ER -