TY - JOUR
T1 - Performance of federated learning-based models in the Dutch TAVI population was comparable to central strategies and outperformed local strategies
AU - Yordanov, Tsvetan R
AU - Ravelli, Anita C J
AU - Amiri, Saba
AU - Vis, Marije
AU - Houterman, Saskia
AU - Van der Voort, Sebastian R
AU - Abu-Hanna, Ameen
AU - the NHR THI Registration Committee
N1 - Publisher Copyright: 2024 Yordanov, Ravelli, Amiri, Vis, Houterman, Van der Voort and Abu-Hanna.
PY - 2024
Y1 - 2024
N2 - BACKGROUND: Federated learning (FL) is a technique for learning prediction models without sharing records between hospitals. Compared to centralized training approaches, the adoption of FL could negatively impact model performance.
AIM: This study aimed to evaluate four types of multicenter model development strategies for predicting 30-day mortality for patients undergoing transcatheter aortic valve implantation (TAVI): (1) central, learning one model from a centralized dataset of all hospitals; (2) local, learning one model per hospital; (3) federated averaging (FedAvg), averaging of local model coefficients; and (4) ensemble, aggregating local model predictions.
METHODS: Data from all 16 Dutch TAVI hospitals from 2013 to 2021 in the Netherlands Heart Registration (NHR) were used. All approaches were internally validated. For the central and federated approaches, external geographic validation was also performed. Predictive performance in terms of discrimination [the area under the ROC curve (AUC-ROC, hereafter referred to as AUC)] and calibration (intercept and slope, and calibration graph) was measured.
RESULTS: The dataset comprised 16,661 TAVI records with a 30-day mortality rate of 3.4%. In internal validation the AUCs of central, local, FedAvg, and ensemble models were 0.68, 0.65, 0.67, and 0.67, respectively. The central and local models were miscalibrated by slope, while the FedAvg and ensemble models were miscalibrated by intercept. During external geographic validation, central, FedAvg, and ensemble all achieved a mean AUC of 0.68. Miscalibration was observed for the central, FedAvg, and ensemble models in 44%, 44%, and 38% of the hospitals, respectively.
CONCLUSION: Compared to centralized training approaches, FL techniques such as FedAvg and ensemble demonstrated comparable AUC and calibration. The use of FL techniques should be considered a viable option for clinical prediction model development.
AB - BACKGROUND: Federated learning (FL) is a technique for learning prediction models without sharing records between hospitals. Compared to centralized training approaches, the adoption of FL could negatively impact model performance.
AIM: This study aimed to evaluate four types of multicenter model development strategies for predicting 30-day mortality for patients undergoing transcatheter aortic valve implantation (TAVI): (1) central, learning one model from a centralized dataset of all hospitals; (2) local, learning one model per hospital; (3) federated averaging (FedAvg), averaging of local model coefficients; and (4) ensemble, aggregating local model predictions.
METHODS: Data from all 16 Dutch TAVI hospitals from 2013 to 2021 in the Netherlands Heart Registration (NHR) were used. All approaches were internally validated. For the central and federated approaches, external geographic validation was also performed. Predictive performance in terms of discrimination [the area under the ROC curve (AUC-ROC, hereafter referred to as AUC)] and calibration (intercept and slope, and calibration graph) was measured.
RESULTS: The dataset comprised 16,661 TAVI records with a 30-day mortality rate of 3.4%. In internal validation the AUCs of central, local, FedAvg, and ensemble models were 0.68, 0.65, 0.67, and 0.67, respectively. The central and local models were miscalibrated by slope, while the FedAvg and ensemble models were miscalibrated by intercept. During external geographic validation, central, FedAvg, and ensemble all achieved a mean AUC of 0.68. Miscalibration was observed for the central, FedAvg, and ensemble models in 44%, 44%, and 38% of the hospitals, respectively.
CONCLUSION: Compared to centralized training approaches, FL techniques such as FedAvg and ensemble demonstrated comparable AUC and calibration. The use of FL techniques should be considered a viable option for clinical prediction model development.
KW - EHR
KW - TAVI
KW - distributed machine learning
KW - federated learning
KW - multicenter
KW - prediction models
KW - privacy-preserving algorithms
KW - risk prediction
UR - http://www.scopus.com/inward/record.url?scp=85198946214&partnerID=8YFLogxK
U2 - 10.3389/fcvm.2024.1399138
DO - 10.3389/fcvm.2024.1399138
M3 - Article
C2 - 39036502
SN - 2297-055X
VL - 11
SP - 1399138
JO - Frontiers in Cardiovascular Medicine
JF - Frontiers in Cardiovascular Medicine
M1 - 1399138
ER -