Abstract
Our objective was to create a gold standard annotated corpus of Dutch-language clinical notes with adverse drug event (ADE) mentions, specifically for intensive care patients with drug-related acute kidney injury (AKI). We used anonymized clinical notes from 102 adult intensive care unit (ICU) patients with suspected AKI who were admitted to Amsterdam University Medical Centre, The Netherlands, over a four-year period (November 2015 to January 2020). The notes were extracted from the electronic health record (EHR) system and manually reviewed for drug-related causes. Each clinical note contained at least one ADE mention (drug-related AKI). Annotation guidelines were developed over three rounds of annotation, based on review of the annotations and on clarifications made during the process. Two clinical expert annotators labelled mentions of drugs and disorders, as well as the relations between these entities that indicate an ADE. The final gold standard corpus resulted from adjudication of the two sets of expert labels. The corpus contains 102 notes with 16,470 labels, consisting of 8,914 Disorder entities, 5,307 Drug entities, 134 Qualitative Concept entities, 1,501 Indication relations, and 614 ADE relations. Inter-annotator agreement was high for entities (F1 score 0.7724) and, as expected, lower for relations (F1 score 0.4327). The Dutch ADE corpus is a real-world data set that can be used to evaluate natural language processing (NLP) pipelines for ADE detection tasks. Although the corpus was developed for drug-related AKI, 158 additional ADEs were identified. The combination of iterative annotation guideline development and double annotation followed by adjudication produced high-quality annotations. Future work will use this gold standard annotated corpus to train and validate NLP models to detect ADEs in Dutch clinical text.
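As an illustration of how an agreement figure like the F1 scores above can be computed, the following minimal sketch treats one annotator's labels as the reference and the other's as the prediction. The abstract does not state the matching criterion the authors used, so exact span-and-label matching is an assumption here, and `Span` and `pairwise_f1` are hypothetical names. The example data are invented and do not come from the corpus.

```python
from __future__ import annotations

from typing import NamedTuple


class Span(NamedTuple):
    """One entity annotation: character offsets plus a label (hypothetical schema)."""
    start: int
    end: int
    label: str


def pairwise_f1(annotator_a: set[Span], annotator_b: set[Span]) -> float:
    """F1 between two annotation sets, assuming exact span-and-label matching."""
    if not annotator_a or not annotator_b:
        return 0.0
    matched = len(annotator_a & annotator_b)   # annotations both annotators agree on
    precision = matched / len(annotator_b)     # annotator B scored against A
    recall = matched / len(annotator_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy annotations for a single note (invented offsets and labels).
a = {Span(0, 10, "Drug"), Span(15, 32, "Disorder"), Span(40, 48, "Drug")}
b = {Span(0, 10, "Drug"), Span(15, 32, "Disorder"), Span(50, 58, "Disorder")}

score = pairwise_f1(a, b)
print(f"Pairwise F1: {score:.4f}")
```

Under this definition the score is symmetric: swapping the two annotators exchanges precision and recall but leaves F1 unchanged, which is why it is a common choice for agreement on span annotations. Relation agreement can be scored the same way by replacing each `Span` with a tuple that identifies both linked entities and the relation type.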
| Original language | English |
|---|---|
| Pages (from-to) | 2763-2779 |
| Number of pages | 17 |
| Journal | Language Resources and Evaluation |
| Volume | 59 |
| Issue number | 3 |
| Early online date | 2025 |
| Publication status | Published - Sept 2025 |
Keywords
- Data curation
- Drug-related side effects and adverse reactions
- Electronic health records
- Natural language processing
- Real-world data