
A distantly supervised dataset for automated data extraction from diagnostic studies

  • University of Amsterdam
  • Université Paris-Saclay

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Systematic reviews are important in evidence-based medicine but are expensive to produce. Automating or semi-automating the extraction of the index test, target condition, and reference standard from articles could reduce the cost of conducting systematic reviews of diagnostic test accuracy, but relevant training data is not available. We create a distantly supervised dataset of approximately 90,000 sentences and have two experts manually annotate a small subset of around 1,000 sentences for evaluation. We evaluate BioBERT and logistic regression for ranking the sentences, and compare distant and direct supervision. Our results suggest that distant supervision can work as well as, or better than, direct supervision on this problem, and that distantly trained models can perform as well as, or better than, human annotators.

Original language: English
Title of host publication: BioNLP 2019 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 18th BioNLP Workshop and Shared Task
Publisher: Association for Computational Linguistics (ACL)
Pages: 105-114
Number of pages: 10
ISBN (Electronic): 9781950737284
Publication status: Published - 2019
Event: 18th SIGBioMed Workshop on Biomedical Natural Language Processing, BioNLP 2019 - Florence, Italy
Duration: 1 Aug 2019 → …

Publication series

Name: BioNLP 2019 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 18th BioNLP Workshop and Shared Task

Conference

Conference: 18th SIGBioMed Workshop on Biomedical Natural Language Processing, BioNLP 2019
Country/Territory: Italy
City: Florence
Period: 01/08/2019 → …

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
