
Towards a Digital Infrastructure for Illustrated Handwritten Archives

  • Andreas Weber*
  • Mahya Ameryan
  • Katherine Wolstencroft
  • Lise Stork
  • Maarten Heerlien
  • Lambert Schomaker
  • *Corresponding author for this work
  • University of Twente
  • University of Groningen
  • Leiden University
  • Naturalis Biodiversity Center

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Large and important parts of cultural heritage are stored in archives that are difficult to access, even after digitization. Documents and notes are written in hard-to-read historical handwriting and are often interspersed with illustrations. Such collections are weakly structured and largely inaccessible to scholars and the wider public. Traditionally, humanities researchers treat text and images separately. This separation extends to traditional handwriting recognition systems. Many of them use a segmentation-free OCR approach, which can only resolve manuscripts that are homogeneous in layout, style and linguistic content. Our infrastructure, in contrast, aims to resolve heterogeneous handwritten manuscript pages in which different scripts and images are closely intertwined. The authors in our use case, a 17,000-page account of the exploration of the Indonesian Archipelago between 1820 and 1850 (“Natuurkundige Commissie voor Nederlands-Indië”), tried to record their knowledge and observations in a semantically structured way; however, this discipline is not reflected in the handwritten script. The use of different languages, such as German, Latin, Dutch, Malay, Greek, and French, makes interpretation more challenging. Our infrastructure takes the state-of-the-art word retrieval system MONK as its starting point. Owing to its visual approach, MONK can handle the diversity of material we encounter in our use case and in many other historical collections: text, drawings and images. By combining text and image recognition, we go significantly beyond the state of the art and provide meaningful additions to integrated manuscript recognition. This paper describes the infrastructure and presents early results.

Original language: English
Title of host publication: Digital Cultural Heritage - Final Conference of the Marie Skłodowska-Curie Initial Training Network for Digital Cultural Heritage, ITN-DCH 2017, Revised Selected Papers
Editors: Marinos Ioannides
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 155-166
Number of pages: 12
ISBN (Print): 9783319758251
DOIs
Publication status: Published - 2018
Event: Final Conference of the Marie Skłodowska-Curie Initial Training Network for Digital Cultural Heritage, ITN-DCH 2017 - Olimje, Slovenia
Duration: 23 May 2017 to 25 May 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10605 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Final Conference of the Marie Skłodowska-Curie Initial Training Network for Digital Cultural Heritage, ITN-DCH 2017
Country/Territory: Slovenia
City: Olimje
Period: 23/05/2017 to 25/05/2017

Keywords

  • Biodiversity heritage
  • Deep learning
  • Digital heritage
  • Natural history
