Information Extraction from Spanish Radiology Reports using multilingual BERT


This paper describes our team’s participation in Task 1 of the CLEF eHealth 2021 evaluation lab, part of the Conference and Labs of the Evaluation Forum (CLEF). The Task 1 challenge targets Named Entity Recognition (NER) in radiology reports written in Spanish. We address this challenge as a sequence labeling task, using multilingual BERT with a classification layer on top. To support the extraction of overlapping entities, we trained three BERT-based models: the first predicts the first label annotated for each token in the corpus; the second predicts the second label for tokens that carry two different annotations; and the third handles tokens annotated with a third label. Our approach obtained a lenient F1 score of 78.47% and an exact F1 score of 73.27%.
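The three-model design implies that each token's annotations are first split into up to three label "layers", one per model. The following is a minimal sketch of that preprocessing step, assuming BIO-style tags and that a token's labels are kept in the order they appear in the corpus annotation. The function name `split_into_layers` and the example label strings are hypothetical and only illustrate the idea; they are not taken from the authors' code.

```python
from typing import List


def split_into_layers(token_labels: List[List[str]], num_layers: int = 3) -> List[List[str]]:
    """Split per-token annotation lists into fixed label layers.

    Hypothetical helper: token_labels[i] holds the BIO labels annotated on
    token i, in corpus order. Layer k receives the k-th label of every token,
    or "O" when the token has fewer than k + 1 labels. Each layer would then
    serve as the target sequence for one of the token classifiers.
    """
    layers: List[List[str]] = [[] for _ in range(num_layers)]
    for labels in token_labels:
        # A token with more annotations than available layers cannot be represented.
        if len(labels) > num_layers:
            raise ValueError(
                f"token has {len(labels)} labels but only {num_layers} layers exist"
            )
        for k in range(num_layers):
            layers[k].append(labels[k] if k < len(labels) else "O")
    return layers


if __name__ == "__main__":
    # Illustrative tokens "Nódulo en LSD": the abbreviation "LSD" carries two labels.
    example = [["B-Finding"], [], ["B-Abbreviation", "B-Anatomical_Entity"]]
    for layer in split_into_layers(example):
        print(layer)
```

With this layout, the first model is trained on layer 0 for every token, while layers 1 and 2 are mostly "O" and only become informative for tokens with overlapping annotations.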

Publication type: Conference paper
Publisher: CLEF 2021 Evaluation Labs and Workshop: Online Working Notes