MAPA

Multilingual Anonymisation for Public Administrations

The MAPA project is an integration project that aims to introduce Natural Language Processing (NLP) tools and to develop a toolkit for effective and reliable anonymisation of texts in the medical and legal fields. The proposal addresses all official EU languages, including under-resourced ones (such as Latvian, Lithuanian, Estonian, Slovenian and Croatian). The project addresses the specific concept of “de-identification”: removing directly identifying expressions, such as person names, from a text.

The Action will build a deployable, Docker-ready, open-source, fully multilingual anonymisation toolkit able to detect personal data (names, addresses, emails, credit card and bank account numbers, etc.) as defined by deployment cases in different Member States. Moreover, the toolkit will be able to anonymise these data, helping public administrations to comply with GDPR, particularly in the health and legal fields.
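To make the detect-then-anonymise idea concrete, the following is a minimal Python sketch that replaces two of the data types listed above (emails and card-like numbers) with category placeholders. It is an illustration under stated assumptions, not MAPA's implementation: the patterns, the `PATTERNS` table and the `deidentify` function are hypothetical. Simple regular expressions of this kind cannot find names or addresses, which call for a trained recogniser.

```python
import re

# Hypothetical patterns for illustration only; they are NOT MAPA's detectors
# and will miss many real-world formats.
PATTERNS = {
    # A rough email shape: local part, "@", domain, dot, top-level domain.
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def deidentify(text: str) -> str:
    """Replace every matched span with a placeholder naming its category."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(deidentify("Contact anna@example.lv, card 4111 1111 1111 1111."))
```

Keeping the category in the placeholder (rather than deleting the span) leaves the sentence readable for later processing steps while removing the identifying value itself.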

The Action will promote the sharing of Public Administration data that is fully de-identified and thus not traceable to personal details, making it GDPR compliant. As a result, data that currently remains in silos and cannot be shared can be re-used in European initiatives such as NEC TM (data from Public Administrations for translation whose source-language text contains personal details), ELRC and ELRC-Share (Public Administrations can share data that they otherwise could not), and potentially eTranslation (offering anonymisation as a separate service or embedding it in its translation service).

MAPA’s toolkit will be callable as a pre- or post-processing module (an API-ready, dockerized version of the toolkit). This will ease integration and deployment as an isolated I/O module that does not disturb existing digital infrastructures.
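The pre-processing arrangement can be pictured as a thin wrapper placed in front of an existing service, so that the service itself needs no changes. The Python sketch below is only an illustration of that pattern: `with_anonymisation`, `toy_anonymise` and `toy_downstream` are hypothetical names, and the two toy functions merely stand in for a real anonymiser and a real downstream system.

```python
from typing import Callable


def with_anonymisation(
    anonymise: Callable[[str], str],
    downstream: Callable[[str], str],
) -> Callable[[str], str]:
    """Return a pipeline that anonymises text before the downstream step sees it."""

    def pipeline(text: str) -> str:
        return downstream(anonymise(text))

    return pipeline


# Stand-ins for illustration: a toy anonymiser and a toy downstream service.
def toy_anonymise(text: str) -> str:
    return text.replace("Jane Doe", "[PERSON]")


def toy_downstream(text: str) -> str:
    return text.upper()


process = with_anonymisation(toy_anonymise, toy_downstream)
print(process("Signed by Jane Doe"))
```

Because the wrapper only touches the text going in, the downstream function is left untouched. A post-processing variant would apply the anonymiser to the downstream output instead.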

The ultimate goal of the Action is to develop a fully deployable multilingual anonymisation kit based on Named-Entity Recognition (NER) that is applicable to all EU languages, is not restricted to European names and surnames but covers those most common in all EU countries, and connects to eTranslation, irrespective of whether the text is monolingual, bilingual or of mixed languages.

The MAPA project is an INEA-funded Action for the European Commission under the Connecting Europe Facility (CEF) – Telecommunications Sector with Grant Agreement No INEA/CEF/ICT/A2019/1927065.

Partners

Pangeanic, Spain

SEDIA, Spain

Vicomtech, Spain

Tilde, Latvia

ELDA-ELRA, France

University of Malta, Malta

LIMSI-CNRS, France