Track 3

n2c2/UMass Track on Clinical Concept Normalization


Clinical findings, diseases, procedures, body structures, and medications recorded in medical notes are invaluable resources for a wide range of clinical applications. Using and exchanging information about clinically relevant concepts in free-text clinical narratives requires two complementary processes: Named Entity Recognition (NER) and Named Entity Normalization (NEN). NER identifies the text spans that mention clinically relevant concepts. NEN links those named entities to concepts in standardized medical terminologies, which allows better generalization across contexts.

The present shared task on Normalization of Medical Concepts in Clinical Narrative focuses specifically on NEN. Unlike the previous NEN shared tasks (ShARe/CLEF eHealth 2013 Task 1, SemEval-2014 Task 7 and SemEval-2015 Task 14), normalization here is not restricted to mentions of disorders; it covers a broad set of clinical concepts.


The task uses part of the i2b2 2010 data set. Participants normalize given named entities, which are the clinical concepts annotated as medical problems, treatments, and tests in the 2010 i2b2/VA Shared Task. Each named entity is mapped to a Concept Unique Identifier (CUI) in the UMLS 2017AB version, drawn from either SNOMED CT or RxNorm. For example, "heart attack", "MI", and "myocardial infarction" are all normalized to the same CUI (C0027051 in the UMLS).
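To make the mapping concrete, here is a minimal sketch of a dictionary-lookup normalizer built only from the example above. The lexicon contents, the function name normalize_mention, and the choice to return None for unknown mentions are illustrative assumptions; they are not part of the task definition, and a realistic system would need a far larger lexicon and a way to handle ambiguous mentions.

```python
from typing import Dict, Optional

# Toy lexicon taken from the example in the text. A real system would
# presumably build this from the SNOMED CT and RxNorm sources in UMLS 2017AB.
LEXICON: Dict[str, str] = {
    "heart attack": "C0027051",
    "mi": "C0027051",
    "myocardial infarction": "C0027051",
}


def normalize_mention(mention: str, lexicon: Dict[str, str] = LEXICON) -> Optional[str]:
    """Return the CUI for a mention, or None if the mention is not in the lexicon.

    Matching is case-insensitive and collapses runs of whitespace
    (an assumption made for this sketch only).
    """
    key = " ".join(mention.lower().split())
    return lexicon.get(key)


if __name__ == "__main__":
    for text in ["heart attack", "MI", "Myocardial  Infarction", "stroke"]:
        print(text, "->", normalize_mention(text))
```

Exact lookup of this kind only covers surface forms already present in the lexicon, which is why it is usually treated as a baseline rather than a complete approach.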

The data for this task is provided by Partners HealthCare. All records have been fully de-identified and annotated.

Data for the challenge will be released under a Rules of Conduct and Data Use Agreement. Obtaining the data requires completing the 2019 n2c2 Track 3 registration process.

Evaluation and Format

Evaluation will be conducted using withheld test data, with accuracy as the evaluation metric. Each team is allowed to upload (through this website) up to three system runs.

System outputs must be submitted in the exact format of the ground truth annotations provided by the organizers. An evaluation script will be made available to participants so that they can evaluate their systems on the training data throughout development.
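As a rough illustration of the metric, accuracy here can be read as the fraction of gold-standard mentions that receive the correct CUI. The sketch below is not the organizers' evaluation script. The dictionary representation, the mention identifiers (m1, m2, ...), and the decision to count a missing prediction as an error are all assumptions made for illustration.

```python
from typing import Dict


def accuracy(gold: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Fraction of gold mentions whose predicted CUI matches the gold CUI exactly.

    Mentions with no prediction are counted as incorrect (an assumption).
    """
    if not gold:
        raise ValueError("gold annotations are empty")
    correct = sum(
        1 for mention_id, cui in gold.items() if predicted.get(mention_id) == cui
    )
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical mention ids; CUI_A and CUI_B are placeholders, not real CUIs.
    gold = {"m1": "C0027051", "m2": "CUI_A", "m3": "CUI_B", "m4": "C0027051"}
    predicted = {"m1": "C0027051", "m2": "CUI_B", "m3": "CUI_B"}
    print(accuracy(gold, predicted))  # 2 of 4 correct -> 0.5
```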


Participants are asked to submit a 500-word abstract describing their methodology. Abstracts may also include a graphical summary of the proposed architecture. The document should not exceed two pages at 1.5 line spacing and 12 pt font size. The authors of top-performing systems or of particularly novel approaches will be invited to present or demonstrate their systems at the workshop. A journal venue will be organized following the workshop.


Please join the discussion group below for announcements. Questions about the challenge can be addressed to the organizers by posting to the group (New Topic button) or sending email to the address below.
Discussion Group: N2C2_2019_MCN_Task 

Tentative Timeline

Registration: April 8, 2019
Training Data Release: May 1, 2019
Test Data Release: July 8, 2019
System Outputs Due: July 10, 2019 (11:59pm Eastern Time)
Aggregate Results Release: July 15, 2019 (11:59pm Eastern Time)
Abstract Submission: August 5, 2019


Luo YF, Sun W, Rumshisky A. MCN: A Comprehensive Corpus for Medical Concept Normalization. Journal of Biomedical Informatics. 2019 Feb 22:103132.