2019 n2c2 Shared-Task and Workshop

Track 2: n2c2/OHNLP Track on Family History Extraction


Tentative Timeline

Registration: April 10, 2019
Training Data Release: June 4, 2019
Test Data Release: August 12, 2019
System Outputs Due: August 14, 2019 (11:59pm Eastern Time)
Aggregate Results Release: August 21, 2019 (11:59pm Eastern Time)
Abstract Submission: September 9, 2019


Family History (FH) information plays a significant role in diagnosis and treatment, especially of genetic disorders. Patient Provided Information (PPI) questionnaires, which are usually stored in semi-structured or unstructured format in electronic health records, serve as the primary source of FH information. FH elements reported in these questionnaires may include family member, disease, cause, medication, age of onset, and length of disease. Extracting this diverse set of FH elements can be challenging, and past efforts toward this goal have been quite limited. The n2c2/OHNLP shared-task track on Family History Extraction aims to address this gap.

Task Overview

This track is divided into two subtasks. Subtask 1 is entity identification. Participants should extract two types of information: 1) family members mentioned in the text, and 2) observations (diseases) in the family history. For each family member, participants should also provide the side of family (Paternal, Maternal, or NA). The output for Subtask 1 should be provided in a single tab-delimited file.

In Subtask 2, participants need to extract the relations among family members, observations, and living status, and provide negation information for each observation. The purpose of this subtask is to evaluate participating systems as end-to-end family history summarization systems based on clinical text. The output file should be in TSV format with the following columns: family member, side of family, living status, and observation. When a family member has more than one observation, systems should list each observation in a separate row. Negation information (Negated or Non_Negated) should be provided after each observation. Please refer to this link (https://github.com/ohnlp/n2c2_fh) for detailed information about the input data and expected output.
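To make the one-row-per-observation layout of Subtask 2 concrete, here is a minimal Python sketch. It assumes, hypothetically, that each row carries the four columns named above followed by the negation label, and that living status is a plain string such as "Alive". The names `FamilyMemberSummary`, `Observation`, `to_subtask2_rows`, and `write_tsv` are illustrative only; the authoritative column layout, any document-identifier column, and the allowed living-status values are defined in the official repository, not here.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical sketch: the column order and value vocabularies below are
# assumptions based on the task description, not the official spec.
VALID_SIDES = {"Paternal", "Maternal", "NA"}


@dataclass
class Observation:
    text: str      # e.g. "diabetes"
    negated: bool  # True -> "Negated", False -> "Non_Negated"


@dataclass
class FamilyMemberSummary:
    member: str         # e.g. "Mother"
    side: str           # one of VALID_SIDES
    living_status: str  # assumed free-form string, e.g. "Alive"
    observations: list = field(default_factory=list)


def to_subtask2_rows(summaries):
    """Expand each family member into one row per observation."""
    rows = []
    for s in summaries:
        if s.side not in VALID_SIDES:
            raise ValueError(f"unexpected side of family: {s.side!r}")
        for obs in s.observations:
            label = "Negated" if obs.negated else "Non_Negated"
            # Assumed layout: member, side, living status, observation, negation.
            rows.append([s.member, s.side, s.living_status, obs.text, label])
    return rows


def write_tsv(rows):
    """Serialize rows as tab-separated lines."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    mother = FamilyMemberSummary(
        "Mother", "Maternal", "Alive",
        [Observation("diabetes", False), Observation("cancer", True)],
    )
    print(write_tsv(to_subtask2_rows([mother])), end="")
```

With the example above, the mother appears on two rows, one ending in `diabetes	Non_Negated` and one ending in `cancer	Negated`. Note that this sketch emits no row for a family member without observations; whether such members must still appear in the output is an assumption to check against the official format.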

Evaluation Format

Systems will be ranked by standard F1-score. An evaluation script will be provided; participants should run it on their output against the gold standard to check system performance. Please refer to this link (https://github.com/ohnlp/n2c2_fh) for detailed information about the evaluation metrics.
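The official evaluation script is the reference for scoring. As a rough illustration of what an F1-score over extracted items measures, the sketch below assumes that output rows are compared as whole tuples by exact match and micro-averaged over all rows. The function name `prf1` and the example rows are hypothetical, and the official script may match fields differently or give partial credit.

```python
from collections import Counter


def prf1(gold, predicted):
    """Return (precision, recall, F1) for exact-match tuples.

    Hypothetical helper: rows are compared as whole tuples, and duplicate
    rows are counted via multiset (Counter) intersection. The official
    evaluation script may score differently.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # True positives: rows present in both the gold and predicted output.
    tp = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [
        ("Mother", "Maternal", "Alive", "diabetes", "Non_Negated"),
        ("Father", "Paternal", "Alive", "cancer", "Negated"),
    ]
    predicted = [
        ("Mother", "Maternal", "Alive", "diabetes", "Non_Negated"),  # correct
        ("Father", "Paternal", "Alive", "cancer", "Non_Negated"),    # wrong negation
        ("Sister", "NA", "Alive", "asthma", "Non_Negated"),          # not in gold
    ]
    p, r, f = prf1(gold, predicted)
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.333 0.5 0.4
```

Counting rows as a multiset rather than a set means a system that outputs the same row twice is not credited twice unless the gold standard also contains it twice.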

Participating teams are required to register and sign a Mayo Data Use Agreement to get access to the dataset.


Participants are asked to submit a 500-word abstract describing their methodology. Abstracts may also include a graphical summary of the proposed architecture. The document should not exceed 2 pages (with 1.5 line spacing and 12-pt font). Authors of top-performing systems or particularly novel approaches will be invited to present or demonstrate their systems at the workshop. A journal venue will be organized following the workshop.


Please join the discussion group below to receive announcements. Questions about the challenge can be sent to the organizers by posting to the group (using the New Topic button) or by emailing the address below.
Discussion Group: N2C2/OHNLP_2019_FH_Task
Email: n2c2ohnlp_2019_fh_task@googlegroups.com