CheXpert: New Large Chest X-Ray Dataset And Competition

The Machine Learning Group at Stanford University has released a large labeled dataset of chest X-rays, along with a competition for automated chest X-ray interpretation.

The new dataset, called CheXpert, is the result of a joint effort by researchers from the Stanford ML Group, patients, and radiology experts.

The researchers retrospectively collected chest radiographic examinations (X-ray images) from Stanford Hospital. The data covers the period from 2002 to 2017 and includes the radiology report linked to each examination.

To label the collected data, the researchers developed a labeling tool as part of the project. The tool marks each radiology report for the presence of 14 observations, assigning each observation one of three labels: positive, negative, or uncertain.
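
As a rough illustration of this labeling scheme, the three label values can be modeled as an enumeration and tallied per observation. This is only a minimal sketch: the `Label` class, the `count_labels` helper, the two observation names, and the sample reports are all hypothetical and are not taken from the CheXpert tool or its data.

```python
from collections import Counter
from enum import Enum


class Label(Enum):
    """The three label values described above."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNCERTAIN = "uncertain"


# Hypothetical labeled reports: each maps an observation name to a Label.
# The observation names and values are illustrative, not real data.
reports = [
    {"Edema": Label.POSITIVE, "Pneumonia": Label.UNCERTAIN},
    {"Edema": Label.NEGATIVE, "Pneumonia": Label.UNCERTAIN},
    {"Edema": Label.POSITIVE, "Pneumonia": Label.NEGATIVE},
]


def count_labels(reports, observation):
    """Count how often each label value occurs for one observation."""
    return Counter(r[observation] for r in reports if observation in r)


edema_counts = count_labels(reports, "Edema")
pneumonia_counts = count_labels(reports, "Pneumonia")
```

Keeping "uncertain" as its own value, rather than folding it into positive or negative, leaves the decision about how to treat ambiguous reports to whoever trains a model on the labels.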

The final dataset contains 224,316 chest radiographs of 65,240 patients. It is a significant contribution to the field of medicine, since chest radiography is the most common imaging examination globally and is critical for the screening, diagnosis, and management of many life-threatening diseases.

A competition for chest X-ray interpretation was launched as part of the project. For it, the researchers collected a test set of 500 studies from 500 unseen patients. Teams can participate by submitting their solutions as executable code on CodaLab. Each submission is then run on the test set, which is not publicly accessible, to preserve the integrity of the test results.