AI model identifies certain breast tumor stages likely to progress to invasive cancer

Ductal carcinoma in situ (DCIS) is a type of preinvasive tumor that sometimes progresses to a highly deadly form of breast cancer. It accounts for about 25 percent of all breast cancer diagnoses.

Because it is difficult for clinicians to determine the type and stage of DCIS, patients with DCIS are often overtreated. To address this, an interdisciplinary team of researchers from MIT and ETH Zurich developed an AI model that can identify the different stages of DCIS from a cheap and easy-to-obtain breast tissue image. Their model shows that both the state and arrangement of cells in a tissue sample are important for determining the stage of DCIS.

Because such tissue images are so easy to obtain, the researchers were able to build one of the largest datasets of its kind, which they used to train and test their model. When they compared its predictions to the conclusions of a pathologist, they found clear agreement in many instances.

In the future, the model could be used as a tool to help clinicians streamline the diagnosis of simpler cases without the need for labor-intensive tests, giving them more time to evaluate cases where it is less clear if DCIS will become invasive.

“We took the first step in understanding that we should be looking at the spatial organization of cells when diagnosing DCIS, and now we have developed a technique that is scalable. From here, we really need a prospective study. Working with a hospital and getting this all the way to the clinic will be an important step forward,” says Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science (EECS) and the Institute for Data, Systems, and Society (IDSS), who is also director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS).

Uhler, co-corresponding author of a paper on this research, is joined by lead author Xinyi Zhang, a graduate student in EECS and the Eric and Wendy Schmidt Center; co-corresponding author GV Shivashankar, professor of mechogenomics at ETH Zurich jointly with the Paul Scherrer Institute; and others at MIT, ETH Zurich, and the University of Palermo in Italy. The open-access research was published July 20 in Nature Communications.

Combining imaging with AI

Between 30 and 50 percent of patients with DCIS develop a highly invasive stage of cancer, but researchers don’t know the biomarkers that could tell a clinician which tumors will progress.

Researchers can use techniques like multiplexed staining or single-cell RNA sequencing to determine the stage of DCIS in tissue samples. However, these tests are too expensive to be performed widely, Shivashankar explains.

In previous work, the researchers showed that a cheap imaging technique known as chromatin staining could be as informative as the much costlier single-cell RNA sequencing.

For this research, they hypothesized that combining this single stain with a carefully designed machine-learning model could provide the same information about cancer stage as costlier techniques.

First, they created a dataset containing 560 tissue sample images from 122 patients at three different stages of disease. They used this dataset to train an AI model that learns a representation of the state of each cell in a tissue sample image, which it uses to infer the stage of a patient’s cancer.

However, not every cell is indicative of cancer, so the researchers had to aggregate them in a meaningful way.

They designed the model to create clusters of cells in similar states, identifying eight states that are important markers of DCIS. Some cell states are more indicative of invasive cancer than others. The model determines the proportion of cells in each state in a tissue sample.
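To make the idea of "proportion of cells in each state" concrete, here is a minimal Python sketch. It assumes each cell has already been reduced to a numeric feature vector and that eight state prototypes ("centroids") are already known; the nearest-centroid assignment, the function names, and the toy numbers are hypothetical illustrations, not the researchers' model, which learns its cell representations from chromatin-stained images.

```python
import math
from collections import Counter

NUM_STATES = 8  # the article reports eight cell states; the values below are toy data


def assign_state(cell, centroids):
    """Return the index of the centroid closest to `cell` (Euclidean distance).

    Hypothetical stand-in for the model's clustering step.
    """
    return min(range(len(centroids)), key=lambda k: math.dist(cell, centroids[k]))


def state_proportions(cells, centroids):
    """Fraction of a sample's cells assigned to each state (sums to 1)."""
    if not cells:
        raise ValueError("a tissue sample must contain at least one cell")
    counts = Counter(assign_state(c, centroids) for c in cells)
    total = len(cells)
    return [counts.get(k, 0) / total for k in range(len(centroids))]


if __name__ == "__main__":
    # Toy 2-D "embeddings": eight centroids spaced along a line.
    centroids = [(float(k), 0.0) for k in range(NUM_STATES)]
    cells = [(0.1, 0.0), (0.9, 0.0), (1.2, 0.0), (7.0, 0.0)]
    print(state_proportions(cells, centroids))
```

With the toy data, two of the four cells fall in state 1 and one each in states 0 and 7, so the sample is summarized by the vector `[0.25, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]`. A summary like this says nothing about where the cells sit in the tissue, which is the limitation the researchers describe next.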

Organization matters

“But in cancer, the organization of cells also changes. We found that just having the proportions of cells in every state is not enough. You also need to understand how the cells are organized,” says Shivashankar.

With this insight, they designed the model to consider both the proportions and the arrangement of cell states, which significantly boosted its accuracy.

“The interesting thing for us was seeing how much spatial organization matters. Previous studies had shown that cells which are close to the breast duct are important. But it is also important to consider which cells are close to which other cells,” says Zhang.
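One simple way to capture "which cells are close to which other cells" is a neighborhood co-occurrence count: for every pair of states, count how many pairs of cells in those states lie within some distance of each other. The Python sketch below is hypothetical. The radius, the normalization, and the idea of appending these counts to the per-state proportions are illustrative assumptions, not details reported for the researchers' model.

```python
import math
from itertools import combinations


def neighbor_cooccurrence(positions, states, num_states, radius):
    """Symmetric num_states x num_states matrix: entry [a][b] counts pairs of
    cells within `radius` of each other whose states are a and b.

    Uses an O(n^2) pairwise scan for clarity; a real slide would need a
    spatial index.
    """
    matrix = [[0] * num_states for _ in range(num_states)]
    for i, j in combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) <= radius:
            a, b = states[i], states[j]
            matrix[a][b] += 1
            if a != b:
                matrix[b][a] += 1  # keep the matrix symmetric
    return matrix


def spatial_features(positions, states, num_states, radius):
    """Per-state proportions followed by normalized neighbor-pair frequencies
    (upper triangle of the co-occurrence matrix, row by row)."""
    n = len(states)
    proportions = [states.count(k) / n for k in range(num_states)]
    matrix = neighbor_cooccurrence(positions, states, num_states, radius)
    upper = [matrix[a][b] for a in range(num_states) for b in range(a, num_states)]
    total_pairs = sum(upper)
    pair_freqs = [c / total_pairs if total_pairs else 0.0 for c in upper]
    return proportions + pair_freqs


if __name__ == "__main__":
    # Two toy samples with identical state proportions but different layouts.
    positions = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(spatial_features(positions, [0, 0, 1, 1], 2, 1.5))  # like next to like
    print(spatial_features(positions, [0, 1, 0, 1], 2, 1.5))  # mixed neighbors
```

Both toy samples are half state 0 and half state 1, so proportions alone cannot tell them apart. The neighbor-pair frequencies can: in the first sample, each cell sits next to a cell in the same state, while in the second every neighboring pair mixes the two states.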

When they compared the model's results with a pathologist's evaluations of the same samples, they found clear agreement in many instances. In cases that were not as clear-cut, the model could provide information about features in a tissue sample, like the organization of cells, that a pathologist could use in decision-making.

The model could also be adapted for use in other types of cancer, or even in neurodegenerative conditions, an area the researchers are currently exploring.

“We have shown that, with the right AI techniques, this simple stain can be very powerful. There is still much more research to do, but we need to take the organization of cells into account in more of our studies,” Uhler says.

This research was funded, in part, by the Eric and Wendy Schmidt Center at the Broad Institute, ETH Zurich, the Paul Scherrer Institute, the Swiss National Science Foundation, the U.S. National Institutes of Health, the U.S. Office of Naval Research, the MIT Jameel Clinic for Machine Learning and Health, the MIT-IBM Watson AI Lab, and a Simons Investigator Award.