
A computational challenge to help developmental biology

Q&A with the scientists and organizers behind a new competition to use machine learning to reconstruct an entire animal’s developmental lineage



New techniques are allowing researchers to trace the development of animals cell by single cell. Here, the “cell lineage” of a zebrafish early in development. Image courtesy of Alexander Schier, Ph.D.

It’s a complicated puzzle to understand how we — and other complex living beings — grow from a single cell to the billions or trillions of different interconnected cells that make up an adult body.

Recently developed techniques are giving scientists a new, detailed view of the pathways of developing creatures, using special labels — “barcodes” — stitched into the genetic code of a growing animal to capture the entire trajectory of development.

The datasets that emerge from these new methods can reveal the “family tree” of development, a living creature’s entire cellular pedigree, cell by single cell.

Building those trees accurately from the resulting piles of data is no small feat. Enter the Allen Institute Cell Lineage Reconstruction DREAM Challenge, a competition launching Oct. 15 to find new computational answers to this problem. DREAM challenges are open science competitions to develop computational solutions to biomedical research problems, on topics ranging from preterm birth to Parkinson’s disease to basic molecular biology.

“Like many fields, the emerging field of cell lineage tracing is challenged by the level of data and computation that’s needed to surface new biological insights,” said Kathryn Richmond, Ph.D., Director of The Paul G. Allen Frontiers Group, a division of the Allen Institute which is supporting the upcoming challenge. “We’re excited to use this mechanism to engage the community and create possible approaches that will advance the science.”

We recently sat down with two of the lead researchers from the Allen Discovery Center for Lineage Tracing, Jay Shendure, M.D., Ph.D., and Michael Elowitz, Ph.D., who are spearheading this DREAM Challenge, which opened for pre-registration Sept. 23, and with Pablo Meyer, Ph.D., a computational biologist at IBM and director of the DREAM Challenges, to find out more about the competition. Meyer and other members of the team organizing the competition will also host a webinar Oct. 28 at 10 a.m. PST.

The following conversation has been lightly edited for length and clarity.

Why do we need to understand cell lineages?

Shendure: We all begin as a single cell. The pattern of cell divisions is what gives rise to a functioning multicellular organism, be it a human or a plant. But with a few exceptions, we don’t have a solid understanding of how that tree unfolds in space and time.

Elowitz: In much of developmental biology, techniques only allow scientists to see what is happening from the outside looking in. What’s exciting now is you can program the cell to keep its own records during development and tell us what happened at the end, from the inside out. We think with that change in perspective we’ll be able to get a complementary view, a different kind of view of development.

What is the problem you are hoping to address through the DREAM Challenge?

Elowitz: The power of this new approach depends on how accurately you can reconstruct information about what happened to the cells when they were developing from what you can observe at the end of the experiment, the barcodes. You have to infer what happened from these observations, in much the same way that we infer what happened in evolutionary history from the genome sequences of modern species. Doing that accurately, doing it in a computationally efficient way, these are open challenges with this kind of data.

Meyer: There are algorithms to reconstruct evolutionary trees, but this problem is a bit different. You’re looking at single cells from the same organism, which is very new. You can use some of the existing algorithms and do a pretty good job, but we think there are ways to improve on this.
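To make the idea of inferring a tree from end-point barcodes concrete, here is a minimal sketch in Python. It is not the challenge's method or any participant's solution. It assumes, purely for illustration, that each cell carries a short barcode string in which "0" marks an unedited site, and it groups cells with average-linkage clustering on Hamming distance, a simple stand-in for the existing tree-building algorithms mentioned above. The cell names, barcodes and function names are all hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def leaves(tree):
    """Flatten a nested-tuple tree into a list of cell names."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def average_distance(tree_a, tree_b, barcodes):
    """Mean Hamming distance over all cell pairs drawn from two subtrees."""
    cells_a, cells_b = leaves(tree_a), leaves(tree_b)
    total = sum(hamming(barcodes[i], barcodes[j])
                for i in cells_a for j in cells_b)
    return total / (len(cells_a) * len(cells_b))


def reconstruct_lineage(barcodes):
    """Greedy average-linkage clustering; returns the tree as nested tuples."""
    clusters = sorted(barcodes)  # start with every cell as its own subtree
    while len(clusters) > 1:
        # Merge the two subtrees whose cells have the most similar barcodes.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(pair[0], pair[1], barcodes))
        clusters = [c for c in clusters if c != a and c != b]
        clusters.append((a, b))
    return clusters[0]


# Hypothetical toy data: four cells, four editable sites each.
toy_barcodes = {
    "c1": "A0B0",
    "c2": "A0BC",
    "c3": "0D00",
    "c4": "0D0E",
}

tree = reconstruct_lineage(toy_barcodes)
print(tree)  # (('c1', 'c2'), ('c3', 'c4'))
```

In this toy case, cells that share early edits end up as siblings. Real lineage data is far larger and noisier, with missing or overwritten edits, which is part of why better approaches are being sought.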

Why is the Allen Discovery Center opening this DREAM Challenge?

Shendure: There’s a community of people working on computational approaches ranging from classical algorithms to machine learning and other ideas. We’re hoping to be able to tap into that broader community for their knowledge and expertise. This is especially important now as we expect to have much more complicated cell lineage datasets coming down the pike soon.

How will the challenge participants attack this problem?

Meyer: Most of our challenges are machine learning problems, and typical machine learning involves classifying things into different categories. This problem is a bit different, but we think it can still be approached with machine learning. When you reconstruct a cell lineage tree, you want to see which cells are more closely related to others. You’re trying to find a path that unites all the cells. Machine learning methods like reinforcement learning are able to tackle challenges like this. Famous problems that have been solved by reinforcement learning include the game of Go and even solving a Rubik’s Cube. Each is a series of movements and decisions you have to make.

What would be your ideal outcome for this challenge?

Elowitz: We hope to find solutions that could be scaled up to larger datasets, and that these approaches will be available to the community and accessible so many other people could use them. And the other hope is that this will stimulate further interest in this general approach on the computational side. What can we say about a developing organism, and what can’t we say? We’re just starting down this road, and we don’t have that understanding yet.
