How we integrate science and engineering

September 30, 2019

Alex’s Lemonade Stand Foundation created the Childhood Cancer Data Lab (CCDL) to accelerate the discovery of treatments and cures for childhood cancers by empowering pediatric cancer experts poised for the next big discovery with the knowledge, data, and tools to reach it.

Team Composition and Roles

The CCDL team includes science, engineering, and design expertise. Combining these three disciplines in different ways across projects enables us to carry out our mission.

Our design-focused team members identify gaps that are hindering progress in childhood cancer research by engaging with the community. Our design team is currently the smallest team, consisting of a single user experience designer, who is responsible for foundational research, survey development, user experience testing, and strategic planning. Our user experience designer also attends many of our training workshops to discuss researchers’ needs and frustrations as well as to test our software products.

Our engineering team is focused on building robust software that addresses gaps in the field via well-understood workflows. The idea behind our engineering efforts is that when a technology becomes sufficiently mature that it can be reliably automated, it should be. The engineering team is behind the implementation of refine.bio, which aims to make public gene expression data widely available and reusable. The software also allows the science team to have a very large collection of public gene expression datasets at their fingertips via its API. The engineering team also brings expertise in best practices in software development to the data science and design teams.

Our data science team engages with the pediatric cancer community to address gaps in knowledge or capabilities. The data science team has developed training workshops that introduce folks in the field to reproducible computing practices and provide researchers with the knowledge to solve common challenges. The team also nucleates a cancer data science Slack group that aims to provide frequent touchpoints for members of the field working on data analysis challenges. Together, these efforts aim to enhance a community of practice around big data analytics in pediatric cancer. The data science team also develops new computational techniques to address gaps in capabilities. CCDL scientist Dr. Jaclyn Taroni led the development and characterization of an approach called MultiPLIER, which uses large compendia of public data (like those in refine.bio) to provide a more detailed understanding of rare diseases. A peer-reviewed manuscript describing MultiPLIER was recently published in Cell Systems.

A diagram of how the science, engineering, and design teams interact at the Data Lab.

Team Tactics

Infusing perspectives across teams

Infusing scientific, engineering, and design perspectives into each other’s work has been vital to our team. It helps ensure that design and engineering solutions address scientific problems effectively.

Data Science

The data science team uses software development practices, gleaned from the engineering team, to implement proof-of-concept workflows. For example, in the OpenPBTA project the data science team identified continuous integration as a technique that would support large numbers of scientific contributors. Members of the data science team consulted with the engineering team to build a continuous integration solution for data science within the project. Efforts like these help the data science team incorporate best coding practices into their analyses, making them more technically robust and easier to convert to production-ready code.

The data science team is engaged early in the process of design. A discussion of gaps, initiated by the design team, leads to conversations around solutions with the data science team. These user-centric design principles allow the data science team to ensure that the workflows and analyses are easily accessible to a broad set of researchers.

Engineering

We use GitHub to manage and maintain our code base. Whenever new code is added to the code base, it is reviewed by other members of the team. Reviewers look both at the approach taken to solve the problem and at how the code was tested. To maintain scientific integrity when the engineering team implements scientific workflows, any new code that touches on scientific processes requires the engineering team member to write a methods section describing what has been implemented, and a member of the data science team is asked to review it. This also keeps the engineering team up to date with the most recent methods for the large-scale analysis of genomic data.

Design

The CCDL focuses its efforts on gaps identified through user research. Solutions are designed in collaboration with both the engineering and data science teams to ensure that they are technically feasible and address scientific problems. To design a feature that addresses a gap, the design team holds a whiteboarding session with at least one member each of the data science and engineering teams. The goal of this session is to generate a solution blueprint that is technically feasible and scientifically valid. Throughout the development process, feature workflows are reviewed with the engineering and data science teams at both low and high levels of fidelity before the mockups are handed to engineering for implementation.

Structured Meetings Avoid Within-team Silos

We adopt processes that help prevent us from working in silos within teams and that increase the potential for collaboration and knowledge diffusion across teams.

Sprint planning meetings

A sprint at the CCDL is two weeks long. We meet at the beginning of each sprint to create a shared understanding of goals for the sprint and discuss the issues which need to be addressed.

Daily Stand-ups  

Each member of the team briefly mentions what they worked on the previous day, what they plan to work on today, and whether anything is blocking their work. This adds transparency to each team's work and gives an opportunity to get input from other team members as needed.

Demo Day

At the end of each sprint, the team gathers to demo what we accomplished during that sprint, such as a neat analysis notebook or a newly implemented feature.

Demo Day at the CCDL

These open and responsive in-person channels of communication have been key to preventing us from working in silos. We also use Slack, GitHub, and other electronic methods of communication to provide more frequent contact.

Researcher-centric Decisions

Our primary goal is to address gaps in the pediatric cancer field. We start by identifying gaps through user research, which provides a starting point for discussions around new features or new tools. When prioritizing features or tools, the major factors that influence our decisions are whether the solution would be of immediate value to researchers and how well it fits into their current ecosystem. New features and tools are regularly tested with researchers through usability evaluations to validate their utility and discover new gaps.

Concluding Thoughts

The CCDL team brings together the disciplines of design, data science, and engineering with the goal of producing robust solutions that address key gaps in the field. As we have grown, we have refined our processes to maintain information flow across teams while trying to avoid meeting fatigue for team members. We aim to keep meetings small, focused, and of a defined duration, while also providing avenues for team-wide communication each sprint.