Building Reproducible Research Skills: A Training Workshop with the Treehouse Childhood Cancer Initiative

June 3, 2025

The Data Lab recently traveled to California to lead a hands-on workshop for nine researchers from the UC Santa Cruz Treehouse Childhood Cancer Initiative. The participants, who came from a range of backgrounds and experience levels, gathered to learn common practices for reproducible computational research. Our relationship with Treehouse spans years and is grounded in a shared commitment to open science and reproducibility. This workshop was a chance to strengthen that partnership and to put shared values into practice.

Dr. Holly Beale, Lead Computational Biologist at the Treehouse Childhood Cancer Initiative, is no stranger to the Data Lab. We’ve had opportunities to work together by exchanging feedback on each other’s work and even co-mentoring a postdoctoral researcher. Dr. Beale also participated in the Open Pediatric Brain Tumor Atlas (OpenPBTA), a project we co-led with the Center for Data-Driven Discovery in Biomedicine (D3b) at Children’s Hospital of Philadelphia (CHOP). During that project, Dr. Beale experienced firsthand how we use Git and GitHub to coordinate cross-institutional research. This project is now complete, but you can still view the OpenPBTA-analysis repository.

Creating a custom workshop

Dr. Beale, a strong advocate for open science and collaboration, was eager to build on those experiences. In April 2025, the perfect opportunity arose when we partnered with the Treehouse team to develop a custom training workshop taught by Data Lab Director Dr. Jaclyn Taroni and Senior Data Scientist Dr. Joshua Shapiro. Over two days, Dr. Beale and colleagues joined us to level up their reproducible research skills.

To design this workshop, members of the Treehouse team first reviewed material from our existing reproducible research workshop and a previous course we created for researchers at CHOP. (Check out this blog post on Git workflows, which we covered during the CHOP workshop.) They selected the topics most relevant to their needs and interests. Together, we decided the Data Lab would deliver a workshop covering version control, collaborative project management with Git/GitHub, analytical code review, and more.

Here’s a glimpse of the program we put together for the group. You can view the full workshop schedule and materials here.

The impact of training

For Dr. Beale, the training offered new insights into using GitHub as a real-time collaboration tool. The Treehouse team has already started applying these practices in a joint project with clinicians at UC San Francisco, investigating undiagnosed neuromuscular disorders using gene expression data from biopsies of 16 pediatric patients. Their GitHub repository now serves as a shared space where collaborators can contribute their analysis and code, share feedback, and build on each other’s progress.

"Since all members of Treehouse developed this expertise, our communication around code has improved significantly. We are sharing coding practices much more efficiently and at a finer scale. We are also all thinking about data and code in ways that are reproducible from the start, rather than trying to add reproducibility after all the work is done. With early adoption and frequent feedback, the reproducibility is far more robust," says Dr. Beale. 

We’re thrilled with the success of this workshop and look forward to hosting more like it! Since 2019, we have trained 400 researchers from over 80 institutions around the world. Interested in bringing a Data Lab workshop to your institution? Contact us at training@ccdatalab.org.
