Projects

Current blog category

Recents post

From Blog

March 2, 2022

The OpenScPCA Project: What We've Built Together in Year One

The Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project is one year old, and there is much to celebrate! For the past year, we’ve worked closely with pediatric cancer experts to analyze data from the ScPCA Portal, improving its utility for researchers everywhere. Our focus has been on adding reliable cell type annotations across samples on the Portal, but the journey has been much more than that.

March 2, 2022

Behind the scenes with an OpenScPCA contributor

Before we launched OpenScPCA, we had to outline the process for contributing to analyses and then document that process for others. In addition, when designing the process for contributing to the project, we made sure to implement strategies to ensure reproducibility over the life cycle of the project. After planning and documenting expectations for contributors, we prepared to launch our first call for contributions, where we asked pediatric cancer experts to help us assign cell type annotations for all samples on the Portal. We thought it would be helpful to have an existing analysis module that other contributors could reference, so we picked a member of our science team (it’s me, hi 👋) to go through the process of developing an analysis module.

March 2, 2022

Three reasons to share your pediatric cancer data on the ScPCA Portal

In 2023, we launched our first-ever call for contributions to the Single-cell Pediatric Cancer Atlas (ScPCA) Portal, inviting the research community to share their data. This initiative has been instrumental in expanding the Portal, with numerous pediatric cancer researchers responding to the call and collaborating with us to make more data available. Today, the Portal holds data from 700 samples across 55 cancer types, and we look forward to increasing those numbers with our latest call for contributions.

March 2, 2022

Diving into cell type annotation: Insights from the OpenScPCA project

Launching the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project in April 2024 was a highlight of our year! This community-driven initiative aims to analyze data from the ScPCA Portal, which currently holds 700 samples from over 55 pediatric cancer types. The project is a step forward in advancing our knowledge of pediatric cancers through single-cell analysis, and we're excited to expand OpenScPCA in 2025! To that end, we're reflecting on some of our recent accomplishments and how we can keep that momentum going into next year.

March 2, 2022

Three ways we’ve enhanced the Single-cell Pediatric Cancer Atlas (ScPCA) Portal in 2024!

When the Data Lab launched the Single-cell Pediatric Cancer Atlas (ScPCA) Portal in 2022, we knew it was only the beginning! We started by making data easily available for the research community and received an overwhelmingly positive response. But we know firsthand from training hundreds of pediatric cancer researchers in analysis that making data available is just the first step. We’re increasing the impact of the Portal by listening to the growing ScPCA community. Now more researchers can contribute datasets, new features are continuously being developed, and we started an open, collaborative project to further explore the available data! Here’s a look back at how we’ve enhanced the ScPCA Portal in 2024.

March 2, 2022

Building reproducible workflows for testing and reproducible results in OpenScPCA

In our last blog post, we shared some of the tools and methods we are using in the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project to ensure that the analysis code remains usable and runnable throughout the project. That post mainly focused on some of the most dynamic phases of the project, when contributors are adding new analysis modules and updating existing ones with more refined results. Here, we will discuss the test data that enables the methods and our approach to running the full set of analyses on real data.

March 2, 2022

Working reproducibly with others on OpenScPCA

Earlier this year, we launched the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project, a collaborative project to openly analyze the data in the Single-cell Pediatric Cancer Atlas Portal on GitHub. We hope this project will bring transparently and expertly assigned cell type labels to the data in the Portal, help the community understand the strengths and limitations of applying existing single-cell methods to pediatric cancer data, and, frankly, allow us to meet more scientists in our community working with single-cell data (maybe you? 😄).

March 2, 2022

Introducing the first community-contributed datasets on the ScPCA Portal!

In March 2022, we launched the Single-cell Pediatric Cancer Atlas (ScPCA) Portal to make uniformly processed single-cell and single-nuclei RNA-Seq data widely available to the childhood cancer research community. Initially, all data available on the Portal was generated through grants funded by Alex’s Lemonade Stand Foundation (ALSF) as part of the ScPCA project. But enabling access to ALSF-funded data was just the beginning of our vision.Sharing is key to ensuring the Portal’s continued growth. Our sights were set on allowing more pediatric cancer researchers to contribute data to the ScPCA Portal.

March 2, 2022

Introducing the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) Project!

The Data Lab has just launched the brand new Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project! This open, collaborative project aims to analyze data from the ScPCA Portal, which currently holds 500 samples from over 50 pediatric cancer types. We are seeking contributors with diverse skills and expertise to join the project!

March 2, 2022

Collaborating with the Data Lab on OpenPBTA shaped how our team works reproducibly

At the Center for Data-Driven Discovery in Biomedicine (D3b), I lead the Bioinformatics Translational Pediatric Oncology Team, a team of bioinformatics scientists. Our mission is to advance pediatric oncology research and precision medicine through collaboration and development of open-source analytical tools, frameworks, and data resources. In 1998, I lost my four year old cousin John Matthew to a brain tumor we now know was likely a diffuse intrinsic pontine glioma. So, it was bittersweet for me to see the Open Pediatric Brain Tumor Atlas (OpenPBTA) manuscript published in Cell Genomics on the last day of brain tumor awareness month this past year. But let’s rewind.

March 2, 2022

Downstream Analysis Workflows – do you have a list of genes whose expression you are particularly interested in?

The Childhood Cancer Data Lab maintains a collection of uniformly processed single-cell data from pediatric cancer clinical samples and xenografts in the Single-cell Pediatric Cancer Atlas (ScPCA) Portal. Although access to preprocessed data saves researchers time, we know that the downloads from the ScPCA Portal are only the starting point. That’s why we’ve created downstream analysis workflows for commonly performed analyses. Instead of writing code wholesale, you can analyze data once you’ve configured these workflows.

March 2, 2022

Lessons learned from working reproducibly with others

In September 2022, the Open Pediatric Brain Tumor Atlas (OpenPBTA) project culminated (for now) in a preprint on bioRxiv. This project, started in late 2019 and co-organized with the Center for Data Driven Discovery in Biomedicine (D3b) at Children’s Hospital of Philadelphia (CHOP), is a collaborative effort to comprehensively describe the Pediatric Brain Tumor Atlas (PBTA), a collection of multiple data types from tens of tumor types (read more about why crowdsourcing expertise for the study of pediatric brain tumors is important here). The project is designed to allow for contributions from experts across multiple institutions. We’ve conducted analysis and drafting of the manuscript openly on the version-control platform GitHub from the project’s inception to facilitate those contributions.

March 2, 2022

refine.bio refactoring and Web Accessibility

In this blog post, I’d like to give an overview of the refine.bio refactoring process and web accessibility considerations. Through this process, our goal is to enhance the site usability and performance by improving the code quality and making the application more accessible. But before going into more details about them, let me provide you a quick history of refine.bio.

March 2, 2022

Introducing the ScPCA downstream analysis workflow!

At the Data Lab, we are constantly looking for ways to enhance the tools we build for pediatric cancer researchers. Earlier this year, we launched the Single-cell Pediatric Cancer Atlas portal, a database of uniformly-processed single-cell data from pediatric cancer clinical samples. One way we felt the portal could be even more beneficial to pediatric cancer researchers is with a ready-to-go workflow that takes in single-cell data and prepares it for downstream analyses such as unsupervised clustering.

March 2, 2022

Teaching with live coding in R and RStudio

The Data Lab teaches data science courses targeted toward pediatric cancer researchers that introduce topics such as analysis of gene expression in bulk and single-cell data and principles of reproducible research. I wrote previously about how we use RStudio Server for our remote courses to simplify setup, and I wanted to write a bit more about some of the instructional practices we use so that our participants get the best experience we can provide. In particular, I wanted to talk about our use of live coding to facilitate active learning, and one of the tools we developed to make our course development just a bit easier.

March 2, 2022

Introducing the Single-cell Pediatric Cancer Atlas (ScPCA) Portal

The Single-cell Pediatric Cancer Atlas (ScPCA) project began in 2019 when Alex’s Lemonade Stand Foundation (ALSF) funded 10 awards for single-cell profiling of pediatric cancer samples. The goal was to produce an atlas of gene expression profiles for a variety of childhood cancer types from different organ sites.

March 2, 2022

How does big data help us tackle childhood cancer?

MultiPLIER is a machine learning approach that brings big data to bear on rare diseases. It’s also an example of the scientific approach and ethos of the CCDL, and the publication is a great opportunity to share how the CCDL is developing new technologies to accelerate research into cures for childhood cancers!

March 2, 2022

Does Bulk Tissue Still Belong in a Single-Cell Atlas?

Earlier this year, Alex’s Lemonade Stand Foundation identified single-cell gene expression profiling as an opportunity to build an atlas of cell types within tumors that could be broadly reused by pediatric cancer researchers.

March 2, 2022

Exploring neurofibromatosis data with refine.bio

I’m a scientist at Sage Bionetworks, a nonprofit research organization in Seattle, WA. My work focuses on a family of rare pediatric diseases (NF): neurofibromatosis type 1, type 2, and schwannomatosis.

March 2, 2022

3 things the CCDL is doing right now to keep pediatric cancer research moving forward

To help keep pediatric cancer research moving forward, here are 3 ways the CCDL is helping the research community during this time: refine.bio, virtual workshops, and the Open Pediatric Brain Tumor Atlas project.

March 2, 2022

OpenPBTA: Someone is wrong on the internet and it’s probably us (updated 9-9-2020)

Here at the Childhood Cancer Data Lab, we value transparency and the practice of open science. Much of the work we’ve done and the products that we build hinge on the generosity and openness of other scientists. In this post, as part of National Brain Tumor Awareness month, we want to talk about a project that our science team has been working on over the last few months (and to do so in a way that aligns with our values).

March 2, 2022

Introducing Example Analyses for Use with refine.bio Data

Introducing refine.bio examples. Here, users can access a variety of example analyses implemented in R, such as clustering and heat maps, differential expression analysis, and pathway analysis, for use with refine.bio data.

March 2, 2022

A Desperate Plea for a Free Software Alternative to Aspera

I work at the Childhood Cancer Data Lab, where we use very big data to find cures for childhood cancers. To move data around the internet at very high speeds, we are forced to use a proprietary software suite called Aspera. If somebody could make a Free Software alternative, the future of the internet would be way more awesome! Best of all, you can be the one to do it!

March 2, 2022

refine.bio, Part 2

March 2, 2022

refine.bio, Part 1