At the Center for Data-Driven Discovery in Biomedicine (D3b), I lead the Bioinformatics Translational Pediatric Oncology Team, a team of bioinformatics scientists. Our mission is to advance pediatric oncology research and precision medicine through collaboration and the development of open-source analytical tools, frameworks, and data resources. In 1998, I lost my four-year-old cousin John Matthew to a brain tumor that we now know was likely a diffuse intrinsic pontine glioma. So, it was bittersweet for me to see the Open Pediatric Brain Tumor Atlas (OpenPBTA) manuscript published in Cell Genomics on the last day of Brain Tumor Awareness Month this past year. But let’s rewind.
The Childhood Cancer Data Lab maintains a collection of uniformly processed single-cell data from pediatric cancer clinical samples and xenografts in the Single-cell Pediatric Cancer Atlas (ScPCA) Portal. Although access to preprocessed data saves researchers time, we know that the downloads from the ScPCA Portal are only the starting point. That’s why we’ve created downstream analysis workflows for commonly performed analyses. Instead of writing code wholesale, you can analyze data once you’ve configured these workflows.
In September 2022, the Open Pediatric Brain Tumor Atlas (OpenPBTA) project culminated (for now) in a preprint on bioRxiv. This project, started in late 2019 and co-organized with the Center for Data-Driven Discovery in Biomedicine (D3b) at Children’s Hospital of Philadelphia (CHOP), is a collaborative effort to comprehensively describe the Pediatric Brain Tumor Atlas (PBTA), a collection of multiple data types from tens of tumor types (read more about why crowdsourcing expertise for the study of pediatric brain tumors is important here). The project is designed to allow for contributions from experts across multiple institutions. To facilitate those contributions, we’ve conducted the analysis and drafted the manuscript openly on the version-control platform GitHub since the project’s inception.
In this blog post, I’d like to give an overview of the refine.bio refactoring process and web accessibility considerations. Through this process, our goal is to enhance the site’s usability and performance by improving code quality and making the application more accessible. But before going into more detail, let me give you a quick history of refine.bio.
At the Data Lab, we are constantly looking for ways to enhance the tools we build for pediatric cancer researchers. Earlier this year, we launched the Single-cell Pediatric Cancer Atlas Portal, a database of uniformly processed single-cell data from pediatric cancer clinical samples. One way we felt the portal could be even more beneficial to pediatric cancer researchers is with a ready-to-go workflow that takes in single-cell data and prepares it for downstream analyses such as unsupervised clustering.
The Data Lab teaches data science courses targeted toward pediatric cancer researchers that introduce topics such as analysis of gene expression in bulk and single-cell data and principles of reproducible research. I wrote previously about how we use RStudio Server for our remote courses to simplify setup, and I wanted to write a bit more about some of the instructional practices we use so that our participants get the best experience we can provide. In particular, I wanted to talk about our use of live coding to facilitate active learning, and one of the tools we developed to make our course development just a bit easier.
The Single-cell Pediatric Cancer Atlas (ScPCA) Portal project began in 2019 when Alex’s Lemonade Stand Foundation (ALSF) funded 10 awards for single-cell profiling of pediatric cancer samples. The goal was to produce an atlas of gene expression profiles for a variety of childhood cancer types from different organ sites.
MultiPLIER is a machine learning approach that brings big data to bear on rare diseases. It’s also an example of the scientific approach and ethos of the CCDL, and the publication is a great opportunity to share how the CCDL is developing new technologies to accelerate research into cures for childhood cancers!
Earlier this year, Alex’s Lemonade Stand Foundation identified single-cell gene expression profiling as an opportunity to build an atlas of cell types within tumors that could be broadly reused by pediatric cancer researchers.
To help keep pediatric cancer research moving forward, here are 3 ways the CCDL is helping the research community during this time: refine.bio, virtual workshops, and the Open Pediatric Brain Tumor Atlas project.
Here at the Childhood Cancer Data Lab, we value transparency and the practice of open science. Much of the work we’ve done and the products that we build hinge on the generosity and openness of other scientists. In this post, as part of National Brain Tumor Awareness Month, we want to talk about a project that our science team has been working on over the last few months (and to do so in a way that aligns with our values).
Introducing refine.bio examples. Here, users can access a variety of example analyses implemented in R, such as clustering and heat maps, differential expression analysis, and pathway analysis, for use with refine.bio data.
I work at the Childhood Cancer Data Lab, where we use very big data to find cures for childhood cancers. To move data around the internet at very high speeds, we are forced to use a proprietary software suite called Aspera. If somebody could make a Free Software alternative, the future of the internet would be way more awesome! Best of all, you can be the one to do it!