Hello World!

July 27, 2017

CASEY GREENE

This blog will usually host technical content such as descriptions of the architecture of software that we build and notes from the team on specific topics. However, before that starts, I want to take the opportunity to say hello. The Childhood Cancer Data Lab was announced alongside the National Cancer Moonshot Summit hosted with Joe and Jill Biden. At the time, the goal was to launch this lab within 12-18 months. This post marks the start of the group. The initial team includes me as well as two software engineers from my lab at Penn who will work on CCDL software, Dongbo Hu and Kurt Wheeler. We’re actively recruiting additional team members and have open searches for a UX Designer, a Frontend Developer, a Backend Engineer, and a Full Stack Developer.

In the data lab, our goal is to provide the childhood cancer research community with new tools that enable scientists to put data to use. At the outset, we’re particularly focused on publicly available data. This is because there are millions of publicly available genome-wide assays. The primary challenge they pose is that the whole collection of these assays remains difficult to use, in large part because the data aren’t harmonized. This means that results from one set of assays are not directly comparable to results from another set.

We are now in the process of designing, developing, and deploying software that we call refine.bio. This software will run, all the time, on commercial cloud servers. At times, cloud providers will have computers that are not being actively used. They sell time on these machines at a fraction of the normal cost. The refine.bio server will take advantage of these spare computers to process and harmonize public datasets, which will be provided to the research community for free.

The ability to think big and to provide resources for research is one of the reasons that we’re thrilled to be powered by Alex’s Lemonade Stand Foundation. In addition to providing data, the software developed in the Childhood Cancer Data Lab, which we call the CCDL, will be provided under open licenses as we develop it. For example, the Data Refinery source code is available while we’re developing it. This allows others in the community to participate in development, contributing their own technical skills to the fight. As the software is developed, others can extend it or repurpose it for new tasks without having to rebuild the foundations.

We’re also in the planning stages for other projects and initiatives focused on data access and analysis for childhood cancer researchers. We look forward to joining the community in the fight.

Related Post

March 2, 2022

Data Lab Advanced Single-cell RNA-Seq Workshop, Virtual, June 8-12, 2026

Applications are open for the Data Lab's upcoming workshop, which will cover advanced topics in the analysis of single-cell RNA-seq data for researchers studying pediatric cancer. The course will be held virtually from June 8-12, 2026 from 12-5pm Eastern time.

March 2, 2022

Data Lab Introduction to Single-Cell RNA-Seq Workshop, Virtual, May 11-15, 2026

The Data Lab will be holding a virtual workshop, Introduction to Single-cell RNA-Sequencing, from May 11-15, 2026! In this workshop, Data Lab staff will introduce researchers studying pediatric cancer to the R programming language, the Tidyverse R packages for data science, single-cell RNA-seq data analysis, and annotating cell types.

January 11, 2020

New Feature Release: Cell Type Annotations, CNV Inference, Custom Downloads, and More on the ScPCA Portal

Exciting news from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal! All datasets on the portal have been updated to include several new features that enhance data quality and usability. Here’s a close look at what we’ve added and why.

January 11, 2020

Help You Help Future You: Organizing your research projects reproducibly

It's that time of year again – it's time to start a new research project! Planning for the research itself can be daunting, but planning for your work to be fully reproducible is a whole other layer that isn't always emphasized. In this blog post, we'll outline a few project organization principles we teach in our Reproducible Research Practices workshop that we think are really helpful for ensuring reproducibility (among other benefits!).

January 11, 2020

Pediatric Cancer Researchers Driving Progress Through Data, Training, and Collaboration

At the Childhood Cancer Data Lab, we support pediatric cancer researchers by providing data science training, collaborating on data-intensive projects, and designing open-source tools. Through this work, we’ve had the opportunity to engage with scientists who are applying rigorous, data-driven approaches to address some of the most pressing challenges in childhood cancer.

March 2, 2022

Full: Data Lab Advanced Single-Cell RNA-Seq Workshop, Virtual, December 8-12, 2025

Applications are open for the Data Lab's upcoming workshop, which will cover advanced topics in the analysis of single-cell RNA-seq data for researchers studying pediatric cancer. The course will be held virtually from December 8-12, 2025 from 12-5pm Eastern time.

March 2, 2022

Full: Level Up Your Reproducible Research Skills at Our Next Workshop!

At the Childhood Cancer Data Lab, we’re committed to helping pediatric cancer researchers work more efficiently, collaboratively, and reproducibly. That’s why we created our Reproducible Research Practices workshop, first piloted in 2022 with six participants. Since then, more than 50 pediatric cancer researchers have joined us to learn hands-on techniques for achieving reproducible results in computational cancer research. This fall, we’re excited to hold the next workshop in Philadelphia, PA!

September 13, 2024

Building Reproducible Research Skills: A Training Workshop with the Treehouse Childhood Cancer Initiative

The Data Lab recently traveled to California to lead a hands-on workshop for nine researchers from the UC Santa Cruz Treehouse Childhood Cancer Initiative. The participants, all from a range of backgrounds and experience levels, came together to learn common practices for reproducible computational research. Our relationship with Treehouse spans years, grounded in a shared commitment to open science and reproducibility. This workshop was a chance to strengthen that partnership and an opportunity to put shared values into practice!

May 4, 2022

Use cases as a brainstorming tool

Use cases define how users interact with a product or system, including actions users can take and how the system responds. They also identify user goals and paths for the system to handle errors.

March 2, 2022

The OpenScPCA Project: What We've Built Together in Year One

The Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project is one year old, and there is much to celebrate! For the past year, we’ve worked closely with pediatric cancer experts to analyze data from the ScPCA Portal, improving its utility for researchers everywhere. Our focus has been on adding reliable cell type annotations across samples on the Portal, but the journey has been much more than that.

March 2, 2022

Full: Data Lab Introduction to Single-Cell RNA-Seq Workshop, Virtual, August 4-8, 2025

The Data Lab will be holding a virtual workshop, Introduction to Single-cell RNA-Sequencing, from August 4-8, 2025! In this workshop, Data Lab staff will introduce researchers studying pediatric cancer to the R programming language, the Tidyverse R packages for data science, single-cell RNA-seq data analysis, and annotating cell types.

March 2, 2022

Full: Data Lab Advanced Single-cell RNA-Seq Workshop, Philadelphia area, June 10-12, 2025

Applications are open for the Data Lab's next training workshop! We will cover advanced topics in the analysis of single-cell RNA-seq data for researchers studying pediatric cancer. The 3-day course will take place June 10-12, 2025 from 9am-5pm Eastern time in Bala Cynwyd, PA, just outside of Philadelphia.

March 2, 2022

Behind the scenes with an OpenScPCA contributor

Before we launched OpenScPCA, we had to outline the process for contributing to analyses and then document that process for others. In addition, when designing the process for contributing to the project, we made sure to implement strategies to ensure reproducibility over the life cycle of the project. After planning and documenting expectations for contributors, we prepared to launch our first call for contributions, where we asked pediatric cancer experts to help us assign cell type annotations for all samples on the Portal. We thought it would be helpful to have an existing analysis module that other contributors could reference, so we picked a member of our science team (it’s me, hi 👋) to go through the process of developing an analysis module.

March 2, 2022

Alex’s Lemonade Stand Foundation at AACR 2025: Grants, workshops, and collaborative projects to accelerate your research!

Are you attending the American Association for Cancer Research (AACR) Annual Meeting in Chicago, IL? Visit us in the exhibit hall at booth 3706 from April 27-30, 2025, and during poster sessions. We have exciting news about grant opportunities, projects, free training workshops, and more!

March 2, 2022

Three reasons to share your pediatric cancer data on the ScPCA Portal

In 2023, we launched our first-ever call for contributions to the Single-cell Pediatric Cancer Atlas (ScPCA) Portal, inviting the research community to share their data. This initiative has been instrumental in expanding the Portal, with numerous pediatric cancer researchers responding to the call and collaborating with us to make more data available. Today, the Portal holds data from 700 samples across 55 cancer types, and we look forward to increasing those numbers with our latest call for contributions.

March 2, 2022

Full: Data Lab Single-Cell RNA-Seq Workshop, Virtual, March 24-28, 2025

We are excited to announce that our next virtual workshop, Introduction to Single-cell RNA-Seq, will run from March 24-28, 2025! In this workshop, Data Lab staff will introduce researchers studying pediatric cancer to the R programming language, the Tidyverse R packages for data science, single-cell RNA-seq data analysis, and annotating cell types.

March 2, 2022

Diving into cell type annotation: Insights from the OpenScPCA project

Launching the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project in April 2024 was a highlight of our year! This community-driven initiative aims to analyze data from the ScPCA Portal, which currently holds 700 samples from over 55 pediatric cancer types. The project is a step forward in advancing our knowledge of pediatric cancers through single-cell analysis, and we're excited to expand OpenScPCA in 2025! To that end, we're reflecting on some of our recent accomplishments and how we can keep that momentum going into next year.

March 2, 2022

Three ways we’ve enhanced the Single-cell Pediatric Cancer Atlas (ScPCA) Portal in 2024!

When the Data Lab launched the Single-cell Pediatric Cancer Atlas (ScPCA) Portal in 2022, we knew it was only the beginning! We started by making data easily available for the research community and received an overwhelmingly positive response. But we know firsthand from training hundreds of pediatric cancer researchers in analysis that making data available is just the first step. We’re increasing the impact of the Portal by listening to the growing ScPCA community. Now more researchers can contribute datasets, new features are continuously being developed, and we started an open, collaborative project to further explore the available data! Here’s a look back at how we’ve enhanced the ScPCA Portal in 2024.

March 2, 2022

Building reproducible workflows for testing and reproducible results in OpenScPCA

In our last blog post, we shared some of the tools and methods we are using in the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project to ensure that the analysis code remains usable and runnable throughout the project. That post mainly focused on some of the most dynamic phases of the project, when contributors are adding new analysis modules and updating existing ones with more refined results. Here, we will discuss the test data that enables the methods and our approach to running the full set of analyses on real data.

March 2, 2022

Full: Data Lab Advanced Single-cell RNA-Seq Workshop, Philadelphia area, December 10-12, 2024

Applications are open for the Data Lab's next training workshop! We will cover advanced topics in the analysis of single-cell RNA-seq data for researchers studying pediatric cancer. The 3-day course will take place December 10-12, 2024 from 9am-5pm Eastern time in Bala Cynwyd, PA, just outside of Philadelphia.

March 2, 2022

Working reproducibly with others on OpenScPCA

Earlier this year, we launched the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project, a collaborative project to openly analyze the data in the Single-cell Pediatric Cancer Atlas Portal on GitHub. We hope this project will bring transparently and expertly assigned cell type labels to the data in the Portal, help the community understand the strengths and limitations of applying existing single-cell methods to pediatric cancer data, and, frankly, allow us to meet more scientists in our community working with single-cell data (maybe you? 😄).

September 13, 2024

A week of Bulk RNA-Seq at the University of Minnesota!

Recently, the Data Lab packed up and headed to the University of Minnesota (UMN) to host a workshop for 19 researchers. Participants with a variety of skill levels and backgrounds joined us from UMN, St. Jude Children’s Research Hospital, the Mayo Clinic, and the Medical University of South Carolina.

March 2, 2022

Full: Data Lab Reproducible Research Practices Workshop, Milwaukee, October 23-24, 2024

Applications are open for the Data Lab's next workshop! We will hold a Reproducible Research Practices Course on October 23-24, 2024 in Milwaukee, WI. Instructors will introduce principles and techniques to achieve reproducible results in computational cancer research. We’ll show you the fundamentals of commonly used approaches in reproducibility that you can apply to increase the impact of your research by making your findings more robust and reliable! To ensure that workshop attendees have a great hands-on experience, a very limited number of seats will be available.

March 2, 2022

Full: Data Lab Bulk RNA-Seq and Reproducible Research Practices Workshop, Minneapolis, August 19-22, 2024

We are excited to announce our next workshop, Introduction to Bulk RNA-Sequencing and Reproducible Research Practices, will take place in Minneapolis, MN from August 19-22, 2024! In this workshop, Data Lab staff will introduce researchers studying pediatric cancer to the R programming language, the Tidyverse R packages for data science, bulk RNA-seq data analysis, pathway analyses, and techniques to achieve reproducible results in computational cancer research.

March 2, 2022

OpenScPCA: Call for contributions, new grant offerings, and analyses in progress!

In April 2024, we announced the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project. Since then, we’ve been working to build a supportive community while getting started on a few analysis ideas! We’re excited to see growing interest in the project, and we have some big news for prospective collaborators.

May 4, 2022

Choosing wisely: A behind-the-scenes look at how we selected cell type annotation platforms for the ScPCA Portal

So you recently did some single-cell RNA sequencing and are working on analyzing your data. You’ve already quantified the gene expression data, performed any filtering, and normalized your data, but now what? You know you want to perform differential expression analysis or that you need to annotate the cell types found in your data, but there are so many different tools and methods for performing these analyses. How do you know which one is the best method for your dataset? Don’t worry, we’ve all been there – even experts in the single-cell field have been there.

January 11, 2020

Prototyping process with journey maps

The Open Single-cell Pediatric Cancer Atlas (OpenScPCA) is an open, collaborative project to analyze data from the Single-cell Pediatric Cancer Atlas (ScPCA) Portal, which currently holds over 500 samples from over 50 pediatric cancer types. OpenScPCA uses an open contribution model designed to allow experts worldwide to contribute and rapidly share the results of analyses in real time. The project was officially launched in April 2024.

March 2, 2022

Introducing the first community-contributed datasets on the ScPCA Portal!

In March 2022, we launched the Single-cell Pediatric Cancer Atlas (ScPCA) Portal to make uniformly processed single-cell and single-nuclei RNA-Seq data widely available to the childhood cancer research community. Initially, all data available on the Portal was generated through grants funded by Alex’s Lemonade Stand Foundation (ALSF) as part of the ScPCA project. But enabling access to ALSF-funded data was just the beginning of our vision. Sharing is key to ensuring the Portal’s continued growth. Our sights were set on allowing more pediatric cancer researchers to contribute data to the ScPCA Portal.

March 2, 2022

Introducing the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) Project!

The Data Lab has just launched the brand new Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project! This open, collaborative project aims to analyze data from the ScPCA Portal, which currently holds 500 samples from over 50 pediatric cancer types. We are seeking contributors with diverse skills and expertise to join the project!

March 2, 2022

Full: Data Lab Single-Cell RNA-Seq Workshop, Virtual, June 10-14, 2024

We are excited to announce that our next virtual workshop, Introduction to Single-cell RNA-Seq, will run from June 10-14, 2024! In this workshop, Data Lab staff will introduce researchers studying pediatric cancer to the R programming language, the Tidyverse R packages for data science, single-cell RNA-seq data analysis, and annotating cell types.

March 2, 2022

Full: Data Lab Reproducible Research Practices and Introduction to OpenScPCA Workshop, Philadelphia, May 14-15, 2024

Applications are open for the Data Lab's next workshop! We are holding a two-day course on Reproducible Research Practices and the Open Single-cell Pediatric Cancer Atlas (OpenScPCA) project from May 14-15, 2024. Please note that the OpenScPCA module is an optional part of the workshop. The course begins with an introduction to principles and techniques to achieve reproducible results in computational cancer research. On day two, you can choose to continue the workshop and learn how to put your skills to use for OpenScPCA, our new pediatric cancer research project.

March 2, 2022

Alex’s Lemonade Stand Foundation at AACR 2024: Resources, tools, and opportunities for pediatric cancer researchers

Are you attending the American Association for Cancer Research (AACR) annual meeting in San Diego, CA? Visit the Alex’s Lemonade Stand Foundation (ALSF) Grants and Data Lab teams at booth 3755 in the exhibit hall from April 7-10 and during poster sessions on April 8. We will announce a new collaborative project and share exciting news about the Single-cell Pediatric Cancer Atlas Portal and training opportunities!

January 11, 2020

Meet the women who integrate science, engineering, and design at the Childhood Cancer Data Lab

Did you know that 70% of the Alex’s Lemonade Stand Foundation (ALSF) Childhood Cancer Data Lab team are currently women? Advancing our mission to empower childhood cancer researchers with knowledge, data, and tools would not be possible without their expertise. On the International Day of Women and Girls in Science, we are excited to introduce you to these women who integrate science, engineering, and design to tackle some of the greatest challenges faced by the pediatric cancer research community!

May 4, 2022

Don't Make Me Write: Tips for Avoiding Typing in RStudio

I have a confession to make: I am lazy. Ok, maybe that's too strong. Let's go for a euphemism instead: I am efficient. I love learning handy tricks that make my life easier and make my job smoother with fewer hiccups along the way. This is one part of why, here in the Data Lab, we love automation - why waste our time on rote, repetitive, housekeeping tasks when we can get the bots to do it for us? In this blog post, we'll highlight a few tips about how you can use RStudio to code more efficiently.

January 11, 2020

Git workflows for scientific projects and when we use them

Writing source code is a significant part of data-intensive biomedical research. Everything from cleaning and pre-processing data to generating publication figures can be accomplished programmatically. Increasingly, funding agencies and journals require researchers to share their code. To pick a few examples, the Data Lab’s parent organization, Alex’s Lemonade Stand Foundation (ALSF), has such a requirement for awardees, and PLoS Computational Biology requires authors to make code underlying results and conclusions available.

January 11, 2020

I’m terrible with names…but I’m using ontologies to try to be better

There is an old joke in computer science about how there are only two hard things: cache invalidation, naming things, and off-by-one errors. I’ll leave aside the first one as beyond my own expertise, but the second comes up all the time in my work as a biological data scientist. Naming variables and functions in my code is a constant struggle, but one I have to deal with on my own or with my team. Much bigger problems come up when trying to deal with all the various ways that people across the world use names when talking about the diseases they work on, the types of cells they are looking at, the experimental methods they are using, and just about every other aspect of their studies.

March 2, 2022

Full: Data Lab Reproducible Research Practices Workshop, Philadelphia, October 24-25, 2023

Applications are open for the Data Lab's next workshop! We will be holding a Reproducible Research Practices Course in-person on October 24-25, 2023. Instructors will introduce principles and techniques to achieve reproducible results in computational cancer research. We’ll show you the fundamentals of commonly-used approaches in reproducibility that you can apply to increase the impact of your research by making your findings more robust and reliable! To ensure that workshop attendees have a great hands-on experience, there will be a very limited number of seats available.

March 2, 2022

Collaborating with the Data Lab on OpenPBTA shaped how our team works reproducibly

At the Center for Data-Driven Discovery in Biomedicine (D3b), I lead the Bioinformatics Translational Pediatric Oncology Team, a team of bioinformatics scientists. Our mission is to advance pediatric oncology research and precision medicine through collaboration and development of open-source analytical tools, frameworks, and data resources. In 1998, I lost my four-year-old cousin John Matthew to a brain tumor we now know was likely a diffuse intrinsic pontine glioma. So, it was bittersweet for me to see the Open Pediatric Brain Tumor Atlas (OpenPBTA) manuscript published in Cell Genomics on the last day of brain tumor awareness month this past year. But let’s rewind.

January 11, 2020

Don't Make Me Read: Tips for Writing Effective Documentation

Writing effective documentation is challenging. Users might not always read every word in the documentation. They might even just scroll past large chunks of text, but we can accommodate those behaviors by structuring and formatting content appropriately.

March 2, 2022

The Single-cell Pediatric Cancer Atlas (ScPCA) Portal is now accepting dataset submissions!

In 2019, Alex’s Lemonade Stand Foundation (ALSF) established the Single-cell Pediatric Cancer Atlas (ScPCA) through awards for data generation and to create an atlas of single-cell gene expression profiles of pediatric cancers of different types and from different organ sites. The Data Lab launched the ScPCA Portal in 2022 to make uniformly processed, summarized single-cell and single-nuclei RNA-seq data and de-identified metadata available for download. The ScPCA Portal also supports other data modalities, such as bulk RNA-seq, CITE-seq, and spatial transcriptomics. The ScPCA Portal currently hosts data for over 500 pediatric tumor and patient-derived xenograft samples from more than 50 cancer types, and continues to grow. The Data Lab is seeking contributions to the ScPCA Portal from researchers with existing single-cell datasets.

March 2, 2022

Full: Data Lab Single-Cell RNA-Seq Workshop, Philadelphia area, June 13-15, 2023

We are excited to announce that our next workshop, Introduction to Single-cell RNA-Seq, will take place in-person from June 13-15, 2023! Data Lab staff will introduce researchers studying pediatric cancer to the R programming language, the Tidyverse R packages for data science, single-cell RNA-seq data analysis, annotating cell types, and more. The 3-day course will take place from 9am-5pm Eastern time in Bala Cynwyd, PA, just outside of Philadelphia. Travel reimbursement (up to a certain amount) is available for qualifying participants.

March 2, 2022

Downstream Analysis Workflows – do you have a list of genes whose expression you are particularly interested in?

The Childhood Cancer Data Lab maintains a collection of uniformly processed single-cell data from pediatric cancer clinical samples and xenografts in the Single-cell Pediatric Cancer Atlas (ScPCA) Portal. Although access to preprocessed data saves researchers time, we know that the downloads from the ScPCA Portal are only the starting point. That’s why we’ve created downstream analysis workflows for commonly performed analyses. Instead of writing code wholesale, you can analyze data once you’ve configured these workflows.

March 2, 2022

Full: Data Lab Single-Cell RNA-Seq Workshop, Virtual, May 15-19, 2023

We are excited to announce that our next virtual workshop, Introduction to Single-cell RNA-Seq, will run from May 15-19, 2023! In this workshop, Data Lab staff will introduce researchers studying pediatric cancer to the R programming language, the Tidyverse R packages for data science, single-cell RNA-seq data analysis, and annotating cell types.

May 4, 2022

Creating an open source workflow to uniformly process data for the Single-cell Pediatric Cancer Atlas portal

Last year, the Data Lab launched the Single-cell Pediatric Cancer Atlas (ScPCA) Portal, which today holds uniformly processed single-cell gene expression data obtained from 8 separate labs, over 480 samples, and representing 38 cancer types. The portal is still growing as we continue to receive and process raw data from ScPCA investigators! All uniformly processed data is made available for download on the ScPCA Portal, giving researchers easy access to a growing database of summarized gene expression data and metadata to utilize for their own research. But how exactly did we make sure that all of the data was uniformly processed? And how are we able to ensure uniform processing for incoming samples as the portal continues to grow?

March 2, 2022

Visit Alex's Lemonade Stand Foundation at AACR 2023!

Are you attending the American Association for Cancer Research (AACR) annual meeting in Orlando, FL this year? Visit Alex's Lemonade Stand Foundation (ALSF) at booth 369 in the exhibit hall from April 16-19! You'll find information about ALSF's grants program, the Childhood Cancer Data Lab and more. The Data Lab will also be holding office hours during select time slots.

March 2, 2022

Full: Data Lab Advanced Single-Cell RNA-Seq Workshop, Virtual, March 13-17, 2023

The Data Lab is excited to announce that our next training workshop will be held virtually from March 13-17, 2023! During this workshop, we will cover advanced topics in the analysis of single-cell RNA-seq data for researchers studying pediatric cancer. The workshop will take place each day from 12-5pm Eastern. Each day consists of lectures and designated time for attendees to work on exercise materials and their own projects with our staff available for consultation. You’ll need a laptop with internet access and to install Zoom and Slack. You will log into an RStudio Server hosted by the Data Lab from your web browser. Pediatric cancer researchers are encouraged to apply now!

March 2, 2022

Lessons learned from working reproducibly with others

In September 2022, the Open Pediatric Brain Tumor Atlas (OpenPBTA) project culminated (for now) in a preprint on bioRxiv. This project, started in late 2019 and co-organized with the Center for Data Driven Discovery in Biomedicine (D3b) at Children’s Hospital of Philadelphia (CHOP), is a collaborative effort to comprehensively describe the Pediatric Brain Tumor Atlas (PBTA), a collection of multiple data types from tens of tumor types (read more about why crowdsourcing expertise for the study of pediatric brain tumors is important here). The project is designed to allow for contributions from experts across multiple institutions. We’ve conducted analysis and drafting of the manuscript openly on the version-control platform GitHub from the project’s inception to facilitate those contributions.

May 4, 2022

A clustering analysis workflow for use with your ScPCA dataset!

Recently, we told you about the Single-cell Pediatric Cancer Atlas (ScPCA) downstream analysis workflow. This ready-to-go workflow is intended to be used with single-cell and single-nuclei gene expression data available on the ScPCA Portal. We developed this workflow to filter, normalize, and perform dimensionality reduction, as well as to add initial clustering results to each processed sample/library object. Now we’re excited to introduce one of our latest offerings for use with ScPCA data, a clustering analysis workflow, which can be applied to datasets after running the filtering, normalization, and dimensionality reduction workflow!

March 2, 2022

Full: Data Lab Advanced Single-Cell RNA-Seq Workshop, Philadelphia area, January 31-February 2, 2023

The Data Lab is excited to announce that our next training workshop will be held in-person from January 31-February 2, 2023! During this workshop, we will cover advanced topics in the analysis of single-cell RNA-seq data for researchers studying pediatric cancer. The 3-day course will take place from 9am-5pm Eastern time in Bala Cynwyd, PA, just outside of Philadelphia. Travel reimbursement is available for qualifying participants.

January 11, 2020

Scientific Community Bulletin: What’s happening in December?

Welcome to the Data Lab’s December Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Subscribe to our blog to be alerted about future Scientific Community Bulletin posts!

March 2, 2022

refine.bio refactoring and Web Accessibility

In this blog post, I’d like to give an overview of the refine.bio refactoring process and web accessibility considerations. Through this process, our goal is to enhance the site usability and performance by improving the code quality and making the application more accessible. But before going into more details about them, let me provide you a quick history of refine.bio.

January 11, 2020

Scientific Community Bulletin: What’s happening in November?

Welcome to the Data Lab’s November Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Subscribe to our blog to be alerted about future Scientific Community Bulletin posts!

January 11, 2020

Cataloging the CCDI Childhood Cancer Data Catalog (CCDC)

Here at the Data Lab, we're all about, well, data! We believe that data sharing and accessibility are key to accelerating the research process, and ultimately to improving outcomes for childhood cancer patients. So, we were excited to learn that one of the goals of the NCI/NIH initiative, the Childhood Cancer Data Initiative (CCDI), is to build up a Data Ecosystem that will facilitate pediatric cancer researchers' ability to explore and collect data from disparate resources. Although this Ecosystem is still in the early stages, several components are already being developed and are available for researchers to use! One component that is particularly interesting to us is the CCDI's Childhood Cancer Data Catalog (CCDC).

January 11, 2020

Scientific Community Bulletin: What’s happening in October?

Welcome to the October Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Subscribe to our blog to be alerted about future Scientific Community Bulletin posts!

January 11, 2020

Scientific Community Bulletin: What's happening in September?

Welcome to the September Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Subscribe to our blog to be alerted about future Scientific Community Bulletin posts!

March 2, 2022

Introducing the ScPCA downstream analysis workflow!

At the Data Lab, we are constantly looking for ways to enhance the tools we build for pediatric cancer researchers. Earlier this year, we launched the Single-cell Pediatric Cancer Atlas portal, a database of uniformly-processed single-cell data from pediatric cancer clinical samples. One way we felt the portal could be even more beneficial to pediatric cancer researchers is with a ready-to-go workflow that takes in single-cell data and prepares it for downstream analyses such as unsupervised clustering.

March 2, 2022

Full: Data Lab Single-Cell RNA-Seq Workshop, Virtual, September 19-23, 2022

The Data Lab is excited to announce our next virtual workshop running from September 19-23, 2022! In this workshop, Data Lab staff will introduce researchers studying pediatric cancer to the R programming language, the Tidyverse R packages for data science, single-cell RNA-seq data analysis, and pathway analysis.

March 2, 2022

Teaching with live coding in R and RStudio

The Data Lab teaches data science courses targeted toward pediatric cancer researchers that introduce topics such as analysis of gene expression in bulk and single-cell data and principles of reproducible research. I wrote previously about how we use RStudio Server for our remote courses to simplify setup, and I wanted to write a bit more about some of the instructional practices we use so that our participants get the best experience we can provide. In particular, I wanted to talk about our use of live coding to facilitate active learning, and one of the tools we developed to make our course development just a bit easier.

January 11, 2020

Scientific Community Bulletin: What's happening in August?

Welcome to the August Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations.

January 11, 2020

Queueing Javascript Promises

Often when building a server-client web application, we will encounter a situation where we want to send requests to our API in the chronological order that they occur on the client. Due to the asynchronous nature of these requests, it might not be possible to send them in the same callback for the event that triggered them. This is because we want to use the response from the previous request to craft our current one. A solution to this problem would be to implement a queue. Instead of calling the API immediately after each event occurs, we add each request to a queue, which ensures the latest data is sent with every request.

January 11, 2020

Scientific Community Bulletin: What's happening in July?

Welcome to the July Scientific Community Bulletin! Each month we share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Subscribe to our blog to be alerted about future Scientific Community Bulletin posts!

January 11, 2020

Scientific Community Bulletin: What’s happening in June?

Welcome to the Childhood Cancer Data Lab’s new blog feature, the monthly Scientific Community Bulletin! At the start of each month, we will share upcoming opportunities from Alex’s Lemonade Stand Foundation (ALSF), the Data Lab, and other events that we have gathered from a variety of science and research organizations. Our goal is to promote learning opportunities and highlight some of the excellent resources that our community provides.

May 4, 2022

How we use renv to be in two places at once

At the Data Lab, our science team has a practice where an individual team member shares something that they recently figured out (or didn’t totally figure out yet) on a biweekly basis. We call this short 5-10 minute presentation How I Solved This, and it’s a great way to formally share (often hard-won) knowledge with each other. In this post, we thought we’d share how we solved something with the `renv` package with you.

May 4, 2022

Strategies to center user needs for research tools

The Childhood Cancer Data Lab builds resources guided by the most pressing needs of our primary users: pediatric cancer researchers. As the Data Lab's UX Designer, I conduct research activities with scientists, such as usability evaluations, semi-structured interviews, and card sorts, to gain insight into their activities, processes, pain points, and behaviors. I work with scientists and engineers at the Data Lab to use this information to improve existing products and services or to create new ones.

March 2, 2022

Data Lab Reproducibility Workshop, Philadelphia area, June 10, 2022

The Data Lab is excited to announce that our next training workshop is taking place in-person on Friday, June 10, 2022! During this full day workshop, instructors will introduce principles and techniques to achieve reproducible results in computational cancer research. We’ll show you the fundamentals of commonly-used approaches in reproducibility that you can apply to increase the impact of your research by making your findings more robust and reliable!

March 2, 2022

Welcome to the Data Lab’s newly renovated website

The Childhood Cancer Data Lab is growing as a resource for pediatric cancer researchers, and we have more to offer our community now than ever before. Transitioning to our new and improved website is an exciting milestone, and here, we look forward to sharing progress, introducing new initiatives, and cultivating more opportunities to support childhood cancer research. Welcome to our new virtual home!

March 2, 2022

Introducing the Single-cell Pediatric Cancer Atlas (ScPCA) Portal

The Single-cell Pediatric Cancer Atlas (ScPCA) project began in 2019 when Alex’s Lemonade Stand Foundation (ALSF) funded 10 awards for single-cell profiling of pediatric cancer samples. The goal was to produce an atlas of gene expression profiles for a variety of childhood cancer types from different organ sites.

January 11, 2020

How we integrate science and engineering

The CCDL team includes science, engineering, and design expertise. Combining these three disciplines in different ways across projects enables us to carry out our mission.

January 11, 2020

Reflections on the Childhood Cancer Data Initiative Symposium

Here at the CCDL, we value putting publicly available data to work. For example, we are currently processing and normalizing 1.5 million publicly available gene expression samples, which represent roughly $1.5 billion in research spending.

January 11, 2020

Pinning transitive R dependencies for fun and reproducible builds

Like many teams that work with large amounts of external software, we run into issues with our transitive dependencies. In general, transitive dependencies are a hard problem to solve.

January 11, 2020

Overcoming the steep data science learning curve in childhood cancer research using workshops

Though technology can introduce great benefit into our lives, it is often accompanied by a substantial amount of time and some expected frustration before we can reap the rewards. The time spent learning a new technology is what we usually call a learning curve.

March 2, 2022

CCDL RNA-Seq Workshop, Philadelphia, PA. Oct 14-16th, 2019

The workshop will last from 9AM to 5PM on October 14th, 15th, and 16th at the CCDL offices at 1429 Walnut St Philadelphia, PA, 19102.

March 2, 2022

How does big data help us tackle childhood cancer?

MultiPLIER is a machine learning approach that brings big data to bear on rare diseases. It’s also an example of the scientific approach and ethos of the CCDL, and the publication is a great opportunity to share how the CCDL is developing new technologies to accelerate research into cures for childhood cancers!

March 2, 2022

CCDL RNA-Seq Workshop, Bay Area, CA. Sept 3-5, 2019

The Childhood Cancer Data Lab powered by Alex's Lemonade Stand Foundation is hosting a workshop to introduce childhood cancer researchers to reproducible analysis of bulk and single-cell transcriptomic data.

January 11, 2020

17 Reasons to Work at the CCDL

The Childhood Cancer Data Lab (CCDL), an initiative of Alex's Lemonade Stand Foundation, develops tools, trainings, and methods to empower childhood cancer researchers. The work at the CCDL is focused and impactful. There are multiple opportunities and challenges for you to apply and grow your skills as a scientist or as an engineer.

March 2, 2022

CCDL RNA-Seq Workshop, Chicago, IL. June 24-26, 2019

The Childhood Cancer Data Lab powered by Alex's Lemonade Stand Foundation is hosting a workshop to introduce childhood cancer researchers to reproducible analysis of bulk and single-cell transcriptomic data.

January 11, 2020

The Workshop that Turns Researchers into Data Wizards

At this hands-on, 3-day session held in Houston, researchers learned data science skills that could accelerate their own work. Drawing on skills learned at the workshop, childhood cancer researchers can perform basic analyses of their work to make informed decisions on how to proceed with their own research. Don’t just take our word for it, though. Read more about the workshop’s incredibly valuable benefits through its attendees’ perspectives.

January 11, 2020

Gene Expression Repositories Explained

The goal of our refine.bio project is to download, process, and make available gene expression datasets that can be analyzed together, or in parts, depending on a researcher’s need. Childhood cancer researchers need to be able to use data generated through multiple profiling technologies including microarrays and RNA-sequencing.

January 11, 2020

Better Logging in Python

There are countless log blog posts out there about the benefits of good logging, how to log well, and how much to log. Going through them all can be a real log blog slog. Wouldn't it be cool if you could log like this: `logger.info("Something happened!", job=job.id, user=user.id)` and get easily searchable output?
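Keyword-style logging like this could be sketched as follows. This is a minimal illustration that assumes a small hypothetical wrapper (`KeyValueLogger`, not a real library class) around Python's standard `logging` module; the post may well use a different approach.

```python
import logging


class KeyValueLogger:
    """Hypothetical wrapper that appends keyword arguments as key=value pairs."""

    def __init__(self, name: str) -> None:
        self._logger = logging.getLogger(name)

    @staticmethod
    def format_message(message: str, /, **context: object) -> str:
        # Sort the keys so the same context always renders in the same order,
        # which keeps log lines easy to search.
        pairs = " ".join(f"{key}={value!r}" for key, value in sorted(context.items()))
        return f"{message} {pairs}" if pairs else message

    def info(self, message: str, /, **context: object) -> None:
        # Delegate to the standard library logger with the combined message.
        self._logger.info(self.format_message(message, **context))


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = KeyValueLogger("example")  # "example" is an arbitrary logger name
logger.info("Something happened!", job=42, user=7)
```

Because each value is rendered next to its key, a plain-text search for something like `job=42` finds every line related to that job.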

January 11, 2020

Method for the preparation of a caffeine-containing solution from dehydrated magic beans

Caffeine is a stimulant that can induce alertness in certain individuals when consumed at an appropriate quantity. Caffeine is often obtained by ingesting caffeine-containing solutions. However, no protocol for obtaining caffeine from dehydrated, roasted beans using materials typically available in a Philadelphia office has been described in the published literature.

January 11, 2020

Why ALSF Views Resource Sharing as Important

Alex’s Lemonade Stand Foundation (ALSF) staunchly believes that stronger scientific sharing practices will accelerate the pace of discovery and finding cures for children with cancer. Robust sharing improves reproducibility, minimizes redundant studies and maximizes our return on research investment.

March 2, 2022

Does Bulk Tissue Still Belong in a Single-Cell Atlas?

Earlier this year, Alex’s Lemonade Stand Foundation identified single-cell gene expression profiling as an opportunity to build an atlas of cell types within tumors that could be broadly reused by pediatric cancer researchers.

January 11, 2020

2019 In Review: Highlights from the CCDL

This year was a big one for the CCDL. In our mission to empower pediatric cancer experts poised for big discoveries with the knowledge, data, and methods to reach them, we launched a software product, developed and delivered training workshops on single-cell and bulk RNA-seq analysis, and hired our data science team, among other milestones.

March 2, 2022

Exploring neurofibromatosis data with refine.bio

I’m a scientist at Sage Bionetworks, a nonprofit research organization in Seattle, WA. My work focuses on a family of rare pediatric diseases (NF): neurofibromatosis type 1, type 2, and schwannomatosis.

January 11, 2020

How we set goals

Our particular process is designed to source opportunities from our team members and external stakeholders, convert those opportunities into a set of potential goals, and then select the goals that we expect will most advance our mission.

January 11, 2020

Automatic scroll restoration in Single Page Applications (SPA)

The ability to restore scroll position is often critical for website usability. It helps users keep the flow of navigation when going back and forth between different pages. Most modern browsers take care of restoring the scroll position automatically, but it doesn’t always work for Single Page Applications where the content is generated on the client’s side, often asynchronously.

March 2, 2022

Carnegie Mellon University Libraries RNA-Seq Workshop, Pittsburgh PA

Carnegie Mellon University Libraries is partnering with the Childhood Cancer Data Lab (CCDL), founded by Alex’s Lemonade Stand Foundation, to host a Data Analysis workshop using CCDL materials.

March 2, 2022

POSTPONED: Visit the Childhood Cancer Data Lab at Booth 1601 at AACR 2020!

The CCDL will have a team of scientists at the American Association for Cancer Research 2020 Annual Meeting in sunny San Diego! Our team members are excited to talk to researchers studying pediatric cancer at Booth 1601.

March 2, 2022

3 things the CCDL is doing right now to keep pediatric cancer research moving forward

To help keep pediatric cancer research moving forward, here are 3 ways the CCDL is helping the research community during this time: refine.bio, virtual workshops, and the Open Pediatric Brain Tumor Atlas project.

March 2, 2022

Full: CCDL RNA-Seq Workshop, Virtual Pilot, May 4-8th, 2020

We know that pandemic-related university closures mean that the demand for opportunities for pediatric cancer researchers to increase their analytical skills has never been higher. As such, we are delighted to announce a pilot virtual workshop running from May 4-8, 2020!

March 2, 2022

OpenPBTA: Someone is wrong on the internet and it’s probably us (updated 9-9-2020)

Here at the Childhood Cancer Data Lab, we value transparency and the practice of open science. Much of the work we’ve done and the products that we build hinge on the generosity and openness of other scientists. In this post, as part of National Brain Tumor Awareness month, we want to talk about a project that our science team has been working on over the last few months (and to do so in a way that aligns with our values).

March 2, 2022

Full: CCDL RNA-Seq Workshop, Virtual, June 22nd - 26th, 2020

The workshop will take place on June 22 - 26, 2020 from noon - 5pm Eastern. Each day consists of lectures and designated time for attendees to work on exercise materials and their own projects with CCDL staff available for consultation.

January 11, 2020

How we train: Going remote

When the CCDL (along with everyone else) realized that we would have to conduct our bioinformatics training workshops remotely, we had to make some quick decisions about how we were going to do it. Most of the instructional materials for our in-person workshops were already online, so we knew we had a good base to work from. We just needed to figure out how to adapt the live instruction.

March 2, 2022

The Hack for NF Event in October/November 2020

At Alex’s Lemonade Stand Foundation’s Childhood Cancer Data Lab, we’re excited to be helping out with an upcoming event hosted by the Children’s Tumor Foundation. If you participate, you may meet members of our team who are mentoring and judging.

March 2, 2022

Full: CCDL RNA-Seq Workshop, Virtual, March 22nd - 26th, 2021

The workshop will take place on March 22 - 26, 2021 from noon - 5pm Eastern. Each day consists of lectures and designated time for attendees to work on exercise materials and their own projects with CCDL staff available for consultation.

March 2, 2022

Full: CCDL Single-Cell RNA-Seq Workshop, Virtual, June 28th - July 2nd, 2021

The workshop will take place on June 28 - July 2, 2021 from noon - 5pm Eastern. Each day consists of lectures and designated time for attendees to work on exercise materials and their own projects with CCDL staff available for consultation.

March 2, 2022

The Hack4Rare Event in June/July 2021

Hack4Rare is a virtual event that calls for healthcare startups, developers, solutions architects, and hackathon enthusiasts to join researchers, clinicians and patients in developing solutions built around a number of rare diseases including neurofibromatosis, PTEN Hamartoma Tumor Syndrome, RASopathies and Desmoid Tumors.

March 2, 2022

CCDL RNA-Seq Workshop, Houston TX. March 27-29, 2019

March 2, 2022

Full: CCDL RNA-Seq Workshop, Virtual, September 20th - 24th, 2021

The workshop will take place on September 20 - 24, 2021 from noon - 5pm Eastern. Each day consists of lectures and designated time for attendees to work on exercise materials and their own projects with CCDL staff available for consultation.

March 2, 2022

Introducing Example Analyses for Use with refine.bio Data

Introducing refine.bio examples. Here, users can access a variety of example analyses implemented in R, such as clustering and heat maps, differential expression analysis, and pathway analysis, for use with refine.bio data.