This blog will usually host technical content such as descriptions of the architecture of software that we build and notes from the team on specific topics. However, before that starts, I want to take the opportunity to say hello. The Childhood Cancer Data Lab was announced alongside the National Cancer Moonshot Summit hosted with Joe and Jill Biden. At the time, the goal was to launch this lab within 12-18 months. This post marks the start of the group. The initial team includes me as well as two software engineers from my lab at Penn who will work on CCDL software, Dongbo Hu and Kurt Wheeler. We’re actively recruiting additional team members and have open searches for a UX Designer, a Frontend Developer, a Backend Engineer, and a Full Stack Developer.
In the data lab, our goal is to provide the childhood cancer research community with new tools that enable scientists to put data to use. At the outset, we’re particularly focused on publicly available data. This is because there are millions of publicly available genome-wide assays. The primary challenge they pose is that the whole collection of these assays remains difficult to use, in large part because the data aren’t harmonized. This means that results from one set of assays are not directly comparable to results from another set.
We are now in the process of designing, developing, and deploying software that we call refine.bio. This software will run continuously on commercial cloud servers. At times, cloud providers have computers that are not being actively used, and they sell time on these machines at a fraction of the normal cost. The refine.bio server will take advantage of these spare computers to process and harmonize public datasets, which will be provided to the research community for free.
The ability to think big and to provide resources for research is one of the reasons that we’re thrilled to be powered by Alex’s Lemonade Stand Foundation. In addition to providing data, the software developed in the Childhood Cancer Data Lab, which we call the CCDL, will be provided under open licenses as we develop it. For example, the Data Refinery source code is available while we’re developing it. This allows others in the community to participate in development, contributing their own technical skills to the fight. As the software is developed, others can extend it or repurpose it for new tasks without having to rebuild the foundations.
We’re also in the planning stages for other projects and initiatives focused on data access and analysis for childhood cancer researchers. We look forward to joining the community in the fight.