Over the past decade and a half, scientists have been able to measure biological systems in new ways. For example, while the genome provides a parts list of which genes are present, it’s now also possible to measure which of those genes are actually turned on at any given time. Pioneers in this field drove the expectation that these data should be freely shared.
Since then, many datasets have accrued in public repositories, and efforts by individual labs have demonstrated the promise of analyzing many of these data together. However, it remains challenging to analyze all of these data simultaneously because they have not been harmonized.
The Childhood Cancer Data Lab is currently building the refine.bio project to provide a constantly updated set of harmonized data to the research community. This is an open-source effort, and we welcome contributors in all areas, from software engineering to bioinformatics. We’re taking advantage of underutilized time on commercial cloud systems to perform this processing in a cost-effective and scalable manner. We have a public chat system available for those who wish to discuss the project.
We will use the harmonized datasets produced by refine.bio to study childhood cancers, and we’re also making them available for download to anyone who wants to use them.
Participants in the hackathon both researched how cancer genomics datasets can be analyzed and developed a new web interface for cancer researchers. To date, hundreds of individuals from the Philadelphia area have attended one or more hackathons. More than forty individuals have contributed to the underlying source code for the project.