Over the past decade and a half, scientists have been able to measure biological systems in new ways. For example, while the genome provides a parts list of which genes are present, it’s now also possible to measure which of those genes are actually turned on at any given time. Pioneers in this field drove the expectation that these data should be freely shared.
Since then, many datasets have accrued in public repositories. Efforts by individual labs have demonstrated the promise of analyzing many of these datasets together. However, it remains challenging to analyze all of these data simultaneously because they have not been harmonized.
The Childhood Cancer Data Lab is currently building the refine.bio project to provide a constantly updated set of harmonized data to the research community. This is an open-source effort, and we welcome contributors in all areas, from software engineering to bioinformatics. We’re taking advantage of underutilized time on commercial cloud systems to perform this processing in a cost-effective and scalable manner.
We will use the harmonized datasets produced by refine.bio to study childhood cancers, and we’re making them available for download to anyone else who wants to use them.