2019 In Review: Highlights from the CCDL
This year was a big one for the CCDL. In pursuit of our mission to empower pediatric cancer experts poised for big discoveries with the knowledge, data, and methods to reach them, we launched a software product, developed and delivered training workshops on single-cell and bulk RNA-seq analysis, and hired our data science team, among other milestones. For our final blog post of 2019, I’ve asked each member of the team to share a highlight from their year.
2019 Highlights

Ariel Rodriguez Romero:
A highlight of my year has to be writing our blog’s most popular post of 2019, about restoring the scroll position in single-page applications. Now, one could argue that it’s had the most chance to accrue clicks because it was posted in January, but it was also the most accessed post this month! If you’re reading this post, make sure you don’t miss my thriller as well!

Candace Savonen:
A highlight of the year for me was writing training modules to introduce childhood cancer researchers to the R statistical computing language and single-cell RNA-seq analysis. I also got to train 58 different researchers across workshops in Houston, Chicago, the Bay Area, and Philadelphia! In addition to training, I enjoyed taking a deep dive into single nucleotide variant callers as part of the Open Pediatric Brain Tumor Atlas (OpenPBTA) project, which we’ll be wrapping up next year.

Dr. Casey Greene:
We’ve been part of a few peer-reviewed papers this year. In one, now published in Cell Reports, we described the landscape of patient-derived xenograft models of pediatric cancers. In another, published in Developmental Cell, we helped to outline the importance of an atlas of development at single-cell resolution. However, the publication that reaches highlight status for me is our work on MultiPLIER, now published in Cell Systems. This work was led by Dr. Jaclyn Taroni and describes a new technique designed to address key challenges faced by researchers studying rare diseases. I won’t re-describe the approach because Dr. Taroni has already written a blog post on the topic (with cupcakes!), but this work was a highlight of my year.

Chante Bethell:
This year I graduated from Rowan University, where I worked with Stephanie Spielman. At the CCDL, there were a number of highlights. I taught at our Bay Area and Philadelphia workshops and led the Intro to R materials in Philadelphia! A high point for me was going to ISCB’s Rocky Mountain Bioinformatics Conference (ISCB Rocky), where I delivered oral and poster presentations on our training workshops. I won second place in the poster competition!

David S Mejia:
My role with the CCDL started in an interesting way. I found the refine.bio open-source project and started contributing. I think my first contribution was this one. Over the summer I continued to contribute. I ended up applying for an open developer position, getting an interview, and starting my job with the team in August. Since then, it’s been exciting to launch refine.bio!

Deepa Prasad:
This has been a year of user research for me. Last year, I learned a great deal about how pediatric cancer researchers gather data, but this year, I had the opportunity to explore other aspects of their work. My main focus was on understanding the attitudes, behaviors, and processes of the pediatric cancer community around sharing resources (a focus for ALSF, including a new resource-sharing document for all grant proposals). I conducted over 40 user sessions, including semi-structured interviews, guerrilla testing, and co-creation workshops. After wading through hours of audio and video transcription, I gained a much deeper understanding of not only their attitudes and behaviors but also the environment and other factors that influence members of the pediatric cancer community. Look for new CCDL efforts to enhance resource sharing in 2020! My admiration for the work that my users do has grown with the new insights I have gained this year.

Dr. Jaclyn Taroni:
My biggest highlight for the year has been growing our data science team to four members! Because of our team, we’ve been able to increase the number of attendees at our training workshops and efficiently tackle large projects like the OpenPBTA. I’ve also had opportunities to talk about the CCDL’s work both nationally and locally. I had the great privilege of representing the CCDL at the Childhood Cancer Data Initiative Symposium. I also spoke at some R-Ladies Philly meetups. My presentation on open science at R-Ladies was included in a guest post for Technical.ly Philly. We even hosted an R-Ladies Philly meetup at the CCDL!

Dr. Josh Shapiro:
I joined the CCDL’s data science team! I’ve been excited to jump into the OpenPBTA project, learning and promoting open science through filing and reviewing many pull requests. Of the analyses I’ve worked on, the one I enjoyed most was making mutation co-occurrence plots by disease. To get to that point, I had to suss out some issues in the data where multiple samples came from the same individual, and then I got to make some flexible, visually appealing plots. Another highlight was presenting a talk and poster about the OpenPBTA at ISCB Rocky to recruit more contributors!

Kurt Wheeler:
We launched refine.bio! To get to this point, we built software to download and process microarray and RNA-seq gene expression data from public repositories (here’s my blog post on a deep dive into repositories and identifiers). We spent a lot of time tuning and optimizing our software against those repositories, and we can now download and successfully process roughly 2,000 RNA-seq samples per hour! In total, we downloaded and processed petabytes of data. Of the 2 million samples we attempted to analyze, we successfully generated gene expression measurements for about 1.3 million. Another big milestone was getting those individual measurements wrapped up into what we call “compendia,” which are collections of all the data we have for an organism. Our compendium of human data is currently 45 GB and contains 430,000 measures for each of more than 14,000 genes. That’s a good-sized matrix!

Everyone:
If we had to pick one thing that was a highlight from the year, and also uniquely us, it would be disseminating our coffee protocol. Every key aspect of our mission (empowering pediatric cancer researchers with knowledge, data, and methods) requires reproducibility. If we wrote analyses that could only be used once, we wouldn’t be able to train others to use them. If we built methods but didn’t provide reusable source code for them, other researchers wouldn’t pick them up and use them. Being able to do a one-off bespoke analysis doesn’t advance our mission, which can be different from the way things traditionally work in other research environments. Furthermore, advancing our mission requires disseminating the results of our work. We enjoyed having a more lighthearted moment when we took the same approach to distributing the protocol that we use to make coffee!
Ahead in 2020:
Over the last month, we’ve been planning for next year. We recently received a 2019 AWS Imagine Grant from Amazon to support refine.bio. We’ll be using the support to build enhanced metadata filtering capabilities, including metadata elements estimated from gene expression values. We’re also looking forward to processing data from ALSF’s Single-cell Pediatric Cancer Atlas (ScPCA) RFA. As part of the conditions of the ScPCA grants, recipients will share data with the CCDL. We’ll be able to process the data using software that is as consistent as possible and share the gene expression data and deidentified metadata with the broader community.

