Overcoming the steep data science learning curve in childhood cancer research using workshops

July 25, 2019

CANDACE SAVONEN

Have you ever felt so frustrated trying to learn something new on your computer that you felt like doing this?

If Ron Swanson were studying gene expression, it looks like he should have attended a CCDL workshop.

Though technology can introduce great benefit into our lives, it is often accompanied by a substantial amount of time and some expected frustration before we can reap the rewards. The time spent learning a new technology is what we usually call a learning curve.

In our daily lives, overcoming a technology learning curve may resemble teaching my mother how to “snap a chat” or how to use “bookface”. Certainly these learning curves require some time and frustration, but society does not feel a particular need or urgency for my mother to “snap a chat”.

Using ‘big data’, however, comes with a particularly steep learning curve, and the need for childhood cancer researchers to use valuable data effectively is clear and urgent: children and families are counting on the next research breakthrough.

The benefit of childhood cancer researchers being able to fully utilize gene expression data is potentially invaluable. On the other hand, the world would be better off if my mother doesn’t learn to snapchat.

Childhood cancer researchers’ time is a valuable resource. The pediatric research community identified gene expression data analysis as one of its most immediate and onerous learning curves, so we created workshops to catalyze this learning process. (For a great explanation of what gene expression data is, see Dr. Taroni’s cupcake analogy in this blog.)

You may ask: since the CCDL knows how to handle ‘big data’, why doesn’t it perform data analysis services for all childhood cancer researchers? One of the main motivations behind our approach has to do with scalability. There are nearly 1,000 ALSF grants funding pediatric cancer projects. Let’s estimate that even a quarter of these (250 projects) have some sort of ‘big data’ needs (though we may say everyone could benefit from insights from ‘big data’). The CCDL wouldn’t be able to employ enough data scientists to meet this demand even if we tried!

This model would make it hard for the CCDL to keep up with the demand.

So rather than spread a few data scientists too thin and still not be able to cover this enormous demand, the CCDL opted to equip pediatric experts who are poised for the next big discoveries with more data science skills!

This model is more ‘scalable’ and it’s what the CCDL is implementing.

We also hope that as more researchers are trained, they can use our curricula to host their own workshops and train individuals throughout the childhood cancer research community. This model has often been called “train-the-trainer” and has been used with success by others like the folks at Software Carpentry. You can see this as being the good version of a pyramid scheme.

In the CCDL training workshops, childhood cancer researchers analyze data examples on their own computers, first by following along with the CCDL staff and then by solving exercises. We keep this curriculum freely available online and update it as we obtain feedback from participants or as the standard practices and recommendations in the field evolve. Near the end of the workshop, time is set aside for researchers to apply what they have learned to data they have brought with them. Then, participants present preliminary results, often from never-before-analyzed data, to the workshop group. (This portion of the workshop is what we at the CCDL often view as the highlight!) For a full example schedule of our gene expression workshop, follow this link.

Our workshops teach the basics of R programming for data analysis. R is a programming language. And just as it is not possible to teach a non-Spanish speaker how to speak fluent Spanish in three days, our workshop cannot turn every researcher into a data science expert. Instead, we aim to equip childhood cancer researchers with a data analysis toolbox by teaching them the basics and how to search for possible solutions to their data analysis questions. In this way, even though we cannot cover every data type or situation that may arise, they will be able to perform initial steps for their analyses and can better communicate with a data science expert when the situation calls for it.

To date, the CCDL has hosted three workshops, which trained 33 researchers in total. All three workshops received overall positive evaluations from the childhood cancer researchers who have participated in them. We’ve also used ideas and requests from the participants to continue developing and honing our curriculum.

The goal of CCDL workshops is to make the scary steep learning curve of data science less scary and less steep so childhood cancer researchers have more tools to help them reach their next breakthrough! Side note: this hypothetical figure was made with ggplot2 - a tool that we teach our pediatric cancer experts how to use in our workshops.

The CCDL is uniquely positioned to help researchers tackle the steep learning curve of gene expression analysis. Our goal is to continue developing our curriculum and passing it on to others who can use it to help speed up the process toward finding childhood cancer cures.

If you are a childhood cancer researcher and are interested in joining us for a workshop, sign up now!
