Overcoming the steep data science learning curve in childhood cancer research using workshops

July 25, 2019

Have you ever felt so frustrated trying to learn something new on your computer that you felt like doing this?

If Ron Swanson were studying gene expression, it looks like he should have attended a CCDL workshop.

Though technology can introduce great benefit into our lives, it is often accompanied by a substantial amount of time and some expected frustration before we can reap the rewards. The time spent learning a new technology is what we usually call a learning curve.

In our daily lives, overcoming a technology learning curve may resemble teaching my mother how to “snap a chat” or how to use “bookface”. Certainly these learning curves require some time and frustration, but society does not feel a particular need or urgency for my mother to ‘snap a chat’.

Using ‘big data’, however, has a particularly steep learning curve, and the urgency for childhood cancer researchers to make effective use of valuable data is clear: children and families are counting on the next research breakthrough.

The benefit of childhood cancer researchers being able to fully utilize gene expression data is potentially invaluable. On the other hand, the world would be better off if my mother doesn’t learn to snapchat.

Childhood cancer researchers’ time is a valuable resource. The pediatric research community identified gene expression data analysis as one of its most immediate and onerous learning curves, so we created workshops to catalyze this learning process. (For a great explanation of what gene expression data is, see Dr. Taroni’s cupcake analogy in this blog.)

You may ask: why doesn’t the CCDL perform data analysis services for all childhood cancer researchers, since we know how to handle ‘big data’? One of the main motivations behind our approach has to do with scalability. There are nearly 1,000 ALSF grants funding pediatric cancer projects. Let’s estimate that even a quarter of these (250 projects) have some sort of ‘big data’ needs (though we might argue that everyone could benefit from insights from ‘big data’). The CCDL wouldn’t be able to employ enough data scientists to meet this demand even if we tried!

This model would make it hard for the CCDL to keep up with the demand.

So rather than spread a few data scientists too thin and still not be able to cover this enormous demand, the CCDL opted to equip pediatric experts who are poised for the next big discoveries with more data science skills!

This model is more ‘scalable’ and it’s what the CCDL is implementing.

We also hope that as more researchers are trained, they can use our curricula to host their own workshops and train individuals throughout the childhood cancer research community. This model has often been called “train-the-trainer” and has been used with success by others like the folks at Software Carpentry. You can see this as being the good version of a pyramid scheme.

In the CCDL training workshops, childhood cancer researchers analyze example data on their own computers, first by following along with CCDL staff and then by solving exercises. We keep this curriculum freely available online and update it as we obtain feedback from participants or as the standard practices and recommendations in the field evolve. Near the end of the workshop, time is set aside for researchers to apply what they have learned to the data they have brought with them. Then, participants present preliminary results from often never-before-analyzed data to the workshop group. (This portion of the workshop is what we at the CCDL often view as the highlight!) For a full example schedule of our gene expression workshop, follow this link.

Our workshops teach the basics of R programming for data analysis. R is a programming language. And just as it is not possible to teach a non-Spanish speaker how to speak fluent Spanish in three days, our workshop cannot turn every researcher into a data science expert. Instead, we aim to equip childhood cancer researchers with a data analysis toolbox by teaching them the basics and how to search for possible solutions to their data analysis questions. In this way, even though we cannot cover every data type or situation that may arise, they will be able to perform initial steps for their analyses and can better communicate with a data science expert when the situation calls for it.

To date, the CCDL has hosted three workshops, which trained 33 researchers in total. All three workshops received overall positive evaluations from the childhood cancer researchers who have participated in them. We’ve also used ideas and requests from the participants to continue developing and honing our curriculum.

The goal of CCDL workshops is to make the scary steep learning curve of data science less scary and less steep so childhood cancer researchers have more tools to help them reach their next breakthrough! Side note: this hypothetical figure was made with ggplot2 - a tool that we teach our pediatric cancer experts how to use in our workshops.

The CCDL is uniquely positioned to help researchers approach the steep learning curve of gene expression analysis. Our goal is to continue developing our curriculum and passing it on to others who can use it to help speed up the process toward finding childhood cancer cures.

If you are a childhood cancer researcher and are interested in joining us for a workshop, sign up now!
