Exploring neurofibromatosis data with refine.bio

January 9, 2020

This is a guest post by Robert Allaway.

I’m a scientist at Sage Bionetworks, a nonprofit research organization in Seattle, WA. My work focuses on neurofibromatosis (NF), a family of rare pediatric diseases: neurofibromatosis type 1, neurofibromatosis type 2, and schwannomatosis. At Sage, my work in this disease area requires wearing a few different hats. I have roles in community outreach (running events like hackathons and writing newsletters), data management (helping build resources like the NF Data Portal and coordinating data for the NF research community), and bioinformatics research (working with wet-lab biologists and clinical collaborators to analyze their data in novel and interesting ways).

Historically, one huge challenge in NF (and in other rare diseases) that affects all three of these roles has been a lack of publicly available data. This lack of data in NF can be attributed to many things, including scientific culture, funding challenges, and a scarcity of well-characterized biosamples. The data that do exist can also be scattered across or within different resources. Thanks to a coalition of NF funding organizations/programs (like CTF, NTAP, NFRI, and CDMRP NFRP), this paradigm is rapidly changing - these funders are working hard to incentivize data sharing in NF via a few centralized repositories - and more datasets are becoming available on a regular basis.

Tracking what is newly available is important to me, both to fulfill my community outreach and data management roles (that is, to know what data exist and to share them with others) and to use these data in my own analyses. refine.bio is one tool that I’ve returned to time and time again to make this easier. I am aware of a lot of the datasets that exist in my niche research area, but I can’t keep track of everything! As a result, I’ll frequently check in with refine.bio to see whether new gene expression datasets are available. When they are, I can easily note which samples I’m interested in, and aggregate them on the fly to create a pre-normalized downloadable dataset (with nicely formatted metadata files)! This lets me hit the ground running and get my work done faster. It’s also really useful from a community management perspective, because I can easily build datasets for other people, like scientific collaborators or hackathon participants.

Longer term, we’re planning on regularly retrieving data from refine.bio so that we can distribute nicely packaged, normalized NF gene expression datasets through the NF Data Portal. Not only are these datasets easy for people to use for learning and research, but retrieving them regularly also helps us keep tabs on other data repositories for datasets that we might not already be capturing in the NF Data Portal. I’ve found that refine.bio has been useful to me across multiple facets of my work, and I encourage anyone reading to check it out and see how it can be useful to you as well!
