Open sharing of, and easy access to, DNA sequencing data from clinical samples are limited by privacy concerns and interoperability difficulties. This lack of sharing impedes progress in genomics research, affecting work on all genetic diseases, from cancer to rare diseases.
DNAdigest was founded in 2013 to promote and enable open access, interoperability and secure sharing of genomics data for research. We are developing a portal for custom querying of genomics data repositories, reducing the time and effort researchers spend discovering, accessing and processing genomics data.
We started by thinking through some of the challenges to getting good data and who the end users might be. One of the big issues is that long-term data entry and storage are secondary concerns to just doing your work. Other major concerns come from vetting the data: do you trust the sources you're looking at? How do we build trust by evidencing the people and institutions that are using a dataset?
There are too many user types to cover in a one-day workshop, so we focused on academic researchers and broke down the types of metadata they would want to search against and need to see in order to vet the data. Search criteria brought up an interesting point: because there are several types of sequencers and alignment tools, the metadata and the UI need to allow researchers to define the pipeline(s) that they're using to derive their answers.
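To make the pipeline-aware search idea concrete, here is a minimal sketch of how a metadata record and a search filter might look. Everything in it is an assumption for illustration: the `Pipeline` and `DatasetRecord` types, their field names, the `search` function, and the placeholder values are hypothetical and do not describe the actual DNAdigest portal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical description of how a dataset was produced."""
    sequencer: str  # placeholder name for a sequencing platform
    aligner: str    # placeholder name for an alignment tool


@dataclass
class DatasetRecord:
    """Hypothetical metadata record a researcher could search and vet."""
    name: str
    institution: str           # who provides the data, shown for vetting
    pipelines: List[Pipeline]  # a dataset may be processed more than one way


def search(records: List[DatasetRecord],
           sequencer: Optional[str] = None,
           aligner: Optional[str] = None) -> List[DatasetRecord]:
    """Return records with at least one pipeline matching every given criterion."""
    def matches(p: Pipeline) -> bool:
        return ((sequencer is None or p.sequencer == sequencer)
                and (aligner is None or p.aligner == aligner))

    return [r for r in records if any(matches(p) for p in r.pipelines)]


# Placeholder data, invented purely for this example.
records = [
    DatasetRecord("dataset-a", "Institution A",
                  [Pipeline("sequencer-x", "aligner-1")]),
    DatasetRecord("dataset-b", "Institution B",
                  [Pipeline("sequencer-y", "aligner-1"),
                   Pipeline("sequencer-x", "aligner-2")]),
]

hits = search(records, sequencer="sequencer-x", aligner="aligner-1")
print([r.name for r in hits])
```

The sketch matches both criteria against the same pipeline rather than against the dataset as a whole, so a dataset sequenced one way and aligned another way in separate pipelines is not returned by mistake.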
We started sketching out (roughly) some of the types of things that would need to be included in a search (output) UI. That led to a variety of questions about how the results might be visualized, and how we would understand the source of the data.