Turning big data into cancer research insights with the Genomic Data Commons

Genomic data gives researchers enormous power to uncover how genomic changes drive cancer formation, metastasis, drug response, and recurrence. However, one of the challenges of identifying genetic abnormalities is that cancer “drivers,” the changes that promote tumor growth, follow a long-tail distribution. This means that many patients with cancer have causal genomic changes that occur in only a very small percentage of cancers. Characterizing these rare events requires big data.

Working with big data presents significant obstacles. One is that large datasets cannot be downloaded and manipulated without expensive infrastructure and advanced tools, which keeps some researchers from working with them. Another is that data generated at different institutions are often stored separately and cannot be directly compared or combined. To reduce the burden of cancer, we need diverse scientists from a wide range of disciplines and institutions working with large, compatible datasets.

The National Cancer Institute (NCI) launched the Genomic Data Commons (GDC) in June 2016 to bring cancer genomic datasets and associated clinical data into one location and to promote access by the research community.

The GDC accepts data submissions from NCI programs and from independent groups such as clinical research consortia, companies, and advocacy organizations to increase the research community’s power of discovery. To make data generated by different research groups compatible, the GDC “harmonizes” the data by aligning them to the same reference genome (GRCh38). Since its launch, both Foundation Medicine, Inc., and the Multiple Myeloma Research Foundation have agreed to submit data to the GDC, adding to the petabytes of data already available there from The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET).

The GDC is also promoting data access by diverse researchers by providing online data visualization tools, by offering a BAM file “slicing” function that lets users download only the genomic regions they need, and by collaborating with NCI’s Cancer Genomics Cloud Pilots to enable data access in the cloud.

To learn more about the GDC, visit https://www.cancer.gov/about-nci/organization/ccg/programs/gdc or the dedicated GDC tool page on LabWorm.

This is a guest blog post contributed by the NCI press office. Featured picture by Robert Kozloff (NCI Visuals Online database).