The Cancer Genome Atlas, better known as TCGA, began as a small pilot project and has grown into an invaluable resource for researchers and physicians in the field of cancer research. TCGA currently covers 33 cancer types and includes over 20,000 primary cancer and matched normal samples, together with a wealth of genetic, proteomic, histologic and clinical data. Exploring such unprecedented amounts of data opens promising avenues for cancer research, yet mining it without solid computational skills is an almost impossible task. Luckily, several tools have recently been developed to help ‘non-programmer’ researchers explore and analyze TCGA data with ease and elegance.
Here are some of the best and most innovative tools to mine TCGA data:
1. Genomic Data Commons Data Portal
The GDC Data Portal is the place to find and download raw and processed data, as well as clinical data files, from TCGA and other projects. The portal offers many options for filtering samples and is quite easy to use, but it currently offers no way to analyze the data, and this is where the other tools come into play.
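For readers comfortable with a little scripting, the GDC also exposes a public REST API behind the portal. The Python sketch below assumes the `https://api.gdc.cancer.gov/files` endpoint and its `filters`, `fields`, `format` and `size` query parameters, and builds a query listing files of one data category for one TCGA project. The helper names `build_files_query` and `list_files` are our own hypothetical functions, and the field names should be checked against the GDC API documentation before you rely on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GDC API endpoint for file metadata (assumed; check the GDC API docs).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(project_id: str, data_category: str, size: int = 10) -> str:
    """Return a GDC /files query URL for one project and one data category.

    Hypothetical helper: the filter follows the GDC "op"/"content" syntax,
    combining two "in" clauses with "and".
    """
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_category", "value": [data_category]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),           # filter is sent as JSON text
        "fields": "file_id,file_name,data_type",  # metadata columns to return
        "format": "JSON",
        "size": size,                             # maximum number of hits per page
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


def list_files(project_id: str, data_category: str, size: int = 10) -> list:
    """Fetch matching file records; requires network access."""
    with urlopen(build_files_query(project_id, data_category, size)) as response:
        payload = json.load(response)
    # Matching records are expected under data -> hits in the JSON response.
    return payload["data"]["hits"]
```

For example, `list_files("TCGA-BRCA", "Clinical", size=5)` would ask for up to five clinical files from the breast cancer project. Keeping the URL construction separate from the network call makes the query easy to inspect before anything is downloaded.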
2. cBioPortal
Probably the holy grail when it comes to TCGA analysis tools, cBioPortal enables scientists to easily explore, analyze and download the datasets. Using cBioPortal, one can examine how frequently specific genes or gene groups are mutated or otherwise altered in different cancers, and associate these alterations with clinical attributes and survival. The site can also generate a distinctive output called an ‘OncoPrint’ to visually summarize these analyses. In addition, cBioPortal can be used to explore gene co-expression, the co-occurrence of mutations, and networks of genes related to a given alteration. This rich functionality and frequent updates make it a prime choice for beginners and advanced users alike.
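To get a feel for what a co-occurrence analysis computes, here is a minimal Python sketch that compares the alteration status of two genes across samples using a 2×2 table, a log2 odds ratio and a one-sided Fisher's exact test. We believe this is similar in spirit to the statistics cBioPortal reports, but it is an illustration rather than cBioPortal's own code; the function name `cooccurrence_test` and the sample IDs are hypothetical.

```python
import math


def cooccurrence_test(altered_a: set, altered_b: set, all_samples: set) -> tuple:
    """Return (log2 odds ratio, one-sided p-value for co-occurrence).

    Hypothetical helper: altered_a and altered_b hold the IDs of samples
    with an alteration in gene A and gene B, both subsets of all_samples.
    """
    n = len(all_samples)
    both = len(altered_a & altered_b)
    a_only = len(altered_a - altered_b)
    b_only = len(altered_b - altered_a)
    neither = n - both - a_only - b_only

    # Log2 odds ratio of the 2x2 table; infinite or undefined when a cell is zero.
    num, den = both * neither, a_only * b_only
    if den == 0:
        log2_or = math.inf if num > 0 else math.nan
    elif num == 0:
        log2_or = -math.inf
    else:
        log2_or = math.log2(num / den)

    # One-sided Fisher's exact test: with row and column totals fixed, the
    # overlap follows a hypergeometric distribution; sum P(X >= both).
    k_a, k_b = both + a_only, both + b_only
    tail = sum(math.comb(k_a, x) * math.comb(n - k_a, k_b - x)
               for x in range(both, min(k_a, k_b) + 1))
    p_value = tail / math.comb(n, k_b)
    return log2_or, p_value
```

For instance, with four samples where both genes are altered in exactly the same two samples, `cooccurrence_test({"S1", "S2"}, {"S1", "S2"}, {"S1", "S2", "S3", "S4"})` returns an infinite log2 odds ratio and a p-value of 1/6. When many gene pairs are tested at once, the p-values should also be corrected for multiple testing.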
3. Xena browser
This relatively new tool from UCSC makes it easy to create informative heatmaps from TCGA data and to display various types of information side by side. Xena also lets users download these datasets or analyze them using a built-in Galaxy interface.
4. MEXPRESS
MEXPRESS keeps things simple and intuitive: select a study and a single gene of interest, and obtain a wealth of information arranged in a clear, minimalist interface. The tool shows the expression of the chosen gene across TCGA tumor samples, alongside clinical data and DNA methylation of the gene region, which can reveal important trends. One can also sort the samples by any of these parameters and evaluate their statistical correlation with gene expression.
5. TCGA2STAT
If you are looking to conduct more ‘hard core’ statistical analysis, check out TCGA2STAT, an R package that imports TCGA datasets from the Broad Institute's GDAC Firehose into analysis-ready R objects. This tool has something of a learning curve and requires working knowledge of R, yet it streamlines a robust analysis while keeping the datasets up to date.
This is just the beginning
These cherry-picked tools are just part of the ecosystem that has emerged in recent years to support TCGA analysis. As additional large-scale genomic studies are conducted, these tools will continue to evolve.
Know of other TCGA analysis tools? Drop us a line in the comment box below.