The Network of Cancer Genes, An in-Depth Look; courtesy of the developers
On the ‘On the Line’ section of our blog we recently interviewed the team behind the NCG Tool. To check the full interview, click here.
Below is a more in-depth look at this unique tool, what’s new in the latest 5.0 release, and a few practical pointers on how to put NCG to good use.
The Network of Cancer Genes (NCG, http://ncg.kcl.ac.uk/) is a manually curated repository of cancer genes derived from the scientific literature. To cope with the increasing amount of cancer genomic data, it introduces a more robust procedure for extracting cancer genes from published cancer mutational screenings, in which two curators independently review each publication.
NCG release 5.0 (August 2015) collects 1571 cancer genes from 175 published studies that describe 188 mutational screenings of 13,315 cancer samples from 49 cancer types and 24 primary sites. In addition to collecting cancer genes, NCG also provides information on the experimental validation that supports the role of these genes in cancer and annotates their properties (duplicability, evolutionary origin, expression profile, function and interactions with proteins and miRNAs).
The most important improvement concerns the quality of the data and the number of publications reviewed, which more than doubled compared to the previous version. In particular, stricter criteria were adopted to classify a gene as a cancer driver, and all the NCG 4.0 publications were evaluated again to make sure that the annotated genes matched these new criteria and that everything was annotated correctly.
To summarize, in NCG 5.0 more publications have been reviewed, and the criteria for annotating genes as cancer drivers are more restrictive.
Other improvements concern the web interface and the code behind it. A new page was added listing the cancer cell lines in which a gene is expressed, and the visualization of expression in normal tissues was updated along with the data behind it. The search interface was also made smarter, and the layout of the query results page was reorganized to make it more compact and useful.
Practical examples of what can be accomplished with NCG
A researcher can query NCG to identify the context of a gene in cancer. This can be done from the main page, using either the single-gene or the multiple-gene search form. After clicking “Submit”, the user is shown a summary of the gene, with hyperlinks to databases of literature and disease mapping, to druggability and compound interactions, and much more.
The query results page also provides information on the systems-level properties of the cancer gene. By clicking on “Cancer Information” the user can get a list of the cancer types where the gene has been reported as a driver.
By clicking on “Duplicability”, the user gets a list of all the paralogs of the gene in the human genome: this information can be useful because mutations in a gene can be functionally compensated by its paralogs.
Another useful piece of information on a cancer gene is its “Orthology” and its evolutionary origin. Previous literature suggested that cancer genes originated early in evolution, before the advent of complex metazoans (Domazet-Loso 2008, 10.1093/molbev/msn214).
The query results page also makes it possible to identify the role of a gene in the protein interaction network, by clicking on the “Network Properties” page. This is useful for identifying which genes interact with a candidate, and also because it has been shown that cancer genes tend to be protein hubs (Rambaldi 2008, 10.1016/j.tig.2008.06.003).
Additionally, the query results page provides information on the “Expression in Normal Tissues” of a gene. This information is important because it helps determine whether a gene would normally be expressed in a given tissue, and helps identify altered patterns of expression. The data for this page have been updated since the previous version and include GTEx and Protein Atlas.
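To make the notion of a "hub" concrete, here is a minimal Python sketch that counts the distinct interaction partners of each gene and flags those above a cutoff. It does not use NCG data or any NCG interface: the gene names, the interaction list and the cutoff are all made up for illustration, and how NCG itself defines a hub is not described here.

```python
from collections import defaultdict


def degree_by_gene(interactions):
    """Count distinct interaction partners for each gene.

    `interactions` is an iterable of (gene_a, gene_b) pairs.
    Duplicate pairs and self-interactions do not inflate the count.
    """
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


def hubs(interactions, min_degree):
    """Return the genes with at least `min_degree` partners, sorted by name."""
    degrees = degree_by_gene(interactions)
    return sorted(gene for gene, d in degrees.items() if d >= min_degree)


# Toy, hypothetical interaction list (placeholder names, not real data).
toy_interactions = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
    ("GENE_B", "GENE_A"),  # same pair as the first entry, reversed
]

print(degree_by_gene(toy_interactions))
print(hubs(toy_interactions, min_degree=3))
```

In this toy network GENE_A has three partners and would be the only gene flagged with a cutoff of three. With real data the cutoff is a judgment call, for example a top percentile of the degree distribution.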
Similarly, the user can click on “Expression in Cell Lines” to identify in which cancer cell lines the gene is expressed. This info is derived from three major datasets of cell lines and can be used to design functional validation experiments.
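As an illustration of how such expression data might feed into experiment design, the sketch below shortlists the cell lines in which a gene is expressed above a chosen threshold. The cell line names, gene names, expression values and threshold are hypothetical placeholders, and the table layout is an assumption; it is not the format NCG uses.

```python
def expressing_cell_lines(expression, gene, threshold):
    """Return the cell lines where `gene` is expressed at or above `threshold`.

    `expression` maps cell line name -> {gene name -> expression value}.
    A gene missing from a cell line is treated as not expressed (0.0).
    """
    return sorted(
        line
        for line, values in expression.items()
        if values.get(gene, 0.0) >= threshold
    )


# Toy, hypothetical expression table (arbitrary units, not real data).
toy_expression = {
    "LINE_1": {"GENE_A": 12.0, "GENE_B": 0.2},
    "LINE_2": {"GENE_A": 0.5},
    "LINE_3": {"GENE_A": 3.1, "GENE_B": 7.4},
}

print(expressing_cell_lines(toy_expression, "GENE_A", threshold=1.0))
print(expressing_cell_lines(toy_expression, "GENE_B", threshold=1.0))
```

A cell line that expresses the gene is a natural candidate for a knockdown experiment, while one that does not could serve as a control or for overexpression. The appropriate threshold depends on the dataset and its units.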
Finally, the results page also provides a description of the function of the gene, and of its interaction with miRNAs.