NCG is a web resource for analyzing the duplicability, orthology, and network properties of cancer genes. It is manually curated and introduces a more robust procedure for extracting genes from published cancer mutational screenings. NCG wasn’t just developed to scratch a curiosity itch. In fact, Omer utilized this powerful tool to uncover fascinating biological insights throughout his research.
Brace yourselves for some numbers: NCG collected 1571 cancer genes from 175 published studies that describe 188 mutational screenings of 13,315 cancer samples from 49 cancer types and 24 primary sites. Phew!
Now let’s hear it from them directly.
Guys, welcome to On the Line with LabWorm. Please introduce yourselves.
Giovanni: I am a bioinformatician and data analyst with a background in human population and cancer genetics. My PhD focused on discovering signatures of genetic selection in the human genome, to study how our genome has changed since the appearance of the first anatomically modern humans.
As a postdoc I worked in Francesca Ciccarelli’s lab to learn about the genes involved in cancer, and the systems-level properties that characterize them. Apart from my main academic duties I am a moderator of Biostar, an important question-and-answer community for bioinformatics-related topics.
“NCG will be useful to researchers interested in interpreting cancer sequencing, developing new methods to identify cancer genes”
Thanos: I am a bioinformatician with a background in evolutionary biology and computational genomics. I am currently a PhD student in Francesca Ciccarelli’s lab in King’s College London and my work revolves around the discovery of patient-specific cancer genes in a number of cancer types. In particular, I am interested in machine learning algorithms and how these can be utilized for the prediction of driver genes in cancer.
Omer: I am a computational biologist with a background in protein structure and function, and lately in cancer genomics. I have worked on understanding the role of somatic copy number variations in cancer, analyzed systems-level properties of cancer genes, and exploited these properties to identify novel therapeutic targets using a synthetic lethality approach. I will continue my academic career as a postdoc at CSI Singapore from 2016, where I will mainly be working on transcriptome analysis.
Who would benefit the most from using NCG?
NCG will be useful to researchers interested in the following questions:
Interpreting cancer sequencing: NCG allows users to quickly determine whether a gene has already been reported as a driver gene in a given cancer type. This helps in interpreting the mutational landscape of a tumor sample, to identify whether the mutated genes have a known role in cancer or are likely to be passengers.
Developing new methods to identify cancer genes: NCG provides a curated list of cancer genes. This can be used both as a training list for developing new methods to identify cancer genes, and as a validation list to evaluate the effectiveness of existing methods.
Designing new approaches to target cancer genes: the systems-level properties reported for each cancer gene characterize that gene in its network context, and help identify new targets and strategies for targeting mutated genes.
In creating NCG, you took a manual rather than computational approach to classify the cancer genes. Can you explain your choice?
Nowadays the number of publications describing cancer screenings is growing so rapidly that it is very difficult for a single researcher to review and study every single publication. We decided to create a central repository that researchers can refer to in order to obtain reliable annotation of cancer genes. To achieve this, we opted for manual curation, minimizing the spurious data that can arise from a purely computational approach, and sharing this tedious work among the members of the group.
Can you give us a practical example of what can be accomplished with NCG?
A researcher can query NCG to identify the context of a gene in cancer. This can be done easily from the main page, using either the single-gene or the multiple-gene search form. The user is shown a summary of the gene, with links to literature and disease-mapping databases, druggability and compound-interaction resources, and much more. (For more detailed examples, click here)
Let’s expand our discussion. What role do you perceive bioinformatics will play in life science research, in the near and far future?
Giovanni: The average scientist will be expected to be competent in at least some data analysis techniques, and most of the science will be done computationally instead of in the lab. The availability of large datasets will also facilitate communication with people from outside the life science field. We already see data analysts coming from the financial and marketing sectors working on biological questions, using publicly available data.
“When developing new tools, it is important to study the current literature and try all the existing tools available”
In this context repositories of tools such as LabWorm will be very useful, as they will allow a better organization and standardization of bioinformatics tools, and facilitate the identification of the correct tools for new researchers.
Thanos: Given the large amount of data currently produced by research labs around the world, bioinformatics already plays a vital role in the analysis, interpretation, and visualization of data. Additionally, the decreasing cost of DNA sequencing gives researchers the opportunity to apply cutting-edge technologies to even larger cohorts of patients.
As a matter of fact, personalized medicine initiatives are currently being formed with the goal of expanding the available data on a wide range of rare diseases, including cancer. I expect that the next decade will see a dramatic increase in demand for bioinformaticians able to both integrate multiple sources of information and develop new methods for data mining and visualization.
What burning issues do you think need change in science and how would you fix them?
Giovanni: When I started in the bioinformatics field a few years ago, there was a lot of confusion about how to develop bioinformatics pipelines and how to standardize results. Most worryingly, many published papers were very difficult to reproduce, and they did not include a version history of the analysis or a good testing strategy.
In recent years this situation has improved drastically and the code quality in many publications has increased. This is due to many factors: the development of new libraries (dplyr, Bioconductor), the rise of open communities for discussing technical topics (Stack Overflow, SEQanswers, Biostar), and in general an improvement in the quality of teaching. In this direction, I am very happy to see the efforts made by LabWorm and I hope you will succeed in improving the quality of bioinformatics even further.
Do you have any words of wisdom for starting computational biologists out there who want to develop their own research tool?
Giovanni: The most important thing in bioinformatics is to keep training. You should follow at least one online course every quarter, and dedicate at least an hour per week to reading blogs, practicing on Rosalind or similar projects, and following question-and-answer websites such as Stack Exchange or Biostar.
“The average scientist will be expected to be competent in at least some data analysis techniques”
When developing new tools, it is important to study the current literature and try all the existing tools before developing something on your own. If the existing tools are difficult to use, you should contact the developers and inform them, so they can improve the documentation and fix bugs. If you decide to develop your own tool, make sure you have other people try it and that you have an automated testing infrastructure.
Would you like to thank some folks who contributed to your research?
We would like to acknowledge all the members of our group, and Alex Mastroggianopolus, a student who helped us review some of the functional validation experiments and gave us important feedback on the NCG interface. We would also like to thank the other members of the lab, Matteo Cereda, Lorena Benedetti, Shruti Sinha, and Gennaro Gambardella, for their feedback and patience during the data clubs, as well as our PI, Francesca Ciccarelli.
Many thanks guys and keep up the fantastic work you’re doing with NCG. And most importantly, may the Worm be with you!
Want to get better acquainted with the NCG tool? Want to know how to actually use it? Want to know what’s new in the 5.0 release? Check out our In Depth Look at NCG.
Follow the NCG developers: