On the line this time around is William Duddy, from Pierre and Marie Curie University, who focuses on making sense of omics data in neuromuscular diseases. Having tragically lost his brother to Duchenne muscular dystrophy, William is particularly interested in advancing our knowledge of this terrible disease.
Staying closely involved with wet-lab biologists, William discovered the need to associate gene networks with subcellular locations. This led to the development of CellWhere, a tool that structures omics data and aims to help anyone who feels lost when viewing a protein-protein interaction network.
Now let’s dive in and hear the behind-the-scenes story of CellWhere and its creator.
Who is CellWhere built for?
I’d say anyone who feels lost when they look at a protein-protein interaction network. It helps a lot to know where a protein might be located in the cell.
“My work is in trying to make sense of omics data in neuromuscular diseases”
It’s an exploration tool. It draws on other databases (e.g. Mentha, UniProt, the Gene Ontology) to help you contextualize your gene(s)/protein(s) of interest. What proteins do they interact with? Where might these proteins be within the cell?
What brought you to develop CellWhere?
I was playing around in Cytoscape with protein-protein interaction networks but always struggling to interpret them because they were not anchored into any familiar context. I felt that structuring the graph on subcellular locations would help, but the few existing tools for that were quite limited in their degree of automation and in the depth of localizations that could be displayed. In particular, the process was quite laborious if I wanted to retrieve and highlight specific localizations that were of particular interest to my own research, such as the neuromuscular junction, or the muscle contractile parts. I started to automate some of this process and CellWhere just seemed to grow naturally from there. Eventually I realized that the tool could have general interest, since any localization could be highlighted in this way, not just muscle-related compartments.
What makes CellWhere better than similar existing tools?
It’s easy, free, and informative. It’s (fairly) fast. It’s also quite customizable. Most of all it lets you focus on the localizations that interest you.
Can you give us an example of what can be achieved with CellWhere?
Let’s say I’m working on some mitochondrial disease and I have a list of genes that were differentially expressed in patients. With a few clicks I can screen those genes to tell me if any of their proteins have been annotated to a mitochondrial subcellular location.
Also, I can see whether any are known to bind to each other and what other binding partners they are known to have, and where all of those binding partners might be localized. As well as being useful for data interpretation, the output graphs can also make for nice figures.
What were some of the challenges you faced while developing CellWhere?
Many proteins are annotated to multiple subcellular localizations, but if you made a separate node for each localization and tried to display all of them on the graph, it would be largely uninterpretable and (worse?) it would also look horrible! Early versions of CellWhere presented only one localization for each protein, but we were never quite satisfied with this. So, when one of our reviewers asked about this problem, we decided to add a feature that lets users see these alternative localizations, at least for any protein whose annotation list is notably ambiguous and made up of low-specificity localizations (e.g. 60% of annotations mapping to the membrane, 40% to the cytosol). That’s why, when you display localizations based on their annotation frequency, you now have the option to display these alternative localizations for frequencies above 33%. At some point we will extend this to the other localization methods.
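To make the frequency rule concrete, here is a minimal Python sketch of the idea described in this answer. It is not CellWhere's actual code: the function names, the data layout, and the tie-breaking rule are assumptions made for illustration only, and the example reuses the 60%/40% membrane/cytosol case mentioned above.

```python
from collections import Counter


def localization_frequencies(annotations):
    """Return {localization: fraction of annotations} for one protein."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()}


def primary_and_alternatives(annotations, threshold=0.33):
    """Pick the most frequent localization, plus any others above the threshold.

    Hypothetical helper: ties are broken alphabetically, which is an
    assumption of this sketch and not something the interview specifies.
    """
    if not annotations:
        return None, []
    freqs = localization_frequencies(annotations)
    # Sort by descending frequency, then by name for a stable order.
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    primary = ranked[0][0]
    alternatives = [loc for loc, freq in ranked[1:] if freq > threshold]
    return primary, alternatives


# Illustrative annotation list: 60% membrane, 40% cytosol.
ambiguous = ["membrane"] * 6 + ["cytosol"] * 4
primary, alternatives = primary_and_alternatives(ambiguous)
print(primary, alternatives)
```

With a clearer annotation list, say 80% membrane and 20% cytosol, the same function would return no alternatives, so only the primary localization would be drawn.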
Did you find implementation of Cytoscape visual elements difficult?
Not personally, since my own part in the graphical display was minimal! Most of that was done by a talented master’s student, Lu Zhu, who is now a PhD student at the University of Bielefeld in Germany.
“It’s free, informative, fast, customizable, and most of all it lets you focus on the localizations that interest you”
You have created a visually appealing website interface for CellWhere. How did you actually develop the website and prepare the designs? What set of tools did you use to build CellWhere?
Thanks very much. Again, I’m new to web design. My motto was “keep it simple!”. It involved a lot of playing around with CSS (and a lot of hair-pulling) to arrive at the design I wanted. I’m a big fan of Red Hat’s OpenShift service, on which CellWhere is hosted – for us, it avoided the headache of long-term server support, and it encourages the use of the Git version control system, which I found crucial for working as a team on the same code.
What were your considerations for putting in the extra effort to develop an API for CellWhere?
Once you have a website set up that uses the HTTP methods POST or GET, it’s then relatively easy to give developers the information they need (a list of fields and their possible values) to submit their own requests to your site. I’m quite new to web design and was pleasantly surprised by how straightforward it was to present this as a basic API.
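As a rough illustration of what "a list of fields and their possible values" means in practice, the Python sketch below builds a GET request URL from a dictionary of fields. Everything specific in it is hypothetical: the base address is a placeholder, and the field names `ids` and `localization` are invented for the example and are not CellWhere's documented parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder address, NOT the real CellWhere endpoint.
BASE_URL = "https://example.org/query"


def build_get_url(base_url, fields):
    """Append URL-encoded form fields to a base address as a query string."""
    return f"{base_url}?{urlencode(fields)}"


# Hypothetical field names and values, for illustration only.
fields = {"ids": "DMD,SNTA1", "localization": "mitochondrion"}

url = build_get_url(BASE_URL, fields)
print(url)

# The same fields, encoded as a request body, would serve for a POST.
post_body = urlencode(fields).encode("ascii")

# Decoding the query string recovers the original fields.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

The point is only that a form-driven site already defines its interface: once the field names and allowed values are published, any script can assemble the same request a browser would send.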
Do you have any words of wisdom for computational biologists who are just starting out and want to develop their own research tool?
Well, CellWhere grew out of our own analytical needs, so I guess that this represents one way to discover which tools are missing: by staying closely involved with wet-lab biologists.
Looking ahead, what are your aspirations?
Omics technologies let us measure so many things all at once, so that the potential scope of each dataset tends to range far wider than the scope of the grant application from which it was funded.
So I’m scared of missing things – I want to make sure that all of the data get deeply analyzed and explored and for research on neuromuscular disorders to benefit as much as possible from the hours and money that are put into the generation of each dataset.
Is there anyone who contributed to your research whom you would like to thank?
I already mentioned Lu. The other person who was crucial to making CellWhere function robustly was Apostolos Malatras, a PhD student working with us. Apostolos revamped my amateur database efforts so that query results could be returned reasonably quickly. He also set up an automated update feature so that CellWhere always has the latest data from UniProt and Mentha. Thanks to the Institute of Myology for its support, and to the MyoGrad program. Finally, I would also like to thank LabWorm for taking an interest in our work and for giving CellWhere a go.
Thank you for your time and this inspiring interview, William. We wish you great success and may the Worm be with you!