Thursday, 20 January 2011

Cytoscape - Networking


The rise of network research and visualisation over the past ten years has been dramatic. After initial ideas and clunky software programs, we now see a number of great open source platforms appearing.

The concept of network visualisation is rather simple: there are two elements, nodes and the relationships between them, called edges. From there, more nodes and edges are added, and complicated systems can be represented in terms of how the identified elements are connected. Simple.
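To make the node and edge idea concrete, here is a minimal sketch in Python. The names and relationships are made up for illustration, and the adjacency dictionary is just one common way to hold a network in memory, not something specific to any of the packages discussed here.

```python
from collections import defaultdict

# Hypothetical example data: each edge is a pair of connected nodes.
edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
]


def build_adjacency(edge_list):
    """Map every node to the set of nodes it shares an edge with."""
    adjacency = defaultdict(set)
    for node_a, node_b in edge_list:
        # The relationship is treated as undirected, so record both directions.
        adjacency[node_a].add(node_b)
        adjacency[node_b].add(node_a)
    return adjacency


network = build_adjacency(edges)
for node in sorted(network):
    print(node, "->", sorted(network[node]))
```

The nodes never have to be declared separately here: they simply appear as soon as an edge mentions them, which is also how most simple edge list files work.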

In his TED talk, Nicholas Christakis speaks at the top of his voice about the basics of social networks and outlines the dreams attached to them, implying the power of networks.



However, it is only at the visualisation level that it looks simple. The real task lies before and after this. How are the nodes and edges actually defined and identified in the run-up to the funky visualisation of clusters and groups? This question, both in the practical sense of definition and in the technical sense of how the input file is generated, is the real task.

This is to some extent reflected in the file standards of these network visualisation packages: there aren't any. The whole area might be too young, and a big player is missing, like ESRI in GIS or Autodesk in CAD. This might be part of the explanation, but the other part is that the simplicity of nodes and edges hasn't put pressure on the file formats.

Since last year the Gephi platform has been setting standards for this group of open source network visualisation packages. It offers great functions and juicy looking visuals with an easily manageable interface.

Developed by a consortium of universities and research companies, including the University of California San Francisco and the University of Toronto, comes a second very powerful and flexible network software called Cytoscape. The software is not exactly new: development reaches back to 2002, when version v0.8 was released. Currently version 2.8 is available for download and work on version 3.0 is underway, although there is no release date as of now.


Image taken from cytoscape / Visualization of Gene Ontology Term Tree (DAG). More images can be found in this flickr group

Cytoscape was initially developed for biology and molecular research, but has since grown into a multipurpose network visualisation platform. The software is Java based and therefore runs across platforms, with a lot of plugins freely available. Basically, everyone can contribute their own plugins.

Cytoscape supports a variety of standards, see above, but for quick and dirty work the text or table import is extremely useful. If you have a table or CSV with three columns, defining the start node, the end node and the type of relationship, you are good to go. This addresses some of the issues discussed above.
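As a rough sketch of how such a three-column table could be generated, the following Python snippet writes a small edge list as CSV text. The relationships and the header names `source`, `target` and `interaction` are assumptions made for illustration. They are not required names, since the columns still have to be assigned to start node, end node and relationship type when the table is imported.

```python
import csv
import io

# Hypothetical relationships: (start node, end node, type of relationship).
relationships = [
    ("alice", "bob", "friend"),
    ("bob", "carol", "colleague"),
    ("alice", "carol", "friend"),
]


def to_edge_table(rows):
    """Return the rows as CSV text with one edge per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header names are an assumption, chosen only to label the three columns.
    writer.writerow(["source", "target", "interaction"])
    writer.writerows(rows)
    return buffer.getvalue()


edge_table = to_edge_table(relationships)
print(edge_table)
```

To get an actual file, the same text can be written to disk, for example with `open("edges.csv", "w")`, where the file name is again only a placeholder.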

Running the visualisation algorithms can be processing intensive, especially once the network goes above 10'000 nodes. Here Cytoscape performs very well, also from an interface perspective. Progress is clearly indicated and each process can be stopped at any time. It is usually very stable and will not suddenly crash on you, even with large network calculations.

The package comes with a lot of preset layout algorithms. These presets define how the graph evolves and how the nodes and edges are laid out. The selection ranges from force directed and weighted to circular or grid layouts. Each preset layout can be fully adjusted.
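To give a feel for what a layout algorithm does, here is a minimal sketch of a circular layout in Python: every node is assigned a position at an equal angle around a circle. This only illustrates the general idea and is not Cytoscape's implementation. The function name and the `radius` parameter are made up for the example.

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes at equal angles on a circle around the origin."""
    count = len(nodes)
    positions = {}
    for index, node in enumerate(nodes):
        # Spread the nodes evenly over the full circle (2 * pi radians).
        angle = 2 * math.pi * index / count
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


positions = circular_layout(["a", "b", "c", "d"], radius=100.0)
for node, (x, y) in positions.items():
    print(f"{node}: x={x:.1f}, y={y:.1f}")
```

A force directed layout works from the same starting point, a position per node, but moves those positions step by step depending on the edges, which is why it gets expensive on large networks.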

As for the graphics, Cytoscape is extremely flexible and every single aspect of the graph can be set manually. This is great and makes for dramatic flexibility, but on the other hand it is painstakingly difficult and time consuming. Especially when working on a dataset early on, while results are not yet clear, this is not where you want to put your effort, and ugly visuals can be depressing.
Anyway, the great examples on the website should be consulted for motivation.

Some of the other great features: Cytoscape works as a web service client, has great search functionality for nodes and edges, and offers extensive filter functions, useful not only to hide or show but also to highlight. Furthermore it allows for custom node representation, which means Cytoscape can display images and icons individually for each node. Cytoscape also supports networks within networks, quite a tricky thing.

Image by urbanTick / Visualisation of two text snippets as a network.

Two things are crucial once the data is compiled and graphed: the analysis and the output. In both cases Cytoscape is very powerful. The extensive and very detailed analysis functions spit out numbers and even plot them on graphs. All these tables and calculations can be exported for further analysis in external packages. For the output of the graph a palette of formats is offered, covering both image formats and vector graphics formats such as PDF, EPS and SVG.
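As a small illustration of the kind of number such an analysis produces, the sketch below computes the degree of each node, that is, how many edges touch it. This is one of the simplest network statistics and is picked here only as an example. The edge list is made up, and nothing in the code reflects how Cytoscape itself calculates or exports its results.

```python
from collections import Counter

# Hypothetical undirected edge list.
edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("carol", "dave"),
]


def node_degrees(edge_list):
    """Count how many edges touch each node."""
    degrees = Counter()
    for source, target in edge_list:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


degrees = node_degrees(edges)
# List the best connected nodes first.
for node, degree in degrees.most_common():
    print(node, degree)
```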

If you are into network visualisation and keen on a good alternative to established packages such as Pajek, this might be one for you. More resources on Cytoscape can be found on PubMed or Genome Research.
