Input Formats

We have aimed to make data input options both flexible and simple. To this end, BioLayout Express3D supports the input of data in a number of different formats:

  1. Regular (.layout, .txt, .tgf)
  2. Cytoscape SIF format (.sif)
  3. GraphML (.graphml)
  4. Matrix (.matrix)
  5. Expression (.expression)
  6. GML (.gml)

These are the basic input formats for BioLayout Express3D graphs. Example graphs of each type are available under Example Datasets. Before describing the data formats currently readable by BioLayout Express3D, we set out some conventions:

  • BioLayout Express3D files are normally text files representing columns of data.
  • Data points are separated by tabs or by spaces.
  • Each node should have a unique identifier.
  • Make sure text entries such as annotations are enclosed in double quotes where they contain spaces (e.g. "Protein Kinase Alpha").
  • Comments may be placed in the file by preceding them with ‘//’.
  • Advanced options such as upfront definition of //NODESIZE or //NODESHAPE are usually placed at the end of the input file.
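
The conventions above can be illustrated with a small Python sketch that splits one line into fields. This is not the parser used by BioLayout Express3D; the function name `tokenize_line` is hypothetical, and the sketch makes the simplifying assumption that every line starting with '//' is skipped, including option lines such as //NODESIZE.

```python
import re

# A field is either a double-quoted string (which may contain spaces) or a
# run of non-whitespace characters. Tabs and spaces both act as separators.
_FIELD = re.compile(r'"([^"]*)"|(\S+)')


def tokenize_line(line):
    """Return the fields of one input line, or None for blank and '//' lines.

    Assumption: all '//' lines are skipped, so option lines such as
    //NODESIZE are ignored here rather than interpreted.
    """
    text = line.strip()
    if not text or text.startswith("//"):
        return None
    # Each match is a (quoted, unquoted) pair; exactly one side is filled in.
    return [unquoted or quoted for quoted, unquoted in _FIELD.findall(text)]
```

For example, a line holding two node names, a weight and a quoted annotation is split into four fields, with the quotes removed from the annotation.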

For a full description of the data input formats for BioLayout Express3D, see the manual.

Simple Pairwise

This creates a simple directional network in which each line of the input file defines two nodes connected by a new edge. For a bi-directional graph, only one direction needs to be listed; the other edge can be inferred according to the layout properties panel. The parser creates new nodes in the network as required. Singleton nodes can be added with a line in which a node connects to itself. The format is shown below.

Node1   Node2
NodeA   NodeB
NodeB   NodeC
NodeC   NodeD

In each line, an edge is defined by connecting the node in the first column with the node in the second column. The graph constructed here would consist of 4 nodes (A, B, C and D), each connected to the next by a single non-weighted edge.

Weighted Pairwise

This is an extension of the simple pairwise format that adds a weight to each edge. Edge weights may be used for filtering, visualized as colour or edge thickness, and will also influence the layout algorithm. Nodes connected by higher-weighted edges tend to be closer together in the resulting layout; however, connectivity is usually the major determinant of edge length. The format is a one-column extension of the previous format, adding a single numeric weight. Weights should normally be in linear ranges and in whatever scale is appropriate, as they will be re-centered. Non-linear weights can be log-scaled if desired. Negative weights are currently not supported.

Node1   Node2   Weight
NodeA   NodeB   1.0
NodeB   NodeC   0.95
NodeC   NodeD   0.86
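
As a rough illustration, the sketch below reads weighted pairwise lines into a list of edges in Python and rejects negative weights, which the text says are not supported. The function name `parse_weighted_edges` is hypothetical. The sketch assumes that the `Node1 Node2 Weight` line in the example is a column label rather than data, so it is not passed in, and that the weight column may be omitted as in the simple pairwise format.

```python
def parse_weighted_edges(lines):
    """Parse 'source target [weight]' lines into (source, target, weight) tuples.

    Assumptions: fields are separated by tabs or spaces, node names contain
    no spaces, and a missing weight column is recorded as None.
    """
    edges = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0].startswith("//"):
            continue  # skip blank lines and '//' lines
        if len(fields) < 2:
            raise ValueError(f"line {number}: expected at least two node names")
        weight = float(fields[2]) if len(fields) > 2 else None
        if weight is not None and weight < 0:
            raise ValueError(f"line {number}: negative weights are not supported")
        edges.append((fields[0], fields[1], weight))
    return edges
```

Raising an error on a negative weight, rather than silently dropping the edge, is a design choice of this sketch and not documented behaviour of the application.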

Weighted Pairwise With Edge Annotation

This format extends the previous weighted pairwise format by adding support for edge annotation. Weighted edges are constructed between node pairs, and a pseudonode describing the edge is added to each edge. This allows a pair of nodes to be connected by several edges with different annotations and weights. In the example below, NodeC and NodeD are connected by two such edges:

Node1   Node2   Weight  EdgeType
NodeA   NodeB   1.0     "Yeast 2-hybrid"
NodeB   NodeC   0.95    "Co-immunoprecipitation"
NodeC   NodeD   0.86    "Computationally Inferred"
NodeC   NodeD   0.95    "Yeast 2-hybrid"

Annotations containing spaces must be enclosed in double quotes for the parser to recognise them correctly. Additionally, quote characters and other special characters should be removed from the annotation text itself.
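
When such files are generated by a script, the quoting rule can be applied automatically. The following Python sketch formats one annotated edge line; the function name `format_annotated_edge` is hypothetical. The text does not list which characters count as special, so this sketch makes the assumption that only double quotes need to be stripped.

```python
def format_annotated_edge(source, target, weight, annotation):
    """Return one tab-separated 'source target weight annotation' line.

    Assumption: only double quotes are removed from the annotation; other
    special characters are left untouched because the text does not list them.
    """
    cleaned = annotation.replace('"', "").strip()
    # Annotations that contain whitespace must be wrapped in double quotes.
    if any(ch.isspace() for ch in cleaned):
        cleaned = f'"{cleaned}"'
    return "\t".join([source, target, str(weight), cleaned])
```

Calling it with the annotation Yeast 2-hybrid produces a quoted fourth column, while a single-word annotation is written without quotes.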

Cytoscape Simple Interaction Format (SIF)

The Cytoscape Simple Interaction Format (SIF) is compatible with BioLayout Express3D. This format looks as follows:

NodeA   "phosphorylation"   NodeB
NodeA   "binding"           NodeC
NodeB   "phosphorylation"   NodeC
NodeD   "binding"           NodeC
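
To show how the columns are read, here is a minimal Python sketch that turns SIF lines into (source, interaction, target) triples. The function name `parse_sif` is hypothetical. Cytoscape's SIF allows several targets on one line; whether BioLayout Express3D accepts that variant is not stated here, so treat that part of the sketch as an assumption.

```python
import re

# A field is either a double-quoted string or a run of non-whitespace characters.
_FIELD = re.compile(r'"([^"]*)"|(\S+)')


def parse_sif(lines):
    """Parse 'source interaction target [target ...]' lines into triples."""
    triples = []
    for line in lines:
        fields = [unquoted or quoted for quoted, unquoted in _FIELD.findall(line)]
        if len(fields) < 3:
            continue  # blank line or a node without interactions: nothing to record
        source, interaction = fields[0], fields[1]
        # One triple per target; extra targets on a line are an assumed SIF variant.
        triples.extend((source, interaction, target) for target in fields[2:])
    return triples
```

Note that the interaction type sits in the second column, between the two node names, unlike the annotated pairwise format where the annotation comes last.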

Creation of Class Sets and Classes

A Class Set is a means of defining nodes as belonging to different groupings. A Class Set comprises one or more Classes to which nodes belong. Once a Class Set is defined and selected, the nodes belonging to its different Classes are arbitrarily assigned colours so that they can be visibly distinguished and selected in the graph. Examples of Classes in graphs derived from biological relationships might be Gene Ontology terms, statistical hits, genes of interest, species of origin, protein type etc. Each node may belong to only one Class within a Class Set. A file may have any number of associated Class Sets, each of which can contain any number of Classes. Not every node needs to be assigned to a Class in a given Class Set; unassigned nodes belong to the class “No Class”.

Class sets and classes may be created in a number of ways. See manual for details.

GraphML files (.graphml)

GraphML is a comprehensive and easy-to-use file format for graphs. It consists of an XML-based language core that describes the structural properties of a graph and a flexible extension mechanism for adding application-specific data. It is used by a number of network editing programs, including yEd (yFiles, Tübingen, Germany), which the authors have used extensively for the editing and layout of networks and biological pathways. Any graph drawn using this package and saved as a .graphml file can now be loaded directly into BioLayout Express3D.

Please see Example Datasets for sample GraphML files.

Matrix Files (.matrix)

We have recently implemented support for matrix file import in BioLayout Express3D. In principle a matrix file may be generated from any source, but it must have a “.matrix” extension for BioLayout Express3D to recognise it. When a file is opened, a Matrix CutOff dialog appears asking the user to define the threshold above which relationships will be plotted.

A 1.00000 0.98663 0.93504 0.93464 0.92341 0.91745
B 0.98663 1.00000 0.93365 0.92930 0.92165 0.91817
C 0.93504 0.93365 1.00000 0.98991 0.96653 0.96679
D 0.93464 0.92930 0.98991 1.00000 0.96728 0.96799
E 0.92341 0.92165 0.96653 0.96728 1.00000 0.98699
F 0.91745 0.91817 0.96679 0.96799 0.98699 1.00000
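
The effect of the cutoff can be sketched in Python as follows. This is an illustration only, not the application's own code: the function name `matrix_to_edges` is hypothetical, and it assumes that columns follow the same order as the row labels, that the matrix is symmetric so only the upper triangle is read, and that "above" means strictly greater than the threshold.

```python
def matrix_to_edges(lines, cutoff):
    """Return (row_label, column_label, value) for matrix entries above cutoff.

    Assumptions: each line is 'label v1 v2 ...', the column order matches the
    row order, and the matrix is symmetric, so the diagonal and the lower
    triangle are skipped.
    """
    labels, rows = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        labels.append(fields[0])
        rows.append([float(value) for value in fields[1:]])

    edges = []
    for i, row in enumerate(rows):
        if len(row) != len(labels):
            raise ValueError(f"row {labels[i]!r} does not match the matrix size")
        for j in range(i + 1, len(labels)):
            if row[j] > cutoff:
                edges.append((labels[i], labels[j], row[j]))
    return edges
```

Applied to the matrix above with a cutoff of 0.98, this sketch keeps three relationships: A with B, C with D, and E with F.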

Please see Example Datasets for sample Matrix files.

Expression Data Input Format (.expression)

The basic format is a header row, followed by a single row for each probe (set)/gene on the array. Each row must start with the unique identifier of that row (node). Annotation columns or Class Sets (see below) may then follow the identifier (these are optional but very useful), followed finally by the raw data columns which are usually numeric (integer or floating point). Columns are usually tab separated in this format and text entries that include whitespace characters are surrounded by double quotes.

Unique;ProbeID         Description          Annot.1   Annot.2   Data1   Data2   Data3
Tub;gnf1m00002_f_at    "tubulin, alpha 7"   Term1     Term1     245.6   278.9   364.6
Il16;gnf1m00009_s_at   "interleukin 16"     Term2     Term1     125     203     235.2
Cul7;gnf1m00122_a_at   "cullin 7"           Term3     Term2     302     288     134.7
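
The layout described above (identifier, optional annotation columns, then numeric data) can be read with Python's standard csv module, as in the sketch below. The function name `read_expression` and the `n_annotation_columns` parameter are hypothetical: the sketch assumes tab-separated columns and that the caller states how many annotation columns follow the identifier.

```python
import csv
import io


def read_expression(text, n_annotation_columns):
    """Read tab-separated expression data into (header, rows).

    rows maps each unique identifier to (annotations, values). Assumption:
    the caller knows how many annotation columns follow the identifier.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    header = next(reader)
    first_data = 1 + n_annotation_columns
    rows = {}
    for record in reader:
        if not record:
            continue  # skip empty lines
        identifier = record[0]
        if identifier in rows:
            raise ValueError(f"duplicate identifier: {identifier!r}")
        annotations = record[1:first_data]
        values = [float(value) for value in record[first_data:]]
        rows[identifier] = (annotations, values)
    return header, rows
```

With the example above, the Description, Annot.1 and Annot.2 columns would be treated as annotations by passing n_annotation_columns=3, leaving Data1 to Data3 as the numeric values.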

Please see Example Datasets for sample Expression files.

Layout files (.layout)

Layout files are generated by BioLayout Express3D when a graph is saved from the application. A layout file should preserve all the information from the saved network such that when reloaded it is an exact representation of the original graph. The basic format consists of the definition of the node-edge relationships followed by information pertaining to the visual and positional specification of nodes. It will also store associations with data files. See below for a more detailed presentation of the layout file format.

Please see Example Datasets for sample Layout files.

GML files (.gml)

GML (Graph Modelling Language) is a simple textual format used to exchange graphs. The format is widely supported by graphing software. It defines features such as nodes and edges that are used for graph drawing. GML support has been added in BioLayout Express3D Version 3.1.

This example is a graph with a 3D cylindrical form. To see the graph as a cylinder, select the FMMM layout algorithm option in Graph Properties > Layout > Algorithm before opening the file. Alternatively, use the Web Start LAUNCH link below, which opens BioLayout preloaded with a graph created from the example GML file.