Discrete Logic Modelling Optimization to Contextualize Prior Knowledge Gene Regulatory Networks Using PRUNET

Introduction

PRUNET is a user-friendly software tool for contextualizing a prior knowledge gene regulatory network (PKN) to specific experimental conditions. As input, the algorithm takes a PKN and the expression profiles of two given stable states or cellular phenotypes. The PKN is iteratively pruned by an evolutionary algorithm that performs the optimization. The optimization seeks the best match between the attractors predicted by a discrete logic (Boolean) model and a Booleanized representation of the phenotypes, within a population of alternative subnetworks that evolves iteratively. We validated the algorithm by applying PRUNET to three biological examples and using the resulting contextualized networks to predict missing expression values by cross-validation. In addition, for each example we simulated well-characterized perturbations in the contextualized networks to evaluate the dynamical behaviour of the models. The results showed that a fraction of the alternative contextualized networks were also suitable for describing the network response under such perturbations.

The general applicability of the implemented algorithm makes PRUNET suitable for a variety of biological processes, for instance cellular reprogramming, immune cell activation or transitions between healthy and disease states.


PRUNET is particularly suitable to contextualize gene regulatory networks underlying cellular transitions


Flow chart of the algorithm. PRUNET takes as input a prior knowledge gene regulatory network and a Booleanized representation of the gene expression profiles of stable phenotypes. After iterative network pruning, PRUNET delivers as output one or several contextualized networks optimized to describe the phenotypes according to the adopted dynamical model (Boolean).

PRUNET is an implementation of the computational method proposed by Crespo et al. [1]. It is written entirely in the Perl programming language to facilitate modification and cross-platform use. In addition, PRUNET provides a graphical user interface (GUI) to facilitate interaction with the user.

PRUNET:

Download page

Links:

Computational Biology Group @LCSB

Requirements:

Installation

No installation is required to run the executables for Windows, Linux and Mac OS platforms. Versions are available for 32-bit and 64-bit architectures.

If the user prefers to run the Perl program, ActivePerl 5.14 or later and the Perl package Tk-804.027 should be installed. In addition, running the source code on Mac OS requires X11.app to be installed.

ActivePerl has an installer that reduces the installation process to a double click on the corresponding icon. To install the Tk package, type on the command line:

>ppm install Tk

Usage

To run the executable version, just double click on the icon to initiate the graphical user interface.

To run the Perl program, open a terminal, navigate to the directory where the Perl program is located and type on the command line:

>perl PRUNET.pl

This launches the graphical user interface. Users then have to provide the program with two input files in plain text: a) a network file, and b) a list of differentially expressed genes.

Input files:

a) Network file. The network format consists of three columns separated by spaces, with each interaction on a separate line. The first column is the name of the source gene. The second column is the type of interaction, either ‘activation’ or ‘inhibition’, represented by ‘->’ and ‘-|’ respectively. The third column is the name of the target gene. Example:

SNAI1 -> ZEB1

SNAI1 -> ZEB2

SNAI1 -| CDH1

…
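To check a network file before loading it into the GUI, the three-column format above can be read with a few lines of Python. This is only a sketch and not part of PRUNET: the function name parse_network and the choice to encode activation as +1 and inhibition as -1 are assumptions made for illustration.

```python
from typing import List, Tuple

# (source gene, sign, target gene); +1 = activation, -1 = inhibition.
Edge = Tuple[str, int, str]

SIGNS = {"->": 1, "-|": -1}


def parse_network(text: str) -> List[Edge]:
    """Parse 'SOURCE -> TARGET' / 'SOURCE -| TARGET' lines into signed edges."""
    edges: List[Edge] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines between interactions
        if len(parts) != 3 or parts[1] not in SIGNS:
            raise ValueError(
                f"line {lineno}: expected 'SOURCE -> TARGET' or 'SOURCE -| TARGET'"
            )
        source, symbol, target = parts
        edges.append((source, SIGNS[symbol], target))
    return edges


example = """SNAI1 -> ZEB1
SNAI1 -> ZEB2
SNAI1 -| CDH1"""
edges = parse_network(example)
print(edges)
```

A malformed line (for instance a missing arrow) raises a ValueError that names the offending line number.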

b) List of differentially expressed genes. This file consists of two columns separated by spaces. The first column is the name of the differentially expressed gene. The second column is the expression state (‘UP’ and ‘DOWN’ for up- and down-regulated genes respectively). Example:

SNAI1 UP

ZEB1 UP

ZEB2 UP

CDH1 DOWN

…
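The following Python sketch shows one way such a list could be turned into the two Booleanized phenotypes. It is an illustration only. In particular, it assumes that ‘UP’ means the gene is off (0) in the initial phenotype and on (1) in the final one, and ‘DOWN’ the reverse; this page does not state the direction of the comparison, so check it against your own data. The function names are hypothetical.

```python
from typing import Dict, Tuple


def parse_expression(text: str) -> Dict[str, str]:
    """Parse 'GENE UP' / 'GENE DOWN' lines into a gene -> state mapping."""
    states: Dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != 2 or parts[1] not in ("UP", "DOWN"):
            raise ValueError(f"line {lineno}: expected 'GENE UP' or 'GENE DOWN'")
        states[parts[0]] = parts[1]
    return states


def booleanize(states: Dict[str, str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (initial, final) Boolean phenotypes.

    Assumption (not stated in the documentation): UP = 0 in the initial
    phenotype and 1 in the final one; DOWN = 1 initially and 0 finally.
    """
    initial = {gene: 0 if state == "UP" else 1 for gene, state in states.items()}
    final = {gene: 1 - value for gene, value in initial.items()}
    return initial, final


example = """SNAI1 UP
ZEB1 UP
ZEB2 UP
CDH1 DOWN"""
initial, final = booleanize(parse_expression(example))
print(initial)
print(final)
```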

Once the input files have been loaded, users should enter the contextualization options. Default values are provided for the basic configuration (maximum iterations, population size, selection number and elitism number), but they can be changed. Activating a check-box reveals additional options for an advanced configuration.

Output files:

Contextualized networks can be saved in a single file containing all the networks within the final population, separated by headers with the name of each network (Contextualized network 1, Contextualized network 2…). The network format is the same as in the input file, with three columns and each interaction on a separate line.
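For downstream analysis, a saved multi-network file can be split back into individual networks. The Python sketch below assumes that each header is a line beginning with the words ‘Contextualized network’, as suggested by the names above; the exact header layout of real output files may differ. The sample text is hypothetical and only reuses the interactions from the input example.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, int, str]  # (source, sign, target); +1 activation, -1 inhibition

HEADER_PREFIX = "Contextualized network"  # assumed header wording
SIGNS = {"->": 1, "-|": -1}


def split_networks(text: str) -> Dict[str, List[Edge]]:
    """Group interactions under the header line that precedes them."""
    networks: Dict[str, List[Edge]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(HEADER_PREFIX):
            current = stripped
            networks[current] = []
            continue
        if current is None:
            raise ValueError("interaction found before any network header")
        source, symbol, target = stripped.split()
        networks[current].append((source, SIGNS[symbol], target))
    return networks


# Hypothetical file content, for illustration only.
sample = """Contextualized network 1
SNAI1 -> ZEB1
SNAI1 -| CDH1
Contextualized network 2
SNAI1 -> ZEB2
SNAI1 -| CDH1"""
networks = split_networks(sample)
for name, net in networks.items():
    print(name, len(net))
```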

Predicted expression values can be saved in a text file with three columns: 1) name of the gene, 2) predicted state, and 3) frequency of that prediction among the selected contextualized networks.

Brief explanation about basic configuration parameters

Maximum iterations. This parameter is the maximum number of times the algorithm is iteratively applied or, in evolutionary terms, the number of subnetwork generations that are sampled, scored and selected in order to yield better networks. It is a maximum because users can stop the optimization process at any time and collect ‘partial’ results; otherwise the program keeps working until it reaches this number of iterations. In general, a bigger population size requires a higher number of iterations to reach convergence (a final population of similar subnetworks).

Population size. This parameter refers to the number of subnetworks generated in each iteration. A high population size decreases the probability of a local optimum being reached but increases the computation time.

Selection number. This parameter refers to the number of top-scored subnetworks selected in each iteration. If the selection number is low the convergence to a population of similar subnetworks is quicker, but the optimization process could be slowed down due to the lack of variability.

Elitism number. This parameter refers to the number of the best historical subnetworks (over all iterations) that are transferred directly from one generation to the next to prevent the loss of the best-scoring subnetworks. We suggest using an elitism number no higher than half of the selection number, so that the optimization process retains some freedom to explore.
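How the four basic parameters interact can be seen in a generic evolutionary loop. The Python sketch below is not PRUNET's code: it samples each edge independently from a keep-probability re-estimated from the selected subnetworks, clamps that probability to [0.05, 0.95] to preserve some variability, and uses a toy score. The function evolve, its defaults and the score are all assumptions made for illustration; in PRUNET the score comes from matching attractors to the Booleanized phenotypes.

```python
import random
from typing import Callable, FrozenSet, List

Subnet = FrozenSet[int]  # indices of the PKN edges kept in a candidate subnetwork


def evolve(n_edges: int, score: Callable[[Subnet], float],
           max_iterations: int = 50, population_size: int = 20,
           selection_number: int = 6, elitism_number: int = 3,
           seed: int = 0) -> List[Subnet]:
    """Return the top-scored subnetworks of the last generation."""
    if not (0 <= elitism_number <= selection_number
            and 1 <= selection_number <= population_size):
        raise ValueError("expected elitism <= selection <= population size")
    rng = random.Random(seed)
    keep_prob = [0.5] * n_edges   # probability of keeping each edge
    elite: List[Subnet] = []      # best subnetworks seen so far
    selected: List[Subnet] = []
    for _ in range(max_iterations):
        # Sample new candidates; the elite is carried over unchanged.
        sampled = [
            frozenset(i for i in range(n_edges) if rng.random() < keep_prob[i])
            for _ in range(population_size - len(elite))
        ]
        # Drop duplicates (keeping order), then rank by score.
        ranked = sorted(dict.fromkeys(elite + sampled), key=score, reverse=True)
        selected = ranked[:selection_number]
        elite = ranked[:elitism_number]
        # Re-estimate edge probabilities from the selected subnetworks.
        keep_prob = [
            min(0.95, max(0.05, sum(i in s for s in selected) / len(selected)))
            for i in range(n_edges)
        ]
    return selected


# Toy score (assumption): closeness to a known target set of edges.
target = frozenset({0, 2})
best = evolve(4, lambda s: -len(s ^ target))
print(sorted(best[0]))
```

In this sketch a low selection number makes keep_prob collapse quickly onto a few subnetworks, while the elite guarantees that the best candidate found so far is never lost.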

Brief explanation about advanced configuration parameters

Updating scheme. A Boolean dynamical system requires an updating scheme. The synchronous updating scheme assumes that all the genes that change from one step to the next change at the same time, whereas the asynchronous updating scheme does not [2]. PRUNET offers both synchronous and asynchronous sequential updating schemes. The latter requires an updating sequence that should be provided by the user. In this sequence, the order is determined by the response time under regulation, that is, the delay between the signal from the gene regulators and the gene product reaching functional levels. Genes at the top of the list should respond faster than those at the bottom. When the asynchronous updating scheme is selected, the program provides a default list without biological meaning (genes in alphabetical order). The user should replace this list with a valid one. In the absence of information about gene responses, we suggest using the synchronous updating scheme. Note that attractors computed using different updating schemes (or different updating sequences) may differ.
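The effect of the updating scheme on the attractors can be reproduced on a two-gene toy network. The Python sketch below is an illustration, not PRUNET's implementation. It assumes a simple Boolean rule (a gene is on if at least one activator is on and no inhibitor is on; genes without regulators keep their value), which may differ from the rule PRUNET applies. The genes A and B are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, int, str]  # (source, sign, target); +1 activation, -1 inhibition
State = Dict[str, int]


def next_value(gene: str, state: State, edges: List[Edge]) -> int:
    """Assumed rule: on if some activator is on and no inhibitor is on."""
    regulators = [(src, sign) for src, sign, tgt in edges if tgt == gene]
    if not regulators:
        return state[gene]  # unregulated genes keep their value
    activated = any(state[src] for src, sign in regulators if sign > 0)
    inhibited = any(state[src] for src, sign in regulators if sign < 0)
    return int(activated and not inhibited)


def step_synchronous(state: State, edges: List[Edge]) -> State:
    """All genes are updated at once from the same previous state."""
    return {gene: next_value(gene, state, edges) for gene in state}


def step_sequential(state: State, edges: List[Edge], order: List[str]) -> State:
    """Genes are updated one by one; later genes see earlier updates."""
    new = dict(state)
    for gene in order:
        new[gene] = next_value(gene, new, edges)
    return new


def find_attractor(state: State, step: Callable[[State], State],
                   max_steps: int = 1000) -> List[State]:
    """Iterate until a state repeats; return the repeating cycle."""
    seen: List[State] = []
    for _ in range(max_steps):
        if state in seen:
            return seen[seen.index(state):]
        seen.append(state)
        state = step(state)
    raise RuntimeError("no attractor found within max_steps")


edges = [("A", 1, "B"), ("B", 1, "A")]  # hypothetical positive feedback loop
start = {"A": 1, "B": 0}
sync = find_attractor(start, lambda s: step_synchronous(s, edges))
seq_ab = find_attractor(start, lambda s: step_sequential(s, edges, ["A", "B"]))
seq_ba = find_attractor(start, lambda s: step_sequential(s, edges, ["B", "A"]))
print(len(sync), seq_ab, seq_ba)
```

From the same starting state, the synchronous scheme oscillates between two states, while the two sequential orders settle into different fixed points, which is why the updating sequence should reflect real response times.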

Fixed interactions. If the user is very confident in specific interactions, PRUNET allows them to be kept fixed during the optimization process, so that they are included in all optimized subnetworks. Users only have to copy and paste the interactions to be preserved into the corresponding text-box.
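Conceptually, fixing interactions means that the pruning step is never allowed to drop them. A minimal Python sketch of that idea follows; how PRUNET enforces it internally is not described here, and the function sample_subnetwork and its keep_probability parameter are hypothetical.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, interaction symbol, target)


def sample_subnetwork(pkn: List[Edge], fixed: Set[Edge],
                      keep_probability: float, rng: random.Random) -> List[Edge]:
    """Randomly prune the PKN, but always keep the fixed interactions."""
    return [edge for edge in pkn
            if edge in fixed or rng.random() < keep_probability]


pkn = [("SNAI1", "->", "ZEB1"), ("SNAI1", "->", "ZEB2"), ("SNAI1", "-|", "CDH1")]
fixed = {("SNAI1", "-|", "CDH1")}

rng = random.Random(0)
samples = [sample_subnetwork(pkn, fixed, 0.5, rng) for _ in range(10)]
only_fixed = sample_subnetwork(pkn, fixed, 0.0, rng)
print(only_fixed)
```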

Type of network. PRUNET allows working with fixed and variable networks. A fixed network means that both the initial and the final stable cellular phenotypes should correspond to different attractors of a single network, whereas variable networks allow for changes in network topology. In the latter case the algorithm is applied independently to optimize the network topology for each stable cellular phenotype separately, resulting in two populations of optimized networks. Although some re-wiring can happen in reality and variable networks are closer to real biological processes, the main drawback of applying this concept to network contextualization is the increase in the number of alternative solutions (subnetworks) equally capable of explaining the experimental data, given that the dynamical model only has to explain one attractor. We suggest using fixed networks unless you have experimental validation for the loss or gain of specific interactions (which could be preserved as described above).

Type of optimization. PRUNET allows running the optimization process by sampling the probability distribution of positive circuits and independent edges (multivariate) or just independent edges (univariate). In the first case, treating the interactions within a positive circuit as a single entity captures the interdependency between variables (interactions) regarding their contribution to network stability: multistable behavior is possible only if all the interactions within the positive circuit are present. We suggest using the multivariate option only if the network is considered fixed (so multistability is required) and the network has been reconstructed from physical interactions. If the network is a network of influence, some of the circuits could be artifacts, and it becomes tricky to constrain the optimization process based on circuits that are abstractions of the flow of information along the network.
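A circuit is positive when the product of the signs of its interactions is positive, that is, when it contains an even number of inhibitions. The Python sketch below enumerates the simple circuits of a small signed network and reports their signs. It is a naive depth-first search intended for illustration on small networks, not the method used by PRUNET, and the genes A, B and C are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, int, str]  # (source, sign, target); +1 activation, -1 inhibition


def find_circuits(edges: List[Edge]) -> List[Tuple[List[str], int]]:
    """Return every simple circuit once, as (list of nodes, sign)."""
    out: Dict[str, List[Tuple[str, int]]] = {}
    for source, sign, target in edges:
        out.setdefault(source, []).append((target, sign))
    circuits: List[Tuple[List[str], int]] = []

    def walk(start: str, node: str, path: List[str], sign: int) -> None:
        for target, edge_sign in out.get(node, []):
            if target == start:
                circuits.append((path[:], sign * edge_sign))
            elif target > start and target not in path:
                # Only visit nodes that sort after the start node, so each
                # circuit is reported once, from its smallest node.
                path.append(target)
                walk(start, target, path, sign * edge_sign)
                path.pop()

    for start in sorted(out):
        walk(start, start, [start], 1)
    return circuits


edges = [("A", 1, "B"), ("B", 1, "A"),   # A <-> B: no inhibition, positive
         ("B", -1, "C"), ("C", 1, "B")]  # B -| C -> B: one inhibition, negative
circuits = find_circuits(edges)
positive = [nodes for nodes, sign in circuits if sign > 0]
print(circuits)
print(positive)
```

In the multivariate option described above, the interactions of each positive circuit would be kept or removed together; here only the detection step is shown.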

Examples

PRUNET includes three biological examples to help the user to get familiar with the formats and usage.

EMT. A transient phenomenon called epithelial to mesenchymal transition (EMT) occurs during both regular embryonic development and as a part of the metastatic cascade initiated by the breakdown of epithelial cell homeostasis in carcinomas. During the EMT, cells change their transcriptional programs, which leads to phenotypic and functional alterations, including the loss of epithelial features like cell-cell adhesions and cell polarity and the gain of cell motility and mesenchymal and stem-like properties. The EMT example consists of a previously published [3] regulatory core, with six genes and fifteen interactions and the phenotypes for the epithelial and mesenchymal states.

Th1-Th2. T lymphocytes are immune cells that can be roughly classified into two main categories: T-helper and T-cytotoxic. T-helper cells take part in cell- and antibody-mediated immune responses and are sub-divided into Th0 (precursor), effector Th1, Th2, Th17 and Treg. The Th1-Th2 example includes the previously published network [4], with thirty-six genes and sixty interactions, and the Th1 and Th2 phenotypes.

iPSC. Somatic cells can be converted back to a stem cell state (the so-called induced pluripotent stem cells, or iPSC) by overexpressing, in the somatic cells, three to four genes essential for regulating pluripotency. The iPSC example includes a previously published network [5] of 52 nodes and 123 interactions and the phenotypes corresponding to ‘differentiated’ and ‘pluripotent’.


References

1. Crespo I, Krishna A, Le Bechec A, del Sol A: Predicting missing expression values in gene regulatory networks using a discrete logic modeling optimization guided by network stable states. Nucleic Acids Research 2013, 41(1):e8.

2. Garg A, Di Cara A, Xenarios I, Mendoza L, De Micheli G: Synchronous versus asynchronous modeling of gene regulatory networks. Bioinformatics 2008, 24(17):1917-1925.

3. Moes M, Le Béchec A, Crespo I, Laurini C, Halavatyi A, Vetter G, del Sol A, Friederich E: A novel network integrating a miRNA-203/SNAI1 feedback loop which regulates epithelial to mesenchymal transition. PLoS ONE 2012, 7(4):e35440.

4. Mendoza L, Pardo F: A robust model to describe the differentiation of T-helper cells. Theory Biosci 2010, 129(4):283-293.

5. Chang R, Shoemaker R, Wang W: Systematic search for recipes to generate induced pluripotent stem cells. PLoS Computational Biology 2011, 7(12):e1002300.

This work is supported by the University of Luxembourg and the Luxembourg Centre for Systems Biomedicine.

Last modified: 2013-10-03