Estimating the Numbers of Gene Gains and Losses

When you use the programs here, please cite the following paper:
Yoshihito Niimura, Masatoshi Nei (2007) Extensive gains and losses of olfactory receptor genes in mammalian evolution. PLoS ONE 2: e708.
Download: Niimura_Nei_2007.zip
Version: 1.0
Updated: 2011/6/15
Size: 12 KB

The method to obtain the result in Figure 1D of Niimura and Nei (2007) is explained here.

First, download LINTREE from Prof. Masatoshi Nei's website.
Then construct a phylogenetic tree using its program 'njboot'.
'cladeP.njboot' is the output file of 'njboot' with 500 bootstrap replicates.

Next, compile the C program 'branchout.c'.
Run 'branchout' as follows:

branchout cladeP.njboot -o 1 2 3 4 5 6 7 8 > cladeP.branchout
This produces the output file 'cladeP.branchout'.
The numbers '1 2 3 4 5 6 7 8' given after '-o' are the outgroup gene numbers.

Now you can use the program 'mrcacount.pl' to estimate the numbers of ancestral genes and of gene gains and losses.
The usage is:

mrcacount.pl cladeP.branchout spetree.dat cladeP.genelist -b 350 > cladeP.mrca
'spetree.dat' specifies the phylogeny of the species examined.
In this example, the content of this file is:
Human~Macaq~Mouse~Rat~Dog~Cow~Oposs Platy 0
Human~Macaq~Mouse~Rat~Dog~Cow Oposs Platy
Human~Macaq~Mouse~Rat Dog~Cow Oposs~Platy
Human~Macaq Mouse~Rat Dog~Cow~Oposs~Platy
Human Macaq Mouse~Rat~Dog~Cow~Oposs~Platy
Mouse Rat Human~Macaq~Dog~Cow~Oposs~Platy
Dog Cow Human~Macaq~Mouse~Rat~Oposs~Platy
Each line corresponds to one node of a species tree.
This file contains seven lines, because there are seven nodes at which the number of ancestral genes was estimated (see Fig. 2A).
For example, the second line represents a node for the separation between opossum and placentals (including Human, Macaq, Mouse, Rat, Dog, Cow). Platypus is the outgroup for this node.
'0' in the third column stands for the outgroup of all mammals. The first line therefore corresponds to the node for the separation between platypus and opossum+placentals.
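
The layout of 'spetree.dat' described above can be illustrated with a short Python sketch. This is not part of the distributed programs and is not how 'mrcacount.pl' reads the file; the function name parse_spetree is hypothetical, and the reading of the three columns (two groups that separate at the node, followed by the remaining species, with '0' meaning that none is listed) is an assumption drawn from the description above.

```python
# Illustrative sketch only: reads the example 'spetree.dat' content shown above.
# Assumption: each line holds three whitespace-separated columns, species names
# within a column are joined by '~', and '0' means no species is listed.

SPETREE = """\
Human~Macaq~Mouse~Rat~Dog~Cow~Oposs Platy 0
Human~Macaq~Mouse~Rat~Dog~Cow Oposs Platy
Human~Macaq~Mouse~Rat Dog~Cow Oposs~Platy
Human~Macaq Mouse~Rat Dog~Cow~Oposs~Platy
Human Macaq Mouse~Rat~Dog~Cow~Oposs~Platy
Mouse Rat Human~Macaq~Dog~Cow~Oposs~Platy
Dog Cow Human~Macaq~Mouse~Rat~Oposs~Platy
"""


def split_species(column):
    """Turn one column such as 'Dog~Cow' into a set of species names."""
    if column == "0":
        return set()
    return set(column.split("~"))


def parse_spetree(text):
    """Return one (group_a, group_b, others) tuple of sets per non-empty line."""
    nodes = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        columns = line.split()
        if not columns:
            continue  # skip blank lines
        if len(columns) != 3:
            raise ValueError(f"line {line_number}: expected 3 columns, got {len(columns)}")
        group_a, group_b, others = (split_species(c) for c in columns)
        nodes.append((group_a, group_b, others))
    return nodes


nodes = parse_spetree(SPETREE)
all_species = set().union(*(a | b | o for a, b, o in nodes))

for group_a, group_b, others in nodes:
    # In this example every line should mention all eight species exactly once.
    assert (group_a | group_b | others) == all_species
    print(sorted(group_a), "|", sorted(group_b), "|", sorted(others))
```

In this example each of the seven lines partitions the same eight species, which makes for a simple consistency check when preparing a 'spetree.dat' file for a different set of species.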

'cladeP.genelist' gives the correspondence between gene names and species names.
'350' is the threshold bootstrap value, expressed as a count out of the 500 bootstrap replicates used in this case (i.e., 70%).
Typing 'mrcacount.pl' by itself displays a detailed explanation.
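
If a different number of bootstrap replicates is used, the value given to '-b' presumably needs to be scaled accordingly. The following Python sketch shows the arithmetic only; the function name threshold_count is hypothetical, and it assumes that '-b' expects a raw replicate count, as the example of 350 out of 500 suggests.

```python
# Illustrative arithmetic only; not part of the distributed programs.
# Assumption: '-b' takes a replicate count, not a percentage.

def threshold_count(replicates, percent):
    """Smallest whole number of replicates that is at least `percent` % of `replicates`."""
    if replicates <= 0 or not 0 <= percent <= 100:
        raise ValueError("replicates must be positive and percent within 0-100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-replicates * percent // 100)


# The example on this page: 350 of 500 replicates is 70%.
example = threshold_count(500, 70)
print(example)
```

For instance, threshold_count(1000, 70) gives the count matching the same 70% level for 1000 replicates.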

The output file 'cladeP.mrca' contains the result shown in Figure 1D.

