Conjoined gene

A conjoined gene (CG) is a gene whose transcripts combine at least part of one exon from each of two or more distinct known (parent) genes that lie on the same chromosome and are in the same orientation. In most cases (95%), the parent genes also translate independently into different proteins. In some cases, the transcripts formed by CGs are translated into chimeric or completely novel proteins.

Figure 1: Schematic representation of the formation of conjoined gene A-B from parent genes A and B.
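
The definition can be read as a simple test on genome coordinates. The following Python sketch is illustrative only and is not the method used by any of the cited groups: it assumes that each gene is described by a chromosome, a strand and a list of exon intervals, and it treats a transcript as conjoined when its exons overlap exons of at least two known genes on the same chromosome and strand. The names Exon, Gene, parent_genes and is_conjoined are hypothetical, and real annotation pipelines must also handle overlapping or nested genes, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Exon:
    start: int  # inclusive genomic coordinate
    end: int    # inclusive genomic coordinate


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    strand: str               # "+" or "-"
    exons: Tuple[Exon, ...]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if the two exon intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def parent_genes(transcript_exons: Sequence[Exon], chrom: str, strand: str,
                 genes: Sequence[Gene]) -> List[str]:
    """Names of known genes on the same chromosome and strand that
    contribute at least part of one exon to the transcript."""
    parents = []
    for gene in genes:
        if gene.chrom != chrom or gene.strand != strand:
            continue  # different chromosome or opposite orientation
        if any(overlaps(t, g) for t in transcript_exons for g in gene.exons):
            parents.append(gene.name)
    return parents


def is_conjoined(transcript_exons: Sequence[Exon], chrom: str, strand: str,
                 genes: Sequence[Gene]) -> bool:
    """A transcript is treated as conjoined if it has two or more parents."""
    return len(parent_genes(transcript_exons, chrom, strand, genes)) >= 2


# Toy annotation (made-up coordinates): A and B share chromosome and strand,
# C overlaps B's locus but lies on the opposite strand.
gene_a = Gene("A", "chr1", "+", (Exon(100, 200), Exon(300, 400)))
gene_b = Gene("B", "chr1", "+", (Exon(1000, 1100), Exon(1300, 1400)))
gene_c = Gene("C", "chr1", "-", (Exon(1000, 1100),))
annotation = [gene_a, gene_b, gene_c]

# A read-through transcript using exons of A and the last exon of B.
transcript_ab = [Exon(100, 200), Exon(300, 400), Exon(1300, 1400)]
print(parent_genes(transcript_ab, "chr1", "+", annotation))
print(is_conjoined(transcript_ab, "chr1", "+", annotation))
```

In this toy annotation, gene C is never reported as a parent of a plus-strand transcript because it lies in the opposite orientation, which mirrors the same-orientation requirement in the definition.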

Conjoined genes are also referred to by several alternative names, including fusion gene, fusion protein, read-through transcript, co-transcribed genes, bridged genes, spanning genes, hybrid genes and locus-spanning transcripts.

At present, 800 CGs have been identified in the entire human genome by different research groups, including Prakash et al., Akiva et al., Parra et al. and Kim et al., as well as in the 1% of the human genome covered by the ENCODE pilot project. Of these CGs, 36% could be validated experimentally using RT-PCR and sequencing. However, only a very limited number of them are represented in public human genome resources such as the Entrez Gene database, the UCSC Genome Browser and the Vertebrate Genome Annotation (Vega) database.

More than 70% of human conjoined genes are conserved in other vertebrate genomes, with higher-order vertebrates, including the chimpanzee (the closest living relative of humans), showing more conservation. The formation of CGs is not limited to the human genome: some CGs have also been identified in other eukaryotic genomes, including mouse and Drosophila.

A few web resources, such as ChimerDB and HYBRIDdb, include information about some CGs alongside other fusion genes. Another database, ConjoinG, is a comprehensive resource dedicated solely to the 800 conjoined genes identified in the entire human genome.