Conserved sequence

In biology, conserved sequences are similar or identical sequences that occur within nucleic acid sequences (such as RNA and DNA sequences), protein sequences, protein structures or polymeric carbohydrates across species (orthologous sequences) or within different molecules produced by the same organism (paralogous sequences). In the case of cross-species conservation, this indicates that a particular sequence may have been maintained by evolution despite speciation. The further back in the phylogenetic tree a particular conserved sequence can be traced, the more highly conserved it is said to be. Since sequence information is normally transmitted from parents to progeny by genes, a conserved sequence implies that there is a conserved gene.

It is widely believed that a mutation in a "highly conserved" region leads to a non-viable life form, or to a form that is eliminated through natural selection.

Conserved nucleic acid sequences
Highly conserved DNA sequences are thought to have functional value, although the role of many highly conserved non-coding DNA sequences is not understood. One study that deleted four highly conserved non-coding DNA sequences in mice yielded viable mice with no significant phenotypic differences; the authors described their findings as "unexpected".

Many regions of DNA, including highly conserved DNA sequences, consist of repeated sequence elements. One possible explanation of the null result above is that removing only one copy, or a subset, of a repeated sequence could preserve phenotypic function if a single copy is sufficient and the repetitions are superfluous to essential life processes. The paper did not specify whether the eliminated sequences were repeated sequences.

The TATA box, a promoter sequence, is an example of a highly conserved DNA sequence, being found in most eukaryotes.

Conserved protein sequences and structures
Highly conserved proteins are often required for basic cellular function, stability or reproduction. Conservation of protein sequences is indicated by the presence of identical amino acid residues at analogous parts of proteins. Conservation of protein structures is indicated by the presence of functionally equivalent, though not necessarily identical, amino acid residues and structures between analogous parts of proteins.

An amino acid sequence alignment between two human zinc finger proteins, with GenBank accession numbers AAB24882 and AAB24881, illustrates this. The alignment was carried out using the ClustalW sequence alignment program. Conserved amino acid sequences are marked by strings of asterisks (*) on the third line of the sequence alignment. The alignment shows that these two proteins contain a number of conserved amino acid sequences (represented by identical letters aligned between the two sequences).
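
The asterisk convention for marking identical residues can be sketched in a few lines of Python. This is a minimal illustration and not ClustalW itself: the function name conservation_line and the two short toy sequences are assumptions made for the example, and the sequences are not the GenBank entries named above. The sketch also assumes the two sequences have already been aligned to equal length, with "-" standing for a gap.

```python
def conservation_line(seq_a: str, seq_b: str, gap: str = "-") -> str:
    """Return a marker line for two pre-aligned sequences.

    A '*' is placed under every column where both sequences carry the
    same residue; gap columns and mismatches get a space.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if a == b and a != gap else " "
        for a, b in zip(seq_a, seq_b)
    )


# Hypothetical toy sequences, already aligned (illustrative only).
toy_a = "CPECGKSF-SQK"
toy_b = "CEECGKAFNSQ-"

print(toy_a)
print(toy_b)
print(conservation_line(toy_a, toy_b))
```

Printed one above the other, the three lines reproduce the layout described in the text: two sequence rows followed by a row of asterisks under the identical columns.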



Conserved polymeric carbohydrate sequences
The monosaccharide sequence of the glycosaminoglycan heparin is conserved across a wide range of species.

Biological role of sequence conservation
Sequence similarities serve as evidence for structural and functional conservation, as well as of evolutionary relationships between the sequences. Consequently, comparative analysis is the primary means by which functional elements are identified.
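
One simple form of comparative analysis is to slide a window along two aligned sequences and flag stretches with high identity as candidate conserved regions. The Python sketch below illustrates that idea only. The function names, the window size, the identity threshold and the toy sequences are all assumptions chosen for the example; real analyses typically compare many species and use statistical models of substitution.

```python
def window_identity(seq_a, seq_b, window=5, gap="-"):
    """Fraction of identical, non-gap columns in each sliding window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if window < 1 or window > len(seq_a):
        raise ValueError("window must be between 1 and the alignment length")
    # 1 where the column is an identical non-gap residue, else 0.
    matches = [1 if a == b and a != gap else 0 for a, b in zip(seq_a, seq_b)]
    return [
        sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_windows(seq_a, seq_b, window=5, threshold=0.8):
    """Start positions of windows whose identity meets the threshold."""
    scores = window_identity(seq_a, seq_b, window)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Hypothetical pre-aligned toy DNA sequences (illustrative only).
toy_a = "ACGTTATAAAGGCT"
toy_b = "TCGATATAAAGCCA"

# Windows that are identical at every position.
print(conserved_windows(toy_a, toy_b, window=5, threshold=1.0))
```

Lowering the threshold widens the flagged region, which is why the cutoff is a judgment call in this kind of analysis.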

Among the most highly conserved sequences are the active sites of enzymes and the binding sites of protein receptors.