Homeodomain fold

The homeodomain fold is a protein structural domain that binds DNA or RNA and is thus commonly found in transcription factors. The fold consists of a 60-amino-acid helix-turn-helix structure in which three alpha helices are connected by short loop regions. The two N-terminal helices are antiparallel, and the longer C-terminal helix is roughly perpendicular to the axes established by the first two. It is this third helix that interacts directly with DNA. Homeodomain folds are found exclusively in eukaryotes but closely resemble lambda phage proteins that alter the expression of genes in prokaryotes. Many homeodomain proteins induce cellular differentiation by initiating the cascades of coregulated genes required to produce individual tissues and organs, while others, such as Nanog, are involved in maintaining pluripotency.

Homeobox genes
The homeobox is a stretch of DNA about 180 nucleotides long that encodes a homeodomain (60 codons of three nucleotides each). Homeobox genes code for homeodomain proteins in both vertebrates and invertebrates. Homeoboxes were first discovered in Drosophila, where the radical alterations that resulted from mutations in homeobox genes were termed homeotic mutations. The most famous such mutation is Antennapedia, in which legs grow from the head of a fly in place of the antennae. Homeobox genes are critical in the establishment of body axes during embryogenesis.

The consensus 60-residue polypeptide chain is shown below, with typical intron positions marked by dashes:

RRRKRTA-YTRYQLLE-LEKEFLF-NRYLTRRRRIELAHSL-NLTERHIKIWFQN-RRMK-WKKEN
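As a quick consistency check, the consensus above can be parsed to confirm that it contains 60 residues, corresponding to the roughly 180 nucleotides of the homeobox. The following Python sketch is illustrative only; the function names are hypothetical, and the dashes are interpreted, as stated above, as typical intron positions.

```python
# Consensus homeodomain sequence as written above; dashes mark
# typical intron positions and are not residues.
CONSENSUS_WITH_BREAKS = (
    "RRRKRTA-YTRYQLLE-LEKEFLF-NRYLTRRRRIELAHSL-NLTERHIKIWFQN-RRMK-WKKEN"
)


def strip_breaks(seq):
    """Return the sequence with the dash markers removed."""
    return seq.replace("-", "")


def break_positions(seq):
    """Return, for each dash, the number of residues that precede it."""
    positions = []
    residues_seen = 0
    for char in seq:
        if char == "-":
            positions.append(residues_seen)
        else:
            residues_seen += 1
    return positions


consensus = strip_breaks(CONSENSUS_WITH_BREAKS)
residue_count = len(consensus)
# Three nucleotides per codon gives the length of the coding homeobox.
nucleotide_count = residue_count * 3

print(residue_count, nucleotide_count, break_positions(CONSENSUS_WITH_BREAKS))
```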

The motif has been highly conserved over hundreds of millions of years of evolutionary history: the corresponding nucleotide sequence typically matches the consensus sequence at about 80% of positions across species, genera and phyla.
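A match of this kind is usually expressed as percent identity between two aligned sequences of equal length. The sketch below shows one simple way to compute it; the function name is hypothetical, the two short sequences are invented toy data rather than real homeobox sequences, and no gap handling or alignment step is included.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two pre-aligned sequences agree.

    Assumes both sequences are already aligned and of equal length;
    this sketch does not perform the alignment itself.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Invented toy sequences for illustration only (not real homeobox data).
toy_consensus = "ATGCATGCAT"
toy_variant = "ATGCATGGAA"

identity = percent_identity(toy_consensus, toy_variant)
print(identity)
```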

Sequence specificity
Homeodomains can bind both specifically and nonspecifically to B-DNA, with the C-terminal recognition helix aligning in the DNA's major groove and the unstructured peptide "tail" at the N-terminus aligning in the minor groove. The recognition helix and the inter-helix loops are rich in arginine and lysine residues, which form hydrogen bonds to the DNA backbone; conserved hydrophobic residues in the center of the recognition helix help stabilize the helix packing. Homeodomain proteins show a preference for the DNA sequence 5'-ATTA-3'; sequence-independent binding occurs with significantly lower affinity.
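Candidate binding sites containing the preferred 5'-ATTA-3' core can be located by a simple string scan. The sketch below is a minimal illustration with hypothetical function names and an invented test sequence; it also reports the reverse complement (5'-TAAT-3'), on the assumption that a site on either strand of double-stranded DNA is of interest. It says nothing about actual binding affinity.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def find_core_sites(dna, core="ATTA"):
    """Return (start_index, strand) for every occurrence of the core.

    '+' means the core appears on the given strand; '-' means its
    reverse complement appears there, i.e. the core lies on the
    opposite strand. Indices are 0-based on the given strand.
    """
    dna = dna.upper()
    rc_core = reverse_complement(core)
    width = len(core)
    sites = []
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        if window == core:
            sites.append((start, "+"))
        # Skip the second check for palindromic cores to avoid duplicates.
        if window == rc_core and rc_core != core:
            sites.append((start, "-"))
    return sites


# Invented example sequence, for illustration only.
example_sites = find_core_sites("GGATTACCTAATGG")
print(example_sites)
```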

POU proteins
Proteins containing a POU region consist of a homeodomain and a separate, structurally homologous POU domain that contains two helix-turn-helix motifs and also binds DNA. The two domains are linked by a flexible loop that is long enough to stretch around the DNA helix, allowing the two domains to bind on opposite sides of the target DNA, collectively covering an eight-base segment with consensus sequence 5'-ATGCAAAT-3'. The individual domains of POU proteins bind DNA only weakly, but have strong sequence-specific affinity when linked. The POU domain itself has significant structural similarity to repressors expressed in bacteriophages, particularly lambda phage.
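Because the octamer is a consensus, real sites may differ from it at one or more positions. The following sketch scans a sequence for eight-base windows within a chosen number of mismatches of 5'-ATGCAAAT-3'. The function names, the mismatch threshold and the test sequence are all assumptions made for illustration; mismatch counting is a crude stand-in and not a model of binding affinity.

```python
OCTAMER = "ATGCAAAT"


def hamming_distance(seq_a, seq_b):
    """Count the positions at which two equal-length strings differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def scan_octamer(dna, consensus=OCTAMER, max_mismatches=1):
    """Return (start_index, window, mismatches) for near-consensus windows.

    Only the given strand is scanned; indices are 0-based.
    """
    dna = dna.upper()
    width = len(consensus)
    hits = []
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        mismatches = hamming_distance(window, consensus)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Invented example: one exact octamer and one single-mismatch variant.
example_hits = scan_octamer("CCATGCAAATGGATGCTAATCC")
print(example_hits)
```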

Dlx proteins
Vertebrates have six genes from the Dlx family of homeodomain transcription factors, arranged into three clusters: Dlx1/Dlx2, Dlx3/Dlx4 and Dlx5/Dlx6. All six are homologs of the fly gene Distal-less. Dlx genes are involved in the development of the nervous system and of limbs.