Genome-wide association study

In genetic epidemiology, a genome-wide association study (GWA study, or GWAS), also known as whole genome association study (WGA study, or WGAS), is an examination of all or most of the genes (the genome) of different individuals of a particular species to see how much the genes vary from individual to individual. Different variations are then associated with different traits, such as diseases. In humans, this technique has led to the discovery of associations of particular genes with diseases such as the eye disease known as age-related macular degeneration, diabetes and Alzheimer's disease. Hundreds or thousands of individuals are typically tested, usually for single DNA mutations (single-nucleotide polymorphisms, or SNPs). Over 1,200 human GWASs have examined over 200 diseases and traits, and found almost 4,000 SNP associations. These studies are useful in finding the molecular pathways of disease, but usually not useful in finding genes that predict risks of disease.

These studies normally compare the DNA of two groups of participants: people with the disease (cases) and similar people without (controls). Each person gives a sample of cells, such as a swab of cells from the inside of the cheek. DNA is extracted from these cells and spread on gene chips, which can read millions of DNA sequences. The chip readings are loaded into computers, where they can be analyzed with bioinformatics techniques. Rather than reading the entire DNA sequence, these systems usually read SNPs, which are variations in single nucleotides.

If genetic variations are more frequent in people with the disease, the variations are said to be "associated" with the disease. The associated genetic variations are then considered as pointers to the region of the human genome where the disease-causing problem is likely to reside. Two methods are used to search for disease-associated mutations: hypothesis-driven and non-hypothesis-driven methods. Hypothesis-driven methods start with the hypothesis that a particular gene may be associated with a particular disease, and try to find the association. Non-hypothesis-driven studies use brute-force methods to scan the entire genome, and see which genes demonstrate an association. GWASs are generally non-hypothesis-driven.
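The idea of a variation being "more frequent in people with the disease" can be made concrete with a small sketch. The example below assumes a single SNP summarized as counts of two alleles in cases and controls, and applies a Pearson chi-square test with one degree of freedom. The function name and the counts are hypothetical, chosen only for illustration; real studies use dedicated software and often other test statistics.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the alternate and reference allele.
    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele and disease status were independent
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternate allele makes up 30% of case alleles
# and 25% of control alleles.
chi2, p_value = allelic_chi_square(600, 1400, 500, 1500)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2g}")
```

A small p-value here only says that the allele frequencies differ between the two groups; it does not by itself show that the variant causes the disease.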

Surprisingly, most of the SNP variations associated with disease are not in the regions of DNA that code for a protein. Instead, they are usually in the large non-coding regions on the chromosome between genes, or in the intron sequences that are spliced out of the RNA transcript before the protein is made. These are presumably sequences of DNA that control other genes, but usually their function is not known.

Background
The human genome contains many millions of single-nucleotide polymorphisms, and thousands more variations in the number of copies of large and small segments of the genome (copy number variation). These variations may either directly cause changes in phenotype or tag nearby mutations containing the key differences that influence individual variation and susceptibility to disease. GWA studies allow researchers to sample 500,000 or more SNPs from each subject in a study, capturing variation uniformly across the genome. To date, these studies have identified risk and protective factors for asthma, cancer, diabetes, heart disease, mental illness, and other human differences.

Most genetic variations are associated with the geographical and historical populations in which the mutations first arose. This ability of SNPs to tag surrounding blocks of ancient DNA (haplotypes) underlies the rationale for GWAS. However, because of this, studies must take account of the geographical and racial background of participants, controlling for what is called population stratification. As the peoples of the world have migrated and inter-married over many generations, these geographical variations also become broken down and mixed over time.
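A small numerical sketch shows why population stratification matters. It assumes two hypothetical populations in which the allele has no effect on disease, but which differ in both allele frequency and the proportion of cases sampled. Pooling them produces a spurious association, while a stratified estimate (here the classical Mantel-Haenszel odds ratio, used only as one simple illustration; GWASs commonly use other corrections) removes it. All counts and names are invented for the example.

```python
def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio of carrying the alternate allele in cases versus controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of 2x2 tables.

    Each stratum is a tuple (case_alt, case_ref, ctrl_alt, ctrl_ref).
    """
    numerator = 0.0
    denominator = 0.0
    for case_alt, case_ref, ctrl_alt, ctrl_ref in strata:
        n = case_alt + case_ref + ctrl_alt + ctrl_ref
        numerator += case_alt * ctrl_ref / n
        denominator += case_ref * ctrl_alt / n
    return numerator / denominator


# Hypothetical allele counts, (case_alt, case_ref, ctrl_alt, ctrl_ref):
# population A has allele frequency 0.5 and contributes mostly cases,
# population B has allele frequency 0.1 and contributes mostly controls.
# Within each population the allele is equally common in cases and controls.
population_a = (400, 400, 100, 100)
population_b = (20, 180, 80, 720)

# Naive analysis: add the two tables together and ignore ancestry
pooled = tuple(a + b for a, b in zip(population_a, population_b))

print("pooled odds ratio:    ", round(odds_ratio(*pooled), 2))
print("stratified odds ratio:", round(mantel_haenszel_or([population_a, population_b]), 2))
```

The pooled table suggests a strong association even though none exists within either population, which is the kind of false signal that controlling for stratification is meant to prevent.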

Genes identified
In 2005, a GWAS found an association between age-related macular degeneration (ARMD) and a variation in the gene for complement factor H (CFH). Complement is a system of proteins that regulates inflammation. This association was unexpected from previous research in ARMD, and identified ARMD as an inflammatory process. Together with four other variants, this variation can predict half the risk of ARMD between siblings, making it among the most successful examples of GWAS.

In 2007, a GWAS found an association between type 2 diabetes and several SNPs in the genes TCF7L2, SLC30A8 and others.

In 2007, the Wellcome Trust Case Control Consortium carried out genome-wide association studies for coronary heart disease, type 1 diabetes, type 2 diabetes, rheumatoid arthritis, Crohn's disease, bipolar disorder, and hypertension. This study was successful in uncovering many new genes underlying these diseases.

Genes in many traditional genetic diseases, such as hemophilia, are always associated with the disease. Other genes are associated with an increased risk. Disappointingly, most of the SNP variations found by GWAS are associated with only a small increased risk of the disease, and have only a small predictive value. The median odds ratio is 1.33 per SNP, with some variants carrying odds ratios above 3.0, and some exceeding 12.0. A common pattern is that a few variants have a large effect, but most have small effects.
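For readers unfamiliar with the measure, an odds ratio of 1.0 means the variant is equally common in cases and controls, and values above 1.0 mean it is more common in cases. The sketch below computes an odds ratio and an approximate 95% confidence interval (using the standard log-odds normal approximation) from a 2x2 table of allele counts. The function name and counts are hypothetical and are not taken from any study cited here.

```python
import math


def odds_ratio_with_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Odds ratio and approximate confidence interval from allele counts.

    z=1.96 gives an approximate 95% interval. All four counts must be non-zero.
    """
    estimate = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of the log odds ratio
    se_log = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts giving an odds ratio close to the median value quoted above
estimate, lower, upper = odds_ratio_with_ci(600, 1400, 500, 1500)
print(f"odds ratio = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1.0 indicates that the data are unlikely under no association, but an odds ratio this small still separates cases from controls poorly, which is why such variants have little predictive value on their own.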

Clinical applications
One of the challenges for a successful GWAS in the future will be to apply the findings in a way that accelerates drug and diagnostics development, including better integration of genetic studies into the drug-development process and a focus on the role of genetic variation in maintaining health as a blueprint for designing new drugs and diagnostics. One such success is the identification of genetic variants associated with response to anti-hepatitis C virus treatment. For genotype 1 hepatitis C treated with pegylated interferon-alpha-2a or pegylated interferon-alpha-2b (brand names Pegasys or PEG-Intron) combined with ribavirin, a GWAS has shown that genetic polymorphisms near the human IL28B gene, encoding interferon lambda 3, are associated with significant differences in response to the treatment. A later report demonstrated that the same genetic variants are also associated with the natural clearance of the genotype 1 hepatitis C virus.

Problems
GWA studies are necessarily hypothesis-free: that is, they search the entire genome for associations rather than focusing on small candidate areas. This aspect of GWA has attracted criticism as expensive "factory science". Robert Elston is a prominent proponent of linkage analysis, although he does accept that association may occasionally be useful. Methodologically, the power of association to localize a mutation translates directly into the need for extremely dense searches. This led Pearson and Manolio to note that "the GWA approach can also be problematic because the massive number of statistical tests performed presents an unprecedented potential for false-positive results". Alternative strategies such as linkage analysis act as systematic studies of variation, without needing variants at each region.
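The scale of the false-positive problem can be illustrated with simple arithmetic. The sketch below assumes 500,000 tested SNPs and a conventional per-test significance level of 0.05, and applies the Bonferroni correction, which is one simple and conservative way of adjusting for multiple tests (it is shown as an illustration, not as the method of any particular study). The function names and SNP identifiers are hypothetical.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of tests passing an uncorrected threshold
    when no SNP is truly associated with the trait."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the chance of any
    false positive across all tests at or below alpha."""
    return alpha / n_tests


def significant_snps(p_values, alpha=0.05):
    """Return the SNP identifiers whose p-value passes the Bonferroni threshold.

    p_values maps a SNP identifier to its association p-value.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [snp for snp, p in p_values.items() if p < threshold]


n_snps = 500_000
print("expected false positives at p < 0.05:", expected_false_positives(0.05, n_snps))
print("Bonferroni per-SNP threshold:", bonferroni_threshold(0.05, n_snps))

# Toy example with three hypothetical SNPs (threshold is 0.05 / 3)
print(significant_snps({"snp1": 1e-9, "snp2": 0.03, "snp3": 0.2}))
```

With half a million tests, an uncorrected threshold of 0.05 would be expected to flag about 25,000 SNPs by chance alone, which is why much stricter per-SNP thresholds are applied.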