Haplotype Mapping: The Next Genomic Frontier

by Brenda Patoine

January 2004


Steven E. Hyman, M.D.

Provost, Harvard University

Q: The National Institutes of Health (NIH), in collaboration with research institutions in several other countries, has launched a major initiative to develop a Haplotype Map (Hap Map). Why is it important to have such a map, and what will it contribute to genomics research?

SH: The planned Haplotype Map is the next logical step in mobilizing tools for gene discovery. The most common type of variation in the human genome is the single nucleotide polymorphism, or SNP, a single-base difference at a genetic locus from person to person. Millions of SNPs have been found, making it imperative that we find efficient and cost-effective ways of using them. The Haplotype Map is based on the recognition that the development of genetic variation from ancestral chromosomes has not proceeded uniformly across the genome.

Rather, there appear to be regions in which recombination is more likely to occur, thus shuffling the genetic deck at those points. There are other regions where it is less likely to occur, leaving relatively large blocks intact. These blocks, or haplotypes, can be identified by a small number of SNPs. Wise use of genetic markers will be enhanced by knowing the boundaries of these blocks. To be sure, a clear haplotype structure may not be apparent everywhere in the genome, but knowledge of the haplotype structure of the genome will speed the search for loci that confer disease risk.

Q: How might the information gleaned from the Hap Map influence drug treatment in the future?  

SH: The Hap Map should help us use genetic markers wisely, to speed up (and to make affordable) association studies based on candidate genes and, ultimately, whole-genome association studies. Without the Hap Map, the choice of markers for association studies will remain more or less a matter of guesswork.

Q: What’s the rationale for focusing on haplotypes as opposed to SNPs? Aren't we better off cataloguing all of the SNPs in DNA?  

SH: Even though the costs of genotyping have come down, it would still be prohibitively expensive, not to mention quite cumbersome, to catalogue all SNPs in a DNA region for every person in a study. 

Q: What do you think will be the first clinically relevant advances that will come out of the Hap Map?  

SH: Hastening the discovery of genes that confer risk for disease—for which diseases one can only guess. 

Q: During your tenure at the National Institute of Mental Health (NIMH), you launched a program designed to help speed the search for mental illness genes. Though not for lack of trying, progress in this area remains slow. Why?

SH: Since mental illness results from the interaction of multiple genetic loci and nongenetic factors, the signal provided by any one locus is relatively small and therefore hard to detect. This is why we undertook to fund the collection of large samples to investigate the genetics of schizophrenia, bipolar disorder, early-onset major depression, autism, and other disorders. The careful phenotyping of individuals and collection of DNA is frustratingly slow but is proceeding well. Recognizing the difficulties before us, NIMH has been a leader in supporting fields such as statistical genetics and genomic technologies. The benefits of these investments are just beginning to become apparent, with several compelling associations emerging for schizophrenia.


David Altshuler, M.D., Ph.D.

Assistant Professor of Genetics and of Medicine, Harvard Medical School, Massachusetts General Hospital

Founding Member; Director, Medical and Population Genetics, The Broad Institute

Q: You are leading the international effort to create a Haplotype Map (Hap Map). Why is the Hap Map so critical to finding solutions to diseases that may involve multiple genes?

DA: Human genetics has been successful at identifying genes causing disease in cases where those genes are the sole cause of the disease. This has been very helpful in many diseases, including many neurological diseases, such as Huntington’s disease, Lou Gehrig’s disease, and others. But almost none of the common diseases are caused by a single gene. Rather, they are caused by the combination of many genes and environment and behavior and just bad luck. They’re much more complex.

So one of the approaches that has been made possible with the sequencing of the human genome is to actually go and collect all of those common variations in the human genome and ask whether any of them influence risk of disease. We now know the list of human genes, but we don’t know how they actually vary in the population. So the Haplotype Map project is part of a multi-year effort to characterize how genes vary in the population.

Q: What do SNPs have to do with the Haplotype Map?

DA: SNPs, or single nucleotide polymorphisms, are places where people’s genomes commonly differ, meaning there’s a fairly high frequency (more than 1 percent) of a second version of that position in the human genome. So some people have an A in the DNA code and other people have a T. That’s a variation by a single nucleotide, or letter, in the DNA code.

There are estimated to be something like 10 million such variations in the human genome sequence, which is about 3 billion letters long in total. Of those 10 million variations in the DNA code, something like 6 million have been discovered over the past 5 years and are now available in a public database, the Public SNP Map, which we also played a role in creating.
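To make the 1 percent threshold concrete, here is a minimal Python sketch that computes, for a single position, the frequency of the less common letter across sampled chromosomes and flags the site as a common SNP when that frequency exceeds 1 percent. The function names, the one-letter-per-chromosome string encoding, and the assumption that each site has at most two versions are illustrative choices, not the format of any SNP database.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the less common letter at one position.

    Hypothetical encoding: one letter per sampled chromosome, e.g. "AATA".
    Assumes a biallelic site (at most two distinct letters).
    """
    counts = Counter(alleles)
    if len(counts) > 2:
        raise ValueError("expected at most two versions of this position")
    if len(counts) < 2:
        return 0.0  # every sampled chromosome carries the same letter
    return min(counts.values()) / len(alleles)


def is_common_snp(alleles, threshold=0.01):
    """True if the second version appears on more than `threshold` of chromosomes."""
    return minor_allele_frequency(alleles) > threshold


if __name__ == "__main__":
    sample = "A" * 95 + "T" * 5  # 5 of 100 sampled chromosomes carry a T
    print(minor_allele_frequency(sample), is_common_snp(sample))
```

A site where only 1 of 1,000 chromosomes differs would fall below the 1 percent cutoff and would not count as a common SNP under this definition.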

Q: The Haplotype Map goes a step beyond the SNP Map, to try to decipher the relationships among SNPs. Why do we need to understand that?

DA: What has been recognized in theory for decades, but not understood empirically in detail until the last year or two, is that nearby SNPs are actually highly correlated with one another. So if you know that a person has an A instead of a T at one position, and you look at another SNP nearby, there’ll be a strong correlation between the two.

So why would that be? It’s sort of puzzling. It turns out that when someone inherits a SNP, what they inherit is not actually that one SNP. They inherit a piece of a chromosome. When you look at human DNA, even at large populations of apparently unrelated people, you recognize that some group of people, let’s say 20 percent, all inherited the exact same stretch of DNA, which might be tens of thousands of letters long. Every letter is the same in those 20 percent of people. But in another percentage of people, there’s a different copy. Our total DNA is still more than 99 percent identical, but there’s a whole bunch of SNPs that are inherited identically by one set of people, and a whole bunch inherited a slightly different way by a second group of people, and then there’s a third pattern and a fourth pattern and a fifth pattern. These stretches of DNA that are inherited identically by a whole bunch of people in the populations are called haplotypes. The Haplotype Map is basically charting these patterns across the genome.
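In population genetics, the correlation between nearby SNPs described here is usually called linkage disequilibrium. The sketch below computes one common measure of it, r², from phased haplotypes, where each haplotype is a string with one letter per SNP read from a single chromosome. The interview does not say which statistic the Hap Map relies on; the function name and the toy data are hypothetical.

```python
def ld_r2(haplotypes, i, j):
    """Squared correlation r^2 between the alleles at SNP positions i and j.

    haplotypes: equal-length strings, one per phased chromosome,
    e.g. "AC" means letter A at SNP 0 and letter C at SNP 1.
    """
    n = len(haplotypes)
    # Use the letters on the first haplotype as the reference allele at each SNP.
    a = haplotypes[0][i]
    b = haplotypes[0][j]
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # the disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("one of the SNPs does not vary in this sample")
    return d * d / denom


if __name__ == "__main__":
    # The two SNPs always travel together: knowing one predicts the other.
    print(ld_r2(["AC", "AC", "TG", "TG"], 0, 1))
    # Every combination appears equally often: no correlation.
    print(ld_r2(["AC", "AG", "TC", "TG"], 0, 1))
```

An r² of 1 means one SNP's letter fully predicts the other's, which is exactly the situation in which typing both adds no information.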

Q: What does the existence of such patterns of inheritance in human DNA suggest about human ancestry?

DA: These patterns actually represent shared ancestors of the human population who lived a long time ago. People with the exact same sequence across that stretch of DNA inherited that stretch identically from an ancestor who lived, say, 100,000 years ago. When you look at any given stretch of DNA, it looks like there are only 4 or 5 ancestors who are actually represented in the whole human population across that little stretch of DNA. That’s not to say there are only 4 or 5 ancestors in the whole human population. When you move down the chromosome, there are another 4 or 5 ancestors, but they’re not the same 4 or 5.


If you add it all up, we’re pretty sure that the total number of ancestors of the current human population is on the order of 10,000. That’s a very small number given that there are 6 billion people on the planet today. But of those 10,000, only 4 or 5 are represented in any appreciable frequency in any one region in the DNA.

The Haplotype Map is not being done to investigate human ancestry, even though that’s very interesting. The reason is more practical: If it’s the case that there are 100 SNPs in a row that all track together as a unit, then it’s not necessary in disease studies to test all 100. It’s only necessary to know that they track together, and to test a sufficient subset to capture the pattern of diversity. The idea is that you first type a lot of SNPs across the genome, maybe 10 or 15 percent of the 10 million, in hundreds of people. You can then see that in a certain region, there’s a particularly simple pattern, where these 20 SNPs form only 5 different combinations. Then you know that if you can pick a subset of SNPs sufficient to define those 5 patterns, you’ve got the information. You don’t need to type all 20. By doing that across the whole genome, you develop a resource by which people can conduct much more efficient genetic studies. The Haplotype Map is in the same spirit as the Human Genome Project, which said “here’s the genome sequence; there are lots of things you can use in your experiments.”
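The idea of typing only a subset of SNPs sufficient to define the observed patterns can be illustrated with a simple greedy search: keep adding the SNP that best separates the distinct haplotype patterns until every pattern has a unique signature at the chosen positions. This is a hypothetical teaching sketch that assumes phased, error-free haplotypes; it is not the Hap Map project's tag-selection procedure, and practical tagging methods often work from pairwise correlation (r²) thresholds instead.

```python
def distinct_haplotypes(haplotypes):
    """The distinct haplotype patterns observed in a region, sorted."""
    return sorted(set(haplotypes))


def pick_tag_snps(haplotypes):
    """Greedily choose SNP positions until every distinct haplotype pattern
    has a unique combination of letters at the chosen positions."""
    if not haplotypes:
        return []
    patterns = distinct_haplotypes(haplotypes)
    n_snps = len(patterns[0])
    chosen = []

    def n_groups(positions):
        # How many patterns can be told apart using only these positions.
        return len({tuple(p[k] for k in positions) for p in patterns})

    while n_groups(chosen) < len(patterns):
        # Add the SNP that separates the most patterns together with those
        # already chosen; max() keeps the lowest position on ties.
        best = max(range(n_snps), key=lambda k: n_groups(chosen + [k]))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical region: 6 SNPs, 5 chromosomes, 3 distinct patterns.
    region = ["AAGCTA", "AAGCTA", "TCGATA", "TCGATA", "AACCTG"]
    print(distinct_haplotypes(region))
    print(pick_tag_snps(region))
```

With this toy data, positions 0 and 2 are enough to tell the three patterns apart, so a study would need to type 2 SNPs rather than 6. The loop always terminates because any two distinct patterns differ at some position, and adding that position increases the number of separated groups.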

Q: How does one then use that information to find the gene for a disease or some complex trait?

DA: Let’s say a scientist is studying a brain disease, such as schizophrenia, and the goal is to determine if inheriting a particular gene, or region of DNA, influences risk. The ultimate answer to that question would be to sequence every base of every one of those genes in every person in the study, tally up all the changes, and ask if any are different in frequency between the schizophrenics and those who didn’t have schizophrenia. That’s prohibitively expensive, so people have to have an extremely narrow focus before they can start doing the sequencing. That’s probably a big reason why there’s been so little progress in these genetic fields: there was no way to narrow it down.

What’s been realized over the last 5 years or so is that most of what varies among people is commonly varying. You can discover it in a couple hundred people or even tens of people, and when you go back and look at hundreds or thousands more, you keep seeing the same things over and over again. The whole idea of haplotype mapping is to do only a subset and yet get most of the information.
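The comparison in the schizophrenia example, asking whether a variant's frequency differs between people with and without the disease, is often done with an allelic chi-square test on a 2×2 table of allele counts. The sketch below is a minimal standard-library version; the function names and counts are hypothetical, and a real study would also have to deal with multiple testing and population structure.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table.

    case_counts / control_counts: (copies of allele 1, copies of allele 2)
    counted over chromosomes of affected and unaffected people.
    Assumes every row and column total is non-zero.
    """
    table = [list(case_counts), list(control_counts)]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][c] + table[1][c] for c in range(2)]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2


def chi_square_1df_p_value(chi2):
    """Upper-tail p-value with 1 d.f.: P(X > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts: allele 1 on 60 of 100 case chromosomes
    # versus 40 of 100 control chromosomes.
    stat = allelic_chi_square((60, 40), (40, 60))
    print(stat, chi_square_1df_p_value(stat))
```

Here the statistic is 8.0, giving a p-value of roughly 0.005. Haplotype mapping does not change this test; it reduces how many SNPs must be typed and tested in the first place.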

Q: What do you think will be the first clinical applications that will come out of the Haplotype Map?

DA: It’s important to say that I doubt very much that haplotypes and haplotype maps will ever be of interest to patients directly. They are a means to an end, where the end is to understand what’s wrong in disease. Once someone has found (and it’s going to take years) that a particular gene plays a role in a disease, and then, beyond that, which particular variation in that gene is involved, further proof will come from a mouse model, a biological experiment, or a drug. The patient will simply know that they’re given a treatment or diagnostic test.

My personal interest is really in knowing what schizophrenia is. Why does somebody get sick with it? We have very few ideas: lots of theories, but very little proof. If you found that a gene or genes, when mutated, contributed to disease risk even in a subtle way, it would tell us something we don’t know today. My hope is that such a discovery will lead to better treatments or better preventatives. The people who benefit from them won’t necessarily know we ever existed. They’ll just have a good treatment.