Those are among the key insights of a nine-year project to study the 97 percent of the human genome that’s not, strictly speaking, made up of genes.
The Encyclopedia of DNA Elements Project, nicknamed Encode, is the most comprehensive effort to make sense of the totality of the 3 billion nucleotides that are packed into our cells.
The project’s chief discovery is the identification of about 4 million sites involved in regulating gene activity. Previously, only a few thousand such sites were known. In all, at least 80 percent of the genome appears to be active at some point in our lives. Further research may reveal that virtually all of the DNA passed down from generation to generation has been kept for a reason.
“This concept of ‘junk DNA’ is really not accurate. It is an outdated metaphor,” said Richard Myers of the HudsonAlpha Institute for Biotechnology in Alabama.
Myers is one of the leaders of the project, which involves more than 400 scientists at 32 institutions.
Another Encode leader, Ewan Birney of the European Bioinformatics Institute in Britain, said: “The genome is just alive with stuff. We just really didn’t realize that beforehand.”
“What I am sure of is that this is the science for this century,” he said. “In this century, we will be working out how humans are made from this instruction manual.”
The new insights are contained in six papers published Wednesday in the journal Nature. More than 20 related papers are appearing elsewhere.
The human genome consists of about 3 billion nucleotides — the “letters” — strung one to another in chains. Specific stretches of those nucleotides carry the instructions for making specific proteins. The proteins, in turn, become the blocks of tissues and the enzymes, hormones and carrier molecules that do most of the cell’s work.
The Human Genome Project identified the correct linear sequence of those letters. At its completion in 2003, only 21,000 genes had been identified — far fewer than most biologists predicted. Furthermore, the genes constituted only 3 percent of the cell’s DNA, leaving biologists to wonder about what function, if any, the remaining 97 percent had. Encode was created to try to answer that question.
“Back then we got the book, but we didn’t know how to read it,” said Elise Feingold of the National Human Genome Research Institute, the federal agency that provided about $200 million for the project.
The new research helps explain how so few genes can create an organism as complex as a human being. The answer is that regulation — turning genes on and off at different times in different types of cells, adjusting a gene’s output and coordinating its activities with other genes — is where most of the action is.
That gene regulation is important and subtle is not a new idea. Nor is the idea that parts of the genome once thought to be “junk” may have some use. What the Encode findings reveal is the magnitude of the regulation.
In one paper, a team led by Thomas R. Gingeras of Cold Spring Harbor Laboratory in New York reported that three-quarters of the genome’s DNA is “transcribed” into a related molecule, RNA, at some point in life. A small amount of that RNA is then “translated” into protein. Much of the rest appears to have gene-regulating activities that remain to be discovered.
In a telephone conference call with reporters, several of the researchers likened the 4 million regulatory sites to electrical switches in a hugely complex wiring diagram.
By turning switches on and off, and varying the duration of their activity, a nearly infinite number of circuits can be formed. Similarly, activating and modulating gene function makes possible immensely complicated events, such as the development of a brain cell or a liver cell from the same starting materials.
To see the switches in their different states, the researchers used more than 150 different cell types and conducted about 1,600 experiments altogether.
“There is a modest number of genes and an immense number of elements that choreograph how those genes are used,” said Eric D. Green, director of the National Human Genome Research Institute.
The Encode project created a catalogue of the stretches of DNA that show evidence of biological activity. But in most cases, exactly what they do is not known.
The findings will be especially useful in interpreting “genome-wide association studies” (GWAS), which are key tools in the emerging field of personalized medicine.
In GWAS research, the genomes of people with a specific disease, such as Type 2 diabetes, coronary atherosclerosis or rheumatoid arthritis, are compared with the genomes of people without the disease. Usually a dozen or more small changes in DNA sequence are found to be more common in people with the disease than in those without it.
Sometimes the importance of those DNA “variants” is obvious. But often they are found in perplexing regions of the genome, far from the gene for the enzyme or protein involved in the disease. Now, researchers suspect that 10 times as many risk variants occur in the regulatory “switches” as in the genes themselves.
The researchers cited the example of Crohn’s disease, an ailment that causes painful and dangerous inflammation of the intestines.
Previous studies had identified several mutations associated with the illness. Encode showed that some were sites where a specific “transcription factor” called GATA2 binds. But how that may trigger the disease is, for the moment, unknown.