The project to determine the DNA sequence of every human chromosome is dramatically enhancing our ability to decipher our inner biological workings and to use the knowledge for our health and well-being.

Imagine transporting yourself to the mid-nineteenth century and arriving in the serene garden of an Augustinian monastery in the city of Brünn (now Brno in the Czech Republic). Your gaze is drawn to a monk who is meticulously tending rows of pea plants. You notice that some plants are tall, others are short; some have purple flowers, others, white. Every so often, the monk opens the pods and examines the peas. Some peas are round, others are wrinkled; some are green, others are yellow. The monk is engrossed in his work and takes copious notes. But you fail to grasp much significance here. What does it all mean?
Returning to our current era, you dash off to a library, flip through a scientific text, and find his picture. He is identified as Gregor Mendel. You learn that after eight years of experimentation with tens of thousands of pea plants, he had arrived at a set of principles explaining how inherited traits are transmitted during sexual reproduction. But the significance of his work, published in 1866, went unrecognized until it was rediscovered in 1900. Over time, it became clear that Mendel had begun laying the foundations of modern genetics.

Understanding the gene

Mendel had realized that visible, inherited traits are manifestations of discrete though invisible units of heredity. In 1909, Danish botanist Wilhelm Johannsen proposed that each unit of heredity could be called a gene (from the Greek word genos, meaning birth). Later, the entire complement of genes in an organism was given the name genome.
During the first half of the twentieth century, a number of fundamental characteristics of genes were discovered. In particular, it became clear that (a) genes are located on chromosomes and are arranged in linear fashion; (b) each gene contains instructions for the production of a protein (or protein subunit); and (c) genes are made of a cellular substance called DNA (deoxyribonucleic acid).

It was further learned that DNA occurs in long chains composed of building blocks, called nucleotides, each of which has three parts: a sugar (deoxyribose), a phosphate, and a base. Four types of nucleotides were found, each possessing one of four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). Each DNA sample was found to contain nearly equal amounts of A and T and nearly equal amounts of G and C. Moreover, X-ray diffraction patterns suggested that DNA is a symmetrical, helical molecule with certain dimensions.

Armed with this information, Francis Crick and James Watson began working on the three-dimensional structure of DNA. In a landmark paper published in 1953, they described DNA as a double helix, in which alternating sugar and phosphate groups formed the backbones of the two strands, while the bases on opposite strands pointed toward each other and were paired--A with T, G with C. Thereafter, DNA length was measured in terms of its number of base pairs (bp).

Five years later, Matthew Meselson and Franklin Stahl showed that DNA is replicated in such a way that each strand in a parent molecule acts as a template on which a new, complementary strand is synthesized. Then in 1961, Sydney Brenner, François Jacob, and Jacques Monod found a type of RNA (ribonucleic acid), which came to be known as messenger RNA (mRNA), that carried information encoded in DNA to the site of protein synthesis.
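The pairing rule described above (A with T, G with C) is simple enough to express in a few lines of code. The following Python sketch is only an illustration: the names `PAIRING`, `complementary_strand`, and `base_counts` are hypothetical, and the sample sequence is made up rather than taken from any real gene. It builds the strand that would be synthesized on a given template and then checks that the resulting double-stranded molecule has equal amounts of A and T and equal amounts of G and C.

```python
# Pairing rule from the double-helix model: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Return the strand that pairs with `template`.

    The two strands of a double helix run in opposite directions,
    so the template is read in reverse while each base is swapped
    for its partner.
    """
    return "".join(PAIRING[base] for base in reversed(template.upper()))


def base_counts(sequence: str) -> dict:
    """Count how often each of the four bases occurs in `sequence`."""
    return {base: sequence.upper().count(base) for base in "ATGC"}


if __name__ == "__main__":
    template = "ATGCGTTA"  # made-up example sequence, not a real gene
    new_strand = complementary_strand(template)
    print("template:  ", template)
    print("new strand:", new_strand)

    # Counting bases over both strands together: A matches T, G matches C.
    print(base_counts(template + new_strand))
```

Because every A on one strand faces a T on the other (and every G faces a C), the equal base amounts measured in DNA samples follow directly from the pairing rule. Applying `complementary_strand` twice returns the original template, which mirrors how each parent strand can serve as a template during replication.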
Meanwhile, other experiments distinguished mRNA from two additional types: transfer RNA, which carries amino acids to the site of protein synthesis; and ribosomal RNA, which occurs in the "workbenches" (ribosomes) on which proteins are made. The synthesis of RNA from a DNA template was called transcription; the synthesis of a protein from an RNA template was called translation; and both processes were referred to as gene expression.

Later, researchers studying the genes of animal viruses and eukaryotic cells discovered that a gene can be "interrupted" by segments that are not represented in the corresponding mRNA, so these segments do not code for the protein product. These noncoding regions were called introns; the coding regions, exons. Further experiments showed that a gene is first copied to form a long precursor RNA, which then undergoes cutting (to remove the introns) and splicing together of the exons to produce the final mRNA.

Around 1970, Hamilton Smith and other researchers discovered "restriction" enzymes that could cut DNA at specific sites. Soon, Paul Berg pioneered the use of these molecular scissors to produce "recombinant" DNA, by cutting DNA segments from different species and sticking them together in desired fashion. Extending this approach, Herbert Boyer and Stanley Cohen showed in 1973 how a gene could be removed from animal DNA and transferred to bacterial cells, where it was copied and produced the corresponding animal protein. This technology to clone a gene--that is, to produce exact copies of a gene in a foreign host--gave birth to the genetic engineering industry.

Road to genome sequencing

In 1977, two research teams--Frederick Sanger's group and Walter Gilbert and Allan Maxam--announced new techniques for the rapid sequencing of nucleotides in DNA. These methods allowed researchers to obtain the sequences of genes and viral DNA, several thousand base pairs in length. Sanger's method was later modified and incorporated into automated sequencing machines.
Around 1985, the idea of sequencing the whole human genome was being discussed,
On the other hand, in 1986, a conference convened by Charles DeLisi and David Smith of the Department of Energy (DOE) endorsed what was called the Human Genome Initiative, and pilot projects were begun. Two years later, the National Research Council (NRC) released a report advocating a 15-year program that would begin by mapping human chromosomes [see "What Is a Genome Map?"] and finding the sequences of simpler genomes before moving to large-scale sequencing of human DNA. The tide had begun to turn. Some argued that the human genome was our "thread of life" or even "book of life," and that by determining the entire sequence we would discover
In September 1988, James Wyngaarden, then-director of the National Institutes of Health (NIH), established the Office of Human Genome Research and appointed Watson to lead it. The NIH and DOE then agreed to collaborate on what became known as the Human Genome Project (HGP). The NIH office was upgraded a year later to become the National Center for Human Genome Research (NCHGR). And in 1990, the NIH and DOE presented Congress with a five-year plan as the first phase of their 15-year project, with an overall price tag of $3 billion. October 1 of that year was designated as the official starting date.

In April 1992, however, Watson resigned from the NCHGR, displeased that the NIH had been filing for patents on thousands of partial genes whose functions were mostly unknown. Those DNA sequences were being identified by J. Craig Venter and others at the NIH, using a new method involving what were called expressed sequence tags. Soon, Venter left the NIH and founded a nonprofit group--the Institute for Genomic Research (TIGR)--to continue his gene-identification approach and market his discoveries through a sister company. Watson's place was taken by Francis Collins in 1993.

International participation in the genome project came early on, with efforts coordinated through the Human Genome Organization, founded in 1988. In 1993, Britain's Wellcome Trust and the Medical Research Council established the Sanger Centre for large-scale genome sequencing. Other research centers that joined the project were located in France, Japan, Germany, and China. At a 1996 meeting in Bermuda, the partners agreed to make their sequencing information available in public databases within 24 hours of generation.

The first complete genome sequence of a free-living organism--the bacterium Haemophilus influenzae--was published in May 1995. It was a collaborative effort between Venter's group and Hamilton Smith's team, using a strategy called "whole-genome shotgun sequencing" [see "Genome Sequencing Strategies"]. Over the next few years, the DNA sequences of other small organisms were published by HGP researchers as well as Venter and his collaborators. These included the genome sequences of yeast (Saccharomyces cerevisiae), bacteria (Escherichia coli and others), and a worm (Caenorhabditis elegans).

In January 1997, the NCHGR was elevated to become the National Human Genome Research Institute (NHGRI), and the DOE established the Joint Genome Institute. Then in May 1998, Venter made the startling announcement that he was starting a new business--later named Celera Genomics--that would produce the complete human genome sequence in three years for just $300 million. He planned to follow the whole-genome shotgun strategy, using 300 of the latest automated capillary sequencing instruments coupled with supercomputing technology.

In response, the NIH and DOE announced new goals in September 1998: The HGP would have a "working draft" ready by 2001, followed by the finished genome sequence in 2003. Six months later, the first deadline was moved to spring 2000. But the HGP partners remained committed to the slower, more methodical approach known as "hierarchical shotgun sequencing" [see "Genome Sequencing Strategies"].

Soon thereafter, Venter's group and collaborators in Berkeley produced the genome sequence of the fruit fly Drosophila melanogaster. The 180-megabase sequence, published in March 2000, was the largest genome yet completed, and this project's success was taken to validate the whole-genome shotgun strategy. As the race to complete the human genome sequence heated up, the two sides began to
Eventually, analyses of the draft sequences were published at practically the same time but in separate journals. The results of the international partners--20 research groups--were reported in the February 15, 2001, issue of Nature, while Celera's work appeared in the February 16, 2001, issue of Science. These results, coupled with those obtained from earlier studies, cover about 95 percent of the human genome.

Key discoveries

Perhaps the biggest surprise to emerge from these sequencing studies is the total number of genes, with estimates ranging from 25,000 to 40,000. These numbers fall dramatically short of earlier predictions of finding 80,000--100,000 genes. By comparison, the mustard weed has about 25,000 genes, the worm has roughly 20,000, the fruit fly has 14,000, and yeast has 6,000.
How, then, does a relatively small leap in the number of genes endow humans with far greater complexity than these other creatures? One explanation is that the RNA made from each gene can be spliced in alternative ways, producing several mRNAs that are translated into different proteins. The human body is thought to contain about 95,000 different types of proteins, so it would appear that, on average, each gene codes for three proteins. Another possibility is that genes expressed at low levels may have been missed by certain methods used. Whatever the case, it appears that the RNA-coding regions amount to about 28 percent of the genome, while the protein-coding sequences correspond to only about 1.4 percent of the genome. Some noncoding sequences probably contain signals that regulate gene expression, but the functions of other noncoding regions are unknown, so they are often labeled "junk" DNA. In the DNA sequence of each chromosome, some areas are rich in G and C nucleotides, others are rich in A and T. It appears that 50 percent or more of the genes occur as clumps in
The human genome encodes a more complex collection of proteins than the proteins found in invertebrates. This complexity is achieved partly by the presence of protein-coding sequences that are unique to vertebrates but even more by the rearrangement of protein-coding domains present in earlier species. Also, many genes have been duplicated to produce large protein families.

HGP researchers found that over 200 human genes are closely related in sequence to certain bacterial genes, although similar sequences are missing in invertebrates such as the fruit fly and worm. It was suggested, therefore, that bacterial DNA may have been directly transferred to the chromosomal DNA of an early vertebrate ancestor of humans, perhaps during bacterial infections. This interpretation has been hotly disputed by two research teams--one at TIGR, the other at GlaxoSmithKline--that looked at additional data and offered alternative interpretations.

As much as 50 percent of the human genome consists of repeated sequences, ranging in size from a few bases to large DNA segments. Most of the repeats occur in AT-rich regions, and their functional significance is unknown. But some, which lie in the GC-rich (and gene-rich) regions, may play a useful role. In addition, large repeated segments that occur at chromosomal ends (telomeres) and constrictions (centromeres) may be important in maintaining the chromosome's structural integrity.

By comparison, repeat elements occupy just 3 percent of the fruit fly's genome and 7 percent of the worm's genome. Their phenomenal accumulation in the amoeba explains why
It appears that over the course of evolutionary history, the repeat elements have jumped around and rearranged their corresponding genome, producing new genes and modifying existing ones. Some repeats (called SINE and LINE elements) in human chromosomes still seem active, others (LTR retroposons) are nearly inactive, and yet others (DNA transposons) are totally inactive. Some sequence repeats appear to stretch back 800 million years.

In 1999, an international consortium was formed to identify and map single-nucleotide polymorphisms (SNPs)--that is, single-nucleotide variations in the genomic sequences of individuals. This consortium, working with the genome sequencing partnership, has already catalogued about 1.4 million SNPs, many of which should be helpful in the study of genes linked to diseases.

A comparison of the genomic sequences of various people would show a sequence similarity of about 99.9 percent. The difference of 0.1 percent can help explain the uniqueness of each person's physical traits, and it also provides information about the genetic basis of certain diseases. It should be noted, though, that most genetic differences among us are distributed among people of all ethnicities and races. Thus, these results provide no basis to draw racial boundaries between peoples.

Looking ahead

Having published their respective working drafts of the human genome, both HGP and Celera
Our knowledge of the human genome sequence, coupled with the newly produced genetic and SNP maps, should provide a number of potential benefits, particularly in the area of medicine. For instance, new diagnostic tests can be devised for earlier detection of genetic diseases and predispositions to diseases. The knowledge will also help in the design and production of new types of drugs and therapies. In addition, it will enable scientists to evaluate the risks of exposure to radiation and toxic agents.

DNA sequence data can be of further use in forensics--to help identify both crime suspects and victims and to establish familial relationships. The genome sequences of individuals and SNP maps should also be useful in matching organ donors with recipients.

Knowledge of an individual's genetic makeup also raises serious ethical and social concerns. For instance, how do we safeguard the privacy of genetic information and prevent it from becoming an issue in insurance coverage or employment situations? In anticipation of such concerns, the HGP has devoted about 3 percent of its budget to study the impact of the project on these areas.

The genome project naturally leads to the field called comparative genomics, which involves comparing the human genome sequence with the DNA sequences of other species. This approach should be helpful in determining gene functions, disease mechanisms, and developments in evolutionary history. Recently, both Celera and the HGP announced draft sequences of the mouse genome. This information is especially important, given that mice and humans share similar sets of genes and the mouse is a model organism for the study of human diseases.

A new project being discussed is studying the proteome--the full complement of proteins encoded by the genome; or, in a narrower sense, the set of proteins made in a cell at a given time. The technology for large-scale purification and identification of proteins is still under development, but an international Human Proteome Organization has already been formed. A more readily doable project is studying the transcriptome, the full set of mRNAs transcribed from a genome.

Recently, the DOE proposed another ambitious program, named "Genomes to Life," with the goal of moving from our knowledge of DNA sequences to obtaining a comprehensive perspective on whole biological systems. In light of proposals such as these, observers predict that the twenty-first century will probably be an "era of biology."

On the Internet

Celera Genome News Network: www.celera.com/genomics/genomics.cfm
Ensembl Genome Server: www.ensembl.org/genome/central/
National Human Genome Research Institute: www.nhgri.nih.gov/
Nature Genome Gateway: www.nature.com/genomics/
Science, The Human Genome: www.sciencemag.org/feature/plus/sfg/human/index.shtml
U.S. Department of Energy, Human Genome Project Information: www.ornl.gov/hgmis/

Dinshaw K. Dadachanji is an editor for the Natural Science section of The World & I.
Copyright © 2003 The World & I. All rights reserved.