The importance of sequencing the vast human genome is illustrated in this detailed article from the International Human Genome Sequencing Consortium (IHGSC), a large team effort to gain as much information as possible about the genetic make-up of human beings. Progress in this field took place in distinct phases, each focusing research on one level: (i) the cellular basis (chromosomes), (ii) the molecular basis (the DNA double helix), (iii) the genetic basis (cloning and sequencing), and (iv) the very first genes and entire genomes. The human genome was the largest genome sequenced at the time. Its analysis covered the complexity of its genes, transposable elements, GC content (found to be of great interest), CpG islands and recombination rate, all of which were examined in comparison with the genomes of simpler organisms. The genome project was inspired by the sequencing of simpler genomes such as those of bacterial viruses, by the mapping of disease genes of unknown function and, finally, by the physical maps of yeast, worms, flies and mice. A two-stage method was used to sequence the genome: (1) a shotgun phase and (2) a finishing phase. Briefly, the genome was divided into fragments of suitable size, which were cloned into large-insert cloning vectors and individually sequenced; the resulting sequences were assembled, gaps were filled, and remaining complications were resolved through directed analysis before a final version was laid out. In this first effort to understand the human genome, a draft genome sequence was released with the aid of various computational strategies, and it turned out to be quite informative about the actual human genome sequence. Eventually, a combined effort by the IHGSC and Celera Genomics helped this venture, since it merged some of the company's shotgun data with the publicly released hierarchical shotgun data.
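The assembly step of the shotgun phase described above can be illustrated with a toy example. The sketch below is only a minimal greedy overlap-merge, not the consortium's actual assembly software; the function names (`overlap`, `greedy_assemble`), the `min_len` threshold and the short example reads are assumptions made for illustration, and real reads would also require handling of sequencing errors, repeats and reverse complements.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Toy greedy assembler (hypothetical helper, error-free reads assumed).

    Repeatedly merges the pair of sequences with the longest suffix/prefix
    overlap until no overlap of at least `min_len` bases remains.
    """
    # Remove exact duplicates, then reads fully contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]

    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Three overlapping fragments of a made-up 15-base sequence.
reads = ["ATGGCGTAC", "GTACGTTA", "CGTTAGC"]
print(greedy_assemble(reads))
```

Regions where no overlap is found remain as separate contigs, which is roughly where a finishing phase of gap filling and directed analysis would take over.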
In addition to extensive collaboration, improvements in sequence detection were key factors in this success story: fluorescence-based detection, dye terminators, specifically designed sequences, cycle sequencing and some additional novel features. A remarkable number (~1.4 million) of single nucleotide polymorphisms (SNPs) identified in this project have aided the identification of several genes, especially those related to human diseases (relevant links were provided for browsing cytogenetic aberrations and cancer). A number of STSs, mRNAs and ESTs helped construct the physical maps of specific chromosomal regions through analysis of subsets of the sequence contigs. With an error rate of less than 1 in 10,000 bases, this draft sequence was considered very accurate, and it had several salient features. The junk DNA identified here was also described as highly important because it offered clues about the evolution of the complex human genome and about the resolution of several other long-standing problems. Much of the human genomic sequence was derived from a wide range of transposable elements, and it was compared with other eukaryotic genomes such as that of the fruit fly. Emphasis was placed on the GC (guanine-cytosine) content of stretches of the genome: a functional relationship between high gene density and high GC content or an influx of transposable elements was established, in the context of the X and Y chromosomes present in the female and male genetic make-up, respectively. The role of transposons as creative forces in the genome, including in the origin of new genes, was found to be as intriguing as their innovative power.
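To make the notion of GC content along a stretch of the genome concrete, the following minimal sketch computes the GC fraction of a sequence and of consecutive windows within it. The function names, the window size and the example sequence are assumptions chosen for illustration and do not reproduce the windows or tools used in the article.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of `seq`.

    Ambiguous symbols such as N are ignored; returns 0.0 if no base is counted.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq: str, window: int, step=None):
    """GC content of each full window of length `window`, as (start, fraction).

    `step` defaults to `window`, giving non-overlapping windows.
    """
    step = step or window
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Made-up sequence with a GC-rich start and end and an AT-rich middle.
example = "GGGGAAAACCCC"
print(gc_content(example))
print(windowed_gc(example, window=4))
```

Plotting such windowed values against gene density for the same windows is one simple way to look for the kind of relationship between GC content and gene density described above.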
Simple sequence repeats (SSRs) also seemed to play an equally critical role in the human genome, especially as useful tools for mapping human disease genes. Duplications, pericentromeric regions, subtelomeres and numerous SNPs presented a unique landscape for each chromosome and were largely helpful in predicting the actual gene content of this complex genome. Apart from identifying a wide range of RNA genes, a major application of this project was to identify protein-coding genes and to explore the properties of already known genes. The properties of human genes were compared with those of the worm and the fly. This in itself was regarded as very useful information, because research on simpler organisms again proved crucial to understanding a genome as complicated as that of humans. The gene and protein datasets obtained from this report were far from complete, but they provided important clues and served as starting points for experimental and computational research. A comparative approach of genomic and proteomic analysis revealed several evolutionary innovations, especially in relation to the human, yeast, fly and worm genomes. Several groups of proteins, each containing at least one orthologue from each species, were identified, and their relationships were derived from sequence and functional comparisons. In addition, quantifying architectural differences among these organisms gave more clues to the evolution of higher vertebrates, for example that gene duplications were the major cause of the evolution of new protein families and, consequently, of new species and/or complex genomes with advanced functional relevance but somewhat common origins. These findings helped determine the evolutionary ages of different species and their relationships in terms of conservation, ancestry and assembly of sequences.
For example, the draft sequence reported here provided the first global comparison of the human and mouse genomes (remarkably, two identical genes per chromosome were identified, and homology between human chromosome 4 and mouse chromosome 5 was observed). Apart from these basic applications, several direct applications emerged from the sequence data, namely the identification of human disease gene regions, drug targets and regulatory regions, the sequencing of additional large genomes, and the exploration of how sequence relates to function. Although much remains to be seen, this information greatly aided further investigative studies of the human genome.