bioinformatics researchers
SUMMARY OF WORK
The sequencing of the human genome has raised important questions about the nature of genomic complexity. It was widely anticipated that the human genome would contain far more genes (estimates based on expressed-sequence clustering ran as high as 150,000) than Drosophila (14,000 genes) or Caenorhabditis elegans (19,000 genes). The report of only 32,000 human genes thus came as a surprise. This disparity indicated that the number of human expressed-sequence (mRNA) forms is much higher than the number of genes. How can the much greater size and complexity of humans be encoded in only about twice the number of genes required by a fly? One explanation for this paradox is that the number of possible proteins can far exceed the number of genes if a large percentage of genes can encode multiple proteins. This expansion of the proteome can be accomplished through alternative splicing of precursor messenger RNA (pre-mRNA), in which several transcripts are generated from a single pre-mRNA by using different combinations of exons. Alternative splicing (AS) is thus defined as the creation of multiple mRNA products from a single gene: it takes place when the introns of a pre-mRNA can be spliced out (removed) in more than one way, yielding several possible mature mRNAs from the same gene. Alternative splicing of pre-mRNAs is a powerful and versatile regulatory mechanism that can effect quantitative control of gene expression and functional diversification of proteins. It contributes to major developmental decisions and also to the fine-tuning of gene function.
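The way a handful of exons can give rise to several mature mRNAs can be illustrated with a minimal sketch. It assumes the simplest form of alternative splicing, in which optional ("cassette") exons are either included or skipped while constitutive exons are always kept. The function name, the toy exon sequences and the constitutive/cassette flags are hypothetical and serve only as illustration; they are not taken from the work summarized here.

```python
from itertools import product

def enumerate_isoforms(exons):
    """Return every mature mRNA obtainable by including or skipping cassette exons.

    `exons` is an ordered list of (sequence, is_constitutive) pairs.
    Exon order is preserved; only inclusion or exclusion varies.
    """
    # A constitutive exon has one option (keep); a cassette exon has two (keep or skip).
    choices = [(True,) if constitutive else (True, False) for _, constitutive in exons]
    isoforms = []
    for pattern in product(*choices):
        kept = [seq for (seq, _), keep in zip(exons, pattern) if keep]
        isoforms.append("".join(kept))
    return isoforms

# Hypothetical toy gene: two constitutive exons flanking two cassette exons.
toy_gene = [
    ("ATGGCC", True),
    ("GATTAC", False),
    ("CCGGAA", False),
    ("TTCTAA", True),
]
isoforms = enumerate_isoforms(toy_gene)
for isoform in isoforms:
    print(isoform)
```

With two cassette exons the toy gene yields four isoforms, so the number of possible transcripts grows as a power of two in the number of independently skipped exons. Real genes also use alternative 5' and 3' splice sites, intron retention and mutually exclusive exons, which this sketch does not model.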
Recent genomic and bioinformatic analyses of the vast amount of transcript data from human and other organisms suggest that alternative splicing is widespread in almost all higher eukaryotic genomes, with most information derived from well-studied organisms such as Caenorhabditis elegans, Drosophila, mouse and human, which emphasizes the importance of alternative splicing throughout evolution. C. elegans, a free-living, soil-dwelling nematode, was the first multicellular organism to have its genome sequenced, and its 97 Mb genomic sequence is complete. Deciphering the biological information in such sequenced genomes is of great importance, because findings in C. elegans are applicable to more complex organisms such as humans: 30-40% of C. elegans genes share direct homology with human genes. More than 50% of the genes of recently sequenced eukaryotic genomes are now believed to undergo alternative splicing, generating different transcript and protein isoforms under different developmental, tissue-specific and disease conditions. This brings a new set of challenges to gene prediction programs and the annotation processes that depend on them. Most discovery of alternative splicing has relied on ESTs, an approach with two limitations. First, ESTs may underestimate alternative splicing because of their incomplete coverage and their lack of information about which combinations of exons are used together. Second, the EST-based approach succeeds only for organisms with extensive EST coverage, especially humans. In organisms such as C. elegans, where EST coverage is limited and only 1% of the total coding genes have EST coverage, detecting alternatively spliced variants is difficult. Although several approaches for the ab initio prediction of gene structure have been developed, the ab initio prediction of alternative splicing using a combination of gene- and exon-finding programs has not been considered.
A general problem in identifying alternative splices with current gene-finding programs is that they usually search for optimal exons, splice sites and gene structures. Alternative splice sites are usually weaker than constitutive sites, and alternative exons or introns may show an atypical composition (e.g. hexamer frequencies); they are therefore hard to detect with most gene-finding programs. Methods that facilitate the identification of alternative exons would thus be useful in genome annotation.
Our study comprised a complete analysis of the unannotated intronic, 5' and 3' untranslated (UTR) genomic regions of chromosome I of C. elegans, with the main emphasis on new exons and genes encoded by this chromosome. We used a combination of gene- and exon-finding tools, ORF-finding programs and several other bioinformatics tools to predict new, previously undetected alternatively spliced transcripts of C. elegans genes. Around 120-150 new alternatively spliced variants and exons were identified in the chromosome I analysis. Following these computational predictions, Yuji Kohara's C. elegans EST database was searched for EST/cDNA support for the new exons and transcripts. The search yielded no EST match for the new transcripts. This is expected given the limitations of the database: the available EST collection for C. elegans is not adequately representative, and so far at least 40% of the organism's genes are not reflected in it. An NCBI BLAST search was also carried out to look for homology of the new spliced variants, but it returned no significant similarity to other polypeptides. Because no supporting EST/cDNA matches were available for the new predictions, the remaining way to confirm the findings was to validate them in the laboratory.
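The ORF-finding step mentioned above can be sketched in a few lines. The summary does not name the specific programs used, so what follows is only a generic six-frame scan for ATG-to-stop open reading frames, not the authors' pipeline. The function names, the min_codons threshold and the example sequence are all assumptions made for illustration. Real gene and exon finders rely on statistical models of splice sites and coding composition rather than on a plain codon scan.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=3):
    """Yield (strand, start, end, orf) for each ATG...stop ORF in all six frames.

    start and end are 0-based, end-exclusive offsets on the scanned strand.
    min_codons counts the start and stop codons (hypothetical default).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Walk the frame codon by codon, stopping before a partial codon.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, start, i + 3, s[start:i + 3]
                    start = None  # resume the search after the stop codon

# Hypothetical stretch of unannotated intronic sequence.
region = "CCATGAAATTTGGGTAACC"
orfs = list(find_orfs(region))
for strand, start, end, orf in orfs:
    print(strand, start, end, orf)
```

In practice a candidate ORF from such a scan would only be a starting point: it would still need flanking splice signals, agreement with exon-finder output and, as described above, experimental confirmation.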
RT-PCR amplification, using gene-specific primers and RNA isolated from a mixed population of C. elegans, was employed to investigate whether these new exons occur in transcripts. Based on our findings and approach, we estimate that around 1000-1200 new alternatively spliced variants could be identified across the full genome of C. elegans using the same technology and tools. These findings could be useful to biologists in several ways. Firstly, our data not only enlarge the available database of alternatively spliced genes in C. elegans but also point towards the complex mechanism of alternative splicing in C. elegans genes and the role of these previously unreported variants in downstream regulatory steps. Secondly, similar studies can be conducted in other organisms, especially humans, with which C. elegans shares close gene homology. Thirdly, conventional methods of detecting spliced variants appear insufficient to detect all possible spliced isoforms of a gene, so we propose combining computational prediction of alternative splice isoforms with experimental validation to delineate all possible spliced variants of a gene efficiently. Lastly, because the scope of this work was limited, there is ample room for further studies on the functional and biological significance of these spliced transcripts and their prospective role in the functioning of C. elegans genes.