bioinfromatics researchers

Wednesday, November 22, 2006

SUMMARY OF WORK
The sequencing of the human genome has raised important questions about the nature of genomic complexity. It was widely anticipated that the human genome would contain a much larger number of genes (estimates based on expressed-sequence clustering ran as high as 150,000 genes) than Drosophila (14,000 genes) or Caenorhabditis elegans (19,000 genes). The report of only 32,000 human genes thus came as a surprise. This disparity indicated that the number of human expressed-sequence (mRNA) forms is much higher than the number of genes. How can the much greater size and complexity of humans be encoded in only about twice the number of genes required by a fly? One way to resolve this paradox is to note that the number of possible proteins can far exceed the number of genes if a large percentage of the genes can encode multiple proteins. This expansion of the proteome can be accomplished through alternative splicing of precursor messenger RNA (pre-mRNA), which generates several transcripts from a single pre-mRNA by using different combinations of exons. Alternative splicing (AS) is thus defined as the creation of multiple mRNA products from a single gene: it takes place when the introns of a pre-mRNA can be spliced out (removed) in more than one way, yielding several possible mature mRNAs from the same gene. Alternative splicing of pre-mRNAs is a powerful and versatile regulatory mechanism that can affect quantitative control of gene expression and functional diversification of proteins. It contributes to major developmental decisions and also to fine-tuning of gene function.
Recent genomic and bioinformatics analyses of the vast amount of transcript data from human and other organisms suggest that alternative splicing is widespread in almost all higher eukaryotic genomes, with most information derived from well-studied organisms such as Caenorhabditis elegans, Drosophila, mouse and human, thus emphasizing the importance of alternative splicing throughout evolution. The 97 Mb genome of the free-living, soil-dwelling nematode C. elegans was the first genome of a multicellular organism to be completely sequenced. Deciphering the biological information in such sequenced genomes is of great importance, because the information obtained from C. elegans is directly applicable to more complex organisms such as humans: 30-40% of C. elegans genes share direct homology with human genes. More than 50% of the genes of recently sequenced eukaryotic genomes are now believed to undergo alternative splicing to generate different transcript and protein isoforms under different developmental, tissue-specific and disease conditions, which brings a new set of challenges to gene prediction programs and the accompanying annotation processes. Most discoveries of alternative splicing have relied on ESTs, which may underestimate alternative splicing because of their incomplete coverage and the lack of information about which combinations of exons are used. Moreover, the EST-based approach is successful only for organisms with extensive EST coverage, especially humans. In organisms such as C. elegans, where EST coverage is limited and only 1% of the total coding genes have EST coverage, detecting alternatively spliced variants is rather difficult. Although several approaches for the ab initio prediction of gene structure have been developed, the ab initio prediction of alternative splicing using a combination of gene/exon finding programmes has not been considered.
A general problem in identifying alternative splices with current gene finding programs is that they usually search for optimal exons, splice sites and gene structures. Alternative splice sites are usually weaker than constitutive sites, and alternative exons or introns may show an atypical composition (e.g. hexamer frequencies); they are therefore hard to detect with most gene finding programs. Methods that facilitate the identification of alternative exons would thus be quite useful in genome annotation.
Our studies comprised a complete analysis of the un-annotated intronic and 5' and 3' untranslated (UTR) genomic regions of chromosome one of C. elegans, with a major thrust on new exons and genes encoded by chromosome one, using a combination of gene/exon finding tools, ORF finding programmes and several other bioinformatics tools to predict new, undetected alternatively spliced transcripts of C. elegans genes. Around 120-150 new alternatively spliced variants and exons were identified during the chromosome one analysis. Following the computational predictions, Yuji Kohara's C. elegans EST database was searched for EST/cDNA support for these new exons and transcripts. The search did not yield any EST match, which is expected given the limitations of the database: the available EST collection for C. elegans is not adequately representative, and so far at least 40% of the genes in the organism are not reflected in it. An NCBI BLAST search was carried out to look for homology of these new spliced variants; however, no significant similarity to other polypeptides was found. Because no supporting EST/cDNA matches were available for the new predictions, the remaining approach was to validate them in the laboratory.
RT-PCR amplification was employed to investigate the occurrence of these new exons in transcripts, using gene-specific primers and RNA isolated from a mixed population of C. elegans. Based on our findings, we estimate that around 1000-1200 new alternatively spliced variants could be identified from the full genome of C. elegans with the above approach and tools. These findings could be useful to biologists in several ways. Firstly, our data not only enlarge the available database of alternatively spliced genes in C. elegans but also point towards the complex mechanism of alternative splicing in C. elegans genes and its role in downstream regulatory steps. Secondly, similar studies can be conducted in other organisms, especially humans, with whom C. elegans shares close gene homology. Thirdly, the conventional methods for detecting spliced variants do not appear sufficient to detect all possible spliced isoforms of a gene, so we propose combining the computational prediction of alternative splice isoforms with experimental validation for efficient delineation of all possible spliced variants of a gene. Lastly, given the limited scope of this work, there is ample room for further studies on the functional and biological significance of these spliced transcripts and their prospective role in the functioning of C. elegans genes.

Finding novel alternative spliced variants in Cadherin encoding genes from C. elegans genome
Luv Kashyap1, Mohammad Tabish1*, Subramaniam Ganesh2, and Deepti Dubey2
1Department of Biochemistry, Faculty of Life Sciences, A. M. University, Aligarh, India and 2Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kanpur, India
Cadherins are calcium-dependent, homophilic, cell-cell adhesion receptors that regulate morphogenesis, pattern formation and cell migration. The C. elegans Genome Sequencing Consortium has reported 12 genes encoding a total of 13 members of the cadherin superfamily. We have studied alternative splicing in this cadherin group and found that 7 of the 12 genes undergo alternative splicing, giving rise to possibly 21 new alternatively spliced transcripts at both the 5' and 3' ends, with a higher rate towards the 5' end. These newly discovered spliced variants in the C. elegans cadherin superfamily, which had not been reported earlier, could play a vital role in explaining how cadherins control vital processes such as cell adhesion and morphogenesis. To validate our findings, we carried out RT-PCR experiments to confirm the occurrence of these new alternatively spliced variants.

Saturday, September 16, 2006

Alternatively spliced new variants of RhoGEF domain containing gene in Caenorhabditis elegans

Luv Kashyap and Mohammad Tabish*
Department of Biochemistry, Faculty of Life Sciences, A.M. University, Aligarh, U.P. 202002, India

Abstract
Alternative splicing is an important means of regulating gene expression that determines cell fate in various organisms, for example in sexual differentiation in Drosophila and apoptosis in mammals. Aberrant regulation of alternative splicing has also been implicated in human disease. Analysis of genomic sequence and transcript data from human and other organisms suggests the widespread occurrence of alternatively spliced transcripts in almost all higher eukaryotic organisms, including Caenorhabditis elegans, Drosophila, mouse and human. Most findings of alternative splicing have relied on EST databases, which may underestimate alternatively spliced transcripts because of their incomplete coverage and the lack of information about which combinations of exons are used. In C. elegans, the gene Y95B8A.12 encodes a RhoGEF domain, a novel module in the guanine nucleotide exchange factors (GEFs). It encodes a protein approximately 200 amino acid residues long. RhoGEFs are regulators of the Rho proteins, which act as molecular switches, cycling between inactive (GDP-bound) and active (GTP-bound) states. The Genefinder prediction by the C. elegans sequencing consortium on the genomic sequence of Y95B8A.12 reported two spliced variants, Y95B8A.12a and Y95B8A.12b, that arise as a result of alternative splicing of the pre-mRNA.

From a detailed analysis of the gene Y95B8A.12 using bioinformatics tools, including various gene/exon finding programmes, we predicted the existence of at least four alternatively spliced variants, Y95B8A.12a, Y95B8A.12b, Y95B8A.12c and Y95B8A.12d, including the two variants reported by the C. elegans sequencing consortium. The presence of the different transcripts was subsequently confirmed by RT-PCR using gene-specific primers and RNA isolated from a mixed population of C. elegans. All the new spliced transcripts arise from alternative splicing of the Y95B8A.12 pre-mRNA in the 5' untranslated region. These previously unreported spliced variants point towards the complex mechanism of alternative splicing in C. elegans genes and its role in downstream regulatory steps. Further studies in this direction will enhance our knowledge of the biological and functional significance of these spliced transcripts.

Friday, June 30, 2006

New findings from researchers at UT Southwestern Medical Center help explain how the 20,000 to 25,000 genes in the human genome can make the hundreds of thousands of different proteins in our bodies. Genes are segments of DNA that carry instructions for making proteins, which in turn carry out all of life's functions. Through a natural process called "alternative splicing," information contained in genes is modified so that one gene is capable of making several different proteins. "Alternative splicing is a key mechanism for achieving a diverse range of proteins, which contributes to the complexity of higher organisms," said Dr. Harold "Skip" Garner, professor of biochemistry and internal medicine at UT Southwestern and senior author of a new study aimed at understanding how and why alternative splicing occurs in humans. The study is available online and will be published in the April 15 issue of the journal Bioinformatics. Errors in alternative splicing can result in truncated or unstable proteins, some of which are responsible for human diseases such as prostate cancer and schizophrenia, Dr. Garner said. But errors also can result in proteins with new functions that help drive evolutionary changes. "Alternative splicing appears to occur in 30 percent to 60 percent of human genes, so understanding the regulatory mechanisms guiding the process is fundamentally important to almost all biological issues," said Dr. Garner. Alternative splicing can be likened to alternative versions of a favorite cookie recipe. If the original recipe (the gene) calls for raisins, walnuts and chocolate chips, and you copy the recipe but leave out the raisins, you'll still get a cookie (protein) from your version, just a different cookie. Omit a necessary ingredient, such as flour, and you'll have a mess (nonfunctioning or malfunctioning protein). 
Similarly, the information in genes is not directly converted into proteins, but first is copied by special enzymes into RNA, or more specifically, pre-messenger RNA. While the entire gene is copied into pre-mRNA, not all of that information will be used to make a protein. RNA segments called exons carry the protein-making information, while the segments between exons, called introns, are snipped out of pre-mRNA by special proteins. Exons also may be snipped out. Once snipping is complete, the remaining exons are spliced back together to form a fully functional, mature mRNA molecule, which goes on to create a protein. Using computers, the UT Southwestern researchers scanned the human genome and found that the presence of certain DNA sequences called "tandem repeats" that lie between exons are highly correlated with the process of alternative splicing. They found a large number of tandem repeats on either side of exons destined to be spliced out of the pre-mRNA. The tandem repeat sequences also were complementary and could bind to each other. "The complementary tandem repeat sequences on either side of an exon allow the DNA to loop back on itself, bind together, pinch off the loop containing a particular exon and then splice it out," Dr. Garner explained. The chemical units that make up an organism's DNA are abbreviated with the letters A, C, T and G. Strings of these letters form genes and spell out genetic instructions. Tandem repeats have DNA sequences with the same series of letters repeated many times, such as CACACACACACA. Tandem repeats are "hot spots" where errors can easily be made during the copying process; for example, an extra CA could be added or deleted from the correct sequence. These errors could then result in a gene improperly splicing out an exon, thus making the wrong protein, Dr. Garner said. 
His research group has previously shown that these sequences are highly variable in cancer, and he said the new findings could go a long way toward understanding the genetic nature of how cancers start and progress. "With this new understanding, we can now predict all genes that can re-arrange in this way and even predict which might splice improperly, resulting in disease," he said.
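
As a rough illustration of the kind of scan described above, the Python sketch below looks for simple tandem repeats such as CACACACACACA and checks whether two repeat units could pair as reverse complements. The function names, the unit-length limit and the minimum copy number are assumptions made for this example; it is not the method used by the UT Southwestern group.

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def find_tandem_repeats(seq, max_unit=6, min_copies=4):
    """Return (start, unit, copies) for short units repeated back to back.

    A long run can be reported more than once (for example as CA and as
    CACA); this sketch does not merge such redundant hits.
    """
    hits = []
    for unit_len in range(1, max_unit + 1):
        # ([ACGT]{n}) captures one unit; \1{k,} requires k more copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.group(1), len(m.group(0)) // unit_len))
    return hits

def can_pair(unit_a, unit_b, copies=4):
    """True if a run of unit_b occurs in the reverse complement of a run of unit_a."""
    return unit_b * 2 in reverse_complement(unit_a * copies)

# Hypothetical example sequence: a CA repeat with short flanks.
EXAMPLE = "GGT" + "CA" * 6 + "TTG"
print(find_tandem_repeats(EXAMPLE))
print(can_pair("CA", "TG"))
```

In this toy setting a CA repeat on one side of an exon and a TG repeat on the other would count as complementary, which is the arrangement the article describes.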

ISMB2006 Alternative Splicing Special Interest Group meeting
August 4-5, 2006

The organizers of AS-SIG would like to invite you to participate in the second ISMB Special Interest Group meeting on Alternative Splicing, on August 4-5, 2006 at Fortaleza, Brazil. This workshop is scheduled immediately before ISMB2006, Aug. 6-10, 2006 and is jointly sponsored by the state of Sao Paulo, Brazil and ISCB.
AS-SIG 2006 follows on from the highly successful Alternative Splicing SIG meeting held at the Pacific Symposium on Biocomputing, 2004, and the ISMB2005 AS-SIG meeting.
Background and Aims
Alternative splicing generates multiple products from a single eukaryotic gene and is a major mechanism responsible for diversity in the transcriptome of higher organisms, using combinations of "genes in pieces" to assemble transcripts. This is an exciting time in the field of alternative splicing, combining new discoveries from genomics, bioinformatics and molecular biology. Long considered to be an interesting but less common form of regulation, alternative splicing has emerged as a ubiquitous mechanism of regulation, thanks to genome analysis of human and other higher organisms. Whereas the Human Genome Project has produced a net result of 25,000 to 30,000 genes, alternative splicing evidently produces over 100,000 distinct transcript forms. Identifying, quantifying and analyzing the regulation, function and evolution of these forms constitutes a "Human Transcriptome Project", and will require as remarkable and as concerted an effort as the Human Genome Project. Above all, it will require close collaboration between bioinformaticists and experimentalists, to build a community of shared tools, databases, nomenclature and standards that permit everyone to contribute what they do best, while benefiting from what everyone else has done. The AS-SIG aims to establish a permanent forum for bioinformaticists and experimentalists to come together and discuss what needs to be done in transcriptomics. AS-SIG will address the latest results and questions in this exciting field and bring together bioinformaticists and experimentalists, focusing on questions that demand their collaborative input.
Besides oral and poster presentation sessions, the workshop will have a panel discussion on human transcriptome analysis.
The purpose of this SIG is to cover the latest results and questions in this exciting field, and to bring together bioinformaticists and experimentalists, focusing on questions that demand their collaboration. The SIG will include studies of alternative splicing both in human and other organisms, and will consist of two days of talks (approximately 20 minutes each), and a poster session.
Sessions will focus on the following major themes:
Bioinformatics: algorithms and analysis of alternative splicing, including topics such as analysis of alternative splicing evidence, products, and functional impact; comparative genomics; alternative splicing regulation; and data-mining.
Biology: Biological mechanisms of splicing and regulation; biological functions such as the impact of splice variants on protein structure and biological pathways; phenomena such as nonsense-mediated decay and disease associations.
Splicing and Diseases: Identification and characterization of splice variants as a consequence of disease; diagnostic tools and therapeutic strategies based on splicing pattern variations between normal and disease states; classification of splice forms based on disease progression.
Databases, and Standards for the “Human Transcriptome Project”: Transcript repositories; data interchange formats; standards for annotating the transcriptome.
Venue, Registration and Accommodation
The meeting will be held at the Fortaleza Convention Center, Fortaleza, Brazil.
AS-SIG registration will be along with ISMB2006 registration.
ISMB2006 conference rates have been extended to cover pre-conference days required to attend AS-SIG. For more information on hotels offering special rates, please see ISMB2006 Housing. Hotel reservation will have to be made along with the SIG registration.
Organizing Committee
Prof. Shoba Ranganathan, Organizing Chair
Macquarie University, Sydney, Australia
Prof. Sandro de Souza, Co-Chair
Ludwig Institute for Cancer Research, Sao Paulo, Brazil
Prof. Roderic Guigo, Co-Chair
Institut Municipal d'Investigacio Medica, Barcelona, Spain

Tuesday, June 20, 2006

ABC of Alternative Splicing
Transcription
Transcription is a process in which one DNA strand is used as a template to synthesize a complementary RNA. The DNA strand that serves as the template is called the template strand, while the other DNA strand is termed the coding strand. Figure 2.2 illustrates the process of transcription. Since both the DNA coding strand and the RNA strand are complementary to the template strand, they have identical sequences, except that the nucleotide "T" in the DNA coding strand is replaced by the nucleotide "U" in the RNA strand.
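
The relationship between the template strand, the coding strand and the RNA can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the example sequence are made up for this note, and both strands are assumed to be written 5' to 3'.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def coding_strand(template):
    """Return the coding strand (5'->3') for a template strand given 5'->3'."""
    # The two strands are antiparallel, so complement the template read backwards.
    return "".join(COMPLEMENT[base] for base in reversed(template))

def transcribe(template):
    """Return the RNA transcript: the coding-strand sequence with T replaced by U."""
    return coding_strand(template).replace("T", "U")

# Hypothetical template strand.
TEMPLATE = "TTAGGCCAT"
print(coding_strand(TEMPLATE))  # ATGGCCTAA
print(transcribe(TEMPLATE))     # AUGGCCUAA
```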

The process of transcription consists of four essential steps:
1. Unwinding of the DNA double helix: The DNA double helix needs to be unwound so that it is accessible to the transcription machinery. In prokaryotes, the polymerase itself directs the unwinding. In eukaryotes, however, helicase enzymes catalyze the unwinding of the DNA double helix.

Figure 2: Transcription

2. Binding of polymerase to the initiation site: The binding of polymerase to the initiation site is a highly regulated process. It involves several proteins (transcription factors) that bind to the DNA close to the transcription start site, in a region called the promoter. Some of these proteins bind selectively to regulatory motifs. The motifs in the promoter region differ between genes. The combinatorial binding of transcription factors to the promoter region therefore regulates the expression of individual genes. Figure 2.3 illustrates the current knowledge about the transcriptional regulation machinery.

3. Synthesis of RNA based on the sequence of the DNA template strand: The synthesis of RNA involves the catalytic activity of enzymes called RNA polymerases. In prokaryotes, transcription is carried out by a single type of RNA polymerase, the core enzyme. In this case, the promoter specificity of the RNA polymerase can be altered by different types of sigma factors, which bind to the core enzyme to form a holoenzyme. In eukaryotes, most protein-coding genes are transcribed by RNA polymerase II (Pol II).

4. Termination of synthesis: The poly-adenylation site marks the end of the transcript. The RNA polymerase changes its elongation capacity as it passes the poly-adenylation signal, which finally leads to the termination of transcription.

Post transcriptional modification
Splicing of pre-mRNA
In most eukaryotic genes, the product of transcription is only a precursor molecule called pre-mRNA. This pre-mRNA is subject to splicing, which removes the non-coding regions (sometimes called "junk" regions) referred to as introns. The introns are marked by splice signals that allow their identification by the splicing machinery. Subsequently, the remaining sequence blocks, called exons, are joined together to form the mature messenger RNA.
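
The net effect of splicing (introns removed, exons joined) can be sketched as a simple string operation. The coordinates, the function name and the example sequences below are assumptions for illustration; the sketch says nothing about how the splicing machinery finds the introns.

```python
def splice(pre_mrna, introns):
    """Remove introns and join the remaining exons.

    introns is a list of (start, end) pairs using 0-based, half-open
    coordinates on pre_mrna.
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        if start < pos or end < start or end > len(pre_mrna):
            raise ValueError("introns must be non-overlapping and inside the transcript")
        exons.append(pre_mrna[pos:start])  # exon sequence before this intron
        pos = end                          # continue after the intron
    exons.append(pre_mrna[pos:])           # last exon
    return "".join(exons)

# Hypothetical two-exon transcript with one intron.
EXON1, INTRON, EXON2 = "AUGGCC", "GUAAGUCUAACUUUUUCAG", "GAAUAA"
PRE_MRNA = EXON1 + INTRON + EXON2
print(splice(PRE_MRNA, [(len(EXON1), len(EXON1) + len(INTRON))]))  # AUGGCCGAAUAA
```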

Splicing signals
In all eukaryotes, introns contain and are bordered by highly conserved sequences called splice signals. The most conserved splice signals bordering the introns are the dinucleotides GU and AG at the 5' and 3' ends, respectively. These are considered the consensus splice signals (GU-AG rule). A second splice signal is an adenosine within the intron (called the branch site). Usually there is also a U-rich sequence (polypyrimidine tract) between the branch site and the 3' splice site. Figure 2.3 shows the mechanism of splicing.
Pre-mRNA splicing is a precisely regulated process. It begins with the ordered assembly of several small nuclear ribonucleoprotein (snRNP) particles, as well as some non-snRNP proteins, on the pre-mRNA. These factors identify the splice signals and form the core of the spliceosome complex. Subsequent assembly of several additional proteins on this core leads to the formation of the spliceosome. The entire splicing process involves two major stages:

1. Formation of the commitment complex: The process of spliceosome formation starts with the identification of the 5' splice signal, the branch site and the polypyrimidine tract by the U1 snRNP, the U2 snRNP and the auxiliary factor U2AF, respectively. Another snRNP (U5) interacts with the exons surrounding the intron boundaries. This is followed by the binding of other snRNPs (U4 and U6) as well as some non-snRNP proteins (RNA helicases and SR proteins) to form the core of the spliceosome complex. Subsequently, more than 60 additional proteins assemble on this core and form the spliceosome (Stevens et al. 2002; reviewed in Burge et al. 1999; Will and Luhrmann 2001). Dynamic interactions between the spliceosomal proteins and the pre-mRNA bring the reactive sites into close proximity, thereby creating catalytic sites for the trans-esterification reactions.
Figure 4: The assembled commitment complex during pre-mRNA splicing. This complex can be converted into the active spliceosome and involves the recognition of the 5' splice site by U1 snRNP and of the branch-point sequence and 3' splice site by SF1 and U2AF, respectively, with the aid of SR proteins. The arrows indicate that interactions may have to occur across introns (intron bridging, red arrows) or across exons (exon bridging, blue arrows) in order to achieve correct pairing of 5' and 3' splice sites.


2. The trans-esterification reactions: After the formation of the commitment complex, the cleavage of introns and the joining of exon ends proceed via two trans-esterification reactions. First, the pre-mRNA is cleaved at the 5' splice site and the 5' end of the intron joins the branch point, creating an intron lariat structure. In the second step, the free 3' end of the 5' exon connects to the downstream exon, leading to exon ligation and the subsequent release of the intronic sequence.
Figure 5: The two trans-esterification reactions during pre-mRNA splicing. The first step involves cleavage at the 5' splice site to yield a 5' exon intermediate (exon 1) with a free 3' OH group. Simultaneously, the 5' end of the intron is joined, via a 2'-5' phosphodiester bond, to the branch-point adenosine residue within the intron. This forms the lariat intermediate containing the intron with the attached 3' exon (exon 2). During the second step, the lariat intermediate is cleaved at the 3' splice site and the two exons are ligated together via a 3'-5' phosphodiester bond. This results in two products: the spliced exons and the lariat intron. The spliced mRNA is exported into the cytoplasm and the lariat intron is degraded in the nucleus.



Figure 6: The mammalian consensus sequences at the 5' splice site and the 3' splice site in the pre-mRNA. The initial steps in spliceosomal assembly are directed by several consensus sequences in the pre-mRNA. These sequences are located at the intron-exon junctions and in the intron. The 5' splice site is defined by the consensus sequence MAG/GURAGU (M = A or C; R = A or G; the / indicates the exon-intron junction). The 3' splice site is defined by three sequence elements, going 5' to 3': the branch site (YNYURAC, where the A after R is the adenosine used to form the lariat intermediate during splicing; Y = U or C; N = A, G, U or C), the polypyrimidine tract, and the 3' splice site consensus (YAG/G; Y = U or C). The branch-point consensus sequence is usually located 18 to 38 nucleotides upstream of the 3' splice site.
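
The consensus sequences in the caption above can be turned into regular expressions to scan a sequence for candidate sites. The following Python sketch is only an illustration under stated assumptions: the helper names and the toy pre-mRNA are hypothetical, and matching a consensus exactly is far cruder than what real gene finders (or the spliceosome) do.

```python
import re

# Ambiguity codes used in the consensus sequences above (RNA alphabet).
IUPAC = {"M": "[AC]", "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}

def consensus_to_regex(consensus):
    """Compile a consensus such as 'MAGGURAGU' into a regular expression."""
    body = "".join(IUPAC.get(ch, ch) for ch in consensus)
    # A lookahead lets finditer report overlapping candidate sites.
    return re.compile("(?=" + body + ")")

def find_sites(rna, pattern, offset):
    """Return match positions shifted by offset (e.g. to the splice junction)."""
    return [m.start() + offset for m in pattern.finditer(rna)]

FIVE_PRIME = consensus_to_regex("MAGGURAGU")   # MAG/GURAGU, junction after 3 bases
BRANCH = consensus_to_regex("YNYURAC")         # branch adenosine is the 6th base
THREE_PRIME = consensus_to_regex("YAGG")       # YAG/G, junction after 3 bases

# Hypothetical pre-mRNA: exon, intron (5' site, branch site, pyrimidine tract), exon.
PRE_MRNA = "AUGCAG" + "GUAAGU" + "AACC" + "UACUAAC" + "UUUCUUUUCCUUUUC" + "CAG" + "GAAUAA"

print(find_sites(PRE_MRNA, FIVE_PRIME, 3))   # [6]  first base of the intron
print(find_sites(PRE_MRNA, BRANCH, 5))       # [21] branch-point adenosine
print(find_sites(PRE_MRNA, THREE_PRIME, 3))  # [6, 41]
```

In this toy sequence the short YAG/G motif also matches at the 5' junction (position 6), which shows why a single consensus is not enough to call a splice site and why the signals have to be considered together.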



Figure: Mechanism of splicing





Alternative splicing
In eukaryotes, genes have a typical intron-exon arrangement. Exons (expressed sequences) contain the coding regions of the protein, while introns (intervening sequences) are non-coding regions that are removed after transcription. However, what counts as an exon and what counts as an intron is not fixed in advance: the decision is made just after the transcription of DNA into an immature pre-mRNA, when introns are cut out by splicing. A huge RNA-protein complex called the spliceosome (Will and Luhrmann 2001; Newman 2001; Valadkhan and Manley 2001) recognizes conserved sequences (splice sites) at the intron-exon boundaries and performs the splicing. Since there can be several combinatorial ways of removing these introns, several mature mRNAs can be produced from the same transcript. Thus alternative splicing (AS) is defined as the creation of multiple mRNA products from a single gene (Black 2000; Graveley 2001): AS takes place when the introns of a pre-mRNA can be spliced out (removed) in more than one way, yielding several possible mature mRNAs from the same gene (Fig. 1).
Alternative splice events that affect the protein-coding region of the mRNA give rise to proteins that differ in their sequence and therefore in their activities. Alternative splicing within the non-coding regions of the RNA can change regulatory elements such as translation enhancers or RNA stability domains, which may have a dramatic effect on the level of protein expression (see Figure 2, the impact of alternative splicing). It is therefore important that RNA splicing is regulated as tightly as RNA transcription or translation. This is indeed the case: RNA splicing occurs within tightly regulated, multi-component molecular machines called spliceosomes, which are under the control of intra- and extracellular signalling pathways.

Figure 2: The impact of alternative RNA splicing. Alternative splicing can occur in any region of the nascent messenger RNA: in the 3’ or 5’ untranslated regions (UTRs) or in the protein-coding sequence. The 5’ UTR contains regulatory regions that control protein expression; insertion or deletion of these regions affects protein expression. The 3’ UTR contains mRNA stability domains; insertion or deletion of these domains affects mRNA stability and therefore protein expression. Alternative splicing within the protein-coding sequence results in altered protein structure and function.



Alternative splicing is an important mechanism for modulating and fine-tuning gene function. It is a powerful and versatile regulatory mechanism that can affect quantitative control of gene expression and the functional diversity of proteins. Alternative splicing is widespread in almost all higher eukaryotes, with most information coming from well-studied organisms like C. elegans (Park et al., 2003; Tabish et al., 1999), Drosophila, mouse (Hirano et al., 2004; Tabish and Ticku, 2004) and humans, which emphasizes the importance of alternative splicing throughout evolution. One of the most dramatic examples of alternative splicing is the Dscam gene in Drosophila (Celotto and Graveley 2001; Schmucker et al., 2000). This single gene contains some 116 exons, of which 17 are retained in the final mRNA. Some exons are always included; others are selected from an array. Theoretically this system is able to produce 38,016 different proteins, and in fact over 18,000 different ones have been found in Drosophila hemolymph. These Dscam proteins are involved in guiding neurons to their proper destination and probably in the recognition and phagocytosis of invading bacteria. Whether a particular segment of RNA is retained as an exon or excised as an intron can therefore vary under different circumstances. The estimated frequency of alternative splicing in human genes has increased dramatically, from 5% (Sharp, 1994) to 59% (Consortium 2001); more recently, bioinformatics analysis has revealed that as many as 74% of human genes undergo alternative splicing (Johnson et al. 2003).
Alternative splicing is therefore now regarded as the rule rather than the exception. On average, a human gene generates 2 to 3 transcripts, so alternative splicing is the best available explanation of why the number of human proteins (~90,000) far exceeds the number of known protein-coding genes (~26,000). About half of all mammalian genes are estimated to have more than one spliced form (Brett et al., 2002; Mironov et al., 1999, 2001; Xu et al., 2002; Sorek et al., 2004; Modrek and Lee 2003; Resch et al., 2004). Alternative splicing is controlled by the binding of trans-acting protein factors to cis-acting sequences within the pre-mRNA, leading to differential use of splice sites. Many such sequences have been identified and are grouped as either enhancer or suppressor elements (for a review on splicing regulatory elements, see Ladd and Cooper 2002). These elements are generally short (8-10 nucleotides long) and are even less conserved than those present at exon-intron junctions. Control of alternative splice site recognition is mediated by members of the SR (serine/arginine-rich) protein family of splicing factors, which bind to the splicing enhancer and inhibitor elements. The interactions of these proteins with the pre-mRNA substrate and with snRNP proteins have been intensively studied. Their role in regulating splice site selection is believed to occur in two (perhaps non-exclusive) modes: arginine-serine (RS) domain dependent and RS domain independent (Cartegni et al., 2002).
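The figure of 38,016 possible Dscam proteins quoted above follows from simple multiplication. The sketch below assumes the cluster sizes reported by Schmucker et al. (2000), which are not given in this post: one exon is chosen from each of four arrays of mutually exclusive alternatives (12 for exon 4, 48 for exon 6, 33 for exon 9 and 2 for exon 17).

```python
from math import prod

# Number of mutually exclusive alternatives in each Dscam exon cluster.
# These counts are taken from Schmucker et al. (2000), not from the text above.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def count_isoforms(clusters):
    """One alternative is picked per cluster, so the choices multiply."""
    return prod(clusters.values())


print(count_isoforms(DSCAM_CLUSTERS))  # 38016
```

The count is a theoretical upper bound: it assumes that the choice in each cluster is independent of the others and that every combination can actually be made.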




Figure demonstrating alternative splicing


Types of alternative splicing




Figure for types of alternative splicing


What is the HapMap?
The HapMap is a catalog of common genetic variants that occur in human beings. It describes what these variants are, where they occur in our DNA, and how they are distributed among people within populations and among populations in different parts of the world. The International HapMap Project is not using the information in the HapMap to establish connections between particular genetic variants and diseases. Rather, the Project is designed to provide information that other researchers can use to link genetic variants to the risk for specific illnesses, which will lead to new methods of preventing, diagnosing, and treating disease.
Figure 1: When DNA sequences on a part of chromosome 7 from two random individuals are compared, two single nucleotide polymorphisms (SNPs) occur in about 2,200 nucleotides.
The DNA in our cells contains long chains of four chemical building blocks: adenine, thymine, cytosine, and guanine, abbreviated A, T, C, and G. More than 6 billion of these chemical bases, strung together in 23 pairs of chromosomes, exist in a human cell. (See http://www.dnaftb.org/dnaftb/ for basic information about genetics.) These genetic sequences contain information that influences our physical traits, our likelihood of suffering from disease, and the responses of our bodies to substances that we encounter in the environment.
The genetic sequences of different people are remarkably similar. When the chromosomes of two humans are compared, their DNA sequences can be identical for hundreds of bases. But at about one in every 1,200 bases, on average, the sequences will differ (Figure 1). One person might have an A at that location, while another person has a G, or a person might have extra bases at a given location or a missing segment of DNA. Each distinct "spelling" of a chromosomal region is called an allele, and a collection of alleles in a person's chromosomes is known as a genotype.
Differences in individual bases are by far the most common type of genetic variation. These genetic differences are known as single nucleotide polymorphisms, or SNPs (pronounced "snips"). By identifying most of the approximately 10 million SNPs estimated to occur commonly in the human genome, the International HapMap Project is identifying the basis for a large fraction of the genetic diversity in the human species.
For geneticists, SNPs act as markers to locate genes in DNA sequences. Say that a spelling change in a gene increases the risk of suffering from high blood pressure, but researchers do not know where in our chromosomes that gene is located. They could compare the SNPs in people who have high blood pressure with the SNPs of people who do not. If a particular SNP is more common among people with hypertension, that SNP could be used as a pointer to locate and identify the gene involved in the disease.
However, testing all of the 10 million common SNPs in a person's chromosomes would be extremely expensive. The development of the HapMap will enable geneticists to take advantage of how SNPs and other genetic variants are organized on chromosomes. Genetic variants that are near each other tend to be inherited together. For example, all of the people who have an A rather than a G at a particular location in a chromosome can have identical genetic variants at other SNPs in the chromosomal region surrounding the A. These regions of linked variants are known as haplotypes (Figure 2).
In many parts of our chromosomes, just a handful of haplotypes are found in humans. [See The Origins of Haplotypes.] In a given population, 55 percent of people may have one version of a haplotype, 30 percent may have another, 8 percent may have a third, and the rest may have a variety of less common haplotypes.
The International HapMap Project is identifying these common haplotypes in four populations from different parts of the world. It also is identifying "tag" SNPs that uniquely identify these haplotypes. By testing an individual's tag SNPs (a process known as genotyping), researchers will be able to identify the collection of haplotypes in a person's DNA. The number of tag SNPs that contain most of the information about the patterns of genetic variation is estimated to be about 300,000 to 600,000, which is far fewer than the 10 million common SNPs.
Once the information on tag SNPs from the HapMap is available, researchers will be able to use them to locate genes involved in medically important traits. Consider the researcher trying to find genetic variants associated with high blood pressure. Instead of determining the identity of all SNPs in a person's DNA, the researcher would genotype a much smaller number of tag SNPs to determine the collection of haplotypes present in each subject. The researcher could focus on specific candidate genes that may be associated with a disease, or even look across the entire genome to find chromosomal regions that may be associated with a disease. If people with high blood pressure tend to share a particular haplotype, variants contributing to the disease might be somewhere within or near that haplotype.
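The idea of tag SNPs can be illustrated with a toy example. The sketch below is not the method used by the HapMap Project, which relies on statistical measures of linkage between SNPs; it simply searches, by brute force, for the smallest set of SNP positions whose alleles tell a handful of made-up haplotypes apart.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that distinguishes all haplotypes.

    haplotypes: equal-length strings, one character per SNP position.
    Returns None if two haplotypes are identical and so cannot be separated.
    """
    n_sites = len(haplotypes[0])
    for size in range(1, n_sites + 1):
        for positions in combinations(range(n_sites), size):
            # The alleles seen at the chosen positions form a signature.
            signatures = {tuple(h[p] for p in positions) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return positions
    return None


# Four hypothetical haplotypes over four SNPs. Positions 0 and 1 are always
# inherited together, and so are positions 2 and 3.
haplotypes = ["ACGT", "ACAC", "GTGT", "GTAC"]
print(find_tag_snps(haplotypes))  # (0, 2)
```

Genotyping only positions 0 and 2 is enough to tell which of the four haplotypes is present, which is the saving the text describes. The brute-force search is only practical for tiny examples like this one.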

Monday, June 19, 2006


IS ALTERNATIVE SPLICING THE CAUSE OF AUTOIMMUNE DISEASES?
Alternative splicing, a natural method by which a single gene makes different forms of proteins, could be the key to the development of autoimmune diseases such as lupus, rheumatoid arthritis, or type 1 diabetes, said a Baylor College of Medicine researcher in this month's issue of the Journal of Allergy and Clinical Immunology. Alternative splicing multiplies the coding capacity of genes, which provides unparalleled complexity to the transcriptome and proteome. This increased complexity has clear repercussions for the regulation of gene expression in many organisms and for the balance between human health and disease. If alternative splicing results in a form of protein that is sufficiently different from that made in early development, the immune system might mistake it for a foreign protein and initiate an attack that results in an immune disease.


When these isoforms (different protein forms) get above a threshold of difference, immune tolerance to self-proteins breaks down. The immune system starts to attack the proteins and the cells in which they are found.


DATAS TECHNOLOGY: NEW TECHNOLOGY FOR ALTERNATIVE SPLICING DETECTION
DATAS technology allows the identification of all the functionally distinct mRNA variants that are differentially expressed between any two biologic samples (healthy vs. diseased) or cell lines treated by a compound.

Unlike EST sequencing, DATAS gives access to the middle of genes and to splice events that affect the coding sequence. Indeed, bioinformatic analysis of our entire DATAS fragment database indicates that more than 75% of the DATAS clones sequenced to date that derive from known genes overlap the coding sequence of those genes. Our analysis also indicates that more than 80% of the known genes identified by DATAS have homology to ESTs that are alternatively spliced, indicating that DATAS enriches for alternatively spliced genes above the background of 60% detected in EST databases. By directly targeting alternatively spliced mRNAs, the DATAS technology gets around the limitations of EST mapping and gene prediction programs in finding novel exons in genomic DNA, and delivers splicing events specific to the samples analysed. DATAS, as well as certain uses of alternative splicing events isolated using DATAS, is protected under US Pat. No. 6,251,590 as well as other patents granted and pending worldwide.

Friday, May 05, 2006


Why Do We Need Our Introns?
Ninety-five percent of human genomic DNA does not code for proteins or functional RNA molecules, and is frequently referred to as “junk” or “selfish” DNA. The vast majority of this noncoding DNA has no documented role in the cell. However, according to recent analyses, three quarters of the human genome is transcriptionally active. We discuss whether the expression of noncoding genomic sequences is valuable for the cell or whether it is second-hand “junk” resulting from imperfections in the organization and functioning of the transcriptional machinery. Introns constitute a major fraction of the noncoding DNA, representing over 40% of mammalian genomes. They are ambivalent elements that cause several problems and at the same time bring benefits to their host cells. There is a strong correspondence between the average length of introns and the size of the genome. Here we review the latest summary statistics on human introns, the evolution of introns in mammals, and the distribution of genes that encode functional RNAs within introns. We also suggest that splicing is an important filter for organisms with large genomes, serving to distinguish between functional mRNAs and arbitrary RNA transcripts generated from random loci.

Saturday, April 29, 2006

bioinfromatics researchers: bioinfromatics researchers: January 2006

Abstract for AS-SIG 2006
Hundreds of new alternatively spliced variants discovered in C.elegans genome
Luvkashyap, M.Tabish*
Dept. of Biochemistry, Faculty of Life Sciences
AM University, Aligarh, Uttar Pradesh, India
*To whom correspondence should be addressed: luvtabish@gmail.com


INTRODUCTION

The 97 Mb genomic sequence of the soil-dwelling, free-living nematode C. elegans is complete; it was the first multicellular organism to be sequenced (1). Deciphering the biological information in such sequenced genomes is of great importance, because what we learn from C. elegans is directly applicable to more complex organisms like humans: 30-40% of C. elegans genes share direct homology with human genes (2). In addition, it has been shown that human genes can replace their C. elegans homologs when introduced into transgenic C. elegans. Conversely, many C. elegans genes can function similarly to mammalian genes. Thus studies of C. elegans genes can be used directly for a better understanding of human genes. Alternative splicing of pre-mRNAs is a powerful and versatile regulatory mechanism that can affect quantitative control of gene expression and functional diversification of proteins (3). It contributes to major developmental decisions and also to fine-tuning of gene function. More than 50% of the genes of recently sequenced eukaryotic genomes (4, 5, 6, 7) are now believed to undergo alternative splicing to generate different transcript and protein isoforms under different developmental, tissue-specific, and disease conditions, thus bringing a new set of challenges to gene prediction programs and the encompassing annotation processes. Alternative splicing is found extensively in all higher eukaryotes, with most information coming from well-studied organisms like C. elegans, Drosophila, mouse and humans (8, 9, 10, 11, 12). The EST-based approach to detecting alternatively spliced variants of a gene has until now been considered the best approach, but it is successful only for organisms that have extensive EST coverage, especially humans.
In organisms like C. elegans, where EST coverage is limited and only 1% of the total coding genes have EST coverage, detection of alternatively spliced variants is rather difficult. Several scientists have therefore successfully used a mixed bioinformatics approach, combining computational analysis with experimental verification, to detect these spliced variants in C. elegans (13), mouse (14) and other organisms.

RESULTS
Taking motivation from these experiments, we carried out a complete analysis of chromosome 1 of C. elegans to look for new exons and genes, using a combination of gene-finding, exon-predicting and ORF-finding programmes and other bioinformatics tools to predict alternatively spliced transcripts in C. elegans genes. We found roughly 120-150 new alternatively spliced variants and exons from the chromosome 1 analysis. To verify our findings experimentally, we performed RT-PCR experiments for a few of the predicted spliced variants. We expect that, if we extend this methodology to the full genome of C. elegans, we would find around 1000-1200 new alternatively spliced variants. These new coding sequences, not annotated or identified earlier, will not only enrich the available splice database of C. elegans but will also enhance our understanding of the genome structure and evolution of higher eukaryotes, especially in relation to humans.
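The abstract does not name the programmes that were used, so the following is only a generic illustration of what an ORF-finding step does, not the authors' pipeline. It is a minimal sketch that scans the three forward reading frames of a DNA string for stretches running from an ATG to the first in-frame stop codon; the function name, the length cut-off and the test sequence are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is the 0-based index of the A of ATG; end is exclusive and includes
    the stop codon. min_codons counts the start and stop codons as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i            # open an ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None         # look for the next ATG in this frame
    return orfs


sequence = "CCATGAAATTTTAGGG"
print(find_orfs(sequence))  # [(2, 14, 2)]
print(sequence[2:14])       # ATGAAATTTTAG
```

A real analysis would also scan the reverse strand and would have to join exons across introns, which a plain ORF scan of genomic DNA cannot do on its own.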




REFERENCES

1. C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282: 2012-2018.

2. euGenes: Homologous Genes Summary Table, August 2005. http://eugenes.org/all/hgsummary.html

3. Lopez, A.J. 1998. Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu. Rev. Genet. 32: 279-305.

4. Mironov, A.A., Fickett, J.W., and Gelfand, M.S. 1999. Frequent alternative splicing of human genes. Genome Res. 9: 1288-1293.

5. Kan, Z., Rouchka, E.C., Gish, W.R., and States, D.J. 2001. Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res. 11: 889-900.

6. Modrek, B., Resch, A., Grasso, C., and Lee, C. 2001. Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 29: 2850-2859.

7. Zavolan, M., van Nimwegen, E., and Gaasterland, T. 2002. Splice variation in mouse full-length cDNAs identified by mapping to the mouse genome. Genome Res. 12: 1377-1385.

8. Modrek, B. and Lee, C. 2003. Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nature Genetics 34: 177-180.

9. Black, D.L. 2000. Protein diversity from alternative splicing: a challenge for bioinformatics and post-genome biology. Cell 103: 367-370.

10. Graveley, B.R. 2001. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 17: 100-107.

11. Schmucker, D., Clemens, J.C., Shu, H., Worby, C.A., Xiao, J., et al. 2000. Drosophila Dscam is an axon guidance receptor exhibiting extraordinary molecular diversity. Cell 101: 671-684.

12. Zahler, A.M. Alternative splicing in C. elegans. http://www.wormbook.org/chapters/www_altsplicing/altsplicing.html

13. Tabish, M., Clegg, R.A., Rees, H.H., and Fisher, M.J. 1999. Organization and alternative splicing of the Caenorhabditis elegans cAMP-dependent protein kinase catalytic-subunit gene (kin-1). Biochem. J. 339: 209-216.

14. Tabish, M. and Ticku, M.K. 2004. Alternate splice variants of mouse NR2B gene. Neurochem. Int. 44: 339-343.