"Frontmatter". In: Plant Genomics and Proteomics
Christopher A. Cullis - Plant Genomics and Proteomics-J. Wiley & Sons (2004)
Fugu (pufferfish) | Map the query sequence. Determine the genomic structure. Identify novel genes. | Fugu rubripes Genome BLAST page
Zebrafish | Map the query sequence. Determine the genomic structure. Identify novel genes. | Zebrafish Genome BLAST page
Arabidopsis thaliana | Map the query sequence. Determine the genomic structure. Identify novel genes. | Arabidopsis thaliana BLAST page
Oryza sativa (rice) | Map the query sequence. Determine the genomic structure. Identify novel genes. | Oryza sativa BLAST page
Anopheles gambiae (mosquito) | Map the query sequence. Determine the genomic structure. Identify novel genes. | Anopheles gambiae BLAST page
Other eukaryotes (Plasmodium, Amoeba, fungi, etc.) | Map the query sequence. Determine the genomic structure. Identify novel genes. | Other Eukaryotes BLAST page
Microbial Genomes | Map the query sequence or identify novel genes. | Microbial Genome BLAST
Nucleotide UniVec | Screen for vector contamination. | VecScreen
Trace Archives | Find matches to unassembled, raw sequence data. | Trace MEGABLAST Search
From http://www.ncbi.nlm.nih.gov/BLAST/producttable.html

activity is the annotation of the features of the genome. However, because the task is not charged to a single individual or group, how does the genome annotation get accomplished, and how do the various features identified by different individuals get integrated? This can be done in a number of ways. One way is the Distributed Annotation System (DAS). This is a client-server system in which a single machine (the client) gathers genome annotation information from multiple distant websites (the reference and annotation servers), collates that information, and displays it to the user in a single view. Little coordination is needed among the various information providers. Further information regarding DAS is available at http://biodas.org. A second method involves a group or small set of groups who generate the annotation and then release the data at regular intervals.
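The collation step that a DAS-style client performs can be illustrated with a minimal sketch. It assumes each annotation server is represented by a plain Python function that returns features for a sequence; the names `Feature`, `collate`, `gene_server`, and `repeat_server` are hypothetical, no network calls are made, and the actual DAS protocol is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Feature:
    source: str  # which (hypothetical) annotation server supplied the feature
    seq_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    label: str


# Assumption: a "server" is modelled as a callable that returns the
# features it holds for a given sequence identifier.
AnnotationSource = Callable[[str], Iterable[Feature]]


def collate(sources: List[AnnotationSource], seq_id: str,
            start: int, end: int) -> List[Feature]:
    """Gather features from every source, keep those overlapping the
    requested region, and return them ordered for a single view."""
    collected = []
    for fetch in sources:
        for feat in fetch(seq_id):
            overlaps = feat.start <= end and feat.end >= start
            if feat.seq_id == seq_id and overlaps:
                collected.append(feat)
    return sorted(collected, key=lambda f: (f.start, f.end, f.source))


def gene_server(seq_id: str) -> List[Feature]:
    # Stand-in for a remote server holding gene predictions.
    return [
        Feature("gene_server", seq_id, 100, 900, "predicted gene"),
        Feature("gene_server", seq_id, 5000, 6200, "predicted gene"),
    ]


def repeat_server(seq_id: str) -> List[Feature]:
    # Stand-in for a second, independent server holding repeat annotations.
    return [Feature("repeat_server", seq_id, 850, 1200, "tandem repeat")]


if __name__ == "__main__":
    for feat in collate([gene_server, repeat_server], "chr1", 1, 2000):
        print(feat.source, feat.start, feat.end, feat.label)
```

Each feature keeps a record of the source it came from, so annotations from different providers can be shown side by side without the providers having to coordinate with one another.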
If there are multiple annotations of the same region, these can be integrated or displayed independently, as is the case for the Human Genome browser (http://genome.ucsc.edu/cgi-bin/hgGateway?org=human). One of the particularly useful functions is the ability to add a personal track to those already established to incorporate individually important features. Finally, for specific gene families, an annotation jamboree could be an appropriate vehicle for annotation. This is where experts in a gene family get together to annotate as many as possible of the representatives of their particular gene family. Clearly, these approaches are not mutually exclusive. The distributed annotation allows experts in various areas to contribute remotely.

INFORMATICS TOOLS

FIGURE 9.2. Miropeats analysis of a genomic region from flax showing inverted repeats, tandem oligo repeats, and a palindrome. (Key to graphics: the sequence, inverted repeat, tandem repeat, tandem oligo repeat, palindrome; scale in kb.)

The programs that can be used to annotate genomes, and the ways the results are presented, are available at various sites listed in the references for this chapter. Among the most common software tools used are BLAST, FASTA, HMM profiling, and motif finding, with Prosite and other pattern/motif combinatorial search tools applied to the various databases listed in Table 9.2. Used alone or in various combinations, they permit the identification and subsequent description of homologous sequences according to various criteria such as sequence, structural, and functional properties. As the available data set that can be searched increases, more identified genes will be available for training the gene-finding programs, thereby making their predictions more accurate. The predicted genes can be searched for in the EST databases as well as experimentally by RT-PCR with primers designed from the annotated genomic sequences. In particular, the use of primers across predicted splice sites would be a direct test of the accuracy of the informatic analyses.

9. BIOINFORMATICS

FIGURE 9.3. Miropeats analysis of the Arabidopsis thaliana BAC clone F16P2 at a threshold of 150. The organization of genes on BAC F16P2 showing the 3 tandem gene duplications. The glutathione-S-transferase and tropinone reductase genes are labeled G and T, respectively. A smaller duplication of pumilio-like protein (P) is also present. Figure 1.4 converted to a single linear read is given below the Miropeats pattern.

EXPRESSION DATA

At the highest (most complete) level, the genome of an organism should be annotated completely, with all the possible features included in this annotation. Because this cannot be done directly at the outset, it will be built up by using other information to identify interesting and important regions of the genome. Some of these classes of additional data would include the expression data derived from EST sequences, SAGE, and MPSS™ and from comparative genomics.

EST CLUSTERING

As described above, the clustering of ESTs is an important function in understanding the expressed portions of the genomes. One of the sites where this is done is www.tigr.org, where the gene indices are available for 18 different higher plants (Arabidopsis, barley, cotton, grape, ice plant, lettuce, Lotus, maize, Medicago truncatula, Pinus, potato, rice, rye, Sorghum bicolor, soybean, sunflower, tomato, and wheat). The gene indices at TIGR are the results of the clustering of transcripts into tentative consensus (TC) sequences. These TCs are built with a variety of programs including:

• Megablast (Zhang et al., 2000)
• CAP3
• Paracel TranscriptAssembler™
• DNA-Protein Search program (dps) developed by Dr. Xiaoqiu Huang

An alternative clustering of plant ESTs is available from PlantGDB (http://www.plantgdb.org/).
Here the clustering is done with a clustering tool, PaCE, resulting in tentative unigenes (TUGs). The overall data are similar, as can be seen from Table 9.3, although the dates of the relevant assemblies are different, which would explain some of the differences in the numbers in the various classes. What is clear from both of these sites is that there is still a large number of singletons in addition to the assembled TCs/TUGs. This is especially true for the wheat data, with >70,000 singletons in addition to the 29,000+ TCs/TUGs.

HarvEST (http://harvest.ucr.edu/) is another EST database viewer that emphasizes gene function and is oriented ultimately to comparative genomics. This software is downloadable from the website. The EST sequences in HarvEST have also been assembled with CAP3. The fully enabled versions of HarvEST allow the user to examine the actual CAP3 sequence alignment and so determine whether and where individual sequences deviate from a consensus sequence. Therefore, HarvEST and the TIGR Gene Indices are similarly assembled with CAP3, whereas the PlantGDB data are assembled with different software (PaCE).

FINDING GENES IN GENOMIC SEQUENCES

The GeneSeqer web service (at the PlantGDB website) is intended primarily for performing the spliced alignment of query sequences (sequences representing transcribed genes, i.e., ESTs, cDNAs, and proteins) with a target sequence (genomic DNA). The input sequences can be chosen on the basis of similarity to other sequences (both genomic and transcribed), or the user may already possess an uncharacterized sequence that needs further characterization. An exhaustive alignment of "All Plant" ESTs and cDNAs is possible, or a more efficient approach can be taken by using the Tentative Unique Gene clusters (TUGs) assembled with the PlantGDB contiging method.
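A spliced alignment reports where the exons of a transcript lie on the genomic sequence, and the gaps between consecutive exons are the inferred introns. The following minimal sketch shows how those introns could be derived from exon coordinates and checked for the canonical GT...AG boundaries found in most introns. It is not GeneSeqer's algorithm: the function names, the toy sequence, and the restriction to the forward strand are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: coordinates are 1-based and inclusive, on the forward strand.
Interval = Tuple[int, int]


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of one spliced alignment."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end > 1:  # at least one base lies between exons
            introns.append((prev_end + 1, next_start - 1))
    return introns


def has_canonical_sites(genomic: str, intron: Interval) -> bool:
    """True if the intron starts with GT and ends with AG."""
    start, end = intron
    seq = genomic[start - 1:end].upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


if __name__ == "__main__":
    # Toy genomic sequence (hypothetical): exon, intron, exon.
    genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
    exons = [(1, 6), (19, 24)]
    for intron in introns_from_exons(exons):
        print(intron, has_canonical_sites(genomic, intron))
```

Predicted introns that fail such a check are not necessarily wrong, since non-canonical splice sites exist, but they are reasonable candidates for experimental confirmation, for example by RT-PCR with primers placed across the predicted splice junction.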