"Frontmatter". In: Plant Genomics and Proteomics
Christopher A. Cullis - Plant Genomics and Proteomics-J. Wiley & Sons (2004)
PROTEIN CHARACTERIZATION AND COMPARISONS

Again, as with nucleic acid comparisons, a large number of tools are available. One listing with a short description is at http://www.bioinformatik.de/cgi-bin/browse/Catalog/Software/Online_Tools/.

TABLE 9.3. CLUSTERING OF EST DATA FROM TIGR AND PLANTGDB

Source of EST data   Plant         Total ESTs   Assembled ESTs   Contigs   Singlets
PlantGDB             Zea mays      206,015      203,358          21,063    19,350
TIGR Gene Index      Zea mays      192,436      173,826          20,459    15,147
PlantGDB             Arabidopsis   178,538      178,464          19,874    29,282
TIGR Gene Index      Arabidopsis   232,136      216,159          22,485    15,977
PlantGDB             Wheat         415,818      415,642          29,933    77,623
TIGR Gene Index      Wheat         415,125      343,891          38,548    71,234

Assembled ESTs are counted differently in the two sets of data. For the TIGR Gene Index the assembled number is those ESTs included in contigs, whereas for PlantGDB it is all those ESTs used in the contigs and singletons.

Prosite is a database of protein families and domains (http://us.expasy.org/prosite/) that also makes available an extensive suite of proteomics tools. The database consists of biologically significant sites, patterns, and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs. The underlying premise is that the many different proteins can be grouped into a limited number of families on the basis of similarities in their sequences. Proteins or protein domains belonging to a particular family generally share attributes important for the function of the protein and/or for the maintenance of its three-dimensional structure. Analysis of such domains can yield a protein signature that can be used to assign a newly sequenced protein to a specific family of proteins and thus to formulate hypotheses about its function. Prosite currently contains patterns and profiles specific for more than 1000 protein families or domains.
Each of these signatures comes with documentation providing background information on the structure and function of these proteins. However, the programs, sequence motifs, and domains defined in Prosite and other protein databases have been developed and trained primarily on fungal and animal proteins. Many of these motifs and domains may differ significantly in plants, and so any functional inferences may be substantially improved by retraining the motif descriptions with plant sequences. As the characterization of plant proteins continues, it is expected that novel plant-specific sequence motifs will be discovered and will be useful in predicting the function of unknown plant proteins.

In the same way that there are specialized nucleic acid databases, there are also specialized protein sites. The PlantsP database (http://plantsp.sdsc.edu/), for example, is dedicated to understanding phosphorylation processes in plants, because protein phosphorylation and dephosphorylation are fundamental to cellular regulation. The protein kinase and protein phosphatase families in Arabidopsis contain more than 1300 members. The same site has information on the rice protein kinases, where each protein has been assigned to a class, a group, and a family. The assignments are based on the PlantsP Kinase Classification (PPC) and on BLAST searches with an E value cutoff of 1e-30. PPC is a bottom-up systematic classification based on sequence comparisons using the entire sequence, so that sequences that share domains outside of the kinase catalytic domain should cluster together before sequences that have only the catalytic domain in common. Each assignment falls into one of five categories:

• Clear (assignment is unambiguous)
• Strong (assignment is highly likely)
• Weak (assignment is to best group, but E values were >1e-50)
• Mixed (assignment to more than 1 group possible)
• No assignment (proteins that do not match at an E value <1e-30)
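The E value thresholds in this list can be illustrated with a minimal Python sketch. It is not the PlantsP implementation: the text gives only the 1e-30 cutoff and the 1e-50 boundary for weak assignments, so the Hit record, the categorize function, the rule used for "mixed", and the order in which the rules are tested are all assumptions made for illustration. Because the text gives no numerical criterion separating "clear" from "strong", the sketch reports them as a single category.

```python
from dataclasses import dataclass

E_CUTOFF = 1e-30  # from the text: a match must have an E value below this
E_WEAK = 1e-50    # from the text: weak assignments had E values above this


@dataclass
class Hit:
    group: str     # hypothetical label for the kinase group of the matched sequence
    evalue: float  # BLAST E value reported for the match


def categorize(hits):
    """Return (category, best_group) for the BLAST hits of one query protein."""
    # Keep only matches that pass the 1e-30 cutoff.
    passing = [h for h in hits if h.evalue < E_CUTOFF]
    if not passing:
        return ("no assignment", None)

    # The best match is the one with the smallest E value.
    best = min(passing, key=lambda h: h.evalue)
    if best.evalue > E_WEAK:
        return ("weak", best.group)

    # Assumption: "mixed" means matches at or below 1e-50 to more than one group.
    strong_groups = {h.group for h in passing if h.evalue <= E_WEAK}
    if len(strong_groups) > 1:
        return ("mixed", best.group)

    # The text does not say how "clear" and "strong" are told apart.
    return ("clear or strong", best.group)


if __name__ == "__main__":
    example = [Hit("group A", 1e-80), Hit("group A", 1e-40)]
    print(categorize(example))
```

A real pipeline would read the hits from BLAST tabular output and would need the actual PlantsP criteria to separate the clear and strong categories.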