"Frontmatter". In: Plant Genomics and Proteomics


Christopher A. Cullis - Plant Genomics and Proteomics-J. Wiley & Sons (2004)

PROTEIN CHARACTERIZATION AND COMPARISONS
Again, as with nucleic acid comparisons, a large number of tools are
available. One listing with short descriptions is at
http://www.bioinformatik.de/cgi-bin/browse/Catalog/Software/Online_Tools/.
TABLE 9.3. Clustering of EST Data from TIGR and PlantGDB

Source of data   Plant        Total ESTs  Assembled ESTs  EST Contigs  EST Singlets
PlantGDB         Zea mays        206,015         203,358       21,063        19,350
TIGR Gene Index  Zea mays        192,436         173,826       20,459        15,147
PlantGDB         Arabidopsis     178,538         178,464       19,874        29,282
TIGR Gene Index  Arabidopsis     232,136         216,159       22,485        15,977
PlantGDB         Wheat           415,818         415,642       29,933        77,623
TIGR Gene Index  Wheat           415,125         343,891       38,548        71,234

The definition of assembled ESTs differs between the two data sets. For the
TIGR Gene Index, the assembled number counts only those ESTs included in
contigs, whereas for PlantGDB it counts all ESTs used in the contigs and
singletons.
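
Because the two sources count assembled ESTs differently, the assembled
columns cannot be compared directly. The following is a minimal sketch,
not part of the original analysis, that puts both on the same footing by
computing the number of ESTs placed in contigs for each row. The names
TABLE_9_3 and ests_in_contigs are hypothetical, and the sketch assumes the
footnote's definitions hold exactly for every row.

```python
# Rows transcribed from Table 9.3:
# (source, plant, total ESTs, assembled ESTs, contigs, singlets)
TABLE_9_3 = [
    ("PlantGDB", "Zea mays", 206015, 203358, 21063, 19350),
    ("TIGR Gene Index", "Zea mays", 192436, 173826, 20459, 15147),
    ("PlantGDB", "Arabidopsis", 178538, 178464, 19874, 29282),
    ("TIGR Gene Index", "Arabidopsis", 232136, 216159, 22485, 15977),
    ("PlantGDB", "Wheat", 415818, 415642, 29933, 77623),
    ("TIGR Gene Index", "Wheat", 415125, 343891, 38548, 71234),
]


def ests_in_contigs(source: str, assembled: int, singlets: int) -> int:
    """Return the number of ESTs placed in contigs, per the table footnote."""
    if source == "PlantGDB":
        # Assumption from the footnote: PlantGDB's assembled count covers
        # ESTs in contigs plus singlets, so the singlets are subtracted.
        return assembled - singlets
    # The TIGR Gene Index's assembled count already covers contigs only.
    return assembled


for source, plant, total, assembled, contigs, singlets in TABLE_9_3:
    in_contigs = ests_in_contigs(source, assembled, singlets)
    print(f"{source:16s}{plant:12s}{in_contigs:>8,d} ESTs in contigs, "
          f"{in_contigs / contigs:.1f} per contig on average")
```

For the maize PlantGDB row, for example, this gives 203,358 - 19,350 =
184,008 ESTs in contigs, which can then be set against the 173,826 reported
by the TIGR Gene Index.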


Prosite is a database of protein families and domains
(http://us.expasy.org/prosite/) that also makes available an extensive
suite of proteomics tools. The database consists of biologically
significant sites, patterns, and profiles that help to identify reliably
which known protein family, if any, a new sequence belongs to. The
underlying premise is that the many different proteins can be grouped into
a limited number of families on the basis of similarities in their
sequences. Proteins or protein domains belonging to a particular family
generally share attributes important for the function of the protein
and/or for the maintenance of its three-dimensional structure. Analysis of
such shared domains can yield a protein signature that can be used to
assign a newly sequenced protein to a specific family of proteins and thus
to formulate hypotheses about its function. Prosite currently contains
patterns and profiles specific for more than 1000 protein families or
domains. Each of these signatures comes with documentation providing
background information on the structure and function of these proteins.
However, the programs, sequence motifs, and domains defined in Prosite and
other protein databases have been developed and trained primarily on fungal
and animal proteins. Many of these motifs and domains may differ
significantly in plants, so functional inferences may be substantially
improved by retraining the motif descriptions with plant sequences. As the
characterization of plant proteins continues, it is expected that novel
plant-specific sequence motifs will be discovered and will be useful in
predicting the function of unknown plant proteins.
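
To make the idea of a pattern-based signature concrete, the following is
a minimal sketch of how a Prosite-style pattern could be translated into a
regular expression and scanned against a sequence. It is not the Prosite
software itself. The function names prosite_to_regex and find_sites and the
test sequence are hypothetical, and only the basic pattern elements are
handled (residues, x, [..] sets, {..} exclusions, repeat counts, and the
< and > anchors). The pattern used is the N-glycosylation site pattern,
N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic Prosite-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        # Optional repeat count such as x(2) or x(2,4).
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":
            core = "."                           # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"    # any residue except these
        else:
            core = element                       # one residue or a [..] set
        parts.append(("^" if anchor_start else "") + core + repeat
                     + ("$" if anchor_end else ""))
    return "".join(parts)


def find_sites(pattern: str, sequence: str):
    """Return (1-based position, matched residues), overlaps included."""
    # The lookahead lets overlapping matches be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical sequence used only for illustration.
print(find_sites("N-{P}-[ST]-{P}.", "MKNGTAPNPSAANLSK"))
```

A short pattern such as this one matches many sequences by chance, which is
one reason a match is treated as a hypothesis about function rather than as
proof of it.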
In the same way that there are specialized nucleic acid databases, there
are also specialized protein sites. The PlantsP database
(http://plantsp.sdsc.edu/), for example, is dedicated to understanding
phosphorylation processes in plants, because protein phosphorylation and
dephosphorylation are fundamental to cellular regulation. The protein
kinase and protein phosphatase families in Arabidopsis contain more than
1300 members. The same site has information on the rice protein kinases,
where each protein has been assigned to a class, a group, and a family. The
assignments follow the PlantsP Kinase Classification (PPC), a bottom-up
systematic classification based on sequence comparisons that use the entire
sequence, so that sequences that share domains outside of the kinase
catalytic domain cluster together before sequences that have only the
catalytic domain in common. The assignments are based on BLAST searches
with an E value cutoff of 1e-30.
These assignments result in five groups:
• Clear (assignment is unambiguous)
• Strong (assignment is highly likely)
• Weak (assignment is to best group, but E values were >1e-50)
• Mixed (assignment to more than one group possible)
• No assignment, for proteins that do not match at an E value <1e-30
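
The two E value thresholds stated above can be sketched as a simple
tiering function. This is an illustration only, not the PlantsP
implementation: the name evalue_tier is hypothetical, and because the text
does not state how Clear is separated from Strong or how a Mixed assignment
is detected, the sketch leaves those distinctions out and works from the
best-group E value alone.

```python
CUTOFF = 1e-30       # a match must have an E value below this to count
WEAK_ABOVE = 1e-50   # best-group E values above this are labelled Weak


def evalue_tier(best_evalue: float) -> str:
    """Tier a protein by the E value of its best-matching group.

    Only the thresholds given in the text are applied. Separating Clear
    from Strong, and detecting Mixed, needs criteria not stated there.
    """
    if best_evalue >= CUTOFF:
        return "No assignment"
    if best_evalue > WEAK_ABOVE:
        return "Weak"
    return "Clear or Strong"


for evalue in (1e-10, 1e-40, 1e-80):
    print(f"{evalue:g} -> {evalue_tier(evalue)}")
```

Smaller E values indicate matches less likely to arise by chance, so the
tiers run from no usable match, through a marginal one, to a confident one.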
