"Frontmatter". In: Plant Genomics and Proteomics
Christopher A. Cullis - Plant Genomics and Proteomics-J. Wiley & Sons (2004)
INTERPRETATION OF DATA

The analysis of sequence data usually results in an extensive output that must be scrutinized for its applicability to the question being asked. A BLAST search of nucleotide data will generate many matches at differing E values (the expected number of chance alignments), and a decision must be made as to the relevance of each. In other words, what is the appropriate cutoff point for considering a match to be significant? The probability of the match being made by chance depends on the size of the query sequence and the size of the target database. If the aim is to find other plant sequences that are related to the one under consideration, then a search of that subset of the sequence databases would be most appropriate. If the aim is to find any related sequence, then a more inclusive set of sequence databases, or the whole nonredundant sequence database, would be most appropriate.

NUCLEIC ACID VERSUS PROTEIN HOMOLOGY SEARCHES

The difference in the search for nucleic acid homology versus protein homology also impacts the type of comparison being done.

FIGURE 9.4. Blast 2 sequences results version blastn 2.2.5 and part of the output from blastx of the same flax fragment, highlighting the matches found for polyubiquitin from Cucumis melo. One ubiquitin coding unit from the flax polyubiquitin gene was used for the searches. The same flax unit was used with Blast align against the C. melo polyubiquitin mRNA, partial cds, accession number AF436850. Results for nucleic acid and protein comparisons are shown.

The data in Figure 9.4 are for one of the repeat units from the flax polyubiquitin gene previously described (Agarwal and Cullis, 1991). Here the search was done with both blastn and blastx. The output shows that although many of the related plant ubiquitin genes are identical at only 80% of the nucleotides, the proteins are identical.
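The gap between nucleotide-level and protein-level similarity can be illustrated with a small Python sketch. The two sequences below are invented for illustration (they are not the flax or C. melo data), and the helper names `translate` and `percent_identity` are hypothetical. Because the two toy sequences differ only at synonymous third-codon positions, they score lower at the nucleotide level while translating to the same peptide.

```python
# Toy illustration with invented codon choices (NOT the flax or melon data):
# synonymous codon changes lower nucleotide identity without changing the protein.

# Partial standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "ATG": "M", "CAA": "Q", "CAG": "Q", "ATT": "I", "ATC": "I",
    "TTT": "F", "TTC": "F", "GTT": "V", "GTC": "V", "AAA": "K",
    "AAG": "K", "ACT": "T", "ACC": "T", "CTT": "L", "GGT": "G",
}


def translate(dna):
    """Translate an in-frame DNA string using the partial codon table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length, ungapped strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Two hypothetical coding sequences that differ at six third-codon positions.
codons_a = ["ATG", "CAA", "ATT", "TTT", "GTT", "AAA", "ACT", "CTT", "ACT", "GGT"]
codons_b = ["ATG", "CAG", "ATC", "TTC", "GTC", "AAG", "ACC", "CTT", "ACT", "GGT"]
seq_a = "".join(codons_a)
seq_b = "".join(codons_b)

nucleotide_identity = percent_identity(seq_a, seq_b)
protein_identity = percent_identity(translate(seq_a), translate(seq_b))

print(f"nucleotide identity: {nucleotide_identity:.1f}%")
print(f"protein identity:    {protein_identity:.1f}%")
```

This is the reason a translated search such as blastx can recover a related gene that a nucleotide search such as blastn does not list: the comparison is made after translation, where synonymous differences disappear.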
The Cucumis melo polyubiquitin mRNA is only 79% identical to the flax fragment and was not represented in the list of nucleic acid matches resulting from the blastn analysis, whereas it was present in the blastx results. Here the E value again varies with the length of the homologous region, so that a match with 75 of 76 amino acids identical (but 76 out of 76 positive) gives an E value of 1e-35, whereas 39 out of 39 identities gives an E value of 5e-15 and 28 out of 29 identical (but 29 out of 29 positive) gives an E value of 3e-7.

Similarly, the data from gene prediction programs must be scrutinized carefully. The data from a flax sequence (the same one used for the Miropeats analysis in Figure 9.2) were used with the GeneSeqer program at www.PlantsGDB.org for the prediction of possible transcripts from this region. This sequence was from a region of the genome that has an insertion in some flax varieties. Three possible transcripts were identified, none of which was very long (174 bases, 86 bases, and 170 bases, respectively). The possible transcript of 170 bp crossed the boundary between one end of the insertion sequence and the surrounding region. However, on inspection, only three of the matched bases were included in the coding sequence of the matched EST, whereas the remaining corresponding nucleotides were between an A-rich region in the genomic sequence and the polyA tail of the EST. Again, this reinforces that the results of informatic analysis should not, purely by themselves, be taken as correct, and that conclusions should not be drawn on this basis alone.
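The dependence of E values on query length, database size, and alignment length discussed earlier in this section can be made concrete with the standard relation between a bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n is the total database length. The following Python sketch is a simplification under stated assumptions: the function name `expected_chance_hits` and all numeric inputs are hypothetical, and BLAST itself uses effective (edge-corrected) lengths, so these figures will not reproduce the values reported in Figure 9.4.

```python
# Minimal sketch of E = m * n * 2**(-S'), using raw (not effective) lengths.
# All inputs below are hypothetical and chosen only to show the trend.


def expected_chance_hits(bit_score, query_len, db_len):
    """Expected number of chance alignments scoring at least `bit_score` bits."""
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 76  # hypothetical query length in residues

# A larger database raises E for the same score; a higher bit score
# (for example, from a longer high-quality alignment) lowers it.
for db_len in (1e7, 1e9):
    for bit_score in (40.0, 80.0):
        e_value = expected_chance_hits(bit_score, QUERY_LEN, db_len)
        print(f"db_len={db_len:.0e}  bits={bit_score:5.1f}  E={e_value:.1e}")
```

Under this relation, restricting the search to a plant-only subset of the database reduces n and therefore lowers the E value assigned to the same alignment, which is one reason the choice of target database affects where a sensible significance cutoff lies.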