"Frontmatter". In: Plant Genomics and Proteomics

Christopher A. Cullis - Plant Genomics and Proteomics-J. Wiley & Sons (2004)

INTERPRETATION OF DATA
The analysis of sequence data usually results in an extensive output that
must be scrutinized for its applicability to the question being asked. A
BLAST search of nucleotide data will generate many matches at differing E
values (the expected number of chance alignments), and a decision must be
made as to the relevance of each. In other words, what is the appropriate
cutoff point for considering a match to be significant?
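The cutoff decision can be made explicit in a script. The following is a minimal sketch, assuming the search was run with BLAST+ and saved as tabular output (`-outfmt 6`, whose default twelve columns are listed in the code). The function `filter_hits` and the example rows are hypothetical, and the 1e-5 cutoff is an arbitrary illustration, not a recommended threshold.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text: str, max_evalue: float) -> list[dict]:
    """Keep hits with E value <= max_evalue, sorted from most to least significant."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])


# Invented rows for illustration only; the identifiers are hypothetical.
example = (
    "query1\thit_A\t98.7\t76\t1\t0\t1\t228\t1\t76\t1e-30\t120\n"
    "query1\thit_B\t100.0\t39\t0\t0\t1\t117\t1\t39\t4e-12\t70.5\n"
    "query1\thit_C\t45.0\t40\t22\t0\t1\t120\t1\t40\t2.3\t28.1\n"
)

for hit in filter_hits(example, max_evalue=1e-5):
    print(hit["sseqid"], hit["evalue"])
# hit_A 1e-30
# hit_B 4e-12
```

A script like this only applies the threshold consistently. Choosing the threshold, and judging the hits that pass it, remains the analyst's decision.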
The probability of a match arising by chance depends on the length of the query sequence and the size of the target database. The E value grows in proportion to both, so the same alignment score is less significant in a larger database. If the aim is to find other plant sequences related to the one under consideration, then a search of that subset of the sequence databases would be most appropriate. If the aim is to find any related sequence, then a more inclusive set of sequence databases, or the whole nonredundant sequence database, would be most appropriate.
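The dependence on query and database size comes from the Karlin-Altschul statistics that underlie BLAST. In this framework the E value for an alignment with raw score S is E = K m n e^(-λS), where m and n are the query and database lengths and λ and K depend on the scoring system. The minimal Python sketch below assumes illustrative values of λ and K and uses hypothetical database sizes. It shows that the same score gives an E value 100 times larger in a database 100 times larger.

```python
import math


def expected_chance_hits(score: float, query_len: int, db_len: int,
                         lam: float = 0.267, k: float = 0.041) -> float:
    """Karlin-Altschul E value, E = K * m * n * exp(-lambda * S).

    lam and k are assumed, illustrative values; a real BLAST report prints
    the Lambda and K it used. Edge corrections to the effective lengths
    m and n are ignored in this sketch.
    """
    return k * query_len * db_len * math.exp(-lam * score)


query_len = 76            # e.g. one ubiquitin unit, in amino acids
score = 150               # the same raw alignment score in both searches
plant_subset = 5_000_000  # hypothetical database sizes, in residues
whole_db = 500_000_000

e_subset = expected_chance_hits(score, query_len, plant_subset)
e_whole = expected_chance_hits(score, query_len, whole_db)
print(f"plant subset: E = {e_subset:.1e}; whole database: E = {e_whole:.1e}")
print(f"a longer match (score 300) against the whole database: "
      f"E = {expected_chance_hits(300, query_len, whole_db):.1e}")
```

E scales linearly with database size but falls exponentially with score. Longer matches, which accumulate more score, therefore reach far smaller E values.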
NUCLEIC ACID VERSUS PROTEIN HOMOLOGY SEARCHES
FIGURE 9.4. BLAST 2 Sequences results (blastn version 2.2.5) and part of the blastx output for the same flax fragment, highlighting the matches found for polyubiquitin from Cucumis melo. One ubiquitin coding unit from the flax polyubiquitin gene was used for the searches. The same flax unit was aligned with BLAST against the C. melo polyubiquitin mRNA, partial cds (accession number AF436850). Results for nucleic acid and protein comparisons are shown.

Whether a search looks for nucleic acid or protein homology also affects which matches are found. The data in Figure 9.4
are for one of the repeat units from the flax polyubiquitin gene previously described (Agarwal and Cullis, 1991). Here the search was done with both blastn and blastx. The output shows that although many of the related plant ubiquitin genes are identical at only about 80% of their nucleotides, the proteins they encode are identical. The Cucumis melo polyubiquitin mRNA is only 79% identical to the flax fragment at the nucleotide level. It did not appear among the nucleic acid matches from the blastn analysis, but it was present in the blastx results. Here, too, the E value varies with the length of the match. A match with 75 of 76 amino acids identical (76 of 76 positive) gives an E value of 1e-35, whereas 39 of 39 identities gives an E value of 5e-15, and 28 of 29 identical (29 of 29 positive) gives an E value of 3e-7.
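A small example shows why the two programs disagree. blastx translates the nucleotide query (in all six reading frames) before comparing it with proteins. As a result, changes at synonymous codon positions lower nucleotide identity but leave the protein comparison untouched. The sketch below translates only the first forward frame of two hypothetical toy sequences. It is not how BLAST scores alignments; it only illustrates the identity arithmetic.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built from the usual
# TCAG ordering of codons; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of an in-frame coding sequence (no ambiguity codes)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two ungapped, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy coding sequences for the peptide MQIFVK (the start of
# ubiquitin); they differ only at third codon positions.
seq1 = "ATGCAGATCTTCGTGAAG"
seq2 = "ATGCAAATTTTTGTTAAA"

print(f"nucleotide identity: {percent_identity(seq1, seq2):.1f}%")
print(f"protein identity:    {percent_identity(translate(seq1), translate(seq2)):.1f}%")
# nucleotide identity: 72.2%
# protein identity:    100.0%
```

Five silent third-position changes in 18 bases are enough to drop nucleotide identity well below the level at which the flax and melon sequences were compared. The encoded peptide stays the same.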
Similarly, the output of gene prediction programs must be scrutinized carefully. A flax sequence (the same one used for the Miropeats analysis in Figure 9.2) was analyzed with the GeneSeqer program at www.PlantsGDB.org to predict possible transcripts from this region. The sequence came from a region of the genome that carries an insertion in some flax varieties. Three possible transcripts were identified, none of them very long (174, 86, and 170 bases, respectively). The 170-base transcript crossed the boundary between one end of the insertion sequence and the surrounding region. On inspection, however, only three of the matched bases fell within the coding sequence of the matched EST. The remaining matched nucleotides paired an A-rich region of the genomic sequence with the poly(A) tail of the EST, so the apparent EST support for this transcript rested almost entirely on a spurious match. Again, this shows that the conclusions of an informatic analysis should not be accepted as correct by themselves, and that conclusions should not be drawn on that basis alone.
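The inspection that exposed this artifact can be partly scripted as a screen to run before trusting a predicted transcript. The following is a minimal Python sketch. It assumes the aligned segments have already been extracted from the GeneSeqer output by some other means; the `MatchedSegment` structure, the 0.8 cutoff, and the toy sequences are all hypothetical. The sketch counts how many matched bases fall in the EST coding sequence and how many are A- or T-rich, the signature of a poly(A) tail aligned to an A-rich genomic stretch.

```python
from dataclasses import dataclass


@dataclass
class MatchedSegment:
    """One stretch of genomic sequence aligned to an EST (hypothetical structure)."""
    genomic_seq: str
    in_est_cds: bool  # True if the EST side of this match lies in the EST coding sequence


def at_richness(seq: str) -> float:
    """Largest fraction of A or of T in seq; poly(A) on either strand scores near 1."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return max(seq.count("A"), seq.count("T")) / len(seq)


def support_summary(segments: list[MatchedSegment], threshold: float = 0.8) -> dict:
    """Count matched bases that are coding, A/T-rich (likely poly(A)), or other."""
    summary = {"coding": 0, "a_rich": 0, "other": 0}
    for seg in segments:
        n = len(seg.genomic_seq)
        if seg.in_est_cds:
            summary["coding"] += n
        elif at_richness(seg.genomic_seq) >= threshold:
            summary["a_rich"] += n
        else:
            summary["other"] += n
    return summary


def looks_like_polya_artifact(summary: dict) -> bool:
    """Flag predictions whose EST support is mostly A/T-rich, non-coding sequence."""
    return summary["a_rich"] > summary["coding"] + summary["other"]


# Toy segments, loosely echoing the case described above (sequences invented).
segments = [
    MatchedSegment("ATG", in_est_cds=True),
    MatchedSegment("A" * 18 + "GC", in_est_cds=False),
]
summary = support_summary(segments)
print(summary, looks_like_polya_artifact(summary))
# {'coding': 3, 'a_rich': 20, 'other': 0} True
```

A screen like this only flags candidates for closer inspection. Deciding whether a predicted transcript is real still requires looking at the alignment itself.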
