"Frontmatter". In: Plant Genomics and Proteomics

Christopher A. Cullis - Plant Genomics and Proteomics-J. Wiley & Sons (2004)

Fugu (pufferfish) | Map the query sequence. Determine the genomic structure. Identify novel genes. | Fugu rubripes Genome BLAST page
Zebrafish | Map the query sequence. Determine the genomic structure. Identify novel genes. | Zebrafish Genome BLAST page
Arabidopsis thaliana | Map the query sequence. Determine the genomic structure. Identify novel genes. | Arabidopsis thaliana BLAST page
Oryza sativa (rice) | Map the query sequence. Determine the genomic structure. Identify novel genes. | Oryza sativa BLAST page
Anopheles gambiae (mosquito) | Map the query sequence. Determine the genomic structure. Identify novel genes. | Anopheles gambiae BLAST page
Other eukaryotes (Plasmodium, Amoeba, fungi, etc.) | Map the query sequence. Determine the genomic structure. Identify novel genes. | Other Eukaryotes BLAST page
Microbial Genomes | Map the query sequence or identify novel genes | Microbial Genome BLAST
Nucleotide
UniVec | Screen for vector contamination | VecScreen
Trace Archives | Find matches to unassembled, raw sequence data | Trace MEGABLAST Search
From http://www.ncbi.nlm.nih.gov/BLAST/producttable.html

activity is the annotation of the features of the genome. However, because
the task is not charged to a single individual or group, how does the genome
annotation get accomplished and how do the various features identified by
different individuals get integrated? This can be done in a number of ways.
One way is the Distributed Annotation System (DAS). This is a client-server
system in which a single machine (the client) gathers genome annotation
information from multiple distant websites (the reference and annotation
servers), collates that information, and displays it to the user in a single view.
Little coordination is needed among the various information providers.
Further information regarding DAS can be gained at http://biodas.org. A
second method involves a group or small sets of groups who generate the
annotation and then release the data at regular intervals. If there are
multiple annotations of the same region these can be integrated or displayed
independently, as is the case for the Human Genome browser
(http://genome.ucsc.edu/cgi-bin/hgGateway?org=human). One of the particularly
useful functions is the ability to add a personal track to those
already established to incorporate individually important features. Finally,
for specific gene families, an annotation jamboree could be an appropriate
vehicle for annotation. This is where experts in a gene family get together to
annotate as many as possible of the representatives of their particular gene
family. Clearly, these approaches are not mutually exclusive. The distributed
annotation allows experts in various areas to contribute remotely.

FIGURE 9.2. Miropeats analysis of a genomic region from flax showing inverted
repeats, tandem oligo repeats, and a palindrome. (Key to graphics: the sequence,
inverted repeat, tandem repeat, tandem oligo repeat, palindrome; scale 1 to 9 kb.)
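
The collation step that a DAS client performs can be illustrated with a small sketch. It is not the DAS protocol itself and makes no network requests: the three "servers" are hypothetical in-memory lists, and the names Feature, collate, and the server labels are assumptions introduced only for this illustration. The sketch shows how features from independent providers can be merged into one position-sorted view of a region.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Feature:
    source: str  # which (hypothetical) annotation server supplied the feature
    start: int   # 1-based start on the reference sequence
    end: int     # inclusive end
    label: str


def collate(servers: Dict[str, List[dict]], region_start: int, region_end: int) -> List[Feature]:
    """Merge features overlapping a region from every server into one sorted view."""
    view = []
    for name, records in servers.items():
        for rec in records:
            # Keep any feature that overlaps the requested region.
            if rec["start"] <= region_end and rec["end"] >= region_start:
                view.append(Feature(name, rec["start"], rec["end"], rec["label"]))
    return sorted(view, key=lambda f: (f.start, f.end, f.source))


# Hypothetical annotation servers, stood in for by in-memory lists.
servers = {
    "gene_models": [{"start": 100, "end": 900, "label": "predicted gene"}],
    "repeats": [
        {"start": 50, "end": 120, "label": "tandem repeat"},
        {"start": 5000, "end": 5100, "label": "inverted repeat"},
    ],
    "est_matches": [{"start": 150, "end": 400, "label": "EST match"}],
}

view = collate(servers, 1, 1000)
for f in view:
    print(f"{f.start:>5}-{f.end:<5} {f.source:<12} {f.label}")
```

Because each provider is consulted independently and only the client merges the results, the providers need not coordinate with one another, which is the property the text emphasizes.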
The programs that can be used to annotate genomes, and the ways the
results are presented, are available at various sites listed in the references for
this chapter. Among the most common software tools used are BLAST,
FASTA, HMM profiling and motif finding, with Prosite and other
pattern/motif combinatorial search tools applied to the various databases
listed in Table 9.2. Used alone or in various combinations, they permit the
identification and subsequent description of homologous sequences according
to various criteria such as sequence, structural, and functional properties.
As the available data set that can be searched increases, more identified
genes will be available for training the gene finding programs, thereby
making their predictions more accurate. The predicted genes can be searched
for in the EST databases as well as experimentally by RT-PCR with primers
designed from the annotated genomic sequences. In particular, the use of
primers across predicted splice sites would be a direct test of the accuracy
of the informatic analyses.

FIGURE 9.3. Miropeats analysis of the Arabidopsis thaliana BAC clone F16P2 at a
threshold of 150. The organization of genes on BAC F16P2 showing the 3 tandem
gene duplications. The glutathione-S-transferase and tropinone reductase genes are
labeled G and T, respectively. A smaller duplication of pumilio-like protein (P) is also
present. Figure 1.4 converted to a single linear read is given below the Miropeats
pattern.
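
Pattern-based motif searching of the Prosite kind can be sketched by translating a pattern into a regular expression. The function name prosite_to_regex and the test protein are assumptions made for this illustration, and only a subset of the pattern syntax is handled (single residues, x, [..], {..}, repeat counts, and the < and > anchors). The example pattern, N-{P}-[ST]-{P}, is the Prosite N-glycosylation site pattern.

```python
import re

# One pattern element: optional '<', a residue / x / [set] / {excluded set},
# an optional repeat count such as (3) or (2,4), and an optional '>'.
ELEMENT = re.compile(r"(<?)([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a (subset of) Prosite pattern syntax into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_anchor, core, low, high, c_anchor = m.groups()
        if core == "x":
            core = "."                      # x matches any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} lists excluded residues
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_anchor else "") + core + repeat + ("$" if c_anchor else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
protein = "MKNGSAANPTL"  # made-up sequence for illustration only
hits = [(m.start(), m.group()) for m in re.finditer(regex, protein)]
print(regex, hits)
```

Note that re.finditer reports non-overlapping matches only; a production motif scanner would also need to report overlapping occurrences.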
EXPRESSION DATA
At the highest (most complete) level the genome of an organism should be
annotated completely, with all the possible features included in this annotation.
Because this cannot be done directly at the outset, it will be built up by
using other information to identify interesting and important regions of
the genome. Some of these classes of additional data would include the
expression data derived from EST sequences, SAGE, and MPSS™ and from
comparative genomics.
EST CLUSTERING
As described above, the clustering of ESTs is an important function in
understanding the expressed portions of the genomes. One of the sites where this
is done is www.tigr.org, where the gene indices are available for 18 different
higher plants (Arabidopsis, barley, cotton, grape, ice plant, lettuce, Lotus,
maize, Medicago truncatula, Pinus, potato, rice, rye, Sorghum bicolor, soybean,
sunflower, tomato, and wheat). The gene indices at TIGR are the results of
the clustering of transcripts into tentative consensus (TC) sequences. These
TCs are built with a variety of programs including:
• Megablast (Zhang et al., 2000)
• CAP3
• Paracel TranscriptAssembler™
• DNA-Protein Search program (dps) developed by Dr. Xiaoqiu Huang
An alternative clustering of plant ESTs is available from PlantGDB
(http://www.plantgdb.org/). Here the clustering is done with a clustering
tool, PaCE, resulting in tentative unigenes (TUGs). The overall data are
similar, as can be seen from Table 9.3, although the dates of the relevant
assemblies are different, which would explain some of the differences in the
numbers in the various classes.
What is clear from both of these sites is that there is still a large number
of singletons in addition to the assembled TCs/TUGs. This is especially true
for the wheat data, with >70,000 singletons in addition to the 29,000+
TCs/TUGs.
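
The distinction between clustered ESTs and singletons can be illustrated with a toy sketch. This is not the procedure used by TIGR or by PaCE: it simply places two ESTs in the same cluster when they share an exact k-mer (single-linkage clustering with a union-find structure), and it ignores reverse complements and sequencing errors. The function name cluster_ests, the choice of k, and the sequences are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_ests(ests: Dict[str, str], k: int = 12) -> List[List[str]]:
    """Group ESTs that share at least one exact k-mer (single linkage)."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen: Dict[str, str] = {}  # k-mer -> first EST in which it was seen
    for name, seq in ests.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_seen:
                union(first_seen[kmer], name)
            else:
                first_seen[kmer] = name

    groups = defaultdict(list)
    for name in ests:
        groups[find(name)].append(name)
    # Largest clusters first; members sorted for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Made-up sequences: est1 and est2 overlap by 15 bases, est3 is unrelated.
transcript = "ATGGCTAGCTAGGATCCGATTACAGGCTTAACGGTCATGCAAGTCC"
ests = {
    "est1": transcript[0:30],
    "est2": transcript[15:45],
    "est3": "TTTTTGGGGGCCCCCAAAAATTTTTGGGGG",
}

clusters = cluster_ests(ests)
singletons = [c[0] for c in clusters if len(c) == 1]
print(clusters, singletons)
```

An EST that shares no k-mer with any other sequence remains in a cluster of its own, which corresponds to a singleton in the sense used above.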
HarvEST (http://harvest.ucr.edu/) is another EST database viewer; it
emphasizes gene function and is oriented ultimately to comparative
genomics. This software is downloadable from the website. The EST
sequences in HarvEST have also been assembled with CAP3. The fully
enabled versions of HarvEST allow the user to examine the actual CAP3
sequence alignment and so determine whether and where individual
sequences deviate from a consensus sequence.
Thus HarvEST and the TIGR gene indices are similarly assembled
with CAP3, whereas the PlantGDB data are assembled with different
software (PaCE).
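
The kind of check that an alignment view makes possible, finding where individual ESTs deviate from the consensus, can be sketched as follows. This does not read CAP3 or HarvEST output: the alignment is a hypothetical dictionary of equal-length, gap-padded strings, the consensus is a simple per-column majority vote, and the name consensus_and_deviations is an assumption made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consensus_and_deviations(
    alignment: Dict[str, str],
) -> Tuple[str, Dict[str, List[Tuple[int, str, str]]]]:
    """Return the majority consensus and, per sequence, (column, base, consensus base)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()

    consensus_chars = []
    for col in range(length):
        # '-' marks columns a sequence does not cover; it does not vote.
        counts = Counter(seq[col] for seq in alignment.values() if seq[col] != "-")
        consensus_chars.append(counts.most_common(1)[0][0] if counts else "-")
    consensus = "".join(consensus_chars)

    deviations = {
        name: [
            (col, seq[col], consensus[col])
            for col in range(length)
            if seq[col] != "-" and seq[col] != consensus[col]
        ]
        for name, seq in alignment.items()
    }
    return consensus, deviations


# Made-up alignment: est_c differs at column 3 and stops two bases early.
alignment = {
    "est_a": "ATGGCTAGCT",
    "est_b": "ATGGCTAGCT",
    "est_c": "ATGACTAG--",
}

consensus, deviations = consensus_and_deviations(alignment)
print(consensus)
print(deviations)
```

Columns are reported with 0-based indices. A deviation seen in a single EST may reflect a sequencing error, whereas one shared by several ESTs may point to a polymorphism or a paralogous gene; the sketch only reports positions and leaves that interpretation to the user.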
FINDING GENES IN GENOMIC SEQUENCES
The GeneSeqer web service (at the PlantGDB website) is intended primarily
for performing the spliced alignment of query sequences
(sequences representing transcribed genes, i.e., ESTs, cDNAs, and proteins)
with a target sequence (genomic DNA). The input sequences can be
chosen on the basis of similarity to other sequences (both genomic and
transcribed), or the user may already possess an uncharacterized sequence
that needs further characterization. An exhaustive alignment of “All Plant” ESTs and
cDNAs is possible, or a more efficient approach using the Tentative Unique
Gene clusters (TUGs) assembled with the PlantGDB contiging method can
be made.
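
The idea of a spliced alignment can be illustrated with a deliberately simple sketch. It is not the GeneSeqer algorithm: it places a cDNA on a genomic sequence by greedy exact matching, taking the longest exact match for the remaining cDNA at each step, whereas real spliced aligners tolerate mismatches and score candidate splice sites. The function names, the min_exon parameter, and the sequences are assumptions made for illustration.

```python
from typing import List, Tuple

# (genomic_start, genomic_end, cdna_start, cdna_end), 0-based, end-exclusive
Exon = Tuple[int, int, int, int]


def greedy_spliced_align(cdna: str, genomic: str, min_exon: int = 8) -> List[Exon]:
    """Place a cDNA on genomic DNA as a chain of exact-match exons."""
    exons: List[Exon] = []
    q = 0  # next unaligned cDNA position
    g = 0  # leftmost genomic position still available
    while q < len(cdna):
        best_start, best_len = -1, 0
        for start in range(g, len(genomic)):
            # Extend an exact match between cdna[q:] and genomic[start:].
            length = 0
            while (q + length < len(cdna) and start + length < len(genomic)
                   and cdna[q + length] == genomic[start + length]):
                length += 1
            if length >= min_exon and length > best_len:
                best_start, best_len = start, length
        if best_len == 0:
            raise ValueError(f"no exact match of >= {min_exon} bases at cDNA position {q}")
        exons.append((best_start, best_start + best_len, q, q + best_len))
        q += best_len
        g = best_start + best_len
    return exons


def introns(exons: List[Exon], genomic: str) -> List[str]:
    """Genomic sequence lying between consecutive exons."""
    return [genomic[a[1]:b[0]] for a, b in zip(exons, exons[1:])]


# Made-up gene: two exons separated by an intron with GT...AG boundaries.
exon1, intron, exon2 = "ATGGCCAAGCTT", "GTAAGTCCCCCCTTTTAG", "CATCGATCGGAA"
genomic = exon1 + intron + exon2
cdna = exon1 + exon2

exons = greedy_spliced_align(cdna, genomic)
print(exons, introns(exons, genomic))
```

Because matching is exact and greedy, an exon boundary can shift by a base or two when the first bases of an intron happen to match the start of the next exon; this ambiguity is one reason practical tools weigh the GT...AG splice-site signals when choosing boundaries.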
