"Frontmatter". In: Plant Genomics and Proteomics

Date	23.02.2023

Christopher A. Cullis - Plant Genomics and Proteomics-J. Wiley & Sons (2004)

THE MOST EFFICIENT APPROACH(ES)
Two competing strategies for the complete sequencing of large genomes
have been described, one in which physical maps are developed followed
by the selection of a minimal tiling path of clones to sequence, and the other
using a whole genome shotgun (WGS) approach. A test of the power of the
two methods was essentially carried out during the sequencing of the human
genome. The International Sequencing Consortium used the human FPC
map that had been developed by the International Mapping Consortium,
and the draft sequence published by Celera used the WGS approach.
However, the Celera sequence included draft sequence from the public
consortium and, as mentioned above, might not have been assembled
as well without the public, anchored data. Because these two methodologies
are not mutually exclusive, a combination of them would appear to be the
best approach for the future sequencing of large genomes. The BAC-by-BAC
approach yields sequences that are easier to assemble, because ambiguities
can be resolved and the location of each resulting sequence is known. The
advantage of the WGS approach is that it is more amenable to high-throughput
automation and also covers regions that cannot be cloned in BACs. The WGS
approach, especially if the aim is not to produce finished sequence, will be
much less expensive.
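
The selection of a minimal tiling path can be illustrated with a short
sketch. It assumes that each clone has already been placed on the physical
map as a (name, start, end) interval, and it uses a standard greedy interval
cover rather than any method described in this chapter. The function name,
the clone names, and the coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

# (name, start, end) of a clone on the physical map; hypothetical layout.
Clone = Tuple[str, int, int]


def minimal_tiling_path(
    clones: List[Clone], region_start: int, region_end: int
) -> Optional[List[str]]:
    """Greedy interval cover.

    At each step, among the clones that start at or before the position
    covered so far, pick the one that reaches farthest to the right.
    Returns the clone names in map order, or None if a gap prevents
    the region from being covered.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[str] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best: Optional[Clone] = None
        # Consider every clone that starts within the covered stretch.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            return None  # gap in the physical map
        path.append(best[0])
        covered = best[2]
    return path


if __name__ == "__main__":
    # Hypothetical BAC placements, in kilobases.
    bacs = [
        ("A", 0, 100),
        ("B", 50, 180),
        ("C", 90, 200),
        ("D", 150, 300),
        ("E", 190, 320),
    ]
    print(minimal_tiling_path(bacs, 0, 320))
```

In practice adjacent clones must share a minimum overlap so that their
sequences can be joined; that constraint is left out of the sketch for
brevity.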
LIKELY TARGETS FOR “COMPLETE” GENOME SEQUENCING
Sequencing strategies must be developed to account for the information
described in Chapter 2 related to the structure of the genome of the
particular plant under investigation. For example, in small genomes where
much of the genome is present in long stretches of genes with relatively few
repetitive sequences, such as Arabidopsis thaliana, the acquisition and
analysis of the sequence data will be less complicated than in a large
genome with a very high proportion of complex repeats and many related
copies of a specific gene. Shotgun sequence, acquired as random reads,
could be assembled in the former case, whereas in the latter case assembly
would be much more difficult. So the question is, what is the added value of
such endeavors, or will much of the sequencing of additional genomes be a
rediscovery or confirmation of rules gleaned from the Arabidopsis and rice
sequence data? Despite the reduction in sequencing costs, generating
enough sequence reads over large genomes is still an expensive proposition.
This cost is then compounded by the problem of assembly, which is still a
major concern in the case of complex genomes and may be even more
intractable for many of the very large polyploid plant genomes. When a
genome contains more than one closely related copy of a gene, how can the
different members be distinguished from one another and from sequencing
errors? If two copies of a gene are only minimally different, then
how do you distinguish them? If the level of similarity required to merge
sequences is set too low, multiple copies will be merged into a single gene,
whereas if it is too high, then sequencing errors will generate additional
phantom copies. These are some of the considerations that come into play
when trying to deal with the sequencing of complex plant genomes. Physical
mapping followed by sequencing of the overlapping BAC clones was the
strategy adopted for Arabidopsis thaliana. The sequence was then assembled
into the final map. Even in this relatively simple genome, there are still
runs of repetitive sequence that were not fully sequenced, although the
lengths of these regions are
known. However, adopting this strategy requires the whole physical map
and a detailed genetic map before the sequence can be assembled on the
scaffold. With larger genomes it may not even be possible to develop the
physical map, much less to obtain the whole genome sequence.
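
The effect of the similarity threshold on closely related gene copies can be
illustrated with a toy example. The sketch below is not an assembler: it
assumes short, equal-length, pre-aligned reads and a simple greedy clustering
on percent identity, and all sequences and function names are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("reads must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster(reads: List[str], threshold: float) -> List[List[str]]:
    """Greedy clustering on identity.

    Each read joins the first cluster whose representative (its first
    member) it matches at or above the threshold; otherwise it starts
    a new cluster.
    """
    clusters: List[List[str]] = []
    for read in reads:
        for members in clusters:
            if identity(read, members[0]) >= threshold:
                members.append(read)
                break
        else:
            clusters.append([read])
    return clusters


# Two hypothetical gene copies that differ at 2 of 20 positions (90% identity),
# plus a read of copy A carrying a single sequencing error (95% identity to A).
COPY_A = "ACGTACGTACGTACGTACGT"
COPY_B = "ACGTACCTACGTACGAACGT"
COPY_A_ERR = "ACGTACGTACTTACGTACGT"
READS = [COPY_A, COPY_A_ERR, COPY_B]

if __name__ == "__main__":
    for threshold in (0.85, 0.93, 1.00):
        print(threshold, len(cluster(READS, threshold)))
```

With these reads, a threshold of 0.85 merges both copies into one cluster, a
threshold of 0.93 separates the two copies while absorbing the error, and a
threshold of 1.00 splits the erroneous read off as a phantom third copy.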
Relatively few plants have huge sequence databases associated with
them, and even fewer have large tracts of contiguous sequence.
Comparisons across species can be very valuable, but the degree of
relatedness in the comparison affects the kinds of questions that can be
asked. In general, a rule of pairs has been developed to allow the
characterization of processes that have evolved within lineages. A wide
range of plants have been characterized genetically and physiologically,
and so all are potential subjects for detailed, extensive genome sequencing.
Sequencing projects already under way include those for the cabbage
Brassica oleracea, for two legumes, Lotus japonicus and Medicago truncatula,
as well as for maize. Discussions are also under way to develop sequencing
projects for soybean, tomato, barley, and banana.
The maize research community has organized and developed a plan for
the needs of that community. A genome sequence was their highest priority
(Bennetzen et al., 2001). In a sense this was similar to the way in which both
the Arabidopsis and rice sequencing projects were started. Maize has a wealth
of genetic data collected over the last century, and many important
agronomic traits have been mapped. Therefore, the information derived from a
genome sequence would be applicable to crop improvement as well as to a
basic understanding of how plants work. The initial efforts for maize are to
generate the sequence of the gene space, rather than a complete genome
sequence, mainly because of the cost of a complete maize sequence.
Medicago truncatula, a relative of alfalfa and also a legume, already
has an international effort to obtain a whole genome sequence
(http://medicago.toulouse.inra.fr/EU/documents/whitepapergensequ.pdf). This
species has prominence as a model legume, so an understanding of all the
genes should aid in the understanding of the control of the symbiotic
relationship between legumes and Rhizobia. The species also has a relatively
small genome that should make the task of assembly easier but still not
trivial. The first rounds of shotgun sequencing in this species have resulted
in the complete sequence of the chloroplast and the definition of repetitive
sequence classes. A BAC-anchored effort is also under way.
The poplar genome is the subject of a shotgun sequencing effort to be
carried out by the Joint Genome Institute
(http://genome.jgi-psf.org/poplar0/poplar0.home.html). This effort is to
generate a large number of reads, but the poplar community will have to
assemble the sequence as a separate effort.
The other species listed above are also likely only to have a gene space
sequencing effort initially because the genome is very large or the research
community is insufficient to support a whole genome sequencing effort.
Tomato is likely to become the reference species for the Solanaceae and
barley one of the grass reference species.
How many genes do plants have? If the actual size of the gene space is
relatively constant, then the need for gene enrichment rises dramatically as
the genome size increases. Thus wheat may have less than 5% of its genome
as genes, and sequencing the rest may not be particularly useful or
instructive. However, it is still a major and expensive undertaking to
develop the sequence resources for a particular species. As the databases
are populated with sequence data and the understanding of how genes are
organized and distributed improves, it may become possible to devise
improved strategies for generating genomic sequences efficiently. Until
then, for most species syntenic relationships, supplemented by some EST
sequence and perhaps the sequences of a few selected BACs, will have to
suffice. In many cases, sequence data for most species will be generated in
response to specific questions concerning the structure of particular genes
across the plant kingdom, where the sequences are generated with PCR and
primers in conserved regions of the genes under study.
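
The arithmetic behind the need for gene enrichment can be sketched as
follows. Both sets of numbers are assumptions made for illustration and are
not taken from this chapter: the constant gene space of 60 Mb is purely
hypothetical, and the genome sizes are rough, rounded estimates.

```python
def gene_fraction(gene_space_mb: float, genome_size_mb: float) -> float:
    """Fraction of the genome occupied by genes."""
    return gene_space_mb / genome_size_mb


def fold_enrichment_needed(gene_space_mb: float, genome_size_mb: float) -> float:
    """Fold enrichment required for a library to consist entirely of gene space."""
    return genome_size_mb / gene_space_mb


# Hypothetical constant gene space (assumption, not a measured value).
GENE_SPACE_MB = 60.0

# Rough, rounded genome sizes in Mb (assumptions for illustration only).
GENOME_SIZES_MB = {
    "Arabidopsis": 125.0,
    "rice": 430.0,
    "maize": 2500.0,
    "wheat": 16000.0,
}

if __name__ == "__main__":
    for species, size in GENOME_SIZES_MB.items():
        fraction = gene_fraction(GENE_SPACE_MB, size)
        fold = fold_enrichment_needed(GENE_SPACE_MB, size)
        print(f"{species}: {fraction:.1%} genes, {fold:.0f}-fold enrichment")
```

Under these assumptions the gene fraction of wheat falls well below the 5%
figure quoted above, and the enrichment required grows in direct proportion
to genome size.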
However, despite all the hurdles, the amount of sequence from higher
plants will continue to rise at an accelerating rate for the foreseeable future.
The first steps down this path were supported by funds from the National
Plant Genome Initiative (through the Plant Genome Research Program at the
National Science Foundation and the USDA). The work done with the
additional funding provided under this umbrella has fundamentally altered
the way plant research can and will be done.
