"Frontmatter". In: Plant Genomics and Proteomics

Christopher A. Cullis, Plant Genomics and Proteomics, J. Wiley & Sons (2004). Chapter 3: Sequencing Strategies.

FRACTIONATING THE GENOME
At present, the most important genomic information to acquire is the composition and number of the actual genes in a particular plant. The remainder of the genome is of less immediate importance because its contribution to the final phenotype is not thought to be substantial, and is certainly smaller than that of the genes themselves. Therefore, rather than trying to generate a complete genome sequence, researchers have devised strategies to enrich for the regions that contain genes. These strategies usually rely on a demonstrated difference between the gene space, that is, the regions that contain genes, and the rest of the genome. In plants with very large genomes, the gene space is probably arranged in islands of gene-rich sequence separated by stretches of the genome that contain few, but more than zero, genes (Panstruga et al., 1998). If such islands really exist and can be identified, then a large fraction of the genes could be isolated and sequenced apart from the rest of the "uninteresting" sequences. Characteristics that differentiate the genes from the rest of the nuclear DNA include the degree of methylation (Bird, 1986, 1992; Gruenbaum et al., 1981; Martienssen, 1999), the degree of repetition of the sequence within the genome, whether or not it is transcribed or contains an open reading frame, and, for maize at least, whether or not it is a target for transposable elements.
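Whether a fragment contains an open reading frame is one of the characteristics listed above. A minimal Python sketch of how a sequenced fragment might be screened for this follows, assuming the standard genetic code's stop codons (TAA, TAG, TGA) and scanning only the three forward reading frames. The function name `longest_orf` and the example fragment are hypothetical, and this is not a method described in the cited studies.

```python
# Minimal sketch: report the longest ATG-initiated open reading frame in the
# three forward frames of a DNA sequence. Reverse-strand frames are ignored.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ORF (ATG through its stop codon, inclusive) found in
    the three forward frames, or "" if no complete ORF is present."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG in this frame
    return best


if __name__ == "__main__":
    fragment = "CCATGAAACCCGGGTTTTAGCCATGTGA"  # hypothetical fragment
    print(longest_orf(fragment))
```

A practical screen would also scan the reverse complement and apply a minimum ORF length; because plant genes contain introns, a genomic fragment may also carry only part of a coding region.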
METHODS OF FRACTIONATING THE GENOME
EXPRESSED SEQUENCE TAGS (ESTs)
The last few years have seen enormous growth in the number of ESTs in the databases for some of the major crop plants (Figure 3.3). These ESTs are clearly one source of sequence for the genes. Obviously, any genes whose expression is either very low or restricted to tissues that were not sampled in generating the ESTs will be missed. Examples of this shortfall are the EST collections for human (3,500,000 ESTs), C. elegans (150,000), and Arabidopsis (135,000), in which only 35–65% of the genes predicted by genome sequencing were found. In addition, members of multigene families that do not differ in the sequenced region will be missed, because they cannot be told apart. Many of the ESTs have been placed on genetic maps, but many have not, because no polymorphisms could be found within the germplasm studied. For some plants, such as wheat, existing chromosome deletion lines can be used to localize these nonpolymorphic ESTs to a region of a chromosome. The development of maize/oat addition lines and radiation hybrids of these lines may serve the same purpose in maize (Kynast et al., 2001). However, this type of genetic resource is unlikely to be developed for many other plant species.
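Why weakly expressed genes are missed can be made concrete with a simple sampling model. The sketch below assumes each EST is an independent random draw from the mRNA pool of the sampled tissues, so a transcript making up a fraction p of that pool appears at least once among N ESTs with probability 1 − (1 − p)^N. The function names and the transcript abundances are hypothetical; only the collection size of 135,000 comes from the text.

```python
import math


def prob_detected(abundance: float, n_ests: int) -> float:
    """Probability that a transcript forming `abundance` (a fraction) of the
    sampled mRNA pool is hit by at least one of `n_ests` randomly drawn ESTs."""
    if not 0.0 <= abundance <= 1.0 or n_ests < 0:
        raise ValueError("abundance must lie in [0, 1] and n_ests must be >= 0")
    return 1.0 - (1.0 - abundance) ** n_ests


def ests_needed(abundance: float, confidence: float = 0.95) -> int:
    """Smallest number of random ESTs that samples the transcript at least
    once with probability >= `confidence`."""
    if not 0.0 < abundance < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("abundance and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - abundance))


if __name__ == "__main__":
    n = 135_000  # size of the Arabidopsis EST collection cited in the text
    for abundance in (1e-4, 1e-5, 1e-6):  # hypothetical transcript abundances
        print(f"abundance {abundance:g}: P(detected) = {prob_detected(abundance, n):.2f}")
    # A transcript absent from every sampled tissue has abundance 0 and is
    # never detected, however many ESTs are sequenced.
    print(prob_detected(0.0, n))
```

Under these assumptions, 135,000 ESTs detect a transcript present at one part in 100,000 with a probability of roughly 0.74, but one at one part in a million with a probability of only about 0.13.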
REASSOCIATION KINETICS
Up until the 1980s many genomes could be characterized only by reassociation kinetics, but this type of analysis went out of fashion with the arrival of easier, quicker molecular methods. These experiments physically separated the various classes of sequences according to how often they occur in the genome, by separating single- and double-stranded molecules after various incubation times. The more frequently a sequence was present in the genome, the more rapidly it re-formed a duplex.
A parameter designated Cot (for concentration times time) could therefore be defined and used to choose conditions under which particular classes of repetitive sequences are either eliminated from the reaction or physically isolated. A reassociation experiment is carried out by shearing nuclear DNA into small fragments (200–500 bp) by high-speed blending and checking the fragment size by gel electrophoresis. The sheared DNA is precipitated, redissolved in the appropriate buffer, denatured, and allowed to reanneal at the appropriate temperature for various lengths of time. The single- and double-stranded fractions are physically separated on a hydroxyapatite column, and the amount of the total starting DNA in each fraction is determined. The single-stranded fraction can be incubated again and the newly reassociated strands isolated in turn. This results in the physical isolation of the part of the genome whose sequences are present within a particular range of copy numbers.
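The statement that more frequent sequences re-form duplexes faster can be made quantitative with the idealized second-order rate law commonly used for reassociation, C/C0 = 1/(1 + k·C0t), where Cot is DNA concentration (moles of nucleotide per litre) multiplied by time (seconds). The sketch below assumes ideal kinetics and perfectly matched copies; the single-copy half-reassociation value `COT_HALF_SINGLE` and the copy numbers are hypothetical, and real curves are broadened by fragment length, base composition, and mismatches.

```python
def fraction_single_stranded(cot: float, copies: int, cot_half_single: float) -> float:
    """Ideal second-order reassociation of one sequence class.

    cot             -- Cot of the total DNA (mol nucleotide L^-1 x seconds)
    copies          -- copy number of the sequence class in the genome
    cot_half_single -- Cot at which single-copy sequences are half reassociated

    A class present in `copies` copies is `copies` times more concentrated than
    a single-copy sequence, so its half-reassociation Cot is cot_half_single / copies:
        C / C0 = 1 / (1 + copies * Cot / cot_half_single)
    """
    if cot < 0 or copies < 1 or cot_half_single <= 0:
        raise ValueError("need cot >= 0, copies >= 1 and cot_half_single > 0")
    return 1.0 / (1.0 + copies * cot / cot_half_single)


if __name__ == "__main__":
    COT_HALF_SINGLE = 1000.0  # hypothetical value for a large plant genome
    classes = {"low copy": 1, "intermediately repeated": 100, "highly repeated": 10_000}
    for cot in (0.01, 1.0, 100.0, 10_000.0):
        row = ", ".join(
            f"{name} {fraction_single_stranded(cot, n, COT_HALF_SINGLE):.2f}"
            for name, n in classes.items()
        )
        print(f"Cot {cot:g}: {row}")
```

In this model, at a Cot of 1 the highly repeated class is already about 91% double-stranded while the low-copy class is essentially untouched, which is why stopping an incubation at a chosen Cot separates the classes.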
FIGURE 3.3. Increase in ESTs in dbEST. The different colors are for various species (Arabidopsis, wheat, maize, rice, and barley). 1 = number of ESTs in 1998; 2 = number of ESTs in 2000; 3 = number of ESTs in 2003. [Plot: number of ESTs, on a scale up to 400,000, for each species in each year.]

The resulting Cot curve, such as that shown in Figure 3.4, can be used to choose the parameters needed to isolate a particular fraction of the genome. For example, by choosing the appropriate annealing times and concentrations for the first and second incubations, sequences present in the genome at between 1 and 10 copies could be isolated. Most of the low-copy-number sequences are expected to be genes, but the proportion of all the genes that falls in this fraction has yet to be determined. Because repetitive sequences are interspersed with low-copy sequences, the fragments used in reassociation studies must be fairly short (500 bp or less for most complex genomes); otherwise, a low-copy sequence joined to a repeat on the same fragment would reassociate along with the repetitive fraction. A second consideration affecting the proportion of genes in any particular high-Cot fraction (the higher the Cot value, the lower the copy number of the sequences in the genome) is the stringency at which the reassociation is performed. In general, to achieve a reasonable rate of reassociation, the stringency is set at about $T_m - 25\,^{\circ}\mathrm{C}$, which allows about 25% of the nucleotides in any duplex to be mismatched.
Therefore, even relatively distantly related sequences will appear to be present in multiple copies because they cross-react under these conditions, and so they may be missing from the high-Cot fractions.
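The link in the text between incubation temperature and mismatch tolerance (Tm − 25 °C allowing about 25% mismatch) corresponds to the common rule of thumb that each 1% of mismatched base pairs lowers the Tm of a duplex by roughly 1 °C (values of about 1 to 1.5 °C are usually quoted). The sketch below simply encodes that linear assumption; the Tm of 85 °C, the pairwise identity of 80%, and the function names are hypothetical.

```python
def max_mismatch_percent(tm_c: float, incubation_c: float, deg_per_percent: float = 1.0) -> float:
    """Largest % of mismatched base pairs a duplex can carry and still be stable
    at `incubation_c`, assuming Tm falls linearly by `deg_per_percent` degrees C
    per 1% mismatch (rule-of-thumb assumption)."""
    if deg_per_percent <= 0:
        raise ValueError("deg_per_percent must be positive")
    offset = tm_c - incubation_c
    if offset <= 0:
        return 0.0  # at or above Tm of a perfect duplex: no mismatch tolerated
    return min(100.0, offset / deg_per_percent)


def cross_reacts(identity_percent: float, tm_c: float, incubation_c: float) -> bool:
    """True if two sequences sharing `identity_percent` identity are expected to
    form a stable heteroduplex, i.e. to be scored as copies of one another."""
    return 100.0 - identity_percent <= max_mismatch_percent(tm_c, incubation_c)


if __name__ == "__main__":
    TM = 85.0  # hypothetical Tm of a perfectly matched duplex, degrees C
    for offset in (25.0, 10.0):
        incubation = TM - offset
        print(
            f"Tm - {offset:g} C: tolerates ~{max_mismatch_percent(TM, incubation):g}% mismatch;",
            "80% identical pair cross-reacts:", cross_reacts(80.0, TM, incubation),
        )
```

Under this assumption, an 80%-identical pair is scored as one repeated sequence at Tm − 25 °C but as two distinct sequences at Tm − 10 °C, which is how diverged members of a gene family can drop out of the high-Cot fraction.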
This is another example of a technique that had fallen into disuse but can be applied in a new context to provide vital information. Cot fractionation is a strategy for fractionating the genome that should be relatively unbiased, and it may identify genes that would not be uncovered by any approach short of whole-genome sequencing (Peterson et al., 2002; Yuan et al., 2002).
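The two-incubation scheme described earlier for isolating sequences present at between 1 and 10 copies can be sketched by reusing the ideal second-order law. This is a minimal model under stated assumptions: ideal kinetics, perfectly matched copies, and a second incubation treated as a continuation of the first, with its cumulative Cot expressed relative to the original total DNA. `COT_HALF_SINGLE`, `COT1`, and `COT2` are hypothetical values chosen for illustration, not published conditions.

```python
def fraction_single_stranded(cot: float, copies: int, cot_half_single: float) -> float:
    """Ideal second-order law: fraction of a copy-number class still single-stranded."""
    return 1.0 / (1.0 + copies * cot / cot_half_single)


def recovered_in_window(copies: int, cot1: float, cot2: float, cot_half_single: float) -> float:
    """Fraction of a class that is still single-stranded after the first
    incubation (Cot1) but has reassociated by the cumulative Cot2 of the
    second incubation, i.e. the duplex fraction collected in step two.

    Simplifying assumption: the second incubation continues the ideal
    reaction, with Cot2 expressed relative to the original total DNA.
    """
    if not 0.0 <= cot1 < cot2:
        raise ValueError("need 0 <= cot1 < cot2")
    return (fraction_single_stranded(cot1, copies, cot_half_single)
            - fraction_single_stranded(cot2, copies, cot_half_single))


if __name__ == "__main__":
    COT_HALF_SINGLE = 1000.0     # hypothetical single-copy Cot1/2
    COT1, COT2 = 10.0, 10_000.0  # hypothetical first and cumulative second Cot
    for copies in (1, 10, 100, 1000, 10_000):
        share = recovered_in_window(copies, COT1, COT2, COT_HALF_SINGLE)
        print(f"{copies:>6} copies: {share:.2f} recovered in the low-copy fraction")
```

With these illustrative settings the 1- and 10-copy classes are each about 90% recovered, but roughly half of a 100-copy class also lands in the fraction; in this idealized model the window is an enrichment rather than a sharp cut, because second-order curves are broad.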
FIGURE 3.4. Cot curve for DNA from a higher plant of large genome size. [Plot: % ssDNA (0 to 100) against log Cot; regions labeled fold back, highly repeated, intermediately repeated, and low copy.]

METHYL FILTRATION
As with most higher eukaryotes, a portion of the cytosine residues at CpG or CpNpG sites in plant genomes is methylated (Bird, 1986, 1992; Gruenbaum et al., 1981; Martienssen, 1999). Methylation at these sites is known to modify DNA structure and regulate gene expression. Methylation therefore varies within the genome, being lower in transcribed regions than in transcriptionally inactive regions. High levels of methylation (hypermethylation) are associated with transcriptionally inactive heterochromatin, whereas hypomethylation is usually associated with transcriptionally active euchromatin. Therefore, eliminating the highly methylated regions would enrich the remaining sequences for genes. This discrimination becomes even more useful as genome size increases. For example, most of the differences between methyl-C levels in corn and Ara-
