"Frontmatter". In: Plant Genomics and Proteomics
Christopher A. Cullis - Plant Genomics and Proteomics-J. Wiley & Sons (2004)
FRACTIONATING THE GENOME

At present, the most important genomic information to acquire is the composition and number of the actual genes in a particular plant. The remainder of the genome does not hold quite the same importance because its contribution to the final phenotype is not thought to be substantial, and certainly less than that of the actual genes. Therefore, rather than trying to generate a complete genome sequence, strategies to enrich the regions that contain genes have been devised. These strategies usually rely on a demonstrated difference between the gene space, that is, the regions that contain genes, and the rest of the genome. For plants that have very large genomes, this gene space is probably arranged in islands of gene-rich sequence separated by stretches of the genome that contain few, but greater than zero, genes (Panstruga et al., 1998). If such islands really exist and can be identified, then a large fraction of the genes could be isolated and sequenced apart from the rest of the “uninteresting” sequences. Characteristics that differentiate the genes from the rest of the nuclear DNA include the degree of methylation (Bird, 1986, 1992; Gruenbaum et al., 1981; Martienssen, 1999), the degree of repetition of the sequence within the genome, whether or not it is transcribed or contains an open reading frame, and, for maize at least, whether or not it is a target for transposable elements.

METHODS OF FRACTIONATING THE GENOME

EXPRESSED SEQUENCE TAGS (ESTs)

The last few years have seen an enormous growth of the number of ESTs in the databases for some of the major crop plants (Figure 3.3). This is clearly one source of genomic sequence for the genes. Obviously, any genes whose expression is either very low or restricted to tissues that were not sampled in the generation of the ESTs will be missed.
Examples of this underachievement are the EST collections for human (3,500,000), C. elegans (150,000), and Arabidopsis (135,000), where only 35–65% of the genes predicted by genome sequencing were found in the EST collections. Additionally, various members of multigene families that do not differ in the region sequenced will be missed. Many of the ESTs have been placed on the genetic maps, but many have not because of the lack of any polymorphisms within the studied germplasm. For some plants, such as wheat, chromosome deletion lines that have already been developed can be used to localize these nonpolymorphic ESTs to a region of the chromosome. The development of maize/oat addition lines and radiation hybrids of these lines may serve the same purpose in corn (Kynast et al., 2001). However, this type of genetic resource is not likely to be developed for many other plant species.

REASSOCIATION KINETICS

Up until the 1980s many genomes could only be characterized by reassociation kinetics, but this type of analysis went out of fashion with the arrival of easier, quicker modern molecular methods. These experiments physically separated the various classes of sequences, on the basis of the frequency with which they were present in the genome, by separating single- and double-stranded molecules after various incubation times. The more frequently a sequence was present in the genome, the more rapidly it reformed a duplex.

[3. SEQUENCING STRATEGIES]

Therefore, a parameter designated Cot (for concentration times time) could be defined whereby various classes of repetitive sequences could be eliminated, or isolated physically, from the reaction. A reassociation experiment is carried out by shearing nuclear DNA into small fragments (200–500 bp) by high-speed blending and checking the fragment size by gel electrophoresis.
The sheared DNA is precipitated, redissolved in the appropriate buffer, denatured, and allowed to reanneal at the appropriate temperature for various lengths of time. The single- and double-stranded fractions are physically separated with a hydroxyapatite column, and the amounts of the total starting DNA in each fraction are determined. The single-stranded fraction can be incubated again and the newly reassociated strands again isolated. This results in the physical isolation of the part of the genome that has sequences present with a particular range of copy numbers. The resulting Cot curve, such as that shown in Figure 3.4, can be used to assess the appropriate parameters to isolate a particular fraction of the genome. An example would be that by choosing the appropriate annealing times and concentrations for the first and second incubations, those sequences that are present in the genome at between 1 and 10 copies could be isolated. Most of the low-copy-number sequences are expected to be genes, but the proportion of all the genes that are in this fraction still must be determined. However, because of the interspersion of repetitive sequences adjacent to low-copy sequences, the fragments involved in the reassociation studies must be fairly short (500 bp or less for most complex genomes).

[Figure 3.3. Axis: Number of ESTs (100,000 to 400,000). Groups: Arabidopsis, wheat, maize, rice, barley, each with series 1, 2, 3.]
FIGURE 3.3. Increase in ESTs in dbEST. The different colors are for various species. 1: number of ESTs in 1998; 2: number of ESTs in 2000; 3: number of ESTs in 2003.

A second consideration concerning the proportion of genes that will be in any particular high-Cot fraction (the higher the Cot value, the lower the copy number of the sequences in the genome) is the stringency at which the reassociation is performed.
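
Before turning to stringency, the copy-number selection described above can be illustrated with a small calculation. The sketch below is not taken from the text. It assumes ideal second-order reassociation kinetics, in which the fraction of a sequence class still single-stranded at a given Cot is 1 / (1 + Cot / half-Cot), and it assumes that the half-Cot of a class is inversely proportional to its copy number. The three sequence classes, their shares of the genome, their copy numbers, the single-copy half-Cot value, and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One kinetic class of sequences (all values hypothetical)."""
    name: str
    fraction: float  # share of total nuclear DNA
    copies: float    # copies per genome


def ss_fraction(cot: float, half_cot: float) -> float:
    """Fraction still single-stranded under ideal second-order kinetics."""
    return 1.0 / (1.0 + cot / half_cot)


def remaining_single_stranded(components, cot, single_copy_half_cot):
    """Share of total DNA in each class that is still single-stranded at `cot`."""
    left = {}
    for c in components:
        # Assumption: a class with N copies reanneals N times faster,
        # so its half-Cot is the single-copy half-Cot divided by N.
        half_cot = single_copy_half_cot / c.copies
        left[c.name] = c.fraction * ss_fraction(cot, half_cot)
    return left


# Hypothetical large plant genome, loosely following the classes of a Cot curve.
GENOME = [
    Component("highly repeated", 0.30, 100_000),
    Component("intermediately repeated", 0.50, 100),
    Component("low copy", 0.20, 1),
]
SINGLE_COPY_HALF_COT = 1000.0  # arbitrary illustrative value


def low_copy_enrichment(cot: float) -> float:
    """Proportion of the single-stranded DNA at `cot` that is low copy."""
    left = remaining_single_stranded(GENOME, cot, SINGLE_COPY_HALF_COT)
    return left["low copy"] / sum(left.values())


if __name__ == "__main__":
    for cot in (0.0, 1.0, 10.0, 100.0):
        print(f"Cot={cot:>6}: low-copy share of ssDNA = {low_copy_enrichment(cot):.2f}")
```

With these made-up values, the low-copy class rises from 20% of the DNA at the start to about 80% of the single-stranded fraction at a Cot of 100, which is the kind of enrichment the hydroxyapatite separation is meant to capture. Real genomes do not fall into three clean classes, so the numbers indicate the trend only.
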
In general, to achieve a reasonable rate of reassociation, the stringency of the reaction is set at about Tm – 25°C, which will allow about 25% of the nucleotides in any duplex to be mismatched. Therefore, even relatively distantly related sequences will appear to be present in multiple copies because of the cross-reaction under these conditions and so might be missing from the high-Cot fractions.

This is another example of a technique that had fallen into disuse but can be applied in a new context to provide vital information. Cot fractionation is a strategy for the fractionation of the genome that should be relatively unbiased and may result in the identification of genes not uncovered in any other fashion short of a whole genome sequencing effort (Peterson et al., 2002; Yuan et al., 2002).

[Figure 3.4. Vertical axis: % ssDNA (100 to 0). Horizontal axis: Log Cot. Labeled regions: fold back, highly repeated, intermediately repeated, low copy.]
FIGURE 3.4. Cot curve for DNA from a higher plant of large genome size.

METHYL FILTRATION

As with most higher eukaryotes, a portion of the cytosine residues at CpG or CpNpG sites in plant genomes are methylated (Bird, 1986, 1992; Gruenbaum et al., 1981; Martienssen, 1999). Methylation at these sites is known to modify DNA structure and regulate gene expression. This methylation is therefore variable within the genome, being lower in transcribed regions than in transcriptionally inactive regions. High rates of methylation (hypermethylation) are associated with transcriptionally inactive heterochromatin, whereas hypomethylation is usually associated with the transcriptionally active euchromatin. Therefore, elimination of the highly methylated regions would enrich the remaining sequences with genes. This discrimination becomes even more useful as the genome size increases. For example, most of the differences between methyl-C levels in corn and Ara-