"Frontmatter". In: Plant Genomics and Proteomics
Christopher A. Cullis - Plant Genomics and Proteomics-J. Wiley & Sons (2004)
3. SEQUENCING STRATEGIES

STRATEGIES FOR GENOME SEQUENCING

The gene enrichment and whole genome sequence approaches are not mutually exclusive. The most efficient way of getting useful sequence from any genome of interest will probably include some combination of the enrichment, shotgun, and anchored sequencing methods. The following illustrates how such strategies can be incorporated into the two main sequencing approaches. What is important to realize is that the physical map is essential for genome analysis irrespective of whether minimum tiling path, whole genome shotgun, or gene enrichment strategies are being used.

THE MINIMUM TILING PATH (MTP) SEQUENCING

This approach makes use of the physical map, which must be developed before high-throughput sequencing is started. The physical map of overlapping BAC clones described above is the starting point. It is not essential to have a complete physical map before beginning, but the regions that are the starting points must be defined. All the BACs in the same bin (for example, all the BACs in Figure 3.2, a through r, are in the same bin) can be readily identified. This information is then used to select a subset of BAC clones that will be used to determine the sequence of the contig. For example, in Figure 3.2, the selection of clones a, m, and r would cover the whole of the contig with the minimum overlap. Here there will be substantial overlap between m and r, which will result in duplicate sequencing, but very little redundancy between a and m. However, with another contig, such as that shown in Figure 3.5, all three BACs would have to be sequenced, even though most of the sequence from BAC b would be redundant. As described below, other strategies are available for linking two BACs that are close together without the extra sequencing.
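The clone selection described above can be sketched as a greedy interval cover: starting at one end of the contig, repeatedly pick the clone that overlaps the region already covered and reaches furthest beyond it. This is only an illustration. The function name `minimum_tiling_path` and the clone coordinates (in kb) are hypothetical and are not taken from Figure 3.2 or Figure 3.5, and the sketch assumes clone positions are known exactly, which a fingerprint map does not guarantee.

```python
from typing import Dict, List, Tuple


def minimum_tiling_path(clones: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return a smallest set of clones that spans the contig (greedy cover).

    `clones` maps a clone name to hypothetical (start, end) map
    coordinates in kb.
    """
    contig_start = min(start for start, _ in clones.values())
    contig_end = max(end for _, end in clones.values())

    path: List[str] = []
    covered_to = contig_start
    while covered_to < contig_end:
        # Clones that touch the covered region and extend beyond it.
        candidates = [
            name for name, (start, end) in clones.items()
            if start <= covered_to < end
        ]
        if not candidates:
            raise ValueError(f"gap in contig at {covered_to} kb")
        # Take the clone reaching furthest to the right.
        best = max(candidates, key=lambda name: clones[name][1])
        path.append(best)
        covered_to = clones[best][1]
    return path


# Illustrative coordinates loosely modelled on a deep contig:
# many overlapping clones, of which three suffice.
deep_contig = {
    "a": (0, 125), "d": (40, 160), "m": (120, 245),
    "p": (150, 270), "r": (200, 330),
}
print(minimum_tiling_path(deep_contig))    # ['a', 'm', 'r']

# A sparse contig where every clone is needed.
sparse_contig = {"a": (0, 125), "b": (100, 230), "c": (205, 330)}
print(minimum_tiling_path(sparse_contig))  # ['a', 'b', 'c']
```

In the first toy contig, a and m share 5 kb while m and r share 45 kb, mirroring the uneven redundancy the text describes. A real selection would also require each chosen overlap to be large enough to be trusted in the fingerprint data.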
Thus the minimum number of clones required to span a contig is determined by the confidence that can be placed in the alignment of the BACs in the contig. In Figure 3.2, the choice of the three BACs to sequence is relatively easy because of the large number of BACs with substantial overlaps that make up the contig. In Figure 3.5, a and c would be singleton BACs, that is, ones with no overlaps, if the BAC represented by b had not been fingerprinted. So even though they were in close physical proximity, they could not be placed relative to each other unless there was additional evidence. Such evidence could be the presence of molecular mapping markers on each of the BACs that were known to map very close together on the genetic map. However, as noted above, the correspondence between physical and genetic distances is not uniform, so the actual spacing of these two BACs could not be determined directly from these data.

FIGURE 3.5. BAC contig where all the BACs have to be sequenced. (Clones shown: a, b, c.)

BAC END SEQUENCING

All of the BACs that are fingerprinted can also be sequenced from both ends. This serves two purposes. First, it adds to the database of sequences for that organism. Current technology can generate up to a kilobase of sequence per run, so 2 kb of sequence is obtained from each BAC. Thus, because each BAC is on average 125 kb, this end sequencing would generate about 1.6% of the sequence of each BAC. Additionally, it is unlikely that a significant number of BACs would have overlapping ends, so all this sequence should be nonredundant, in the sense of representing various parts of the genome. However, if the BACs are indeed random regions of the genome, then only a correspondingly small fraction of those end sequences is expected to represent genes, because genes make up only a small fraction of the whole genome.
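The arithmetic behind the 1.6% figure can be checked with a short sketch. The function names are hypothetical; the inputs (1 kb per read, two reads per BAC, 125 kb average insert) are the values given in the text. The library-wide figure simply multiplies the per-BAC fraction by the fold coverage of the library, using the 20-fold depth the text considers, and ignores any overlap between end reads.

```python
def end_sequence_fraction(read_kb: float, insert_kb: float, ends: int = 2) -> float:
    """Fraction of one BAC insert covered by its end reads."""
    return ends * read_kb / insert_kb


def library_end_sequence_yield(fold_coverage: float, per_bac_fraction: float) -> float:
    """Total end sequence as a fraction of genome size.

    Assumes end reads do not overlap one another, so this is an upper
    bound on the fraction of the genome actually represented.
    """
    return fold_coverage * per_bac_fraction


per_bac = end_sequence_fraction(read_kb=1.0, insert_kb=125.0)
print(f"{per_bac:.1%}")        # 1.6%

library_total = library_end_sequence_yield(20, per_bac)
print(f"{library_total:.0%}")  # 32%
```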
Thus the resulting sequence would represent about 30% of the total genomic sequence if the BACs included in the end sequencing amounted to a 20-fold coverage of the genome.

A second purpose is that the BAC end sequences can be used to generate the most efficient minimum tiling path. Starting from the sequence of a central clone in the BAC contig, the BAC end sequence data of all the other clones in the contig can be used to identify those with the minimal overlap. The next contiguous clone to be sequenced is then chosen on the basis of this minimal overlap. This approach is also called the sequence-tagged connectors (STC) approach. The STC approach appears to be the most sensitive in detecting the minimal overlaps and so involves the least amount of sequencing. One possible confounding factor in generating the minimal overlaps arises where the minimally overlapping BAC clone ends in a repetitive sequence and so cannot be unambiguously identified. The next closest BAC that is uniquely defined must then be used.

For example, in Figure 3.6, BAC 10 is chosen as the starting point. The whole of BAC 10 is sequenced and assembled. The BAC end sequences of the other clones in the contig are searched with BLAST against the sequence of BAC 10. BACs 7–12 all clearly overlap, and their end sequences are found in the sequence of 10. However, from the fingerprints it is not certain whether 6 and 13 also overlap with 10. If the BAC end sequences of 6 and 13 are found in the sequence of 10, then these two will be taken to continue the sequence of the contig. The caveat is that the end of 10 and the overlapping sequences in 6 and 13 must not be repetitive elements. Finally, all of the sequences can be searched with BLAST against the databases, and any known gene homologies or sequenced marker homologies can be detected and used to anchor the particular BAC.
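The STC selection step can be illustrated with a toy sketch. It replaces the BLAST search with an exact substring match, uses very short invented sequences, and takes the set of clones whose ends are repetitive as a given input. None of this comes from the source, and the names `next_clone`, `seed_seq`, `end_reads`, and `repetitive` are hypothetical. It extends the contig in one direction only.

```python
from typing import Dict, Optional, Set


def next_clone(seed_seq: str,
               end_reads: Dict[str, str],
               repetitive: Set[str]) -> Optional[str]:
    """Pick the clone whose end read lies closest to the seed's right end.

    `end_reads` maps a clone name to the end read facing the seed clone.
    Clones listed in `repetitive` are skipped because their placement
    would be ambiguous.
    """
    best_name: Optional[str] = None
    best_overlap: Optional[int] = None
    for name, read in end_reads.items():
        if name in repetitive:
            continue  # end lies in a repeat: cannot be placed uniquely
        pos = seed_seq.find(read)  # stand-in for a BLAST hit
        if pos == -1:
            continue  # no evidence that this clone overlaps the seed
        overlap = len(seed_seq) - pos  # bases shared with the seed
        if best_overlap is None or overlap < best_overlap:
            best_name, best_overlap = name, overlap
    return best_name


# Toy data: a 30-base "sequenced seed clone" and three candidate end reads.
seed = "ATGCCGTAAGCTTGGACCTTAGGCATCGAA"
reads = {"BAC11": "CTTGG", "BAC12": "AGGCA", "BAC13": "TCGAA"}

print(next_clone(seed, reads, repetitive=set()))      # BAC13
print(next_clone(seed, reads, repetitive={"BAC13"}))  # BAC12
```

When the minimally overlapping clone is flagged as ending in a repeat, the function falls back to the next closest clone, which is the behavior the text describes. A real pipeline would score alignments rather than require exact matches and would detect repeats from the sequence itself.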
This MTP approach may be even more efficient at getting all the genes than expected because there is evidence that BAC libraries generated with restriction fragments are not random. An inspection of restriction enzyme digests of genomic DNA from most higher plants, with the six-base recognition site enzymes commonly used for BAC library generation, does not give the expected distribution of fragments around a size of 4096 bp (4^6). In most cases there are large regions without any sites, and these come disproportionately from the heterochromatic fraction. Therefore, libraries made with restriction enzymes will be somewhat enriched for low-copy-number (genic) sequences. The use of random-sheared BAC libraries would be one way to overcome gaps containing much of the heterochromatic regions that may be lost during the production of BAC libraries from restriction digests of DNA. However, if the heterochromatic regions are of less interest, then restriction enzyme-digested DNA should be used for the BAC library construction.
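The 4096 bp expectation follows from treating the genome as a random sequence: a specific six-base site occurs with probability (1/4)^6 at any position, so sites are on average 4^6 = 4096 bp apart. The sketch below reproduces that number and shows how the expectation shifts with base composition. The function name and the `gc_content` parameter are hypothetical, and the model assumes independent bases, which real plant genomes do not satisfy (the point the text makes). HindIII (AAGCTT) is used only as a familiar six-base example.

```python
def expected_fragment_length(site: str, gc_content: float = 0.5) -> float:
    """Mean spacing (bp) between occurrences of `site` in random sequence.

    Assumes independent bases with P(G) = P(C) = gc_content / 2 and
    P(A) = P(T) = (1 - gc_content) / 2.
    """
    p_gc = gc_content / 2
    p_at = (1 - gc_content) / 2
    p_site = 1.0
    for base in site.upper():
        if base in "GC":
            p_site *= p_gc
        elif base in "AT":
            p_site *= p_at
        else:
            raise ValueError(f"unexpected base {base!r} in site")
    return 1 / p_site


# Six-base site, equal base frequencies: 4**6 bp.
print(expected_fragment_length("AAGCTT"))  # 4096.0

# The same AT-rich site is expected more often in an AT-rich genome.
print(expected_fragment_length("AAGCTT", gc_content=0.35) < 4096)  # True
```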