"Frontmatter". In: Plant Genomics and Proteomics


Christopher A. Cullis - Plant Genomics and Proteomics-J. Wiley & Sons (2004)

STRATEGIES FOR GENOME SEQUENCING
The gene enrichment and whole genome sequence approaches are not mutually
exclusive. The most efficient way of getting useful sequence from any genome
of interest will probably include some combination of the enrichment,
shotgun, and anchored sequencing methods. The following illustrates how such
strategies can be incorporated into the two main sequencing approaches.
Importantly, the physical map is essential for genome analysis irrespective
of whether minimum tiling path, whole-genome shotgun, or gene enrichment
strategies are being used.
3. SEQUENCING STRATEGIES


THE MINIMUM TILING PATH (MTP) SEQUENCING
This approach makes use of the physical map, which must be developed before
high-throughput sequencing is started. The physical map of overlapping BAC
clones described above is the starting point. It is not essential to have a
complete physical map before beginning, but the regions that are the starting
points must be defined. All the BACs in the same bin (for example, all the
BACs in Figure 3.2, a through r, are in the same bin) can be readily
identified. This information is then used to select a subset of BAC clones
that will be used to determine the sequence of the contig. For example, in
Figure 3.2, the selection of clones a, m, and r would cover the whole of the
contig with the minimum overlap. Here there will be substantial overlap
between m and r, which will result in duplicate sequencing, but very little
redundancy between a and m. However, with a contig such as that shown in
Figure 3.5, all three BACs would have to be sequenced, even though most of
the sequence from BAC b would be redundant. As described below, other
strategies for linking two BACs that are close together without the extra
sequencing are available. Thus the minimum number of clones required to span
a contig is determined by the confidence that can be placed in the alignment
of the BACs in the contig. In Figure 3.2, the choice of the three BACs to
sequence is relatively easy because of the large number of BACs with
substantial overlaps that make up the contig. In Figure 3.5, a and c would be
singleton BACs, that is, ones with no overlaps, if the BAC represented by b
had not been fingerprinted. So even though they were in close physical
proximity, they could not be placed relative to each other unless there was
additional evidence. Such evidence could be the presence, on each of the
BACs, of molecular mapping markers known to map very close together on the
genetic map. However, as noted above, the correspondence between physical and
genetic distances is not uniform, so the actual spacing of these two BACs
could not be determined directly from these data.
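The clone selection described above can be sketched as a greedy interval cover. This is only an illustration, not the procedure used by any particular sequencing project: the function name, the clone coordinates, and the idea that each BAC has a known start and end on the physical map are all assumptions made for the example. Real fingerprint maps give overlaps with limited confidence rather than exact positions, and in practice a minimum overlap would be required before two clones are treated as linked.

```python
from typing import Dict, List, Tuple


def minimum_tiling_path(bacs: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return a smallest set of overlapping clones that spans the contig.

    `bacs` maps a clone name to its (start, end) position on the physical
    map. Positions are hypothetical map units invented for this sketch.
    """
    contig_start = min(start for start, _ in bacs.values())
    contig_end = max(end for _, end in bacs.values())
    path: List[str] = []
    covered_to = contig_start
    while covered_to < contig_end:
        # Clones that begin inside the region already covered and extend it.
        candidates = [
            (end, name)
            for name, (start, end) in bacs.items()
            if start <= covered_to < end
        ]
        if not candidates:
            raise ValueError(f"gap in the physical map at position {covered_to}")
        # Greedy step: take the clone that reaches furthest to the right.
        best_end, best_name = max(candidates)
        path.append(best_name)
        covered_to = best_end
    return path


# A contig with many overlapping clones: three clones are enough.
dense_contig = {
    "a": (0, 120), "b": (30, 150), "m": (110, 240),
    "q": (150, 270), "r": (160, 300),
}
print(minimum_tiling_path(dense_contig))   # ['a', 'm', 'r']

# A contig where the outer clones do not touch: all three are needed,
# even though most of the middle clone is sequenced twice.
sparse_contig = {"a": (0, 125), "b": (65, 190), "c": (130, 255)}
print(minimum_tiling_path(sparse_contig))  # ['a', 'b', 'c']
```

The greedy choice minimizes the number of clones, which is not always the same as minimizing the total overlap in base pairs; it is used here because it is the simplest way to show why the second contig cannot be spanned with fewer than three BACs.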
BAC END SEQUENCING
All of the BACs that are fingerprinted can also be sequenced from both ends.
This serves two purposes. First, it adds to the database of sequences for that
organism. Current technology can now generate up to a kilobase of sequence per
run, so 2 kb of sequence is obtained from each BAC. Thus,
FIGURE 3.5. BAC contig (BACs a, b, and c) where all the BACs have to be sequenced.


because each BAC is on average 125 kb, this end sequencing would generate
about 1.6% of the sequence of each BAC. Additionally, it is unlikely that a
significant number of BACs would have overlapping ends, so all this sequence
should be nonredundant, in the sense of representing various parts of the
genome. However, if the BACs are indeed random regions of the genome, then the
expectation is that the fraction of those end sequences that represent genes
would be the same as the fraction of the genome made up of genes, which is
small. Thus the resulting sequence would represent about 30% of the total
genomic sequence if the BACs that were end sequenced amounted to a 20-fold
coverage of the genome (20 × 1.6% = 32%). A second purpose is that the BAC end
sequences can be used to generate the most efficient minimum tiling path.
Starting from the sequence of a central clone in the BAC contig, the BAC end
sequence data of all the other clones in the contig can be used to identify
those with the minimal overlap. The next contiguous clone to be sequenced is
then chosen on the basis of this minimal overlap. This approach is also called
the sequence-tagged connectors (STC) approach. The STC approach appears to be
the most sensitive in detecting the minimal overlaps and so involves the least
amount of sequencing. One possible confounding factor in generating the
minimal overlaps arises where the minimally overlapping BAC clone ends in a
repetitive sequence and so cannot be unambiguously identified. The next
closest BAC that is uniquely defined must then be used. For example, in
Figure 3.6, BAC 10 is chosen as the starting point. The whole of BAC 10 is
sequenced and assembled. The BAC end sequences of the other clones in the
contig are searched with BLAST against the sequence of BAC 10. BACs 7–12 all
clearly overlap, and their end sequences are found in the sequence of 10.
However, from the fingerprints it is not certain whether 6 and 13 also overlap
with 10. If the BAC end sequences of 6 and 13 are found in the sequence of 10,
then these two will be taken to continue the sequence of the contig. The
caveat is that the end of 10 or the overlapping sequences in 6 and 13 must not
be repetitive elements. Finally, all of the sequences can be searched with
BLAST against the databases, and any known gene homologies or sequenced marker
homologies can be detected and used to anchor the particular BAC.
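The two uses of BAC end sequences described above can be illustrated with a short sketch. The first function reproduces the coverage arithmetic from the figures given in the text (1 kb per read, 125 kb per BAC, 20-fold clone coverage). The second is a deliberately simplified stand-in for the STC step: exact substring matching is used in place of a BLAST search, the function name and toy sequences are invented, and each end read is assumed to be the near end of a clone that extends the seed in one direction and in the same orientation. A read that matches the seed more than once is treated as repetitive and skipped.

```python
from typing import Dict, Optional


def end_sequence_fraction(read_kb: float = 1.0, bac_kb: float = 125.0) -> float:
    """Fraction of a BAC covered by one read from each of its two ends."""
    return 2 * read_kb / bac_kb


def next_clone_by_stc(seed_seq: str, end_reads: Dict[str, str]) -> Optional[str]:
    """Pick the clone whose end read lies closest to the right end of the seed.

    The further right the hit, the smaller the overlap with the seed clone.
    Exact matching stands in for a real similarity search (an assumption).
    """
    best_name, best_pos = None, -1
    for name, read in end_reads.items():
        pos = seed_seq.find(read)
        if pos == -1:
            continue  # end read not found in the seed: no detectable overlap
        if seed_seq.find(read, pos + 1) != -1:
            continue  # read matches more than once: ambiguous, so skip it
        if pos > best_pos:
            best_name, best_pos = name, pos
    return best_name


per_bac = end_sequence_fraction()  # 2 kb out of 125 kb, i.e. 1.6%
print(f"{per_bac:.1%} of each BAC; {20 * per_bac:.0%} of the genome at 20-fold coverage")

# Toy seed sequence and end reads; clone names and bases are invented.
seed = "GATTACAGGCTTCAAGTCCGATAGCTTGACCA"
reads = {
    "11": "GGCTTC",   # unique hit, large overlap with the seed
    "12": "GATAGC",   # unique hit, smaller overlap
    "13": "CTTGAC",   # unique hit nearest the seed's end: minimal overlap
    "14": "TTTTTT",   # no hit: overlap cannot be confirmed
    "15": "GCTT",     # hits twice in the seed: treated as repetitive
}
print(next_clone_by_stc(seed, reads))  # 13
```

In a real project the repeat check would have to consider matches anywhere in the genome, not just within the seed clone, which is why the text falls back to the next closest uniquely defined BAC.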
This MTP approach may be even more efficient at getting all the genes than
expected because there is evidence that BAC libraries generated with
restriction fragments are not random. An inspection of restriction enzyme
digests of genomic DNA from most higher plants, with the six-base recognition
site enzymes commonly used for BAC library generation, does not give the
expected distribution of fragments around a size of 4096 bp. In most cases
there are large regions without any sites, and these regions come
disproportionately from the heterochromatic fraction. Therefore, libraries
made with restriction enzymes will be somewhat enriched for low-copy-number
(genic) sequences. The use of random-sheared BAC libraries would be one way to
overcome gaps containing much of the heterochromatic regions that may be
lost during the production of BAC libraries from restriction digests of DNA.
However, if the heterochromatic regions are of less interest, then restriction
enzyme-digested DNA should be used for the BAC library construction.
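The 4096 bp figure quoted above follows from a simple random-sequence model, sketched here. The model assumes all four bases are equally frequent and that positions are independent, which real plant genomes do not satisfy; the function names are invented for the example. Under that model a six-base site occurs on average once every 4^6 = 4096 bp, and a long stretch with no site at all would be very unlikely, which is why the large site-free regions actually observed point to nonrandom sequence such as heterochromatin.

```python
def expected_site_spacing(site_length: int = 6) -> int:
    """Mean distance in bp between sites for an n-base recognition sequence,
    assuming equal base frequencies and independent positions."""
    return 4 ** site_length


def prob_no_site(region_bp: int, site_length: int = 6) -> float:
    """Probability that a region of `region_bp` contains no site at all.

    Each position is treated as an independent chance of starting a site,
    which slightly simplifies the edge of the region (an approximation).
    """
    p_site = 1 / 4 ** site_length
    return (1 - p_site) ** region_bp


print(expected_site_spacing())          # 4096
for size in (4_096, 20_000, 50_000):
    print(size, f"{prob_no_site(size):.2e}")
```

The probabilities fall off roughly as exp(-L/4096) for a region of length L, so under this model a site-free region of tens of kilobases should be rare.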
