Phylogenetic dependency networks: Inferring patterns of adaptation in HIV
Figure B.3: The advantage of using $\hat\pi_0(\alpha)$ computed using the filtering technique over not filtering. Because filtering only affects $\hat\pi_0$, these gains result in proportionally reduced (yet conservative) pFDR estimates. (Panels: Epitope, Resistance, SNPs, Sieve.)

In addition to providing increased power, $\hat\pi_0(\alpha)$ may provide valuable information in cases where a large proportion of tests could not achieve $\alpha$. In such cases, the overall $\pi_0$ may be quite high, but the $\pi_0$ among tests that could achieve $\alpha$ (those that we are interested in) may be much lower. Figure B.3 demonstrates the advantage of using $\hat\pi_0(\alpha)$ over $\hat\pi_0(1)$.

B.4 Numerical results

To explore the applicability of our proposed pFDR estimator, we created a number of Epitope-derived synthetic data sets with different numbers of tables that follow the mixture model assumptions above, allowing for an unequal distribution of marginals as defined in Assumption (B.33) (see the Appendix for details). For each of these data sets, we plotted the estimated $\mathrm{pFDR}(\alpha)$ against the true proportion of false discoveries using $p < \alpha$ as the threshold (Figure B.4).

Figure B.4: Estimated pFDR vs. true false discovery proportion for synthetic data with an increasing number of tables (10k, 35k, 70k) generated from the Epitope data set. Estimates above the dashed line are conservative.

In practice, it is often the case that $\mathrm{pFDR}(\alpha) > \mathrm{pFDR}(\beta)$ for some $\beta > \alpha$. Therefore, there is no reason to choose $\alpha$ as the rejection region, because choosing $\beta$ will result in more rejected tests and a lower proportion of false positives among those rejected tests. For this reason, Storey [208] proposed the q-value, defined to be

$$q(\alpha) \triangleq \min_{\beta \ge \alpha} \mathrm{pFDR}(\beta). \quad (B.46)$$

To demonstrate the power gains of our method in practice, we conclude by comparing the number of significant results for each of our example data sets as a function of the q-value threshold (Figure B.5).
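The running minimum in Equation (B.46) is straightforward to compute over a grid of candidate thresholds. A minimal sketch (the `pfdr` values are assumed to be precomputed pFDR estimates at each threshold; the grid itself is illustrative):

```python
def q_values(alphas, pfdr):
    """q(alpha) = min over beta >= alpha of pFDR(beta), Equation (B.46),
    evaluated on a grid of candidate thresholds.

    alphas must be sorted in increasing order; pfdr[i] is the pFDR
    estimate at threshold alphas[i]."""
    assert all(a <= b for a, b in zip(alphas, alphas[1:]))
    q = list(pfdr)
    # Sweep from the largest threshold down, keeping a running minimum:
    # each q-value is the smallest pFDR at any threshold >= its alpha.
    for i in range(len(q) - 2, -1, -1):
        q[i] = min(q[i], q[i + 1])
    return q
```

By construction the resulting q-values are non-decreasing in $\alpha$, so rejecting at a q-value threshold never discards a test that a looser threshold would have accepted.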
As can be seen, our conservative estimates result in a substantial increase in the number of tests called significant at a variety of thresholds.

Figure B.5: Plotting the portion of rejected cases vs. q-values for the real data sets (A: Resistance, B: Epitope, C: Sieve, D: SNPs). The solid line is the proposed method for discrete data and the dotted line is the S&T method using marginal p-values.

B.5 Creating synthetic data sets

In real data sets, the properties of the data that we estimate through our pFDR computation, such as $\pi_0$ or the true pFDR, are of course unknown. In such cases, synthetic data sets that allow manipulation of these properties can provide insight into the usefulness of various estimators. It is important, however, to create synthetic data sets that are as close as possible to the real data. In this section we explain the procedure that was used to create the synthetic data sets.

To simulate the real data, we used only marginals that were observed. Given such marginals, we first decide whether the synthetic table that we create will be null or alternative. For example, if we are interested in fixing $\pi_0$, we can thus ensure that a $\pi_0$ fraction of the tables that we create are nulls.

B.5.1 Creating null and alternative tables from given marginals

To create a null table given a set of marginals $\theta = \{\theta_X, \theta_{\bar X}, \theta_Y, \theta_{\bar Y}\}$, we simulate $n$ tests where each test has a result $\{X, Y\}$ such that $X$ is independent of $Y$. For each such test we select the result $X \in \{1, 0\}$ following $\Pr\{X = 1 \mid H_0\} = \theta_X / n$, and select $Y \in \{1, 0\}$ following $\Pr\{Y = 1 \mid H_0\} = \theta_Y / n$.
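The null-table procedure above can be sketched as follows. The table orientation (rows indexed by $X$, columns by $Y$) and the fixed seed are illustrative choices, not the implementation used in this work:

```python
import random

def simulate_null_table(theta_X, theta_Y, n, seed=0):
    """Simulate one 2x2 contingency table under the null hypothesis:
    X and Y are drawn independently for each of n observations, with
    Pr{X=1} = theta_X/n and Pr{Y=1} = theta_Y/n, so the expected
    marginals match the given ones.

    Returns counts [[n11, n10], [n01, n00]] (rows: X=1, X=0)."""
    rng = random.Random(seed)
    table = [[0, 0], [0, 0]]
    for _ in range(n):
        x = 1 if rng.random() < theta_X / n else 0
        y = 1 if rng.random() < theta_Y / n else 0  # independent of x
        table[1 - x][1 - y] += 1
    return table
```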
To create an alternative table, in which $X$ and $Y$ are not independent, we simulate tests by first selecting $X$ using the same procedure as above, then selecting $Y \mid X$ using the following distribution: $\Pr\{Y = 1 \mid H_1, X = 1\} = a / \theta_X$ and $\Pr\{Y = 1 \mid H_1, X = 0\} = c / \theta_{\bar X}$.

B.5.2 Selecting marginals

We have created two different types of data sets: one where all the marginals come from the same distribution, and one where the marginals' distribution depends on whether the table is null or alternative.

In the case of a single distribution of marginals, we divided the observed marginals into 10 exponential bins $[1, 1/2], (1/2, 1/4], (1/4, 1/8], \ldots$ and placed each marginal $\theta$ into a bin according to $\min\{\theta_X, \theta_{\bar X}, \theta_Y, \theta_{\bar Y}\} / \max\{\theta_X, \theta_{\bar X}, \theta_Y, \theta_{\bar Y}\}$. We then choose a bin uniformly, and select a set of marginals uniformly from the bin. We then designate the selected marginal as null with probability $\pi_0$ and generate the table accordingly. This approach biases us towards choosing marginals that permit lower p-values, which enables us to generate interesting alternative tables, even when we force $\pi_0$ to be much lower than it is in the real data.

When the distribution of marginals depends on whether the table is null or alternative, we draw $\theta$ from bin $b \in \{1, \ldots, 10\}$ with probability proportional to $1/2^{10-b-1}$ for a null table and proportional to $1/2^{b}$ for an alternative table.

B.6 Proofs and Remarks

In this appendix, we formalize the theoretical results from the main paper. For brevity, we will write $H_0$ to mean the event $H = 0$ and $H_1$ to mean the event $H = 1$.

Lemma 1. Given $m$ tests, in which the p-values are IID and distributed according to the mixture Equation (B.14), the $H$ are IID Bernoulli random variables, and for each test $i$, $\theta_i$ is independent of $H_i$,

$$\mathbb{E}\left[\frac{1}{m} \sum_{i=1}^m \Pr\{P \le \alpha \mid H_0, \theta_i\}\right] = \Pr\{P \le \alpha \mid H_0\}. \quad (B.47)$$

Proof of Lemma 1.
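The alternative-table construction can be sketched analogously. Here $a$ and $c$ are interpreted as target counts for the $(X{=}1, Y{=}1)$ and $(X{=}0, Y{=}1)$ cells, and $\theta_{\bar X} = n - \theta_X$ is assumed; the text does not define these explicitly, so this is an illustrative reading:

```python
import random

def simulate_alt_table(theta_X, a, c, n, seed=0):
    """Simulate one 2x2 table under the alternative: X is drawn as in
    the null case, but Y depends on X via Pr{Y=1|X=1} = a/theta_X and
    Pr{Y=1|X=0} = c/(n - theta_X).

    Returns counts [[n11, n10], [n01, n00]] (rows: X=1, X=0)."""
    rng = random.Random(seed)
    table = [[0, 0], [0, 0]]
    for _ in range(n):
        x = 1 if rng.random() < theta_X / n else 0
        # Y's distribution now depends on the realized X.
        p_y = a / theta_X if x == 1 else c / (n - theta_X)
        y = 1 if rng.random() < p_y else 0
        table[1 - x][1 - y] += 1
    return table
```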
Because $\theta$ is independent of $H$, we can write

$$\Pr\{P \le \alpha \mid H_0\} = \sum_\theta \Pr\{P \le \alpha \mid H_0, \theta\} \cdot \Pr\{\theta \mid H_0\} \quad (B.48)$$
$$= \sum_\theta \Pr\{P \le \alpha \mid H_0, \theta\} \cdot \Pr\{\theta\}, \quad (B.49)$$

where the summation is over all possible marginals. Furthermore,

$$\mathbb{E}\left[\frac{1}{m} \sum_{j=1}^m \mathbf{1}\{\theta_j = \theta\}\right] = \Pr\{\theta\}. \quad (B.50)$$

Thus,

$$\Pr\{P \le \alpha \mid H_0\} = \sum_\theta \Pr\{P \le \alpha \mid H_0, \theta\} \, \mathbb{E}\left[\frac{1}{m} \sum_{j=1}^m \mathbf{1}\{\theta_j = \theta\}\right] \quad (B.51)$$
$$= \mathbb{E}\left[\frac{1}{m} \sum_{j=1}^m \sum_\theta \Pr\{P \le \alpha \mid H_0, \theta\} \cdot \mathbf{1}\{\theta_j = \theta\}\right] \quad (B.52)$$
$$= \mathbb{E}\left[\frac{1}{m} \sum_{i=1}^m \Pr\{P \le \alpha \mid H_0, \theta_i\}\right].$$

Lemma 2. Let $\rho(\cdot)$ be any non-negative function. Then, under the assumptions of Lemma 1,

$$\mathbb{E}\left[\frac{\sum_{i=1}^m \sum_p \rho(p) \cdot \mathbf{1}\{p_i = p\}}{\sum_{i=1}^m \sum_p \rho(p) \cdot \Pr\{P = p \mid H_0, \theta_i\}}\right] \ge \pi_0. \quad (B.53)$$

Proof of Lemma 2. Recall that

$$\Pr\{P = p\} = \pi_0 \cdot \Pr\{P = p \mid H_0\} + \pi_1 \cdot \Pr\{P = p \mid H_1\}, \quad (B.54)$$

where $\pi_1 = \Pr\{H_1\} = 1 - \pi_0$. Thus, it follows that

$$\Pr\{P = p\} \ge \pi_0 \cdot \Pr\{P = p \mid H_0\} \quad (B.55)$$

and

$$\sum_p \rho(p) \Pr\{P = p\} \ge \pi_0 \sum_p \rho(p) \Pr\{P = p \mid H_0\} \quad (B.56)$$

for any non-negative function $\rho(\cdot)$. Thus, it follows that

$$\pi_0 \le \frac{\sum_p \rho(p) \Pr\{P = p\}}{\sum_p \rho(p) \Pr\{P = p \mid H_0\}}. \quad (B.57)$$

It follows analogously to the proof of Lemma 1 that

$$\Pr\{P = p \mid H_0\} = \frac{1}{m} \mathbb{E}\left[\sum_{i=1}^m \Pr\{P = p \mid H_0, \theta_i\}\right] \quad (B.58)$$

and

$$\Pr\{P = p\} = \frac{1}{m} \mathbb{E}\left[\sum_{i=1}^m \mathbf{1}\{p_i = p\}\right]. \quad (B.59)$$

Thus, it follows that

$$\frac{\sum_p \rho(p) \Pr\{P = p\}}{\sum_p \rho(p) \Pr\{P = p \mid H_0\}} = \frac{\sum_p \rho(p) \frac{1}{m} \mathbb{E}\left[\sum_{i=1}^m \mathbf{1}\{p_i = p\}\right]}{\sum_p \rho(p) \frac{1}{m} \mathbb{E}\left[\sum_{i=1}^m \Pr\{P = p \mid H_0, \theta_i\}\right]} \quad (B.60)$$
$$= \frac{\mathbb{E}\left[\sum_{i=1}^m \sum_p \rho(p) \mathbf{1}\{p_i = p\}\right]}{\mathbb{E}\left[\sum_{i=1}^m \sum_p \rho(p) \Pr\{P = p \mid H_0, \theta_i\}\right]}. \quad (B.61)$$

Because $\sum_p \rho(p) \Pr\{P = p\}$ is a linearly increasing function of $\sum_p \rho(p) \Pr\{P = p \mid H_0\}$, it follows from Jensen's inequality that

$$\frac{\mathbb{E}\left[\sum_{i=1}^m \sum_p \rho(p) \mathbf{1}\{p_i = p\}\right]}{\mathbb{E}\left[\sum_{i=1}^m \sum_p \rho(p) \Pr\{P = p \mid H_0, \theta_i\}\right]} \le \mathbb{E}\left[\frac{\sum_{i=1}^m \sum_p \rho(p) \mathbf{1}\{p_i = p\}}{\sum_{i=1}^m \sum_p \rho(p) \Pr\{P = p \mid H_0, \theta_i\}}\right]. \quad (B.62)$$

Thus, $\mathbb{E}[\hat\pi_0] \ge \pi_0$.

Remark 1. Storey [208, 209] argued that, for continuous statistics, we would expect most of the observations with $p$ close to 1 to be true nulls, and thus a natural estimate for $\pi_0$ is

$$\hat\pi_0(\lambda) = \frac{\#\{p_i > \lambda\}}{(1 - \lambda) m} \quad (B.63)$$

for some tuning parameter $0 \le \lambda < 1$.
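Equation (B.63) is a one-line computation. A minimal sketch, assuming only a flat list of observed p-values:

```python
def storey_pi0(p_values, lam=0.5):
    """Storey's pi0 estimate, Equation (B.63): the fraction of p-values
    above the tuning parameter lambda, scaled by the null probability
    of exceeding lambda (1 - lambda under a continuous uniform null)."""
    m = len(p_values)
    return sum(1 for p in p_values if p > lam) / ((1 - lam) * m)
```

Note that for discrete statistics the estimator can exceed 1, since $\Pr\{p_i > \lambda \mid H_0\}$ may be larger than $1 - \lambda$.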
This procedure assumes a continuous underlying distribution, such that $(1 - \lambda) = \Pr\{p_i > \lambda \mid H_i = 0\}$ for all $i$. It can be shown that Equation (B.27) is a special case of Equation (B.26) in which

$$\rho(p) = \begin{cases} 0 & \text{if } p \le \lambda, \\ 1 & \text{otherwise.} \end{cases} \quad (B.64)$$

Proof.

$$\hat\pi_0 = \frac{\sum_p \sum_{i=1}^m \rho(p) \, \mathbf{1}\{p_i = p\}}{\sum_p \sum_{i=1}^m \rho(p) \Pr\{P = p \mid H_0, \theta_i\}} \quad (B.65)$$
$$= \frac{\sum_{i=1}^m \sum_{p > \lambda} \mathbf{1}\{p_i = p\}}{\sum_{i=1}^m \sum_{p > \lambda} \Pr\{P = p \mid H_0, \theta_i\}} \quad (B.66)$$
$$= \frac{\#\{p_i > \lambda\}}{\sum_{i=1}^m \Pr\{P > \lambda \mid H_0, \theta_i\}} \quad (B.67)$$
$$= \frac{\#\{p_i > \lambda\}}{m \cdot \widehat{\Pr}\{P > \lambda \mid H_0\}}. \quad (B.68)$$

For discrete statistics,

$$\Pr\{p_i > \lambda \mid H_0\} \ge (1 - \lambda); \quad (B.69)$$

thus, it follows that

$$\hat\pi_0 \le \frac{\#\{p_i > \lambda\}}{(1 - \lambda) m}, \quad (B.70)$$

making it a tighter estimate of $\pi_0$ than we get when assuming the statistics are continuous.

Remark 2. In an argument similar to that which led to Equation (B.26), Pounds and Cheng [183] pointed out that

$$\pi_0 \le \frac{\mathbb{E}[P]}{\mathbb{E}[P \mid H_0]}. \quad (B.71)$$

Assuming

$$\mathbb{E}[P] = \bar p \triangleq \frac{1}{m} \sum_{i=1}^m p_i \quad (B.72)$$

and $\mathbb{E}[P \mid H_0] \ge \frac{1}{2}$, Pounds and Cheng suggest defining

$$\hat\pi_0 \triangleq 2 \bar p. \quad (B.73)$$

It turns out that this Pounds-Cheng approach is a special case of Equation (B.26), with a conservative approximation for $\mathbb{E}[P \mid H_0]$.

Proof. Let $\rho(p) = p$. Then

$$\hat\pi_0 = \frac{\sum_{i=1}^m \sum_p p \cdot \mathbf{1}\{p_i = p\}}{\sum_{i=1}^m \sum_p p \cdot \Pr\{P = p \mid H_0, \theta_i\}} \quad (B.74)$$
$$= \frac{\frac{1}{m} \sum_{i=1}^m p_i}{\frac{1}{m} \sum_{i=1}^m \mathbb{E}[P \mid H_0, \theta_i]} \quad (B.75)$$
$$= \frac{\bar p}{\sum_\theta \mathbb{E}[P \mid H_0, \theta] \cdot \widehat{\Pr}\{\theta\}} \quad (B.76)$$
$$= \frac{\bar p}{\widehat{\mathbb{E}}[P \mid H_0]} \quad (B.77)$$
$$\le \frac{\bar p}{0.5}, \quad (B.78)$$

where $\widehat{\mathbb{E}}[P \mid H_0]$ is our unbiased estimate of $\mathbb{E}[P \mid H_0]$ and we define $\widehat{\Pr}\{\theta\}$ as in Equation (B.50).

Lemma 3. Under the assumptions of Lemma 2,

$$\lim_{m\to\infty} \hat\pi_0 \stackrel{a.s.}{=} \pi_0 + \pi_1 \frac{\mathbb{E}[\rho(p) \mid H_1]}{\mathbb{E}[\rho(p) \mid H_0]}. \quad (B.79)$$

Proof. By the strong law of large numbers, Equations (B.58) and (B.59) imply that $\widehat{\Pr}\{P = p \mid H_0\}$ converges almost surely to $\Pr\{P = p \mid H_0\}$ and $\widehat{\Pr}\{P = p\}$ converges almost surely to $\Pr\{P = p\}$. Thus, it follows from Equation (B.60) that

$$\lim_{m\to\infty} \hat\pi_0 \stackrel{a.s.}{=} \frac{\sum_p \rho(p) \Pr\{P = p\}}{\sum_p \rho(p) \Pr\{P = p \mid H_0\}}. \quad (B.80)$$

Furthermore, it follows from Equation (B.22) that

$$\frac{\sum_p \rho(p) \Pr\{P = p\}}{\sum_p \rho(p) \Pr\{P = p \mid H_0\}} = \pi_0 + \pi_1 \cdot \frac{\sum_p \rho(p) \Pr\{P = p \mid H_1\}}{\sum_p \rho(p) \Pr\{P = p \mid H_0\}}. \quad (B.81)$$

Thus,

$$\lim_{m\to\infty} \hat\pi_0 \stackrel{a.s.}{=} \pi_0 + \pi_1 \cdot \frac{\sum_p \rho(p) \Pr\{P = p \mid H_1\}}{\sum_p \rho(p) \Pr\{P = p \mid H_0\}} \quad (B.82)$$
$$= \pi_0 + \pi_1 \cdot \frac{\mathbb{E}[\rho(p) \mid H_1]}{\mathbb{E}[\rho(p) \mid H_0]}.$$

Proof of Theorem 1.

$$\lim_{m\to\infty} \widehat{\mathrm{pFDR}}(\alpha) = \lim_{m\to\infty} \frac{\hat\pi_0 \cdot \widehat{\Pr}\{P \le \alpha \mid H_0\}}{\widehat{\Pr}\{P \le \alpha\} \cdot \widehat{\Pr}\{R(\alpha) > 0\}} \quad (B.83)$$
$$= \frac{\lim_{m\to\infty} \hat\pi_0 \cdot \lim_{m\to\infty} \widehat{\Pr}\{P \le \alpha \mid H_0\}}{\lim_{m\to\infty} \widehat{\Pr}\{P \le \alpha\} \cdot \lim_{m\to\infty} \widehat{\Pr}\{R(\alpha) > 0\}}. \quad (B.84)$$

By the strong law of large numbers, Lemma 1 implies that $\widehat{\Pr}\{P \le \alpha \mid H_0\}$ converges almost surely to $\Pr\{P \le \alpha \mid H_0\}$, Equation (B.16) implies that $\widehat{\Pr}\{P \le \alpha\}$ converges almost surely to $\Pr\{P \le \alpha\}$, and $\widehat{\Pr}\{R(\alpha) > 0\}$ converges almost surely to 1. Thus,

$$\lim_{m\to\infty} \widehat{\mathrm{pFDR}}(\alpha) \stackrel{a.s.}{=} \lim_{m\to\infty} \hat\pi_0 \cdot \frac{\Pr\{P \le \alpha \mid H_0\}}{\Pr\{P \le \alpha\}} \quad (B.85)$$
$$= \frac{\lim_{m\to\infty} \hat\pi_0}{\pi_0} \cdot \mathrm{pFDR}(\alpha). \quad (B.86)$$

Finally, it follows from Lemma 3 that

$$\lim_{m\to\infty} \widehat{\mathrm{pFDR}}(\alpha) \stackrel{a.s.}{=} \frac{\pi_0 + \pi_1 \frac{\mathbb{E}[\rho(p) \mid H_1]}{\mathbb{E}[\rho(p) \mid H_0]}}{\pi_0} \cdot \mathrm{pFDR}(\alpha). \quad (B.87)$$

Proof of Theorem 2. The proof follows analogously to that of Theorem 1 by noting that the present assumptions lead to

$$\lim_{m\to\infty} \widehat{\Pr}\{P \le \alpha\} \stackrel{a.s.}{=} \Pr\{P \le \alpha\}, \quad (B.88)$$
$$\lim_{m\to\infty} \widehat{\Pr}\{P \le \alpha \mid H_0\} \stackrel{a.s.}{\ge} \Pr\{P \le \alpha \mid H_0\}, \quad (B.89)$$
$$\lim_{m\to\infty} \hat\pi_0 \stackrel{a.s.}{\ge} \pi_0 + \pi_1 \cdot \frac{\mathbb{E}[\rho(p) \mid H_1]}{\mathbb{E}[\rho(p) \mid H_0]}. \quad (B.90)$$

We shall prove each of these statements in turn. Equation (B.88) follows immediately by noting that our estimate $\widehat{\Pr}\{P \le \alpha\} \triangleq \frac{R \vee 1}{m}$ is not affected by the distribution of $\theta$. Equation (B.89) can be seen by noting that we can no longer use the equality in Equation (B.49) and must instead use

$$\Pr\{P \le \alpha \mid H_0\} = \sum_{\theta'} \Pr\{P \le \alpha \mid H_0, \theta'\} \cdot \Pr\{\theta' \mid H_0\}. \quad (B.91)$$

Thus, we have

$$\lim_{m\to\infty} \widehat{\Pr}\{P \le \alpha \mid H_0\} \quad (B.92)$$
$$\stackrel{a.s.}{=} \sum_{\theta'} \Pr\{P \le \alpha \mid H_0, \theta'\} \cdot \Pr\{\theta'\} \quad (B.93)$$
$$= \sum_{\theta'} \Pr\{P \le \alpha \mid H_0, \theta'\} \cdot \left( \Pr\{\theta' \mid H_0\} \cdot \pi_0 + \Pr\{\theta' \mid H_1\} \cdot \pi_1 \right) \quad (B.94)$$
$$= \pi_0 \sum_{\theta'} \Pr\{P \le \alpha \mid H_0, \theta'\} \cdot \Pr\{\theta' \mid H_0\} + \pi_1 \sum_{\theta'} \Pr\{P \le \alpha \mid H_0, \theta'\} \cdot \Pr\{\theta' \mid H_1\} \quad (B.95)$$
$$\ge \pi_0 \sum_{\theta'} \Pr\{P \le \alpha \mid H_0, \theta'\} \cdot \Pr\{\theta' \mid H_0\} + \pi_1 \sum_{\theta'} \Pr\{P \le \alpha \mid H_0, \theta'\} \cdot \Pr\{\theta' \mid H_0\} \quad (B.96)$$
$$= \Pr\{P \le \alpha \mid H_0\}, \quad (B.97)$$

where the inequality follows from Assumption (B.33).

Finally, Inequality (B.90) follows from the fact that the added assumptions of Theorem 2 only affect the denominator of our $\pi_0$ estimate, Equation (B.26). Furthermore, Inequality (B.89) implies

$$\lim_{m\to\infty} \widehat{\Pr}\{P \ge \alpha \mid H_0\} \stackrel{a.s.}{\le} \Pr\{P \ge \alpha \mid H_0\}, \quad (B.98)$$

from which it follows that

$$\lim_{m\to\infty} \sum_p \rho(p) \cdot \widehat{\Pr}\{P = p \mid H_0\} \stackrel{a.s.}{\le} \sum_p \rho(p) \cdot \Pr\{P = p \mid H_0\} \quad (B.99)$$

for any non-decreasing function $\rho(p)$. Thus, it follows that

$$\lim_{m\to\infty} \hat\pi_0 \stackrel{a.s.}{\ge} \frac{\mathbb{E}[\rho(p)]}{\mathbb{E}[\rho(p) \mid H_0]} \quad (B.100)$$
$$= \pi_0 + \pi_1 \cdot \frac{\mathbb{E}[\rho(p) \mid H_1]}{\mathbb{E}[\rho(p) \mid H_0]}.$$

Lemma 4. Under the assumptions of Theorem 3, if

$$\frac{\Pr\{P \le \alpha \mid H_1\}}{\Pr\{P \le \alpha \mid H_0\}} \quad (B.101)$$

is non-increasing in $\alpha$, then

$$\lim_{m\to\infty} \widehat{\mathrm{pFDR}}^*(\alpha) \stackrel{a.s.}{\le} \lim_{m\to\infty} \widehat{\mathrm{pFDR}}(\alpha). \quad (B.102)$$

Proof. Recall our large-sample estimate

$$\widehat{\mathrm{pFDR}}(\alpha) = \hat\pi_0 \cdot \frac{\widehat{\Pr}\{P \le \alpha \mid H_0\}}{\widehat{\Pr}\{P \le \alpha\}} \quad (B.103)$$
$$= \hat\pi_0 \cdot \frac{m \cdot \frac{1}{m} \sum_{i=1}^m \Pr\{P \le \alpha \mid H_0, \theta_i\}}{R(\alpha) \vee 1} \quad (B.104)$$
$$= \hat\pi_0 \cdot \frac{\sum_{i=1}^m \Pr\{P \le \alpha \mid H_0, \theta_i\}}{R(\alpha) \vee 1}. \quad (B.105)$$

Removing $n$ tests with $p^*(\theta) > \alpha$ will have no effect on $(R(\alpha) \vee 1)$ or on $\sum_{i=1}^m \Pr\{P \le \alpha \mid H_0, \theta_i\}$. We will show, however, that, under the present assumptions, our $\pi_0$ estimate under filtering will almost surely be lower than our $\pi_0$ estimate without filtering.

Let $p^+$ denote the event $p^*(\theta) > \alpha$ and $p^-$ denote the event $p^*(\theta) \le \alpha$. From Equation (B.81), we can write

$$\lim_{m\to\infty} \hat\pi_0 \stackrel{a.s.}{=} \pi_0 + \pi_1 \cdot \frac{\mathbb{E}[\rho(p) \mid H_1]}{\mathbb{E}[\rho(p) \mid H_0]} = \pi_0 + \pi_1 \cdot \frac{\mathbb{E}[\rho(p) \mid H_1, p^+] \Pr\{p^+\} + \mathbb{E}[\rho(p) \mid H_1, p^-] \Pr\{p^-\}}{\mathbb{E}[\rho(p) \mid H_0, p^+] \Pr\{p^+\} + \mathbb{E}[\rho(p) \mid H_0, p^-] \Pr\{p^-\}}. \quad (B.106)$$

Let

$$\hat\pi_0(\alpha) \triangleq \frac{\mathbb{E}[\rho(p) \mid p^-]}{\mathbb{E}[\rho(p) \mid H_0, p^-]} \quad (B.107)$$

be the estimated $\pi_0$ over $T^-_\alpha$. We wish to show that

$$\lim_{m\to\infty} \hat\pi_0(\alpha) \le \lim_{m\to\infty} \hat\pi_0(1), \quad (B.108)$$

which, by Equation (B.106), is true if and only if

$$\frac{\mathbb{E}[\rho(p) \mid H_1, p^+] \Pr\{p^+\} + \mathbb{E}[\rho(p) \mid H_1, p^-] \Pr\{p^-\}}{\mathbb{E}[\rho(p) \mid H_0, p^+] \Pr\{p^+\} + \mathbb{E}[\rho(p) \mid H_0, p^-] \Pr\{p^-\}} \ge \frac{\mathbb{E}[\rho(p) \mid H_1, p^-]}{\mathbb{E}[\rho(p) \mid H_0, p^-]}. \quad (B.109)$$

Thus, it follows that Equation (B.108) is true if and only if

$$\frac{\mathbb{E}[\rho(p) \mid H_1, p^+]}{\mathbb{E}[\rho(p) \mid H_0, p^+]} \ge \frac{\mathbb{E}[\rho(p) \mid H_1, p^-]}{\mathbb{E}[\rho(p) \mid H_0, p^-]}. \quad (B.110)$$

Now Assumption (B.44) implies that

$$\frac{\Pr\{P > \alpha \mid H_1, p^+\}}{\Pr\{P > \alpha \mid H_0, p^+\}} \ge \frac{\Pr\{P > \alpha \mid H_1, p^-\}}{\Pr\{P > \alpha \mid H_0, p^-\}}, \quad (B.111)$$

from which Inequality (B.110), and hence Lemma 4, follows from the constraint that $\rho(\cdot)$ is non-decreasing.

B.7 Discussion

The false discovery rate has proven to be an extremely useful tool when testing large numbers of hypotheses, as it allows the researcher to balance the number of significant results against an estimate of the proportion of those results that are truly null. Storey presented novel methods for estimating pFDR and q-values for general test statistics [208, 209]. He factored the pFDR computation into several components and suggested estimators for each component. Perhaps the most discussed component is $\pi_0$, the proportion of tests that are expected to be null over the entire data set. For example, Dalmasso and colleagues [42] derived a class of $\pi_0$ estimators for continuous distributions that take the same form as Equation (B.26) and explored properties of $\rho(\cdot)$. They proved that a certain class of convex $\rho(\cdot)$ functions yielded provably less biased $\pi_0$ estimators than $\rho(p) = p$.
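The $\rho$-parameterized estimator family of Equation (B.26) can be sketched generically. Here `null_pmfs` is an assumed input giving, for each test, the exact null distribution of its p-value over that test's marginals:

```python
def pi0_hat(p_values, null_pmfs, rho):
    """Generic pi0 estimator in the form of Equation (B.26):
    sum_i sum_p rho(p) 1{p_i = p}  over  sum_i sum_p rho(p) Pr{P=p|H0, theta_i}.

    null_pmfs[i] is a list of (p, prob) pairs giving the exact null
    p-value distribution for test i's marginals; rho is a non-negative
    weight function on p-values."""
    # The numerator's double sum collapses to sum_i rho(p_i).
    num = sum(rho(p) for p in p_values)
    den = sum(rho(p) * prob for pmf in null_pmfs for p, prob in pmf)
    return num / den
```

With `rho = lambda p: p` this yields the discrete analogue of the Pounds-Cheng estimator; an indicator weight recovers the Storey-style special case of Equation (B.64).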
Similarly, Genovese and Wasserman [78] explore several estimators under a mixture model framework that assumes a uniform continuous null distribution and provide estimates of confidence intervals, and Langaas and colleagues [126] use the mixture model to define $\pi_0$ estimators that perform particularly well under certain continuous convexity assumptions.

When the data are finite, however, some of the underlying assumptions used by the above methods, such as the uniform distribution of p-values under the null and the convexity and monotone distribution of p-values under the alternative, are violated [183]. In such cases, some of the methods developed for general statistics become overly conservative, and some may provide anti-conservative estimates. For example, the estimators of Dalmasso et al. [42] assume that the null distribution is non-increasing in $p$. As we have seen, contingency tables provide a common example where these assumptions are grossly violated, even when the number of observations in each table is quite high. In these cases, the use of marginal p-values leads to severe conservative bias in the FDR estimation.

Pounds and Cheng [183] addressed the conservative bias of FDR estimation on finite data by proposing a new $\pi_0$ estimator. This estimator avoids the extreme conservative bias of Storey's spline-fitting method on finite data, in which $\pi_0$ estimates at $\lambda = 1$ may have more bias rather than less. On our data sets, the method of Pounds and Cheng was comparable to Storey's estimator at $\lambda = 0.5$. A key assumption in the method of Pounds and Cheng is that the expected p-value under the null hypothesis is 0.5, which was grossly violated in all of our contingency table data sets. Replacing this assumption with the exact null distribution substantially decreased the bias in all our tests.
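The exact-null correction just described can be illustrated for Fisher's exact test, where the null p-value distribution for fixed margins follows from the hypergeometric distribution. This is a sketch under the two-sided ("no more likely") p-value convention; the function names are illustrative:

```python
from math import comb

def fisher_null_dist(r, c, n):
    """Exact null distribution of the two-sided Fisher exact test
    p-value for 2x2 tables with row margin r, column margin c, total n:
    under H0 the top-left count k is hypergeometric. Returns a list of
    (p_value, probability) pairs, one per attainable k."""
    lo, hi = max(0, r + c - n), min(r, c)
    probs = {k: comb(c, k) * comb(n - c, r - k) / comb(n, r)
             for k in range(lo, hi + 1)}
    # Two-sided p-value of outcome k: total probability of all outcomes
    # no more likely than k (with a small tolerance for float ties).
    pval = {k: sum(q for q in probs.values() if q <= probs[k] + 1e-12)
            for k in probs}
    return [(pval[k], probs[k]) for k in probs]

def pi0_pounds_cheng_exact(p_bar, null_dist):
    """Pounds-Cheng-style estimate with the uniform-null assumption
    E[P|H0] = 1/2 replaced by the exact null mean of the p-value."""
    e_p_null = sum(p * q for p, q in null_dist)
    return p_bar / e_p_null
```

For margins (2, 2, 4), for instance, the exact null mean of the p-value is $7/9$, well above the $1/2$ that the uniform-null assumption would use, so dividing by the exact mean rather than doubling $\bar p$ gives a visibly smaller (less conservatively biased) $\pi_0$ estimate.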
Our theoretical results indicate that the optimal $\rho(\cdot)$ is that which minimizes the ratio of the expected $\rho(\cdot)$ under the alternative hypothesis to the expected $\rho(\cdot)$ under the null hypothesis. Other $\rho(\cdot)$ functions than those described here may thus yield less biased estimates.

Several authors have proposed randomization testing as a means of dealing with non-uniform or unknown p-value distributions, with a focus on non-uniform continuous distributions (see [36] for a review). Focusing on Fisher's exact test allows us to implement exact permutation tests efficiently even for very large data sets, resulting in exact estimation of the pooled null distribution, a straightforward analysis of the convergence properties, and the removal of numerical error from the estimation.

Furthermore, the exact null distribution allows us to identify and remove tests that cannot be called significant, thereby increasing power. This approach was first proposed by Gilbert [79], who suggested choosing a p-value threshold $p_0$ and removing a priori all tests for which no permutation of the contingency table results in $p \le p_0$. To choose $p_0$, Gilbert suggested using a derivative of the Bonferroni-adjusted p-value. Unfortunately, it can be shown that this threshold is too aggressive and will often remove tests that should be considered significant. In contrast, choosing $p_0 = \alpha$ leaves the true pFDR unchanged while often achieving an increase in statistical power.

This paper provides estimators for the various components of the pFDR, based on a permutation testing approach. We combine several ideas that were previously suggested, adapting them to the important case of contingency tables. As we have shown above, our methods can rapidly provide tight estimates of pFDR and q-values for very large data sets. Although we have chosen to focus on Fisher's exact test, analogous results can be derived for any discrete test for which all permutations of the data can be efficiently computed.
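The filtering idea discussed above hinges on the minimum attainable p-value for a set of margins. A sketch of that computation and of the $p_0 = \alpha$ filter, again under the two-sided Fisher p-value convention (function names illustrative):

```python
from math import comb

def min_attainable_p(r, c, n):
    """Smallest two-sided Fisher exact p-value attainable by any 2x2
    table with row margin r, column margin c, and total n -- a sketch
    of the filtering threshold p*(theta)."""
    lo, hi = max(0, r + c - n), min(r, c)
    probs = [comb(c, k) * comb(n - c, r - k) / comb(n, r)
             for k in range(lo, hi + 1)]
    pmin = min(probs)
    # The most extreme outcome's p-value: the mass of all outcomes no
    # more likely than it (with a float-tie tolerance).
    return sum(q for q in probs if q <= pmin + 1e-12)

def filter_tests(margins, alpha):
    """Keep only tests whose margins can possibly achieve p <= alpha,
    i.e. the p0 = alpha choice that leaves the true pFDR unchanged."""
    return [(r, c, n) for (r, c, n) in margins
            if min_attainable_p(r, c, n) <= alpha]
```

For example, margins (2, 2, 4) can never yield a p-value below $1/3$, so at $\alpha = 0.05$ such a test is removed a priori, while margins (5, 5, 10) can reach roughly $0.008$ and are retained.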
VITA

Jonathan Carlson was born and raised in Beaverton, Oregon. In 2003, he graduated from Dartmouth, where he met a beautiful girl named Kate, pole vaulted, and batted cleanup for the Fighting Mullets. Although the Mullets made the intramural championships several times, they had a propensity for choking and never came away with a T-shirt. In 2004, Jonathan married Kate, finally said goodbye to Dartmouth, and moved back west, hoping to find a team that could come through in the clutch. In 2006, he signed with the Infrared Sox in the University of Washington co-rec league and with the Fleas in the men's league. He went on to win two T-shirts with the IR Sox, setting team records for home runs and slugging percentage, and one T-shirt with the Fleas. In 2009 he graduated with his Ph.D. in computer science and engineering from the University of Washington. He currently resides in Marina del Rey, California, is a researcher for the eScience group of Microsoft Research, and is a free agent.