Analytical Mechanics
−f(x)), if at least one of these integrals is finite. In this case $f$ is called $\mu$-summable. If $\int_X |f|\,d\mu < +\infty$ we say that $f$ is $\mu$-integrable. Let $A \in \mathcal{A}$; $f$ is said to be $\mu$-integrable on $A$ if the function $f\chi_A$ is $\mu$-integrable on $X$. We set
\[ \int_A f\,d\mu = \int_X f\chi_A\,d\mu. \tag{13.6} \]
The space of $\mu$-integrable functions on $X$ is denoted by $L^1(X, \mathcal{A}, \mu)$. Consider $0 < p < +\infty$. The space of functions $f$ such that $|f|^p$ is $\mu$-integrable on $X$ is denoted by $L^p(X, \mathcal{A}, \mu)$. A particular and well-known case is that of the functions on $\mathbb{R}^l$ which are Lebesgue integrable.

Remark 13.2 Assume that $X$ is a compact metric space and $\mathcal{A}$ is the $\sigma$-algebra of Borel sets of $X$. In this case one can define the support of a measure $\mu$ as the smallest compact set $K \subset X$ such that $\mu(A) = 0$ for all $A \subset X \setminus K$. Moreover, it is possible to endow $M(X)$ with a topological structure by defining at every point $\mu \in M(X)$ a basis of neighbourhoods
\[ V_{\varphi,\varepsilon}(\mu) := \left\{ \nu \in M(X) \;\Big|\; \left| \int_X \varphi\,d\nu - \int_X \varphi\,d\mu \right| \le \varepsilon \right\}, \tag{13.7} \]
where $\varepsilon > 0$ and $\varphi : X \to \mathbb{R}$ is continuous. In this topology a sequence of measures $(\mu_n)_{n \in \mathbb{N}} \subseteq M(X)$ converges to $\mu \in M(X)$ if for every continuous $\varphi : X \to \mathbb{R}$ we have
\[ \int_X \varphi\,d\mu_n \to \int_X \varphi\,d\mu. \tag{13.8} \]
Remark 13.3 In what follows we always assume that $X$ is totally $\sigma$-finite, and hence that $X = \bigcup_{i \in \mathbb{N}} A_i$, where the sets $A_i \in \mathcal{A}$ have measure $\mu(A_i) < +\infty$ for every $i \in \mathbb{N}$.

Definition 13.8 Let $X$ be a set, and $\mathcal{A}$ be a $\sigma$-algebra of subsets of $X$. If $\mu, \nu : \mathcal{A} \to [0, +\infty]$ are two measures, we say that $\mu$ is absolutely continuous with respect to $\nu$ if for every $A \in \mathcal{A}$ such that $\nu(A) = 0$ we have $\mu(A) = 0$. If $\mu$ is not absolutely continuous with respect to $\nu$ then it is said to be singular (with respect to $\nu$).

An important characterisation of measures which are absolutely continuous with respect to another measure is given by the following theorem. The proof, which goes beyond the scope of this limited introduction, can be found in the book of Rudin, already cited.

Theorem 13.1 (Radon–Nikodym) A measure $\mu : \mathcal{A} \to [0, +\infty]$ is absolutely continuous with respect to another measure $\nu : \mathcal{A} \to [0, +\infty]$ if and only if there exists a function $\rho : X \to \mathbb{R}$, integrable with respect to $\nu$ on every subset $A \in \mathcal{A}$ such that $\nu(A) < +\infty$, and such that for every $A \in \mathcal{A}$ we have
\[ \mu(A) = \int_A \rho\,d\nu. \tag{13.9} \]
The function $\rho$ is unique (if we identify any two functions which only differ on a set of $\nu$-measure zero), and it is called the Radon–Nikodym derivative of $\mu$ with respect to $\nu$, or density of $\mu$ with respect to $\nu$, and it is denoted by $\rho = d\mu/d\nu$.

Remark 13.4 We have that $g \in L^1(X, \mathcal{A}, \mu)$ if and only if $g\rho \in L^1(X, \mathcal{A}, \nu)$. In this case
\[ \int_X g\,d\mu = \int_X g\rho\,d\nu. \tag{13.10} \]

13.3
Measurable dynamical systems

The objects of study of ergodic theory are the dynamical systems that preserve a measure, in a sense that we now make precise.

Definition 13.9 Let $(X, \mathcal{A}, \mu)$ be a measure space. A transformation $S : X \to X$ is said to be measurable if for every $A \in \mathcal{A}$ we have $S^{-1}(A) \in \mathcal{A}$. A measurable transformation is non-singular if $\mu(S^{-1}(A)) = 0$ for all $A \in \mathcal{A}$ such that $\mu(A) = 0$.

Here $S^{-1}(A) = \{x \in X \mid S(x) \in A\}$ denotes the preimage of $A$, and $S$ is not necessarily invertible. For example, if $X$ is a topological space, $\mathcal{A}$ is the $\sigma$-algebra of Borel sets and $S$ is a homeomorphism, then $S$ is measurable and non-singular if and only if the inverse map is measurable and non-singular.

Definition 13.10 Let $(X, \mathcal{A}, \mu)$ be a measure space. A measurable non-singular transformation $S : X \to X$ preserves the measure (i.e. the measure $\mu$ is invariant with respect to the transformation $S$) if for every $A \in \mathcal{A}$ we have $\mu(S^{-1}(A)) = \mu(A)$.

If $S$ is invertible with a measurable non-singular inverse and if it preserves the measure, then clearly $\mu(S^{-1}(A)) = \mu(A) = \mu(S(A))$, $\forall\, A \in \mathcal{A}$. If however $S$ is not invertible, the following simple example highlights the need to use the condition $\mu(S^{-1}(A)) = \mu(A)$ in the previous definition. Choose $X = (0, 1)$ and the $\sigma$-algebra $\mathcal{A}$ of Borel sets on $(0, 1)$; the transformation $S(x) = 2x \pmod 1$ preserves the Lebesgue measure $\lambda$, while if we take an interval $(a, b) \subset (0, 1)$ then $\lambda(S((a, b))) = 2\lambda((a, b))$.

Remark 13.5 Let $f$ be $\mu$-integrable and assume that $S$ preserves the measure $\mu$. Then $\int_X f(x)\,d\mu = \int_X f(S(x))\,d\mu$. Conversely, if this property holds for every continuous $f : X \to \mathbb{R}$, then $S$ preserves the measure $\mu$.

Definition 13.11 A measurable dynamical system $(X, \mathcal{A}, \mu, S)$ is constituted by a probability space $(X, \mathcal{A}, \mu)$ and by a transformation $S : X \to X$ which preserves the measure $\mu$. The orbit of a point $x \in X$ is the infinite sequence of points $x, S(x), S^2(x) = S(S(x)), \ldots, S^{n+1}(x) = S(S^n(x)), \ldots$ obtained by iterating $S$.

Remark 13.6 The recurrence theorem of Poincaré (Theorem 8.4) can be extended without difficulty to the case of measurable dynamical systems $(X, \mathcal{A}, \mu, S)$. We state it, and leave the proof as an exercise: for every $A \in \mathcal{A}$ the subset $A_0$ of all points $x \in A$ such that $S^n(x) \in A$ for infinitely many values of $n \in \mathbb{N}$ belongs to $\mathcal{A}$ and $\mu(A) = \mu(A_0)$.
e is presented in Problem 13.15.

A particularly interesting case arises when $X$ is a subset of $\mathbb{R}^l$ (or, more generally, of a differentiable Riemannian manifold $M$ of dimension $l$) and $\mu_\rho$ is a probability measure which is absolutely continuous with respect to the Lebesgue measure
\[ d\mu_\rho(x) = \rho(x)\,dx \tag{13.11} \]
(or $d\mu_\rho = \rho\,dV_g$, where $dV_g = \sqrt{\det(g_{ij})}\,d^l x$ is the volume element associated with the metric $g$ on the manifold $M$). The definition of a measure that is invariant with respect to a non-singular transformation $S : X \to X$ is therefore equivalent to
\[ \int_{S^{-1}(A)} \rho(x)\,dx = \int_A \rho(x)\,dx, \tag{13.12} \]
for every $A \in \mathcal{A}$.

A very important problem in ergodic theory is that of determining all measures that are invariant for a given transformation. One case in which this is possible is given by the following systems. Let $X = [0, 1]$, and let $S : X \to X$ be non-singular. Assume that $S$ is piecewise monotone and of class $C^1$, i.e. that there exists a decomposition of the interval $[0, 1]$ into intervals $[a_i, a_{i+1}]$, $i \in I$, on which $S$ is monotone (and $C^1$), so that the inverse $S_i^{-1}$ of $S$ is well defined. Let $A = [0, x]$. Equation (13.12) becomes
\[ \int_0^x \rho(s)\,ds = \sum_{i \in I} \int_{S_i^{-1}([0,x])} \rho(s)\,ds, \]
from which, by differentiating with respect to $x$, we obtain
\[ \rho(x) = \sum_{i \in I_x} \frac{\rho(S_i^{-1}(x))}{|S'(S_i^{-1}(x))|}, \tag{13.13} \]
where $I_x$ indicates the subset of $I$ corresponding to the indices $i$ such that $S_i^{-1}(x) \neq \emptyset$. Equation (13.13) is therefore a necessary and sufficient condition for the density $\rho$ of a measure that is absolutely continuous with respect to the Lebesgue measure to be invariant with respect to $S$.

Example 13.4 (Ulam and von Neumann 1947) Consider $X = [0, 1]$, $S(x) = 4x(1 - x)$. The probability measure $d\mu(x) = dx/\bigl(\pi\sqrt{x(1 - x)}\bigr)$ is invariant. Indeed, to every point $x \in X$ there correspond two preimages $S_1^{-1}(x) = \tfrac12(1 - \sqrt{1 - x}) \in [0, 1/2]$ and $S_2^{-1}(x) = \tfrac12(1 + \sqrt{1 - x}) \in [1/2, 1]$. Therefore, equation (13.13) becomes
\[ \frac{1}{\sqrt{x(1 - x)}} = \frac{1}{\sqrt{S_1^{-1}(x)(1 - S_1^{-1}(x))}\;|4 - 8S_1^{-1}(x)|} + \frac{1}{\sqrt{S_2^{-1}(x)(1 - S_2^{-1}(x))}\;|4 - 8S_2^{-1}(x)|}, \]
which is immediately verified.

Example 13.5: the p-adic transformation Consider $X = [0, 1]$, $p \in \mathbb{N}$, and $S(x) = px \pmod 1$, and hence $S(x) = px - m$ if $m/p \le x < (m + 1)/p$, $m = 0, \ldots, p - 1$, $S(1) = 1$. The p-adic transformation preserves the Lebesgue measure.

Example 13.6: the Gauss transformation Consider $X = [0, 1)$, $S(x) = 1/x - [1/x]$ if $x \neq 0$, $S(0) = 0$, where $[\,\cdot\,]$ denotes the integer part of a number. The probability measure $d\mu(x) = dx/\bigl((1 + x)\log 2\bigr)$ is invariant. Indeed, $S$ is invertible on the intervals $[1/(n + 1), 1/n]$, $n \in \mathbb{N}$, with inverse $S_n^{-1}(x) = 1/(n + x)$, and
\[ \sum_{n=1}^{\infty} \frac{1}{1 + S_n^{-1}(x)}\,\frac{1}{|S'(S_n^{-1}(x))|} = \sum_{n=1}^{\infty} \frac{1}{1 + 1/(n + x)}\,\frac{1}{(n + x)^2} = \sum_{n=1}^{\infty} \frac{1}{(n + x + 1)(n + x)} = \sum_{n=1}^{\infty} \left( \frac{1}{n + x} - \frac{1}{n + x + 1} \right) = \frac{1}{x + 1}. \]

Example 13.7: the 'baker's transformation' If $X = [0, 1] \times [0, 1]$, then
\[ S(x, y) = \begin{cases} \left(2x, \tfrac12 y\right), & \text{if } 0 \le x < \tfrac12, \\ \left(2x - 1, \tfrac12 y + \tfrac12\right), & \text{if } \tfrac12 \le x \le 1 \end{cases} \]
(Fig. 13.1) preserves the Lebesgue measure. From a geometrical point of view, $S$ transforms the square $[0, 1] \times [0, 1]$ into the rectangle $[0, 2] \times [0, \tfrac12]$, then cuts off the right half of this rectangle and translates it on top of the left half.

Example 13.8 Let $X$ be a Riemannian manifold, and $S : X \to X$ be an isometry. The measure $dV_g$ is invariant with respect to $S$.
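Condition (13.13) lends itself to a quick numerical check. The following is a minimal sketch, not part of the original text: it evaluates the right-hand side of (13.13) for the densities of Examples 13.4 and 13.6 and compares it with the density itself. The function names and the truncation parameter `terms` (the infinite sum over the branches of the Gauss transformation is cut off) are assumptions made for this illustration.

```python
import math

def logistic_density(x):
    """Density of Example 13.4: 1 / (pi * sqrt(x (1 - x)))."""
    return 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))

def logistic_transfer(rho, x):
    """Right-hand side of (13.13) for S(x) = 4x(1 - x)."""
    root = math.sqrt(1.0 - x)
    total = 0.0
    for y in (0.5 * (1.0 - root), 0.5 * (1.0 + root)):  # the two preimages of x
        total += rho(y) / abs(4.0 - 8.0 * y)             # |S'(y)| = |4 - 8y|
    return total

def gauss_density(x):
    """Density of Example 13.6: 1 / ((1 + x) log 2)."""
    return 1.0 / ((1.0 + x) * math.log(2.0))

def gauss_transfer(rho, x, terms=100000):
    """Right-hand side of (13.13) for the Gauss transformation, truncated
    after `terms` branches (hypothetical cut-off chosen for this sketch)."""
    total = 0.0
    for n in range(1, terms + 1):
        y = 1.0 / (n + x)        # preimage S_n^{-1}(x)
        total += rho(y) * y * y  # 1 / |S'(y)| = y^2 since S'(y) = -1 / y^2
    return total

for x in (0.1, 0.5, 0.9):
    print(x, logistic_transfer(logistic_density, x) - logistic_density(x),
          gauss_transfer(gauss_density, x) - gauss_density(x))
```

The printed differences should be close to zero; for the Gauss transformation the residual is dominated by the neglected tail of the series, which is of order $1/\texttt{terms}$.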
Example 13.9 Let $X = \mathbb{R}^{2l}$, and $S : X \to X$ be a completely canonical transformation. By the Liouville theorem, the Lebesgue measure is invariant with respect to $S$.

Example 13.10: 'Arnol'd's cat' (Arnol'd and Avez 1967) If $X = \mathbb{T}^2$, then $S(x_1, x_2) = (x_1 + x_2, x_1 + 2x_2) \pmod 1$ preserves the Lebesgue measure.

Assume that $X$ is a Riemannian manifold and that $S : X \to X$ is a diffeomorphism of $X$ such that, $\forall\, x \in X$, $|\det(DS(x))| < 1$. Then it can be shown that there exists an attractor $\Omega \subset X$ and a basin of attraction $U$, i.e. a neighbourhood of $\Omega$ such that $S(U) \subset U$ and $\bigcap_{n \ge 0} S^n(U) = \Omega$. In addition, the volume of $\Omega$ (with respect to the volume form induced by the Riemannian structure on $X$) vanishes and all the probability measures that are invariant for $S$ have support contained in the attractor $\Omega$.
$\Omega = \{0\}$, $U = \mathbb{R}$, where the only invariant measure is the Dirac measure $\delta_0(x)$ at the point $x = 0$:
\[ \delta_0(A) = \begin{cases} 0, & \text{if } 0 \notin A, \\ 1, & \text{if } 0 \in A. \end{cases} \]

[Fig. 13.1]

In general
\[ \delta_y(A) = \begin{cases} 0, & \text{if } y \notin A, \\ 1, & \text{if } y \in A. \end{cases} \]
Another example is given by $X = \mathbb{R}^2$, where $S(x, y)$ is the flow at time $t = 1$ of the following system of ordinary differential equations:
\[ \dot x = x(1 - x^2 - y^2) - y, \qquad \dot y = y(1 - x^2 - y^2) + x. \]
Introducing polar coordinates $x = r\cos\theta$, $y = r\sin\theta$ it is immediate to verify that $\dot r = r(1 - r^2)$, $\dot\theta = 1$, and therefore the circle $r = 1$ is an attractor. Note that $\dot r > 0$ if $0 < r < 1$, while $\dot r < 0$ if $1 < r < +\infty$. In this case $\Omega = S^1$, $U = \mathbb{R}^2 \setminus \{0\}$ and the invariant measure is $d\mu(r, \theta) = \delta_1(r)\,d\theta/2\pi$. The support of this measure is precisely the limit cycle $S^1 = \{r = 1\}$.

13.4
Ergodicity and frequency of visits

Consider a measurable dynamical system $(X, \mathcal{A}, \mu, S)$. The first fundamental notion associated with such a system is its 'statistics', which is the frequency with which the orbit $\{S^j x\}_{j \in \mathbb{N}}$ of a point $x \in X$ visits a prescribed measurable set $A \in \mathcal{A}$. To this end, we define for every $n \in \mathbb{N}$ the number of visits $T(x, A, n)$ of $A$ by the orbit of $x$:
\[ T(x, A, n) := \sum_{j=0}^{n-1} \chi_A(S^j x), \tag{13.14} \]
where $\chi_A$ indicates the characteristic function of the set $A$.

Definition 13.12 We call the frequency of visits $\nu(x, A)$ of the set $A$ by the orbit of $x$ the limit (when it exists)
\[ \nu(x, A) = \lim_{n \to \infty} \frac{1}{n}\,T(x, A, n). \tag{13.15} \]

The first important result for the study of the orbit statistics of a dynamical system is the existence of the frequency of visits for $\mu$-almost every initial condition.

Theorem 13.2 Let $(X, \mathcal{A}, \mu, S)$ be a measurable dynamical system and let $A \in \mathcal{A}$. For $\mu$-almost every $x \in X$ the frequency of visits $\nu(x, A)$ exists.
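Proof Let $n \in \mathbb{N}$ be fixed, and set
\[ \nu(x, A, n) := \frac{1}{n}\,T(x, A, n) \]
(average frequency after $n$ steps), and
\[ \bar\nu(x, A) := \limsup_{n \to +\infty} \nu(x, A, n). \]
Obviously $0 \le \bar\nu(x, A) \le 1$ and $\forall\, k \in \mathbb{N}$ we have $\bar\nu(S^k x, A) = \bar\nu(x, A)$. Analogous properties hold for $\underline\nu(x, A) := \liminf_{n \to +\infty} \nu(x, A, n)$. We want to prove that for $\mu$-almost every $x \in X$ we have $\bar\nu(x, A) = \underline\nu(x, A)$. To this end, introduce $\varepsilon > 0$ and the function
\[ \tau_A(x, \varepsilon) := \min\{n \in \mathbb{N} \text{ such that } \nu(x, A, n) \ge \bar\nu(x, A) - \varepsilon\}, \]
so that $\nu(x, A, \tau_A(x, \varepsilon)) \ge \bar\nu(x, A) - \varepsilon$.

Suppose that there exists $M > 0$ such that $\tau_A(x, \varepsilon) \le M$ for every $x \in X$. In this case we can decompose the orbit of $x$ up to time $n$, i.e. the finite sequence $(S^j x)_{0 \le j < n}$, into parts on each of which the average frequency of visits of $A$ is at least $\bar\nu(x, A) - \varepsilon$. Indeed, consider the points $x_0 = x$, $x_1 = S^{\tau_A(x_0, \varepsilon)} x_0$, $x_2 = S^{\tau_A(x_1, \varepsilon)} x_1 = S^{\tau_A(x_1, \varepsilon) + \tau_A(x_0, \varepsilon)} x_0$, and so on. We then have
\[ T(x_j, A, \tau_A(x_j, \varepsilon)) = \tau_A(x_j, \varepsilon)\,\nu(x_j, A, \tau_A(x_j, \varepsilon)) \ge \tau_A(x_j, \varepsilon)\,(\bar\nu(x_j, A) - \varepsilon) = \tau_A(x_j, \varepsilon)\,(\bar\nu(x, A) - \varepsilon), \]
where we have used the previous remark $\bar\nu(S^k x, A) = \bar\nu(x, A)$ $\forall\, k \in \mathbb{N}$. For fixed $n > M$, proceed in this way until a point $x_J = S^{\tau_J} x$, with $\tau_J = \sum_{j=0}^{J-1} \tau_A(x_j, \varepsilon) < n$ and $\tau_J + \tau_A(x_J, \varepsilon) \ge n$, is reached. We then have
\[ T(x, A, n) = \sum_{j=0}^{J-1} T(x_j, A, \tau_A(x_j, \varepsilon)) + T(x_J, A, n - \tau_J) \ge \left( \sum_{j=0}^{J-1} \tau_A(x_j, \varepsilon) \right) (\bar\nu(x, A) - \varepsilon) = \tau_J\,(\bar\nu(x, A) - \varepsilon). \]
On the other hand, $\tau_J \ge n - \tau_A(x_J, \varepsilon) \ge n - M$, and hence we find that $\forall\, n > M$,
\[ T(x, A, n) \ge (n - M)(\bar\nu(x, A) - \varepsilon). \]
Integrating this inequality over the whole set $X$, since
\[ \int_X T(x, A, n)\,d\mu = \sum_{j=0}^{n-1} \int_X \chi_A(S^j x)\,d\mu = \sum_{j=0}^{n-1} \int_X \chi_A(x)\,d\mu = n\mu(A), \]
we find
\[ n\mu(A) \ge (n - M) \int_X [\bar\nu(x, A) - \varepsilon]\,d\mu. \]
Dividing by $n$ and letting $n \to \infty$, we find that $\forall\, \varepsilon > 0$,
\[ \mu(A) \ge \int_X \bar\nu(x, A)\,d\mu - \varepsilon, \quad \text{and hence} \quad \mu(A) \ge \int_X \bar\nu(x, A)\,d\mu. \]
It is now possible to repeat this procedure considering $\underline\nu(x, A)$ in place of $\bar\nu(x, A)$, defining $\underline\tau_A(x, \varepsilon) := \min\{n \in \mathbb{N} \text{ such that } \nu(x, A, n) \le \underline\nu(x, A) + \varepsilon\}$ and supposing that this is also bounded, to arrive at the conclusion that
\[ \int_X \underline\nu(x, A)\,d\mu \ge \mu(A) \ge \int_X \bar\nu(x, A)\,d\mu. \]
From this, taking into account that $\bar\nu$ and $\underline\nu$ are non-negative and that $\underline\nu(x, A) \le \bar\nu(x, A)$ by definition, it follows that $\underline\nu(x, A) \ge \bar\nu(x, A)$ $\mu$-almost everywhere, and therefore that $\nu(x, A)$ exists $\mu$-almost everywhere.

We still need to consider the case that $\tau_A$ (or $\underline\tau_A$) is not a bounded function. In this case, remembering the definition, we choose $M > 0$ sufficiently large, so that we have
\[ \mu(\{x \in X \mid \tau_A(x, \varepsilon) > M\}) < \varepsilon. \]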
From this choice it follows that, setting $A' = A \cup \{x \in X \mid \tau_A(x, \varepsilon) > M\}$, we have $\mu(A') \le \mu(A) + \varepsilon$. Considering now the number of visits $T(x, A', n)$ relative to $A'$, and setting $\bar\nu(x, A')$ as before, the function $\tau_{A'}(x, \varepsilon)$ relative to $A'$ is now bounded. Hence proceeding as above, we arrive at the inequality
\[ \mu(A') > \int_X \bar\nu(x, A')\,d\mu - \varepsilon, \]
from which, taking into account that $\mu(A') \le \mu(A) + \varepsilon$ and that $\bar\nu(x, A') \ge \bar\nu(x, A)$, we deduce
\[ \mu(A) \ge \int_X \bar\nu(x, A)\,d\mu - 2\varepsilon. \]
Since $\varepsilon$ is arbitrary, $\mu(A) \ge \int_X \bar\nu(x, A)\,d\mu$, exactly as in the case that $\tau_A$ is bounded.

The frequency of visits describes the 'statistics' of an orbit and can depend essentially upon it.

Example 13.11: the square billiard Consider a point particle of unit mass moving freely in a square of side $2\pi$, and reflected elastically by the walls. To study the motion, instead of reflecting the trajectory when it meets a wall, we can reflect the square with respect to the wall and consider the motion as undisturbed (note that this argument shows how to extend the motion of the particle to the case in which the trajectory meets one of the vertices of the square). In this way each trajectory of the billiard corresponds to a geodesic of the flat torus (recall the results of Sections I.7 and I.8), and hence we can apply the results of Section 11.7. In particular we find that if $\alpha$ denotes the (constant) angle of incidence of the trajectory on a wall, then the trajectory is periodic if $\tan\alpha$ is a rational number, and it is dense on the torus if $\tan\alpha$ is irrational.

Given any measurable subset $A$ of the torus $\mathbb{T}^2$ it is evident that the frequency of visits of $A$ by one of the billiard's orbits depends essentially on the initial condition $(s, \alpha)$. However, it is possible to compute it exactly thanks to Theorem 11.9 and the following theorem of Lusin (see Rudin 1974, pp. 69–70). There exists a sequence of continuous functions $\chi_j : \mathbb{T}^2 \to [0, 1]$ such that $\chi_j \to \chi_A$ for $j \to \infty$ almost everywhere.

Applying Theorem 11.9 to the sequence $\chi_j$ we find that if $\tan\alpha$ is irrational, we have
\[ \nu(s, \alpha, A) = \lim_{T \to \infty} \frac{1}{T} \int_0^T \chi_A(s + t\cos\alpha, t\sin\alpha)\,dt = \lim_{j \to \infty} \lim_{T \to \infty} \frac{1}{T} \int_0^T \chi_j(s + t\cos\alpha, t\sin\alpha)\,dt = \lim_{j \to \infty} \frac{1}{(2\pi)^2} \int_0^{2\pi} dx_1 \int_0^{2\pi} dx_2\, \chi_j(x_1, x_2) = \mu(A), \]
where $\mu$ denotes $1/(2\pi)^2$ multiplied by the Lebesgue measure on $\mathbb{T}^2$.

Therefore, for almost every initial condition, the frequency of visits of a measurable set $A$ by the corresponding trajectory of the billiard in the square is simply equal to the measure of $A$, and is hence independent of the initial condition. What we have just discussed is a first example of an ergodic system.

Definition 13.13 A measurable dynamical system $(X, \mathcal{A}, \mu, S)$ is called ergodic if for every choice of $A \in \mathcal{A}$ it holds that $\nu(x, A) = \mu(A)$ for $\mu$-almost every $x \in X$.
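The frequency of visits (13.14), (13.15) is easy to estimate numerically. The sketch below is an illustration under stated assumptions, not part of the original text: it uses rotations of the circle $[0, 1)$, taking the golden-mean rotation number `ALPHA` as an example of an irrational rotation and $1/4$ as a rational one, with the set $A = [0.2, 0.5)$ of Lebesgue measure $0.3$. All names are hypothetical.

```python
import math

def visit_frequency(step, x0, indicator, n):
    """Average frequency nu(x, A, n) = T(x, A, n) / n, cf. (13.14)-(13.15)."""
    x, visits = x0, 0
    for _ in range(n):
        visits += indicator(x)   # chi_A(S^j x)
        x = step(x)
    return visits / n

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0   # assumed irrational rotation number

def irrational_rotation(x):
    return (x + ALPHA) % 1.0

def quarter_rotation(x):
    return (x + 0.25) % 1.0            # rational rotation: every orbit has period 4

def in_A(x):
    return 1 if 0.2 <= x < 0.5 else 0  # A = [0.2, 0.5), Lebesgue measure 0.3

for x0 in (0.0, 0.1, 0.22):
    print(x0,
          visit_frequency(irrational_rotation, x0, in_A, 20000),
          visit_frequency(quarter_rotation, x0, in_A, 20000))
```

For the irrational rotation the estimate should be close to $0.3$ whatever the starting point, in line with Definition 13.13, whereas for the rational rotation it depends on the starting point, since each orbit consists of only four points.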
We now turn to the study of ergodic systems and their properties. We start with a remark: consider $A \in \mathcal{A}$ and let $\chi_A$ be its characteristic function. Since $\mu(A) = \int_X \chi_A(x)\,d\mu$, ergodicity is equivalent to the statement that $\forall\, A \in \mathcal{A}$ and for $\mu$-almost every $x \in X$ one has
\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_A(S^j x) = \int_X \chi_A(x)\,d\mu. \tag{13.16} \]
If instead of the characteristic function of a set we consider arbitrary integrable functions $f \in L^1(X, \mathcal{A}, \mu)$, the following corresponding generalisation of Theorem 13.2 is called Birkhoff's theorem.

Theorem 13.3 Let $(X, \mathcal{A}, \mu, S)$ be a measurable dynamical system, and let $f \in L^1(X, \mathcal{A}, \mu)$. For $\mu$-almost every $x \in X$ the limit
\[ \hat f(x) := \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(S^j x) \tag{13.17} \]
exists and it is called the time average of $f$ along the orbit of the point $x \in X$.

For the proof of this theorem see Gallavotti (1981) and Cornfeld et al. (1982). We remark however that from the proof of Theorem 13.2 it follows in fact that the time average exists whenever $f$ is a finite linear combination of characteristic functions of measurable sets (hence whenever $f$ is a simple function):
\[ f = \sum_{j=1}^{m} a_j \chi_{A_j}, \quad a_j \in \mathbb{R},\ A_j \in \mathcal{A},\ \forall\, j = 1, \ldots, m. \tag{13.18} \]
Recall that every function $f \in L^1(X, \mathcal{A}, \mu)$ is the limit a.e. of a sequence of simple functions.

Remark 13.7 It is obvious that $\hat f(Sx) = \hat f(x)$, and hence that the time average depends on the orbit and not on the initial point chosen along the orbit. In addition, since $\mu$ is $S$-invariant, by Remark 13.5 we have $\int_X f(x)\,d\mu = \int_X f(Sx)\,d\mu$, from which it follows, by an application of the theorem of Lebesgue on dominated convergence to (13.17):
\[ \langle f \rangle_\mu = \int_X f(x)\,d\mu = \int_X \hat f(x)\,d\mu. \tag{13.19} \]
The quantity $\langle f \rangle_\mu = \int_X f\,d\mu$ is called the phase average of $f$ (or expectation of $f$) and equation (13.19) implies that $f$ and its time average $\hat f$ have the same expectation value.

An important consequence of the ergodicity of a dynamical system is that the phase and time averages are equal almost everywhere, as the following theorem shows (see property 4).

Theorem 13.4 Let $(X, \mathcal{A}, \mu, S)$ be a measurable dynamical system. The following properties are equivalent.
(1) The system is ergodic.
(2) The system is metrically indecomposable: every invariant set $A \in \mathcal{A}$ (i.e. every set such that $S^{-1}(A) = A$) has measure $\mu(A)$ either zero or equal to $\mu(X) = 1$.
(3) If $f \in L^1(X, \mathcal{A}, \mu)$ is invariant (i.e. $f \circ S = f$ $\mu$-almost everywhere) then $f$ is constant $\mu$-almost everywhere.
(4) If $f \in L^1(X, \mathcal{A}, \mu)$ then $\langle f \rangle_\mu = \hat f(x)$ for $\mu$-almost every $x \in X$.
(5) $\forall\, A, B \in \mathcal{A}$,
\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \mu(S^{-j}(A) \cap B) = \mu(A)\mu(B). \tag{13.20} \]
Proof
(1) ⇒ (2) Suppose that there exists an invariant set $A \in \mathcal{A}$ with measure $\mu(A) > 0$. Since $A$ is invariant, for every choice of $x \in A$ the frequency of visits of $A$ is precisely $\nu(x, A) = 1$. But since the system is ergodic, for $\mu$-almost every $x$ we have $\nu(x, A) = \mu(A)$. The hypothesis that $\mu(A) > 0$ then yields $\mu(A) = 1$.

(2) ⇒ (3) If $f \in L^1(X, \mathcal{A}, \mu)$ is invariant, for every choice of $\gamma \in \mathbb{R}$ the set $A_\gamma = \{x \in X \mid f(x) \le \gamma\}$ is invariant. Since the system is metrically indecomposable it follows that either $\mu(A_\gamma) = 0$ or $\mu(A_\gamma) = 1$. On the other hand, if $\gamma_1 < \gamma_2$ clearly $A_{\gamma_1} \subset A_{\gamma_2}$. Therefore setting $\gamma_f = \inf\{\gamma \in \mathbb{R} \mid \mu(A_\gamma) = 1\}$ it follows that $f(x) = \gamma_f$ for $\mu$-almost every $x$.

(3) ⇒ (4) Since the time average $\hat f$ is invariant we have that $\hat f$ is constant $\mu$-almost everywhere. From equation (13.19) it then follows that $\hat f(x) = \langle f \rangle_\mu$ for $\mu$-almost every $x \in X$.
(4) ⇒ (1) It suffices to apply hypothesis (4) to the characteristic function $\chi_A$ of the set $A$.

(4) ⇒ (5) Let $f = \chi_A$. For $\mu$-a.e. $x \in X$ we have
\[ \mu(A) = \int_X \chi_A\,d\mu = \hat\chi_A(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_A(S^j(x)). \]
By the dominated convergence theorem, we have
\[ \mu(A)\mu(B) = \int_X \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_A(S^j(x))\chi_B(x)\,d\mu = \lim_{n \to \infty} \frac{1}{n} \int_X \sum_{j=0}^{n-1} \chi_A(S^j(x))\chi_B(x)\,d\mu = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \mu(S^{-j}(A) \cap B). \]

(5) ⇒ (2) Let $A$ be invariant. Setting $B = A^c$ we have by (5) that
\[ \mu(A)\mu(A^c) = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \mu(S^{-j}(A) \cap A^c) = 0 \]
because of the invariance of $A$. Hence $\mu(A) = 0$ or $\mu(A^c) = 0$.

In general a dynamical system has more than just one invariant measure. For example if it has a periodic orbit $\{x_i\}_{i=1}^{n}$, $x_{i+1} = S(x_i)$ for $i = 1, \ldots, n - 1$, $x_1 = S(x_n)$, the measure
\[ \mu(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}(x) \tag{13.21} \]
is invariant, where $\delta_y(x)$ denotes the Dirac measure at the point $y$:
\[ \delta_y(A) = \begin{cases} 0, & \text{if } y \notin A, \\ 1, & \text{if } y \in A. \end{cases} \tag{13.22} \]
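As a concrete instance of (13.21), consider the transformation $S(x) = 2x \pmod 1$ used after Definition 13.10, which has the periodic orbit $\{1/3, 2/3\}$. The following sketch (an illustration with hypothetical helper names, using exact rational arithmetic) checks the invariance condition $\mu(S^{-1}(A)) = \mu(A)$ on a few intervals.

```python
from fractions import Fraction

def doubling(x):
    """S(x) = 2x (mod 1) on [0, 1)."""
    return (2 * x) % 1

ORBIT = [Fraction(1, 3), Fraction(2, 3)]   # S(1/3) = 2/3 and S(2/3) = 1/3

def orbit_measure(orbit, indicator):
    """mu(A) for mu = (1/n) * (sum of Dirac measures at the orbit points), cf. (13.21)."""
    return Fraction(sum(1 for p in orbit if indicator(p)), len(orbit))

def interval(a, b):
    """Indicator of the half-open interval [a, b)."""
    return lambda x: a <= x < b

def preimage(indicator):
    """Indicator of S^{-1}(A): x belongs to it exactly when S(x) belongs to A."""
    return lambda x: indicator(doubling(x))

for a, b in [(Fraction(0), Fraction(1, 2)), (Fraction(3, 10), Fraction(2, 5))]:
    A = interval(a, b)
    print(a, b, orbit_measure(ORBIT, A), orbit_measure(ORBIT, preimage(A)))
```

The interval $[3/10, 2/5)$ has Lebesgue measure $1/10$ but measure $1/2$ with respect to the orbit measure, which illustrates that this invariant measure is not absolutely continuous with respect to the Lebesgue measure.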
Given that a system often has many periodic orbits it follows that it also has many distinct invariant measures, and (13.21) clearly implies that they are not absolutely continuous with respect to one another. For ergodic transformations, the distinct invariant measures are necessarily singular.

Theorem 13.5 Assume that $(X, \mathcal{A}, \mu, S)$ is ergodic and that $\mu_1 : \mathcal{A} \to [0, 1]$ is another $S$-invariant probability measure. The following statements are then equivalent:
(1) $\mu_1 \neq \mu$;
(2) $\mu_1$ is not absolutely continuous with respect to $\mu$;
(3) there exists an invariant set $A \in \mathcal{A}$ such that $\mu(A) = 0$ and $\mu_1(A) \neq 0$.

Proof
(1) ⇒ (2) If $\mu_1$ were absolutely continuous with respect to $\mu$, the Radon–Nikodym derivative $d\mu_1/d\mu$ would be an invariant function in $L^1(X, \mathcal{A}, \mu)$. Since the system $(X, \mathcal{A}, \mu, S)$ is ergodic it follows that $(d\mu_1/d\mu)(x)$ is constant $\mu$-almost everywhere and therefore it is necessarily equal to 1, as both $\mu$ and $\mu_1$ are probability measures. It follows that $\mu_1 = \mu$, a contradiction.

(2) ⇒ (3) Since $\mu_1$ is not absolutely continuous with respect to $\mu$ there exists $B \in \mathcal{A}$ such that $\mu(B) = 0$, while $\mu_1(B) \neq 0$. Setting $A = \bigcup_{j=0}^{\infty} S^{-j}(B)$, it is immediate to verify that $A \in \mathcal{A}$, $\mu(A) = 0$, $\mu_1(A) \neq 0$.

(3) ⇒ (1) Obvious.

Suppose that $X$ is a compact metric space and $\mathcal{A}$ is the $\sigma$-algebra of the Borel sets of $X$. In some exceptional cases, a dynamical system $(X, \mathcal{A}, \mu, S)$ can have a unique invariant measure. In this case the system is called uniquely ergodic. The name is motivated by the following theorem.

Theorem 13.6 Let $(X, \mathcal{A}, \mu, S)$ be a uniquely ergodic system. Then the system is ergodic and for every choice of $f : X \to \mathbb{R}$ continuous and $x \in X$ the sequence $(1/n) \sum_{j=0}^{n-1} f(S^j(x))$ converges uniformly to a constant that is independent of $x$. Therefore the time average exists for every $x \in X$ and has value $\int_X f\,d\mu$.
Proof If the system were metrically decomposable, there would exist an invariant subset $A \subset X$ such that $0 < \mu(A) < 1$. The measure $d\nu = \chi_A\,(d\mu/\mu(A))$ is an invariant probability measure distinct from $\mu$: $\mu(A^c) = 1 - \mu(A)$ while $\nu(A^c) = 0$. This contradicts the hypothesis that the system is uniquely ergodic, and hence the system cannot be metrically decomposable.

Suppose then that there exists a continuous function $f : X \to \mathbb{R}$ for which the sequence of functions $\bigl((1/n) \sum_{j=0}^{n-1} f \circ S^j\bigr)_{n \in \mathbb{N}}$ does not converge uniformly to $\int_X f\,d\mu$ (because of ergodicity, this is the limit of the sequence $\mu$-almost everywhere). There then exist $\varepsilon > 0$ and two sequences $(n_i)_{i \in \mathbb{N}} \subset \mathbb{N}$, $n_i \to \infty$, and $(x_i)_{i \in \mathbb{N}} \subset X$ such that for every $i \in \mathbb{N}$ we have
\[ \left| \frac{1}{n_i} \sum_{j=0}^{n_i - 1} f(S^j(x_i)) - \int_X f\,d\mu \right| \ge \varepsilon. \tag{13.23} \]
Consider the sequence of probability measures on $X$:
\[ \nu_i := \frac{1}{n_i} \sum_{j=0}^{n_i - 1} \delta_{S^j(x_i)}. \tag{13.24} \]
By the compactness of the space of probability measures on $X$ (see Problem 1 of Section 13.13 for a proof) there is no loss of generality in assuming that the sequence $\nu_i$ converges to a probability measure $\nu$. We show that $\nu$ is invariant; to this end, thanks to Remark 13.5, it is sufficient to show that for every continuous $g : X \to \mathbb{R}$ we have
\[ \int_X g(S(x))\,d\nu = \int_X g(x)\,d\nu. \]
On the other hand
\[ \int_X g(S(x))\,d\nu = \lim_{i \to \infty} \int_X g(S(x))\,d\nu_i = \lim_{i \to \infty} \frac{1}{n_i} \sum_{j=0}^{n_i - 1} g(S^{j+1}(x_i)) = \lim_{i \to \infty} \left[ \int_X g(x)\,d\nu_i - \frac{1}{n_i}\,g(x_i) + \frac{1}{n_i}\,g(S^{n_i}(x_i)) \right]. \tag{13.25} \]
Since $X$ is compact and $g$ is continuous, $g$ is bounded, so the second and third terms of the sum in (13.25) tend to zero. It follows that the measure $\nu$ is invariant. Recalling equations (13.23) and (13.24) we have
\[ \left| \int_X f\,d\nu - \int_X f\,d\mu \right| = \lim_{i \to \infty} \left| \int_X f\,d\nu_i - \int_X f\,d\mu \right| = \lim_{i \to \infty} \left| \frac{1}{n_i} \sum_{j=0}^{n_i - 1} f(S^j(x_i)) - \int_X f\,d\mu \right| \ge \varepsilon, \]
which shows that $\nu \neq \mu$ and contradicts the hypothesis that $\mu$ is the only invariant measure of the system.

Remark 13.8 It is not difficult to prove that if for every continuous function $f : X \to \mathbb{R}$ the limit $\lim_{n \to \infty} (1/n) \sum_{j=0}^{n-1} f(S^j(x))$ exists for every point $x$ and is independent of $x$, then the system is uniquely ergodic.

Example 13.12 Let $\omega \in \mathbb{R}^l$ be such that $\omega \cdot k + p \neq 0$ for every $p \in \mathbb{Z}$ and for every $k \in \mathbb{Z}^l \setminus \{0\}$. Consider the measurable dynamical system determined by $X = \mathbb{T}^l$, where $\mathcal{A}$ is the $\sigma$-algebra of Borel sets on $\mathbb{T}^l$, $\chi \in \mathbb{T}^l$, $d\mu(\chi) = 1/(2\pi)^l\,d^l\chi$ is the Haar measure on $\mathbb{T}^l$ and $S\chi = \chi + \omega \pmod{2\pi\mathbb{Z}^l}$. Theorem 11.9 guarantees that the time average exists $\forall\, \chi \in \mathbb{T}^l$ and it is independent of the choice of the initial point $\chi \in \mathbb{T}^l$. Therefore, by the previous remark, the system is uniquely ergodic.
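The statement of Theorem 13.6 can be observed numerically in the simplest case $l = 1$ of Example 13.12. The sketch below is only an illustration: the frequency `OMEGA` $= \sqrt{2}$ (for which $k\,\omega$ is never an integer multiple of $2\pi$ when $k \neq 0$) and the test function `f` are assumptions chosen here, and the phase average of `f` with respect to $d\chi/2\pi$ equals 1.

```python
import math

OMEGA = math.sqrt(2.0)   # assumed frequency: OMEGA / (2 pi) is irrational

def translate(chi):
    """S(chi) = chi + OMEGA (mod 2 pi) on the one-dimensional torus."""
    return (chi + OMEGA) % (2.0 * math.pi)

def f(chi):
    """Continuous test function with phase average 1."""
    return 1.0 + math.cos(chi) + 0.5 * math.sin(2.0 * chi)

def time_average(func, step, chi0, n):
    """(1/n) * sum_{j=0}^{n-1} func(S^j chi0), cf. (13.17)."""
    chi, total = chi0, 0.0
    for _ in range(n):
        total += func(chi)
        chi = step(chi)
    return total / n

for chi0 in (0.0, 1.0, 2.5, 6.0):
    print(chi0, time_average(f, translate, chi0, 10000))
```

For every starting point the time average should be close to the phase average 1, with an error that decreases like $1/n$ uniformly in the starting point, since the sums of $\cos(\chi + j\omega)$ and $\sin(2\chi + 2j\omega)$ over $j$ are bounded independently of $\chi$.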
13.5
Mixing

One of the equivalent characterisations of ergodicity for a measurable dynamical system $(X, \mathcal{A}, \mu, S)$ is the fact that on average the measure of the preimages $S^{-j}(A)$ of any set $A \in \mathcal{A}$ is distributed uniformly on the whole support of the measure $\mu$, in the sense described by (13.20) (see Theorem 13.4). However, the existence of the limit in (13.20) does not guarantee that the limit of the sequence $\mu(S^{-j}(A) \cap B)$ exists, but it guarantees that if this sequence converges, then its limit is $\mu(A)\mu(B)$.¹ It is therefore natural to consider the dynamical systems satisfying the following definition.

Definition 13.14 A measurable dynamical system $(X, \mathcal{A}, \mu, S)$ is mixing if $\forall\, A, B \in \mathcal{A}$ one has
\[ \lim_{n \to +\infty} \mu(S^{-n}(A) \cap B) = \mu(A)\mu(B). \tag{13.26} \]

Since equation (13.26) implies (13.20), every mixing system is ergodic. An independent verification of this fact can be obtained assuming that $A$ is invariant, in which case from (13.26) it follows that
\[ \mu(A)\mu(A^c) = \lim_{n \to \infty} \mu(S^{-n}(A) \cap A^c) = \mu(A \cap A^c) = 0, \]
and therefore either $\mu(A) = 0$ or $\mu(A^c) = 0$, and the system is metrically indecomposable.

The converse is false: the irrational translations on tori (see Example 13.12) are uniquely ergodic but not mixing (see Problem 9 of Section 13.12). A simple example of a mixing dynamical system is given by the so-called 'baker's transformation' of Example 13.7, as we shall see below (see Problem 2 of Section 13.13).
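The difference between (13.20) and (13.26) can be made visible with a small computation. The sketch below is an illustration under stated assumptions, not part of the original text: it takes $A = B = [0, 1/2)$ on $[0, 1)$ and estimates $\mu(S^{-n}(A) \cap B)$ on a dyadic grid, first for the 2-adic transformation $S(x) = 2x \pmod 1$ of Example 13.5 (which is also the action of the baker's transformation on the first coordinate), then for a rotation by the golden mean, chosen here as an example of an irrational translation. All names are hypothetical.

```python
import math

GRID_BITS = 12
GRID = [k / 2**GRID_BITS for k in range(2**GRID_BITS)]   # dyadic points k / 4096

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0   # assumed irrational rotation number

def doubling(x):
    return (2.0 * x) % 1.0

def rotation(x):
    return (x + ALPHA) % 1.0

def left_half(x):
    return x < 0.5

def overlap(step, n, in_A, in_B, grid=GRID):
    """Grid estimate of mu(S^{-n}(A) ∩ B): the fraction of grid points x
    that lie in B and whose n-th iterate S^n(x) lies in A."""
    count = 0
    for x in grid:
        if in_B(x):
            y = x
            for _ in range(n):
                y = step(y)
            if in_A(y):
                count += 1
    return count / len(grid)

doubling_overlaps = [overlap(doubling, n, left_half, left_half) for n in range(1, 11)]
rotation_overlaps = [overlap(rotation, n, left_half, left_half) for n in range(1, 11)]
print(doubling_overlaps)
print(rotation_overlaps)
```

For the doubling map the values settle at $\mu(A)\mu(B) = 1/4$, as (13.26) requires, while for the rotation they keep oscillating with $n$, so that only their averages over $n$ can converge, as in (13.20). The grid estimate is only meaningful for $n$ smaller than `GRID_BITS`, because the doubling map sends every grid point to 0 after `GRID_BITS` iterations.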
Just as ergodicity has an equivalent formulation in terms of the behaviour of the time average of integrable functions, mixing can be characterised by studying the functions $f : X \to \mathbb{R}$ which are measurable and square integrable.

Definition 13.15 Let the measurable dynamical system $(X, \mathcal{A}, \mu, S)$ be given. The linear operator $U_S : L^2(X, \mathcal{A}, \mu) \to L^2(X, \mathcal{A}, \mu)$ defined by
\[ U_S f = f \circ S \tag{13.27} \]
is called Koopman's operator.

Recalling the definition of the scalar product of two functions $f, g \in L^2(X, \mathcal{A}, \mu)$:
\[ \langle f, g \rangle := \int_X f g\,d\mu, \tag{13.28} \]
it is immediate to verify that since $S$ preserves the measure $\mu$, $U_S$ is an isometry:
\[ \langle U_S f, U_S g \rangle = \langle f, g \rangle, \quad \forall\, f, g \in L^2(X, \mathcal{A}, \mu). \tag{13.29} \]

¹ Recall that in a probability space $(X, \mathcal{A}, \mu, S)$, two sets (or events) $A, B \in \mathcal{A}$ are independent if $\mu(A \cap B) = \mu(A)\mu(B)$.
Theorem 13.7 A necessary and sufficient condition for the measurable dynamical system $(X, \mathcal{A}, \mu, S)$ to be mixing is that
\[ \lim_{n \to \infty} \langle U_S^n f, g \rangle = \langle f, 1 \rangle \langle 1, g \rangle \tag{13.30} \]
for every $f, g \in L^2(X, \mathcal{A}, \mu)$.

Remark 13.9 The quantity
\[ \langle U_S^n f, g \rangle - \langle f, 1 \rangle \langle 1, g \rangle = \int_X (f \circ S^n)\,g\,d\mu - \int_X f\,d\mu \int_X g\,d\mu \]
is also called the correlation between $f$ and $g$ at time $n$. Theorem 13.7 therefore states that a system is mixing if and only if the correlation between any two functions tends to zero as $n \to \infty$.
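Proof of Theorem 13.7 It is immediate to verify that (13.30) implies that the system is mixing; it is enough to apply it to $f = \chi_A$ and $g = \chi_B$, $A, B \in \mathcal{A}$. Conversely, assuming that $(X, \mathcal{A}, \mu, S)$ is mixing, (13.26) implies that equation (13.30) holds when $f$ and $g$ are two characteristic functions of sets belonging to $\mathcal{A}$. By linearity, we therefore find that equation (13.30) is valid when $f$ and $g$ are two simple functions. Recall now that simple functions are dense in $L^2(X, \mathcal{A}, \mu)$ (see Rudin 1974); hence it follows that $\forall\, f, g \in L^2(X, \mathcal{A}, \mu)$ and $\forall\, \varepsilon > 0$ there exist two simple functions $f_0, g_0 \in L^2(X, \mathcal{A}, \mu)$ such that
\[ \|f - f_0\| = \sqrt{\langle f - f_0, f - f_0 \rangle} \le \varepsilon, \qquad \|g - g_0\| = \sqrt{\langle g - g_0, g - g_0 \rangle} \le \varepsilon, \qquad \lim_{n \to \infty} \langle U_S^n f_0, g_0 \rangle = \langle f_0, 1 \rangle \langle 1, g_0 \rangle. \]
Writing
\[ \langle U_S^n f, g \rangle = \langle U_S^n f_0, g_0 \rangle + \langle U_S^n f, g - g_0 \rangle + \langle U_S^n (f - f_0), g_0 \rangle, \]
since $U_S$ is an isometry, using the Schwarz inequality $|\langle f, g \rangle| \le \|f\|\,\|g\|$ one has
\[ |\langle U_S^n f, g \rangle - \langle f, 1 \rangle \langle 1, g \rangle| \le |\langle U_S^n f_0, g_0 \rangle - \langle f_0, 1 \rangle \langle 1, g_0 \rangle| + \|f\|\,\|g - g_0\| + \|f - f_0\|\,\|g_0\| + \varepsilon \|f\| + \varepsilon \|g_0\|. \]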
There then exists a constant $c > 0$ such that if $n$ is sufficiently large
\[ |\langle U_S^n f, g \rangle - \langle f, 1 \rangle \langle 1, g \rangle| \le c\varepsilon, \]
and hence (13.30) follows.

Example 13.13: linear automorphisms of the torus $\mathbb{T}^2$ Consider the flat two-dimensional torus with the $\sigma$-algebra of Borel sets, and the Haar measure $d\mu(\chi) = (1/4\pi^2)\,d\chi_1\,d\chi_2$. A linear automorphism of the torus is given by
\[ S(\chi_1, \chi_2) = (a\chi_1 + b\chi_2, c\chi_1 + d\chi_2) \pmod{2\pi\mathbb{Z}^2}, \]
where $a, b, c, d \in \mathbb{Z}$ and $|ad - bc| = 1$. It is easy to verify that the Haar measure is $S$-invariant. We now prove that if the matrix
\[ \sigma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]
has no eigenvalue with unit modulus, then the system is mixing. To this end, we check that (13.30) is satisfied by the functions $f_k(\chi) = e^{ik\cdot\chi}$, $k \in \mathbb{Z}^2$, which form a basis of $L^2(\mathbb{T}^2)$. We want to show therefore that for every pair $k, k' \in \mathbb{Z}^2$ we have
\[ \lim_{n \to \infty} \int_{\mathbb{T}^2} f_k(S^n(\chi)) f_{k'}(\chi)\,d\mu(\chi) = \int_{\mathbb{T}^2} f_k(\chi)\,d\mu(\chi) \int_{\mathbb{T}^2} f_{k'}(\chi)\,d\mu(\chi). \tag{13.32} \]
If $k = k' = 0$ the two sides are constant and equal to 1 for every $n \in \mathbb{N}$. It is not restrictive to assume that $k \neq 0$, which yields immediately that the right-hand side is equal to 0. On the other hand, we have $f_k(S(\chi)) = f_{\sigma^T k}(\chi)$, hence $f_k(S^n(\chi)) = f_{(\sigma^T)^n k}(\chi)$, and since $\sigma$ has an eigenvalue with absolute value $> 1$ the norm $|(\sigma^T)^n k| \to \infty$.² It follows that if $n$ is sufficiently large we necessarily have $(\sigma^T)^n k \neq k'$ and, as the basis $(f_k)_{k \in \mathbb{Z}^2}$ is orthonormal, the left-hand side of (13.32) vanishes. This concludes the proof that the system is mixing.

13.6
Entropy Let (X,
A, µ, S) be a measurable dynamical system. Ergodicity and mixing give two qualitative indications of the degree of randomness (or stochasticity) of the system. An indication of quantitative type is given by the notion of entropy which we shall soon introduce. We start by considering the following situation. Let α be an experiment with m ∈ N possible mutually exclusive outcomes A 1 , . . . , A m (for example the toss of 2 Since
σ transforms the vectors of Z 2 into vectors of Z 2 and is invertible, no non-zero vector with integer components can be entirely contained in the eigenspace corresponding to the eigenvalue less than 1, because this would imply that by iterating σ a finite number of times the vector has norm less than 1, contradicting the hypothesis that it belongs to Z 2
566 Analytical mechanics 13.6 a coin m = 2 or the roll of a die m = 6). Assume that each outcome A i happens
with probability p i ∈ [0, 1]: m i =1 p i = 1. In a probability space (X, A, µ, S) this situation is described by assigning a finite partition of X = A 1 ∪ . . . ∪ A m (mod 0), A i ∈ A, A
i ∩ A
j = ∅ if i = / j, µ(A
i ) = p
i . The following definition describes the properties which must hold for a function measuring the uncertainty of the prediction of an outcome of the experiment (equivalently, the information acquired from the execution of the experiment α). Let ∆
be the (m − 1)-dimensional standard simplex of $\mathbf{R}^m$, given by

$$\Delta^{(m)} = \Big\{ (x_1, \dots, x_m) \in \mathbf{R}^m \;\Big|\; x_i \in [0, 1],\ \sum_{i=1}^m x_i = 1 \Big\}.$$

Definition 13.16 A family of continuous functions $H^{(m)} : \Delta^{(m)} \to [0, +\infty]$, where m ∈ N, is called an entropy if the following properties hold:

(1) symmetry: for all $i, j \in \{1, \dots, m\}$ we have $H^{(m)}(p_1, \dots, p_i, \dots, p_j, \dots, p_m) = H^{(m)}(p_1, \dots, p_j, \dots, p_i, \dots, p_m)$;
(2) $H^{(m)}(1, 0, \dots, 0) = 0$;
(3) $H^{(m)}(0, p_2, \dots, p_m) = H^{(m-1)}(p_2, \dots, p_m)$ for all m ≥ 2 and all $(p_2, \dots, p_m) \in \Delta^{(m-1)}$;
(4) for all $(p_1, \dots, p_m) \in \Delta^{(m)}$ we have $H^{(m)}(p_1, \dots, p_m) \le H^{(m)}(1/m, \dots, 1/m)$, and equality holds if and only if $p_i = 1/m$ for every i = 1, . . . , m;
(5) consider $(\pi_{11}, \dots, \pi_{1l}, \pi_{21}, \dots, \pi_{2l}, \dots, \pi_{m1}, \dots, \pi_{ml}) \in \Delta^{(ml)}$; then for every $(p_1, \dots, p_m) \in \Delta^{(m)}$ we have

$$H^{(ml)}(\pi_{11}, \dots, \pi_{1l}, \pi_{21}, \dots, \pi_{ml}) = H^{(m)}(p_1, \dots, p_m) + \sum_{i=1}^m p_i H^{(l)}\Big(\frac{\pi_{i1}}{p_i}, \dots, \frac{\pi_{il}}{p_i}\Big).$$

Property (2) expresses the absence of uncertainty of a certain event. Property (3) means that no information is gained by impossible outcomes and (4) means that the maximal uncertainty is attained when all outcomes are equally probable. Property (5) describes the behaviour of the entropy when distinct experiments are compared. Let β be another experiment with possible outcomes $B_1, \dots, B_l$ (i.e. another partition of (X, A, µ, S)). Let $\pi_{ij}$ be the probability of $A_i$ and $B_j$ together, so that $\sum_{j=1}^l \pi_{ij} = p_i$. The probability of $B_j$ conditional on the fact that the outcome of α is $A_i$ is $\mathrm{prob}(B_j \mid A_i) = \pi_{ij}/p_i$ $(= \mu(A_i \cap B_j)/\mu(A_i))$. Clearly the uncertainty in the prediction of the outcome of the experiment β when the outcome of α is $A_i$ is measured by $H^{(l)}$
$(\pi_{i1}/p_i, \dots, \pi_{il}/p_i)$. From this fact stems the requirement that (5) be satisfied. In the following, we use the simpler notation $H(p_1, \dots, p_m)$.

Theorem 13.8 The function

$$H(p_1, \dots, p_m) = -\sum_{i=1}^m p_i \log p_i \tag{13.33}$$

(with the convention 0 log 0 = 0) is, up to a constant positive multiplier, the only function satisfying (1)–(5).

Proof (see Khinchin 1957, pp. 10–13). Let $H(p_1, \dots, p_m)$ be an entropy function, and for any m set $K(m) = H(1/m, \dots, 1/m)$. We show first of all that $K(m) = c \log m$, where c is a positive constant. Properties (3) and (4) imply that K is a non-decreasing function. Indeed,

$$K(m) = H\Big(0, \frac{1}{m}, \dots, \frac{1}{m}\Big) \le H\Big(\frac{1}{m+1}, \dots, \frac{1}{m+1}\Big) = K(m+1).$$

Consider now any two positive integers m and l. The property (5) applied to the case $\pi_{ij} \equiv 1/(ml)$, $p_i \equiv 1/m$ yields

$$K(lm) = K(m) + \sum_{i=1}^m \frac{1}{m} K(l) = K(m) + K(l),$$

from which it follows that $K(l^m) = mK(l)$. Given any three integers r, n, l let m be such that $l^m \le r^n \le l^{m+1}$, i.e.

$$\frac{m}{n} \le \frac{\log r}{\log l} \le \frac{m}{n} + \frac{1}{n}.$$

We know that $mK(l) = K(l^m) \le K(r^n) = nK(r) \le K(l^{m+1}) = (m+1)K(l)$, from which it follows that

$$\frac{m}{n} \le \frac{K(r)}{K(l)} \le \frac{m}{n} + \frac{1}{n}, \quad \text{i.e.} \quad \Big| \frac{K(r)}{K(l)} - \frac{\log r}{\log l} \Big| \le \frac{1}{n}.$$

Because of the arbitrariness of n we deduce that $K(r)/\log r = K(l)/\log l$ and therefore $K(m) = c \log m$, c > 0.

Assume now that $p_1, \dots, p_m$ are rational numbers. Setting the least common multiple of the denominators equal to s, we have $p_i = r_i/s$, with $\sum_{i=1}^m r_i = s$. In addition to the experiment α with outcomes $A_1, \dots, A_m$ with respective probabilities $p_1, \dots, p_m$ we consider an experiment β constituted by s outcomes $B_1, \dots, B_s$ divided into m groups, each containing, respectively, $r_1, \dots, r_m$ outcomes. We now set $\pi_{ij} = p_i/r_i = 1/s$, i = 1, . . . , m, j = 1, . . . , $r_i$. Given any outcome $A_i$ of α, we therefore have that the outcome of β is the outcome of an experiment with $r_i$ equally probable outcomes, and hence

$$H\Big(\frac{\pi_{i1}}{p_i}, \dots, \frac{\pi_{i r_i}}{p_i}\Big) = c \log r_i$$

and

$$\sum_{i=1}^m p_i H\Big(\frac{\pi_{i1}}{p_i}, \dots, \frac{\pi_{i r_i}}{p_i}\Big) = c \sum_{i=1}^m p_i \log r_i = c \sum_{i=1}^m p_i \log p_i + c \log s.$$

On the other hand, $H(\pi_{11}, \dots, \pi_{m r_m}) = c \log s$ and by property (5) we have

$$H(p_1, \dots, p_m) = H(\pi_{11}, \dots, \pi_{m r_m}) - \sum_{i=1}^m p_i H\Big(\frac{\pi_{i1}}{p_i}, \dots, \frac{\pi_{i r_i}}{p_i}\Big) = -c \sum_{i=1}^m p_i \log p_i.$$

The continuity of H ensures that the formula (13.33), proved so far when $p_i \in \mathbf{Q}$,
is also valid when $p_i$ is a real number.

Remark 13.10 H can be characterised as (−1/N) times the logarithm of the probability of a ‘typical’ outcome of the experiment α repeated N times. Indeed, if N is large, repeating the experiment α N times one expects to observe each outcome $A_i$ approximately $p_i N$ times (this is a formulation of the so-called law of large numbers). The probability of a typical outcome containing $p_1 N$ times $A_1$, $p_2 N$ times $A_2$, . . . , $p_m N$ times $A_m$ is $p_1^{p_1 N} p_2^{p_2 N} \cdots p_m^{p_m N}$. From this it follows precisely that

$$H(p_1, \dots, p_m) = -\frac{1}{N} \log \big( p_1^{p_1 N} \cdots p_m^{p_m N} \big) = -\sum_{i=1}^m p_i \log p_i.$$

Remark 13.11 The maximum value of H is attained when $p_i = 1/m$, i = 1, . . . , m (as required by property (4)) and has value $H(1/m, \dots, 1/m) = \log m$.

We now consider how to extend the notion of entropy to measurable dynamical systems (X, A, µ, S). We introduce some notation. If α and β are two partitions of X, the joined partition α ∨ β of α and β is defined by the subsets {A ∩ B, A ∈ α, B ∈ β}. If $\alpha_1, \dots, \alpha_n$ are partitions, we write $\bigvee_{i=1}^n \alpha_i$ for the joined partition of $\alpha_1, \dots, \alpha_n$. $S^{-1}\alpha$ is the partition defined by the subsets $\{S^{-1}A, A \in \alpha\}$. Finally, we say that a partition β is finer than α, which we denote by α < β, if for every B ∈ β there exists A ∈ α such that B ⊂ A. Obviously, the joined partitions are finer than the starting ones. The entropy H(α) of a partition $\alpha = \{A_1, \dots, A_m\}$ is given by

$$H(\alpha) = -\sum_{i=1}^m \mu(A_i) \log \mu(A_i).$$
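The formula for H, its maximum log m, and the additivity property (5) can be checked numerically. The following is a minimal sketch in Python; the function names and the 2 × 3 joint distribution are hypothetical choices made only for illustration.

```python
from math import log, isclose

def entropy(p):
    """H(p_1, ..., p_m) = -sum p_i log p_i, with the convention 0 log 0 = 0."""
    if not isclose(sum(p), 1.0, abs_tol=1e-12):
        raise ValueError("probabilities must sum to 1")
    return -sum(x * log(x) for x in p if x > 0)

def property_5_sides(pi):
    """Return both sides of property (5) for a joint distribution pi[i][j].

    pi[i][j] plays the role of pi_ij, the probability of A_i and B_j together;
    the marginals p_i = sum_j pi_ij are recovered from the rows.
    """
    p = [sum(row) for row in pi]
    lhs = entropy([x for row in pi for x in row])
    rhs = entropy(p) + sum(
        p_i * entropy([x / p_i for x in row])
        for p_i, row in zip(p, pi) if p_i > 0
    )
    return lhs, rhs

# Roll of a fair die: the entropy equals the maximum value log 6.
print(entropy([1 / 6] * 6), log(6))
# A biased coin carries less uncertainty than a fair one.
print(entropy([0.9, 0.1]), entropy([0.5, 0.5]))
# Property (5) on a hypothetical joint distribution of two experiments.
print(property_5_sides([[0.1, 0.2, 0.1], [0.3, 0.1, 0.2]]))
```

The same function gives the entropy H(α) of a partition once the measures µ(A_i) are supplied as the list `p`.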
Definition 13.17 Let α be a partition. The entropy of S relative to the partition α is defined by

$$h(S, \alpha) := \lim_{n \to \infty} \frac{1}{n} H\Big( \bigvee_{i=0}^{n-1} S^{-i}\alpha \Big). \tag{13.34}$$

The entropy of S is

$$h(S) := \sup \{ h(S, \alpha) : \alpha \text{ is a finite partition of } X \}. \tag{13.35}$$
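The quantity under the limit in (13.34) can be computed exactly in simple cases. The sketch below is an illustration under stated assumptions, not a construction taken from the text: it uses the doubling map S(x) = 2x mod 1 on [0, 1) with Lebesgue measure, and interval partitions described by their interior cut points. All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from math import log

def doubling(x):
    """Doubling map S(x) = 2x mod 1 on [0, 1), exact on Fractions."""
    y = 2 * x
    return y - 1 if y >= 1 else y

def label(x, cuts):
    """Index of the element of the interval partition with interior cut points `cuts` containing x."""
    return sum(1 for c in cuts if x >= c)

def join_entropy(cuts, n):
    """H of the join of S^{-i} alpha, i = 0..n-1, with respect to Lebesgue measure."""
    # Endpoints of the atoms: all preimages under S^i of 0 and of the cut points.
    pts = {Fraction(1)}
    for i in range(n):
        for c in [Fraction(0)] + list(cuts):
            pts.update((k + c) / 2**i for k in range(2**i))
    pts = sorted(pts)
    mass = defaultdict(Fraction)
    for a, b in zip(pts, pts[1:]):
        x, word = (a + b) / 2, []
        for _ in range(n):            # itinerary of the midpoint of (a, b)
            word.append(label(x, cuts))
            x = doubling(x)
        mass[tuple(word)] += b - a    # an atom may be a union of several intervals
    return -sum(float(m) * log(m) for m in mass.values())

cuts = [Fraction(1, 3)]               # alpha = {[0, 1/3), [1/3, 1)}, an assumed example
for n in range(1, 7):
    print(n, join_entropy(cuts, n) / n)
```

For the cut point 1/2 the atoms of the join are the dyadic intervals of length 2^{-n}, so `join_entropy([Fraction(1, 2)], n)` equals n log 2 and the ratio in (13.34) is log 2 for every n.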
Remark 13.12 It is possible to prove, exploiting the strict convexity of the function x log x on $\mathbf{R}_+$, that the sequence $\frac{1}{n} H\big( \bigvee_{i=0}^{n-1} S^{-i}\alpha \big)$ is monotone non-increasing and non-negative. Hence the limit in (13.34) exists and h(S, α) ≥ 0 for every α.

Remark 13.13 The entropy of a partition α measures the quantity of information acquired by observing the system using an instrument that distinguishes between the points of X with the resolution given by the sets of the partition $\{A_1, \dots, A_m\} = \alpha$. For x ∈ X, consider the orbit of x up to time n − 1: $x, Sx, S^2 x, \dots, S^{n-1}x$. Since α is a partition of X, each of the points $S^i x$, 0 ≤ i ≤ n − 1, belongs to precisely one of the sets of the partition α: setting $x_0 = x$, $x_i = S^i x$, we have $x_i \in A_{k_i}$ with $k_i \in \{1, \dots, m\}$ for every i = 0, . . . , n − 1. $H\big( \bigvee_{i=0}^{n-1} S^{-i}\alpha \big)$ measures the quantity of information deduced from the knowledge of the distribution with respect to the partition α of a segment of ‘duration’ n of the orbit. Therefore $\frac{1}{n} H\big( \bigvee_{i=0}^{n-1} S^{-i}\alpha \big)$ is the average quantity of information per unit of time and h(S, α) is the quantity of information acquired (asymptotically) at each iteration of the dynamical system, knowing how the orbit of a point is distributed with respect to the partition α. This remark is made rigorous by the following theorem. The proof can be found in Mañé (1987).

Theorem 13.9 (Shannon–Breiman–McMillan) Let (X, A, µ, S) be a measurable ergodic dynamical system, and α be a finite partition of X. Given x ∈ X let $\alpha_n(x)$ be the element of $\bigvee_{i=0}^{n-1} S^{-i}\alpha$ which contains x. Then for µ-almost every x ∈ X we have

$$h(S, \alpha) = \lim_{n \to \infty} -\frac{1}{n} \log \mu(\alpha_n(x)). \tag{13.36}$$

An interpretation of the Shannon–Breiman–McMillan theorem is the following. For an ergodic system there exists a number h such that for every ε > 0, if α is a sufficiently fine partition of X, then there exists a positive integer N such that for every n ≥ N there exists a subset $X_n$ of X of measure $\mu(X_n) > 1 - \varepsilon$ made of approximately $e^{nh}$ elements of $\bigvee_{i=0}^{n-1} S^{-i}\alpha$, each of measure approximately $e^{-nh}$.

If X is a compact metric space and A is the σ-algebra of Borel sets of X, Brin and Katok (1983) have given an interesting topological version of the Shannon–Breiman–McMillan theorem. Let B(x, ε) be the ball of centre x ∈ X and radius ε. Assume that S : X → X is continuous and preserves the probability measure µ : A → [0, 1]. Consider

$$B(x, \varepsilon, n) := \{ y \in X \mid d(S^i x, S^i y) \le \varepsilon \text{ for every } i = 0, \dots, n-1 \},$$

i.e. B(x, ε, n) is the set of points y ∈ X whose orbit remains at a distance at most ε from that of x for at least n − 1 iterations. It is possible to prove the following.

Theorem 13.10 (Brin–Katok) Assume that (X, A, µ, S) is ergodic. For µ-almost every x ∈ X we have

$$\sup_{\varepsilon > 0} \limsup_{n \to \infty} -\frac{1}{n} \log \mu(B(x, \varepsilon, n)) = h(S). \tag{13.37}$$
An interesting corollary of the previous theorem is that the entropy of the translations over tori $\mathbf{T}^l$ is zero. Indeed, in this case d(Sx, Sy) = d(x, y) and therefore for all n ∈ N and all ε > 0 we have B(x, ε, n) = B(x, ε). The measure µ(B(x, ε, n)) is then independent of n, so the limit in (13.37) vanishes and h(S) = 0. The same is true, more generally, if S is an isometry of the metric space (X, d).

The notion of entropy allows one to distinguish between systems in terms of the ‘predictability’ of their observables. When the entropy is positive, at least part of the observables cannot be computed from the knowledge of the past history. Chaotic systems are therefore the systems that have positive entropy. Taking into account the Brin–Katok theorem and the recurrence theorem of Poincaré, one sees how in chaotic systems the orbits are subject to two constraints, apparently contradicting each other. On the one hand, almost every orbit is recurrent and in the future will pass infinitely many times near the starting-point. On the other hand, the probability that two orbits remain close for a given time interval n decays exponentially as n grows. Since two orbits that were originally close to each other must return infinitely many times near the starting-point, they must be entirely uncorrelated if the entropy is positive, and hence they must go far and come back at different times. This complexity of motions is called chaos, and it clearly shows how difficult (or impossible) it is to compute the future values of an observable (corresponding to a function f : X → R) simply from the knowledge of its past history.
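The contrast between isometries and chaotic maps in the Brin–Katok formula (13.37) can be seen numerically on the circle R/Z with Lebesgue measure. The following sketch is only an illustration: the two maps (a rotation and the doubling map), the grid-based estimate of the measure and all names are assumptions introduced here. For the doubling map and ε < 1/4 the set B(x, ε, n) is the arc of half-length ε/2^{n−1} centred at x, so that −(1/n) log µ(B(x, ε, n)) tends to log 2, while for the rotation the measure stays equal to 2ε.

```python
from math import log

def circle_dist(a, b):
    """Distance between two points of the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def bowen_ball_measure(S, x, eps, n, grid=20000):
    """Estimate the Lebesgue measure of B(x, eps, n) on a uniform grid of [0, 1)."""
    hits = 0
    for k in range(grid):
        y, xi = (k + 0.5) / grid, x
        inside = True
        for _ in range(n):
            if circle_dist(xi, y) > eps:
                inside = False
                break
            xi, y = S(xi), S(y)
        hits += inside
    return hits / grid

def rotation(t):
    """Translation of the circle by a fixed (arbitrarily chosen) angle: an isometry."""
    return (t + 0.381966) % 1.0

def doubling(t):
    """Doubling map t -> 2t mod 1."""
    return (2.0 * t) % 1.0

x, eps = 0.3, 0.1
for n in (1, 4, 8):
    m_rot = bowen_ball_measure(rotation, x, eps, n)
    m_dbl = bowen_ball_measure(doubling, x, eps, n)
    print(n, m_rot, -log(m_dbl) / n, log(2))
```

The convergence of the last ratio to log 2 is slow, of order 1/n, because of the constant term −log(2ε)/n.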
13.7 Computation of the entropy. Bernoulli schemes. Isomorphism of dynamical systems

In the definition of entropy h(S) of a measurable dynamical system, it is necessary to compute the supremum of h(S, α) as α varies among all finite partitions of X. This seems to exclude the practical possibility of computing h(S). In reality, this is not the case, and one can proceed in a much simpler way. In this section we identify a partition α with the σ-algebra generated by α, and $\bigvee_{i=0}^{\infty} S^{-i}\alpha$ with the smallest σ-algebra containing all the partitions $\bigvee_{i=0}^{n-1} S^{-i}\alpha$ for every n ∈ N. Recall that two σ-algebras $\mathcal{A}$ and $\mathcal{B}$ are equal (mod 0), denoted by $\mathcal{A} = \mathcal{B}$ (mod 0), if for every $A \in \mathcal{A}$ there exists $B \in \mathcal{B}$ such that µ(A ∆ B) = 0, and vice versa. The discovery of Kolmogorov and Sinai, which makes possible the computation of the entropy overcoming the need to compute the supremum in (13.35), is that it suffices to consider the finite partitions α that generate the σ-algebra A, and
hence such that $\bigvee_{i=-\infty}^{+\infty} S^{-i}\alpha = \mathcal{A}$ (mod 0) if S is invertible, or $\bigvee_{i=0}^{\infty} S^{-i}\alpha = \mathcal{A}$ (mod 0) if S is not invertible. Indeed one can prove the following.

Theorem 13.11 (Kolmogorov–Sinai) If α is a partition of X generating the σ-algebra A, the entropy of the measurable dynamical system (X, A, µ, S) is given by

$$h(S) = h(S, \alpha). \tag{13.38}$$

The proof of this theorem does not present special difficulties but it is tedious and will be omitted (see Mañé 1987).

Among the measurable dynamical systems for which it is possible to compute the entropy, the Bernoulli schemes, which we now introduce, constitute the fundamental example of systems with strong stochastic properties. Consider the space X of infinite sequences $x = (x_i)_{i \in \mathbf{N}}$, where the variable $x_i$ can only take a finite number of values which, for simplicity, we assume to be the integers {0, . . . , N − 1} (we sometimes use the notation $\mathbf{Z}_N$ to denote the integers {0, 1, . . . , N − 1}). The space of sequences X is often denoted by $\mathbf{Z}_N^{\mathbf{N}}$.³ When we want to model an infinite sequence of outcomes of the toss of a coin (or the roll of a die) we fix N = 2 (respectively N = 6) and each possible value of $x_i$ is equally probable. Consider on X the transformation S : X → X defined by

$$(S(x))_i = x_{i+1}, \quad \forall\, i \in \mathbf{N}, \tag{13.39}$$

usually known as a shift.

³ If instead of one-sided sequences $(x_i)_{i \in \mathbf{N}} \in \mathbf{Z}_N^{\mathbf{N}}$ the space X is made of two-sided (doubly infinite) sequences $(x_i)_{i \in \mathbf{Z}} \in \mathbf{Z}_N^{\mathbf{Z}}$ we have a so-called bilateral Bernoulli scheme. All considerations to be developed trivially extend to the case of bilateral Bernoulli schemes.

We proceed as in Example 13.2, associating with $\mathbf{Z}_N$ a probability measure and assigning to the value $j \in \mathbf{Z}_N$ a probability equal to $p_j > 0$, with the condition $\sum_{j=0}^{N-1} p_j = 1$. This choice induces a probability measure on the space of sequences X that we now describe. Consider first of all the σ-algebra A on X generated by the cylinders, i.e. the subsets of X corresponding to sequences for which a finite number of values is fixed. Given k ≥ 1 elements $j_1, \dots, j_k \in \mathbf{Z}_N$, not necessarily distinct, and k distinct positions $i_1 < i_2 < \dots < i_k \in \mathbf{N}$, the corresponding cylinder is

$$C = C^{j_1, \dots, j_k}_{i_1, \dots, i_k} = \{ x \in X \mid x_{i_1} = j_1, x_{i_2} = j_2, \dots, x_{i_k} = j_k \}. \tag{13.40}$$

Therefore all sequences in X which take the prescribed values in the positions corresponding to the indices $i_1, \dots, i_k$ belong to C. We therefore define the measure µ on A by prescribing its value on cylinders:

$$\mu\big( C^{j_1, \dots, j_k}_{i_1, \dots, i_k} \big) = p_{j_1} \cdots p_{j_k}. \tag{13.41}$$

Note that in (13.41) the positions $i_1, \dots, i_k$ do not play any role. Hence it is immediate to deduce that if C is a cylinder, then $\mu(S^{-1}(C)) = \mu(C)$, and recalling that the σ-algebra A is generated by cylinders we conclude that (X, A, µ, S) is a measurable dynamical system (hence that S preserves the measure µ). This system is known as a Bernoulli scheme with probability $(p_0, \dots, p_{N-1})$ and it is denoted by $SB(p_0, \dots, p_{N-1})$. We leave as an exercise the verification that a Bernoulli scheme is mixing (see Problem 10 of Section 13.12) but we show that the entropy of $SB(p_0, \dots, p_{N-1})$ is $-\sum_{i=0}^{N-1} p_i \log p_i$.

The partition α into the cylinders $\{C^{j}_{0}\}_{j = 0, \dots, N-1}$ generates the σ-algebra A. Indeed we have

$$\alpha \vee S^{-1}\alpha = \big\{ C^{j_0 j_1}_{0\,1} \big\}_{j_0, j_1 = 0, \dots, N-1}, \qquad \alpha \vee S^{-1}\alpha \vee S^{-2}\alpha = \big\{ C^{j_0 j_1 j_2}_{0\,1\,2} \big\}_{j_i = 0, \dots, N-1,\ i = 0, 1, 2},$$

and so on. The corresponding entropies are (use (13.41)):

$$H(\alpha) = -\sum_{j=0}^{N-1} p_j \log p_j,$$

$$H(\alpha \vee S^{-1}\alpha) = -\sum_{j_0=0}^{N-1} \sum_{j_1=0}^{N-1} p_{j_0} p_{j_1} \log (p_{j_0} p_{j_1}) = -\sum_{j_0=0}^{N-1} (p_{j_0} \log p_{j_0}) \sum_{j_1=0}^{N-1} p_{j_1} - \sum_{j_1=0}^{N-1} (p_{j_1} \log p_{j_1}) \sum_{j_0=0}^{N-1} p_{j_0} = -2 \sum_{j=0}^{N-1} p_j \log p_j,$$

$$H(\alpha \vee S^{-1}\alpha \vee S^{-2}\alpha) = -\sum_{j_0, j_1, j_2} p_{j_0} p_{j_1} p_{j_2} \log (p_{j_0} p_{j_1} p_{j_2}) = -3 \sum_{j=0}^{N-1} p_j \log p_j,$$

and so on. From this it follows that $h(S, \alpha) = -\sum_{j=0}^{N-1} p_j \log p_j$ and thus the entropy of $SB(p_0, \dots, p_{N-1})$ also follows by the Kolmogorov–Sinai theorem.

We examine again the p-adic transformation S of Example 13.5 and consider the partition $\alpha = \{ (j/p, (j+1)/p) \}_{j = 0, \dots, p-1}$. Using the fact that

$$\bigvee_{i=0}^{n} S^{-i}\alpha = \big\{ \big( j/p^{n+1}, (j+1)/p^{n+1} \big) \big\}_{j = 0, \dots, p^{n+1}-1}$$

it is not difficult to verify that α is a generating partition and therefore h(S) = h(S, α). On the other hand,

$$H\Big( \bigvee_{i=0}^{n} S^{-i}\alpha \Big) = -p^{n+1} \cdot p^{-(n+1)} \log p^{-(n+1)} = (n+1) \log p,$$

from which it follows that h(S, α) = log p. Note that SB(1/p, . . . , 1/p) has the same entropy. It is indeed possible to pass from one system to the other by a very easy construction. With every point ξ ∈ (0, 1) we associate the sequence $x \in \mathbf{Z}_p^{\mathbf{N}}$ defined as follows: for every i = 0, 1, . . . we set

$$x_i = j \iff S^i(\xi) \in \Big( \frac{j}{p}, \frac{j+1}{p} \Big). \tag{13.42}$$

Denote by (X, A, µ, S) and by (X′, A′, µ′, S′), respectively, the two 4-tuples: ((0, 1), the σ-algebra of Borel sets of (0, 1), Lebesgue measure, the p-adic transformation) and ($\mathbf{Z}_p^{\mathbf{N}}$, the σ-algebra generated by the cylinders, the measure corresponding to SB(1/p, . . . , 1/p), the shift). In addition, denote by T : X → X′ the transformation defined in (13.42).
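The coding (13.42) is easy to compute with exact rational arithmetic. The sketch below assumes that the p-adic transformation of Example 13.5 is S(ξ) = pξ mod 1 (an assumption, since the example is not reproduced here); the function names are hypothetical. Points that fall on the endpoints j/p^k of the partition intervals form a set of measure zero and are simply assigned to the interval on their right.

```python
from fractions import Fraction

def p_adic_map(xi, p):
    """Assumed form of the p-adic transformation: S(xi) = p * xi mod 1, exact on Fractions."""
    y = p * xi
    return y - int(y)

def coding(xi, p, length):
    """First `length` symbols of T(xi): x_i = j when S^i(xi) lies in (j/p, (j+1)/p)."""
    word = []
    for _ in range(length):
        word.append(int(p * xi))   # index j of the interval containing the current point
        xi = p_adic_map(xi, p)
    return word

xi, p = Fraction(5, 17), 3
print(coding(xi, p, 8))
# The coding of S(xi) is the shifted sequence of xi: T(S(xi)) = S'(T(xi)).
print(coding(p_adic_map(xi, p), p, 7) == coding(xi, p, 8)[1:])
```

The symbols produced are the digits of ξ in base p, which is why the shift on sequences corresponds to the p-adic transformation on (0, 1).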
The following facts are of immediate verification:

(a) T is measurable;
(b) for every A′ ∈ A′, $\mu(T^{-1}A') = \mu'(A')$;
(c) for µ-a.e. x ∈ X, T(S(x)) = S′(T(x));
(d) T is invertible (mod 0), i.e. there exists a measurable transformation T′ : X′ → X, which preserves the measures (so that for every A ∈ A, $\mu'(T'^{-1}A) = \mu(A)$), such that T′(T(x)) = x for µ-a.e. x ∈ X and T(T′(x′)) = x′ for µ′-a.e. x′ ∈ X′.

In general, we have the following.

Definition 13.18 Let (X, A, µ, S), (X′, A′, µ′, S′) be two measurable dynamical systems. A transformation T : X → X′ satisfying the conditions (a), (b), (c), (d) is called an isomorphism of measurable dynamical systems and the two systems are then isomorphic.

Ergodic theory does not distinguish between isomorphic systems: two isomorphic systems have the same ‘stochastic’ properties. It is an exercise to prove the following.

Theorem 13.12 Two isomorphic systems have the same entropy. If one system is mixing, then the other is also mixing. If one system is ergodic, then the other is also ergodic.

In the particular case of the Bernoulli schemes the equality of entropy is not only a necessary condition but it is also sufficient for two schemes to be isomorphic.

Theorem 13.13 (Ornstein) Two Bernoulli schemes with the same entropy are isomorphic.

The proof of this result goes beyond the scope of this book. Besides the original article of Ornstein (1970), see also Cornfeld et al. (1982, section 7, chapter 10). A consequence of this theorem of Ornstein is that the Bernoulli schemes are completely classified (up to isomorphism) by their entropy. The last result we quote in this section shows how the entropy also classifies the hyperbolic automorphisms of tori (see Example 13.13): these are given by matrices σ ∈ GL(l, Z) with no eigenvalue of absolute value equal to 1.

Theorem 13.14 (Katznelson) Every linear hyperbolic automorphism of $\mathbf{T}^l$ is isomorphic to a Bernoulli scheme.

Due to the theorem of Ornstein the classification of the ergodic properties of the automorphisms of $\mathbf{T}^l$ is given by the entropy.
It can be proved (see Walters 1982, sections 8.4 and 8.10) that if $\nu_1, \dots, \nu_l$ are the eigenvalues of the automorphism σ then

$$h(\sigma) = \sum_{\{ i \,:\, |\nu_i| > 1 \}} \log |\nu_i|. \tag{13.43}$$
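Formula (13.43) is straightforward to evaluate. The sketch below restricts itself to the case l = 2, where the eigenvalues follow from the trace and the determinant; this restriction, the function name and the sample matrix with rows (2, 1) and (1, 1) are choices made here for illustration only.

```python
import cmath
from math import log

def toral_entropy_2x2(a, b, c, d):
    """Evaluate (13.43) for the integer matrix sigma with rows (a, b) and (c, d)."""
    det = a * d - b * c
    if abs(det) != 1:
        raise ValueError("sigma must have determinant +1 or -1")
    tr = a + d
    # Roots of the characteristic polynomial nu^2 - tr * nu + det = 0.
    root = cmath.sqrt(tr * tr - 4 * det)
    eigenvalues = [(tr + root) / 2, (tr - root) / 2]
    return sum(log(abs(nu)) for nu in eigenvalues if abs(nu) > 1)

# Hyperbolic example: eigenvalues (3 + sqrt 5)/2 and (3 - sqrt 5)/2.
print(toral_entropy_2x2(2, 1, 1, 1))
# Non-hyperbolic example (rotation by a right angle): no eigenvalue outside the unit circle.
print(toral_entropy_2x2(0, -1, 1, 0))
```

For the first matrix only the eigenvalue (3 + √5)/2 contributes, so the entropy is log((3 + √5)/2).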
We conclude with the definition of Bernoulli systems.

Definition 13.19 A measurable dynamical system is a Bernoulli system if it is isomorphic to a Bernoulli scheme.

Bernoulli systems exhibit the most significant stochastic properties. Their equivalence classes up to isomorphism, due to the theorem of Ornstein, are completely classified by only one invariant, the entropy.

13.8 Dispersive billiards

Many important models of classical statistical mechanics are systems of point particles or rigid spheres moving freely except for the effect of elastic collisions, either with fixed obstacles or among themselves. To study the behaviour of electron gases in metals, Lorentz introduced in 1905 the following model: a point particle moves in $\mathbf{R}^l$ subject only to elastic collisions with a distribution of infinitely many fixed rigid spheres (see Fig. 13.2). Another important model is the hard spheres gas: a system of spheres which move freely in a domain $V \subset \mathbf{R}^l$ interacting through elastic collisions between them and with the boundary ∂V of the domain (see Fig. 13.3). In all these cases, the main element of the model is the condition that the collision be elastic. This is the characterising feature of all dynamical systems of billiard type.

Definition 13.20 A billiard is a dynamical system constituted by the motion of a point particle with constant velocity inside a bounded open subset $V \subseteq \mathbf{R}^d$ with piecewise smooth boundary ($C^\infty$), and with a finite number of smooth components intersecting transversally. The particle is subject to elastic reflections when it collides with ∂V (see Fig. 13.3): the incidence angle is equal to the reflection angle and the energy is conserved.

Fig. 13.2 Lorentz gas.
Fig. 13.3 Hard spheres gas.
Fig. 13.4 Examples of plane billiards (including the ‘stadium’).

In our short introduction to the study of billiards we shall restrict ourselves to the plane case, when l = 2 (see Fig. 13.4 for some examples of plane billiards). This is the only case whose stochastic properties are sufficiently understood. Since the absolute value of the velocity is constant, it is possible to describe the motion using a system with discrete time. We parametrise ∂V using the natural parameter s and suppose that the length of ∂V is equal to 2π; we can then characterise x ∈ ∂V by choosing arbitrarily the origin corresponding to s = 0 and via the application $\mathbf{S}^1 \ni s \mapsto x(s)$ (note that x(s + 2π) = x(s)) (see Fig. 13.5). The elastic collision with ∂V is completely described by assigning the pair $(s, \alpha) \in \mathbf{S}^1 \times (0, \pi)$, where x(s) is the collision point in ∂V and α is the angle formed by the reflected velocity (i.e. the velocity immediately after the collision) and the unit vector tangent to ∂V. Consider the phase space $X = \mathbf{S}^1 \times (0, \pi)$ with the σ-algebra A of Borel sets, and the transformation S : X → X which associates with (s, α) the next collision point and reflection angle (s′, α′).

Proposition 13.1 S preserves the probability measure $d\mu(s, \alpha) = \frac{1}{4\pi} \sin \alpha \, ds \, d\alpha$.
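A short computation, using the normalisation of the length of ∂V to 2π made above, confirms that the density in Proposition 13.1 has total mass 1 on X = S¹ × (0, π):

```latex
\mu(X) = \int_0^{2\pi} ds \int_0^{\pi} \frac{1}{4\pi} \sin\alpha \, d\alpha
       = \frac{1}{4\pi} \cdot 2\pi \cdot \big[ -\cos\alpha \big]_0^{\pi}
       = \frac{1}{4\pi} \cdot 2\pi \cdot 2 = 1 .
```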