So far:
- Historical overview of speech technology: basic components and goals for systems
- Quick review of DSP fundamentals
- Quick overview of pattern recognition basics
- Next talk focuses on the nature of the signal:
- Acoustic waves in small spaces (sources)
- Acoustic waves in large spaces (rooms)
Acoustic waves - a brief intro
- A way to bridge from thinking about EE to thinking about acoustics:
- Acoustic signals are like electrical ones, only much slower …
- Pressure is like voltage
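- Volume velocity is like current (and impedance = pressure / volume velocity)
- For wave solutions, the propagation speed c is a lot smaller (about 340 m/s for sound in air, versus on the order of 10^8 m/s for electrical signals)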
- To analyze, look at constrained models of common structures: strings and tubes
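- $\frac{\partial^2 y}{\partial x^2} = \frac{1}{c^2} \frac{\partial^2 y}{\partial t^2}$ is the wave equation for transverse vibration on a string (y is the transverse displacement)
- Here c is the wave propagation speed, and can be derived from the properties of the medium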
- Solutions dependent on boundary conditions
- Assume the form f(t - x/c) for a wave traveling in the positive x direction
- Then f(t + x/c) for a wave traveling in the negative x direction
- The sum is A f(t - x/c) + B f(t + x/c)
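As a quick check that a traveling wave of this form solves the string equation, here is a sketch of the substitution. It assumes the standard 1-D wave equation $\partial^2 y/\partial x^2 = (1/c^2)\,\partial^2 y/\partial t^2$; the symbol y for displacement and the notation f'' (second derivative of f with respect to its argument) are ours, not the slides'.

```latex
\begin{aligned}
y(x,t) &= f\!\left(t - \tfrac{x}{c}\right) \\
\frac{\partial^2 y}{\partial t^2} &= f''\!\left(t - \tfrac{x}{c}\right) \\
\frac{\partial^2 y}{\partial x^2} &= \left(-\tfrac{1}{c}\right)^{2} f''\!\left(t - \tfrac{x}{c}\right)
  = \frac{1}{c^2}\,\frac{\partial^2 y}{\partial t^2}
\end{aligned}
```

The same steps with t + x/c give the wave traveling in the negative x direction, and since the equation is linear, any weighted sum of the two is also a solution.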
- Uniform tube, source on one end, open on the other
- Plane wave propagation for frequencies below ~4000 Hz
- By looking at the solutions to this equation, we can show that c is the speed of sound
- Let $u^+(t - x/c) = A e^{j\omega(t - x/c)}$ and $u^-(t + x/c) = B e^{j\omega(t + x/c)}$
- Boundary condition at the source end (x = 0): $u(0,t) = e^{j\omega t} = A e^{j\omega(t - 0/c)} - B e^{j\omega(t + 0/c)}$
- Boundary condition at the open end (x = L): $p(L,t) = 0 = A e^{j\omega(t - L/c)} + B e^{j\omega(t + L/c)}$
- Problem: Find A and B to match boundary conditions
- Solve for A and B (eliminate t)
- Now you can get equation 10.24 in text, for excitation $U(\omega) e^{j\omega t}$:
- $u(x,t) = \frac{\cos[\omega(L - x)/c]}{\cos[\omega L/c]} \, U(\omega) e^{j\omega t}$
- (upcoming homework problem)
- c = 340 m/s, L = 17 cm, 4L = 0.68 m
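With these numbers the resonances can be worked out directly. A tube driven at one end and open at the other (pressure zero at x = L) resonates where the tube is an odd number of quarter wavelengths long, i.e. at f_n = (2n + 1) c / (4L). The sketch below just evaluates that formula; the function name is made up for illustration.

```python
def open_tube_resonances(c, length, n_modes=3):
    """Resonant frequencies in Hz of a uniform tube driven at one end and
    open at the other: f_n = (2n + 1) * c / (4 * length), n = 0, 1, 2, ..."""
    return [(2 * n + 1) * c / (4.0 * length) for n in range(n_modes)]


# Values from the slide: c = 340 m/s, L = 17 cm (so 4L = 0.68 m).
modes = open_tube_resonances(c=340.0, length=0.17)
print([round(f) for f in modes])
```

The first three modes come out near 500, 1500, and 2500 Hz, which is roughly where the formants of a neutral vowel sit for a vocal tract of about this length.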
First 3 modes of an acoustic tube open at one end
Effect of losses in the tube
Effect of nonuniformities in the tube
- Impedance mismatches cause reflections
- Can be modeled as a succession of smaller tubes
- Resonances move around - hence the different formants for different speech sounds
Acoustic reverberation
- Reflection vs absorption at room surfaces
- Reverberation effects tend to be more important than room modes for speech intelligibility
- Also very important for musical clarity, tone
- (uniformly distributed and diffuse)
- Decay of intensity when source is shut off (W = 0)
- The phrase “two oh six” convolved with the impulse response from a room with a 0.5 second RT60
- Initial time delay gap = t0
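To make the decay and the convolution concrete, here is a minimal sketch. It assumes a synthetic impulse response (Gaussian noise under an exponential envelope that drops 60 dB in RT60 seconds), which is only a crude stand-in for a measured room; the function names, the 1 kHz sample rate, and the tone burst used in place of speech are all illustrative choices, not from the slides.

```python
import math
import random


def decay_gain(t, rt60):
    """Amplitude factor after t seconds when the level falls 60 dB per rt60 seconds."""
    return 10.0 ** (-3.0 * t / rt60)


def synthetic_room_response(rt60, fs, duration, seed=0):
    """Noise with an exponentially decaying envelope (assumed model, not a measurement)."""
    rng = random.Random(seed)
    n = int(duration * fs)
    return [rng.gauss(0.0, 1.0) * decay_gain(i / fs, rt60) for i in range(n)]


def convolve(x, h):
    """Direct-form linear convolution; output has len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


fs = 1000  # deliberately low sample rate so the pure-Python loops stay fast
h = synthetic_room_response(rt60=0.5, fs=fs, duration=0.5)
dry = [math.sin(2 * math.pi * 100 * i / fs) for i in range(200)]  # 0.2 s tone burst
wet = convolve(dry, h)
print(len(dry), len(wet))
```

The reverberant signal is longer than the dry one by the length of the room response: the tail keeps ringing after the source has stopped.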
Measuring room responses
- Impulsive sounds
- Correlation of mic input with random signal source (since $R_{xy} = R_{xx} * h$, where * denotes convolution)
- Chirp input
- The measured response also includes the mic and speaker responses
- No single room response (also not really linear)
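The correlation method can be sketched in a few lines. If the source x is white noise, its autocorrelation is close to an impulse scaled by the input power, so the cross-correlation of x with the mic signal y is approximately the impulse response times that power. The example below is a toy under idealized assumptions (perfectly linear, noise-free system); the 5-tap `h_true` is a made-up response and the function name is hypothetical.

```python
import random


def estimate_impulse_response(x, y, n_taps):
    """Estimate h[k] as the cross-correlation of x and y at lag k, divided by
    the input power. Assumes x is white and y has at least len(x) + n_taps - 1 samples."""
    n = len(x)
    power = sum(v * v for v in x) / n
    h_est = []
    for k in range(n_taps):
        r_xy = sum(x[i] * y[i + k] for i in range(n)) / n
        h_est.append(r_xy / power)
    return h_est


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # white-noise test signal
h_true = [1.0, 0.0, 0.5, 0.0, -0.25]             # made-up "room" for the demo

# Simulate the mic signal as the full convolution of x with h_true.
y = [0.0] * (len(x) + len(h_true) - 1)
for i, xi in enumerate(x):
    for j, hj in enumerate(h_true):
        y[i + j] += xi * hj

h_est = estimate_impulse_response(x, y, len(h_true))
print([round(v, 2) for v in h_est])
```

The estimate is noisy for short recordings and tightens as the recording gets longer, which is one reason long noise or chirp excitations are attractive compared with a single impulsive sound.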
Effects of reverb
- Increases loudness
- “Early” loudness increase helps intelligibility
- “Late” loudness increase hurts intelligibility
- When noise is present, the ill effects are compounded
- Even worse for machine algorithms
Dealing with reverb
- Microphone arrays - beamforming
- Reducing effects by subtraction/filtering
- Stereo mic transfer function
- Using robust features (for ASR especially)
- Statistical adaptation
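As one illustration of the microphone-array idea, here is a minimal delay-and-sum sketch. It assumes the per-microphone delays are known integers (in samples) and uses independent Gaussian noise at each mic as a stand-in for the unwanted sound; real reverberation is correlated across microphones, so the gain in a real room is smaller than this toy suggests. All names and numbers are illustrative.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Advance each channel by its known integer delay (in samples) and average."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    m = len(channels)
    return [sum(ch[i + d] for ch, d in zip(channels, delays)) / m for i in range(n)]


def mean_square_error(a, b):
    """Mean squared difference over the overlapping part of two sequences."""
    n = min(len(a), len(b))
    return sum((a[i] - b[i]) ** 2 for i in range(n)) / n


rng = random.Random(0)
n_samples = 2000
source = [math.sin(2 * math.pi * 0.01 * i) for i in range(n_samples)]
delays = [0, 3, 5, 8]  # assumed arrival delay at each of four mics, in samples

channels = []
for d in delays:
    delayed = [0.0] * d + source  # source arrives d samples late at this mic
    channels.append([s + rng.gauss(0.0, 1.0) for s in delayed])

output = delay_and_sum(channels, delays)
err_single = mean_square_error(channels[0], source)
err_array = mean_square_error(output, source)
print(round(err_single, 2), round(err_array, 2))
```

After alignment the source adds coherently while the independent noise averages down, so the error for the four-mic output is roughly a quarter of the single-mic error.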
Artificial reverberation
- Physical devices (springs, plate, etc.)
- Simple electronic delay with feedback
- FIR for early delays (think of “initial time delay gap” in concert halls), IIR for later decay
- Explicit convolution with stored response
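The "simple electronic delay with feedback" option can be sketched as a single feedback comb filter, y[n] = x[n] + g * y[n - D]. The helper that picks g for a target RT60 assumes each trip around the loop must contribute its share of a 60 dB drop; both function names are made up for illustration, and a usable reverberator would combine several such sections.

```python
def feedback_comb(x, delay, gain):
    """Single delay line with feedback: y[n] = x[n] + gain * y[n - delay]."""
    y = []
    for n, xn in enumerate(x):
        fed_back = gain * y[n - delay] if n >= delay else 0.0
        y.append(xn + fed_back)
    return y


def feedback_gain_for_rt60(delay_seconds, rt60):
    """Loop gain such that the echo train has fallen 60 dB after rt60 seconds."""
    return 10.0 ** (-3.0 * delay_seconds / rt60)


impulse = [1.0] + [0.0] * 9
echoes = feedback_comb(impulse, delay=3, gain=0.5)
print(echoes)  # [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.0, 0.125]

g = feedback_gain_for_rt60(delay_seconds=0.05, rt60=0.5)
print(round(g, 3))
```

Each echo is a scaled copy of the previous one, so the impulse response decays exponentially, which is why an IIR structure is a natural fit for the late part of the decay while a short FIR section can place the early reflections.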