Speech Processing

This course is offered as an elective in the final year of the Bachelor of Electronics and Communication Engineering and Bachelor of Computer Engineering programs at Tribhuvan University.

Course Objectives:

  • To introduce the characteristics of speech signals and the related time and frequency domain methods for speech analysis and speech compression
  • To introduce the models for speech production
  • To develop time and frequency domain techniques for estimating speech parameters
  • To introduce a predictive technique for speech compression
  • To understand speech recognition, synthesis and speaker identification

Syllabus

Orientation slides

  1. Nature of speech signal (8 hours)
    • Speech production: Mechanism of speech production, Acoustic phonetics, Digital models for speech signals, Representations of speech waveform, Sampling speech signals, Basics of quantization, Delta modulation, Differential PCM, Auditory perception: psychoacoustics.
  2. Time domain methods for speech processing (8 hours)
    • Time domain parameters of speech signal, Methods for extracting the parameters, Short-time energy, Average magnitude, Short-time average zero-crossing rate (ZCR), Silence discrimination using ZCR and energy, Short-time autocorrelation function, Pitch period estimation using the autocorrelation function
  3. Frequency domain methods for speech processing (10 hours)
    • Short-time Fourier analysis, Fourier transform and linear filtering interpretations, Sampling rates, Spectrographic displays, Pitch and formant extraction, Analysis by synthesis, Analysis-synthesis systems, Phase vocoder, Channel vocoder, Homomorphic speech analysis, Cepstral analysis of speech, Formant and pitch estimation, Homomorphic vocoders
    • References 1 2
  4. Linear predictive analysis of speech (10 hours)
    • Basic principles of linear predictive analysis, Autocorrelation method, Covariance method, Solution of LPC equations, Cholesky method, Durbin’s recursive algorithm
    • Application of LPC parameters, Pitch detection using LPC parameters, Formant analysis, VELP, CELP
  5. Application of speech & audio signal processing (9 hours)
    • Natural language processing
    • Algorithms: Dynamic time warping, K-means clustering and vector quantization, Gaussian mixture modeling, Hidden Markov modeling
    • Automatic speech recognition (ASR), Feature extraction for ASR, Deterministic sequence recognition, Statistical sequence recognition, Language models, Speaker identification and verification, Voice response system
    • Speech synthesis: Basics of articulatory, Source-filter, Concatenative synthesis
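
As an illustration of unit 4, the following is a minimal Python sketch of Durbin's recursive algorithm for the autocorrelation method of linear prediction. It is not taken from the course material: the function names (autocorrelation, levinson_durbin) and the sign convention (the sample s[n] is predicted as the sum of a[k] * s[n-k] for k = 1..p) are assumptions made for this example, and no windowing or pre-emphasis is applied.

```python
def autocorrelation(frame, order):
    """Return autocorrelation values R(0)..R(order) of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Assumed convention: s[n] is predicted as sum_{k=1..order} a[k] * s[n-k].
    Returns (coefficients a[1..order], final prediction error energy).
    r[0] must be non-zero, so an all-zero (silent) frame is not handled here.
    """
    a = [0.0] * (order + 1)      # a[0] is unused padding
    err = r[0]                   # zeroth-order prediction error
    for i in range(1, order + 1):
        # Reflection coefficient k_i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Update the predictor coefficients from order i-1 to order i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


# Hypothetical autocorrelation values of a first-order process (R(k) = 0.5**k)
coeffs, error = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For these illustrative autocorrelation values, the second coefficient comes out as zero, since a first-order predictor already captures all of the correlation.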

References:

  1. Thomas F. Quatieri, “Discrete-Time Speech Signal Processing”, Prentice Hall / Pearson Education.
  2. Ben Gold and Nelson Morgan, “Speech and Audio Signal Processing”, John Wiley and Sons Inc.
  3. L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals”, Prentice Hall.
  4. L.R. Rabiner and B. H. Juang, “Fundamentals of Speech Recognition”, Prentice Hall.
  5. J. R. Deller, J. H. L. Hansen and J. G. Proakis, “Discrete-Time Processing of Speech Signals”, John Wiley, IEEE Press.
  6. J. L. Flanagan, “Speech Analysis, Synthesis and Perception”, Springer-Verlag.
  7. Digital Speech Processing Course by Rabiner
  8. Speech signal Processing IIT Kanpur

Practical:

There will be 4-6 experiments based on the following topics:

Spectral analysis, Time-Frequency analysis, Pitch extraction, Formant tracking, Speech enhancement, Audio coding, Speaker recognition

All lab work will be performed in MATLAB or similar software capable of processing speech signals.
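
For students working outside MATLAB, the time-domain measures from unit 2 can be sketched in plain Python as shown below. This is only an illustrative sketch and not part of the official labsheets: the helper names, the 50 ms frame length, the 60-400 Hz pitch search range and the synthetic 100 Hz test tone are all assumptions chosen for the example.

```python
import math


def split_frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorr_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate pitch (Hz) as fs divided by the lag of the autocorrelation peak.

    The search range f_min..f_max is an assumed typical range for voiced speech.
    """
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag


# Synthetic stand-in for a voiced segment: a 100 Hz tone sampled at 8 kHz
fs = 8000
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
frame = split_frames(tone, frame_len=400, hop=200)[0]   # one 50 ms frame

energy = short_time_energy(frame)
zcr = zero_crossing_rate(frame)
pitch = autocorr_pitch(frame, fs)
```

On real speech, a frame with low energy and a high zero-crossing rate is usually unvoiced or silent, which is the basis of silence discrimination using ZCR and energy. The pitch estimate is only meaningful for voiced frames.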

  1. Labsheet 1
  2. Labsheet 2
  3. Labsheet 3

MATLAB Functionality for Digital Speech Processing

Past exam papers