senselab.audio.tasks.features_extraction

Features extraction

Task Overview

This module provides the API for extracting voice and speech features from audio recordings using the senselab package. Features span multiple speech subsystems and clinical constructs (fluency, respiration, phonation, articulation, and spectral characteristics) and are derived using trusted libraries including Praat-Parselmouth, OpenSMILE, Torchaudio, and Torchaudio-SQUIM.

The following table summarizes the currently supported features, categorized by speech subsystem or clinical construct, and includes a description, units, implementation reference, and implementation status.

Speech Subsystem / Clinical Construct	Feature	Description	Unit	Implementation	Implemented
Fluency	Duration	Total length of the audio recording	sec	Praat Parselmouth (docs)	✅
Fluency	Phonation Time	Length of all phonated sounds within the audio	sec	N/A	No
Fluency	Phonation Ratio	Phonation time divided by duration	--	Praat Parselmouth (docs)	✅
Fluency	Mean Phrase Duration	Average duration of a phrase (continuous speech between pauses)	sec	N/A	No
Fluency	Coefficient of Variance Phrase Duration	Normalized variability of phrase durations	--	N/A	No
Fluency	Number of Spoken Units	Number of spoken units (phonemes, syllables, or words) identified in the audio	--	N/A	No
Fluency	Mean Unit Duration	Phonation Time divided by the number of spoken units	sec	N/A	No
Fluency	Speaking Rate	Number of spoken units divided by duration	unit sec⁻¹	Praat Parselmouth (docs)	✅
Fluency	Articulation Rate	Number of spoken units divided by phonation time	unit sec⁻¹	Praat Parselmouth (docs)	✅
Fluency	Mean Length of Run	Average number of units produced in runs between silences	--	N/A	No
Fluency	Number Pauses	Number of filled/silent pauses in a recording	unit	N/A	No
Fluency	Pause Rate	Number of pauses divided by duration	unit sec⁻¹	Praat Parselmouth (docs)	✅
Fluency	Pause Ratio	Total pause time divided by audio recording duration	--	N/A	No
Fluency	Mean Pause Duration	Average duration of pauses (filled/silent)	sec	Praat Parselmouth (docs)	✅
Fluency	Coefficient of Variance Pause Duration	Normalized variability in pause durations	--	N/A	No
Fluency	Mean Phone Length	Average duration of phones	sec	N/A	No
Fluency	Phoneme-Dependent Duration	Linear combination of average phone durations	sec	N/A	No
Fluency	Voice Onset Time (VOT)	Time between release of a stop consonant and the onset of vocal fold vibration	sec	N/A	No
Fluency	Maximum Phonation Time	Maximum duration of a continuous phonation (usually a vowel)	sec	N/A	No
Fluency	Pairwise Variability Index	Temporal variability between successive speech unit intervals	--	N/A	No
Respiration	Intensity	Sum of the squares of the signal amplitude (approximates loudness)	dB	Praat Parselmouth (docs)	✅
Respiration	Intensity Range	Range of loudness values in a speech signal	dB	Praat Parselmouth (docs)	✅
Respiration	Voice Range Profile (VRP)	Minimum and maximum intensity across a set of frequencies	dBHz⁻¹	N/A	No
Respiration	Number of Breath Events	Count of inhalations in a recording	--	N/A	No
Respiration	Speech Respiration Rate	Respiratory rate during speech	unit sec⁻¹	N/A	No
Respiration	Speech Tidal Volume	Amount of air inhaled during a typical breath for speech	mL	N/A	No
Respiration	Pause Intervals per Respiration	Measure of breathing periodicity	--	N/A	No
Respiration	Relative Loudness of Respiration	Ratio of respiration loudness relative to speech intensity	--	N/A	No
Respiration	Respiratory Exchange Latency	Time interval between expiration and the subsequent inspiration	sec	N/A	No
Phonation	Fundamental Frequency (F0)	Rate of vocal-fold vibration (perceived as pitch)	Hz	Praat Parselmouth (docs)	✅
Phonation	Pitch Sigma	Standard deviation of F0, expressed in semitones	Semitones	N/A	No
Phonation	Jitter (Absolute)	Average absolute difference between consecutive F0 periods	sec	Praat Parselmouth (docs)	✅
Phonation	Jitter (Relative)	Absolute jitter divided by the average F0 period	%	Praat Parselmouth (docs)	✅
Phonation	Shimmer (local)	Average absolute amplitude difference between consecutive F0 periods (relative measure)	%	Praat Parselmouth (docs)	✅
Phonation	Shimmer (dB)	Difference in amplitude between consecutive F0 periods, expressed in dB	dB	Praat Parselmouth (docs)	✅
Phonation	Harmonic to Noise Ratio	Ratio of harmonic energy to noise energy in voiced segments	dB	Praat Parselmouth (docs)	✅
Phonation	Percentage of Unvoiced Frames	Fraction of pitch frames detected as unvoiced	%	N/A	No
Phonation	Number of Voice Breaks	Count of interruptions in the fundamental period during sustained phonation	--	N/A	No
Phonation	Degree of Voice Breaks	Total duration of voice breaks relative to total signal duration	%	N/A	No
Phonation	Hammarberg Index	Difference between the strongest energy peaks in two spectral ranges (0–2000 Hz and 2000–5000 Hz)	dB	N/A	No
Phonation	Spectral Slope	Slope of the long-term average spectrum	dB/octave	Praat Parselmouth (docs)	✅
Phonation	Spectral Tilt	Tilt of the regression line through the long-term average spectrum	--	Praat Parselmouth (docs)	✅
Phonation	Cepstral Peak Prominence	Integrative measure of temporal aperiodicity and spectral variation	dB	Praat Parselmouth (docs)	✅
Phonation	H1–H2	Difference between the levels of the first two harmonics	dB	N/A	No
Phonation	H1*–H2*	Difference between the levels of the first two harmonics after correcting for formant influence	dB	N/A	No
Phonation	Harmonic Richness Factor	Amplitude relationship between the fundamental and higher harmonics	dB	N/A	No
Phonation	Parabolic Spectral Parameter	Quantifies the spectral decay of the voice source	--	N/A	No
Phonation	Open Quotient	Ratio of the open phase of the glottal pulse to the fundamental period	--	N/A	No
Phonation	Closing Quotient	Ratio of the glottal closing phase to the fundamental period	--	N/A	No
Phonation	Speed Quotient	Ratio between the durations of glottal opening and closing phases	--	N/A	No
Phonation	Normalized Amplitude Quotient	Ratio between the amplitude of the airflow and the peak flow derivative, normalized by period length	--	N/A	No
Articulation	Formant Frequencies	Center frequencies of vocal tract resonance peaks	Hz	Praat Parselmouth (docs)	✅
Articulation	Formant Bandwidths	Width of the spectral peak (3 dB down from the resonance peak)	Hz	Praat Parselmouth (docs)	✅
Articulation	Formant Slopes	Rate of change in formant frequencies over time	Hz/ms	N/A	No
Articulation	Vocal Tract Coordination	Cross-correlation between formant trajectories at set time delays	--	N/A	No
Articulation	Vowel Space Area	Area of the quadrilateral defined by the four corner vowels in the F1–F2 space	Hz²	N/A	No
Articulation	Formant Centralization Ratio (FCR)	Ratio combining F1 and F2 values of corner vowels (/a/, /u/, /i/) as defined in the literature	--	N/A	No
Articulation	Vowel Articulation Index (VAI)	Reciprocal of the Formant Centralization Ratio	--	N/A	No
Articulation	Goodness of Pronunciation	Posterior probabilities from an acoustic model reflecting pronunciation quality	--	N/A	No
Articulation	Wideband Perceptual Evaluation of Speech Quality (PESQ)	Objective measure of speech quality based on perceptual modeling	--	Torchaudio-SQUIM (docs)	✅
Articulation	Short-Time Objective Intelligibility (STOI)	Predicts speech intelligibility by comparing short-time temporal envelopes of reference and degraded signals	--	Torchaudio-SQUIM (docs)	✅
Articulation	Scale-Invariant Signal-to-Distortion Ratio (SI-SDR)	Signal fidelity measure that is invariant to signal scale	dB	Torchaudio-SQUIM (docs)	✅
Articulation	Mean Opinion Score (MOS)	Estimate of subjective audio quality predicted by a neural network model trained on human ratings	--	Torchaudio-SQUIM (docs)	✅
Spectral	Spectral Gravity	Spectral centroid (center of gravity) of the signal	Hz	Praat Parselmouth (docs)	✅
Spectral	Spectral Deviation	Spread of spectral energy around the centroid (second moment)	Hz	Praat Parselmouth (docs)	✅
Spectral	Spectral Skewness	Asymmetry of the spectral energy distribution (third moment)	--	Praat Parselmouth (docs)	✅
Spectral	Spectral Kurtosis	Peakedness of the spectral distribution (fourth moment)	--	Praat Parselmouth (docs)	✅
Spectral	Mel Frequency Cepstral Coefficients	Multivariate spectral representation based on the Mel frequency scale	--	Torchaudio (docs)	✅
Spectral	Linear Predictive Cepstral Coefficients	Cepstral coefficients derived through Linear Predictive Coding	--	N/A	No
Spectral	Perceptual Linear Prediction	Spectral representation based on the Bark scale with equal-loudness pre-emphasis	--	N/A	No
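
Several fluency rows in the table above are simple ratios of a few timing measurements. The sketch below works through those definitions on hand-supplied numbers. It is not senselab's implementation: the names `FluencyMeasures` and `fluency_from_timings` are hypothetical, and it assumes that everything outside a detected pause counts as phonation, so phonation time is taken as duration minus total pause time.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FluencyMeasures:
    """Illustrative container; field names are hypothetical, not senselab's."""

    phonation_time: float       # sec
    phonation_ratio: float      # dimensionless
    speaking_rate: float        # units per sec
    articulation_rate: float    # units per sec
    pause_rate: float           # pauses per sec
    mean_pause_duration: float  # sec


def fluency_from_timings(
    duration: float, pause_durations: Sequence[float], n_units: int
) -> FluencyMeasures:
    """Apply the table's definitions to already-measured timings.

    Assumption: phonation time = duration - total pause time.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    total_pause = sum(pause_durations)
    phonation_time = duration - total_pause
    if phonation_time <= 0:
        raise ValueError("pauses cannot cover the whole recording")
    n_pauses = len(pause_durations)
    return FluencyMeasures(
        phonation_time=phonation_time,
        # Phonation time divided by duration
        phonation_ratio=phonation_time / duration,
        # Number of spoken units divided by duration
        speaking_rate=n_units / duration,
        # Number of spoken units divided by phonation time
        articulation_rate=n_units / phonation_time,
        # Number of pauses divided by duration
        pause_rate=n_pauses / duration,
        # Average pause length; 0.0 when no pauses were detected
        mean_pause_duration=total_pause / n_pauses if n_pauses else 0.0,
    )


# A 10-second recording with three pauses and 24 spoken units (made-up values).
example = fluency_from_timings(10.0, [0.5, 1.0, 0.5], 24)
print(example)
```

Because speaking rate divides by the full duration and articulation rate divides only by phonation time, the articulation rate is never lower than the speaking rate for the same recording.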

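The jitter and shimmer rows in the table can likewise be illustrated from a sequence of glottal period lengths and per-period peak amplitudes. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the inputs are assumed to be already extracted, and the formulas follow the table's wording (mean absolute difference between consecutive periods, optionally normalized by the mean). Praat-Parselmouth applies additional period-validity checks that are not reproduced here.

```python
import math
from typing import Sequence, Tuple


def _mean_abs_successive_diff(values: Sequence[float]) -> float:
    """Mean of |x[i+1] - x[i]| over consecutive pairs."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def jitter(periods: Sequence[float]) -> Tuple[float, float]:
    """Return (absolute jitter in sec, relative jitter in %)."""
    absolute = _mean_abs_successive_diff(periods)
    mean_period = sum(periods) / len(periods)
    return absolute, 100.0 * absolute / mean_period


def shimmer(amplitudes: Sequence[float]) -> Tuple[float, float]:
    """Return (local shimmer in %, shimmer in dB).

    Amplitudes must be strictly positive so the dB ratio is defined.
    """
    if any(a <= 0 for a in amplitudes):
        raise ValueError("amplitudes must be positive")
    local = 100.0 * _mean_abs_successive_diff(amplitudes) / (
        sum(amplitudes) / len(amplitudes)
    )
    # dB variant: mean absolute log-ratio of consecutive amplitudes
    db_steps = [
        abs(20.0 * math.log10(b / a)) for a, b in zip(amplitudes, amplitudes[1:])
    ]
    return local, sum(db_steps) / len(db_steps)


# Made-up values: periods alternating between 10 ms and 11 ms,
# amplitudes alternating between 1.0 and 0.9.
jitter_abs, jitter_rel = jitter([0.010, 0.011, 0.010, 0.011])
shimmer_local, shimmer_db = shimmer([1.0, 0.9, 1.0, 0.9])
print(jitter_abs, jitter_rel, shimmer_local, shimmer_db)
```

Absolute jitter keeps the unit of the periods (sec), while the relative variants are percentages, which matches the units listed for these rows.
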
Beyond the descriptors listed above, users can extract additional acoustic representations such as:

Note: This section is actively under development. Coming updates will address usability, efficiency, clarity, robustness, and overall effectiveness. We welcome any feedback—feel free to reach out via email at fabiocat@mit.edu or open an issue on GitHub.

""".. include:: ./doc.md"""  # noqa: D415

from .api import extract_features_from_audios  # noqa: F401