senselab.audio.tasks.features_extraction

Features extraction

Task Overview

This module provides the API for extracting voice and speech features from audio recordings using the senselab package. Features span multiple speech subsystems and clinical constructs (such as fluency, respiration, phonation, articulation, and spectral characteristics) and are derived using trusted libraries including Praat-Parselmouth, OpenSMILE, Torchaudio, and Torchaudio-SQUIM.

The following table summarizes the currently supported features, categorized by speech subsystem or clinical construct, and includes a description, units, implementation reference, and implementation status.

| Speech Subsystem / Clinical Construct | Feature | Description | Unit | Implementation | Implemented |
|---|---|---|---|---|---|
| Fluency | Duration | Total length of the audio recording | sec | Praat Parselmouth (docs) | Yes |
| Fluency | Phonation Time | Length of all phonated sounds within the audio | sec | N/A | No |
| Fluency | Phonation Ratio | Phonation time divided by duration | -- | Praat Parselmouth (docs) | Yes |
| Fluency | Mean Phrase Duration | Average duration of a phrase (continuous speech between pauses) | sec | N/A | No |
| Fluency | Coefficient of Variance Phrase Duration | Normalized variability of phrase durations | -- | N/A | No |
| Fluency | Number of Spoken Units | Number of spoken units (phonemes, syllables, or words) identified in the audio | -- | N/A | No |
| Fluency | Mean Unit Duration | Phonation time divided by the number of spoken units | sec | N/A | No |
| Fluency | Speaking Rate | Number of spoken units divided by duration | unit sec⁻¹ | Praat Parselmouth (docs) | Yes |
| Fluency | Articulation Rate | Number of spoken units divided by phonation time | unit sec⁻¹ | Praat Parselmouth (docs) | Yes |
| Fluency | Mean Length of Run | Average number of units produced in runs between silences | -- | N/A | No |
| Fluency | Number Pauses | Number of filled/silent pauses in a recording | unit | N/A | No |
| Fluency | Pause Rate | Number of pauses divided by duration | unit⁻¹ | Praat Parselmouth (docs) | Yes |
| Fluency | Pause Ratio | Total pause time divided by audio recording duration | -- | N/A | No |
| Fluency | Mean Pause Duration | Average duration of pauses (filled/silent) | sec | Praat Parselmouth (docs) | Yes |
| Fluency | Coefficient of Variance Pause Duration | Normalized variability in pause durations | -- | N/A | No |
| Fluency | Mean Phone Length | Average duration of phones | sec | N/A | No |
| Fluency | Phoneme-Dependent Duration | Linear combination of average phone durations | sec | N/A | No |
| Fluency | Voice Onset Time (VOT) | Time between release of a stop consonant and the onset of vocal fold vibration | sec | N/A | No |
| Fluency | Maximum Phonation Time | Maximum duration of a continuous phonation (usually a vowel) | sec | N/A | No |
| Fluency | Pairwise Variability Index | Temporal variability between successive speech unit intervals | -- | N/A | No |
| Respiration | Intensity | Sum of the squares of the signal amplitude (approximates loudness) | dB | Praat Parselmouth (docs) | Yes |
| Respiration | Intensity Range | Range of loudness values in a speech signal | -- | Praat Parselmouth (docs) | Yes |
| Respiration | Voice Range Profile (VRP) | Minimum and maximum intensity across a set of frequencies | dBHz⁻¹ | N/A | No |
| Respiration | Number of Breath Events | Count of inhalations in a recording | -- | N/A | No |
| Respiration | Speech Respiration Rate | Respiratory rate during speech | unit⁻¹ | N/A | No |
| Respiration | Speech Tidal Volume | Amount of air inhaled during a typical breath for speech | mL | N/A | No |
| Respiration | Pause Intervals per Respiration | Measure of breathing periodicity | -- | N/A | No |
| Respiration | Relative Loudness of Respiration | Ratio of respiration loudness relative to speech intensity | -- | N/A | No |
| Respiration | Respiratory Exchange Latency | Time interval between expiration and the subsequent inspiration | sec | N/A | No |
| Phonation | Fundamental Frequency (F0) | Rate of vocal-fold vibration (perceived as pitch) | Hz | Praat Parselmouth (docs) | Yes |
| Phonation | Pitch Sigma | Standard deviation of F0, expressed in semitones | Semitones | N/A | No |
| Phonation | Jitter (Absolute) | Average absolute difference between consecutive F0 periods | sec | Praat Parselmouth (docs) | Yes |
| Phonation | Jitter (Relative) | Absolute jitter divided by the average F0 period | % | Praat Parselmouth (docs) | Yes |
| Phonation | Shimmer (local) | Average absolute amplitude difference between consecutive F0 periods (relative measure) | % | Praat Parselmouth (docs) | Yes |
| Phonation | Shimmer (dB) | Difference in amplitude between consecutive F0 periods, expressed in dB | dB | Praat Parselmouth (docs) | Yes |
| Phonation | Harmonic to Noise Ratio | Ratio of harmonic energy to noise energy in voiced segments | dB | Praat Parselmouth (docs) | Yes |
| Phonation | Percentage of Unvoiced Frames | Fraction of pitch frames detected as unvoiced | % | N/A | No |
| Phonation | Number of Voice Breaks | Count of interruptions in the fundamental period during sustained phonation | -- | N/A | No |
| Phonation | Degree of Voice Breaks | Total duration of voice breaks relative to total signal duration | % | N/A | No |
| Phonation | Hammarberg Index | Difference between dominant frequencies in two spectral ranges (0–2000 Hz and 2000–5000 Hz) | Hz | N/A | No |
| Phonation | Spectral Slope | Slope of the long-term average spectrum | dB/octave | Praat Parselmouth (docs) | Yes |
| Phonation | Spectral Tilt | Tilt of the regression line through the long-term average spectrum | -- | Praat Parselmouth (docs) | Yes |
| Phonation | Cepstral Peak Prominence | Integrative measure of temporal aperiodicity and spectral variation | dB | Praat Parselmouth (docs) | Yes |
| Phonation | H1–H2 | Difference between the levels of the first two harmonics | dB | N/A | No |
| Phonation | H1*–H2* | Difference between the first two harmonics after removing formant influence | dB | N/A | No |
| Phonation | Harmonic Richness Factor | Amplitude relationship between the fundamental and higher harmonics | dB | N/A | No |
| Phonation | Parabolic Spectral Parameter | Quantifies the spectral decay of the voice source | -- | N/A | No |
| Phonation | Open Quotient | Ratio of the open phase of the glottal pulse to the fundamental period | -- | N/A | No |
| Phonation | Closing Quotient | Ratio of the glottal closing phase to the fundamental period | -- | N/A | No |
| Phonation | Speed Quotient | Ratio between the durations of glottal opening and closing phases | -- | N/A | No |
| Phonation | Normalized Amplitude Quotient | Ratio between the amplitude of the airflow and the peak flow derivative, normalized by period length | -- | N/A | No |
| Articulation | Formant Frequencies | Center frequencies of vocal tract resonance peaks | Hz | Praat Parselmouth (docs) | Yes |
| Articulation | Formant Bandwidths | Width of the spectral peak (3 dB down from the resonance peak) | Hz | Praat Parselmouth (docs) | Yes |
| Articulation | Formant Slopes | Rate of change in formant frequencies over time | Hz/ms | N/A | No |
| Articulation | Vocal Tract Coordination | Cross-correlation between formant trajectories at set time delays | -- | N/A | No |
| Articulation | Vowel Space Area | Area of the quadrilateral defined by the four corner vowels in the F1–F2 space | -- | N/A | No |
| Articulation | Formant Centralization Ratio (FCR) | Ratio combining F1 and F2 values of corner vowels (/a/, /u/, /i/) as defined in the literature | -- | N/A | No |
| Articulation | Vowel Articulation Index (VAI) | Reciprocal of the Formant Centralization Ratio | -- | N/A | No |
| Articulation | Goodness of Pronunciation | Posterior probabilities from an acoustic model reflecting pronunciation quality | -- | N/A | No |
| Articulation | Wideband Perceptual Evaluation of Speech Quality (PESQ) | Objective measure of speech quality based on perceptual modeling | -- | Torchaudio-SQUIM (docs) | Yes |
| Articulation | Short-Time Objective Intelligibility (STOI) | Predicts speech intelligibility by comparing short-time temporal envelopes of reference and degraded signals | -- | Torchaudio-SQUIM (docs) | Yes |
| Articulation | Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) | Signal fidelity measure that is invariant to signal scale | dB | Torchaudio-SQUIM (docs) | Yes |
| Articulation | Mean Opinion Score (MOS) | Subjective estimate of audio quality rated by a neural network model trained on human ratings | -- | Torchaudio-SQUIM (docs) | Yes |
| Spectral | Spectral Gravity | Spectral centroid (center of gravity) of the signal | Hz | Praat Parselmouth (docs) | Yes |
| Spectral | Spectral Deviation | Spread of spectral energy around the centroid (second moment) | Hz | Praat Parselmouth (docs) | Yes |
| Spectral | Spectral Skewness | Asymmetry of the spectral energy distribution (third moment) | Hz | Praat Parselmouth (docs) | Yes |
| Spectral | Spectral Kurtosis | Flatness (peakedness) of the spectral distribution (fourth moment) | Hz | Praat Parselmouth (docs) | Yes |
| Spectral | Mel Frequency Cepstral Coefficients | Multivariate spectral representation based on the Mel frequency scale | -- | Torchaudio (docs) | Yes |
| Spectral | Linear Predictive Cepstral Coefficients | Cepstral coefficients derived through Linear Predictive Coding | -- | N/A | No |
| Spectral | Perceptual Linear Prediction | Spectral representation based on the Bark scale with equal-loudness pre-emphasis | -- | N/A | No |
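
Several of the fluency rows in the table are simple ratios of counts and durations. The sketch below works through those definitions in plain Python so the relationships between them are easy to check by hand. It is only an illustration: `FluencyCounts` and `fluency_measures` are hypothetical names that are not part of senselab, and the inputs are assumed to have been measured already (senselab's own values come from Praat-Parselmouth).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FluencyCounts:
    """Hypothetical container for pre-measured quantities (not a senselab class)."""

    duration_s: float               # total length of the recording, in seconds
    phonation_time_s: float         # summed length of phonated segments, in seconds
    n_units: int                    # number of spoken units (e.g. syllables)
    pause_durations_s: List[float]  # one entry per detected pause, in seconds


def fluency_measures(c: FluencyCounts) -> Dict[str, float]:
    """Apply the ratio definitions from the feature table to measured counts."""
    if c.duration_s <= 0 or c.phonation_time_s <= 0:
        raise ValueError("duration and phonation time must be positive")

    n_pauses = len(c.pause_durations_s)
    # Mean pause duration is undefined without pauses; report 0.0 in that case.
    mean_pause = sum(c.pause_durations_s) / n_pauses if n_pauses else 0.0

    return {
        # Phonation time divided by duration (dimensionless).
        "phonation_ratio": c.phonation_time_s / c.duration_s,
        # Spoken units divided by total duration (units per second).
        "speaking_rate": c.n_units / c.duration_s,
        # Spoken units divided by phonation time (units per second).
        "articulation_rate": c.n_units / c.phonation_time_s,
        # Number of pauses divided by total duration (pauses per second).
        "pause_rate": n_pauses / c.duration_s,
        # Average pause length (seconds).
        "mean_pause_duration": mean_pause,
    }
```

Because phonation time never exceeds the total duration, the articulation rate computed this way is always at least as large as the speaking rate for the same recording.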

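The jitter and shimmer rows in the table describe cycle-to-cycle perturbation of period length and peak amplitude. As a rough illustration of those definitions, the following sketch computes them from two already-extracted sequences. The function name `perturbation_measures` and its inputs are assumptions made for this example; it does not reproduce senselab's or Praat's implementation, which also handle pitch tracking and voicing decisions.

```python
import math
from typing import Dict, Sequence


def _mean_abs_consecutive_diff(values: Sequence[float]) -> float:
    """Average absolute difference between neighbouring elements."""
    pairs = zip(values[:-1], values[1:])
    return sum(abs(b - a) for a, b in pairs) / (len(values) - 1)


def perturbation_measures(
    periods_s: Sequence[float], amplitudes: Sequence[float]
) -> Dict[str, float]:
    """Jitter and shimmer from per-cycle periods (seconds) and peak amplitudes."""
    if len(periods_s) < 2 or len(amplitudes) < 2:
        raise ValueError("need at least two cycles")
    if min(amplitudes) <= 0:
        raise ValueError("amplitudes must be positive")

    mean_period = sum(periods_s) / len(periods_s)
    mean_amplitude = sum(amplitudes) / len(amplitudes)

    jitter_abs = _mean_abs_consecutive_diff(periods_s)
    shimmer_abs = _mean_abs_consecutive_diff(amplitudes)
    # Shimmer in dB: mean absolute log-ratio of consecutive amplitudes.
    shimmer_db = sum(
        abs(20.0 * math.log10(b / a))
        for a, b in zip(amplitudes[:-1], amplitudes[1:])
    ) / (len(amplitudes) - 1)

    return {
        "jitter_absolute_s": jitter_abs,
        "jitter_relative_pct": 100.0 * jitter_abs / mean_period,
        "shimmer_local_pct": 100.0 * shimmer_abs / mean_amplitude,
        "shimmer_db": shimmer_db,
    }
```

A perfectly periodic signal with constant amplitude yields zero for all four values, which is a convenient sanity check when experimenting with the definitions.
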
Beyond the descriptors listed above, users can extract additional acoustic representations such as:

Note: This section is under active development. Upcoming updates will address usability, efficiency, clarity, robustness, and overall effectiveness. We welcome any feedback; feel free to reach out via email at fabiocat@mit.edu or open an issue on GitHub.

""".. include:: ./doc.md"""  # noqa: D415

from .api import extract_features_from_audios  # noqa: F401