senselab
This Python package streamlines, optimizes, and enforces best open-science practices for processing and analyzing _behavioral data_ (primarily voice and speech, but also text and video) using robust reproducible pipelines and utilities.
Quick start
```python
from senselab.audio.data_structures import Audio
from senselab.audio.tasks.preprocessing import resample_audios
from senselab.audio.tasks.features_extraction import extract_features_from_audios
from senselab.audio.tasks.speech_to_text import transcribe_audios

audio = Audio(filepath='path_to_audio_file.wav')
print(audio.sampling_rate)
# ➡️ 44100

[resampled_audio] = resample_audios([audio], resample_rate=16000)
print(resampled_audio.sampling_rate)
# ➡️ 16000

audio_features = extract_features_from_audios([audio])
print(audio_features[0].keys())
# ➡️ dict_keys(['opensmile', 'praat_parselmouth', 'torchaudio', 'torchaudio_squim', ...])

transcript = transcribe_audios([audio])
print(transcript)
# ➡️ "The quick brown fox jumps over the lazy dog."
```
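As an intuition for what `resample_audios` does above: resampling from 44100 Hz to 16000 Hz scales the number of samples by the rate ratio. The following is a plain-arithmetic sketch of that relationship, independent of senselab's actual implementation:

```python
# Plain-arithmetic sketch (not senselab's resampler): resampling scales the
# number of samples by the ratio of the new rate to the original rate.
def resampled_length(n_samples: int, orig_rate: int, new_rate: int) -> int:
    return round(n_samples * new_rate / orig_rate)

# One second at 44.1 kHz (44100 samples) becomes 16000 samples at 16 kHz.
assert resampled_length(44100, 44100, 16000) == 16000
```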
For more detailed information, check out our Documentation and our Tutorials.
💡 Tip: Many tutorials include Google Colab badges, so you can try them instantly without installing anything on your local machine.
Why should you use senselab?
- Modular design: Easily integrate or use standalone transformations for flexible data manipulation.
- Pre-built pipelines: Access pre-configured pipelines to reduce setup time and effort.
- Reproducibility: Ensure consistent and verifiable results with fixed seeds and version-controlled steps.
- Easy integration: Seamlessly fit into existing workflows with minimal configuration.
- Extensible: Modify and contribute custom transformations and pipelines to meet specific research needs.
- Comprehensive documentation: Detailed guides, examples, and documentation for all features and modules.
- Performance optimized: Efficiently process large datasets with optimized code and algorithms.
- Interactive examples: Jupyter notebooks provide practical examples for deriving insights from real-world datasets.
- senselab AI: Interact with your data through an AI-based chatbot. The AI agent generates and runs senselab-based code for you, making exploration easier and giving you both the results and the code used to produce them (perfect for quick experiments or for users who prefer not to code).
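The reproducibility point above (fixed seeds) can be illustrated with a generic stdlib sketch; this is not senselab's API, just the general idea that seeding every source of randomness makes results verifiable:

```python
import random

def noisy_pipeline(seed: int, n: int = 5) -> list:
    """Toy 'pipeline' whose randomness is fully determined by the seed."""
    rng = random.Random(seed)  # local RNG: the seed is explicit and isolated
    return [round(rng.random(), 6) for _ in range(n)]

# Same seed -> identical, verifiable results across runs.
assert noisy_pipeline(42) == noisy_pipeline(42)
assert noisy_pipeline(42) != noisy_pipeline(7)
```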
⚠️ System Requirements
If on macOS, this package requires an ARM64 architecture due to PyTorch 2.2.2+ dropping support for x86-64 on macOS.
❌ Unsupported systems include:
- macOS (Intel x86-64)
- Other platforms where dependencies are unavailable
To check your system compatibility, please run this command:

```shell
python -c "import platform; print(platform.machine())"
```
If the output is:
- arm64 → ✅ Your system is compatible.
- x86_64 → ❌ Your system is not supported.
If you attempt to install this package on an unsupported system, the installation or execution will fail.
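The architecture check above can also be done programmatically. A minimal stdlib sketch (the function name is ours, not part of senselab):

```python
import platform
from typing import Optional

def is_supported(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """senselab requires ARM64 on macOS, since PyTorch 2.2.2+ dropped x86-64 wheels there."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return not (system == "Darwin" and machine != "arm64")

# Intel macOS is the unsupported combination.
assert not is_supported("Darwin", "x86_64")
assert is_supported("Darwin", "arm64")
```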
- FFmpeg is required by some audio and video dependencies (e.g., torchaudio). Please make sure you have FFmpeg properly installed on your machine before installing and using senselab (see here for detailed platform-dependent instructions).
- CUDA libraries matching the CUDA version expected by the PyTorch wheels are required (e.g., the latest PyTorch 2.8 expects CUDA 12.8). To install those with conda, run:

```shell
conda config --add channels nvidia
conda install -y nvidia/label/cuda-12.8.1::cuda-libraries-dev
```
- Docker is required and must be running for some video models (e.g., MediaPipe-based estimators). Please follow the official installation instructions for your platform: Install Docker.
Some functionalities rely on HuggingFace models, and increasingly, models require authentication and signed license agreements. Instructions on how to generate a Hugging Face access token can be found here: https://huggingface.co/docs/hub/security-tokens
You can provide your HuggingFace token either by exporting it in your shell:

```shell
export HF_TOKEN=your_token_here
```

or by adding it to your .env file (see .env.example for reference).
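Either way, the token ends up available to your code. A minimal sketch of how such a lookup could work (the helper name and the simple KEY=value parsing are our assumptions for illustration, not senselab's internals):

```python
import os
from typing import Optional

def load_hf_token(env_file: str = ".env") -> Optional[str]:
    """Prefer an exported HF_TOKEN; fall back to a KEY=value line in a .env file."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    try:
        with open(env_file) as f:
            for line in f:
                line = line.strip()
                if line.startswith("HF_TOKEN="):
                    return line.split("=", 1)[1] or None
    except FileNotFoundError:
        pass
    return None
```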
Installation
Install this package via:

```shell
pip install 'senselab[all]'
```

Or get the newest development version via:

```shell
pip install 'git+https://github.com/sensein/senselab.git#egg=senselab[all]'
```

If you want to install only the audio dependencies, run:

```shell
pip install 'senselab[audio]'
```

To install the articulatory, video, text, and senselab-ai extras, run:

```shell
pip install 'senselab[articulatory,video,text,senselab-ai]'
```
senselab AI (our AI-based chatbot)
Development (with poetry):

```shell
poetry install --extras "senselab-ai"
poetry run senselab-ai
```

Production (with pip):

```shell
pip install 'senselab[senselab-ai]'
senselab-ai
```

Once started, you can open the provided JupyterLab interface, set up the agent, chat with it, and let it create and execute code for you.
For a walkthrough, see: tutorials/senselab-ai/senselab_ai_intro.ipynb.
Contributing
We welcome contributions from the community! Before contributing, please review our CONTRIBUTING.md.
Funding
senselab is mostly supported by the following organizations and initiatives:
- McGovern Institute ICON Fellowship
- NIH Bridge2AI Precision Public Health (OT2OD032720)
- Child Mind Institute
- ReadNet Project
- Chris and Lann Woehrle Psychiatric Fund
Acknowledgments
senselab builds on the work of many open-source projects. We gratefully acknowledge the developers and maintainers of the following key dependencies:
- PyTorch, Torchvision, Torchaudio _deep learning framework and audio/vision extensions_
- Transformers, Datasets, Accelerate, Huggingface Hub _training and inference utilities plus (pre-)trained models and datasets_
- Scikit-learn, UMAP-learn _machine learning utilities_
- Matplotlib _visualization toolkit_
- Praat-Parselmouth, OpenSMILE, SpeechBrain, SPARC, Pyannote-audio, Coqui-TTS, NVIDIA NeMo, Vocos, Audiomentations, Torch-audiomentations _speech and audio processing tools_
- NLTK, Sentence-Transformers, Pylangacq, Jiwer _text and language processing tools_
- OpenCV, Ultralytics, mediapipe, Python-ffmpeg, AV _computer vision and pose estimation_
- Pydra, Pydantic, Iso639, PyCountry, Nest-asyncio _workflow, validation, and utilities_
- Ipywidgets, IPykernel, Nbformat, Nbss-upload, Notebook-intelligence _Jupyter and notebook-related tools_
We are thankful to the open-source community for enabling this project! 🙏
1""".. include:: ../../README.md""" # noqa: D415 2 3import asyncio 4import platform 5from multiprocessing import set_start_method 6 7import nest_asyncio 8 9# Raise error on incompatible macOS architecture 10if platform.system() == "Darwin" and platform.machine() != "arm64": 11 raise RuntimeError( 12 "Error: This package requires an ARM64 architecture on macOS " 13 "since PyTorch 2.2.2+ does not support x86-64 on macOS." 14 ) 15 16 17# Conditionally apply nest_asyncio to avoid uvloop conflict 18def safe_apply_nest_asyncio() -> None: 19 """Apply nest_asyncio to avoid uvloop conflict.""" 20 try: 21 loop = asyncio.get_event_loop() 22 if "uvloop" not in str(type(loop)): 23 nest_asyncio.apply() 24 except Exception as e: 25 print(f"nest_asyncio not applied: {e}") 26 27 28safe_apply_nest_asyncio() 29 30from senselab.utils.data_structures.pydra_helpers import * # NOQA 31 32# Ensure multiprocessing start method is 'spawn' 33try: 34 set_start_method("spawn", force=True) 35except RuntimeError: 36 pass # Method already set
19def safe_apply_nest_asyncio() -> None: 20 """Apply nest_asyncio to avoid uvloop conflict.""" 21 try: 22 loop = asyncio.get_event_loop() 23 if "uvloop" not in str(type(loop)): 24 nest_asyncio.apply() 25 except Exception as e: 26 print(f"nest_asyncio not applied: {e}")
Apply nest_asyncio to avoid uvloop conflict.