senselab.audio.tasks.speaker_diarization

Speaker diarization

Tutorial

Task Overview

Speaker diarization is the process of segmenting audio recordings by speaker labels, aiming to answer the question: "Who spoke when?"

Models

In senselab, we integrate pyannote.audio models for speaker diarization. These models can be explored on the Hugging Face Hub. We may integrate additional approaches for speaker diarization into the package in the future.

Evaluation

Metrics

The Diarization Error Rate (DER) is the standard metric for evaluating and comparing speaker diarization systems. It is defined as:

DER = (false alarm + missed detection + confusion) / total

where

  • false alarm is the duration of non-speech incorrectly classified as speech,
  • missed detection is the duration of speech incorrectly classified as non-speech,
  • confusion is the duration of speech attributed to the wrong speaker, and
  • total is the sum over all speakers of their reference speech duration.
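The formula can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not senselab's or pyannote.audio's implementation. The function name `diarization_error_rate` and the example durations are hypothetical.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Combine the three error durations (in seconds) into a DER value.

    Hypothetical helper for illustration; `total` is the summed reference
    speech duration over all speakers, also in seconds.
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return (false_alarm + missed_detection + confusion) / total


# Made-up durations: 2 s false alarm, 3 s missed, 1 s confusion, 60 s of reference speech.
der = diarization_error_rate(
    false_alarm=2.0, missed_detection=3.0, confusion=1.0, total=60.0
)
print(f"DER = {der:.1%}")
```

Because the error durations are summed over speakers, DER is not bounded by 1: a system that produces a lot of false alarms can score above 100%.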

Note: DER takes overlapping speech into account. This can lead to increased missed detection rates if the speaker diarization system does not include an overlapping speech detection module.
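To make the effect of overlap concrete, here is a small frame-based sketch under simplifying assumptions: segments are `(speaker, start, end)` tuples with integer second boundaries, time is scored in 1-second steps, and hypothesis labels are assumed to be already mapped to reference labels. A real scorer would also search for the optimal label mapping. The helper names `speakers_at` and `der_components` are hypothetical.

```python
def speakers_at(segments, t):
    """Return the set of speakers active during the frame [t, t + 1)."""
    return {spk for spk, start, end in segments if start <= t < end}


def der_components(reference, hypothesis, duration):
    """Accumulate DER components over 1-second frames (illustrative only)."""
    missed = false_alarm = confusion = total = 0
    for t in range(duration):
        ref = speakers_at(reference, t)
        hyp = speakers_at(hypothesis, t)
        correct = len(ref & hyp)
        total += len(ref)  # every active reference speaker counts
        missed += max(0, len(ref) - len(hyp))
        false_alarm += max(0, len(hyp) - len(ref))
        confusion += min(len(ref), len(hyp)) - correct
    return missed, false_alarm, confusion, total


# Speakers A and B overlap between 8 s and 10 s in the reference.
reference = [("A", 0, 10), ("B", 8, 15)]
# A system without overlap handling outputs one speaker at a time.
hypothesis = [("A", 0, 8), ("B", 8, 15)]

missed, false_alarm, confusion, total = der_components(reference, hypothesis, 15)
der = (false_alarm + missed + confusion) / total
print(missed, false_alarm, confusion, total, round(der, 3))
```

In this toy case the hypothesis is correct everywhere except in the overlap, where only one of the two reference speakers is reported. The 2 seconds of missed detection out of 17 seconds of reference speech give a DER of about 0.118.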

Benchmark

You can find a benchmark of the latest pyannote.audio model's performance on various time-stamped speech datasets here.

""".. include:: ./doc.md"""  # noqa: D415

from .api import diarize_audios  # noqa: F401