senselab.audio.tasks.speech_to_text_evaluation

Speech to text evaluation

Overview

Evaluating speech transcripts involves comparing a predicted transcript (hypothesis) generated by an automated speech recognition (ASR) system against a ground truth transcript (reference). This evaluation helps to determine the accuracy and performance of the ASR system. Various metrics can be used for this purpose, each capturing different aspects of the errors in the transcription process.

Key metrics

1. Word Error Rate (WER)

Word Error Rate (WER) is the most common metric for evaluating the accuracy of ASR systems. It measures the number of word-level errors (substitutions, deletions, and insertions) divided by the total number of words in the reference. The lower the value, the better the performance of the ASR system, with an error rate of 0 indicating a perfect score. Notably, WER is not a percentage: it can exceed 1 when the hypothesis contains more errors than the reference has words (e.g., many insertions).

Formula:

WER = (S + D + I) / N

Where:

  • ( S ) = Number of substitutions
  • ( D ) = Number of deletions
  • ( I ) = Number of insertions
  • ( N ) = Total number of words in the reference
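
To make the counting concrete, here is a minimal illustrative sketch (not this module's implementation; calculate_wer in this package, and libraries such as jiwer, perform the same kind of alignment with extra normalization options). The alignment_counts helper is hypothetical and introduced only for illustration:

def alignment_counts(ref: list, hyp: list) -> tuple:
    """Return (S, D, I, C): substitutions, deletions, insertions, correct tokens."""
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j] (Levenshtein)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = i
    for j in range(1, len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # match or substitution
                dp[i - 1][j] + 1,                               # deletion
                dp[i][j - 1] + 1,                               # insertion
            )
    # Walk back through the table to attribute each step to an edit type.
    S = D = I = C = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            S += ref[i - 1] != hyp[j - 1]
            C += ref[i - 1] == hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            D += 1
            i -= 1
        else:
            I += 1
            j -= 1
    return S, D, I, C

S, D, I, C = alignment_counts("hello world".split(), "hello duck".split())
print((S + D + I) / (S + D + C))  # WER = 1/2 = 0.5; note that N = S + D + C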

2. Character Error Rate (CER)

Character Error Rate (CER) is similar to WER but operates at the character level rather than the word level. It measures the number of character-level errors divided by the total number of characters in the reference (spaces included). The lower the value, the better the performance of the ASR system, with an error rate of 0 indicating a perfect score. As with WER, CER is not a percentage, since it can exceed 1.

Formula:

CER = (S + D + I) / N

Where:

  • ( S ) = Number of substitutions
  • ( D ) = Number of deletions
  • ( I ) = Number of insertions
  • ( N ) = Total number of characters in the reference
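
Character-level evaluation reuses the same alignment; with the hypothetical alignment_counts helper from the WER sketch above, applied to character sequences (spaces included):

S, D, I, C = alignment_counts(list("hello world"), list("hello duck"))
print((S + D + I) / (S + D + C))  # CER = 5 edits / 11 reference characters = 0.4545...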

3. Match Error Rate (MER)

Match Error Rate (MER) calculates the proportion of word-level errors relative to the total number of errors plus correct matches. Unlike WER, it is bounded between 0 and 1. The lower the value, the better the performance of the ASR system, with an error rate of 0 indicating a perfect score.

Formula:

MER = (S + D + I) / (S + D + I + C)

Where:

  • ( S ) = Number of substitutions
  • ( D ) = Number of deletions
  • ( I ) = Number of insertions
  • ( C ) = Number of correct words
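
MER needs no new machinery; with the same hypothetical alignment_counts helper from the WER sketch above, only the denominator changes:

S, D, I, C = alignment_counts("hello world".split(), "hello duck".split())
print((S + D + I) / (S + D + I + C))  # MER = 1 error / (1 error + 1 correct) = 0.5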

4. Word Information Lost (WIL)

Word Information Lost (WIL) estimates the proportion of word-level information lost between the reference and the prediction. It is computed as follows:

Formula:

WIL = 1 - (C / N) * (C / P)

Where:

  • ( C ) = Number of correct words
  • ( N ) = Number of words in the reference
  • ( P ) = Number of words in the prediction

The lower the WIL, the better the performance of the ASR system, with 0 indicating a perfect score. By construction, WIL = 1 - WIP (defined next).

5. Word Information Preserved (WIP)

Word Information Preserved (WIP) is computed as follows:

Formula:

WIP = (C / N) * (C / P)

Where:

  • ( C ) = Number of correct words
  • ( N ) = Number of words in the reference
  • ( P ) = Number of words in the prediction

The higher the value, the better the performance of the ASR system, with 1 being a perfect score.
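
Both WIL and WIP follow from the same word-level alignment counts; continuing the hypothetical sketch from the WER section:

S, D, I, C = alignment_counts("hello world".split(), "hello duck".split())
N = S + D + C  # each reference word is substituted, deleted, or correct
P = S + I + C  # each predicted word is substituted, inserted, or correct
wip = (C / N) * (C / P)  # (1/2) * (1/2) = 0.25
wil = 1 - wip            # 0.75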

Examples

For a given reference "hello world" and hypothesis "hello duck":

  • WER (Word Error Rate):

    WER = (1 + 0 + 0)/(2) = 0.5
    
  • CER (Character Error Rate):

    CER = (4 + 1 + 0)/(11) = 0.4545
    (turning "world" into "duck" takes 4 substitutions and 1 deletion; the reference has 11 characters, spaces included)
    
  • MER (Match Error Rate):

    MER = (1 + 0 + 0)/(1 + 0 + 0 + 1) = 0.5
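
  • WIL (Word Information Lost):

    WIL = 1 - (1/2) * (1/2) = 0.75

  • WIP (Word Information Preserved):

    WIP = (1/2) * (1/2) = 0.25

These values can be reproduced with this module's helpers. A minimal sketch, assuming each calculate_* function takes the reference and hypothesis strings in that order and returns a float (check senselab.audio.tasks.speech_to_text_evaluation.utils for the exact signatures):

from senselab.audio.tasks.speech_to_text_evaluation import (
    calculate_cer,
    calculate_mer,
    calculate_wer,
    calculate_wil,
    calculate_wip,
)

reference = "hello world"
hypothesis = "hello duck"

# Assumed call convention: (reference, hypothesis) -> float
print(calculate_wer(reference, hypothesis))  # expected 0.5
print(calculate_cer(reference, hypothesis))  # expected ~0.4545
print(calculate_mer(reference, hypothesis))  # expected 0.5
print(calculate_wil(reference, hypothesis))  # expected 0.75
print(calculate_wip(reference, hypothesis))  # expected 0.25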
    
1""".. include:: ./doc.md"""  # noqa: D415
2
3from .utils import calculate_cer, calculate_mer, calculate_wer, calculate_wil, calculate_wip  # noqa: F401