public final class Vad
Created by Georgiy Konovalov on 6/1/2023.
The WebRTC VAD algorithm, based on a Gaussian Mixture Model (GMM), analyzes the audio signal to determine whether it contains speech or non-speech segments.
The WebRTC VAD supports the following parameters:
Sample Rates:
8000Hz, 16000Hz, 32000Hz, 48000Hz
Frame Sizes (per sample rate):
For 8000Hz: 80, 160, 240
For 16000Hz: 160, 320, 480
For 32000Hz: 320, 640, 960
For 48000Hz: 480, 960, 1440
Mode:
NORMAL, LOW_BITRATE, AGGRESSIVE, VERY_AGGRESSIVE
Please note that the Vad class supports only these specific combinations of sample rates and frame sizes, and that the mode determines the aggressiveness of the voice activity detection algorithm.
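The frame sizes listed above are counts of samples, so their length in time depends on the sample rate. The following is a minimal sketch of that arithmetic. The FrameDurations class and its durationMs helper are hypothetical names introduced for illustration and are not part of the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper, NOT part of the library: converts a frame size in
// samples to its duration in milliseconds for a given sample rate.
final class FrameDurations {
    // Supported frame sizes per sample rate, copied from the list above.
    static final Map<Integer, int[]> SUPPORTED = new LinkedHashMap<>();
    static {
        SUPPORTED.put(8000, new int[] {80, 160, 240});
        SUPPORTED.put(16000, new int[] {160, 320, 480});
        SUPPORTED.put(32000, new int[] {320, 640, 960});
        SUPPORTED.put(48000, new int[] {480, 960, 1440});
    }

    // duration (ms) = samples * 1000 / samples per second
    static int durationMs(int sampleRateHz, int frameSizeSamples) {
        return frameSizeSamples * 1000 / sampleRateHz;
    }

    public static void main(String[] args) {
        // Print the duration of every supported combination.
        for (Map.Entry<Integer, int[]> entry : SUPPORTED.entrySet()) {
            for (int size : entry.getValue()) {
                System.out.println(entry.getKey() + " Hz, " + size + " samples: "
                        + durationMs(entry.getKey(), size) + " ms");
            }
        }
    }
}
```

Under this arithmetic, every supported combination works out to a frame of 10, 20, or 30 milliseconds.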
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class            Description
public class         Vad.Companion
-
Method Summary
Modifier and Type, Method, Description:
final Vad setSampleRate(SampleRate sampleRate): Set, retrieve and validate sample rate for Vad Model.
final Vad setFrameSize(FrameSize frameSize): Set, retrieve and validate frame size for Vad Model.
final Vad setMode(Mode mode): Set and retrieve detection mode for Vad model.
final Vad setSpeechDurationMs(Integer speechDurationMs): Set the minimum duration in milliseconds for speech segments.
final Vad setSilenceDurationMs(Integer silenceDurationMs): Set the minimum duration in milliseconds for silence segments.
final VadWebRTC build(): Builds and returns a VadModel instance based on the specified parameters.
-
Method Detail
-
setSampleRate
final Vad setSampleRate(SampleRate sampleRate)
Set, retrieve and validate sample rate for Vad Model.
Valid Sample Rates:
8000Hz, 16000Hz, 32000Hz, 48000Hz
- Parameters:
sampleRate - is required for processing audio input.
-
setFrameSize
final Vad setFrameSize(FrameSize frameSize)
Set, retrieve and validate frame size for Vad Model.
Valid Frame Sizes (per sample rate):
For 8000Hz: 80, 160, 240
For 16000Hz: 160, 320, 480
For 32000Hz: 320, 640, 960
For 48000Hz: 480, 960, 1440
- Parameters:
frameSize - is required for processing audio input.
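The per-sample-rate validation described for setFrameSize can be pictured as a table lookup. The sketch below is one possible way to express such a check. It is not the library's implementation. FrameSizeValidator, isSupported, and requireSupported are hypothetical names, and plain integers stand in for the SampleRate and FrameSize types.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical validator, NOT the library's code: checks a frame size
// against the set of sizes documented for a given sample rate.
final class FrameSizeValidator {
    // Valid frame sizes per sample rate, copied from the list above.
    private static final Map<Integer, Set<Integer>> VALID = Map.of(
            8000, Set.of(80, 160, 240),
            16000, Set.of(160, 320, 480),
            32000, Set.of(320, 640, 960),
            48000, Set.of(480, 960, 1440));

    // Returns true only for a documented sample rate / frame size pair.
    static boolean isSupported(int sampleRateHz, int frameSize) {
        Set<Integer> sizes = VALID.get(sampleRateHz);
        return sizes != null && sizes.contains(frameSize);
    }

    // Throws if the pair is not documented as supported.
    static void requireSupported(int sampleRateHz, int frameSize) {
        if (!isSupported(sampleRateHz, frameSize)) {
            throw new IllegalArgumentException("Unsupported frame size "
                    + frameSize + " for sample rate " + sampleRateHz + " Hz");
        }
    }
}
```

A call such as FrameSizeValidator.requireSupported(16000, 320) passes, while a frame size of 80 at 16000Hz is rejected because 80 is only listed for 8000Hz.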
-
setMode
final Vad setMode(Mode mode)
Set and retrieve detection mode for Vad model.
Mode:
NORMAL, LOW_BITRATE, AGGRESSIVE, VERY_AGGRESSIVE
- Parameters:
mode - is required for processing audio input.
-
setSpeechDurationMs
final Vad setSpeechDurationMs(Integer speechDurationMs)
Set the minimum duration in milliseconds for speech segments. This value defines how long positive results must persist before the input is recognized as speech. This parameter is optional.
Permitted range (0ms <= speechDurationMs <= 300000ms).
This parameter is used by VadSilero.isSpeech.
- Parameters:
speechDurationMs - minimum duration in milliseconds for speech segments.
-
setSilenceDurationMs
final Vad setSilenceDurationMs(Integer silenceDurationMs)
Set the minimum duration in milliseconds for silence segments. This value defines how long negative results must persist before the input is recognized as silence. This parameter is optional.
Permitted range (0ms <= silenceDurationMs <= 300000ms).
This parameter is used by VadSilero.isSpeech.
- Parameters:
silenceDurationMs - minimum duration in milliseconds for silence segments.
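One way to picture how the two duration thresholds might interact is a small state machine that only changes its answer after enough consecutive frames agree. The sketch below is an assumption about the general technique, not the library's actual logic. DurationSmoother and its update method are hypothetical names, and the per-frame decision is passed in as a plain boolean.

```java
// Hypothetical smoothing of per-frame decisions, NOT the library's code.
final class DurationSmoother {
    private final int frameMs;           // duration of one frame in ms
    private final int speechDurationMs;  // speech needed to report speech
    private final int silenceDurationMs; // silence needed to report silence
    private int speechRunMs;             // current consecutive speech time
    private int silenceRunMs;            // current consecutive silence time
    private boolean speaking;            // last reported state

    DurationSmoother(int frameMs, int speechDurationMs, int silenceDurationMs) {
        this.frameMs = frameMs;
        this.speechDurationMs = speechDurationMs;
        this.silenceDurationMs = silenceDurationMs;
    }

    // Feeds one raw per-frame decision and returns the smoothed state.
    boolean update(boolean frameIsSpeech) {
        if (frameIsSpeech) {
            silenceRunMs = 0;
            // Cap the run at the threshold so it cannot overflow.
            speechRunMs = Math.min(speechRunMs + frameMs, speechDurationMs);
            if (speechRunMs >= speechDurationMs) {
                speaking = true;
            }
        } else {
            speechRunMs = 0;
            silenceRunMs = Math.min(silenceRunMs + frameMs, silenceDurationMs);
            if (silenceRunMs >= silenceDurationMs) {
                speaking = false;
            }
        }
        return speaking;
    }
}
```

With 30 ms frames, a 60 ms speech threshold, and a 90 ms silence threshold, this sketch reports speech on the second consecutive speech frame and returns to silence on the third consecutive silent frame. A threshold of 0 makes the corresponding switch immediate.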