All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public final class VadWebRTC implements Closeable
Created by Georgiy Konovalov on 6/1/2023.
The WebRTC VAD algorithm, based on a Gaussian Mixture Model (GMM), analyzes an audio signal to determine whether it contains speech or non-speech segments.
The WebRTC VAD supports the following parameters:
Sample Rates:
8000Hz, 16000Hz, 32000Hz, 48000Hz
Frame Sizes (per sample rate):
For 8000Hz: 80, 160, 240
For 16000Hz: 160, 320, 480
For 32000Hz: 320, 640, 960
For 48000Hz: 480, 960, 1440
Mode:
NORMAL, LOW_BITRATE, AGGRESSIVE, VERY_AGGRESSIVE
Please note that the VAD class supports only these combinations of sample rate and frame size, and that the mode determines the aggressiveness of the voice activity detection algorithm.
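The listed frame sizes equal 10, 20 and 30 ms of audio at each sample rate (for example, 16000 samples per second × 20 ms = 320 samples). The following is a minimal, self-contained sketch of a validity check built on that observation. The class `SupportedFrames` and its methods are hypothetical names used for illustration only; they are not part of VadWebRTC, and the library's own validation may be implemented differently.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the VadWebRTC API.
public class SupportedFrames {
    // Sample rates listed in the documentation above.
    static final int[] SAMPLE_RATES_HZ = {8000, 16000, 32000, 48000};
    // Assumption: the documented frame sizes are 10, 20 and 30 ms of audio.
    static final int[] FRAME_DURATIONS_MS = {10, 20, 30};

    // Builds the sample rate -> frame sizes table from the durations.
    public static Map<Integer, Set<Integer>> supportedParameters() {
        Map<Integer, Set<Integer>> table = new LinkedHashMap<>();
        for (int rate : SAMPLE_RATES_HZ) {
            Set<Integer> sizes = new LinkedHashSet<>();
            for (int ms : FRAME_DURATIONS_MS) {
                sizes.add(rate * ms / 1000); // samples per frame
            }
            table.put(rate, sizes);
        }
        return table;
    }

    // Returns true only for a documented (sample rate, frame size) pair.
    public static boolean isSupported(int sampleRateHz, int frameSize) {
        Set<Integer> sizes = supportedParameters().get(sampleRateHz);
        return sizes != null && sizes.contains(frameSize);
    }

    public static void main(String[] args) {
        System.out.println(supportedParameters());
        System.out.println(isSupported(16000, 320)); // true
        System.out.println(isSupported(16000, 80));  // false
    }
}
```

A check of this kind lets a caller reject an unsupported combination before constructing the detector.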
-
-
Field Summary
Fields
Modifier and Type, Field, Description
private Map<SampleRate, Set<FrameSize>> supportedParameters
private SampleRate sampleRate
private FrameSize frameSize
private Mode mode
private Integer speechDurationMs
private Integer silenceDurationMs
-
Method Summary
Modifier and Type, Method, Description
final Map<SampleRate, Set<FrameSize>> getSupportedParameters()
    Valid sample rates and frame sizes for WebRTC VAD GMM model.
final Unit setSupportedParameters(Map<SampleRate, Set<FrameSize>> supportedParameters)
final SampleRate getSampleRate()
    Set, retrieve and validate sample rate for Vad Model.
final Unit setSampleRate(SampleRate sampleRate)
final FrameSize getFrameSize()
    Set, retrieve and validate frame size for Vad Model.
final Unit setFrameSize(FrameSize frameSize)
final Mode getMode()
    Set and retrieve detection mode for Vad model.
final Unit setMode(Mode mode)
final Integer getSpeechDurationMs()
    Set, retrieve and validate speechDurationMs for Vad Model.
final Unit setSpeechDurationMs(Integer speechDurationMs)
final Integer getSilenceDurationMs()
    Set, retrieve and validate silenceDurationMs for Vad Model.
final Unit setSilenceDurationMs(Integer silenceDurationMs)
final Boolean isSpeech(ShortArray audioData)
    Determines if the provided audio data contains speech.
final Boolean isSpeech(ByteArray audioData)
    Determines if the provided audio data contains speech.
final Boolean isSpeech(FloatArray audioData)
    Determines if the provided audio data contains speech.
Unit close()
    Closes the WebRTC VAD and releases all associated resources.
-
Constructor Detail
-
VadWebRTC
VadWebRTC(SampleRate sampleRate, FrameSize frameSize, Mode mode, Integer speechDurationMs, Integer silenceDurationMs)
- Parameters:
sampleRate - is required for processing audio input.
frameSize - is required for processing audio input.
mode - is required for the VAD model.
speechDurationMs - is the minimum duration in milliseconds for speech segments (optional).
silenceDurationMs - is the minimum duration in milliseconds for silence segments (optional).
-
-
Method Detail
-
getSupportedParameters
final Map<SampleRate, Set<FrameSize>> getSupportedParameters()
Valid sample rates and frame sizes for WebRTC VAD GMM model.
-
setSupportedParameters
final Unit setSupportedParameters(Map<SampleRate, Set<FrameSize>> supportedParameters)
-
getSampleRate
final SampleRate getSampleRate()
Set, retrieve and validate sample rate for Vad Model.
Valid Sample Rates:
8000Hz, 16000Hz, 32000Hz, 48000Hz
-
setSampleRate
final Unit setSampleRate(SampleRate sampleRate)
-
getFrameSize
final FrameSize getFrameSize()
Set, retrieve and validate frame size for Vad Model.
Valid Frame Sizes (per sample rate):
For 8000Hz: 80, 160, 240
For 16000Hz: 160, 320, 480
For 32000Hz: 320, 640, 960
For 48000Hz: 480, 960, 1440
-
setFrameSize
final Unit setFrameSize(FrameSize frameSize)
-
getMode
final Mode getMode()
Set and retrieve detection mode for Vad model.
Mode:
NORMAL, LOW_BITRATE, AGGRESSIVE, VERY_AGGRESSIVE
-
getSpeechDurationMs
final Integer getSpeechDurationMs()
Set, retrieve and validate speechDurationMs for Vad Model. This value defines how long positive results must persist before the audio is recognized as speech. This parameter is optional.
Permitted range (0ms <= speechDurationMs <= 300000ms).
This parameter is used by isSpeech.
-
setSpeechDurationMs
final Unit setSpeechDurationMs(Integer speechDurationMs)
-
getSilenceDurationMs
final Integer getSilenceDurationMs()
Set, retrieve and validate silenceDurationMs for Vad Model. This value defines how long negative results must persist before the audio is recognized as silence. This parameter is optional.
Permitted range (0ms <= silenceDurationMs <= 300000ms).
This parameter is used by isSpeech.
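To illustrate how two duration thresholds of this kind can smooth per-frame decisions, here is a minimal, self-contained sketch. It is not the library's implementation: the class `DurationSmoother`, its `update` method, and the exact switching rule (a state change once the current run of frames reaches the threshold) are assumptions made for illustration. Only the 0 to 300000 ms range comes from the documentation above.

```java
// Hypothetical illustration, not part of the VadWebRTC API.
public class DurationSmoother {
    // Upper bound of the permitted range stated in the documentation.
    static final int MAX_DURATION_MS = 300000;

    private final int frameMs;
    private final int speechDurationMs;
    private final int silenceDurationMs;
    private int speechRunMs = 0;   // length of the current run of speech frames
    private int silenceRunMs = 0;  // length of the current run of silence frames
    private boolean speaking = false;

    public DurationSmoother(int frameMs, int speechDurationMs, int silenceDurationMs) {
        if (frameMs <= 0) {
            throw new IllegalArgumentException("frameMs must be positive");
        }
        checkRange("speechDurationMs", speechDurationMs);
        checkRange("silenceDurationMs", silenceDurationMs);
        this.frameMs = frameMs;
        this.speechDurationMs = speechDurationMs;
        this.silenceDurationMs = silenceDurationMs;
    }

    // Rejects values outside 0..300000 ms.
    public static void checkRange(String name, int value) {
        if (value < 0 || value > MAX_DURATION_MS) {
            throw new IllegalArgumentException(name + " must be in 0.." + MAX_DURATION_MS);
        }
    }

    // Feeds one raw per-frame decision and returns the smoothed state.
    public boolean update(boolean frameIsSpeech) {
        if (frameIsSpeech) {
            silenceRunMs = 0;
            // Cap the counter so it cannot overflow on very long runs.
            speechRunMs = Math.min(speechRunMs + frameMs, MAX_DURATION_MS);
            if (speechRunMs >= speechDurationMs) {
                speaking = true;
            }
        } else {
            speechRunMs = 0;
            silenceRunMs = Math.min(silenceRunMs + frameMs, MAX_DURATION_MS);
            if (silenceRunMs >= silenceDurationMs) {
                speaking = false;
            }
        }
        return speaking;
    }
}
```

With 20 ms frames, a 50 ms speech threshold and a 40 ms silence threshold, this sketch reports speech on the third consecutive speech frame and returns to silence on the second consecutive silence frame.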
-
setSilenceDurationMs
final Unit setSilenceDurationMs(Integer silenceDurationMs)
-
isSpeech
final Boolean isSpeech(ShortArray audioData)
Determines if the provided audio data contains speech. The audio data is passed to the model for prediction.
- Parameters:
audioData- audio data to analyze.
-
isSpeech
final Boolean isSpeech(ByteArray audioData)
Determines if the provided audio data contains speech. The audio data is passed to the model for prediction. The ByteArray chunk should be twice the frame size in length (two bytes per sample).
- Parameters:
audioData- audio data to analyze.
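The 2x size requirement follows from each sample occupying two bytes. The following is a minimal sketch of that relationship, converting a byte chunk into 16-bit samples. The class `PcmBytes` and the method `toShorts` are hypothetical, and the little-endian byte order and strict length check are assumptions; the documentation above does not state how the library performs this conversion.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the VadWebRTC API.
public class PcmBytes {
    // Converts a byte chunk into 16-bit samples.
    // Assumption: 16-bit PCM stored in little-endian byte order.
    public static short[] toShorts(byte[] audioData, int frameSize) {
        if (audioData.length != frameSize * 2) {
            throw new IllegalArgumentException(
                "expected " + (frameSize * 2) + " bytes, got " + audioData.length);
        }
        short[] samples = new short[frameSize];
        ByteBuffer.wrap(audioData)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }

    public static void main(String[] args) {
        // Two samples (frameSize = 2) need four bytes.
        byte[] chunk = {0x01, 0x00, (byte) 0xFF, (byte) 0x7F};
        short[] samples = toShorts(chunk, 2);
        System.out.println(samples[0] + ", " + samples[1]); // 1, 32767
    }
}
```

For a frame size of 320 at 16000Hz, a chunk of this form would hold 640 bytes.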
-
isSpeech
final Boolean isSpeech(FloatArray audioData)
Determines if the provided audio data contains speech. The audio data is passed to the model for prediction.
- Parameters:
audioData- audio data to analyze.