Package 

Class VadWebRTC

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    
    public final class VadWebRTC
     implements Closeable
                        

    Created by Georgiy Konovalov on 6/1/2023.

    The WebRTC VAD algorithm, based on a Gaussian Mixture Model (GMM), analyzes the audio signal to determine whether it contains speech.

    The WebRTC VAD supports the following parameters:

    Sample Rates:

        8000Hz,
        16000Hz,
        32000Hz,
        48000Hz

    Frame Sizes (in samples, per sample rate):

        For 8000Hz: 80, 160, 240
        For 16000Hz: 160, 320, 480
        For 32000Hz: 320, 640, 960
        For 48000Hz: 480, 960, 1440

    Mode:

        NORMAL,
        LOW_BITRATE,
        AGGRESSIVE,
        VERY_AGGRESSIVE

    Note that the VAD class supports only these combinations of sample rate and frame size. The mode determines the aggressiveness of the voice activity detection algorithm.
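    The supported combinations above can be checked before constructing the detector. The following is a minimal sketch, assuming the frame sizes are sample counts (so each sample rate offers 10, 20, and 30 ms frames). The FrameTable class and its methods are hypothetical helpers, not part of the library.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the library: mirrors the table of
// supported sample-rate / frame-size combinations listed above.
final class FrameTable {
    private static final Map<Integer, Set<Integer>> SUPPORTED = Map.of(
        8000,  Set.of(80, 160, 240),
        16000, Set.of(160, 320, 480),
        32000, Set.of(320, 640, 960),
        48000, Set.of(480, 960, 1440));

    // True if the frame size (in samples) is listed for the sample rate (in Hz).
    static boolean isSupported(int sampleRateHz, int frameSizeSamples) {
        Set<Integer> sizes = SUPPORTED.get(sampleRateHz);
        return sizes != null && sizes.contains(frameSizeSamples);
    }

    // Duration of one frame in milliseconds, assuming frame size is a sample count.
    static int frameDurationMs(int sampleRateHz, int frameSizeSamples) {
        return frameSizeSamples * 1000 / sampleRateHz;
    }
}
```

    For example, FrameTable.isSupported(8000, 320) is false because 320 is not listed for 8000Hz, while 320 samples at 16000Hz is a valid 20 ms frame.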

    • Constructor Detail

      • VadWebRTC

        VadWebRTC(SampleRate sampleRate, FrameSize frameSize, Mode mode, Integer speechDurationMs, Integer silenceDurationMs)
        Parameters:
        sampleRate - sample rate of the audio input (required).
        frameSize - frame size of the audio input (required).
        mode - detection mode of the VAD model (required).
        speechDurationMs - minimum duration, in milliseconds, for speech segments (optional).
        silenceDurationMs - minimum duration, in milliseconds, for silence segments (optional).
    • Method Detail

      • getSampleRate

         final SampleRate getSampleRate()

        Sets, retrieves, and validates the sample rate for the VAD model.

        Valid Sample Rates:

            8000Hz,
            16000Hz,
            32000Hz,
            48000Hz
      • getFrameSize

         final FrameSize getFrameSize()

        Sets, retrieves, and validates the frame size for the VAD model.

        Valid Frame Sizes (in samples, per sample rate):

            For 8000Hz: 80, 160, 240
            For 16000Hz: 160, 320, 480
            For 32000Hz: 320, 640, 960
            For 48000Hz: 480, 960, 1440
      • getMode

         final Mode getMode()

        Sets and retrieves the detection mode for the VAD model.

        Mode:

            NORMAL,
            LOW_BITRATE,
            AGGRESSIVE,
            VERY_AGGRESSIVE
      • getSpeechDurationMs

         final Integer getSpeechDurationMs()

        Sets, retrieves, and validates speechDurationMs for the VAD model. This value defines how long positive results must persist before the input is recognized as speech. This parameter is optional.

        Permitted range: 0ms <= speechDurationMs <= 300000ms.

        This parameter is used by VadWebRTC.isSpeech.

      • getSilenceDurationMs

         final Integer getSilenceDurationMs()

        Sets, retrieves, and validates silenceDurationMs for the VAD model. This value defines how long negative results must persist before the input is recognized as silence. This parameter is optional.

        Permitted range: 0ms <= silenceDurationMs <= 300000ms.

        This parameter is used by VadWebRTC.isSpeech.
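        The interaction of the two duration parameters can be illustrated with a small state machine. This is a sketch of one plausible interpretation, not the library's actual implementation: raw per-frame results must persist for at least speechDurationMs before the reported state switches to speech, and for at least silenceDurationMs before it switches back. The DurationGate class, its field names, and the choice to reset a run on any opposite frame are assumptions made for illustration.

```java
// Hypothetical sketch, not the library's implementation: smooths raw
// per-frame decisions using minimum speech and silence durations.
final class DurationGate {
    private static final int MAX_DURATION_MS = 300000; // upper bound from the permitted range

    private final int frameMs;
    private final int speechDurationMs;
    private final int silenceDurationMs;
    private int speechRunMs = 0;   // length of the current run of speech frames
    private int silenceRunMs = 0;  // length of the current run of silence frames
    private boolean speaking = false;

    DurationGate(int frameMs, int speechDurationMs, int silenceDurationMs) {
        if (frameMs <= 0) {
            throw new IllegalArgumentException("frameMs must be positive");
        }
        checkRange("speechDurationMs", speechDurationMs);
        checkRange("silenceDurationMs", silenceDurationMs);
        this.frameMs = frameMs;
        this.speechDurationMs = speechDurationMs;
        this.silenceDurationMs = silenceDurationMs;
    }

    private static void checkRange(String name, int value) {
        if (value < 0 || value > MAX_DURATION_MS) {
            throw new IllegalArgumentException(name + " must be in 0.." + MAX_DURATION_MS);
        }
    }

    // Feeds one raw frame decision and returns the smoothed state.
    boolean update(boolean frameIsSpeech) {
        if (frameIsSpeech) {
            silenceRunMs = 0;
            speechRunMs += frameMs;
            if (speechRunMs >= speechDurationMs) speaking = true;
        } else {
            speechRunMs = 0;
            silenceRunMs += frameMs;
            if (silenceRunMs >= silenceDurationMs) speaking = false;
        }
        return speaking;
    }
}
```

        With 20 ms frames, speechDurationMs = 50, and silenceDurationMs = 300, this gate reports speech on the third consecutive speech frame and returns to silence on the fifteenth consecutive silence frame.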

      • isSpeech

         final Boolean isSpeech(ShortArray audioData)

        Determines if the provided audio data contains speech. The audio data is passed to the model for prediction.

        Parameters:
        audioData - audio data to analyze.
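
        A longer recording has to be cut into frames before each one is analyzed. The sketch below assumes that each call expects exactly one frame of the configured frame size; FrameSplitter is a hypothetical helper, and dropping the trailing partial frame is a choice made for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the library.
final class FrameSplitter {
    // Splits PCM samples into consecutive frames of exactly frameSize samples.
    // A trailing partial frame is dropped.
    static List<short[]> split(short[] samples, int frameSize) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        List<short[]> frames = new ArrayList<>();
        for (int start = 0; start + frameSize <= samples.length; start += frameSize) {
            frames.add(Arrays.copyOfRange(samples, start, start + frameSize));
        }
        return frames;
    }
}
```

        Each returned array could then be handed to the detector one at a time.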
      • isSpeech

         final Boolean isSpeech(ByteArray audioData)

        Determines if the provided audio data contains speech. The audio data is passed to the model for prediction. The ByteArray must contain twice as many bytes as the frame size, because each 16-bit sample occupies two bytes.

        Parameters:
        audioData - audio data to analyze.
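
        The 2x relationship between byte count and frame size follows from 16-bit samples. The sketch below shows the arithmetic and a conversion from raw bytes to samples. The PcmBytes helper is hypothetical, and little-endian byte order is an assumption (the source does not state the expected byte order).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the library.
final class PcmBytes {
    // Number of bytes needed for one frame of 16-bit samples.
    static int expectedByteCount(int frameSizeSamples) {
        return frameSizeSamples * 2;
    }

    // Interprets the bytes as 16-bit PCM samples.
    // ASSUMPTION: little-endian byte order.
    static short[] toShorts(byte[] pcm) {
        if (pcm.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] out = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(out);
        return out;
    }
}
```

        For a frame size of 320 samples, expectedByteCount returns 640.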
      • isSpeech

         final Boolean isSpeech(FloatArray audioData)

        Determines if the provided audio data contains speech. The audio data is passed to the model for prediction.

        Parameters:
        audioData - audio data to analyze.
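
        The source does not state what range the float samples are expected to be in. The sketch below shows one common convention, assuming samples normalized to [-1.0, 1.0] and scaled to 16-bit PCM; whether the library applies this same scaling is not confirmed here. FloatPcm is a hypothetical helper.

```java
// Hypothetical helper, not part of the library.
final class FloatPcm {
    // Converts float samples to 16-bit PCM.
    // ASSUMPTION: input is normalized to [-1.0, 1.0]; values outside are clamped.
    static short[] toPcm16(float[] samples) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            float clamped = Math.max(-1.0f, Math.min(1.0f, samples[i]));
            out[i] = (short) Math.round(clamped * 32767.0f);
        }
        return out;
    }
}
```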
      • close

         Unit close()

        Closes the WebRTC VAD and releases all associated resources. This method should be called when the VAD is no longer needed to free up system resources.
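
        Because the class implements Closeable, a try-with-resources block is one way to make sure close() is always called. The sketch below uses a hypothetical stand-in class (FakeVad) rather than the real VadWebRTC, so that it compiles without the library; only the resource-handling pattern is the point.

```java
import java.io.Closeable;

final class CloseDemo {
    // Hypothetical stand-in for a detector that holds resources.
    // This is NOT the real VadWebRTC class.
    static final class FakeVad implements Closeable {
        private boolean closed = false;

        boolean isSpeech(short[] frame) {
            if (closed) throw new IllegalStateException("already closed");
            return false; // placeholder result
        }

        @Override
        public void close() {
            closed = true; // a real detector would release its resources here
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Uses the detector inside try-with-resources and reports whether
    // close() was called once the block exited.
    static boolean runAndReportClosed() {
        FakeVad vad = new FakeVad();
        try (FakeVad resource = vad) {
            resource.isSpeech(new short[320]);
        }
        return vad.isClosed();
    }
}
```

        The same pattern applies to any Closeable: the resource is closed when the block exits, whether normally or through an exception.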