Voice Activity Detection (VAD) Settings

These settings are found under Advanced Settings → Developer Settings tab.

Voice Activity Detection (VAD)
This block allows fine-tuning of how Rapport detects when a user starts and stops speaking.

  • VAD activation threshold (ms) – How long the user must be speaking before the system recognises it as speech.

  • VAD deactivation threshold (ms) – How long a pause must last before the system decides speech has ended.

  • VAD buffer length (s) – Maximum duration of speech in a single transcription block. Also affects Push-to-Talk behaviour. The maximum setting is 3 seconds.
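
To make the interaction between the three settings concrete, here is a minimal sketch of a threshold-based segmenter. It is not Rapport's implementation, which is not described on this page. The 20 ms frame size, the names `VadConfig` and `segment`, the default values, and the choice to close a block as soon as the buffer is full are all assumptions made for illustration.

```python
from dataclasses import dataclass

FRAME_MS = 20  # assumed frame duration; not a Rapport value


@dataclass
class VadConfig:
    # Illustrative defaults only; check the settings page for the real ones.
    activation_ms: int = 200
    deactivation_ms: int = 500
    buffer_s: float = 3.0


def segment(frames: list[bool], cfg: VadConfig) -> list[tuple[int, int]]:
    """Turn per-frame voiced/unvoiced flags into (start_ms, end_ms) blocks."""
    blocks = []
    in_speech = False
    voiced_run = 0  # ms of consecutive voiced frames while idle
    silent_run = 0  # ms of consecutive silent frames while in speech
    start = 0
    for i, voiced in enumerate(frames):
        t_end = (i + 1) * FRAME_MS  # time at the end of this frame
        if not in_speech:
            voiced_run = voiced_run + FRAME_MS if voiced else 0
            if voiced_run >= cfg.activation_ms:
                # Enough continuous voice: speech starts where the run began.
                in_speech = True
                start = t_end - voiced_run
                silent_run = 0
        else:
            silent_run = 0 if voiced else silent_run + FRAME_MS
            if silent_run >= cfg.deactivation_ms:
                # Pause long enough: close the block where the voice stopped.
                blocks.append((start, t_end - silent_run))
                in_speech = False
                voiced_run = 0
            elif t_end - start >= cfg.buffer_s * 1000:
                # Buffer full: close the block and wait for re-activation.
                blocks.append((start, t_end))
                in_speech = False
                voiced_run = 0
    if in_speech:
        blocks.append((start, len(frames) * FRAME_MS - silent_run))
    return blocks


# 200 ms of voice, 600 ms pause, 100 ms of voice, 600 ms pause.
frames = [True] * 10 + [False] * 30 + [True] * 5 + [False] * 30
print(segment(frames, VadConfig(activation_ms=100)))
print(segment(frames, VadConfig(activation_ms=200)))
```

In this sketch, the 100 ms burst is only kept when the activation threshold is 100 ms or lower, and a continuous utterance longer than the buffer length is split into several blocks.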

Experimenting with VAD Settings

The optimal thresholds depend on your use case. Experimentation is encouraged, and the fields are validated to prevent invalid inputs.

Example Behaviour

| Setting | Low Value Example | High Value Example |
| --- | --- | --- |
| Activation Threshold (ms) | 200ms → Short phrases like “Hi” are recognised and transcribed quickly | 2000ms → Short phrases ignored; only longer utterances (e.g. 2s+) are transcribed |
| Deactivation Threshold (ms) | 200ms → Each pause causes text to be transcribed in smaller chunks | 2000ms → Longer pauses required; whole speech appears as one block, with more delay |
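
The low and high values above can be compared with a small simulation. This is a simplified model, not Rapport's actual logic: it treats speech as a list of (start_ms, end_ms) intervals of continuous voice, and the function name `transcription_blocks` and the sample timings are hypothetical.

```python
def transcription_blocks(utterances, activation_ms, deactivation_ms):
    """Group voiced intervals into transcription blocks under two thresholds."""
    blocks = []
    current = None  # [start_ms, end_ms] of the block currently open
    for start, end in utterances:
        if current is not None and start - current[1] < deactivation_ms:
            # Pause shorter than the deactivation threshold: same block.
            current[1] = end
            continue
        if current is not None:
            # Pause was long enough: the open block is finished.
            blocks.append(tuple(current))
            current = None
        if end - start >= activation_ms:
            # Only voice lasting at least the activation threshold opens a block.
            current = [start, end]
    if current is not None:
        blocks.append(tuple(current))
    return blocks


# A short "Hi" (300 ms), then two 2.5 s sentences separated by a 500 ms pause.
speech = [(0, 300), (1000, 3500), (4000, 6500)]

print(transcription_blocks(speech, activation_ms=200, deactivation_ms=200))
print(transcription_blocks(speech, activation_ms=2000, deactivation_ms=200))
print(transcription_blocks(speech, activation_ms=200, deactivation_ms=2000))
```

Under these assumptions, low thresholds yield three separate blocks, a high activation threshold drops the short "Hi", and a high deactivation threshold merges everything into one block that only closes after the final pause.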

Tip: Start with default values, then adjust incrementally to balance responsiveness (low thresholds) and completeness of transcription (high thresholds).
