Voice Activity Detection (VAD) Settings

Overview

These settings are found under Advanced Settings → Developer Settings tab.

When using Voice Activity Detection (VAD), ensure your Speech-to-Text provider is set to Whisper.

Voice Activity Detection (VAD)
This block allows fine-tuning of how Rapport detects when a user starts and stops speaking.

VAD activation threshold (ms) – How long the user must be speaking before the system recognises it as speech.
VAD deactivation threshold (ms) – How long a pause must last before the system decides speech has ended.
VAD buffer length (s) – Maximum duration of speech in a single transcription block. Also affects Push-to-Talk behaviour. Set this to at least 60 seconds.
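
As a rough illustration of how the two thresholds interact, the sketch below runs a small state machine over per-frame speech/silence flags. It is not Rapport's implementation: the function name segment_speech, the 10 ms frame size, and the boolean frame input are all assumptions made for the example, and the buffer length is not modelled.

```python
from typing import List, Tuple


def segment_speech(
    frames: List[bool],
    activation_ms: int,
    deactivation_ms: int,
    frame_ms: int = 10,  # assumed frame size, purely illustrative
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for each detected speech segment.

    frames[i] is True when frame i contains voice (hypothetical input).
    """
    segments: List[Tuple[int, int]] = []
    in_speech = False
    voiced_run = 0       # consecutive voiced ms seen before activation
    silent_run = 0       # consecutive silent ms seen while in speech
    candidate_start = 0  # where the current voiced run began
    silence_start = 0    # where the current pause began
    start = 0

    for i, voiced in enumerate(frames):
        t = i * frame_ms
        if not in_speech:
            if voiced:
                if voiced_run == 0:
                    candidate_start = t
                voiced_run += frame_ms
                # Speech only counts once it has lasted long enough.
                if voiced_run >= activation_ms:
                    in_speech = True
                    start = candidate_start
                    silent_run = 0
            else:
                voiced_run = 0
        else:
            if voiced:
                silent_run = 0
            else:
                if silent_run == 0:
                    silence_start = t
                silent_run += frame_ms
                # A pause only ends the segment once it is long enough.
                if silent_run >= deactivation_ms:
                    segments.append((start, silence_start))
                    in_speech = False
                    voiced_run = 0

    if in_speech:
        # Input ended mid-segment: close it at the last voiced frame.
        end = silence_start if silent_run > 0 else len(frames) * frame_ms
        segments.append((start, end))
    return segments


# 300 ms of speech followed by 1 s of silence.
short_utterance = [True] * 30 + [False] * 100
print(segment_speech(short_utterance, activation_ms=100, deactivation_ms=500))
print(segment_speech(short_utterance, activation_ms=2000, deactivation_ms=500))
```

With a 100 ms activation threshold the 300 ms utterance is picked up as one segment; with a 2000 ms threshold it never activates and nothing is returned.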

Recommendation
VAD activation threshold = 100ms
VAD deactivation threshold = 500ms
VAD buffer length = at least 60 seconds (can be up to 300 seconds)
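
If you keep these values in your own tooling or notes, a small structure like the one below can hold them and flag values outside the recommended range. This is only a sketch: the class VadSettings, its field names, and the check_settings helper are hypothetical and do not correspond to a Rapport API. The 60 to 300 second range comes from the recommendation above; the positive-value checks are an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VadSettings:
    # Defaults mirror the recommended values; field names are hypothetical.
    activation_threshold_ms: int = 100
    deactivation_threshold_ms: int = 500
    buffer_length_s: int = 60


def check_settings(settings: VadSettings) -> List[str]:
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings: List[str] = []
    if settings.activation_threshold_ms <= 0:
        warnings.append("activation threshold should be a positive number of ms")
    if settings.deactivation_threshold_ms <= 0:
        warnings.append("deactivation threshold should be a positive number of ms")
    if not 60 <= settings.buffer_length_s <= 300:
        warnings.append("buffer length is outside the recommended 60-300 s range")
    return warnings


print(check_settings(VadSettings()))
print(check_settings(VadSettings(buffer_length_s=30)))
```

The first call prints an empty list; the second prints a single warning about the buffer length.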

Experimenting with VAD Settings

The optimal thresholds depend on your use case. Experimentation is encouraged, and each field validates its input to prevent invalid values.

Example Behaviour

Setting	Low Value Example	High Value Example
Activation Threshold (ms)	200ms → Short phrases like “Hi” are recognised and transcribed quickly	2000ms → Short phrases ignored; only longer utterances (e.g. 2s+) are transcribed
Deactivation Threshold (ms)	200ms → Each pause causes text to be transcribed in smaller chunks	2000ms → Longer pauses required; whole speech appears as one block, with more delay
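
The rows above can be checked with simple arithmetic. The sketch below is a simplified model, not how Rapport works internally: the helper names, the 300 ms duration assumed for "Hi", and the list of pause lengths are all illustrative assumptions.

```python
from typing import List


def is_transcribed(utterance_ms: int, activation_ms: int) -> bool:
    # An utterance is picked up only if it lasts at least the activation threshold.
    return utterance_ms >= activation_ms


def count_blocks(pauses_ms: List[int], deactivation_ms: int) -> int:
    # Every pause at least as long as the deactivation threshold ends a block.
    return 1 + sum(1 for pause in pauses_ms if pause >= deactivation_ms)


hi_ms = 300                # assumed length of a short "Hi"
pauses = [300, 800, 400]   # assumed pauses between phrases, in ms

print(is_transcribed(hi_ms, activation_ms=200))    # True: recognised
print(is_transcribed(hi_ms, activation_ms=2000))   # False: ignored
print(count_blocks(pauses, deactivation_ms=200))   # 4 smaller chunks
print(count_blocks(pauses, deactivation_ms=2000))  # 1 single block
```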

✅ Tip: Start with the default values, then adjust incrementally to balance responsiveness (low thresholds) against completeness of transcription (high thresholds).