Voice Activity Detection (VAD) Settings
These settings are accessed from the Advanced Settings → Developer Settings tab.
Voice Activity Detection (VAD)
This block allows fine-tuning of how Rapport detects when a user starts and stops speaking.
VAD activation threshold (ms) – How long the user must be speaking before the system recognises it as speech.
VAD deactivation threshold (ms) – How long a pause must last before the system decides speech has ended.
VAD buffer length (s) – Maximum duration of speech in a single transcription block. Also affects Push-to-Talk behaviour. The maximum setting is 300 seconds.
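To make the interaction between the three settings easier to picture, here is a minimal Python sketch of a threshold-based segmenter. It is an illustration only and not Rapport's actual implementation: the `VadSettings` class, the `segment_frames` function, the 10 ms frame size, and the assumption that each frame is already labelled as voiced or silent are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VadSettings:
    """Hypothetical container mirroring the three fields described above."""
    activation_ms: int = 100
    deactivation_ms: int = 500
    buffer_s: float = 60.0


def segment_frames(frames, settings, frame_ms=10):
    """Turn per-frame voiced flags into (start_ms, end_ms) speech blocks.

    frames: booleans, one per frame_ms of audio (True = voiced).
    """
    frames = list(frames)
    max_block_ms = int(settings.buffer_s * 1000)
    blocks = []
    in_speech = False
    voiced_run = 0   # consecutive voiced ms while waiting to activate
    silent_run = 0   # consecutive silent ms while inside speech
    start = 0

    def emit(begin, end):
        # Skip empty blocks, e.g. a pause that straddles a buffer cut.
        if end > begin:
            blocks.append((begin, end))

    for i, voiced in enumerate(frames):
        now = (i + 1) * frame_ms  # time at the end of this frame
        if not in_speech:
            voiced_run = voiced_run + frame_ms if voiced else 0
            if voiced_run >= settings.activation_ms:
                # Enough continuous speech: activation threshold reached.
                in_speech, silent_run = True, 0
                start = now - voiced_run
        else:
            silent_run = 0 if voiced else silent_run + frame_ms
            if silent_run >= settings.deactivation_ms:
                # Pause long enough: deactivation threshold reached.
                emit(start, now - silent_run)
                in_speech, voiced_run = False, 0
            elif now - start >= max_block_ms:
                # Buffer full: cut the block and keep listening.
                emit(start, now)
                start = now
    if in_speech:
        emit(start, len(frames) * frame_ms - silent_run)
    return blocks


# 300 ms of speech, a 600 ms pause, a 50 ms burst, then 1 s of silence.
audio = [True] * 30 + [False] * 60 + [True] * 5 + [False] * 100
print(segment_frames(audio, VadSettings()))  # [(0, 300)]
```

With the 100 ms activation threshold the 50 ms burst never counts as speech; lowering the threshold to 20 ms in this sketch would produce a second block for it.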
Recommendation
VAD activation threshold = 100ms
VAD deactivation threshold = 500ms
VAD buffer length = 60 seconds minimum, up to 300 seconds
Experimenting with VAD Settings
The optimal thresholds depend on your use case. Experimentation is encouraged, and the fields are validated to prevent invalid inputs.
Example Behaviour
| Setting | Low Value Example | High Value Example |
|---|---|---|
| Activation Threshold (ms) | 200ms → Short phrases like “Hi” are recognised and transcribed quickly | 2000ms → Short phrases ignored; only longer utterances (e.g. 2s+) are transcribed |
| Deactivation Threshold (ms) | 200ms → Each pause causes text to be transcribed in smaller chunks | 2000ms → Longer pauses required; whole speech appears as one block, with more delay |
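The low and high values in the table can be tried out on a toy timeline. The Python sketch below is a simplified model, not Rapport's code: the `transcribed_blocks` function and the sample timings are hypothetical, and it assumes speech has already been reduced to a list of voiced spans in milliseconds.

```python
def transcribed_blocks(spans, activation_ms, deactivation_ms):
    """Group voiced spans into transcription blocks.

    spans: list of (start_ms, end_ms) voiced spans, sorted by time.
    """
    blocks = []
    for start, end in spans:
        if blocks and start - blocks[-1][1] < deactivation_ms:
            # Pause shorter than the deactivation threshold: same block.
            blocks[-1] = (blocks[-1][0], end)
        elif end - start >= activation_ms:
            # Span long enough to pass the activation threshold: new block.
            blocks.append((start, end))
        # Otherwise the span is too short to be recognised as speech.
    return blocks


# "Hi" (400 ms), then two sentences, each separated by a 600 ms pause.
timeline = [(0, 400), (1000, 4000), (4600, 7000)]

# Low thresholds: every span becomes its own small chunk.
print(transcribed_blocks(timeline, 200, 200))
# [(0, 400), (1000, 4000), (4600, 7000)]

# High activation threshold: the short "Hi" is ignored.
print(transcribed_blocks(timeline, 2000, 200))
# [(1000, 4000), (4600, 7000)]

# High deactivation threshold: everything arrives as one block.
print(transcribed_blocks(timeline, 200, 2000))
# [(0, 7000)]
```

In this model the single block from the high deactivation threshold would only be finalised once a 2000 ms pause has passed, which corresponds to the extra delay noted in the table.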
✅ Tip: Start with default values, then adjust incrementally to balance responsiveness (low thresholds) and completeness of transcription (high thresholds).