Voice Activity Detection (VAD) Settings
These settings are found under Advanced Settings → Developer Settings tab.
Voice Activity Detection (VAD)
This block allows fine-tuning of how Rapport detects when a user starts and stops speaking.
VAD activation threshold (ms) – How long the user must be speaking before the system recognises it as speech.
VAD deactivation threshold (ms) – How long a pause must last before the system decides speech has ended.
VAD buffer length (s) – Maximum duration of speech in a single transcription block. Also affects Push-to-Talk behaviour. The maximum setting is 3 seconds.
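To make the two thresholds concrete, here is a minimal Python sketch of how an activation and a deactivation threshold could gate speech detection. It is an illustration only, not Rapport's implementation: the 20 ms frame size, the `VadConfig` and `detect_segments` names, and the default values are all assumptions, and the buffer length is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple

FRAME_MS = 20  # assumed frame duration; not a Rapport setting


@dataclass
class VadConfig:
    activation_ms: int = 200    # hypothetical default
    deactivation_ms: int = 500  # hypothetical default


def detect_segments(frames: List[bool], cfg: VadConfig) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) speech segments from per-frame voiced flags."""
    segments = []
    speaking = False
    voiced_run = 0  # ms of consecutive voiced audio while idle
    silent_run = 0  # ms of consecutive silence while speaking
    start = 0
    for i, voiced in enumerate(frames):
        frame_end = (i + 1) * FRAME_MS
        if not speaking:
            # Any silent frame resets the run, so short bursts never activate.
            voiced_run = voiced_run + FRAME_MS if voiced else 0
            if voiced_run >= cfg.activation_ms:
                speaking = True
                start = frame_end - voiced_run
                silent_run = 0
        else:
            # Any voiced frame resets the pause, so short pauses are bridged.
            silent_run = 0 if voiced else silent_run + FRAME_MS
            if silent_run >= cfg.deactivation_ms:
                segments.append((start, frame_end - silent_run))
                speaking = False
                voiced_run = 0
    if speaking:
        # Audio ended mid-speech: close the segment at the last voiced frame.
        segments.append((start, len(frames) * FRAME_MS - silent_run))
    return segments


# 100 ms blip, 1 s silence, 500 ms speech, 200 ms pause, 500 ms speech, 1 s silence
frames = [True] * 5 + [False] * 50 + [True] * 25 + [False] * 10 + [True] * 25 + [False] * 50
one_block = detect_segments(frames, VadConfig(activation_ms=200, deactivation_ms=500))
two_blocks = detect_segments(frames, VadConfig(activation_ms=200, deactivation_ms=100))
```

In this sketch the 100 ms blip never reaches the activation threshold, so it is dropped. With a 500 ms deactivation threshold the 200 ms pause is bridged and `one_block` holds a single segment; with a 100 ms threshold the same pause splits the speech, so `two_blocks` holds two.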
Experimenting with VAD Settings
The optimal thresholds depend on your use case. Experimentation is encouraged, and the fields validate their values so that invalid inputs are rejected.
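If you keep candidate settings in a script while experimenting, a small check like the following can catch out-of-range values before you enter them. This is a hypothetical sketch: the `VadSettings` class and its field names are invented for illustration, and the only limit taken from this page is the 3 second maximum buffer length. The "must be positive" rule for the thresholds is an assumption, not the exact validation the fields apply.

```python
from dataclasses import dataclass

MAX_BUFFER_S = 3.0  # maximum buffer length stated on this page


@dataclass(frozen=True)
class VadSettings:
    """Hypothetical container for one set of VAD values to try."""
    activation_ms: int
    deactivation_ms: int
    buffer_s: float

    def __post_init__(self) -> None:
        # Assumed rule: thresholds are durations, so they must be positive.
        if self.activation_ms <= 0 or self.deactivation_ms <= 0:
            raise ValueError("thresholds must be positive")
        # Documented rule: the buffer length cannot exceed 3 seconds.
        if not 0 < self.buffer_s <= MAX_BUFFER_S:
            raise ValueError(f"buffer length must be in (0, {MAX_BUFFER_S}] seconds")


candidate = VadSettings(activation_ms=200, deactivation_ms=500, buffer_s=3.0)
```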
Example Behaviour
Setting | Low Value Example | High Value Example |
---|---|---|
Activation Threshold (ms) | 200 ms → Short phrases like “Hi” are recognised and transcribed quickly | 2000 ms → Short phrases are ignored; only longer utterances (e.g. 2 s or more) are transcribed |
Deactivation Threshold (ms) | 200 ms → Each pause ends a block, so text is transcribed in smaller chunks | 2000 ms → Longer pauses are required to end a block; the whole speech appears as one block, with more delay |
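The behaviour in the table can be reproduced with a short Python sketch that works on bursts of speech instead of raw audio. It is a simplified model, not Rapport's algorithm: the `transcription_blocks` function, the burst timings, and the rule that a pause shorter than the deactivation threshold keeps the current block open are illustrative assumptions, and the buffer length is ignored.

```python
from typing import List, Tuple

Burst = Tuple[int, int]  # (start_ms, end_ms) of continuous voiced audio


def transcription_blocks(
    bursts: List[Burst], activation_ms: int, deactivation_ms: int
) -> List[Burst]:
    """Group voiced bursts into transcription blocks (illustrative model)."""
    blocks: List[Burst] = []
    for start, end in bursts:
        if blocks and start - blocks[-1][1] < deactivation_ms:
            # Pause too short to end the block: extend the current block.
            blocks[-1] = (blocks[-1][0], end)
        elif end - start >= activation_ms:
            # Burst long enough to count as speech: open a new block.
            blocks.append((start, end))
        # Otherwise the burst is shorter than the activation threshold
        # and is ignored.
    return blocks


# "Hi" (300 ms), a 400 ms pause, then a 2.2 s utterance
bursts = [(0, 300), (700, 2900)]

low_both = transcription_blocks(bursts, activation_ms=200, deactivation_ms=200)
high_activation = transcription_blocks(bursts, activation_ms=2000, deactivation_ms=200)
high_deactivation = transcription_blocks(bursts, activation_ms=200, deactivation_ms=2000)
```

With both thresholds at 200 ms, “Hi” and the longer utterance come out as two separate blocks. Raising the activation threshold to 2000 ms drops “Hi” and keeps only the 2.2 s utterance. Raising the deactivation threshold to 2000 ms bridges the pause, so everything is returned as one block.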
✅ Tip: Start with default values, then adjust incrementally to balance responsiveness (low thresholds) and completeness of transcription (high thresholds).