ElevenLabs Models in Rapport - August 2024
Information source
Model Summary
ElevenLabs offers several text-to-speech models which can be used within Rapport, each with its own strengths and characteristics:
English v1: The oldest and fastest model, optimized for English. It's reliable but limited in accuracy and flexibility. Best for audiobooks but less suitable for conversational speech.
Multilingual v1 (experimental): Not recommended for general use due to its limitations.
Multilingual v2: A significant improvement over v1, offering better accuracy, naturalness, and language coverage. (model id: eleven_multilingual_v2)
Turbo v2: Optimized for low-latency applications without sacrificing vocal performance. It's English-only and very stable, but slightly less accurate than Multilingual v2.
Turbo v2.5: The latest model, designed for extremely low latency tasks. (model id: eleven_turbo_v2_5)
The table below lists each model with its ID, description, and supported languages.
Model Name | Model ID | Description | Languages Supported |
---|---|---|---|
Eleven Multilingual v2 | eleven_multilingual_v2 | Our most life-like, emotionally rich model in 29 languages. Best for voice overs, audiobooks, post-production, or any other content creation needs. | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Russian |
Eleven Turbo v2.5 | eleven_turbo_v2_5 | Our high quality, lowest latency model in 32 languages. Best for developer use cases where speed matters. | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Russian, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Vietnamese, Norwegian, Hungarian |
Eleven Turbo v2 | eleven_turbo_v2 | Previously our lowest latency model. Turbo v2.5 is 25% faster and supports 32 languages. | English |
Eleven Multilingual v2 (STS) (We don’t support this) | eleven_multilingual_sts_v2 | Our cutting-edge, multilingual speech-to-speech model is designed for situations that demand unparalleled control over both the content and the prosody of the generated speech across various languages. | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Russian, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian |
Eleven English v2 (STS) (We don’t support this) | eleven_english_sts_v2 | Our state-of-the-art speech to speech model suitable for scenarios where you need maximum control over the content and prosody of your generations. | English |
Eleven English v1 | eleven_monolingual_v1 | Our first ever text to speech model. Now outclassed by Multilingual v2 (for content creation) and Turbo v2.5 (for low latency use cases). | English |
Eleven Multilingual v1 | eleven_multilingual_v1 | Our first Multilingual model, capable of generating speech in 10 languages. Now outclassed by Multilingual v2 (for content creation) and Turbo v2.5 (for low latency use cases). | English, German, Polish, Spanish, Italian, French, Portuguese, Hindi, Arabic |
Models in more detail
Explanation of Columns:
Model ID: The unique identifier for each model.
Name: The descriptive name of the model.
Can be Fine-Tuned: Indicates whether the model can be fine-tuned.
Can Do Text-to-Speech: Indicates if the model can generate speech from text.
Can Do Voice Conversion: Indicates if the model can convert one voice into another.
Can Use Style: Indicates whether the model can use styles (emotional tones, etc.).
Can Use Speaker Boost: Indicates if the model supports speaker boost functionality.
Languages Supported: A list of languages that the model supports.
Here’s a breakdown of the models:
Model ID | Name | Can be Fine-Tuned | Can Do Text-to-Speech | Can Do Voice Conversion | Can Use Style | Can Use Speaker Boost | Languages Supported |
---|---|---|---|---|---|---|---|
eleven_multilingual_v2 | Eleven Multilingual v2 | True | True | False | True | True | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Russian |
eleven_turbo_v2_5 | Eleven Turbo v2.5 | True | True | False | False | False | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Russian, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Vietnamese, Norwegian, Hungarian |
eleven_turbo_v2 | Eleven Turbo v2 | True | True | False | False | False | English |
eleven_multilingual_sts_v2 | Eleven Multilingual v2 (STS) | True | False | True | True | True | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Russian, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian |
eleven_english_sts_v2 | Eleven English v2 (STS) | False | False | True | True | True | English |
eleven_monolingual_v1 | Eleven English v1 | False | True | False | False | False | English |
eleven_multilingual_v1 | Eleven Multilingual v1 | False | True | False | False | False | English, German, Polish, Spanish, Italian, French, Portuguese, Hindi, Arabic |
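As a rough illustration of how the table above could be used when choosing a model, the sketch below encodes the text-to-speech rows as data and filters them by language and style support. It is only a sketch: the names `Model`, `TTS_MODELS`, and `models_for` are hypothetical and are not part of Rapport or the ElevenLabs API, and the speech-to-speech models are left out because Rapport does not support them.

```python
from dataclasses import dataclass

# Language sets copied from the "Languages Supported" column above.
V2_LANGUAGES = frozenset({
    "English", "Japanese", "Chinese", "German", "Hindi", "French", "Korean",
    "Portuguese", "Italian", "Spanish", "Indonesian", "Dutch", "Turkish",
    "Filipino", "Polish", "Swedish", "Bulgarian", "Romanian", "Arabic",
    "Czech", "Greek", "Finnish", "Croatian", "Malay", "Slovak", "Danish",
    "Tamil", "Ukrainian", "Russian",
})
TURBO_V2_5_LANGUAGES = V2_LANGUAGES | {"Vietnamese", "Norwegian", "Hungarian"}
V1_LANGUAGES = frozenset({
    "English", "German", "Polish", "Spanish", "Italian", "French",
    "Portuguese", "Hindi", "Arabic",
})
ENGLISH_ONLY = frozenset({"English"})


@dataclass(frozen=True)
class Model:
    """One text-to-speech row of the table (hypothetical helper type)."""
    model_id: str
    can_use_style: bool
    can_use_speaker_boost: bool
    languages: frozenset


TTS_MODELS = [
    Model("eleven_multilingual_v2", True, True, V2_LANGUAGES),
    Model("eleven_turbo_v2_5", False, False, TURBO_V2_5_LANGUAGES),
    Model("eleven_turbo_v2", False, False, ENGLISH_ONLY),
    Model("eleven_monolingual_v1", False, False, ENGLISH_ONLY),
    Model("eleven_multilingual_v1", False, False, V1_LANGUAGES),
]


def models_for(language, need_style=False):
    """Return the IDs of TTS models that list `language` in the table.

    If need_style is True, only models marked "Can Use Style" are returned.
    """
    return [
        m.model_id
        for m in TTS_MODELS
        if language in m.languages and (m.can_use_style or not need_style)
    ]
```

For example, `models_for("Vietnamese")` returns only `eleven_turbo_v2_5`, since Vietnamese appears solely in that model's language list, while `models_for("Japanese", need_style=True)` returns only `eleven_multilingual_v2`.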
How do I set the ElevenLabs model in Rapport?
Under Project Settings
Start by setting the Text to Speech (TTS) option to ElevenLabs. The model can then be set in the TTS Args field within the user interface.
To do this, enter a small piece of JSON as shown below. This example uses the model eleven_turbo_v2_5:
{
  "model_id": "eleven_turbo_v2_5",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.8,
    "style": 0.0,
    "use_speaker_boost": true
  }
}
In the Rapport user interface (UI), the TTS Args field is editable and is validated, so incorrectly formatted JSON will be flagged. If you copy the format shown above, everything should be fine. You can, however, change the ElevenLabs model to suit your specific requirements.
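If you would rather generate the TTS Args text than type it by hand, a minimal Python sketch such as the one below can build it and check that it parses before you paste it into the field. The function names `build_tts_args` and `check_tts_args` are hypothetical and not part of Rapport, and the 0 to 1 range check on the numeric voice settings is an assumption about typical ElevenLabs values rather than something stated in this article.

```python
import json


def build_tts_args(model_id, stability=0.5, similarity_boost=0.8,
                   style=0.0, use_speaker_boost=True):
    """Return a JSON string in the same shape as the example above."""
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        # Assumption: these settings are expected to lie between 0 and 1.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    args = {
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,
            "use_speaker_boost": use_speaker_boost,
        },
    }
    return json.dumps(args, indent=2)


def check_tts_args(text):
    """Parse pasted TTS Args text and return it as a dict.

    Raises ValueError if the text is not valid JSON, is not a JSON object,
    or has no "model_id" key.
    """
    parsed = json.loads(text)  # json.JSONDecodeError is a ValueError
    if not isinstance(parsed, dict) or "model_id" not in parsed:
        raise ValueError('TTS Args must be a JSON object with a "model_id" key')
    return parsed


if __name__ == "__main__":
    text = build_tts_args("eleven_turbo_v2_5")
    check_tts_args(text)
    print(text)  # copy this output into the TTS Args field
```

Running the script prints a JSON object that can be copied into the TTS Args field; to use a different model, change only the `model_id` argument.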
Next Step. Project Design
In Project Design, select Voice and enter the ElevenLabs Voice ID.
Save your changes.
Then click Preview and try it out.
Custom Voices
If you are entering a custom voice ID, the voice_id can be found on the ElevenLabs website by selecting a voice in their interface and clicking on ID. This copies the voice_id, which can then be pasted into the Voice field within the Rapport user interface.
Click here for further information on the ElevenLabs pre-made voices