
ElevenLabs Models in Rapport - August 2024

Information source

Model Summary


ElevenLabs offers several text-to-speech models which can be used within Rapport, each with its own strengths and characteristics:

  1. English v1: The oldest and fastest model, optimized for English. It's reliable but limited in accuracy and flexibility. Best for audiobooks but less suitable for conversational speech.

  2. Multilingual v1 (experimental): Not recommended for general use due to its limitations.

  3. Multilingual v2: A significant improvement over v1, offering better accuracy, naturalness, and language coverage. (model id: eleven_multilingual_v2)

  4. Turbo v2: Optimized for low-latency applications without sacrificing vocal performance. It's English-only and very stable, but slightly less accurate than Multilingual v2.

  5. Turbo v2.5: The latest model, designed for extremely low-latency tasks. (model id: eleven_turbo_v2_5)

The table below lists each model's name, ID, description, and supported languages.

| Model Name | Model ID | Description | Languages Supported |
|---|---|---|---|
| Eleven Multilingual v2 | eleven_multilingual_v2 | Our most life-like, emotionally rich model in 29 languages. Best for voice overs, audiobooks, post-production, or any other content creation needs. | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Russian |
| Eleven Turbo v2.5 | eleven_turbo_v2_5 | Our high quality, lowest latency model in 32 languages. Best for developer use cases where speed matters. | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Russian, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Vietnamese, Norwegian, Hungarian |
| Eleven Turbo v2 | eleven_turbo_v2 | Previously our lowest latency model. Turbo v2.5 is 25% faster and supports 32 languages. | English |
| Eleven Multilingual v2 (STS) (not supported in Rapport) | eleven_multilingual_sts_v2 | Our cutting-edge, multilingual speech-to-speech model is designed for situations that demand unparalleled control over both the content and the prosody of the generated speech across various languages. | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Russian, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian |
| Eleven English v2 (STS) (not supported in Rapport) | eleven_english_sts_v2 | Our state-of-the-art speech-to-speech model suitable for scenarios where you need maximum control over the content and prosody of your generations. | English |
| Eleven English v1 | eleven_monolingual_v1 | Our first ever text-to-speech model. Now outclassed by Multilingual v2 (for content creation) and Turbo v2.5 (for low latency use cases). | English |
| Eleven Multilingual v1 | eleven_multilingual_v1 | Our first Multilingual model, capable of generating speech in 10 languages. Now outclassed by Multilingual v2 (for content creation) and Turbo v2.5 (for low latency use cases). | English, German, Polish, Spanish, Italian, French, Portuguese, Hindi, Arabic |
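
When choosing a model for a particular language, the table above can be treated as a simple lookup. The sketch below is illustrative only and is not part of Rapport or the ElevenLabs SDK. The names `MODEL_LANGUAGES` and `models_for_language` are hypothetical, the language lists are copied from the table, and the two speech-to-speech models are left out because Rapport does not support them.

```python
# Languages per text-to-speech model ID, copied from the table above.
# The STS models are omitted because Rapport does not support them.
_V2_LANGUAGES = frozenset({
    "English", "Japanese", "Chinese", "German", "Hindi", "French", "Korean",
    "Portuguese", "Italian", "Spanish", "Indonesian", "Dutch", "Turkish",
    "Filipino", "Polish", "Swedish", "Bulgarian", "Romanian", "Arabic",
    "Czech", "Greek", "Finnish", "Croatian", "Malay", "Slovak", "Danish",
    "Tamil", "Ukrainian", "Russian",
})

MODEL_LANGUAGES = {
    "eleven_multilingual_v2": _V2_LANGUAGES,
    "eleven_turbo_v2_5": _V2_LANGUAGES | {"Vietnamese", "Norwegian", "Hungarian"},
    "eleven_turbo_v2": frozenset({"English"}),
    "eleven_monolingual_v1": frozenset({"English"}),
    "eleven_multilingual_v1": frozenset({
        "English", "German", "Polish", "Spanish", "Italian", "French",
        "Portuguese", "Hindi", "Arabic",
    }),
}


def models_for_language(language: str) -> list[str]:
    """Return the model IDs (sorted) whose table row lists the given language."""
    wanted = language.strip().casefold()
    return sorted(
        model_id
        for model_id, languages in MODEL_LANGUAGES.items()
        if wanted in {name.casefold() for name in languages}
    )
```

For example, `models_for_language("Vietnamese")` yields only `eleven_turbo_v2_5`, since that is the only row in the table that lists Vietnamese.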

Models in more detail

Explanation of Columns:

  • Model ID: The unique identifier for each model.

  • Name: The descriptive name of the model.

  • Can be Fine-Tuned: Indicates whether the model can be fine-tuned.

  • Can Do Text-to-Speech: Indicates if the model can generate speech from text.

  • Can Do Voice Conversion: Indicates if the model can convert one voice into another.

  • Can Use Style: Indicates whether the model can use styles (emotional tones, etc.).

  • Can Use Speaker Boost: Indicates if the model supports speaker boost functionality.

  • Languages Supported: A list of languages that the model supports.

 

Here's a breakdown of the models:

| Model ID | Name | Can be Fine-Tuned | Can Do Text-to-Speech | Can Do Voice Conversion | Can Use Style | Can Use Speaker Boost | Languages Supported |
|---|---|---|---|---|---|---|---|
| eleven_multilingual_v2 | Eleven Multilingual v2 | True | True | False | True | True | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Russian |
| eleven_turbo_v2_5 | Eleven Turbo v2.5 | True | True | False | False | False | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Russian, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Vietnamese, Norwegian, Hungarian |
| eleven_turbo_v2 | Eleven Turbo v2 | True | True | False | False | False | English |
| eleven_multilingual_sts_v2 | Eleven Multilingual v2 | True | False | True | True | True | English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Russian, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian |
| eleven_english_sts_v2 | Eleven English v2 | False | False | True | True | True | English |
| eleven_monolingual_v1 | Eleven English v1 | False | True | False | False | False | English |
| eleven_multilingual_v1 | Eleven Multilingual v1 | False | True | False | False | False | English, German, Polish, Spanish, Italian, French, Portuguese, Hindi, Arabic |
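
The capability flags above can be used to sanity-check voice settings before they are saved. The following is a minimal sketch, not Rapport's own validation. The names `ModelCapabilities`, `CAPABILITIES`, and `unsupported_settings` are hypothetical, and it assumes that "Can Use Style" governs the `style` setting and "Can Use Speaker Boost" governs `use_speaker_boost`. How the service treats a setting that a model does not support is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCapabilities:
    """One row of the capability table (languages omitted)."""
    can_be_fine_tuned: bool
    can_do_text_to_speech: bool
    can_do_voice_conversion: bool
    can_use_style: bool
    can_use_speaker_boost: bool


# Values copied from the table above, in column order.
CAPABILITIES = {
    "eleven_multilingual_v2": ModelCapabilities(True, True, False, True, True),
    "eleven_turbo_v2_5": ModelCapabilities(True, True, False, False, False),
    "eleven_turbo_v2": ModelCapabilities(True, True, False, False, False),
    "eleven_multilingual_sts_v2": ModelCapabilities(True, False, True, True, True),
    "eleven_english_sts_v2": ModelCapabilities(False, False, True, True, True),
    "eleven_monolingual_v1": ModelCapabilities(False, True, False, False, False),
    "eleven_multilingual_v1": ModelCapabilities(False, True, False, False, False),
}


def unsupported_settings(model_id: str, voice_settings: dict) -> list[str]:
    """List voice_settings keys that are switched on but that the table
    marks as unsupported for the given model (assumed mapping, see above)."""
    caps = CAPABILITIES[model_id]
    flagged = []
    # A style of 0.0 is treated as "style not used".
    if not caps.can_use_style and voice_settings.get("style", 0.0) != 0.0:
        flagged.append("style")
    if not caps.can_use_speaker_boost and voice_settings.get("use_speaker_boost", False):
        flagged.append("use_speaker_boost")
    return flagged
```

An empty list means nothing in the settings conflicts with the table; a non-empty list names the settings worth double-checking for that model.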

 

How do I set the ElevenLabs model in Rapport?


Under Project Settings

Start by setting the Text to Speech (TTS) option to ElevenLabs. The model can be set in the TTS Args field within the user interface.

To do this, enter a small piece of JSON as shown below. This example uses the model eleven_turbo_v2_5.

JSON
{
   "model_id":"eleven_turbo_v2_5",
   "voice_settings":{
      "stability":0.5,
      "similarity_boost":0.8,
      "style":0.0,
      "use_speaker_boost":true
   }
}

In the Rapport user interface (UI), the TTS Args field is an editable field that validates its contents and flags incorrectly formatted JSON. If you copy the format shown above, it should be accepted. You can change the ElevenLabs model (the model_id value) to suit your specific requirements.
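
If you would rather generate the TTS Args text than type it by hand, a small script can produce well-formed JSON every time. This is a minimal sketch, not a Rapport tool. The function name `build_tts_args` is hypothetical, the defaults mirror the example above, and the 0.0 to 1.0 range check on the numeric settings is an assumption about ElevenLabs voice settings rather than something Rapport is known to enforce.

```python
import json


def build_tts_args(
    model_id: str,
    stability: float = 0.5,
    similarity_boost: float = 0.8,
    style: float = 0.0,
    use_speaker_boost: bool = True,
) -> str:
    """Return a JSON string in the same shape as the TTS Args example."""
    numeric = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    # Assumption: each numeric voice setting is expected to lie in [0.0, 1.0].
    for name, value in numeric.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    args = {
        "model_id": model_id,
        "voice_settings": {**numeric, "use_speaker_boost": use_speaker_boost},
    }
    # json.dumps guarantees valid JSON (quoted keys, lowercase true/false).
    return json.dumps(args, indent=3)


if __name__ == "__main__":
    # Paste the printed text into the TTS Args field.
    print(build_tts_args("eleven_multilingual_v2"))
```

Generating the text with `json.dumps` avoids the usual hand-editing mistakes, such as trailing commas, single quotes, or a capitalised `True`.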

image-20240830-124149.png


Next Step: Project Design

In Project Design, select Voice and enter the ElevenLabs voice ID.

image-20240830-143002.png

Save your changes, then click Preview to try it out.

Custom Voices

If you are entering a custom voice ID, the voice_id can be found on the ElevenLabs website by selecting a voice in their interface and clicking ID, as shown below. This copies the voice_id, which can then be pasted into the field in the Rapport user interface.

image-20240830-142404.png

Click here for further information on the ElevenLabs pre-made voices
