How to install ChatTTS, an open source TTS (text-to-speech) model, and fix the voice tone
1. Introduction
ChatTTS, released on May 30, 2024, is a text-to-speech model designed specifically for conversational scenarios such as LLM assistant dialogue. It supports both English and Chinese. The largest model was trained on over 100,000 hours of Chinese and English data. The open-source version on HuggingFace was trained on 40,000 hours of data and has not undergone SFT (the specific training method is not disclosed). This article shows how to install ChatTTS with pip and how to fix the voice tone with a seed, including a deployment on a LattePanda single board computer.
ChatTTS boasts impressive voice quality, almost indistinguishable from human speech, and is suitable for video voiceovers and voice responses. The deployment process is relatively simple, and the latest update even supports installation directly via pip.
2. Installation process
The installation process is as follows:
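2.1 First, set up a conda environment to isolate the relevant libraries. Requesting python in the command gives the environment its own interpreter and pip; without it the environment is empty and pip would install into whichever Python comes next on the PATH. The version 3.10 below is an assumption, so choose one that ChatTTS supports.
conda create --name chattts python=3.10 -y
After executing the command, the result should appear as follows: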
Figure: Conda environment set up
2.2 Activate the conda environment.
conda activate chattts
After executing the command, the result should be as follows:
Figure: Conda activate Chattts
2.3 Create a directory (optional).
mkdir chattts
cd chattts
This directory is primarily used to save the generated WAV files.
2.4 Install the chattts-fork library.
pip install chattts-fork
The execution result should be as follows:
Figure: Pip install result
The installation completion message for chattts-fork will be displayed.
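If the completion message has scrolled away, the installation can also be checked from Python. The following is a minimal sketch, assuming the package is registered under the name chattts-fork used in the pip command and that it provides the chattts command used in the next step. The helper name installed_version is hypothetical.

```python
from importlib import metadata
import shutil


def installed_version(package):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package name taken from the pip command above.
version = installed_version("chattts-fork")
# shutil.which returns None when the command is not on the PATH.
cli_path = shutil.which("chattts")

print("chattts-fork version:", version or "not installed")
print("chattts command:", cli_path or "not found")
```

Run it inside the activated conda environment. If either line reports a missing item, pip most likely installed into a different Python environment.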
2.5 Run
After installation, it is ready to use.
The simplest way to try it is the following command:
chattts hello,world
The first run downloads the required model files. The GPT.pt file is approximately 901MB, and nearly 1GB of disk space is needed in total. The execution result should be as follows:
Figure: Chattts cli generate done
Afterward, a tts.wav file will appear in the current directory, which can be played using a media player. You will then hear the corresponding audio content.
Congratulations, your first audio file has been successfully created.
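Besides playing the file, you can inspect it from Python. The sketch below reads the RIFF/WAVE header directly and reports the sample rate, channel count and duration. It assumes tts.wav is an uncompressed WAV file whose header sizes are accurate. The article does not state the sample encoding, so the code avoids relying on a particular one. The function name wav_info is hypothetical.

```python
import os
import struct


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) from a RIFF/WAVE header."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        sample_rate = channels = byte_rate = None
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no data chunk found")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(size)
                # format tag, channels, sample rate, bytes per second
                _, channels, sample_rate, byte_rate = struct.unpack("<HHII", fmt[:12])
                f.seek(size % 2, 1)  # chunks are padded to an even length
            elif chunk_id == b"data":
                if not byte_rate:
                    raise ValueError("fmt chunk missing before data chunk")
                # duration = audio bytes / bytes per second
                return sample_rate, channels, size / byte_rate
            else:
                f.seek(size + size % 2, 1)  # skip chunks we do not need


# tts.wav is the file name produced by the command above.
if os.path.exists("tts.wav"):
    rate, channels, seconds = wav_info("tts.wav")
    print(f"{rate} Hz, {channels} channel(s), {seconds:.1f} s")
```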
3. Additional Information
How can we use it? Here’s a brief introduction.
For detailed usage instructions, you can refer to the help file of the chattts command.
chattts -h
After executing the command, the content should appear as follows:
Figure: Chattts cli help parameters
The -h option is the help command, which allows you to understand the parameter settings for running the command line.
4. Fixing the voice tone
The -s option sets the seed. The voice is randomly generated by default, so passing the same seed with -s on every run keeps the voice consistent.
chattts -s 111 'this is a test sentence voice.'
chattts -s 111 'the voice is the same as before.'
Male
Figure: Male voice seed
Female
Figure: Female voice seed
The -o option specifies the filename of the output file.
Figure: Chattts voice fix by seed
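When several lines should be spoken in the same voice, the two options can be driven from a small script. This is a sketch only. It assumes that -s and -o can be combined in one call, which the commands above do not show. The function names, texts and file names are hypothetical. By default it only builds and prints the commands, and the CLI is called only when dry_run is set to False.

```python
import subprocess


def build_chattts_command(text, seed, output):
    """Build the argument list for one chattts call with a fixed seed."""
    # Assumption: -s (seed) and -o (output file) may be used together.
    return ["chattts", "-s", str(seed), "-o", output, text]


def generate_all(lines, seed, dry_run=True):
    """Build one command per (text, output) pair and optionally run it."""
    commands = [build_chattts_command(text, seed, output) for text, output in lines]
    if not dry_run:
        for command in commands:
            # Passing a list avoids shell quoting problems in the text.
            subprocess.run(command, check=True)
    return commands


# Hypothetical texts and file names, for illustration only.
lines = [
    ("this is a test sentence voice.", "line1.wav"),
    ("the voice is the same as before.", "line2.wav"),
]

for command in generate_all(lines, seed=111):
    print(" ".join(command))
```

Reusing the same seed for every call is what keeps the voice consistent. Giving each line its own output file keeps a later call from overwriting an earlier result.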
5. Precautions
When using pip install chattts-fork, there are certain network requirements, as it needs to download nearly 1GB of content. You can choose a pip source that is geographically closer to you.
The project has just been released and is updating rapidly, so the operations described here may change at any time. Please keep an eye on the 2noise/ChatTTS GitHub project.
The model accepts only English commas and periods as punctuation marks; other punctuation marks are treated as invalid. On the command line, enclose the text in single quotes. Invalid characters trigger a warning such as:
WARNING:ChatTTS.core:Invalid characters found! : {':', ':', '\n', '!'}
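One way to avoid this warning is to clean the text before passing it to the command. The sketch below is not part of ChatTTS. It assumes that letters, digits, spaces, commas and periods are the only safe characters, as described above. The choice to turn "!" and "?" into periods and ":" and ";" into commas is an assumption made here for illustration.

```python
import re

# Assumed mapping from common punctuation to the two accepted marks.
_REPLACEMENTS = {"!": ".", "?": ".", ";": ",", ":": ","}


def sanitize_for_tts(text):
    """Keep only letters, digits, spaces, commas and periods."""
    # Newlines become spaces so that sentences stay separated.
    text = text.replace("\n", " ")
    # Map sentence-ending and pausing marks to a period or a comma.
    text = "".join(_REPLACEMENTS.get(ch, ch) for ch in text)
    # Drop every remaining character that is not a letter, digit,
    # whitespace, comma or period (underscores are removed as well).
    text = re.sub(r"[^\w\s,.]|_", "", text)
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(sanitize_for_tts("Hello: world!\nHow are you?"))
# prints "Hello, world. How are you."
```

Note that this also removes apostrophes inside words, so "it's" becomes "its". Whether the model tolerates apostrophes is not stated here, so the sketch takes the stricter reading.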
6. Test on LattePanda Sigma
We have deployed it on the LattePanda Sigma and tested its performance.
Figure: Deploy Chattts in Lattepanda Sigma
Figure: Chattts generate wav result in Lattepanda Sigma
When running on the LattePanda Sigma, a 22-second voice clip takes about 39 seconds to generate, a ratio of audio length to processing time of nearly 1:2. This is comparable to the efficiency on devices equipped with an RTX 4090 graphics card.
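The ratio can be expressed as a real-time factor, that is, the seconds of processing needed per second of generated audio. The short calculation below uses only the two figures reported above. The function name is hypothetical.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Seconds of processing needed per second of generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures reported above for the LattePanda Sigma.
rtf = real_time_factor(processing_seconds=39, audio_seconds=22)
print(f"real-time factor: {rtf:.2f}")  # prints "real-time factor: 1.77"
```

A value above 1 means generation is slower than playback, so at this speed the audio has to be generated ahead of time and cannot be streamed live.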
Recently, ChatTTS has launched its own website at https://chattts.com/. If you prefer not to set things up yourself, you can also try it through the web interface.
7. Reference
2. ChatTTS GitHub - yihong0618