F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

      

F5-TTS: Diffusion Transformer with ConvNeXt V2, faster to train and faster at inference.

E2 TTS: Flat-UNet Transformer, the closest reproduction to the paper.

Sway Sampling: Inference-time flow step sampling strategy that greatly improves performance.

Thanks to all the contributors!

News

Installation

# Create a python 3.10 conda env (you could also use virtualenv)
conda create -n f5-tts python=3.10
conda activate f5-tts

# Install pytorch with your CUDA version, e.g.
pip install torch==2.3.0+cu118 torchaudio==2.3.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
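
To confirm the PyTorch build can see your GPU before going further, a quick sanity check (a standard PyTorch call, nothing F5-TTS-specific):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"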

 

Then you can choose from a few options below:

1. As a pip package (if just for inference)

pip install git+https://github.com/SWivid/F5-TTS.git

 

2. Local editable (if also doing training or finetuning)

git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
# git submodule update --init --recursive  # (optional, if you need the BigVGAN vocoder)
pip install -e .
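
To confirm the editable install resolves to your checkout, a quick check (this assumes the package imports as f5_tts, matching the src/f5_tts layout used throughout this README):

python -c "import f5_tts; print(f5_tts.__file__)"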

 

If you initialized the submodule, add the following code at the beginning of src/third_party/BigVGAN/bigvgan.py so that BigVGAN's internal imports resolve when it is loaded from the submodule path:

import os
import sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

 

3. Docker usage

# Build from Dockerfile
docker build -t f5tts:v1 .

# Or pull from GitHub Container Registry
docker pull ghcr.io/swivid/f5-tts:main
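
To run the image with GPU access and the Gradio port exposed, something along these lines should work; note that --gpus all requires the NVIDIA Container Toolkit, and the trailing command is an assumption (drop or adjust it if the image already defines a suitable entrypoint):

docker run --rm -it --gpus all -p 7860:7860 ghcr.io/swivid/f5-tts:main \
    f5-tts_infer-gradio --host 0.0.0.0 --port 7860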

 

Inference

1. Gradio App

Currently supported features:

# Launch a Gradio app (web interface)
f5-tts_infer-gradio

# Specify the port/host
f5-tts_infer-gradio --port 7860 --host 0.0.0.0

# Launch a share link
f5-tts_infer-gradio --share

 

2. CLI Inference

# Run with flags
# Leaving --ref_text "" empty will have an ASR model transcribe the reference audio (extra GPU memory usage)
f5-tts_infer-cli \
--model "F5-TTS" \
--ref_audio "ref_audio.wav" \
--ref_text "The content, subtitle or transcription of reference audio." \
--gen_text "Some text you want TTS model generate for you."

# Run with the default settings from src/f5_tts/infer/examples/basic/basic.toml
f5-tts_infer-cli
# Or with your own .toml file
f5-tts_infer-cli -c custom.toml

# Multi-voice generation. See src/f5_tts/infer/README.md
f5-tts_infer-cli -c src/f5_tts/infer/examples/multi/story.toml
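
A custom .toml simply mirrors the CLI flags above. The sketch below is illustrative only; the key names are assumptions patterned on the flags, and src/f5_tts/infer/examples/basic/basic.toml remains the authoritative reference for the exact schema:

# custom.toml (illustrative sketch; check basic.toml for the exact keys)
model = "F5-TTS"
ref_audio = "ref_audio.wav"
ref_text = "The content, subtitle or transcription of reference audio."
gen_text = "Some text you want TTS model generate for you."
remove_silence = false        # assumed option
output_dir = "tests"          # assumed option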

 

3. More instructions

  • To get better generation results, take a moment to read the detailed guidance.
  • The Issues are very useful; please search for keywords related to the problem you encountered first. If no answer is found, feel free to open a new issue.
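
For scripting inference beyond the CLI, the package can also be driven from Python. The sketch below is hypothetical: the f5_tts.api module, the F5TTS class, and the infer() parameter names are assumptions modeled on the CLI flags above, so check the package source or the inference README for the actual interface.

# Hypothetical Python usage; module, class, and parameter names are assumptions,
# not a confirmed API.
from f5_tts.api import F5TTS

tts = F5TTS()  # assumed to load the default F5-TTS checkpoint
wav, sr, spect = tts.infer(
    ref_file="ref_audio.wav",
    ref_text="The content, subtitle or transcription of reference audio.",
    gen_text="Some text you want TTS model generate for you.",
    file_wave="out.wav",  # assumed output path argument
)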

Training

1. Gradio App

Read training & finetuning guidance for more instructions.

# Quick start with Gradio web interface
f5-tts_finetune-gradio

 

Evaluation

Development

Use pre-commit to ensure code quality (it will run linters and formatters automatically):

pip install pre-commit
pre-commit install

 

When making a pull request, run before each commit:

pre-commit run --all-files

 

Note: Some model components have linting exceptions for E722 to accommodate tensor notation.

Acknowledgements

Citation

If our work and codebase are useful to you, please cite as follows:

@article{chen-etal-2024-f5tts,
      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching}, 
      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      journal={arXiv preprint arXiv:2410.06885},
      year={2024},
}

 

License

Our code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC license because the training data, Emilia, is an in-the-wild dataset. Sorry for any inconvenience this may cause.
