Introduction

In this post I get Kotoba-Speech v0.1 (kotoba-tech/kotoba-speech-v0.1) running.

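I gave up on this a while ago because the library versions did not line up. Someone has since opened a PR that pins the library versions, so I used it as a starting point, made a number of adjustments, and ran local inference with Docker.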

If you just want to run it, I have uploaded an environment built with docker and docker compose to the repository below, so please use that.

The following commands build the image and start a container for inference:

docker compose build
docker compose run kotoba_speech

Inside the container, the following runs inference with the official pretrained model:

python -i fam/llm/fast_inference.py  --model_name kotoba-tech/kotoba-speech-v0.1
tts.synthesise(text="コトバテクノロジーズのミッションは音声基盤モデルを作る事です。", spk_ref_path="assets/bria.mp3")

Development environment

Windows 11
Docker Desktop
Docker Compose version v2.29.2-desktop.2

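Build details

Installing FlashAttention

A regular install mostly fails, and building from source never seems to finish, so I pick a suitable prebuilt wheel from the flash-attention GitHub releases and install that instead.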

This time I use the wheel that matches the following conditions:

* CUDA version: CUDA 12
* PyTorch version: PyTorch 2.2
* Python version: Python 3.10
* C++11 ABI: TRUE

The matching wheel is flash_attn-2.7.0.post2+cu12torch2.2cxx11abiTRUE-cp310-cp310-linux_x86_64.whl, so I download and install it as follows:

RUN wget -q https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post2/flash_attn-2.7.0.post2%2Bcu12torch2.2cxx11abiTRUE-cp310-cp310-linux_x86_64.whl \
    && pip install flash_attn-2.7.0.post2+cu12torch2.2cxx11abiTRUE-cp310-cp310-linux_x86_64.whl \
    && rm flash_attn-2.7.0.post2+cu12torch2.2cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
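
To make the mapping from the four conditions to the wheel file name explicit, here is a minimal Python sketch. The helper names `flash_attn_wheel_name` and `flash_attn_wheel_url` are hypothetical, and the naming pattern is inferred only from the single file name used above, so other releases may follow a different pattern.

```python
from urllib.parse import quote

# Hypothetical helpers: the naming pattern is inferred from the one wheel
# file name used in this post and may not hold for every release.
RELEASE_BASE = "https://github.com/Dao-AILab/flash-attention/releases/download"


def flash_attn_wheel_name(version, cuda_major, torch_version, python_tag, cxx11_abi):
    """Build a wheel file name such as the cu12 / torch2.2 / cp310 one above."""
    abi = "TRUE" if cxx11_abi else "FALSE"
    return (
        f"flash_attn-{version}+cu{cuda_major}torch{torch_version}"
        f"cxx11abi{abi}-{python_tag}-{python_tag}-linux_x86_64.whl"
    )


def flash_attn_wheel_url(version, cuda_major, torch_version, python_tag, cxx11_abi):
    """Build the download URL; quote() percent-encodes '+' as '%2B'."""
    name = flash_attn_wheel_name(
        version, cuda_major, torch_version, python_tag, cxx11_abi
    )
    return f"{RELEASE_BASE}/v{version}/{quote(name)}"


if __name__ == "__main__":
    print(flash_attn_wheel_url("2.7.0.post2", 12, "2.2", "cp310", True))
```

If PyTorch is already installed in the image, the inputs can be read from `torch.__version__`, `torch.version.cuda`, and `torch.compiled_with_cxx11_abi()` rather than typed by hand.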


Installing audiocraft

Installing audiocraft with pip failed with the following error:

            [end of output]

        note: This error originates from a subprocess, and is likely not a problem with pip.
        ERROR: Failed building wheel for numpy
        Running setup.py clean for numpy
        error: subprocess-exited-with-error

        × python setup.py clean did not run successfully.
        │ exit code: 1
        ╰─> [10 lines of output]
            Running from numpy source directory.

            `setup.py clean` is not supported, use one of the following instead:

              - `git clean -xdf` (cleans all files)
              - `git clean -Xdf` (cleans all versioned files, doesn't touch
                                  files that aren't checked into the git repo)

            Add `--force` to your command to use it anyway if you must (unsupported).

            [end of output]

        note: This error originates from a subprocess, and is likely not a problem with pip.
        ERROR: Failed cleaning build dir for numpy
      Failed to build numpy
      ERROR: Could not build wheels for numpy, which is required to install pyproject.toml-based projects
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

(Only the last part of the error log is shown.)

To work around this, I clone audiocraft and install it from inside the cloned folder:

RUN git clone https://github.com/facebookresearch/audiocraft.git \
    && cd audiocraft \
    && pip install . \
    && pip list | grep audiocraft \
    && cd .. \
    && rm -rf audiocraft

Problem with an old PyTorch version

At first I built the environment with PyTorch 2.1, but the following error occurred, so I switched to PyTorch 2.2:

AttributeError: torch._inductor.config.fx_graph_cache does not exist

This error occurs because torch._inductor.config has no attribute named fx_graph_cache in PyTorch 2.1.
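
A quick way to confirm whether the installed PyTorch has this option is to probe for the attribute before running inference. The following is a minimal sketch; the helper name `has_fx_graph_cache` is hypothetical, and it returns None when PyTorch cannot be imported at all.

```python
import importlib


def has_fx_graph_cache():
    """Return True or False if torch is importable, otherwise None."""
    try:
        config = importlib.import_module("torch._inductor.config")
    except ImportError:
        # torch (or its inductor module) is not installed in this environment
        return None
    # hasattr() turns the AttributeError seen above into a plain False
    return hasattr(config, "fx_graph_cache")


if __name__ == "__main__":
    print("fx_graph_cache available:", has_fx_graph_cache())
```

Running this inside the container before starting inference should show whether the PyTorch version in the image is new enough.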

Sharing folders with a volume mount

With only a Dockerfile, accessing files created inside the container from the host PC is somewhat tedious. For that reason I use docker compose to set up a volume mount, as follows:

version: '3.9'

services:
  kotoba_speech:
    build:
      context: .
      dockerfile: Dockerfile
    image: kotoba_speech_image
    volumes:
      - .:/home/user/kotoba_speech_release
    runtime: nvidia  # added: use the NVIDIA container runtime for GPU access
    stdin_open: true
    tty: true
    command: >
      /bin/bash -c "
      cd /home/user/kotoba_speech_release &&
      pip install -e . &&
      exec /bin/bash
      "

This lets you check the files generated during inference right away from the host.
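
Because the project folder is shared, generated files can also be located from the host with a small script. The sketch below lists the most recently modified audio files under a folder. The helper name `newest_audio_files`, the `*.wav` pattern, and the search root are assumptions, since the output location and file format are not stated here.

```python
from pathlib import Path


def newest_audio_files(root=".", pattern="*.wav", limit=5):
    """Return up to `limit` files matching `pattern` under `root`, newest first."""
    # rglob searches the folder recursively; directories are filtered out
    files = [p for p in Path(root).rglob(pattern) if p.is_file()]
    # Sort by modification time so the latest inference output comes first
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

Calling `newest_audio_files()` from the project folder on the host, then printing the returned paths, is one way to spot what the last run wrote.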

Inference

Once the build has finished, run inference as follows:

python -i fam/llm/fast_inference.py --model_name kotoba-tech/kotoba-speech-v0.1
tts.synthesise(text="コトバテクノロジーズのミッションは音声基盤モデルを作る事です。", spk_ref_path="assets/bria.mp3")

This generates a file, as shown below.