Mobile VITS

User Guide

Make sure your phone has enough free storage space and at least 1 GB of RAM.
Download the APK file from the release page and install it.
Feature Introduction:

Text-to-Speech (TTS)

After downloading the model files, unzip them and place them under the /sdcard/Download folder on your phone.
Click Load Config (approve permissions when prompted), then select /sdcard/Download/[your model folder]/config.json to load the config file. (Example: /sdcard/Download/Models/365_epochs/config.json)
Click Load Model (approve permissions when prompted), then select /sdcard/Download/[your model folder]/*.bin to load the model files. (Example: /sdcard/Download/Models/365_epochs/dec.ncnn.bin)
Enter text and click Generate.
Click Play to play the generated audio. Click Export to export the generated audio file. The file will be saved in the parent directory of the model folder.
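
Before copying a model to the phone, it can help to confirm that the folder has the layout the steps above expect. The following is a minimal sketch that runs on a desktop machine, not part of the app: the function name inspect_model_folder is hypothetical, and it only assumes what the guide states (a config.json, one or more .bin files, and exports landing in the parent of the model folder).

```python
from pathlib import Path


def inspect_model_folder(model_dir):
    """Summarize a model folder before copying it to the phone.

    Returns the config path, the sorted .bin files, and the directory
    where the guide says exported audio is saved (the parent of the
    model folder). Hypothetical helper, not part of the app.
    """
    model_dir = Path(model_dir)
    config = model_dir / "config.json"
    if not config.is_file():
        raise FileNotFoundError(f"missing config file: {config}")
    bin_files = sorted(model_dir.glob("*.bin"))
    if not bin_files:
        raise FileNotFoundError(f"no .bin model files in {model_dir}")
    return {
        "config": config,
        "bin_files": bin_files,
        "export_dir": model_dir.parent,
    }
```

For the example paths in this guide, calling inspect_model_folder on a local copy of Models/365_epochs would report Models as the export directory.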

Voice Conversion (VC)

Config and model loading are the same as above.
Click Record Voice to use your phone’s microphone to record the voice to be converted (make sure microphone permission is granted), or click Load Audio to load an audio file you want to convert (currently only .wav format is supported).
Select the original speaker and the target speaker.
Click Convert to transform the voice from the original speaker into the target speaker.
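
Since Load Audio accepts only .wav files, a quick desktop-side check can catch a mislabeled or non-PCM file before it reaches the phone. This sketch uses Python's standard wave module. The helper name describe_wav is hypothetical, and the guide does not state which sample rate or channel count the app expects, so the function only reports the properties it finds.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM .wav file.

    Raises wave.Error if the file is not a WAV file that the standard
    library can parse. Hypothetical helper, not part of the app.
    """
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }
```

A file that fails here with wave.Error is unlikely to be usable as a .wav input and may need re-encoding first.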

Notes:

If inference is slow, manually increase the number of threads (default is 1). Leaving the GPU option disabled is fine: Vulkan support is incomplete, so enabling it actually makes inference slower.
This project currently supports only Japanese, Chinese, and English, so make sure the input text is in one of these supported languages.
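
To spot input that falls outside the three supported languages, one rough approach is to flag characters that do not belong to the Latin, kana, or CJK ranges. This is only a heuristic sketch: the Unicode ranges below are an assumption chosen for illustration, not the app's actual text frontend, and the function name unsupported_chars is hypothetical.

```python
import re

# Rough script ranges (an assumption, not the app's real text frontend).
_ALLOWED = re.compile(
    "["
    "\u0000-\u007F"  # Basic Latin: English letters, digits, ASCII punctuation
    "\u3000-\u303F"  # CJK symbols and punctuation
    "\u3040-\u309F"  # Hiragana
    "\u30A0-\u30FF"  # Katakana
    "\u4E00-\u9FFF"  # CJK Unified Ideographs (Chinese characters, kanji)
    "\uFF00-\uFFEF"  # Full-width and half-width forms
    "]"
)


def unsupported_chars(text):
    """Return the sorted distinct characters outside the ranges above."""
    return sorted({ch for ch in text if not _ALLOWED.fullmatch(ch)})
```

An empty result means every character falls inside the listed ranges. It does not guarantee that the app will pronounce the text correctly.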

(Optional) Build Instructions

Clone the repository into a directory of your choice:

git clone https://github.com/weirdseed/Vits-Android-ncnn.git

Download the Vulkan build of the ncnn library. You can also get it manually from https://github.com/Tencent/ncnn/releases.
- Extract it into the project’s \app\src\main\cpp\ directory.
- Rename the extracted folder to ncnn.
- The directory structure should look like this:

├─openjtalk.asset_manager_api
├─audio_process
├─fftpack
├─openjtalk.jpcommon
├─openjtalk.mecab
├─openjtalk.mecab2njd
├─openjtalk.mecab_api
├─ncnn
│  ├─arm64-v8a
│  ├─armeabi-v7a
│  ├─x86
│  └─x86_64
├─openjtalk.njd
├─openjtalk.njd2jpcommon
├─openjtalk.njd_set_accent_phrase
├─openjtalk.njd_set_accent_type
├─openjtalk.njd_set_digit
├─openjtalk.njd_set_long_vowel
├─openjtalk.njd_set_pronunciation
├─openjtalk.njd_set_unvoiced_vowel
├─openjtalk.text2mecab
└─vits
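
The layout above can be checked with a short script before building. This is a minimal sketch, assuming only what the tree shows: an ncnn folder inside the cpp directory containing one subfolder per Android ABI. The function name missing_ncnn_abis is hypothetical.

```python
from pathlib import Path

# ABI subfolders shown under ncnn in the directory tree above.
EXPECTED_ABIS = ("arm64-v8a", "armeabi-v7a", "x86", "x86_64")


def missing_ncnn_abis(cpp_dir):
    """Return the expected ABI folders that are absent from <cpp_dir>/ncnn.

    An empty list means the ncnn folder matches the tree above.
    Hypothetical helper, not part of the project.
    """
    ncnn_dir = Path(cpp_dir) / "ncnn"
    return [abi for abi in EXPECTED_ABIS if not (ncnn_dir / abi).is_dir()]
```

If every ABI is reported missing, the extracted folder was probably not renamed to ncnn, or it was extracted one level too deep.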

If you compile ncnn yourself, note that RTTI is disabled by default in ncnn’s build options. Enable it manually if required. See the official build guide for instructions.
Download the openjtalk dictionary files and extract them into the \src\main\assets folder.
- The directory structure should look like this:

├─multi
├─open_jtalk_dic_utf_8-1.11
└─single

Compile and run the project.

Using Your Own Trained Models

To use your own trained model, convert it with this tool first: https://github.com/weirdseed/vits-ncnn-convert-tool

Preview Images

