Local installation of Fish Audio on Windows 10

July 4, 2026 matt Comments 0 Comment

I’ve been exploring

Fish Audio S2 Pro is one of the best (if not the best) text-to-speech solutions. Getting it installed and working locally on Windows 10, however, isn't so straightforward. There are at least two different ways to get it working, one of which is to download/run

Method 0: Use the free online version

It's not hard, but expect usage limits: https://fish.audio/app/text-to-speech

Method 1: Fish S2 Pro Zero Docker

1. Go to the Hugging Face Fish Audio S2 Pro project page.
2. Make sure you're logged into Hugging Face; you should then see the 'Run Locally' option in the menu at the top of the page.
3. Ensure Docker Desktop is installed on Windows and WSL integration is enabled in Docker's settings.
4. Open a WSL session running Ubuntu 24.04 or similar.
5. Enter the docker command:

docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all \
registry.hf.space/artificialguybr-fish-s2-pro-zero:latest python app.py

6. You’ll see the docker container download along with the models and start up:

(base) me@DESKTOP:/mnt/c/fish-audio-s2$ docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all registry.hf.space/artificialguybr-fish-s2-pro-zero:latest python app.py
Cloning into 'fish-speech'...
remote: Enumerating objects: 6605, done.
remote: Counting objects: 100% (1088/1088), done.
remote: Compressing objects: 100% (292/292), done.
remote: Total 6605 (delta 905), reused 796 (delta 796), pack-reused 5517 (from 2)
Receiving objects: 100% (6605/6605), 28.21 MiB | 10.42 MiB/s, done.
Resolving deltas: 100% (4328/4328), done.
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Fetching 13 files: 100%|████████████████████████████████████████████████████████████████| 13/13 [02:19<00:00, 10.72s/it]
You are using a model of type fish_qwen3_omni to instantiate a model of type ``. This may be expected if you are loading a checkpoint that shares a subset of the architecture (e.g., loading a `sam2_video` checkpoint into `Sam2Model`), but is otherwise not supported and can yield errors. Please verify that the checkpoint is compatible with the model you are instantiating.
Download complete: : 11.0GB [02:19, 79.0MB/s]
2026-07-03 18:59:16.787 | INFO | fish_speech.models.text2semantic.llama:from_pretrained:504 - Injected Semantic IDs into Config: 151678-155773
2026-07-03 18:59:16.787 | INFO | fish_speech.models.text2semantic.llama:from_pretrained:520 - Loading model from /home/user/.cache/huggingface/hub/models--fishaudio--s2-pro/snapshots/1de9996b6be38b745688de084d87a5633f714e4e, config: DualARModelArgs(model_type='dual_ar', vocab_size=155776, n_layer=36, n_head=32, dim=2560, intermediate_size=9728, n_local_heads=8, head_dim=128, rope_base=1000000, norm_eps=1e-06, max_seq_len=32768, dropout=0.0, tie_word_embeddings=True, attention_qkv_bias=False, attention_o_bias=False, attention_qk_norm=True, codebook_size=4096, num_codebooks=10, semantic_begin_id=151678, semantic_end_id=155773, use_gradient_checkpointing=True, initializer_range=0.01976423537605237, is_reward_model=False, scale_codebook_embeddings=True, audio_embed_dim=2560, n_fast_layer=4, fast_dim=2560, fast_n_head=32, fast_n_local_heads=8, fast_head_dim=128, fast_intermediate_size=9728, fast_attention_qkv_bias=False, fast_attention_qk_norm=False, fast_attention_o_bias=False, norm_fastlayer_input=True)
2026-07-03 18:59:46.228 | INFO | fish_speech.models.text2semantic.llama:from_pretrained:552 - Loading sharded safetensors weights
2026-07-03 18:59:46.717 | INFO | fish_speech.models.text2semantic.llama:from_pretrained:588 - Model weights loaded - Status: <All keys matched successfully>
2026-07-03 18:59:48.707 | INFO | fish_speech.models.text2semantic.inference:init_model:366 - Restored model from checkpoint
2026-07-03 18:59:48.708 | INFO | fish_speech.models.text2semantic.inference:init_model:371 - Using DualARTransformer
/usr/local/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:144: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
* Running on local URL: http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set ssr_mode=False in launch())
* To create a public link, set share=True in launch().

7. Open a browser at http://localhost:7860
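The log above warns that the Hub download is unauthenticated. Here is a minimal sketch of a variant of the docker command above, assuming you have a Hugging Face access token exported as HF_TOKEN in your WSL shell and curl installed. It runs the same image detached, passes the token into the container (huggingface_hub reads the HF_TOKEN environment variable), and polls port 7860 until the UI answers. The container name fish-s2-zero and both function names are hypothetical choices of mine, not part of the Space.

```shell
#!/usr/bin/env bash
# Sketch only: the container name and function names are hypothetical choices.

# Poll URL until it returns any HTTP response; give up after roughly TIMEOUT seconds.
wait_for_ui() {
  local url="$1" timeout="${2:-900}" waited=0
  # Without --fail, curl exits 0 for any HTTP status and non-zero when nothing is listening.
  until curl --silent --output /dev/null --max-time 5 "$url"; do
    if (( waited >= timeout )); then
      echo "gave up waiting for $url after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  echo "UI is up at $url"
}

# Same image and flags as the command above, but detached (-d instead of -it)
# and with HF_TOKEN passed through so the model download is authenticated.
start_fish_s2_zero() {
  docker run -d --name fish-s2-zero -p 7860:7860 --platform=linux/amd64 --gpus all \
    -e HF_TOKEN="${HF_TOKEN:?export HF_TOKEN first}" \
    registry.hf.space/artificialguybr-fish-s2-pro-zero:latest python app.py
}
```

Usage: `start_fish_s2_zero && wait_for_ui http://localhost:7860/`. Follow the download with `docker logs -f fish-s2-zero`, and remove the container with `docker rm -f fish-s2-zero` when you're finished. Running detached is the design choice here: with `-it` the container holds the terminal, so nothing else could poll for readiness.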

Method 2: Build and run locally

Clone the GitHub project: https://github.com/fishaudio/fish-speech
Open a WSL Ubuntu 24.04 session.
Ensure NVIDIA GPU support for WSL is installed. A reboot is often required afterwards.
Follow the installation/build instructions:
- Run the conda setup steps
- Run the uv steps for CPU or GPU, depending on your hardware
- Skip the Docker part
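The prerequisites above can be checked from the WSL shell before you start. A minimal sketch, assuming bash: check_prereqs reports which tools are on the PATH, and setup_fish_speech covers only the clone and the conda environment. The environment name fish-speech and Python 3.12 are my assumptions, so use whatever the install docs specify, and take the uv command for CPU or GPU from the docs rather than from here. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check WSL prerequisites before building fish-speech.

# Succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print one OK/MISSING line per tool; return non-zero if any are missing.
check_prereqs() {
  local missing=0 tool
  for tool in "$@"; do
    if have_cmd "$tool"; then
      echo "OK       $tool"
    else
      echo "MISSING  $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Clone the repo and create a conda env.
# ASSUMPTION: env name and Python version; check the install docs.
setup_fish_speech() {
  git clone https://github.com/fishaudio/fish-speech.git &&
    cd fish-speech &&
    conda create -y -n fish-speech python=3.12
}
```

For example, run `check_prereqs git conda uv nvidia-smi`, then `nvidia-smi` on its own to confirm the GPU is visible inside WSL, before `setup_fish_speech` and `conda activate fish-speech`.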
WebUI
- On the left-hand menu, select 'Inference'
- Download the model weights with the hf command
  - Optionally, test the install with the command-line inference steps
- Scroll down to the WebUI inference section
- Install Gradio if you want the older-style UI (not really recommended, but easier to set up than the Awesome WebUI)
- Install the 'Awesome WebUI'
- Start the 'Awesome WebUI' using the python command
  - Open a browser at http://localhost:8888/ui
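The weights download can be sketched as follows. The repo id fishaudio/s2-pro comes from the Hugging Face cache path in the Docker log above; the target directory checkpoints/s2-pro and both helper names are assumptions, so use the path the Inference page gives. has_weights only checks that .safetensors files arrived (the log shows the model loading sharded safetensors weights); it does not verify that the download is complete.

```shell
#!/usr/bin/env bash
# Sketch: download the S2 Pro weights and sanity-check the result.
# ASSUMPTION: checkpoints/s2-pro as the target dir; use the path from the docs.

download_s2_pro() {
  local dest="${1:-checkpoints/s2-pro}"
  hf download fishaudio/s2-pro --local-dir "$dest"
}

# Count *.safetensors files directly inside DIR; fail if there are none.
has_weights() {
  local dir="$1" count
  count=$(find "$dir" -maxdepth 1 -name '*.safetensors' 2>/dev/null | wc -l | tr -d ' ')
  echo "$count safetensors file(s) in $dir"
  (( count > 0 ))
}
```

Run `download_s2_pro && has_weights checkpoints/s2-pro`. If the model page asks you to accept a license, do that first and log in with `hf auth login`.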
Server
- Select 'Server' from the left-hand menu
- Run the python command to start the server locally
- Test it with one of the api_client.py commands
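A shell-side test of the server can be sketched with curl. This is hedged heavily: the address 127.0.0.1:8080, the /v1/tts path and the JSON fields text and format are assumptions I have not checked against api_client.py, which remains the reference for the actual request format. tts_payload builds the request body with Python's json module so quotes in the text are escaped correctly; fish_tts and tts_payload are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: call the local fish-speech server from the shell.
# ASSUMPTIONS (check against api_client.py): the server listens on
# 127.0.0.1:8080 and accepts a JSON body with "text" and "format" at /v1/tts.
FISH_API="${FISH_API:-http://127.0.0.1:8080}"

# Build the JSON request body; json.dumps handles quoting and escaping.
tts_payload() {
  python3 -c 'import json, sys; print(json.dumps({"text": sys.argv[1], "format": "wav"}))' "$1"
}

# POST TEXT to the (assumed) endpoint and save the returned audio to OUT.
fish_tts() {
  local text="$1" out="${2:-out.wav}"
  curl --fail --silent --show-error \
    -H 'Content-Type: application/json' \
    --data "$(tts_payload "$text")" \
    --output "$out" \
    "$FISH_API/v1/tts"
}
```

Example: `fish_tts 'Hello from my local server.' hello.wav`. Set FISH_API if the server is listening on a different address.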

Things you can do with the server:

Create your own local cloned voice (.npy files) from sample WAV files and their transcribed text, using inference.py.
Check out the Text to Speech API Developer’s Guide
- The Text to Speech API
Check out the emotional cues you can add to the text
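Voice cloning starts from a sample WAV and its transcript, so a quick look at the sample's channel count, sample rate and duration before handing it to inference.py can save a failed run. A minimal sketch, assuming python3 is available in the WSL environment; wav_info is a hypothetical helper name, and I'm not claiming any particular length or sample rate that fish-speech requires.

```shell
#!/usr/bin/env bash
# Sketch: inspect a reference WAV before using it for voice cloning.
# Uses Python's standard-library wave module, so it only reads uncompressed PCM WAV files.
wav_info() {
  python3 - "$1" <<'EOF'
import sys
import wave

with wave.open(sys.argv[1], "rb") as w:
    seconds = w.getnframes() / w.getframerate()
    print(f"channels={w.getnchannels()} rate={w.getframerate()} seconds={seconds:.2f}")
EOF
}
```

Example: `wav_info my-voice-sample.wav`. Compressed formats such as MP3 need converting to WAV first.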

Matt's Homepage
