Local installation of Fish Audio on Windows 10

I’ve been exploring

Fish Audio S2 Pro is one of the best (if not the best) text-to-speech solutions. Getting it installed and working locally on Windows 10, however, isn’t so straightforward. There are at least two different ways to get this working. One of which is to download/run

Method 0: Use the free online version

It’s not hard – but expect to be limited in usage. https://fish.audio/app/text-to-speech

Method 1: Fish S2 Pro Zero Docker

  1. Go to the Huggingface Fish Audio S2 Pro project page.
  2. Ensure you’re logged into Huggingface, and you should see the ‘Run Locally’ option.
  3. Ensure Docker is installed on the Windows desktop and WSL support is enabled in the Docker options.
  4. Open a WSL session running Ubuntu 24.04 or similar.
  5. Enter the docker command:
docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all \
registry.hf.space/artificialguybr-fish-s2-pro-zero:latest python app.py

6. You’ll see the docker container download along with the models and start up:

(base) me@DESKTOP:/mnt/c/fish-audio-s2$ docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all registry.hf.space/artificialguybr-fish-s2-pro-zero:latest python app.py
Cloning into 'fish-speech'…
remote: Enumerating objects: 6605, done.
remote: Counting objects: 100% (1088/1088), done.
remote: Compressing objects: 100% (292/292), done.
remote: Total 6605 (delta 905), reused 796 (delta 796), pack-reused 5517 (from 2)
Receiving objects: 100% (6605/6605), 28.21 MiB | 10.42 MiB/s, done.
Resolving deltas: 100% (4328/4328), done.
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Fetching 13 files: 100%|████████████████████████████████████████████████████████████████| 13/13 [02:19<00:00, 10.72s/it]
Fetching 13 files: 62%|████████████████████████████████████████ | 8/13 [02:19<01:13, 14.78s/it
You are using a model of type `fish_qwen3_omni` to instantiate a model of type ``. This may be expected if you are loading a checkpoint that shares a subset of the architecture (e.g., loading a `sam2_video` checkpoint into `Sam2Model`), but is otherwise not supported and can yield errors. Please verify that the checkpoint is compatible with the model you are instantiating.
Download complete: : 11.0GB [02:19, 79.0MB/s]
2026-07-03 18:59:16.787 | INFO | fish_speech.models.text2semantic.llama:from_pretrained:504 - Injected Semantic IDs into Config: 151678-155773
2026-07-03 18:59:16.787 | INFO | fish_speech.models.text2semantic.llama:from_pretrained:520 - Loading model from /home/user/.cache/huggingface/hub/models--fishaudio--s2-pro/snapshots/1de9996b6be38b745688de084d87a5633f714e4e, config: DualARModelArgs(model_type='dual_ar', vocab_size=155776, n_layer=36, n_head=32, dim=2560, intermediate_size=9728, n_local_heads=8, head_dim=128, rope_base=1000000, norm_eps=1e-06, max_seq_len=32768, dropout=0.0, tie_word_embeddings=True, attention_qkv_bias=False, attention_o_bias=False, attention_qk_norm=True, codebook_size=4096, num_codebooks=10, semantic_begin_id=151678, semantic_end_id=155773, use_gradient_checkpointing=True, initializer_range=0.01976423537605237, is_reward_model=False, scale_codebook_embeddings=True, audio_embed_dim=2560, n_fast_layer=4, fast_dim=2560, fast_n_head=32, fast_n_local_heads=8, fast_head_dim=128, fast_intermediate_size=9728, fast_attention_qkv_bias=False, fast_attention_qk_norm=False, fast_attention_o_bias=False, norm_fastlayer_input=True)
2026-07-03 18:59:46.228 | INFO | fish_speech.models.text2semantic.llama:from_pretrained:552 - Loading sharded safetensors weights
2026-07-03 18:59:46.717 | INFO | fish_speech.models.text2semantic.llama:from_pretrained:588 - Model weights loaded - Status: <All keys matched successfully>
2026-07-03 18:59:48.707 | INFO | fish_speech.models.text2semantic.inference:init_model:366 - Restored model from checkpoint
2026-07-03 18:59:48.708 | INFO | fish_speech.models.text2semantic.inference:init_model:371 - Using DualARTransformer
/usr/local/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:144: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
* Running on local URL: http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set ssr_mode=False in launch())
* To create a public link, set share=True in launch().

7. Open a browser to localhost:7860
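
The first start can take several minutes while the container downloads about 11GB of model files, and the log above warns that unauthenticated Hugging Face requests are rate limited. The following is a minimal Python sketch, not part of the Fish Audio project, that builds the same docker command with an optional token and then polls the local URL until the web UI answers. The helper names `docker_command` and `wait_for_ui` are made up for illustration, and passing `HF_TOKEN` into the container with docker's `-e` flag is an assumption based on the warning text; it has not been confirmed against this particular image.

```python
import time
import urllib.error
import urllib.request

IMAGE = "registry.hf.space/artificialguybr-fish-s2-pro-zero:latest"


def docker_command(hf_token=None, port=7860):
    """Build the `docker run` argv from step 5, optionally passing HF_TOKEN."""
    cmd = ["docker", "run", "-it", "-p", f"{port}:7860",
           "--platform=linux/amd64", "--gpus", "all"]
    if hf_token:
        # Assumption: the Hugging Face client inside the container reads
        # HF_TOKEN from the environment, as the log warning suggests.
        cmd += ["-e", f"HF_TOKEN={hf_token}"]
    return cmd + [IMAGE, "python", "app.py"]


def wait_for_ui(url="http://localhost:7860", timeout=900.0, interval=5.0,
                opener=urllib.request.urlopen, sleep=time.sleep,
                clock=time.monotonic):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = clock() + timeout
    while clock() < deadline:
        try:
            with opener(url, timeout=10) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container is still downloading or starting up
        sleep(interval)
    return False
```

To use it, print `shlex.join(docker_command(hf_token="..."))` and paste the result into the WSL terminal (the `-it` flags expect a real terminal), then call `wait_for_ui()` from a second session to find out when step 7 will work.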

Method 2: Build and run locally

  1. Clone the GitHub project: https://github.com/fishaudio/fish-speech
  2. Open a WSL Ubuntu 24.04 installation.
  3. Ensure NVIDIA GPU support is installed in WSL. A reboot is often required afterwards.
  4. Follow the installation/build instructions.
    • Run the conda setup steps
    • Run the UV steps for CPU or GPU depending on your install
    • Skip the docker part
  5. WebUI: On the left menu, select ‘Inference’ from the list of items
    • Download the model weights with the hf command
      • You can try the command-line inference steps first if you want to test it
    • Scroll down to the WebUI inference
    • Install Gradio if you want the older-style UI (less recommended, but easier than Awesome WebUI)
    • Install the ‘Awesome WebUI’
    • Start the ‘Awesome WebUI’ using the python command
  6. Server
    • Select the ‘Server’ item from the list of left-hand items
    • Run the python command to start the server locally.
    • Try out one of the api_client.py commands to test it out
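
Before running the GPU build steps, it can help to confirm that step 3 actually worked and the GPU is visible from inside WSL. The sketch below is an assumption-laden check rather than anything from the fish-speech docs: it looks for the `nvidia-smi` tool that NVIDIA's drivers provide and runs `nvidia-smi -L` to list GPUs. The function name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible(which=shutil.which, run=subprocess.run):
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU."""
    path = which("nvidia-smi")
    if path is None:
        return False  # driver tools are not installed or not on PATH
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ..."
        result = run([path, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout
```

If `gpu_visible()` returns False inside the WSL session, the CPU variant of the UV steps is probably the safer choice until the driver setup is sorted out.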

Things you can do with the server:

Other links:

Can cheaper, faster robotics revitalize modern manufacturing and transform the military?

It feels like the American industrial and manufacturing landscape has been left behind in the digitalization revolution. But recent developments in both Ukraine and at a small robot company in Pittsburgh may be pointing to a coming revolution.

Gecko is a scrappy robot company founded by a college senior who saw workers spending hours putting up dangerous scaffolding to check and fix pipes in a power plant. What if he could build robots to scale the pipes and check and fix them? It turns out he could – and it is revolutionizing maintenance in refineries and energy infrastructure across the country. The robots can now not only crawl and inspect/repair – they can also create new digital maps of a plant’s infrastructure. Inspections are taking orders of magnitude less time. Plants can have their systems remapped instead of relying on out-of-date diagrams.

It turns out someone else has the same problem: the military. Systems like Gecko’s allow the Navy to build and repair ships faster. Gecko’s small robots reduced nuclear submarine inspection times from 300 hours to just 6 hours. But this revolution is bigger than just repair of existing systems.

The war in Ukraine is now being won not by ‘exquisite’, complex, and exorbitantly expensive weapons systems. Instead, it’s being won by swarms of low-cost drones. Million-dollar tanks are being disabled by $200 drones with explosives. Military experts around the world are watching Ukraine and rethinking everything. Even before the Ukraine war, the US Navy had already started to move from big capital ships to modular ships that are cheaper and faster to build.

Anduril wrote a paper in 2024 that goes a step further. They claim that these low-cost robotics and AI systems are making existing gigantic, expensive systems vulnerable and outdated. Further, the complex and hugely expensive first-world weapon systems cost too much and take too long to make in the quantities needed for anything beyond a short war. What is needed is fast, cheap, commercial manufacturing of systems that can be built and deployed rapidly. This is leading to a revolution in automated manufacturing.

The future is not going to belong to giant, expensive, monolithic systems – but to fast, easy-to-build, capable systems built in large numbers.

Give the article a read.

People aren’t buying 5090s for game features

After 18 months, graphics features like multi-frame generation are working really well, but only a handful of games support path tracing to take advantage of it. How few? 7 titles. Even worse, of Steam’s top 100 most anticipated titles, only 1 has path tracing that allows multi-frame generation.

And yet the price of 5090s remains sky high, and they are selling better than ever.

It’s no wonder when AI chips sell for 10-100x as much and are driving all the sales. Graphics has taken a back seat.

New UI trends?

We’ve gone through glass-morphism, squircle-morphism and lots of other morphisms. These changes, made by UI designers chasing whatever looks cool, are why the web site that was working just fine for you gets redesigned every 6 months.

Malewicz talks about some of these trends and the design languages used to describe and classify them.

Domestic modification of 4090 with 48GB of RAM

4090 graphics cards came with 24GB of memory. As the AI boom sucked up everything on the market, some modders (mostly in China) learned you could upgrade the VRAM on a 4090 from 24GB to 48GB. These mods were often done poorly and had high failure rates, but more quality modders have come online – including these guys in Michigan who seem to be reputable and stand behind their modding.

Still, it’s an odd world – the industry is quickly standardizing on fitting models into 32, 96, 128, or 256GB of RAM, so these might not be as interesting as they once were. Still, someone locally was selling one of their cards recently.

Microsoft quietly extended Windows 10 Consumer Security Updates

In other announcements, Microsoft has quietly extended Windows 10 support for another year to 2027. You can enroll in the Extended Security Updates (ESU) program any time until the program ends on October 12, 2027.

This is totally not an admission that Windows 10 is still over 25% of the install base even 5 years after Windows 11 was released.

To enroll, make sure your device meets the following requirements – most notably they want you to use your Microsoft account and not any local accounts:

  • Devices need to be running Windows 10, version 22H2 Home, Professional, Pro Education, or Workstations edition.
  • Devices need to have the latest Windows update installed. Learn how to install Windows updates.
  • The Microsoft account used to sign in to the device must be an administrator account.
    • The ESU license will be associated with the Microsoft account used to enroll. You may be prompted to sign in with a Microsoft account if you typically sign into Windows with a local account.
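
The first requirement in the list can be checked from a script. The following is a rough Python sketch, assuming that Windows 10 version 22H2 reports build number 19045 and that Python's `platform.version()` returns a string such as `10.0.19045` on Windows. The function name `is_windows10_22h2` is made up for illustration. It does not check the edition, the installed updates, or the account type, and it has to be run with a Windows Python, not from inside WSL.

```python
import platform


def is_windows10_22h2(system=None, version=None):
    """Return True when the OS reports Windows 10 build 19045 (version 22H2)."""
    system = platform.system() if system is None else system
    version = platform.version() if version is None else version
    if system != "Windows":
        return False
    parts = version.split(".")
    # Assumption: 22H2 is build 19045; Windows 11 reports builds 22000 and up.
    return len(parts) >= 3 and parts[:2] == ["10", "0"] and parts[2] == "19045"
```

Calling `is_windows10_22h2()` with no arguments inspects the current machine; the two parameters exist only so the logic can be tried out with sample values.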

Compute! (Gazette) magazine is back

Edwin “James” Nagle decided to see who actually owned the COMPUTE! magazine trademark. It turns out, nobody did. The name and assets of COMPUTE! were traded and sold and eventually put up as collateral, which, when the loan failed, reverted to a company that no longer existed.

He stepped in, got the rights from the US Patent and Trademark Office, and has now officially revived the Compute! Gazette magazine – complete with type-in programs!

Here’s his excellent talk on the subject.

Oh – and he considers the old print versions of the magazine, which he now owns the copyrights for, to be ‘free and open source’.