CUDA with WSL2+Docker on Windows for Kokoro

CUDA with WSL2+Docker on Windows for Kokoro

In order to get Kokoro text-to-speech to work with my 5090 GPU inside a Docker container on Windows, I had to get the CUDA setup inside WSL2 using Docker images on Windows. Here were the helpful links:

Setup the WSL image with CUDA and Docker support.

  1. Ensure you have WSL2 installed on your Windows desktop
  2. Install Docker for Windows
  3. Reboot
  4. Ensure you have the latest NVidia GPU driver installed on your Windows desktop
  5. Ensure you have the latest NVidia CUDA sdk installed on your Windows desktop
  6. A system reboot here is a good idea
  7. Ensure you have Microsoft Visual Studio installed for best results
    • The CUDA sdk requires you have Visual Studio 2019, 2022, or 2026 installed. I have found 2022 is a solid version and seems to work well, but haven’t tried 2019 or 2026 recently.
    • Note you need to use the ‘Visual Studio x64 Native tools for VS2022’ command prompt if you expect to run command line compiler operations.
  8. Now you need to install CUDA support inside WSL NVidia guide
    • First follow the instructions to ensure Docker support
    • Second follow the installation of CUDA Toolkit and CUDA Developer Tools inside WSL2 on an Ubuntu 24.04 (or later) WSL distro.
  9. Optional: install nvcc so y
    • sudo apt install nvidia-cuda-toolkit
    • nvcc –version
      • This prints out the version and ensures the nvcc sdk compiler is working inside your WSL image
  10. Ensure Docker Desktop for Windows is running on your desktop
  11. Turn on support for Docker in WSL
    • Settings -> From the General tab, enable Use WSL 2 based engine.
    • Reboot
  12. Ensure your WSL image inside WSL has Docker support enabled
    • Settings -> Resources -> WSL integration tab -> Find and enable the slider for the WSL image under ‘Enable integration with additional distros’
  13. I had to reboot my system at this point to get both the WSL and Windows Docker Desktop to see each other properly
  14. After reboot, ensure Windows Docker Desktop is running
  15. Set the user permissions in the WSL session to enable access (permission denied while trying to connect to the docker API at unix:///var/run/docker.sock)
    • sudo usermod -a -G docker <username>

Now run the Kokoro docker command inside your WSL2 image:

  • CPU only:
    • docker run –gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest
  • NVidia gpu:
    • docker run –gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest-cu128

Connect with the default web link:

Subsequent runs:

  1. Ensure the Docker Desktop is running
  2. Start the WSL -d <image> you set up in the first part that’s has all the CUDA stuff installed
  3. run the Kokoro docker command inside WSL
  4. Connect your web browser to the localhost address

Raw WSL history (commands in WSL)

You can’t follow this 100% directly. Note you have to stop and restart/reboot WSL and desktop as you do the other steps above.

# NVidia CUDA setup inside WSL2 Ubuntu 24.04 image
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/13.3.0/local_installers/cuda-repo-wsl-ubuntu-13-3-local_13.3.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-13-3-local_13.3.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-13-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-3
exit
sudo apt install nvidia-cuda-toolkit
nvcc --version
exit
sudo usermod -a -G docker matt
exit
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest-cu128
Wielder’s Edge Series’ Articles

Wielder’s Edge Series’ Articles

This is a really interesting set of DEV articles written by a software engineer. Initially a set of 8 articles, it’s expanded to 21 and counting. It does an excellent job of describing and analyzing exactly the conditions engineers are going through – from watching an AI agent do a week’s worth of work in 1 hour while they were in a meeting to seeing 27.5% of programmer jobs disappear in 2 years.

Definitely a good read.

Napoleon’s Thoughts on Jesus

Napoleon’s Thoughts on Jesus

Near the end of his life, the exiled Emperor Napoleon had a conversation with one of his generals about the deity of Christ.

General Bertrand: “I can not conceive, sire, how a great man like you can believe that the Supreme Being ever exhibited himself to men under a human form, with a body, a face, mouth, and eyes.

Napoleon Bonaparte: “Let Jesus be whatever you please – the highest intelligence, the purest heart, the most profound legislator, and, in all respects, the most singular being who has ever existed – I grant it.

General Bertrand: “Still, he was simply a man, who taught his disciples, and deluded credulous people, as did Orpheus, Confucius, Brama.”

To this Napoleon responded by saying:

“I know men, and I tell you Jesus Christ was not a man.

Superficial minds see a resemblance between Christ and the founders of empires and the gods of other religions. That resemblance does not exist.

There is between Christianity and other religions the distance of infinity.

Alexander, Cæsar, Charlemagne and myself founded empires. But on what did we rest the creations of our genius? Upon sheer force. Jesus Christ alone founded His empire upon love; and at this hour millions of men will die for Him. In every other existence but that of Christ how many imperfections!

From the first day to the last He is the same; majestic and simple; infinitely firm and infinitely gentle. He proposes to our faith a series of mysteries and commands with authority that we should believe them, giving no other reason than those tremendous words, ‘I am God.’”

Source: C., Abbott John S. The History of Napoleon Bonaparte, University Press of the Pacific, Honolulu, HI, 1883.

Gaussian splats in your browser

Gaussian splats in your browser

I have written about Gaussian Splatting graphics before. It has the ability to produce unbelievably photorealistic environments (with some limitations like animations being a very rough spot for it) – but are now entering the realm of realtime.

Developer Iakov Sumygin has created a minimalistic playable video game using Gaussian splats for the environment data, and you can load it up and try it right now in your browser.

It definitely shows of the quality – and the limitations – of the technology. It’s an interesting experiment showing how far we’ve come.

Running Gemma 4 with a 5090 on llama.cpp

Running Gemma 4 with a 5090 on llama.cpp

First, grab a trustworthy Gemma 4 gguf model (unsloth is great). I have been fooling around with Q4, Q6, Q8 models of gemma-4-26B and gemma-4-31B models.

opertyE2BE4B31B Dense
Total Parameters2.3B effective (5.1B with embeddings)4.5B effective (8B with embeddings)30.7B
Layers354260
Sliding Window512 tokens512 tokens1024 tokens
Context Length128K tokens128K tokens256K tokens
Vocabulary Size262K262K262K
Supported ModalitiesText, Image, AudioText, Image, AudioText, Image
Vision Encoder Parameters~150M~150M~550M
Audio Encoder Parameters~300M~300MNo Audio

Grab the latest version of llama.cpp and compile it with CUDA support for GPU usage (or CPU if you don’t have a CUDA enabled GPU).

Then set up your server command line and it seems some get about 600 Tok/s using 26B on a 5090, or maybe try out the turbo-quant variant of llama-cpp (I have not).

Then take off and make some nifty projects with it.