The world has gotten very familiar with retro hardware re-creations, game emulation, re-releases, speedruns, new games for old platforms, as well as new exploits, tools, and discoveries. The nitty-gritty work behind all of this, however, is a labor of love. For those who dig into the binary, there are tricky copyright concerns to manage, only scraps of information about old hardware and software, highly optimized and hard-to-read code, and almost no financial gain – except for commercial re-releases.
Made Up of Wires walks us through a live bit of decompiling the PS1 classic Castlevania: Symphony of the Night to give you a taste of the work involved. It's not really that different from any other reverse engineering, but it's surprisingly accessible since these old games were relatively small and simple.
Ben Eater decided to build his own VGA video card. Well, technically it's more of a display adapter/controller, since the card doesn't do any rendering or accelerate generating the image buffer – but it's still a pretty fun watch.
This is pretty much how computer graphics started. Someone built a display controller. Then others added helper hardware to speed up buffer fills, then blitting, then rendering, AI upscaling/noise reduction, and now full-on AI rendering. What a wild technology ride – but it was this early stuff that really got me excited about technology. You could create and build all of this amazing stuff yourself.
Back in the day, I worked on a little project called Larrabee – which later turned into the Intel Xeon Phi coprocessor. It was an ambitious and exciting platform. It had a set of 512-bit-wide vector instructions so it could stream data like a GPU architecture, yet it was fully general-purpose x86.
It turned out that getting performance out of this hardware was difficult. To reach the hardware's full potential, you simply had to utilize the vector units. Without that, it's like writing a single-threaded app on an 8-core system. Single-SIMD-lane operation just wasn't going to cut it, as a 2017 International Journal of Parallel Programming article put it:
“Our results show that, although the Xeon Phi delivers a relatively good speedup in comparison with a shared-memory architecture in terms of scalability, the relatively low computing power of its computational units when specific vectorization and SIMD instructions are not fully exploited makes this first generation of Xeon Phi architectures not competitive”
The paper, along with the host of others linked on the page as references, is a good read and gives some hints as to why fixed-function GPUs have an advantage when it comes to raw streaming throughput. Hint: cache and data-flow behavior is as important as, if not more important than, utilizing vectorization on such architectures.
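As a rough analogy (in Python/NumPy rather than anything Xeon Phi specific), here's the kind of gap you see between processing one element at a time and letting a vectorized operation keep the wide units busy; the ratio is purely illustrative, not a Xeon Phi measurement:

# Rough analogy only: per-element scalar work vs. one vectorized operation.
# On SIMD hardware the same principle applies lane by lane.
import time
import numpy as np

a = np.random.rand(10_000_000).astype(np.float32)
b = np.random.rand(10_000_000).astype(np.float32)

t0 = time.perf_counter()
out = np.empty_like(a)
for i in range(len(a)):              # one element at a time (think: a single SIMD lane)
    out[i] = a[i] * b[i] + 1.0
scalar_time = time.perf_counter() - t0

t0 = time.perf_counter()
out_vec = a * b + 1.0                # whole-array operation (think: all lanes busy)
vector_time = time.perf_counter() - t0

print(f"scalar: {scalar_time:.2f}s   vectorized: {vector_time:.4f}s")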
Cloud storage is becoming less and less free. You can't go long before your iPhone or Google account notifies you that you're almost full or already full – and gives you a link to a handy-dandy subscription. But there is one place where you can upload all you want and the storage is still free – YouTube.
Adam Conway wrote up a fun little program that does exactly that. It creates video frames full of data and uploads them to YouTube. He tried QR codes, but YouTube's compression artifacts made that untenable. Instead, he went brute force: each 1 or 0 is a 5×5 block of pixels set to the same color. At 1920×1080, that gives about 10KB of storage per frame.
He fired it up and gave it a whirl. It worked! He even posted the code on github. It's definitely too slow and too storage-hungry to use for any meaningful data, as you need to take the input file, encode each bit into a 5×5 pixel block in an image, then encode the images together into a video file.
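His actual code is on GitHub; as a minimal sketch of the scheme he describes (not his implementation), here's how you could pack bits into 5×5 black/white blocks on a 1080p frame with NumPy:

# Sketch only: each bit becomes a 5x5 block of black or white pixels.
import numpy as np

WIDTH, HEIGHT, BLOCK = 1920, 1080, 5
BITS_PER_FRAME = (WIDTH // BLOCK) * (HEIGHT // BLOCK)   # 384 * 216 = 82,944 bits, ~10KB

def encode_frame(data: bytes) -> np.ndarray:
    """Return one 1080p grayscale frame holding up to ~10KB of data."""
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))[:BITS_PER_FRAME]
    bits = np.pad(bits, (0, BITS_PER_FRAME - len(bits)))           # zero-pad a short last frame
    grid = bits.reshape(HEIGHT // BLOCK, WIDTH // BLOCK) * 255     # 1 -> white, 0 -> black
    return np.kron(grid, np.ones((BLOCK, BLOCK), dtype=np.uint8))  # expand each bit to 5x5 pixels

frame = encode_frame(b"hello, youtube-as-storage")
print(frame.shape)   # (1080, 1920)

Writing the frames out to a video file (and decoding them back by averaging each 5×5 block) is where the real work – and the compression-survival tricks – come in.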
Do you want to do AI work but have a laptop, NUC, or other tiny form factor computer that can't accept a gigantic GPU? Does your system have an OCuLink port? Then maybe one of these external GPU docks is for you.
The Minisforum DEG1 eGPU Dock allows you to plug an external GPU into your small form factor PC. The only trick is that you'll need an OCuLink port. A number of small form factor PCs now come with OCuLink (like this AtomMan X7 Ti).
OCuLink is short for "Optical-Copper Link" and lets you connect PCIe devices using an external cable rather than an internal slot. OCuLink has been around in the server world for about a decade, but starting in 2024 it has become increasingly common on tiny form factor PCs like Intel's NUCs. OCuLink is gaining popularity because it's cheaper than complex solutions like Thunderbolt and offers nearly direct PCIe speeds. OCuLink is virtually an extension of your device's PCIe slot, with a bandwidth of up to 16 GB/s – much faster than Thunderbolt 4, which caps out at 5 GB/s.
You can also buy desktop PC versions of OCuLink (like this one) to try things out. They're kind of unique because they come in two parts: a shim M.2 card that plugs into your motherboard, which then connects via an OCuLink cable to a small connector board that your graphics card plugs into:
Here's a review of the setup and performance. It's extremely impressive. You can play Cyberpunk 2077 at 4K with ray tracing on a connected 4090 at Ultra settings at a steady 70fps. Even in Overdrive mode it maintains a steady 50+ fps. Horizon Forbidden West at 4K Very High settings plays at a stable 80-100fps – even without frame generation.
It's still too much of a Frankenstein approach right now to be consumer friendly, but I think OCuLink has really raised the bar and is going to make Thunderbolt and USB have to really up their game.
Ever want to know what it's like to work in a game studio? Double Fine has released a 33-episode series called PsychOdyssey which shows them developing Psychonauts 2 over 7 years.
The Long Dark was a great game I started playing during early access and really enjoyed. The lonely, desolate wilderness feel worked really well with the struggle against very simple but brutal natural elements.
The game has been in development longer than some teenagers have been alive – and has consequently changed a lot over that time. Kudos to the Long Dark team for making a time capsule that lets you go back to those early drops by entering a release code in Steam.
While one should ALWAYS be cautious of trainers and save game editors – some on the list do have viruses, so it's a good idea to scan them with a virus scanner and only run them in a virtual machine – here are some of the older trainers for these early drops on GameCopyWorld.
Stable Diffusion really opened the world's eyes to what is possible with generative AI. Stable Diffusion 2 and 3… well… did not go so well. For a while now, Stable Diffusion 1.5 has been your best bet for locally generated AI art, but it is really showing its age.
Now there is a new player in open source generative AI you can run locally. The developers from Stability.ai have founded Black Forest Labs and released their open source tool: Flux.1
While there are plenty of online generative AIs like Midjourney, Adobe Firefly, and others, they usually require payment or only give limited usage. What's great about Flux.1 is that it allows completely local installation and usage.
Like many open source packages, there are free and paid versions. The paid Pro version gives the most impressive results via their API (no purely local generation); a dev version can be used locally by developers but not for commercial use; and a free schnell version is available for personal use. Both the dev and schnell versions are available for local install and use.
So, let's get started with the schnell version – the instructions are the same for dev except for using two different model/weight files.
Instructions for installing Flux.1 on an NVIDIA-based Windows 10/11 system:
You might want to enable Windows Long Path support, as Python sometimes requires it for dependent packages (see the registry note after this list). Be sure to reboot your system after enabling it.
Supported graphics card.
32GB of system RAM (though again, you can use the smaller model if you have less RAM)
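To enable Long Path support, set the LongPathsEnabled DWORD value to 1 under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem in the registry, or enable "Enable Win32 long paths" under Computer Configuration > Administrative Templates > System > Filesystem in the Group Policy editor.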
Open a command prompt and make a local working root directory somewhere; I'll use c:\depot\
You have a few options. First, you need to pick whether you're using the non-commercial Dev version or the Schnell version. After that, each offers either a single easy-to-use checkpoint package file or the individual model data files. I'll be using the Schnell ones, but you just need to get the Dev files from the Dev branch if you want those instead.
C:\depot\ComfyUI>python main.py
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last): File "C:\depot\ComfyUI\main.py", line 83, in <module>
import comfy.utils
File "C:\depot\ComfyUI\comfy\utils.py", line 20, in <module>
import torch
File "C:\Users\matt\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\torch\__init__.py", line 2120, in <module>
from torch._higher_order_ops import cond
File "C:\Users\matt\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\torch\_higher_order_ops\__init__.py", line 1, in <module>
from .cond import cond
File "C:\Users\matt\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\torch\_higher_order_ops\cond.py", line 5, in <module>
import torch._subclasses.functional_tensor
File "C:\Users\matt\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\torch\_subclasses\functional_tensor.py", line 42, in <module>
class FunctionalTensor(torch.Tensor):
File "C:\Users\matt\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\torch\_subclasses\functional_tensor.py", line 258, in FunctionalTensor
cpu = _conversion_method_template(device=torch.device("cpu"))
C:\Users\matt\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\torch\_subclasses\functional_tensor.py:258: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:84.)
cpu = _conversion_method_template(device=torch.device("cpu"))
Total VRAM 24576 MB, total RAM 32492 MB
pytorch version: 2.4.0+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 : cudaMallocAsync
Using pytorch cross attention
C:\depot\ComfyUI\comfy\extra_samplers\uni_pc.py:19: SyntaxWarning: invalid escape sequence '\h'
"""Create a wrapper class for the forward SDE (VP type).
****** User settings have been changed to be stored on the server instead of browser storage. ******
****** For multi-user setups add the --multi-user CLI argument to enable multiple user profiles. ******
[Prompt Server] web root: C:\depot\ComfyUI\web
C:\Users\matt\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\kornia\feature\lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
Import times for custom nodes:
0.0 seconds: C:\depot\ComfyUI\custom_nodes\websocket_image_save.py
Starting server
To see the GUI go to: http://127.0.0.1:8188
Open your web browser and go to http://127.0.0.1:8188
Click on the ‘Queue Prompt’ button to execute the current prompt
Technically it queues up the work, and you should see progress in the command window where you launched python main.py:
got prompt
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLOW
Using pytorch attention in VAE
Using pytorch attention in VAE
Model doesn't have a device attribute.
C:\Users\matt\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
Model doesn't have a device attribute.
loaded straight to GPU
Requested to load Flux
Loading 1 new model
Requested to load FluxClipModel_
Loading 1 new model
C:\depot\ComfyUI\comfy\ldm\modules\attention.py:407: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.18s/it]
Requested to load AutoencodingEngine
Loading 1 new model
Prompt executed in 23.65 seconds
When it completes you should see your image. You can then save your image or tweak the parameters.
Debugging help:
numpy is not available
On my first runs, I got this in the console when I queued up a request:
got prompt
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLOW
Using pytorch attention in VAE
Using pytorch attention in VAE
Model doesn't have a device attribute.
C:\Users\matt\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
Model doesn't have a device attribute.
loaded straight to GPU
Requested to load Flux
Loading 1 new model
Requested to load FluxClipModel_
Loading 1 new model
C:\depot\ComfyUI\comfy\ldm\modules\attention.py:407: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00, 1.19s/it]
Requested to load AutoencodingEngine
Loading 1 new model
!!! Exception during processing!!! Numpy is not available
Traceback (most recent call last):
File "C:\depot\ComfyUI\execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\depot\ComfyUI\execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\depot\ComfyUI\execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\depot\ComfyUI\nodes.py", line 1445, in save_images
i = 255. * image.cpu().numpy()
^^^^^^^^^^^^^^^^^^^
RuntimeError: Numpy is not available
Prompt executed in 26.44 seconds
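As the NumPy warning at startup suggests, the fix is to downgrade to a 1.x release: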
C:\depot\ComfyUI>pip install numpy==1.26.4
Defaulting to user installation because normal site-packages is not writeable
Collecting numpy==1.26.4
Downloading numpy-1.26.4-cp312-cp312-win_amd64.whl.metadata (61 kB)
Downloading numpy-1.26.4-cp312-cp312-win_amd64.whl (15.5 MB)
---------------------------------------- 15.5/15.5 MB 57.4 MB/s eta 0:00:00
Installing collected packages: numpy
Attempting uninstall: numpy
Found existing installation: numpy 2.0.1
Uninstalling numpy-2.0.1:
Successfully uninstalled numpy-2.0.1
Successfully installed numpy-1.26.4
C:\depot\ComfyUI>
Uninstalling all pip/Python packages, clearing your pip cache, then re-installing the requirements
The first time I installed, I got an error when downloading the numpy library during the step in which you pip install the requirements. To clear the pip cache, uninstall all pip packages, and re-install all the requirements again, I did the following:
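A minimal sketch of those steps, assuming you're in the ComfyUI directory (your package list will differ):

C:\depot\ComfyUI>pip freeze > installed.txt
C:\depot\ComfyUI>pip uninstall -y -r installed.txt
C:\depot\ComfyUI>pip cache purge
C:\depot\ComfyUI>pip install -r requirements.txt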
I was recently doing my own retro 486 DX 66 PC build and needed to add an ISA sound card that supported both DOS and Windows games. A genuine Sound Blaster card would definitely work, but buying a genuine Sound Blaster Pro will run you well over $150 (over $200 with its box).
In googling around, I found a great thread on Vogons where someone asked the same question: is there a cheaper alternative to a genuine Sound Blaster/Sound Blaster Pro? It turns out there is – the really excellent ESS AudioDrive ES1868.
I had not heard of the ESS AudioDrive ES1868 ISA sound card before, but it is considered one of the best Sound Blaster clone cards. It has tons of features, such as Sound Blaster Pro 2 compatibility (something even the Sound Blaster 16 doesn't have!). It is extremely easy to set up for DOS and Windows, has mixer inputs for line-in, microphone, CD audio, and wavetable, and is a really quiet card (as opposed to the Sound Blaster 16, which suffered from chronic hum and pop issues to the point it was often called the 'NoiseBlaster'). The drivers are easy to set up and even support non-PnP configuration, which makes the card work with 99% of DOS games. Even better, the cards are readily available for around $25-$30.
I bought a card for $25 off eBay and installed it without issue. The ESS drivers are available on Phil's Computer Lab (link below). I downloaded the drivers, ran the installer, and set the parameters during install to match a default Sound Blaster card: A220 I7 D1 H5 P330 T6 (Address: 220h, IRQ: 7, DMA: 1, Port: 330h, Type: 6).
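Those parameters follow the classic BLASTER environment variable format, so in AUTOEXEC.BAT this typically shows up as the familiar SET BLASTER=A220 I7 D1 H5 P330 T6 line.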
I then fired up my copy of Wolfenstein 3D, chose the Sound Blaster output option with default parameters, and got all the awesome audio of yesteryear.
Learning everything there is to know about the different Sound Blaster and clone sound cards:
DOS Days has really excellent write-ups on all the various Sound Blaster cards, with the pros and cons of each. I'm really glad I read up on the different models before buying a generic Sound Blaster 16. There's a tremendous wealth of information about the issues unique to each card. Definitely a site worth reading before buying a card off eBay.