OpenAI announced on Tuesday the release of its newest image-making machine, Point-E, which can produce 3D point clouds directly from text prompts. Existing systems like DreamFusion typically require multiple hours (and multiple GPUs) to generate their images, while Point-E needs only a single GPU and a minute or two. Paper here, and code here.
The DALL-E 2 beta now provides a public API that lets users embed text-to-image generation and image editing in their own apps.
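As a sketch of what an integration looks like: the call is a simple JSON POST to OpenAI's image-generation endpoint. The field names below follow OpenAI's published image API at the time of writing; treat the exact endpoint and parameters as assumptions to verify against the current docs.

```python
import json
import urllib.request

# Endpoint per OpenAI's public image API (verify against current docs).
API_URL = "https://api.openai.com/v1/images/generations"

def build_request(prompt, size="1024x1024", n=1):
    """Build the JSON payload for a DALL-E 2 generation call."""
    return {"prompt": prompt, "n": n, "size": size}

def generate(prompt, api_key, **kwargs):
    """Send the request. Requires a real API key, so it is not executed here."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt, **kwargs)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # response carries a list of image URLs
```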
Microsoft is already leveraging it in Bing and Microsoft Edge with its Image Creator tool, which lets users create images if web results don’t return what they’re looking for. Fashion design app CALA is using the DALL-E 2 API for a tool that allows customers to refine design ideas from text descriptions or images, while photo startup Mixtiles is bringing it to an artwork-creating workflow.
Pricing for the DALL-E 2 API varies by resolution. For 1024×1024 images, the cost is $0.02 per image; 512×512 images are $0.018 per image; and 256×256 images are $0.016 per image. Volume discounts are available to companies working with OpenAI’s enterprise team.
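Using the per-image prices above, estimating a bill is simple arithmetic. A minimal sketch (volume discounts aside):

```python
# Per-image prices from OpenAI's published DALL-E 2 API pricing.
PRICE_PER_IMAGE = {
    "1024x1024": 0.020,
    "512x512": 0.018,
    "256x256": 0.016,
}

def estimate_cost(counts):
    """counts: mapping of resolution -> number of images generated."""
    return sum(PRICE_PER_IMAGE[res] * n for res, n in counts.items())

# e.g. 1,000 full-resolution images plus 5,000 small thumbnails:
total = estimate_cost({"1024x1024": 1000, "256x256": 5000})  # $100.00
```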
There have been numerous attempts to make Stable Diffusion plugins for Adobe Photoshop. Many are neither polished nor free at this point, but some are worth a look.
KerasCV offers another implementation of Stable Diffusion. It has some interesting features that make it one of the fastest implementations around:
Graph mode execution
XLA (Accelerated Linear Algebra) compilation, enabled via jit_compile=True
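A minimal usage sketch follows; it assumes the keras_cv and tensorflow packages are installed (the import is deferred inside the function so the snippet loads without them), and the model/method names match KerasCV's documented Stable Diffusion API.

```python
def generate_images(prompt, batch_size=3, jit_compile=True):
    """Generate images with KerasCV's Stable Diffusion.
    jit_compile=True turns on XLA compilation; the model also executes
    in TensorFlow graph mode rather than eagerly."""
    from keras_cv.models import StableDiffusion  # heavy import, deferred

    model = StableDiffusion(img_width=512, img_height=512,
                            jit_compile=jit_compile)
    return model.text_to_image(prompt, batch_size=batch_size)

# images = generate_images("photograph of an astronaut riding a horse")
```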
You can get Stable Diffusion working on the new Apple M1 (and M2) Macs. Above is me getting Stable Diffusion running on my Mac Mini M1 with 16 GB of RAM (using these instructions).
It’s definitely not as fast as a dedicated GPU, but it has the advantage of not running out of memory the way graphics cards with only 8 GB or 16 GB do. Most GPUs need at least 24 GB of memory to generate 512×512 Stable Diffusion images (without dropping to lower-resolution models). I was able to generate both 512×512 and 768×768 images on my 16 GB Mac Mini M1.
First we had simple cut-and-paste. Then intelligent-selection cut-and-paste. Then content-aware cut-and-paste. Now we have select-and-replace with completely auto-generated art via Pair-Diffusion. Amazing times we live in.
Riffusion (Riff-fusion) is a music AI: you type in prompts and it generates music for you. It’s not going to win any awards anytime soon, but it does seem to handle smooth and electronic tunes pretty well. Honestly, if I heard some of this in an elevator, I doubt I would notice.
One more step towards our automatically generated content future.
The top row shows the images the person saw; the bottom row shows what the AI re-created from their brain scans.
Researchers at Osaka University in Japan are among the ranks of scientists using AI to make sense of human brain scans. While others have tried using AI with MRI scans to visualize what people are seeing, the Osaka approach is unique because it uses Stable Diffusion to generate the images. This greatly simplified their model: it required only a few thousand training parameters instead of millions.
Normally, Stable Diffusion takes text descriptions/prompts, which are run through a language model. That language model was trained against a huge library of images to build a text-to-image latent space that can be queried to generate new amalgamated images (yes, a gross simplification).
The Osaka researchers took this a step further. They used functional MRI (fMRI) scans from an earlier, unrelated study in which four participants looked at 10,000 different images of people, landscapes and objects while being monitored in an fMRI machine. The Osaka team then trained a second AI model to link the brain activity in the fMRI data with text descriptions of the pictures the participants looked at.
Together, these two models allowed Stable Diffusion to turn fMRI data into relatively accurate images that were not part of the AI training set. Based on the brain scans, the first model could recreate the perspective and layout the participant had seen, but its generated images were cloudy, nonspecific figures. Then the second model kicked in: it could recognize what object people were looking at by using the text descriptions from the training images. So, if it received a brain scan resembling one from its training marked as a person viewing an airplane, it would put an airplane into the generated image, following the perspective from the first model. The technology achieved roughly 80 percent accuracy.
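The two-model flow described above can be sketched with purely illustrative stubs. Every function name, threshold, and data shape here is a hypothetical stand-in for the authors' actual models, not their code:

```python
def decode_layout(fmri_scan):
    """Model 1 (stub): map visual-cortex activity to a coarse latent image
    capturing perspective and layout (the cloudy, nonspecific figures)."""
    return {"latent": [v * 0.5 for v in fmri_scan], "kind": "layout"}

def decode_semantics(fmri_scan):
    """Model 2 (stub): map brain activity to a text description, learned
    from captions of the training images. The threshold is arbitrary."""
    return "an airplane" if sum(fmri_scan) > 1.0 else "a landscape"

def reconstruct(fmri_scan):
    """Condition the image generator on both signals: layout latent plus
    the decoded caption. Returns a stand-in for the generated image."""
    return {"layout": decode_layout(fmri_scan),
            "caption": decode_semantics(fmri_scan)}
```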
The team shared more details in a new paper, which has not been peer-reviewed, published on the preprint server bioRxiv.
Refik Anadol makes projection mapping and LED screen art. His unique approach, however, is embracing massive data sets churned through various AI algorithms as his visualization source.
I think one of his unique additions to the space is visualizing the latent space generated during machine learning stages.
I wrote about Stable Dreamfusion previously. Dreamfusion first takes normal Stable Diffusion text prompts to generate 2D images of the desired object. Stable Dreamfusion then uses those 2D images to generate 3D meshes.
A hamburger
The authors seem to have used NVIDIA A100 cards on an Ubuntu system. I wanted to see if I could get this to work locally on my home Windows PC, and it turns out I could.
System configuration I am using for this tutorial:
NVIDIA GeForce RTX 3090
Intel 12th gen processor
Windows 10
Setting Stable Dreamfusion up locally:
Step 1: Update your Windows and drivers
Update Windows
Ensure you have the latest NVIDIA driver installed.
Step 2: Install Windows Subsystem for Linux (WSL)
Install Windows Subsystem for Linux (WSL); it’s a simple command-line install, and you’ll need to reboot afterwards. Make sure you install Ubuntu 22.04 (the default as of Feb 2023), since that is what Stable Dreamfusion expects. WSL currently installs the latest Ubuntu distro by default, so this works: wsl --install. To request Ubuntu 22.04 explicitly, use: wsl --install -d Ubuntu-22.04
After installing WSL, Windows will ask to reboot.
Upon reboot, the WSL will complete installation and ask you to create a user account.
Start Ubuntu 22.04 on WSL by clicking the Windows Start menu and typing ‘Ubuntu’ (or type ubuntu at a command prompt). A new shell terminal should appear.
Step 3: Install the NVIDIA CUDA Toolkit
You need to install the NVIDIA CUDA SDK on Ubuntu. On the CUDA download page, make the following selections:
Distribution: Ubuntu (you can also select WSL-Ubuntu, which is supposedly the GPU-accelerated variant and should behave the same)
Version: 2.0
Installer type: deb (local)
You will then get a set of install instructions at the bottom of the page (wget, apt-get, etc.). Copy the lines one by one into your Ubuntu terminal, and make sure each step completes without errors before continuing.
The ‘sudo apt-get -y install cuda’ line will install a lot of packages. It can take 10-15 minutes.
Install python3 pip. This is required for the Dreamfusion requirements installation script.
sudo apt install python3-pip
Step 4: Install Stable Dreamfusion and dependent packages
Clone the project as directed: git clone https://github.com/ashawkey/stable-dreamfusion.git
Install the prerequisites via pip, as directed on the Dreamfusion GitHub page:
pip install -r requirements.txt
I also installed both optional packages nvdiffrast and CLIP.
Add this export line to your .bashrc to ensure python can find libcudnn: export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
I did not install the build extension options
Exit and restart your shell so that all path changes take effect
Step 5: Run a workload!
Follow the instructions in the Usage section of the Dreamfusion README, but use ‘python3’ instead of ‘python’. There are a number of options you can specify, such as negative prompts and a GUI interface (which does not work under WSL).
The very first run will take a long time: it downloads several gigabytes of data, then trains for 100 epochs, which can take up to an hour.
Look in the results directory under the workspace name:
./stable-dreamfusion/<workspace name>/mesh/     # holds the .obj, .mtl, and .png files
./stable-dreamfusion/<workspace name>/results/  # holds an mp4 video showing the object rotating
Copying them to Windows: all Windows drives are pre-mounted under /mnt/<drive letter>/ in WSL (e.g. /mnt/c/), so you can copy the output files to the Windows side with: cp -rP ./<workspace name> /mnt/c/workdir/
Or, use 3D Viewer (though it seems to have issues with material loading at times)
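If a viewer gives you trouble, you can sanity-check the exported .obj with a few lines of stdlib Python. This only counts the ‘v’ (vertex) and ‘f’ (face) records of the Wavefront OBJ format, which is enough to confirm the mesh isn’t empty:

```python
def obj_stats(text):
    """Count vertices and faces in Wavefront OBJ source text."""
    verts = faces = 0
    for line in text.splitlines():
        if line.startswith("v "):       # vertex position record
            verts += 1
        elif line.startswith("f "):     # face record
            faces += 1
    return verts, faces

# with open("mesh.obj") as f:           # a file from the mesh/ directory above
#     print(obj_stats(f.read()))
```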
Fixes:
You might get an error about missing libcudnn_cnn_infer.so.8
==> Start Training trial Epoch 1, lr=0.050000 … 0% 0/100 [00:00<?, ?it/s]Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory
Add this export line to your .bashrc so it can find libcudnn: export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH