Anyone can build apps that use DALL-E 2 to generate images

The DALL-E 2 API is now in open beta, letting developers embed the ability to generate new images from text prompts, or edit existing images, in their own apps.

Microsoft is already leveraging it in Bing and Microsoft Edge with its Image Creator tool, which lets users create images if web results don’t return what they’re looking for. Fashion design app CALA is using the DALL-E 2 API for a tool that allows customers to refine design ideas from text descriptions or images, while photo startup Mixtiles is bringing it to an artwork-creating workflow.

Pricing for the DALL-E 2 API varies by resolution. For 1024×1024 images, the cost is $0.02 per image; 512×512 images are $0.018 per image; and 256×256 images are $0.016 per image. Volume discounts are available to companies working with OpenAI’s enterprise team.
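The per-resolution pricing above is easy to sanity-check before committing to a batch job. A minimal sketch, using the prices as quoted in this post (check OpenAI's pricing page for current rates):

```python
# List prices per generated image, as quoted above.
PRICE_PER_IMAGE = {
    "1024x1024": 0.020,
    "512x512": 0.018,
    "256x256": 0.016,
}

def dalle2_cost(size: str, n_images: int) -> float:
    """Estimated list-price cost in USD for n_images at the given resolution."""
    return round(PRICE_PER_IMAGE[size] * n_images, 4)

print(dalle2_cost("1024x1024", 100))  # → 2.0
```

Volume discounts (via the enterprise team) would lower these numbers, so treat this as an upper bound.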

Stable Diffusion Photoshop Plugin

There have been numerous attempts to build Stable Diffusion plugins for Adobe Photoshop. Many are still half-baked or paid at this point, but some are worth a look.

Stable Diffusion on a Mac M1

You can get Stable Diffusion to work on the new Mac M1 (and M2) chips. Above is me getting Stable Diffusion running on my Mac Mini M1 with 16 GB of RAM (using these instructions).

It’s definitely not as fast as a dedicated GPU, but it has the advantage of not running out of memory the way you will with graphics cards that only have 8 GB or 16 GB of VRAM. Most GPUs need at least 24 GB of memory to render 512×512 Stable Diffusion images (without resorting to lower-resolution models). I was able to generate 512×512 and 768×768 images on my 16 GB Mac Mini M1.

This is one of the really big advantages of a unified memory architecture, in which the CPU and GPU can access the same memory without the data first having to be transferred across a PCIe bus.

Riffusion

Riffusion (Riff-fusion) is a music AI: you type in prompts and it generates music for you. It’s not going to win any awards anytime soon, but it does seem to handle smooth and electronic tunes pretty well. Honestly, if I heard some of this in an elevator, I doubt I would notice.

One more step towards our automatically generated content future.

AI trained to get images from MRI brain scans

The top is the image the person saw, the bottom image is what the AI re-created from their brain scans

Researchers at Osaka University in Japan are among the ranks of scientists using AI to make sense of human brain scans. While others have tried using AI with MRI scans to visualize what people are seeing, the Osaka approach is unique because it used Stable Diffusion to generate the images. This greatly simplified their model, which required only a few thousand training parameters instead of millions.

Normally, Stable Diffusion takes text descriptions (prompts), which are run through a language model. That model was trained against a huge library of captioned images, producing a text-to-image latent space that can be queried to generate new, amalgamated images (yes, a gross simplification).

The Osaka researchers took this one step further. They used functional MRI (fMRI) scans from an earlier, unrelated study in which four participants looked at 10,000 different images of people, landscapes, and objects while being monitored in an fMRI machine. The Osaka team then trained a second AI model to link the brain activity in the fMRI data with text descriptions of the pictures the participants looked at.

Together, these two models allowed Stable Diffusion to turn fMRI data into relatively accurate images that were not part of the AI's training set. From the brain scans, the first model could recreate the perspective and layout the participant had seen, but its generated images were cloudy, nonspecific figures. Then the second model kicked in: using the text descriptions from the training images, it could recognize what object people were looking at. So if it received a brain scan resembling one from its training marked as a person viewing an airplane, it would put an airplane into the generated image, following the perspective from the first model. The technique achieved roughly 80 percent accuracy.
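The second model's role can be sketched with a toy example. This is purely illustrative, not the Osaka model: assume (hypothetically) that brain activity is summarized as a small feature vector, and a new scan is matched to the caption of the closest scan seen in training, which would then become the Stable Diffusion prompt:

```python
# Toy illustration (not the actual Osaka model): link a new "brain scan"
# feature vector to the caption of the most similar training scan, the way
# the second model linked fMRI activity to text descriptions.

# Hypothetical training data: (fMRI feature vector, caption of viewed image)
training = [
    ((0.9, 0.1, 0.0), "an airplane"),
    ((0.1, 0.8, 0.1), "a landscape"),
    ((0.0, 0.2, 0.9), "a person"),
]

def nearest_caption(scan):
    """Return the caption whose training scan is closest (squared distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda pair: dist2(scan, pair[0]))[1]

# A new scan resembling the "airplane" pattern yields that caption,
# which would then be handed to Stable Diffusion as the prompt.
print(nearest_caption((0.85, 0.2, 0.05)))  # → an airplane
```

The real system learns this mapping with a trained model rather than nearest-neighbor lookup, but the flow is the same: scan in, text out, text into the image generator.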

The team shared more details in a new paper, which has not been peer-reviewed, published on the preprint server bioRxiv.

Anadol’s data projections

Refik Anadol makes projection mapping and LED screen art. His unique approach, however, is embracing massive data sets churned through various AI algorithms as his visualization source.

I think one of his unique additions to the space is visualizing the latent space generated during machine learning stages.

Install Stable Dreamfusion on Windows

I wrote about Stable Dreamfusion previously. Dreamfusion first takes normal Stable Diffusion text prompts to generate 2D images of the desired object. Stable Dreamfusion then uses those 2D images to generate 3D meshes.

A hamburger

The authors appear to have used nVidia A100 cards on an Ubuntu system. I wanted to see if I could get this working locally on my home Windows PC, and found that I could.

System configuration I am using for this tutorial:

  • nVidia GeForce RTX 3090
  • Intel 12th gen processor
  • Windows 10

Setting Stable Dreamfusion up locally:

Step 1: Update your Windows and drivers

  1. Update Windows
  2. Ensure you have the latest nVidia driver installed.

Step 2: Install Windows Subsystem for Linux (WSL)

  1. Install Windows Subsystem for Linux (WSL). WSL installation is a simple command line install; you’ll need to reboot afterward. Make sure you install Ubuntu 22.04 (the default as of Feb 2023), since that is what Stable Dreamfusion likes. WSL currently installs the latest Ubuntu distro by default, so this works:
    wsl --install
    If you want to be sure you get Ubuntu 22.04, use this command line instead:
    wsl --install -d Ubuntu-22.04
  2. After installing WSL, Windows will ask to reboot.
  3. Upon reboot, the WSL will complete installation and ask you to create a user account.
  4. Start Ubuntu 22.04 on WSL by clicking on the Windows Start menu and typing ‘Ubuntu’, or by typing ‘ubuntu’ at a command prompt.

Step 2b (optional): Install Ubuntu wherever you want on your Windows system. By default it installs the image on your C:\Users directory – which is kind of annoying.

Step 3: Install dependent packages on Ubuntu

  1. If you don’t have Ubuntu started, go ahead and start Ubuntu 22.04 on WSL by clicking on the Windows Start menu and typing ‘Ubuntu’ (or you can type Ubuntu at a command prompt as well). A new shell terminal should appear.
  2. You need to install the nVidia CUDA toolkit on Ubuntu, following the instructions from nVidia’s CUDA download page (select the WSL-Ubuntu target):
    • You will then get a set of install instructions at the bottom of the page (wget, apt-get, etc.). Simply copy the lines one by one into your Ubuntu terminal, ensuring each step passes without errors before continuing.
    • The ‘sudo apt-get -y install cuda’ line installs a lot of packages and can take 10-15 minutes.
  3. Install python3 pip. This is required for the Dreamfusion requirements installation script.
    • sudo apt install python3-pip

Step 4: Install Stable Dreamfusion and dependent packages

  1. You should now follow the install instructions found on the Dreamfusion page.
  2. Clone the project as directed: git clone https://github.com/ashawkey/stable-dreamfusion.git
  3. Install the prerequisites via pip as directed on the Dreamfusion GitHub page:
    • pip install -r requirements.txt
    • I also installed both optional packages nvdiffrast and CLIP.
    • Add this export line to your .bashrc to ensure python can find libcudnn:
      export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
  4. I did not install the build extension options
  5. Exit and restart your shell so that all path changes take effect

Step 5: Run a workload!

Follow the instructions in the USAGE section of the Dreamfusion instructions, but use ‘python3’ instead of ‘python’. There are a number of options you can specify, such as negative prompts and a GUI interface (which does not work under WSL).

The very first run will take a long time: it downloads several gigabytes of training data, then trains 100 epochs, which can take up to an hour.

$> python3 main.py --text "a hamburger" --workspace trial -O                   # train
$> python3 main.py --text "a hamburger" --workspace trial -O --sd_version 1.5  # train against Stable Diffusion 1.5
$> python3 main.py --workspace trial -O --test                                 # render a test video
$> python3 main.py --workspace trial -O --test --save_mesh                     # also export the mesh

Check Your Output:

Look in the results directory under the workspace name:

./stable-dreamfusion/<workspace name>/mesh/     # holds the .obj, .mtl, and .png files
./stable-dreamfusion/<workspace name>/results/  # holds an mp4 video of the object rotating

Copying them to Windows:
All Windows drives are pre-mounted under /mnt/<drive letter>/ in WSL.
Ex: /mnt/c/
So you can copy the output files to the Windows side with:
cp -rP ./<workspace name> /mnt/c/workdir/
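If you need to hand a result path to a Windows program, the mapping between the two path styles is mechanical. A minimal sketch (the function name is my own; WSL also ships a `wslpath` utility that does this conversion for you):

```python
def wsl_to_windows(path: str) -> str:
    """Convert a WSL mount path like /mnt/c/workdir to C:\\workdir."""
    parts = path.split("/")
    # A mounted drive path looks like ["", "mnt", "<drive letter>", ...]
    if len(parts) < 3 or parts[1] != "mnt":
        raise ValueError(f"not a /mnt/<drive> path: {path}")
    drive = parts[2].upper()
    rest = "\\".join(parts[3:])
    return f"{drive}:\\{rest}"

print(wsl_to_windows("/mnt/c/workdir/trial"))  # → C:\workdir\trial
```

Going the other way (`C:\workdir` to `/mnt/c/workdir`) is the same transformation in reverse, or `wslpath` with no flag.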

Looking at the generated meshes with materials:

  1. Install Blender
  2. File->Import->Wavefront (.obj) (legacy)
  3. Or, use 3D Viewer (though it seems to have issues with material loading at times)

Fixes:

  1. You might get an error about missing libcudnn_cnn_infer.so.8
==> Start Training trial Epoch 1, lr=0.050000 …
0% 0/100 [00:00<?, ?it/s]Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory

add this to your .bashrc to ensure it can find libcudnn:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

  2. If you load the object in Blender but it doesn’t load the texture maps, try Alt-Z
