Stable diffusion 2.0 was…well…

Stable Diffusion 2.0 seems to have been a step backwards in capability and quality, and many people went back to v1.5 for their work.

The difficulties in 2.0 were caused in part by:

  1. A new language model that was trained from scratch
  2. A training dataset that was heavily censored with an NSFW filter

The second part would have been fine, but the filter was overly inclusive and removed a substantial amount of good-quality data. Version 2.1 promised to bring the good data back.
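Concretely, LAION-style NSFW filtering is threshold-based: each image in the dataset carries a predicted "punsafe" score between 0 and 1, and training keeps only images below a cutoff (reportedly around 0.1 for 2.0 and 0.98 for 2.1). A minimal Python sketch, with made-up captions and scores, shows why an aggressive cutoff throws away harmless data:

```python
# Hypothetical sketch of threshold-based dataset filtering. The captions
# and punsafe scores below are invented for illustration.
dataset = [
    {"caption": "portrait photo, studio lighting", "punsafe": 0.05},
    {"caption": "classical painting of a figure",  "punsafe": 0.30},
    {"caption": "beach vacation snapshot",         "punsafe": 0.55},
    {"caption": "genuinely explicit content",      "punsafe": 0.99},
]

def keep(threshold):
    """Return the captions that survive filtering at this cutoff."""
    return [d["caption"] for d in dataset if d["punsafe"] < threshold]

print(len(keep(0.1)))   # 1 -- a strict cutoff discards borderline-but-fine images
print(len(keep(0.98)))  # 3 -- a permissive cutoff drops only the explicit entry
```

The strict cutoff keeps only one of the three harmless images; the permissive one keeps all three.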

Installing Stable Diffusion 2.1

If you’re interested in trying Stable Diffusion 2.1, use this tutorial to install and use the 2.1 models in the AUTOMATIC1111 GUI, so you can judge it for yourself.

You might also try this tutorial by TingTing.

Links:

AI generated comic books

There’s a creative war going on surrounding AI generated art. While some are fighting AI generated art, others are fully embracing it.

AI Comic Books is a whole website/company dedicated to publishing comic books that rely on AI generated art. Check out the offerings on their website to see where the state of graphic novels is going.

This definitely spawns some discussion about where AI art is going to find its place in society. I think the cat is out of the bag, and now we’ll have to deal with the economic and moral questions it is generating – but that’s a discussion for another article…

Stable diffusion in other languages

Stable Diffusion was developed by CompVis, Stability AI, and LAION. It mainly uses the English subset LAION2B-en of the LAION-5B dataset for its training data and, as a result, requires English text prompts to produce images.

This means that the tagging and correlation of images and text are based on English-tagged datasets – which naturally tend to come from English-speaking sources and regions. Users of other languages must first translate their prompts from their native language into English – which often loses nuance or even core meaning. On top of that, it also means the imagery Stable Diffusion can draw on is usually limited to English-speaking sources.

For example, one of the more common Japanese terms corresponding to the English word “businessman” is “salaryman”, which most often evokes a man wearing a suit. A translated prompt would get decidedly Western-looking results, which might not be very useful if you’re trying to generate images for a Japanese audience.

rinna Co., Ltd. has developed a Japanese-specific text-to-image model named “Japanese Stable Diffusion”. It accepts native Japanese text prompts and generates images that reflect the naming and tagged imagery of the Japanese-speaking world – concepts that may be difficult to express through translation and whose images may simply not be present in the Western world. The model was trained on source material that comes directly from Japanese culture, identity, and unique expressions – including slang.

They did this using a two-step approach that is instructive about how Stable Diffusion works.

First, they left the latent diffusion model alone and replaced the English text encoder with a Japanese-specific text encoder. This allowed the model to understand Japanese natively, but it would still generate Western-style imagery because the latent model remained intact. Even so, this was better than simply translating the prompt.

Now Stable Diffusion could understand what a ‘businessman’ was, but it still generated images of decidedly Western-looking businessmen because the underlying latent diffusion model had not been changed.

The second step was to retrain the latent diffusion model on Japanese-tagged data sources with the new text encoder in place. This stage was essential to make the model truly language-specific. After it, the model could finally generate businessmen with the Japanese faces a Japanese audience would expect.
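The two steps can be sketched numerically. In this toy model (all names, shapes, and numbers are invented, and a closed-form least-squares fit stands in for real gradient-descent training of a deep network), stage one fits a new “Japanese” encoder into the embedding space the frozen latent model already expects:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 6, 8

# Stand-ins for the real components (all shapes and values invented):
en_encoder = rng.normal(size=(vocab, dim))   # pretrained English text encoder
latent_model = rng.normal(size=(dim, dim))   # frozen latent diffusion model

# Japanese token features for concepts paired with the English vocabulary,
# e.g. row 0 = "サラリーマン" paired with English row 0 = "businessman".
ja_tokens = rng.normal(size=(vocab, vocab))

# Stage 1: train only the new Japanese encoder so its embeddings land
# where the frozen latent model already expects them.
ja_encoder, *_ = np.linalg.lstsq(ja_tokens, en_encoder, rcond=None)

# The frozen latent model now responds to Japanese prompts exactly as it
# did to the paired English ones -- i.e. it still draws Western imagery.
en_out = en_encoder @ latent_model
ja_out = (ja_tokens @ ja_encoder) @ latent_model
print(np.allclose(en_out, ja_out))  # True

# Stage 2 (not shown): unfreeze latent_model and keep training on
# Japanese-captioned images so the imagery itself becomes Japanese.
```

The key point the sketch captures is that stage one changes what the model *understands* without changing what it *draws*; only stage two changes the imagery.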

Read more about it at the links below.

Links:

A.I. coming to a bed near you

The Bryte Balance bed purports to use A.I. to sense pressure imbalances of whoever is lying on it and then automatically controls a number of adjustable ‘rebalancers’ to give them a better night’s sleep.

Combine that with some ultra-lux phone app controls and you have the makings of a luxury bed used at some of the top luxury hotels and resorts in the world – like the five-star Park Hyatt New York and Carillon Miami.

You can own one yourself if you like. It’ll only set you back $6,299.

It’s nothing… forever

Nothing, Forever is a Twitch stream with an amazing premise: it runs 24 hours a day, 365 days a year, and delivers new content every minute. Everything you see, hear, or experience (with the exception of the artwork and laugh track) is brand-new content, continually generated by machine learning and AI algorithms. It never repeats (except on the rare occasion the AI happens to generate the same content twice).

It was launched by Mismatch Media, a media lab focused on creating experimental forms of television shows, video games, and more, using generative and other machine learning technologies.

Give it a watch and be amazed. Sadly, it’s probably better than half of current TV shows.

Free AI Art Prompt Builders

If you’re not interested in buying AI prompts from a prompt marketplace, you’re still in luck. There are a number of free resources and AI prompt builder tools out there to help you along the way – or to get you out of any artist’s block you might run into.

Midjourney Prompt Generator – Either select one of the samples or provide your own, and the generator uses a GPT-2 model fine-tuned on the midjourney-prompts dataset, which contains 250,000 text prompts supplied to the Midjourney text-to-image service by users.

Phraser.tech is a tool for the Midjourney and DALL-E art generators that walks you through numerous questions and steps to help you create precisely tailored prompts with the best parameters.

MidJourney Prompt Helper helps you experiment with different styles, lighting, cameras, colors, and other creative elements.

Drawing Prompt Generator is a simple helper for getting rid of artist’s block. Simply gazing at a stream of unrelated objects might get the creative juices flowing.

Promptomania Builder is a powerful but very easy-to-use helper, with upscaling and variation options to help you become a prompt master.

MidJourney Random Commands Generator is a prompt tool for generating complex outputs. It was created for entertainment purposes by enthusiasts.

Using a Neural Net as compression for character animation

This was published in 2018, but it’s a fascinating dual-purpose use of neural nets. Character animation has quickly become highly complex as it has grown more realistic. The problem compounds when you want characters to do things like crouch and aim at the same time, or crouch and walk across uneven terrain while looking left or right. You can imagine all the different combinations of motion that must be described and handled. All of this was taking artists massively more time to develop; even worse, it was taking up more and more storage space on disk and especially in memory.

Daniel Holden of Ubisoft wondered if he could use a neural net not only to reduce the combinations they had to handle, but also to exploit the inherent ability of neural nets to compress data. It turns out he could – and he presents what he found in this excellent presentation.
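The compression half of the idea can be sketched with a linear stand-in. Holden’s actual system is a deep nonlinear autoencoder trained on real motion data; here the optimal linear version is computed in closed form via SVD on invented pose data, showing how a low-dimensional bottleneck slashes storage:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented pose data: 500 frames of a 24-value pose vector that secretly
# depends on only 4 underlying degrees of freedom.
frames, pose_dim, code_dim = 500, 24, 4
poses = rng.normal(size=(frames, code_dim)) @ rng.normal(size=(code_dim, pose_dim))

# Optimal *linear* autoencoder in closed form via truncated SVD
# (a real system trains a deep nonlinear network instead).
U, S, Vt = np.linalg.svd(poses, full_matrices=False)
W_dec = Vt[:code_dim]        # 4 x 24 decoder weights
codes = poses @ W_dec.T      # 500 x 4 compressed animation
recon = codes @ W_dec        # decompressed poses

print(np.allclose(recon, poses))  # True: the data really has only 4 DoF
# Storage: 500*24 = 12,000 floats  ->  500*4 + 4*24 = 2,096 floats
```

Only the small codes and the decoder weights need to be kept in memory; the full pose stream is rebuilt on the fly.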

Links:

AI solver reduces a 100,000-equation quantum physics problem to four equations

Physicists recently used a neural network to compress a daunting quantum problem that required 100,000 equations into one that requires as few as four equations – all without sacrificing accuracy.

The problem concerns how electrons behave as they move on a grid-like lattice. When two electrons occupy the same lattice site, they interact. This setup, known as the Hubbard model, is an idealization of several important classes of materials and enables scientists to learn how electron behavior gives rise to sought-after phases of matter, such as superconductivity, in which electrons flow through a material without resistance.

The Hubbard model is deceptively simple, however. For even a modest number of electrons the problem requires serious computing power. That’s because when electrons interact, their fates can become quantum mechanically entangled: Even once they’re far apart on different lattice sites, the two electrons can’t be treated individually, so physicists must deal with all the electrons at once rather than one at a time. With more electrons, more entanglements crop up, making the computational challenge exponentially harder.

One way of studying a quantum system is by using what’s called a renormalization group. That’s a mathematical apparatus physicists use to look at how the behavior of a system—such as the Hubbard model—changes when scientists modify properties such as temperature or look at the properties on different scales. Unfortunately, a renormalization group that keeps track of all possible couplings between electrons can contain tens of thousands, hundreds of thousands or even millions of individual equations that need to be solved. On top of that, the equations are tricky: Each represents a pair of electrons interacting.

Di Sante and his colleagues wondered if they could use a machine learning tool known as a neural network to make the renormalization group more manageable. The neural network is like a cross between a frantic switchboard operator and survival-of-the-fittest evolution. First, the machine learning program creates connections within the full-size renormalization group. The neural network then tweaks the strengths of those connections until it finds a small set of equations that generates the same solution as the original, jumbo-size renormalization group. The program’s output captured the Hubbard model’s physics even with just four equations.
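A toy version of that compression (all sizes and numbers invented, with 200 equations standing in for ~100,000): if the full set of coupling equations has hidden low-dimensional structure, a handful of fitted parameters reproduces its solution. The real work trains a neural network to *discover* that structure; here the hidden modes are simply handed to a least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(2)

# "Full" problem: 200 coupling equations (standing in for ~100,000),
# but with hidden structure -- everything is driven by 4 latent modes.
n_points, n_full, n_small = 50, 200, 4
modes = rng.normal(size=(n_points, n_small))             # hidden low-dim structure
couplings = modes @ rng.normal(size=(n_small, n_full))   # 200 observed couplings
w_full = rng.normal(size=n_full)
y_full = couplings @ w_full                              # full-model solution

# "Compressed" model: fit just 4 effective parameters.
w_small, *_ = np.linalg.lstsq(modes, y_full, rcond=None)
y_small = modes @ w_small

print(np.allclose(y_full, y_small))  # True: 4 parameters reproduce the solution
```

The sketch skips the hard part – finding the modes – which is exactly what the neural network does by tweaking connection strengths until a small set of equations matches the full renormalization group.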

“It’s essentially a machine that has the power to discover hidden patterns,” Di Sante says.

The work, published in the September 23 issue of Physical Review Letters, could revolutionize how quantum scientists investigate systems containing many interacting electrons. Moreover, if scalable to other problems, the approach could potentially aid in the design of materials with sought-after properties such as superconductivity or utility for clean energy generation.

Links:

Microsoft can synthesize your voice with just a 3 second clip

Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person’s voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything – and do it in a way that attempts to preserve the speaker’s emotional tone and the acoustic environment of the sample.

The researchers also note that since VALL-E can synthesize speech that maintains speaker identity, it carries potential risks of misuse, such as spoofing voice identification or impersonating a specific speaker.

You can find audio samples and the paper here.

It sure would make breaking into Werner Brandes’ office (from the 1992 movie Sneakers) a lot easier than convincing your friend to record snippets of a really terrible date.