Installing Stable Diffusion 2.0/2.1

Stable Diffusion 2.0 was largely seen as a dud. Be aware that after version 1.5, the outcry from artists over having their works sampled led the 2.x branch to draw on fewer of those public sources. This means a more limited training set and likely less output variety.

If you are interested in trying Stable Diffusion 2.1, use this tutorial to install and use 2.1 models in the AUTOMATIC1111 GUI, so you can judge it for yourself.
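At a high level, installing a 2.x model comes down to putting the checkpoint (and a matching config file) where the webui can find it. Here's a minimal Python sketch of that step; the repo id, filename, and install path are assumptions based on the official Stability AI release, so treat the tutorials below as the authoritative walkthrough.

```python
# Minimal sketch: fetch a Stable Diffusion 2.1 checkpoint into an AUTOMATIC1111 install.
# The repo id, filename, and webui path are assumptions -- adjust them to your setup.
from pathlib import Path
from shutil import copy2

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

WEBUI_MODELS = Path("stable-diffusion-webui/models/Stable-diffusion")  # assumed install path

# Download the 768x768 checkpoint (assumed filename from the stabilityai 2.1 release).
ckpt = hf_hub_download(
    repo_id="stabilityai/stable-diffusion-2-1",
    filename="v2-1_768-ema-pruned.safetensors",
)
copy2(ckpt, WEBUI_MODELS / "v2-1_768-ema-pruned.safetensors")

# 2.x checkpoints also need a matching .yaml config placed next to the checkpoint
# with the same base name -- the tutorials below cover where to get it.
```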

Here are 2 different Stable Diffusion 2.1 tutorials:

You might also try this video tutorial by TingTing:

https://youtube.com/watch?v=cFgmXLLnHp0
Retro games with modern graphics – using AI

We’re already seeing a real revolution in retro gaming via emulation. Preserving old hardware is important, but it’s also seen as a nearly impossible task as devices mass-produced to last only 5-10 years in the consumer market reach decades of age. Failure rates will eventually approach 100% over enough time (unless people re-create the hardware). But with modern emulators, you can still play all those games on modern hardware.

On a separate note, we’ve also seen graphics effects like anti-aliasing and upscaling get the AI treatment. Instead of hand-coded anti-aliasing kernels, these effects can be generated by trained models, and all the major hardware vendors now ship them.

But what about the graphics content itself? Retro game art has its own charm, but what if we gave it the AI treatment too?

Jay Alammar wanted to see what he could achieve by feeding retro game graphics from the MSX game Nemesis 2 (Gradius) into the Stable Diffusion, DALL-E, and Midjourney art generators. He presents a lot of interesting experiments and conclusions. He used in-painting, out-painting, Dream Studio, and all kinds of other ideas to see what he could come up with.

The hand-picked results were pretty great:

He even went so far as to convert the original opening sequence to use the new opening graphics here:

I think this opens up a whole new idea. What if you replaced all of a game’s graphics elements with updated AI graphics? The result would essentially be a themed re-skinning with no gameplay (or even level) changes, but it definitely suggests starting your re-theming for new levels (fire levels, ice levels, space levels, etc.) by auto-generating the graphics.

That in turn raises the non-art idea of re-theming the gameplay itself – possibly using AI-generated movement or gameplay rules. Friction, gravity, jump height, and so on could all be given different models (Mario-style physics, Super Meat Boy physics, slidy ice-level physics), with the AI coming up with the gravity, bounce, and jump parameters.
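To make that concrete, here’s a purely illustrative Python sketch of the kind of per-theme parameter set an AI (or a person) could be asked to fill in; the preset names and numbers are invented for the example.

```python
# Illustrative only: a physics "preset" an AI could be asked to fill in per theme.
# The preset names and numbers are invented for this example.
from dataclasses import dataclass


@dataclass
class PhysicsPreset:
    gravity: float          # downward acceleration, units/s^2
    ground_friction: float  # 0 = ice-slick, 1 = sticky
    jump_velocity: float    # initial upward speed on jump
    bounce: float           # restitution on landing, 0..1


PRESETS = {
    "classic_platformer": PhysicsPreset(gravity=9.8, ground_friction=0.8, jump_velocity=6.0, bounce=0.0),
    "slidy_ice_level":    PhysicsPreset(gravity=9.8, ground_friction=0.1, jump_velocity=6.0, bounce=0.05),
    "low_gravity_space":  PhysicsPreset(gravity=2.5, ground_friction=0.6, jump_velocity=4.0, bounce=0.2),
}

# A generator (human or AI) only has to emit one of these presets per theme;
# the game loop reads the active preset each frame.
```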

Interesting times…

Links:

Shrinking 4 years to 4 days with AI generated music video

Photographer and filmmaker Nicholas Kouros spent “hundreds of hours” over 4 years creating a stop-motion meme-themed music video using paper prints and cutouts for a song called Ruined by the metal band Blame Kandinsky. He then created a new version using AI – in 4 days.

The work on the original physical shoot was intense:

“Cutting out all individual pieces was a serious task. Some of the setups were so labor-intensive, I had friends over for days to help out,” says Kouros.

“Every piece was then assembled using various methods, such as connecting through rivets and hinges. We shot everything at 12fps using Dragonframe on a DIY rostrum setup with a mirrorless Sony a7S II and a Zeiss ZE f/2 50mm Macro-Planar lens.”

In a move that likely avoided copyright issues, he used freely usable images. “Most of Ruined was made using public domain paintings and art found on museum websites like Rijks or the Met.”

After everything had been shot, the RAW image sequences were imported to After Effects and later graded in DaVinci Resolve.

Using AI instead

Kouros then created a second music video, but this time he used AI. The video took a fraction of the time to make. “In direct contrast with my previous work for the same band, Vague by Blame Kandinsky, it took a little over four days of experimenting, used a single line of AI text prompting, and 20 hours of rendering.”

“The text prompt line used was: ‘Occult Ritual, Rosemary’s Baby Scream, Flemish renaissance, painting by Robert Crumb, Death.’”

Kouros describes his experience with AI as “fun” and was impressed with the results that the image synthesizer gave him.

What was his final take?

“In my opinion, this specific style of animation won’t stand the test of time, but it will probably be a reminder of times before this AI thing really took off.

I embrace new tech as it comes along and I have already started making images with the aid of image generators.
I’ve actually learned more about art history in this last year using AI, than in seven years of art schools.”

Links:

Amazing balance

World Dance New York seems to teach some really high-end classes, ranging from hip-hop, flamenco, samba, fire dancing, and belly dance all the way to prenatal classes and even self-defense.

They publish some really high-quality performances, like this one that shows a great combination of class and artistry. How she can dance while balancing a sword on her head is astounding.

Projection Mapping with MadMapper

CETI (Creative and Emergent Technology Institute) is a local creative group that experiments with different technologies for creating unique experiences. Sarah Turner is a local artist who has been experimenting with different media and video technologies through a project she calls the Mobile Projection Unit, which has set up projection mapping displays at a number of art and media festivals.

In this video she goes over some of the things she’s learned from these projection mapping setups:

Auto-generation of 3D models from text

I’ve already written about nVidia’s GET3D code, which can generate a wide variety of 3D objects using trained networks. The model, however, is tuned to generate specific object categories (chairs, cars, etc.), and that requires a large labeled 3D dataset. nVidia provides simple ones, but if you want to generate particular styles or eras (only 50’s-era cars, only 1800’s-style furniture), you’ll need to collect and label the data and train the model for that.

There’s another player in town called DreamFusion that goes in a slightly different direction. Researchers from Google and UC Berkeley are using a related method to generate 3D models from text. It gets around the need for a large labeled 3D dataset by leaning on a pretrained 2D text-to-image diffusion model (the same family as Stable Diffusion, DALL-E, and MidJourney). They developed a loss that scores how well 2D renders of a candidate 3D model match what the diffusion model expects for the text prompt, and then optimize the 3D model against that loss. They come up with some astounding results.
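Conceptually, the optimization loop looks something like the toy sketch below. This is a structural illustration only, not DreamFusion’s actual code: the “scene”, “renderer”, and “noise predictor” are trivial stand-ins for the NeRF and the large pretrained diffusion model the paper uses.

```python
# Toy sketch of the DreamFusion-style idea: optimize 3D parameters so that 2D renders
# score well under a (here: stand-in) text-conditioned denoiser.
import torch

torch.manual_seed(0)

# Stand-in "3D scene": just a learnable tensor. DreamFusion optimizes a NeRF here.
scene_params = torch.randn(3, 64, 64, requires_grad=True)


def render(scene: torch.Tensor, camera_angle: float) -> torch.Tensor:
    """Stand-in differentiable renderer: roll the tensor to fake a viewpoint change."""
    shift = int(camera_angle * 8) % scene.shape[-1]
    return torch.roll(scene, shifts=shift, dims=-1)


def predict_noise(noisy_image: torch.Tensor, t: float) -> torch.Tensor:
    """Stand-in for a frozen, text-conditioned diffusion model's noise prediction."""
    # A real model would be conditioned on the text prompt; this one just nudges toward gray.
    return noisy_image - 0.5


optimizer = torch.optim.Adam([scene_params], lr=1e-2)

for step in range(200):
    camera_angle = torch.rand(()).item() * 6.28      # random viewpoint each step
    image = render(scene_params, camera_angle)

    t = torch.rand(()).item()                        # random noise level
    noise = torch.randn_like(image)
    noisy = image + t * noise

    # Score-distillation-style update: push the render toward what the denoiser
    # "expects" for the prompt, without backpropagating through the denoiser itself.
    with torch.no_grad():
        predicted = predict_noise(noisy, t)
    guidance = predicted - noise                     # per-pixel guidance signal

    optimizer.zero_grad()
    image.backward(gradient=guidance)                # inject guidance as the upstream gradient
    optimizer.step()
```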

There is also a paper by Nikolay Jetchev called CLIPMatrix that attempts the same text-driven 3D generation. He also seems to be experimenting with animations and something called VolumeCLIP that does ray-casting.

This kind of end-to-end pipeline is exactly what content makers want. Unfortunately, it also means the technology could likely decimate an art department. It could easily be used to fill the non-critical areas of ever-expanding 3D worlds in games and VR with minimal effort or cost. In theory, it could even run in pseudo-realtime. Imagine worlds in which you can walk in any direction – forever – and constantly see new locations and objects.

Links:

CLIPMatrix and VolumeCLIP AI based 3D model generation

As I mentioned in my previous article, there is a paper by Nikolay Jetchev called CLIPMatrix that attempts to generate 3D models from 2D images that are generated by text-to-image diffusion models (like Stable Diffusion, DALL-E, MidJourney, etc.). A list of his other papers can be found here.

He now seems to be working on auto-generated models that are also animated automatically. (Content note: he seems to love generating content based on heavy metal lyrics, demons, and other fantastical creations, which I don’t think demonstrates whether this would work on more ‘normal’-looking models):

Originally tweeted by Nikolay Jetchev (@NJetchev) on March 10, 2022.

Looking at his Twitter stream, he also seems to be working on a version called VolumeCLIP that appears to generate voxel objects he can ray-cast into.

“The Fire Dwarf Blacksmith”

Originally tweeted by Nikolay Jetchev (@NJetchev) on January 26, 2023.

Christmas cakes of Japan

I’ve gone on big vacations in Japan several times now, and I’m always astounded by the artistry and sublime taste of their food and pastries. In recent years, that artistry and those masterful creations have graced the tradition of the Japanese Christmas cake.

The history of the Christmas cake in Japan started in the waning days of the Meiji period. In 1910, Fujiya, a European-style pastry shop in the port city of Yokohama, introduced what is widely considered to be the very first Japanese Christmas cake. According to a representative from Fujiya’s PR department, “the base of the cake was a rich, liqueur-soaked fruitcake” in the European style. But the bakers considered its plain brown appearance not eye-catching enough, so they decorated it with snow-white royal icing, complete with little Christmas trees. Over the next decade, bakers around the country decorated their Christmas desserts with strawberries after growing methods made them available in December.

Today, Christmas cake is synonymous with strawberry shortcake, a light and fluffy confection with alternating layers of soft sponge and delicate whipped cream, topped with perfectly sweet fresh strawberries. Some of the most amazing creations are found in the highest-end hotels and come at astounding prices: the Renne (‘reindeer’ in French) cake shown above, from the Palace Hotel Tokyo, is topped with a tall sculpted cone depicting reindeer antlers. The cake and cone together are about a foot wide and 20 inches high, contain more than 100 perfect strawberries, and sell for the hefty price of 70,000 yen ($640).

After fried chicken, Christmas cake is the most popular food consumed during Japan’s yuletide season. In 1997, the strawberry shortcake was even immortalized by the Japanese company SoftBank in what is arguably the world’s first emoji set.

Links:

Stable Diffusion high quality prompt thought process

Content warning: Some of the links have moderately NSFW pictures. There is no outright nudity, but they do deal with generating rather busty images. This article itself should be fine, but be aware when following the links.

While this guide is translated from a Japanese source and uses the Waifu/Danbooru model to generate more anime-looking images, it works really well for generating ultra-realistic Victorian pictures using Stable Diffusion’s standard 1.5 model. Here are some I made using his prompt with just 30 minutes of experimenting:

Fair warning: the original author is trying to generate more…busty women that look better as anime characters under the Waifu model. I won’t comment on his original purpose, but I thought this was an interesting description of how a ‘prompt engineer’ moved from an idea to a working Stable Diffusion prompt.

First he started with a good description of what he wanted:

I want a VICTORIAN GIRL in the style of an OIL PAINTING
Eyes and face are important in art, so she must have a PERFECT FACE, SEXY FACE, and her eyes must have DETAILED PUPILS
I want her to have LARGE BREAST, TONED ABS and THICK THIGH.
She must look FEMININE doing an EVOCATIVE POSE, SMIRK and FULL BODY wearing a NIGHT GOWN
The output must be INTRICATE, HIGH DETAIL, SHARP
And in the style of {I’m not giving out the artist names to avoid trouble. Apologies.}

This led him to the following prompt. Note his use of parentheses () to add emphasis to a term, and square brackets [] to de-emphasize one.

Prompt :
VICTORIAN GIRL,FEMININE,((PERFECT FACE)),((SEXY FACE)),((DETAILED PUPILS)).(ARTIST),ARTIST,ARTIST,(ARTIST). OIL PAINTING. (((LARGE BREAST)),((TONED ABS)),(THICK THIGH).EVOCATIVE POSE, SMIRK,LOOK AT VIEWER, ((BLOUSE)).(INTRICATE),(HIGH DETAIL),SHARP

Unfortunately, you don’t need to experiment for long to realize Stable Diffusion needs a lot of help with anatomy. It often generates nightmare-fuel images that have multiple heads, messed-up arms, hands with too many fingers, eyes with terrifying pupils (or no pupils), too many limbs – well, you get the idea. So you need to make sure those things don’t show up by banning them in the negative prompt (again, not commenting on the original purpose):

Negative Prompt :
((nipple)), ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), (((tranny))), (((trans))), (((transsexual))), (hermaphrodite), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))). (((more than 2 nipples))). [[[adult]]], out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))

Finally, he used these Stable Diffusion settings. Note that you want to keep the aspect ratio in a portrait format (768 tall x 512 wide). Going taller can result in multiple heads; going wider can result in more than one person in the scene.

Restore Face: ON
Steps: 42
Sampler: DDIM
CFG scale: 10
Height: 768
Width: 512
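
If you’d rather reproduce roughly the same recipe in code instead of the AUTOMATIC1111 GUI, here is a minimal sketch using Hugging Face’s diffusers library. The model id and prompt text are placeholders, the (( )) / [ ] emphasis syntax is an AUTOMATIC1111 feature that plain diffusers treats literally, and the “Restore Face” step isn’t included.

```python
# Minimal sketch: roughly the same settings via the diffusers library (not the A1111 GUI).
# Model id and prompt text are placeholders; (( )) / [ ] weighting is A1111-only syntax.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",          # assumed SD 1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)  # "Sampler: DDIM"

image = pipe(
    prompt="VICTORIAN GIRL, FEMININE, PERFECT FACE, DETAILED PUPILS, oil painting, intricate, high detail, sharp",
    negative_prompt="ugly, duplicate, mutilated, poorly drawn hands, poorly drawn face, deformed, "
                    "blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, "
                    "gross proportions, malformed limbs, missing arms, missing legs, extra arms, "
                    "extra legs, mutated hands, fused fingers, too many fingers, long neck",
    num_inference_steps=42,   # "Steps: 42"
    guidance_scale=10.0,      # "CFG scale: 10"
    height=768,               # portrait aspect ratio to avoid extra heads/people
    width=512,
).images[0]
image.save("victorian_girl.png")
```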

Links:

Unreal Engine 5 MetaHuman and Stable Diffusion

CoffeeVectors wanted to see what kind of workflow he could build bridging Epic’s MetaHuman with AI image-synthesis platforms like Stable Diffusion. His experiments were really interesting and instructive for other beginners.

He started with a simple face generated by Stable Diffusion:

He then fed that image into MetaHuman as a new starting point and cycled through a few generations. With each cycle, you can change the prompt and steer things in slightly different directions. It becomes less about a single initial prompt and more about understanding and adjusting the larger system of interacting settings. The results were actually quite good:

I thought he had a good observation on creating prompts here:

Don’t get tunnel vision on the prompts. There’s more to a car than the engine. Prompts are important but they’re not everything. With platforms like Dall-E 2 where underlying variables aren’t exposed, the prompts do play a dominant role.

But with Stable Diffusion and Midjourney, there are more controls available to you that affect the output. If you’re not getting what you want from prompts alone in Stable Diffusion, for instance, it could be because you need to shop around the sampler methods and CFG Scale values. Even the starting resolution affects the images you get because it changes the initial noise pattern.
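
As a concrete illustration of the loop he describes – render, feed the image back in with a tweaked prompt, adjust CFG and strength – here’s a minimal img2img sketch using the diffusers library. This is not CoffeeVectors’ actual pipeline (he worked through MetaHuman and other tools); the model id, prompts, filenames, and parameter values are placeholders.

```python
# Minimal img2img sketch of the "feed the result back in and steer" loop.
# Not the author's actual MetaHuman workflow; model id, prompts, and values are placeholders.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",       # assumed checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("face_v0.png").convert("RGB").resize((512, 512))

prompts = [
    "portrait photo of a young woman, studio lighting",
    "portrait photo of a young woman, freckles, softer lighting",
    "portrait photo of a young woman, freckles, cinematic rim light",
]

# Each cycle nudges the previous output with a slightly different prompt.
for i, prompt in enumerate(prompts):
    image = pipe(
        prompt=prompt,
        image=image,
        strength=0.45,        # how far to move away from the input image
        guidance_scale=7.5,   # CFG: how strongly to follow the prompt
    ).images[0]
    image.save(f"face_v{i + 1}.png")
```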

Definitely worth the read if you’re interested.

Links: