Month: February 2023

Christmas cakes of Japan

I’ve gone on big vacations in Japan several times now – and I’m always astounded by the artistry and sublime taste of the food and pastries. In recent years, that artistry and those masterful creations have graced the tradition of Japanese Christmas cakes.

The history of the Christmas cake in Japan started in the waning days of the Meiji period. In 1910, Fujiya, a European-style pastry shop in the port city of Yokohama, near Tokyo, introduced what is widely considered to be the very first Japanese Christmas cake. According to a representative from Fujiya’s PR department, “the base of the cake was a rich, liqueur-soaked fruitcake” in the European style. But the bakers considered its plain brown appearance not eye-catching enough, so they decorated it with snow-white royal icing, complete with little Christmas trees. Over the next decade, bakers around the country began decorating their Christmas desserts with strawberries once new growing methods made them available in December.

Today, Christmas cake is synonymous with strawberry shortcake, a light and fluffy confection with alternating layers of soft sponge and delicate whipped cream, topped with perfectly sweet fresh strawberries. Some of the most amazing creations are found in the highest-end hotels and come at astounding prices. The Renne (‘reindeer’ in French) cake shown above, from the Palace Hotel Tokyo, is topped with a tall sculpted cone depicting reindeer antlers. The cake and cone together are about a foot wide and 20 inches high, contain more than 100 perfect strawberries, and sell for the hefty price of 70,000 yen ($640).

After fried chicken, Christmas cake is the most popular food consumed during Japan’s yuletide season. In 1997, strawberry shortcake was immortalized by the Japanese company SoftBank in what is arguably the world’s first emoji set, which included the confection.

Links:

Stable diffusion feature showcase

Having trouble understanding all the knobs in Stable Diffusion’s webui interface? This is a great site on GitHub that shows what each feature does and gives some examples and tips on getting the most out of them.

It covers all the big features like outpainting, inpainting, prompt matrix, AI upscaling, attention, loopback, X/Y plot, textual inversion, resizing, sampling method selection, seed resize, and variations, plus a whole host of other options, along with before/after pictures to help you understand the features better.
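
As one concrete example of what a feature like inpainting is doing under the hood, here is a minimal sketch using the Hugging Face diffusers library rather than the webui itself. The file paths are placeholders and the checkpoint name is just one public inpainting model; treat this as a sketch, not the showcase’s own code:

# Minimal inpainting sketch with diffusers (not the webui); paths are illustrative
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # a public inpainting checkpoint on the Hugging Face hub
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("portrait.png").convert("RGB")   # hypothetical input image
mask_image = Image.open("mask.png").convert("RGB")       # white = area to repaint, black = keep

result = pipe(
    prompt="a red scarf around the neck, oil painting style",
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("portrait_inpainted.png")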

Links:

Stable diffusion high quality prompt thought process

Content warning: some of the links have moderately NSFW pictures. There is no outright nudity, but they do deal with generating rather busty images. This article should be fine, but be aware when following the links.

While this guide is translated from a Japanese source and uses the Waifu/Danbooru model to generate more anime-looking images, it works really well for generating ultra-realistic Victorian pictures using Stable Diffusion’s standard 1.5 model. Here are some I made using his prompt with just 30 minutes of experimenting:

Fair warning, the original author is trying to generate more…busty women that look better as anime characters under the Waifu model. I won’t comment on his original purpose, but I thought this was an interesting description of how a ‘prompt engineer’ moved from an idea to generating a stable diffusion prompt.

First he started with a good description of what he wanted:

I want a VICTORIAN GIRL in a style of OIL PAINTING
Eye and Face are important in art so she must have PERFECT FACE, SEXY FACE and her eye have DETAILED PUPILS
I want she to have LARGE BREAST, TONED ABS and THICK THIGH.
She must look FEMININE doing EVOCATIVE POSE, SMIRK and FULL BODY wearing NIGHT GOWN
The output must be INTRICATE, HIGH DETAIL, SHARP
And in the style of {I’m not give out the artist names to avoid trouble. Apologize.}

This led him to the following prompt. Note his use of parentheses () to add emphasis, and of square brackets [] to de-emphasize terms.

Prompt :
VICTORIAN GIRL,FEMININE,((PERFECT FACE)),((SEXY FACE)),((DETAILED PUPILS)).(ARTIST),ARTIST,ARTIST,(ARTIST). OIL PAINTING. (((LARGE BREAST)),((TONED ABS)),(THICK THIGH).EVOCATIVE POSE, SMIRK,LOOK AT VIEWER, ((BLOUSE)).(INTRICATE),(HIGH DETAIL),SHARP

Unfortunately, you don’t need to experiment for long to realize Stable Diffusion needs a lot of help with anatomy. It often generates nightmare-fuel images that have multiple heads, messed-up arms, hands with too many fingers, eyes with terrifying pupils (or no pupils), too many limbs – well, you get the idea. So you need to make sure those things don’t show up by banning them via the negative prompt (again, not commenting on the original purpose):

Negative Prompt :
((nipple)), ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), (((tranny))), (((trans))), (((transsexual))), (hermaphrodite), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))). (((more than 2 nipples))). [[[adult]]], out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))

Finally, he applied the following Stable Diffusion settings. Note that you want to keep the aspect ratio in a portrait-like format (768 tall x 512 wide). Going taller can result in multiple heads; going wider can result in more than one person in the scene.

Restore Face: ON
Steps: 42
Sampler: DDIM
CFG scale: 10
Height: 768
Width: 512
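
If you’d rather reproduce these settings in a script than in the webui, here is a minimal sketch using the Hugging Face diffusers library. It is not the original author’s setup: the ((emphasis)) and [de-emphasis] syntax is a webui feature that plain diffusers does not interpret, “Restore Face” is a separate webui post-processing step (GFPGAN/CodeFormer) with no direct equivalent here, and the prompts below are shortened for readability:

# Sketch of the same settings with diffusers; prompts trimmed from the originals above
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)  # Sampler: DDIM

image = pipe(
    prompt="victorian girl, feminine, perfect face, detailed pupils, oil painting, "
           "evocative pose, smirk, looking at viewer, blouse, intricate, high detail, sharp",
    negative_prompt="ugly, duplicate, mutilated, out of frame, extra fingers, mutated hands, "
                    "poorly drawn hands, poorly drawn face, deformed, blurry, bad anatomy, "
                    "bad proportions, extra limbs, cloned face, disfigured, long neck",
    num_inference_steps=42,   # Steps: 42
    guidance_scale=10,        # CFG scale: 10
    height=768,               # portrait-like aspect ratio, as recommended above
    width=512,
    generator=torch.Generator("cuda").manual_seed(42),  # fix a seed so results are reproducible
).images[0]
image.save("victorian_girl.png")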

Links:

Unreal Engine 5 Metahuman and Stable Diffusion

CoffeeVectors wanted to see what kind of workflow he could build bridging Epic’s MetaHuman with AI image synthesis platforms like Stable Diffusion. His experiments were really interesting and instructive to other beginners.

He started with a simple face generated by stable diffusion:

He then fed that image into Metahuman as a new starting point and cycled through a few generations. With each cycle, you can change the prompt and steer things in slightly different directions. It becomes less about a single initial prompt and more about understanding/modifying the larger system of settings interacting with each other. The results were actually quite good:
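
This is not CoffeeVectors’ actual pipeline, but the feed-the-output-back-in idea can be sketched with diffusers’ img2img pipeline. The model ID, file name, strength value, and prompts are illustrative assumptions:

# Sketch of an image-to-image feedback loop: each cycle starts from the previous output
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = Image.open("metahuman_render.png").convert("RGB").resize((512, 512))  # hypothetical starting render
prompts = [
    "photorealistic portrait of a woman, studio lighting",
    "photorealistic portrait of a woman, freckles, soft rim lighting",
    "cinematic portrait of a woman, shallow depth of field",
]

for i, prompt in enumerate(prompts):
    # strength controls how far each cycle may drift from its input image
    image = pipe(prompt=prompt, image=image, strength=0.45, guidance_scale=7.5).images[0]
    image.save(f"cycle_{i}.png")

Changing the prompt a little on every cycle, as in the list above, is what lets you steer things in slightly different directions each time.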

I thought he had a good observation on creating prompts here:

Don’t get tunnel vision on the prompts. There’s more to a car than the engine. Prompts are important but they’re not everything. With platforms like Dall-E 2 where underlying variables aren’t exposed, the prompts do play a dominant role.

But with Stable Diffusion and Midjourney, there are more controls available to you that affect the output. If you’re not getting what you want from prompts alone in Stable Diffusion, for instance, it could be because you need to shop around the sampler methods and CFG Scale values. Even the starting resolution affects the images you get because it changes the initial noise pattern

Definitely worth the read if you’re interested.
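
To make the quoted observation concrete, here is a small sketch that sweeps a few sampler methods and CFG scale values with diffusers so you can compare outputs side by side. The model ID, prompt, seed, and value ranges are assumptions for illustration, not values from the article:

# Sweep sampler methods and CFG (guidance) scale for the same prompt and seed
import torch
from diffusers import (StableDiffusionPipeline, DDIMScheduler,
                       EulerAncestralDiscreteScheduler, DPMSolverMultistepScheduler)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

samplers = {
    "ddim": DDIMScheduler,
    "euler_a": EulerAncestralDiscreteScheduler,
    "dpm_solver": DPMSolverMultistepScheduler,
}

for name, scheduler_cls in samplers.items():
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    for cfg in (5, 8, 12):
        image = pipe(
            "portrait of an astronaut, oil painting",
            guidance_scale=cfg,
            num_inference_steps=30,
            generator=torch.Generator("cuda").manual_seed(1234),  # same seed for a fair comparison
        ).images[0]
        image.save(f"sweep_{name}_cfg{cfg}.png")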

Links:

Expanding and enhancing Stable Diffusion with specialized models

Now that you have Stable Diffusion 1.5 installed on your local system and have learned how to make cool generative prompts, it might be time to take the next step of trying different latent models.

There is more than one model out there for stable diffusion, and they can generate vastly different images:

Check out this article to learn how to install and use several popular models with Stable Diffusion (a short loading sketch follows the list below):

  • F222 – People found it useful in generating beautiful female portraits with correct body part relations. It’s quite good at generating aesthetically pleasing clothing.
  • Anything V3 – a special-purpose model trained to produce high-quality anime-style images. You can use danbooru tags (like 1girl, white hair) in the text prompt.
  • Open Journey – a model fine-tuned with images generated by Midjourney v4.
  • DreamShaper – a model fine-tuned for a portrait illustration style that sits between photorealistic and computer graphics
  • Waifu-diffusion – Japanese anime style
  • Arcane Diffusion – TV show Arcane style
  • Robo Diffusion – an interesting robot-style model that will turn your subject into a robot
  • Mo-di-diffusion – generates Pixar-like style images
  • Inkpunk Diffusion – Generate images in a unique illustration style
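
To give a feel for how these specialized models slot in, here is a minimal sketch of loading one with the diffusers library. The Hugging Face repo ID and the local file path are examples of mine, not instructions from the article:

# Loading a specialized checkpoint instead of the base 1.5 model
import torch
from diffusers import StableDiffusionPipeline

# From the Hugging Face hub (example: the anime-style waifu-diffusion model)
pipe = StableDiffusionPipeline.from_pretrained(
    "hakurei/waifu-diffusion", torch_dtype=torch.float16
).to("cuda")
image = pipe("1girl, white hair, looking at viewer, masterpiece").images[0]
image.save("waifu_test.png")

# Newer diffusers versions can also load a local .ckpt/.safetensors file that you
# downloaded for the webui, e.g.:
# pipe = StableDiffusionPipeline.from_single_file("models/dreamshaper.safetensors")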

Better stable diffusion and AI generated art prompts

Now that you have stable diffusion on your system, how do you start taking advantage of it?

One way is to try some sample prompts to start with. Techspot has some good ones (halfway through the article) to whet your appetite.

You can get inspiration by looking at good examples on free public prompt marketplaces.

Then you might want to learn how to fix some common problems.

When you’re really ready to dive in, this article from Metaverse gives you a list of excellent getting started guides to help get you from beginner to proficient in generating your own awesome art.

The key to it all is learning the syntax, parameters, and art of crafting AI prompts. It’s as much art as it is science. It’s complex enough that there is everything from beginner examples, free guides, and helper tools all the way to paid marketplaces.

Learning has gotten a lot better in the last 6 months, since people started figuring out how to craft AI prompts last year.

Installing Stable Diffusion 1.5

To install Stable Diffusion 1.5 (released Oct 20, 2022) locally, I found this video was really excellent – except for a few points:

  1. You MUST use Python 3.10.6 (I originally used 3.9.7 as recommended and it did not work). The latest version (as of Feb 2023) is Python 3.11.1 – which Stable Diffusion does NOT seem to like and won’t run with. A quick way to check the version on your path is shown below.
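
Before running webui-user.bat for the first time, you can confirm which Python the webui will pick up with a quick sanity check like this (my own check, not part of the official install steps):

# Run with the interpreter on your PATH; it should report 3.10.x
import sys
print(sys.version)
assert sys.version_info[:2] == (3, 10), "Stable Diffusion webui wants Python 3.10.x"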

You might also want to read through this older stable diffusion 1.4 install guide, but he uses model checkpoints which haven’t been updated since version 1.4.

Gotchas and Fixes:

  • If you have an incompatible version of Python installed when you run webui-user.bat for the first time, Stable Diffusion will set itself up to point at that bad Python version’s directory. Even if you uninstall it and install the correct Python version, Stable Diffusion will still look at the wrong one. You can go fiddle with the different setup files – but it’s faster to just blow away the pulled git source at the top level and re-pull it to ensure you don’t have cruft lying around.

Installer Links:

Stable diffusion 2.0 was…well…

Stable Diffusion 2.0 seems to have been a step backwards in capabilities and quality. Many people went back to v1.5 for their business.

The difficulties with 2.0 were caused in part by:

  1. Using a new language model that is trained from scratch
  2. The training dataset was heavily censored with a NSFW filter

The second part would have been fine, but the filter was overly inclusive and removed a substantial amount of good-quality data. Version 2.1 promised to bring the good data back.

Installing Stable Diffusion 2.1

If you’re interested in trying Stable Diffusion 2.1, use this tutorial to install and use the 2.1 models in the AUTOMATIC1111 GUI, so you can make your own judgement by using it.

You might also try this tutorial by TingTing.
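
If you just want to poke at 2.1 from a script rather than the AUTOMATIC1111 GUI, here is a minimal sketch using diffusers. It assumes the public 768x768 2.1 checkpoint on the Hugging Face hub; the prompt and sampler choice are mine:

# Quick Stable Diffusion 2.1 test with diffusers (outside the webui)
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16  # the 768x768 2.1 model
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a snowy mountain village at dusk, oil painting",
    height=768, width=768,          # 2.1 was trained at 768x768
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("sd21_test.png")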

Links:

AI generated comic books

There’s a creative war going on surrounding AI generated art. While some are fighting AI generated art, others are fully embracing it.

AI Comic Books is a whole website/company dedicated to publishing comic books that rely on AI generated art. Check out the offerings on their website to see where the state of graphic novels is going.

This definitely spawns some discussions about where AI art is going to find its place in society. I think the cat is out of the bag, and now we’ll have to deal with the economic and moral questions it is generating – but I think that’s a discussion for another article…

Stable diffusion in other languages

Stable Diffusion was developed by CompVis, Stability AI, and LAION. It mainly uses the English subset LAION2B-en of the LAION-5B dataset for its training data and, as a result, requires English text prompts to produce images.

This means that the tagging and correlating of images and text are based on English-tagged data sets – which naturally tend to come from English-speaking sources and regions. Users of other languages must first translate from their native language to English – which often loses nuance or even the core meaning. On top of that, it also means the latent-model images Stable Diffusion can draw on are usually limited to sources from English-speaking regions.

For example, one of the more common Japanese terms re-interpreted from the English word businessman is “salary man” which we most often imagine as a man wearing a suit. You would get results that look like this, which might not be very useful if you’re trying to generate images for a Japanese audience.

rinna Co., Ltd. has developed a Japanese-specific text-to-image model named “Japanese Stable Diffusion”. Japanese Stable Diffusion accepts native Japanese text prompts and generates images that reflect the naming and tagged pictures of the Japanese-speaking world – concepts that may be difficult to express through translation and whose images may simply not be present in the Western world. The new text-to-image model was trained on source material that comes directly from Japanese culture, identity, and unique expressions – including slang.

They did this using a two-step approach that is instructive about how Stable Diffusion works.

First, the latent diffusion model was left alone and they replaced the English text encoder with a Japanese-specific text encoder. This allowed the model to understand Japanese natively, but it would still generate Western-style images because the latent model remained intact. This was still better than just translating the Stable Diffusion prompt.

Now Stable Diffusion could understand what the concept of a ‘businessman’ was, but it still generated images of decidedly Western-looking businessmen because the underlying latent diffusion model had not been changed:

The second step was to retrain the latent diffusion model on more Japanese-tagged data sources with the new text encoder. This stage was essential to make the model more language-specific. After this, the model could finally generate businessmen with the Japanese faces they would have expected:
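
To see why the two steps are separate, here is a conceptual sketch with diffusers of what step one looks like: the UNet and VAE (the latent diffusion model) are kept, and only the text encoder and tokenizer are swapped. This is not rinna’s actual code, and the Japanese encoder repo ID is a placeholder:

# Conceptual sketch of step 1: swap the text encoder, keep the latent diffusion model
from transformers import AutoTokenizer, CLIPTextModel
from diffusers import StableDiffusionPipeline

base = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Placeholder IDs: a Japanese CLIP-style text encoder and its tokenizer
ja_tokenizer = AutoTokenizer.from_pretrained("your-org/japanese-clip-text-encoder")
ja_text_encoder = CLIPTextModel.from_pretrained("your-org/japanese-clip-text-encoder")

pipe = StableDiffusionPipeline(
    vae=base.vae,                      # unchanged
    unet=base.unet,                    # unchanged latent diffusion model (step 2 fine-tunes this)
    scheduler=base.scheduler,
    text_encoder=ja_text_encoder,      # now understands Japanese prompts natively
    tokenizer=ja_tokenizer,
    safety_checker=base.safety_checker,
    feature_extractor=base.feature_extractor,
)
image = pipe("スーツを着たビジネスマン").images[0]  # "a businessman in a suit"
image.save("businessman.png")

In practice the swapped-in encoder has to produce embeddings of the shape and meaning the UNet expects, which is part of why the second retraining stage matters.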

Read more about it on the links below.

Links: