
Zig Ziglar and the power of setting goals

“Give me a stock clerk with a goal and I’ll give you a man that can make history. Give me a man without a goal, and I’ll give you a stock clerk.” – J.C. Penney

My dad almost always had a Zig Ziglar tape playing in the car when I was growing up. His simple principles of setting and achieving goals have stuck with me almost my entire life – and have led me through a tremendously successful career that I could never have imagined as a small kid growing up in rural Indiana.

I can attest that by following his simple, time-proven method of goal-setting, I have achieved many of the biggest goals in my life. Give it a listen.

nVidia GPUs top the Stable Diffusion performance charts

Tom’s Hardware did a great benchmarking test of which GPUs perform best on Stable Diffusion.

They tried a number of different combinations and experiments, such as changing the sampling algorithms (though those didn’t make much difference in performance), output size, etc. I wish, however, that they had discussed and compared the differences in memory sizes on these cards more clearly. Stable Diffusion is a memory hog, and having more memory definitely helps. They also didn’t check any of the ‘optimized’ builds that allow you to run Stable Diffusion on as little as 4GB of VRAM.
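
For a feel of what those optimized builds actually change, here is a minimal sketch, assuming the Hugging Face diffusers library (the model id and prompt are only illustrative, and this is not necessarily what Tom’s Hardware benchmarked). Half-precision weights and attention slicing are the usual first steps for squeezing Stable Diffusion into a small VRAM budget:

# Hedged sketch: common memory-saving switches in the diffusers library.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,            # fp16 weights roughly halve VRAM use
).to("cuda")

pipe.enable_attention_slicing()           # compute attention in chunks rather than all at once
# pipe.enable_sequential_cpu_offload()    # more aggressive still: page weights out to system RAM

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")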

There were some fun anomalies – like the RTX 2080 Ti often outperforming the RTX 3080 Ti.

AMD and Intel cards seem to be leaving a lot of performance on the table; their hardware should be able to do better than it currently does. Arc GPUs’ matrix cores should provide performance similar to the RTX 3060 Ti and RX 7900 XTX, give or take, with the A380 down around the RX 6800. In practice, Arc GPUs are nowhere near those marks. This doesn’t shock me personally, since nVidia has been much more invested in, and at the forefront of, developing and optimizing AI libraries.

lekktor Demoscene compressor

The 90’s demoscene subculture was famous for building incredible visual demos in astoundingly small executable sizes. Many demoscene gatherings had maximum size requirements – often just a few hundred, or even a few DOZEN, kilobytes. Figuring out how to pack the most amazing tech into the smallest size was one of the great innovation points of these contests.

Once developers had exhausted their technical chops generating amazing art with minuscule code (using all the tricks in the book they could think of), they quickly found that hand-tuned compression became far too tedious and brittle. So they started building tools to do the compression for them.

I wrote about a more modern take on this where MattKC tried to fit an entire game into a QR code. Part of his adventure was compressing the executable using an old demoscene tool called Crinkler.

There were other tools as well; one of them, called lekktor, was first used on the .kkrieger demo. The story behind its development is a fun read, as is an interview its author did in 2005.

Apparently it used a form of code coverage as part of its analysis, which ran while you used the application. This had the dubious side effect of allowing people to use the down arrow on menus but not the up arrow – because nobody ever pressed the up arrow while training the compressor.
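
To make the idea concrete, here is a toy sketch of coverage-guided dead-code removal in Python. It is purely illustrative (lekktor worked on C++ source, and none of these names come from the actual tool): trace which lines execute during a training run, and anything that never ran becomes a candidate for deletion.

# Toy coverage tracer: record every line that runs, then inspect what never did.
import sys

executed = set()

def tracer(frame, event, arg):
    # Record every (function, line number) that actually executes.
    if event == "line":
        executed.add((frame.f_code.co_name, frame.f_lineno))
    return tracer

def menu(direction):
    if direction == "down":
        return "moved down"
    return "moved up"        # never reached below, so a coverage-guided tool
                             # would strip this path out of the final build

sys.settrace(tracer)
menu("down")                 # the only input the "trainer" ever exercised
sys.settrace(None)

print(sorted(executed))      # the up-arrow path never shows up here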


Auto-generation of 3D models from text

I’ve already written about nVidia’s GET3D code that can generate a wide variety of 3D objects using AI-trained networks. The models, however, are tuned to generate specific categories of objects (chairs, cars, etc.), which requires a large, labeled 3D dataset. nVidia provides simple ones, but if you want to generate specific styles or eras (only 50’s-era cars, only 1800’s-style furniture), you’ll need to collect, label, and train the model for that yourself.

There’s another player in town called DreamFusion that goes in a slightly different direction. Researchers from Google and UC Berkeley are using a similar method to generate 3D models from text. It gets around the need for a large pre-labeled 3D dataset by leaning on 2D text-to-image diffusion models (like Stable Diffusion, DALL-E, and MidJourney). They developed a loss that measures how well renders of a candidate 3D model match what the 2D diffusion model would produce for the text prompt, then optimize the 3D representation against that loss. They come up with some astounding results.
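
As best I understand the paper, the core trick (which they call score distillation) can be sketched roughly as below. The render and unet callables, and everything else here, are hypothetical stand-ins rather than the authors’ code:

# Rough sketch of one score-distillation step, assuming a differentiable
# renderer render(params, camera) (e.g. a NeRF) and a frozen text-conditioned
# diffusion model unet(noisy_image, t, text_emb) that predicts the added noise.
import torch

def score_distillation_step(params, optimizer, render, unet, text_emb,
                            alphas_cumprod, camera):
    # 1. Render the current 3D model from a random viewpoint.
    image = render(params, camera)                  # differentiable w.r.t. params

    # 2. Noise that render at a random diffusion timestep.
    t = torch.randint(20, 980, (1,))
    noise = torch.randn_like(image)
    a_t = alphas_cumprod[t].view(1, 1, 1, 1)
    noisy = a_t.sqrt() * image + (1.0 - a_t).sqrt() * noise

    # 3. Ask the frozen 2D diffusion model what noise it "sees" given the text.
    with torch.no_grad():
        noise_pred = unet(noisy, t, text_emb)

    # 4. Push (noise_pred - noise) back through the renderer only; the
    #    diffusion model itself is never updated.
    optimizer.zero_grad()
    image.backward(gradient=(noise_pred - noise))
    optimizer.step()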

There is also a paper by Nikolay Jetchev called ClipMatrix that attempts the same text-to-2D-to-3D generation. He also seems to be experimenting with animations and something called VolumeCLIP that does ray-casting.

This kind of end-to-end workflow pipeline is exactly what content makers want. Unfortunately, it also means it could well decimate an art department. This kind of technology could easily be used to fill the non-critical areas of ever-expanding 3D worlds in games and VR with minimal effort or cost. In theory, it could even be done in pseudo-realtime. Imagine worlds in which you can walk in any direction – forever – and see constantly new locations and objects.


CLIPMatrix and VolumeCLIP AI based 3D model generation

As I mentioned in my previous article, there is a paper by Nikolay Jetchev called ClipMatrix that attempts to generate 3D models from 2D images generated by text-to-image diffusion models (like Stable Diffusion, DALL-E, MidJourney, etc.). A list of his other papers can be found here.

He now seems to be working on auto-generated models that are animated automatically. (Content note: he seems to love generating content based on heavy metal lyrics, demons, and other fantastical creations, so I don’t think these examples demonstrate how well it would work on more ‘normal’ looking models):

Originally tweeted by Nikolay Jetchev (@NJetchev) on March 10, 2022.

Looking at his Twitter stream, he also seems to be working on a version called VolumeCLIP that appears to generate voxel objects he can ray-cast into.

“The Fire Dwarf Blacksmith”

Originally tweeted by Nikolay Jetchev (@NJetchev) on January 26, 2023.

My heads are gone!

Are you losing the heads of the images you’re generating in Stable Diffusion?

Try adding these keywords to your prompt, or try these related tricks:

  • “A view of”
  • “A scene of”
  • “Viewed from a distance”
  • “Standing on a “
  • “longshot”, “full shot”, “wideshot”, “extreme wide shot”, “full body”
  • start the prompt with “Head, face, eyes”
  • Try adjusting the aspect ratio of the image to be taller instead of wider. Be careful not to go too tall (or too wide) or you’ll get the double-head problem or start generating combinations of two people (see the sketch after this list).
  • Much of the source material was scanned in a taller aspect ratio, so try adjusting the x-side of your ratio
  • Use img2img on a crop that includes part of the chest to make it match the rest of the drawing
  • Cinematography terms tend to work well. In order of close to far: Extreme close-up, close-up, medium close-up, medium shot, medium full shot, full shot, long shot, extreme long shot.
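
Here is a minimal sketch of a couple of those tips in practice, assuming the Hugging Face diffusers library (the model id and prompt are just examples): framing keywords up front plus a taller-than-wide canvas.

# Hedged sketch: framing keywords plus a portrait-shaped canvas.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "full body, long shot, a Victorian woman standing in a garden, viewed from a distance",
    height=768,    # taller than wide helps keep heads in frame...
    width=512,     # ...but go much taller and you risk the dreaded double head
).images[0]
image.save("full_body.png")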


Christmas cakes of Japan

I’ve gone on big vacations to Japan several times now – and I’m always astounded by the artistry and sublime taste of their food and pastries. In recent years, that artistry and masterful craft have graced the tradition of the Japanese Christmas cake.

The history of the Christmas cake in Japan started in the waning days of the Meiji period. In 1910, Fujiya, a European-style pastry shop in the port city of Yokohama, near Tokyo, introduced what is widely considered to be the very first Japanese Christmas cake. According to a representative from Fujiya’s PR department, “the base of the cake was a rich, liqueur-soaked fruitcake” in the European style. But the bakers considered its plain brown appearance not eye-catching enough, so they decorated it with snow-white royal icing, complete with little Christmas trees. Over the next decade, bakers around the country decorated their Christmas desserts with strawberries as new growing methods made them available in December.

Today, Christmas cake is synonymous with strawberry shortcake: a light and fluffy confection with alternating layers of soft sponge and delicate whipped cream, topped with perfectly sweet fresh strawberries. Some of the most amazing creations are found in the highest-end hotels and come at astounding prices. The Renne (‘reindeer’ in French) cake shown above, from the Palace Hotel Tokyo, is topped with a tall sculpted cone depicting reindeer antlers; the cake and the cone together are about a foot wide and 20 inches high, contain more than 100 perfect strawberries, and sell for the hefty price of 70,000 yen ($640).

After fried chicken, Christmas cake is the most popular food consumed during Japan’s yuletide season. In 1997, the strawberry shortcake was immortalized by the Japanese software company SoftBank, which included the confection in what is arguably the world’s first set of emojis.


Stable Diffusion feature showcase

Having trouble understanding all the knobs in Stable Diffusion’s web UI? This is a great site on GitHub that shows what each feature does and gives some examples and tips on getting the most out of each one.

It covers all the big features like outpainting, inpainting, prompt matrix, AI upscaling, attention, loopback, X/Y plot, textual inversion, resizing, sampling method selection, seed resize, and variations, plus a whole host of other options, along with before/after pictures to help you understand each feature better.


Stable Diffusion high-quality prompt thought process

Content warning: some of the links have moderately NSFW pictures. There is no outright nudity, but they do deal with generating rather busty images. This article should be fine, but be aware when following the links.

While this guide is translated from a Japanese source and uses the Waifu/Danbooru model to generate more anime-looking images, it works really well for generating ultra-realistic Victorian pictures using Stable Diffusion’s standard 1.5 model. Here are some I made using his prompt with just 30 minutes of experimenting:

Fair warning: the original author is trying to generate more…busty women that look better as anime characters under the Waifu model. I won’t comment on his original purpose, but I thought this was an interesting description of how a ‘prompt engineer’ moved from an idea to a working Stable Diffusion prompt.

First he started with a good description of what he wanted:

I want a VICTORIAN GIRL in the style of an OIL PAINTING.
Eyes and faces are important in art, so she must have a PERFECT FACE, a SEXY FACE, and DETAILED PUPILS.
I want her to have LARGE BREAST, TONED ABS, and THICK THIGH.
She must look FEMININE, doing an EVOCATIVE POSE with a SMIRK, in FULL BODY view, wearing a NIGHT GOWN.
The output must be INTRICATE, HIGH DETAIL, and SHARP.
And in the style of {I’m not giving out the artist names to avoid trouble. Apologies.}

This led him to the following prompt. Note his use of parentheses () to add emphasis, and of square brackets [] to reduce a term’s weight.

Prompt :
VICTORIAN GIRL,FEMININE,((PERFECT FACE)),((SEXY FACE)),((DETAILED PUPILS)).(ARTIST),ARTIST,ARTIST,(ARTIST). OIL PAINTING. (((LARGE BREAST)),((TONED ABS)),(THICK THIGH).EVOCATIVE POSE, SMIRK,LOOK AT VIEWER, ((BLOUSE)).(INTRICATE),(HIGH DETAIL),SHARP

Unfortunately, you don’t need to experiment for long to realize Stable Diffusion needs a lot of help with anatomy. It often generates nightmare-fuel images that have multiple heads, messed-up arms, hands with too many fingers, eyes with terrifying pupils (or no pupils), too many limbs – well, you get the idea. So you need to make sure those things don’t show up by banning them via negative prompts (again, not commenting on the original purpose):

Negative Prompt :
((nipple)), ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), (((tranny))), (((trans))), (((transsexual))), (hermaphrodite), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))). (((more than 2 nipples))). [[[adult]]], out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))

Finally, he used the following Stable Diffusion settings. Note that you want to keep the aspect ratio in a portrait-like format (768 tall x 512 wide); going taller can result in multiple heads, and going wider can result in more than one person in the scene.

Restore Face: ON
Steps: 42
Sampler: DDIM
CFG scale: 10
Height: 768
Width: 512
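
If you want to reproduce those settings outside the web UI, here is a hedged sketch of how they might map onto the diffusers library. (‘Restore Face’ is a separate web-UI post-processing pass, typically GFPGAN or CodeFormer, so it is not shown, and the truncated prompts are placeholders for the full ones above.)

# Hedged sketch: the settings above expressed as a diffusers pipeline call.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)   # Sampler: DDIM

image = pipe(
    prompt="VICTORIAN GIRL, FEMININE, ((PERFECT FACE)), ...",        # his positive prompt
    negative_prompt="((poorly drawn hands)), ((bad anatomy)), ...",  # his negative prompt
    num_inference_steps=42,    # Steps: 42
    guidance_scale=10,         # CFG scale: 10
    height=768,                # portrait framing, per the note above
    width=512,
).images[0]
image.save("victorian.png")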


Unreal Engine 5 Metahuman and Stable Diffusion

CoffeeVectors wanted to see what kind of workflow he could build bridging Epic’s MetaHuman with AI image synthesis platforms like Stable Diffusion. His experiments were really interesting and instructive to other beginners.

He started with a simple face generated by Stable Diffusion:

He then fed that image into MetaHuman as a new starting point and cycled through a few generations. With each cycle, you can change the prompt and steer things in slightly different directions. It becomes less about a single initial prompt and more about understanding and modifying the larger system of settings interacting with each other. The results were actually quite good:
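
The Stable Diffusion half of that feedback loop might look something like the sketch below, assuming the diffusers img2img pipeline. The MetaHuman step happens inside Unreal and is not shown; the file names, prompts, and strength value are just illustrative.

# Hedged sketch: cycling an image through img2img while nudging the prompt.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = Image.open("metahuman_render.png").convert("RGB")   # e.g. a render exported from Unreal
prompts = [
    "photorealistic portrait of a woman, studio lighting",
    "photorealistic portrait of a woman, freckles, softer lighting",
    "photorealistic portrait of a woman, freckles, slight smile",
]
for i, prompt in enumerate(prompts):        # steer a little differently each cycle
    image = pipe(prompt=prompt, image=image, strength=0.45).images[0]
    image.save(f"cycle_{i}.png")            # feed this back into MetaHuman, rinse, repeat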

I thought he had a good observation on creating prompts here:

Don’t get tunnel vision on the prompts. There’s more to a car than the engine. Prompts are important but they’re not everything. With platforms like Dall-E 2 where underlying variables aren’t exposed, the prompts do play a dominant role.

But with Stable Diffusion and Midjourney, there are more controls available to you that affect the output. If you’re not getting what you want from prompts alone in Stable Diffusion, for instance, it could be because you need to shop around the sampler methods and CFG Scale values. Even the starting resolution affects the images you get because it changes the initial noise pattern.
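
To make that “shop around” advice concrete, here is a small sketch, assuming the diffusers library, that holds the prompt and seed fixed while sweeping sampler methods and CFG scale values:

# Hedged sketch: sweep samplers and guidance scales with everything else fixed.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, EulerDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "portrait photo of a young woman, studio lighting"
for name, sched in [("ddim", DDIMScheduler), ("euler", EulerDiscreteScheduler)]:
    pipe.scheduler = sched.from_config(pipe.scheduler.config)
    for cfg in (5, 7.5, 10, 12.5):
        gen = torch.Generator("cuda").manual_seed(1234)   # fixed seed isolates the variable
        image = pipe(prompt, guidance_scale=cfg, generator=gen).images[0]
        image.save(f"{name}_cfg{cfg}.png")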

Definitely worth the read if you’re interested.
