
Midjourney intro + prompt guide

Matt Wolfe briefly walks you through getting Midjourney set up (via Discord) and then gives you some great getting-started prompts to help you learn different styles and image generation capabilities.

He also recommends Guy Parsons, who gives out lots of tips on building prompts and has a free e-book with some of his best ones.

Comparing AI art generators for common artist workloads

Gamefromscratch runs side-by-side tests on DALL-E 2, Stable Diffusion, and Midjourney for a variety of art generation tasks.

His conclusion is that they are not going to replace artists for all tasks, but for concept art, pixel art, and some other simple tasks these AI generators can replace artists.

How close are we to a complete world transformation due to AI?

Tom Scott distils his encounter with AI doing a job he used to do (almost equally well) and then reflects on why this could be a completely transformative development for the world – much like when the internet really took off in the late 90’s. I think he’s probably right. As someone who has played with AI art generation and watched the groundbreaking papers that use AI for even traditional rendering and modeling tasks in just the graphics world, I think we’re just at the first part of his sigmoid curve.

This transformation is likely to be very different than just the early internet upheavals of the music industry, cellular phones, and stores/commerce that he describes. Those were largely transformations of market form with the same commercial and societal needs.

I think this is different in at least 2 ways. First, AI is bringing about a change in which thought, analysis, creativity, and response to problems themselves are likely about to be abdicated (and somewhat blindly, by the lazy or by those who aren’t critically looking at what is being generated). And we’ll be abdicating that power to systems that aren’t truly or fully understood, controlled, or protected.

With things like ChatGPT, we will very easily start abdicating the hard work of thinking itself. If we are no longer crafting the actual language of our responses, or doing the hard logical work of building arguments for our daily actions or the policies we live by, we will never develop the critical thinking ability to even question what is generated. Instead, those responses and arguments are generated for us. What would that do to us long term? Especially when we already see that ChatGPT and other AI systems can get things terribly wrong – and not give us the first clue they are wrong.

Secondly, like all tools, these systems could be controlled or manipulated by nefarious agents. Today, our most deadly and horrific tools of destruction (nuclear bombs and sophisticated strategic weapons) are largely contained within government military systems and by the highly specialized ability needed to build them.

AI can be wielded by anyone, anywhere in the world, with any motivation (political, personal, etc.). With just a small rack of commercially available servers, one has the ability to unleash the kind of infinitely scalable social media posting, auto-responding, narrative controlling, news story generating, and possibly subverted think-for-you devices upon the whole world.

We have known since at least 2019 that this is happening on all major social media platforms despite the best efforts of some of the smartest people in Silicon Valley working on it. Smarter Every Day did a series of stories on the problem. Research has proven again and again these things are happening and are very, very easy to do and very, very hard to stop.

A few clever AI systems that would likely cost less than a single cruise missile could easily overwhelm social media forums, message boards, Wikipedia edits, generated news articles, etc. – before we could ever hope to verify the claims or combat its ability to generate hundreds of thousands of responses, up/down votes, planted webpage articles, etc. every hour. How could one even verify the claims if everything is suspect? Why WOULDN’T a country do this if it cost less than what a single missile costs? Even better, what if the AI can be subverted to bias certain responses (which we have already seen too)?

In the post-truth internet, people are well into putting their trust in anonymous influencer opinions and echo chamber forum posts before well verified facts. What will this mean in our internet era in which ‘objective facts are less influential in shaping public opinion than appeals to emotion and personal belief.’?

My how far we’ve come from the idea that the internet would become a forum in which people share ideas and the best ones rise to the top. How dangerously naïve we were…

Leaking corporate code via ChatGPT

After catching snippets of text generated by OpenAI’s powerful ChatGPT tool that looked a lot like company secrets, Amazon is now trying to head its employees off from leaking anything else to the algorithm.

This issue seems to have come to a head recently because Amazon staffers and other tech workers throughout the industry have begun using ChatGPT as a “coding assistant” of sorts to help them write or improve strings of code, the report notes.

While this isn’t necessarily a problem from a proprietary data perspective, it’s a different story when employees start using the AI to improve upon existing internal code — which is already happening, according to the lawyer.

Installing Stable Diffusion 2.0/2.1

Stable Diffusion 2.0 was largely seen as a dud. Be aware that after version 1.5, the outcry from various artists against having their works sampled led the 2.x branches to use fewer of these public sources. This means 2.x has a more limited training set and likely more limited output variety.

If you are interested in trying Stable Diffusion 2.1, use this tutorial to install and use 2.1 models in the AUTOMATIC1111 GUI, so you can judge it for yourself.

Here are 2 different Stable Diffusion 2.1 tutorials:

You might also try this tutorial by TingTing

https://youtube.com/watch?v=cFgmXLLnHp0

Retro games with modern graphics – using AI

We’re already seeing a real revolution in retro gaming via emulation. Preservation of old hardware is important, but it’s also seen as an almost impossible task as devices mass-produced to last only 5-10 years in the consumer market reach decades of age. Failure rates will eventually reach 100% given enough time (unless people re-create the hardware). But with modern emulators, you can still play all the different games on modern hardware.

On a separate development note, we’ve also seen graphics effects like anti-aliasing and upscaling get the AI treatment. Instead of hand-coded anti-aliasing kernels, they can be generated automatically by AI, and the results now ship in products from all major hardware vendors.

But what about the very graphics content itself? Retro game art has its own charm, but what if we gave it the AI treatment too?

Jay Alammar wanted to see what he could achieve by pumping some retro game graphics from the MSX game Nemesis 2 (Gradius) into the Stable Diffusion, DALL-E, and Midjourney art generators. He presents a lot of interesting experiments and conclusions. He used various features like in-painting, out-painting, Dream Studio and all kinds of other ideas to see what he could come up with.

The hand-picked results were pretty great:

He even went so far as to convert the original opening sequence to use the new opening graphics here:

I think this opens up a whole new idea. What if you replaced all of the game’s graphics elements with updated AI graphics? The result would essentially just be a themed re-skinning with no gameplay (or even level) changes, but this definitely brings up the idea of starting your re-theming for new levels (fire levels, ice levels, space levels, etc.) by auto-generating the graphics.

Then it brings up the non-art idea of re-theming the gameplay itself – possibly using AI-generated movement or gameplay rules. Friction, gravity, jump height, etc. could all be given different models (Mario-style physics, Super Meat Boy physics, slidy ice-level physics), and the AI could then come up with the gravity, bounce, and jump parameters.
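
As a rough sketch of what those tunable parameters might look like in code, the Python below treats each physics "feel" as a small data record. The names (`PhysicsPreset`, `preset_from_feel`, the theme labels) and all of the numbers are hypothetical and are not taken from any real game; the only fixed part is the standard constant-gravity jump math. The idea is that a generator, AI or otherwise, would only need to propose a jump height, an air time, and a friction value, and the gravity and launch speed follow from those.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicsPreset:
    """One hypothetical movement 'feel' for a 2D platformer."""
    gravity: float     # downward acceleration, tiles per second squared
    jump_speed: float  # upward speed at the moment of the jump, tiles per second
    friction: float    # ground deceleration, tiles per second squared

    def jump_height(self) -> float:
        # Peak height under constant gravity: v^2 / (2g)
        return self.jump_speed ** 2 / (2 * self.gravity)

    def air_time(self) -> float:
        # Time to rise and fall back to the starting height: 2v / g
        return 2 * self.jump_speed / self.gravity


def preset_from_feel(jump_height: float, air_time: float, friction: float) -> PhysicsPreset:
    """Derive gravity and jump speed from designer-level (or generated) targets."""
    if jump_height <= 0 or air_time <= 0 or friction < 0:
        raise ValueError("jump_height and air_time must be positive, friction non-negative")
    # Solving h = v^2 / (2g) and T = 2v / g for g and v:
    gravity = 8 * jump_height / air_time ** 2
    jump_speed = 4 * jump_height / air_time
    return PhysicsPreset(gravity, jump_speed, friction)


# Made-up themes: same jump height, different air time and ground grip.
PRESETS = {
    "tight": preset_from_feel(jump_height=2.5, air_time=0.5, friction=40.0),
    "floaty": preset_from_feel(jump_height=2.5, air_time=1.0, friction=20.0),
    "ice": preset_from_feel(jump_height=2.5, air_time=1.0, friction=1.0),
}

if __name__ == "__main__":
    for name, preset in PRESETS.items():
        print(name, preset)
```

Swapping a level's theme would then just mean picking a different entry from `PRESETS`, or asking a generator for three new numbers.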

Interesting times…

Shrinking 4 years to 4 days with AI generated music video

Photographer and filmmaker Nicholas Kouros spent “hundreds of hours” over 4 years creating a stop-motion meme-themed music video using paper prints and cutouts for a song called Ruined by the metal band Blame Kandinsky. He then created a new version using AI – in 4 days.

The work on the original physical shoot was intense:

“Cutting out all individual pieces was a serious task. Some of the setups were so labor-intensive, I had friends over for days to help out,” says Kouros.

“Every piece was then assembled using various methods, such as connecting through rivets and hinges. We shot everything at 12fps using Dragonframe on a DIY rostrum setup with a mirrorless Sony a7S II and a Zeiss ZE f/2 50mm Macro-Planar lens.”

In a move that likely avoided copyright issues, he used freely usable images. “Most of Ruined was made using public domain paintings and art found on museum websites like Rijks or the Met.”

After everything had been shot, the RAW image sequences were imported to After Effects and later graded in DaVinci Resolve.

Using AI instead

Kouros then created a second music video but this time he used AI. The video took a fraction of the time to make. “In direct contrast with my previous work for the same band, Vague by Blame Kandinsky, it took a little over four days of experimenting, used a single line of AI text prompting, and 20 hours of rendering.”

“The text prompt line used was: ‘Occult Ritual, Rosemary’s Baby Scream, Flemish renaissance, painting by Robert Crumb, Death.’”

Kouros describes his experience with AI as “fun” and was impressed with the results that the image synthesizer gave him.

What was his final take?

“In my opinion, this specific style of animation won’t stand the test of time, but it will probably be a reminder of times before this AI thing really took off.

I embrace new tech as it comes along and I have already started making images with the aid of image generators.
I’ve actually learned more about art history in this last year using AI, than in seven years of art schools.”

nVidia GPUs top the Stable Diffusion performance charts

Tom’s Hardware did a great benchmarking test on which GPUs do the best on Stable Diffusion.

They tried a number of different combinations and experiments, such as changing the sampling algorithm (which didn’t make much difference in performance), the output size, etc. I wish, however, that they had discussed and compared the differences in memory sizes on these cards more clearly. Stable Diffusion is a memory hog, and having more memory definitely helps. They also didn’t check any of the ‘optimized models’ that allow you to run Stable Diffusion on as little as 4GB of VRAM.
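
To give a feel for why output size and memory are so tightly linked, here is a back-of-the-envelope sketch in Python. It is not from the Tom’s Hardware article. It assumes the usual Stable Diffusion layout (the image is denoised as a latent 8 times smaller in each dimension), 16-bit values, and a naive self-attention layer at the highest latent resolution that stores one score for every pair of latent positions. The function name `attention_matrix_bytes` is my own. Real implementations use tricks such as attention slicing or memory-efficient attention to avoid holding this whole matrix at once, which is presumably part of how the low-VRAM variants get by.

```python
def attention_matrix_bytes(width_px: int, height_px: int,
                           downsample: int = 8, bytes_per_value: int = 2) -> int:
    """Rough size of ONE naive self-attention score matrix for one head.

    Assumptions (not measurements): latent is `downsample` times smaller than
    the image in each dimension, and every latent position attends to every
    other position, so the matrix has tokens * tokens entries.
    """
    if width_px % downsample or height_px % downsample:
        raise ValueError("image size must be a multiple of the downsample factor")
    tokens = (width_px // downsample) * (height_px // downsample)
    return tokens * tokens * bytes_per_value


if __name__ == "__main__":
    for size in (512, 768, 1024):
        mib = attention_matrix_bytes(size, size) / 2**20
        print(f"{size}x{size}: {mib:.0f} MiB per attention head")
```

Under these assumptions the cost grows with the square of the pixel count: going from 512x512 to 1024x1024 is 4 times the pixels but 16 times the attention memory.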

There were some fun anomalies – like the RTX 2080 Ti often outperforming the RTX 3080 Ti.

AMD and Intel cards seem to be leaving a lot of performance on the table because their hardware should be able to do better than it is currently doing. Arc GPUs’ matrix cores should provide similar performance to the RTX 3060 Ti and RX 7900 XTX, give or take, with the A380 down around the RX 6800. In practice, Arc GPUs are nowhere near those marks. This doesn’t shock me personally, since nVidia has been much more invested in, and at the forefront of, developing and optimizing AI libraries.

Auto-generation of 3D models from text

I’ve already written about nVidia’s GET3D code that can generate a wide variety of 3D objects using AI-trained networks. These networks, however, are tuned to generate specific kinds of objects (chairs, cars, etc.). This requires a large labeled 3D dataset. nVidia provides simple ones, but if you want to generate specific styles or eras (only 50’s era cars, only 1800’s style furniture), you’ll need to collect and label the data and train the model yourself.

There’s another player in town called DreamFusion that goes in a slightly different direction. Researchers from Google and UC Berkeley are using a similar method to generate 3D models from text. This gets around the problem of needing lots of labeled 3D training data by using images generated from 2D text-to-image diffusion models (like Stable Diffusion, DALL-E, and Midjourney). They developed an error/loss metric that evaluates the generated 2D images and their potential for 3D generation, and then use it to drive the 3D generation. They come up with some astounding results.

There is also a paper by Nikolay Jetchev called CLIPMatrix that attempts the same text-to-2D-to-3D generation. He also seems to be experimenting with animations and something called VolumeCLIP that does ray-casting.

This kind of end-to-end workflow pipeline is exactly the kind of thing content makers want. Unfortunately, it also means that it could likely decimate an art department. This kind of technology could easily be used to fill the non-critical areas of ever-expanding 3D worlds in games and VR with very minimal effort or cost. In theory, it could even be done pseudo-realtime. Imagine worlds in which you can walk in any direction – forever – and see constantly new locations and objects.

CLIPMatrix and VolumeCLIP AI based 3D model generation

As I mentioned in my previous article, there is a paper by Nikolay Jetchev called CLIPMatrix that attempts to generate 3D models from 2D images that are generated by text-to-image diffusion models (like Stable Diffusion, DALL-E, Midjourney, etc.). A list of his other papers can be found here.

He now seems to be working on auto-generated models that are animated automatically. (Content note: he seems to love generating content based on heavy metal lyrics, demons, and other fantastical creations, so I don’t think his examples show whether this would work on more ‘normal’ looking models):

Originally tweeted by Nikolay Jetchev (@NJetchev) on March 10, 2022.

Looking at his Twitter stream, he also seems to be working on a version called VolumeCLIP that appears to generate voxel objects he can ray-cast into.

“The Fire Dwarf Blacksmith”

Originally tweeted by Nikolay Jetchev (@NJetchev) on January 26, 2023.
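
For anyone unfamiliar with the term, ray-casting into a voxel object can be sketched in a few lines of Python. This is only a generic illustration of the idea, not Jetchev’s VolumeCLIP code, which I haven’t seen. The function name `ray_march`, the nested-list grid layout, and the fixed step size are all my own assumptions. It walks along a ray in small steps and reports the first occupied voxel it lands in; with a step this coarse it could in principle skip past a thin corner, which a proper grid-traversal algorithm would avoid.

```python
import math


def ray_march(grid, origin, direction, step=0.25, max_dist=64.0):
    """Return the (x, y, z) index of the first occupied voxel hit, or None.

    `grid[x][y][z]` is truthy where a voxel is filled. Each voxel is a unit
    cube, so the voxel containing a point is just the floor of its coordinates.
    """
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0:
        raise ValueError("direction must be non-zero")
    unit = [c / length for c in direction]
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])

    t = 0.0
    while t <= max_dist:
        # Current sample point along the ray.
        px, py, pz = (origin[i] + t * unit[i] for i in range(3))
        ix, iy, iz = math.floor(px), math.floor(py), math.floor(pz)
        inside = 0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz
        if inside and grid[ix][iy][iz]:
            return (ix, iy, iz)
        t += step
    return None


if __name__ == "__main__":
    # A 4x4x4 empty grid with a single filled voxel.
    grid = [[[False] * 4 for _ in range(4)] for _ in range(4)]
    grid[2][1][1] = True
    print(ray_march(grid, origin=(-1.0, 1.5, 1.5), direction=(1.0, 0.0, 0.0)))
```

A renderer would fire one such ray per pixel and shade whatever voxel comes back.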