Category: AI

Generating a music video from a text prompt

Sora is an artificial intelligence video generator capable of producing multi-shot clips of a minute or longer from nothing more than a text prompt, but so far only a select few have been able to use it to create content. OpenAI is still working through safety issues and is slowly rolling it out this year.

One of the artists given early access to Sora is August Kamp, a musician, researcher and creative activist. She described Sora as representing a “turning point” for artists as it means the only limitation on visuals is the human imagination. 

“Taking these pictures that I’ve held onto [in my mind] for two years and saying ‘August – we can share these with folks.’ That’s what I think is special about this tool,” she said.

Google DeepMind Trained Robots to Play Soccer

Google developed a deep reinforcement learning–based framework for full-body control of humanoid robots, enabling a game of one-versus-one soccer. The robots exhibited emergent behaviors: dynamic motor skills such as recovering from falls, as well as tactics like defending the ball against an opponent.

Pretty cool. I wonder when we’ll finally replace human athletes with robots.

‘Photo’ Was Made From an 84-Year-Old Woman’s Memory Using a Prompt Engineer

An interviewer and a prompt engineer sit down with the subject whose memory they are trying to retrieve. As the person recalls a specific event or place, the “promptographer” feeds the descriptions into an AI image generator, and what follows is a bit of back and forth to get the image right.

“You show the image generated from that prompt to the subject and they might say, ‘Oh, the chair was on that side’ or ‘It was at night, not in the day’,” explains Garcia. “You refine it until you get it to a point where it clicks.”

It’s more like guided painting/drawing of a scene from a description – but using generative AI to do the work is pretty unique.
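As a rough sketch of what that back-and-forth looks like in code (the generate_image and get_feedback functions below are hypothetical placeholders of my own, not the team’s actual tooling):

```python
# Minimal sketch of the "promptography" loop: generate, show, refine, repeat.
# generate_image() and get_feedback() are hypothetical placeholders -- wire them
# up to your own text-to-image model and interview notes.

def generate_image(prompt: str) -> str:
    """Call a text-to-image model of your choice; return a path or URL to the image."""
    raise NotImplementedError

def get_feedback(image: str) -> str:
    """Show the image to the subject and return their correction ('' when it clicks)."""
    raise NotImplementedError

def reconstruct_memory(initial_description: str, max_rounds: int = 10) -> str:
    prompt = initial_description
    image = generate_image(prompt)
    for _ in range(max_rounds):
        feedback = get_feedback(image)      # e.g. "the chair was on the other side"
        if not feedback:                    # the subject says it clicks -- done
            break
        prompt = f"{prompt}. Correction: {feedback}"
        image = generate_image(prompt)
    return image
```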

The team recently worked with an 84-year-old woman from Barcelona called Maria. Maria has vivid memories of peering out from her balcony as a child to try and catch a glimpse of her father who was incarcerated in a prison opposite where they lived.

These childhood memories only existed inside Maria’s mind, but the AI researchers worked with her to bring these reminiscences to life by describing the place and the historical context (Maria’s father had been jailed by General Franco).

“It’s very easy to see when you’ve got the memory right because there is a very visceral reaction,” Pau Garcia, founder of Domestic Data Streamers, tells MIT Technology Review. “It happens every time. It’s like, ‘Oh! Yes! It was like that!’”

Another AI image, from the Spanish Civil War, was co-created with a 90-year-old woman called Nuria, who vividly remembers men waiting outside bomb shelters with shovels and picks, ready to rescue anyone trapped inside.

It’s not as complex as previous methods of reconstructing images from brain scans, but it’s an interesting approach.

More advances in recreating images from brain scans

I’ve written before about using trained AI models to re-create images from brain scans. Improvements have been coming rapidly.

In a paper published in Neural Networks, researchers at the National Institutes for Quantum Science and Technology (QST) in Japan were reportedly able to use artificial intelligence (AI) to reconstruct images solely from people’s brain activity with over 75% accuracy.

They recorded the brain activity of subjects as they viewed 1,200 different images inside a functional magnetic resonance imaging (fMRI) machine. They also had the AI analyze those images to build “score charts” covering 6.13 million factors such as color, shape, and texture. The subjects were then shown another set of images, different from the originals, and 30-60 minutes later their brain activity was measured in the fMRI while they were asked to imagine what kind of image they had seen.

According to the publication, the scientists’ method allowed them to use AI to reconstruct the original images with a 75.6% accuracy rate, a big step up from previous efforts that managed a 50.4% accuracy rate.
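The general recipe behind this kind of work is to learn a mapping from fMRI voxel activity to the image-feature “score chart,” then hand the predicted features to a generative model for reconstruction. Here is a heavily simplified sketch with made-up array sizes; it is my own illustration of the idea, not the QST team’s actual pipeline:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy stand-in data: 1,200 training trials, 5,000 fMRI voxels per trial, and a
# 512-dimensional "score chart" of image features per viewed image.
# (The real study used millions of feature scores; these shapes are illustrative.)
rng = np.random.default_rng(0)
train_voxels = rng.normal(size=(1200, 5000))
train_features = rng.normal(size=(1200, 512))

# Fit a regularized linear decoder from brain activity to image features.
decoder = Ridge(alpha=10.0)
decoder.fit(train_voxels, train_features)

# At test time, decode features for a new (here: imagined) image from fMRI activity,
# then pass the predicted features to a generative image model for reconstruction.
test_voxels = rng.normal(size=(1, 5000))
predicted_features = decoder.predict(test_voxels)
print(predicted_features.shape)  # (1, 512)
```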

Crowds attacking self-driving vehicles

A Waymo driverless taxi was attacked and destroyed on February 10, 2024, in San Francisco’s Chinatown around 9 PM PT. A crowd formed around the car, covered it in spray paint, broke out its windows, and set it on fire.

The Verge couldn’t figure out who did it, but I have a pretty good guess where to go look first. It’s not like they’re being subtle about it since they talked about doing exactly this in the New York Times.

Rabbit R1

The Rabbit R1 was co-designed by Teenage Engineering, and what makes it special is the interface: instead of a grid of apps, you get an AI assistant that talks to your favorite apps and does everything for you.

You could get the R1 to research a holiday destination and book flights to it, or queue up a playlist of your favorite music, or book you a cab. In theory, you can do almost anything you can already do on your phone, just by asking. There remain a lot of questions about exactly how it works and whether it protects your privacy in the way the company describes.

Pre-orders are available at the Rabbit website with deliveries expected around March/April 2024.

Let’s hope it does better than the Humane AI Pin, which is already floundering and laying off staff. At least the Rabbit doesn’t require a monthly subscription.

Robotic excavator autonomously builds a stone wall

HEAP (Hydraulic Excavator for an Autonomous Purpose), a modified 12-ton Menzi Muck M545, began by scanning the construction site, creating a 3D map of it, and recording the locations of boulders that had been dumped there. The robot then lifted each boulder off the ground and used machine vision to estimate its weight, center of gravity, and shape.

An internal algorithm then determined the best placement for each boulder, and the excavator built a stable, mortarless stacked stone wall 20 feet high and 213 feet long.
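Purely to illustrate the flavor of that placement step, here is a toy greedy selection loop. The Boulder attributes and scoring are hypothetical, my own sketch rather than the actual HEAP planner:

```python
from dataclasses import dataclass

# Toy greedy placement step, illustrative only (not the actual HEAP planner):
# at each step, pick the scanned boulder that best fills the next gap in the wall.

@dataclass
class Boulder:
    id: int
    width: float   # meters, estimated from the 3D scan
    height: float  # meters
    weight: float  # kg, estimated from scanned volume and an assumed density

def fit_score(boulder: Boulder, gap_width: float, gap_height: float) -> float:
    """Reward boulders that nearly fill the gap; rule out ones that don't fit."""
    if boulder.width > gap_width or boulder.height > gap_height:
        return float("-inf")
    return (boulder.width * boulder.height) / (gap_width * gap_height)

def choose_boulder(stockpile: list[Boulder], gap_width: float, gap_height: float) -> Boulder:
    return max(stockpile, key=lambda b: fit_score(b, gap_width, gap_height))

stockpile = [Boulder(1, 0.80, 0.50, 900), Boulder(2, 0.40, 0.30, 250), Boulder(3, 0.70, 0.45, 700)]
print(choose_boulder(stockpile, gap_width=0.75, gap_height=0.50).id)  # -> 3
```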

DreaMoving – now anyone can dance

I previously wrote about the Everybody Dance Now technology, which lets you take a video of a source dancer and transfer their moves onto a video of a target person.

Now we have text-to-video technology called DreaMoving. You can start with a reference image, type in a description of the kind of movement you want, and get a generated video clip.

Deep Nostalgia

MyHeritage Deep Nostalgia is a tool that came out about three years ago; it animates the static faces in your photos to bring them to life. People first tried it on historical images:

Then it became a trend on TikTok to upload images of relatives who have long since died. While the effect isn’t perfect, it brought many people to tears to see their loved ones again.

Of course, this can be a double-edged sword. This technology can bring the past to life, but it can also be used to create fake videos of living people.

AI can guess where you are from a single picture

Rainbolt is one of the world’s best players of GeoGuessr – a game in which you are given a 360° picture and about 20 seconds to guess where in the world it was taken. A team at Stanford spent two months building an AI that guesses the correct country 92% of the time, with a median miss of only 44 km – which is astounding.

Here’s a head-to-head competition between the AI, Predicting Image Geolocations (or PIGEON), and a pro GeoGuessr player:

But there’s another side to this kind of technology. NPR did an interview and presented a few personal photos to the algorithm. PIGEON was able to guess where each photo was taken with a really high degree of accuracy. This means you can find the places where old family snapshots were taken, but it also means that algorithms like this can reveal everywhere you are, and have been, from your social media posts.

How it works

The algorithm behind PIGEON is an interesting combination of techniques. On top of the learned vision model, the team uses “geocells”: cells drawn along political and geographic boundaries, rather than naïve grid squares, to help narrow down candidate locations.
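To get a feel for what a geocell prediction boils down to, here’s a small sketch: pick a cell, take its representative point, and measure the error with the haversine (great-circle) distance, the same kind of metric behind that 44 km median figure. The cells and coordinates below are placeholders of my own, not from the paper:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Toy "geocells": each cell is a region with a representative point.
# PIGEON draws its cells along political/geographic boundaries; these are placeholders.
geocells = {
    "paris_region": (48.8566, 2.3522),
    "tokyo_region": (35.6762, 139.6503),
}

def prediction_error_km(predicted_cell, true_lat, true_lon):
    """Distance between a cell's representative point and the true location."""
    cell_lat, cell_lon = geocells[predicted_cell]
    return haversine_km(cell_lat, cell_lon, true_lat, true_lon)

# Example: the model picks "paris_region" for a photo actually taken in Versailles.
print(round(prediction_error_km("paris_region", 48.8049, 2.1204), 1))  # ~18 km
```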

Rainbolt even pointed out that PIGEON picked up on camera lens smudges in the sky that are very common in Google Street View captures from Canada:

There are so many other details. Definitely check out their paper here: https://arxiv.org/abs/2307.05845

Summary

This is yet another example of three graduate students developing a system that beats the best experts in the world. And in this case, they did it in less than three months with off-the-shelf software and hardware.

You can only imagine where things will be in just a few years. Anyone who doesn’t think AI is already changing the world is missing it as it happens.