Josh, I’ve been hearing a lot about “AI-generated art” and seeing memes that look really wild. What’s happening? Is the machine picking up the paintbrush now?
It’s not a paintbrush. What you’re looking at is a neural network (roughly, an algorithm loosely modeled on how our neurons signal one another) trained to generate images from text. It’s basically a lot of math.
A neural network that generates images from text? So, for example, you type “Kermit the Blade Runner Frog” into your computer and it spits out a photo of …?
You aren’t thinking far enough outside the box! Sure, you can create all the Kermit images you need. But the reason I’m hearing about AI art is that you can create images of ideas no one has ever expressed. A Google search for “kangaroo made of cheese” doesn’t really turn up anything. But here are nine of them generated by a model.
You said earlier that it’s all a lot of math, but, as simply as you can put it, how does it actually work?
I’m not an expert, but basically they get the computer to “see” millions or billions of pictures of cats, bridges, and so on. These are usually scraped from the internet, along with the captions associated with them.
The algorithm picks up on patterns linking images and captions, and eventually learns to predict which captions and images go together. Once the model can predict what an image should look like from its caption, the next step is to run the process in reverse and generate a whole new image from a new “caption”.
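As a toy illustration of that caption–image matching step (the embeddings here are hand-picked numbers, not the output of any real model), you can think of the algorithm as learning to place images and captions in a shared space and then picking whichever caption sits closest to a given image:

```python
import math

# Toy embeddings: in a real model these vectors come from training on
# millions of (image, caption) pairs scraped from the web; here they
# are made up by hand purely for illustration.
image_embeddings = {
    "photo_of_cat.jpg":    [0.9, 0.1, 0.0],
    "photo_of_bridge.jpg": [0.0, 0.2, 0.9],
}
caption_embeddings = {
    "a cat sleeping":        [1.0, 0.0, 0.1],
    "a bridge over a river": [0.1, 0.1, 1.0],
}

def cosine(a, b):
    """Similarity between two vectors: 1.0 means pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def best_caption(image_name):
    """Pick the caption whose embedding is closest to the image's."""
    img = image_embeddings[image_name]
    return max(caption_embeddings, key=lambda c: cosine(img, caption_embeddings[c]))

print(best_caption("photo_of_cat.jpg"))  # "a cat sleeping"
```

Text-to-image generation runs this idea in reverse: instead of finding the caption nearest an image, the system searches for an image whose embedding lands near the text’s.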
So when these programs create new images, are they finding things in common? For example, that images tagged “kangaroo” usually contain large shapes like this, and “cheese” is usually a collection of pixels that looks like this, and then just spinning up a variation?
It’s a little more than that. Take a look at this 2018 blog post to see how limited the older models were. When prompted with “a flock of giraffes on a ship,” one produced a bunch of giraffe-colored lumps standing in the water. So the fact that we now get recognizable kangaroos and several types of cheese shows there has been a huge leap in how much the algorithms appear to “understand”.
OK. So what has changed so that the output no longer looks like a horrifying nightmare?
There has been a lot of development in both the techniques and the datasets they are trained on. In 2020, a company called OpenAI released GPT-3, an algorithm that can generate text eerily close to what a human might write. One of the most hyped text-to-image algorithms, DALL-E, is based on GPT-3. More recently, Google released Imagen, which uses its own text model.
These algorithms are extremely data-hungry and require thousands of “exercises” to improve their predictions.
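Each of those “exercises” is a training step: the model makes a prediction, measures how wrong it was, and nudges its internal numbers to do slightly better next time. A minimal sketch with a single-parameter model (a made-up example, vastly simpler than any real image model):

```python
# A one-parameter "model" learning the rule y = 2 * x by repetition.
# Each pass over the data is one "exercise": predict, measure the
# error, then adjust the parameter a tiny amount to reduce it.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, correct answer)
w = 0.0    # the single learnable parameter, starting from ignorance
lr = 0.01  # learning rate: how big each adjustment is

for step in range(1000):
    for x, target in data:
        pred = w * x
        grad = 2 * (pred - target) * x  # derivative of the squared error
        w -= lr * grad                  # nudge w toward a better answer

print(round(w, 3))  # converges close to 2.0
```

A real text-to-image model does the same thing with billions of parameters instead of one, which is why the training takes so much data and computing power.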
“Exercises”? Are real people still involved? Like, telling the algorithm whether what it’s making is right or wrong?
Actually, that’s another big step forward. These models may generate many images but show you only a handful. Just as they were originally trained to predict the best caption for an image, they can score their own output and display only the images that best match the text provided. They mark their own work.
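That self-marking step can be sketched as: generate a batch of candidate images, score each against the prompt with the same kind of caption-matching model, and keep only the top scorers. A toy version (the candidate “images” and their similarity scores are faked here, since the real scoring model is the learned one described above):

```python
import random

random.seed(0)  # make the fake scores reproducible

def generate_candidates(prompt, n=32):
    # Stand-in for the image generator: each "image" is just a label
    # plus a made-up similarity score against the prompt. In a real
    # system the score would come from a learned text-image model.
    return [(f"{prompt} #{i}", random.random()) for i in range(n)]

def best_images(prompt, keep=4):
    """Generate many candidates, then keep only the best-scoring few."""
    candidates = generate_candidates(prompt)
    candidates.sort(key=lambda c: c[1], reverse=True)
    return [name for name, score in candidates[:keep]]

print(best_images("a kangaroo made of cheese"))
```

The effect is that the weird failures get filtered out before you ever see them, which makes the model look more capable than any single raw sample would suggest.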
But there are still weaknesses in this generation process, right?
It can’t be emphasized enough that this is not intelligence. The algorithms don’t “understand” the meaning of words and images the way you and I do. It’s more like a best guess based on what they’ve “seen” before. As a result, there are significant limits on both what they can do and what they probably shouldn’t do (such as the possibility of graphic images).
Well, if machines can now make pictures on demand, how many artists will this put out of work?
For now, access to these algorithms is either severely restricted or expensive. I’m still on a waiting list to try DALL-E. But computing power keeps getting cheaper, there are many huge image datasets available, and even hobbyists are building their own models, such as the one used to create the kangaroo images.
I don’t think anyone knows what will happen to artists. But there are still many edge cases where these models fail, so I wouldn’t rely on them alone.
Are there other problems with creating images purely by pattern matching and self-marking the answers? Any questions about bias, stereotyping, or unfortunate associations?
One thing you notice in companies’ announcements of these models is that they tend to use harmless examples: lots of generated animal images. That points to one of the big problems with training pattern-matching algorithms on the internet, much of which is absolutely terrible.
A few years ago, MIT researchers deleted an 80-million-image dataset used to train algorithms because it contained derogatory terms as categories and offensive images. And in our own experiments, the word “business”, for example, seemed to be associated with generated images of men.
So for now it’s good enough for memes, and it still produces strange, nightmarish images (especially of faces), though not as often as it used to. But who knows about the future? Thanks, Josh.