Sudowrite scores a new point


Not a G.I.R.L. (Guy In Real Life) – illustration loosely based on the female main character of my test project.

This week, I tested Sudowrite on a project that should be hard for an automaton without real intelligence: a modern romantic comedy that takes place partly in an online roleplaying game and partly in real life. The online character has a different name, appearance, and behavior from the physical player. Mixing these up should be really easy if you don’t actually understand the text, and it would be catastrophic for the storytelling. There is actually a hint of a third layer in this story, as the female main character is a woman both offline and online, but online she claims to be a man in real life, to avoid sexual harassment.

To my amazement, the artificial intelligence was quite good at keeping track of dual identities. There was the occasional slip-up with the female character, whose online name (Manilla) was similar to her real name (Magnhild). I have still not found any such confusion with the male main character, who differs more from his character in both name and appearance. As expected, the AI seems to have given up on the third layer: the real-life woman whose female online character pretends to be played by a man. Hey, I am not even sure if you, noble reader, can wrap your head around this without effort. Probably, if you’re here in the first place. But the AI seems to have just decided to skip it, and I can’t blame it. I am impressed it did as well as it did.

***

The other problems remain. The AI is amazing at making a chapter-by-chapter outline of the whole novel, but struggles to get from chapter to scene. In one of the early chapters, it wrote the same scene three times in a row, slightly different each time. It goes through an intermediate stage called story beats, and these are short and fairly generic by default. You can go in and heavily edit and expand them to avoid this kind of slip-up, but it is clear that the AI struggles with the process of creating a scene from its own story beats. This is kind of ironic, since you would think it would be easier for it to write a good scene based on its own story beats rather than someone else’s, but the opposite is true. Rewriting the story beats constitutes by far the largest part of the work of creating an AI draft with Sudowrite. I am not impressed with the prose either, but it is not outright embarrassing like the scene composition. It is low-grade commercial prose, I’d say. Like the Amazon novels that cost $0.99 to $1.99. If you want to sell for $2.99, you should probably write better.

But my message today was actually meant to be positive. I am impressed that an AI can keep track of dual identities reliably. Online romance is not the only category where this would be useful. Superhero stories usually feature secret identities, and so may spy stories and shapeshifter stories like werewolf or vampire supernatural romance or drama. I guess the AI has been trained with so many books that it recognizes this trope innately.

***

If there are still people who have read my immense archives, they may recall that I made fun of early versions of the speech recognition software Dragon NaturallySpeaking, comparing it to a homesick high school exchange student for its limited command of the English language. But a few years later it would pick up my Scandinavian-accented English better than some native English speakers did. I fully expect something similar to happen with AI writing tools. They may be involuntarily amusing now, but in a few years – if we manage to avoid a global disaster – they may surpass amateur writers like me in pretty much every aspect.

(Machine) learning is not theft


Hermione Granger (from the Harry Potter series) painted by Edvard Munch. If you think MidJourney here is plagiarizing Munch’s original, I have a very expensive bridge to sell you.

I have recently mentioned using artificial intelligence to create visual art. Text-to-image applications like DALL-E 2, MidJourney, and Stable Diffusion all use machine learning based on enormous numbers of pictures scraped from the Internet. Now some contemporary artists have discovered that some of their work is included in the datasets used to train these AIs, and they are upset that they have not been asked and not been compensated.

This reaction is caused by their ignorance, of course. I can’t blame them: Modern society is very complex, and human brains are limited. Yes, even mine. I could not fix a car engine if my life depended on it, for instance. I have only vague ideas of what it would take to limit toxic algae bloom. And to be honest, I could not make my own AI even if I had the money. I just happen to have a very loose idea of how they work because it interests me, because I don’t have a family to worry about, and because I don’t have a job that requires me to spend my free time thinking about it.

Anyway, I shall take it upon myself to explain why you should politely ignore the cries of the artists who feel deprived of money and acknowledgment by AI text-to-image technology.

The fundamental understanding is that learning is not theft. I hope we can agree on this. Obviously, there are exceptions, such as industry secrets like the recipe for Coca-Cola or the source code for Microsoft Windows. If someone learns those and uses them to create a competing product, it is considered theft of intellectual property. But if an art student studies your painting along with thousands of other paintings, and then goes on to paint their own paintings, that is not theft. If they make a painting that is a copy of yours, then yes, that is plagiarism and it infringes on your copyright. But simply learning from it, along with many, many others? That is fair use, very much so. If you don’t want people to learn from you, then you need to keep your art to yourself. You can’t decide who gets to look at your art unless you keep it private.

The outcry is probably based on not knowing how the “diffusion” model of AI works. So let me see if I can popularize that. Given our everyday use of computers, it is easy to think that the AI keeps a copy of your painting in its data storage and can recall it at some later time. After all, that is what Microsoft Office does with letters, right? But machine learning is a fundamentally different process. The AI has no copy of your artwork stored in its memory, just a general idea of your style and of particular topics. This stems from how “diffusion” works.

When a program like MidJourney or Stable Diffusion gets a text prompt, it starts from a “diffuse” canvas of random noise, with no recognizable content at all. It then goes through many steps of removing that noise, nudging the pixels toward shapes that fit the description it has been given. (It can do this because it has gone through the opposite process millions of times during training, gradually drowning images in noise until they dissolved. Thus the name “diffusion”.) You can, if you have the patience, watch the image gradually become less and less diffuse, slowly starting to resemble the topic of the prompt. In other words, it starts with a completely diffuse image that becomes clearer and clearer. You can upscale such an image, and the AI will add details that seem appropriate for the context. (Until recently, this could include adding extra fingers or even eyes, but the latest versions are getting better at this.)
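For the technically curious, here is a minimal toy sketch in Python of what such a denoising loop looks like. This is my own simplified illustration, not code from MidJourney or Stable Diffusion: the real systems use a trained neural network where I use a made-up `predict_noise` function, but the shape of the loop – start from noise, remove a little of it at each step – is the point.

```python
import numpy as np

def predict_noise(image):
    # Stand-in for the trained network. A real diffusion model would use a
    # neural net, guided by the text prompt, to estimate the noise still
    # present in the image. Here we simply pretend the "true" picture is a
    # plain gradient and report how far the current image is from it.
    target = np.linspace(0.0, 1.0, image.size).reshape(image.shape)
    return image - target

def generate(shape=(64, 64), total_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    image = rng.normal(size=shape)        # start from pure noise
    for step in range(total_steps):
        noise_estimate = predict_noise(image)
        # Remove a little of the estimated noise at each step, so the
        # picture becomes less and less "diffuse".
        image = image - noise_estimate / (total_steps - step)
    return image

picture = generate()
print(picture.min(), picture.max())  # by the last step, close to the 0..1 gradient "picture"
```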

It is worth noticing that there is also a very long random seed included in the process, meaning that you could give the AI the same prompt thousands and thousands of times and get a different version of the image every time. Sometimes the images will be similar, sometimes strikingly different, depending on how detailed your request is. Once an image catches your eye, you can make variants of it, and these too come in a virtually unlimited number of variations.
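If you want to see the effect of the seed for yourself, open models let you set it explicitly. Below is a minimal sketch, assuming the Hugging Face diffusers library, a CUDA-capable GPU, and the runwayml/stable-diffusion-v1-5 checkpoint – my choices for illustration, not anything MidJourney itself exposes. The same prompt with three different seeds gives three different paintings; rerunning with the same seed reproduces the same one.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a publicly available Stable Diffusion checkpoint (assumed here; any compatible model works).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "Hermione Granger, painted by Edvard Munch"

# Different seeds: same prompt, visibly different paintings.
for seed in (1, 2, 3):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"munch_hermione_seed{seed}.png")

# Rerunning with the same seed reproduces the same image, which shows the
# variation comes from the seed, not from any stored original being recalled.
```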

At no point in this process does the AI bring up the original image, because there are no original images stored in its memory, just a general, diffuse idea of what the topic should look like. And in the same way, it only has a general, diffuse idea of what a particular artist’s style is. My “Munch” paintings certainly look more like Munch than Monet, but it is still unlikely that Edvard Munch would actually have painted the exact same picture. In this case, of course, it is literally impossible, and that is exactly the scenario where we want to use engines like these. “What if Picasso had painted the Sistine Chapel? What if Michelangelo and van Gogh had cooperated on painting a portrait of Bill Gates?” The AI is simply not optimized for rote plagiarism, but for approximation. It is like a human who spent 30 years in art school practicing a little of this and a little of that, becoming a pretty good jack of all trades but a master of none. They can’t exactly recall any of the tens of thousands of pictures they have been practicing on, but they’ve got the gist of it.

***

As for today’s picture, it was made by MidJourney using the simple prompt “Hermione Granger, painted by Edvard Munch --ar 2:3”, where ar stands for aspect ratio, the width compared to the height. This generated four widely different pictures, and I chose one of them and asked for variations of it. This retains the essential elements of the picture but allows for minor variations, as you see above. So it is not as if the AI had an original picture to plagiarize: I asked it to make variations on its own picture. With some AI engines, you can in fact upload an existing picture and modify it, but this is entirely your choice, just like if you modify a picture in Gimp or Photoshop. The usual legal limitations apply; you cannot hide behind “an AI did it!”. So far, AIs are not considered persons. Maybe one day?