Can we stop AI from inheriting our biases? | Julia Mann | TEDxRWTHAachen
Transcriber: Gilang Zaidan Fikri Reviewer: Manlin Fang

Hi, everybody. I have a confession to make. I am biased, but I’m fine with it. I’m okay because I know that you are too. And so it’s no surprise that AI is too. My name is Julia Mann. I work in the field of AI, but I’m not an expert in the ethics of AI. I have experienced the biases in AI myself, because I belong to a minority group in my field. So today I’d like to talk to you about bias: what it is, where it comes from, why AI is biased too, and, I think most importantly for all of us, how to reduce the biases in AI.

But before we start, I thought I’d do a small experiment. So please join me in closing your eyes and imagining an engineer. Please open your eyes. This is what I imagined. And you probably saw something totally different, right? I see some nodding. Now, there are multiple reasons why your image doesn’t fit my image, and one of them could be language. English is a natural gender language, which means its nouns don’t carry a grammatical gender. German and French, for example, are gendered languages. So in German we’d say “Ingenieur” for a male person and “Ingenieurin” for a female person, right? So if I did my experiment in German, our results would probably be more coherent.

Now let’s do the experiment again, and I’ll try to reduce this language ambiguity and be more precise. So please close your eyes again and imagine a bride at a wedding. And please try to remember the very, very first image that you see, not the one after your brain tries to manipulate it. Now open your eyes, please. Please raise your hand if you saw something similar: a woman in a whitish long dress. Quite a lot of you, including my husband. Thank you. Now, please raise your hand if you saw something similar to this. Just a couple of you. Thank you. Now, I did not want to expose you here as biased. I just wanted to demonstrate what I said at the beginning: that we all are biased somehow.
Because being biased is a human trait. There are a lot of definitions of bias, but essentially it means having a tendency or a preference for something or someone. And there are also a lot of different types of biases, but essentially there are two major groups: conscious and subconscious bias. With the two little experiments we did, I demonstrated that we are mostly subconsciously biased. The first image that comes into my mind when I think of a bride is somebody who looks just like me, right? Of course, my conscious self knows that a bride can look so many different ways, and you probably felt something similar.

Now let’s turn our attention to AI, because that’s why we’re here, right? So let’s repeat the experiment using AI. Please take your phone. I know you’ve been told to keep it on silent, but you can use it now. It’s fine, I checked. So just pick up your phone, please, and use a search engine, whichever you like, I’m not advertising anything here. Just type in “bride” and look for images of a bride. I’ll give you a couple of seconds to do that. Some people might need to switch on their phone first, so I’ll just go ahead. What do you see? I see some reactions already. I assume the first few images you see are very similar to the images you had in your mind just a couple of minutes ago. In humans, we would call that availability bias, which means that my previous thoughts and experiences influence my current ones. And something very similar happens in AI: your previous searches, if you’ve used that search engine before, influence the results you’re seeing now.

Now, why is AI biased? Some of you might know that AI is based on maths, and maths is factual, right? Two plus two is four. It’s a fact. There’s nothing biased about it. So why is AI still biased? Because it uses data. And as you probably know, data can be biased. There are multiple reasons why data sets are biased.
Bias can arise from the way we collect the data: who we collect it from, where we collect it from, who collects it. Bias can also arise from history and from changes in society. And of course, bias arises from our behavior, our culture, and even religion. And sometimes we actually do want biased data sets. Just as an example: if I want to develop medication for a disease that occurs mainly in women, then I want to test it on a group with mainly female participants, don’t I? And that would, strictly speaking, be a biased data set.

Now, another reason why AI is biased is that it’s made by us humans. The developers who program the algorithms are just like you and me. They have their own experiences, their own backgrounds. So they have their own biases that they incorporate into the algorithms, usually subconsciously.

Now, not only are the algorithms made by us humans, but the data is made by us humans too. Let me explain. The big generative models currently used for generating images, such as Midjourney, for example, or DALL·E, you might have heard of them, are trained on a very big data set called LAION, and the current version of the LAION data set consists of almost 6 billion data points, collected from the publicly available internet. Now, when data is collected from the publicly available internet, it’s usually uncurated, which means it’s not ordered, it’s not sorted. It’s just raw and unfiltered. And just as you would struggle with data that is raw and unfiltered, I’d struggle too, and I guess you can imagine what data from the publicly available internet looks like. Let’s not talk about that.

So, in short, AI is biased because of the complexity, but also the diversity, of us humans. And although we know that we are biased, there are currently far more severe consequences when AI is biased.
A very extreme example is where people of color were predicted to be criminals and had to spend multiple days in prison because an AI-powered algorithm thought that they did something they didn’t do. In fact, since 2018, eight people were falsely accused and had to spend time in prison for something they did not do. And you might think: well, in six years, eight people, that’s not a big thing, right? For them, it was a big thing. And for their families. And this happened, in my opinion, mainly for two reasons. First, the data was incomplete. It was lacking the diversity of us humans, lacking all the different types and colors of us humans. And second, the people who used that algorithm relied on it too much, without knowing the risks and what can happen when we use it.

And this is exactly why I chose this topic today, because I think this should be known to everybody. Just like we know we need to wear sunscreen when it’s super sunny outside, right? Not like today. You should know it. All the AI experts know that AI is biased. All the AI experts know that ChatGPT hallucinates. And you should know it too.

Now, I don’t want to scare you, of course. But we still have the question: how can we reduce, or maybe eliminate, the bias in AI? Well, we’ve learned from the past that we should not use AI-powered algorithms in certain situations, like in court, and there are currently regulations on the way. You might have heard of the AI Act by the European Union, which regulates the development and the applications of AI. But fixing the bias is actually not that easy, because of its nature: AI algorithms look for patterns, for things that occur most often, things that are most significant. Things that do not occur often are considered outliers and are disregarded. And here we are again at the data problem.
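The “patterns versus outliers” point can be made concrete with a tiny sketch in Python. The data here is invented purely for illustration: a majority group “A” and a small minority group “B”.

```python
from collections import Counter

# Invented toy data set: 95 examples of a majority group "A"
# and only 5 of a minority group "B".
observations = ["A"] * 95 + ["B"] * 5

# A naive frequency-based "model" just learns the most common pattern.
counts = Counter(observations)
prediction = counts.most_common(1)[0][0]

print(prediction)  # "A": the majority pattern always wins
print(counts["B"] / len(observations))  # 0.05: "B" is rare enough to be treated as an outlier
```

A real model is of course far more sophisticated than a frequency count, but the effect is the same: whatever is rare in the training data has the least influence on the output.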
In fact, there have been attempts to make this LAION data set, this very big data set, less biased. For example, pornographic images were deleted from the data set. Which makes sense, right? We don’t want to have that in it. But what was the result? The data set was then male-dominated, because a lot of images of women had been deleted. So the data set was biased again. As you can see, it’s actually not that easy to fix the bias in AI.

A biased data set has, mathematically speaking, some extreme tails, as we call them, while a data set without bias is uniformly distributed. So if we manage to turn a data set with outliers into a uniformly distributed, very diverse data set, then we can reduce the bias in AI.

And this is actually something you and I can contribute to. If you remember, I said that the data used for these large models is collected from the internet, and that’s where you and I upload our data to, isn’t it? So next time we upload data, let’s make sure it’s real data, without any filters: with red spots on your face, if that’s how it is; with no makeup, if that’s how it is. Together we can show AI how authentic and diverse we humans are. Thank you.
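The filtering effect described in the talk, removing unwanted images and accidentally skewing the gender balance, can be sketched in a few lines of Python. All numbers and labels here are invented for illustration, not taken from the actual LAION cleanup.

```python
# Invented toy "image" data set: each record has a subject label and a flag
# marking content we want to filter out. The flagged images are mostly of women.
records = (
    [{"subject": "woman", "flagged": True}] * 30
    + [{"subject": "woman", "flagged": False}] * 20
    + [{"subject": "man", "flagged": False}] * 50
)

def share_of_women(data):
    """Fraction of records whose subject is labeled 'woman'."""
    return sum(r["subject"] == "woman" for r in data) / len(data)

print(share_of_women(records))  # 0.5: balanced before filtering

# Remove all flagged content -- a well-intentioned cleanup step.
filtered = [r for r in records if not r["flagged"]]
print(share_of_women(filtered))  # ~0.29: the cleaned data set now skews male
```

The cleanup itself is perfectly reasonable; the bias creeps in because the removed content was not evenly distributed across groups in the first place.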