Can we stop AI from inheriting our biases? | Julia Mann | TEDxRWTHAachen
Transcriber: Gilang Zaidan Fikri Reviewer: Manlin Fang

Hi, everybody. I have a confession to make. I am biased, but I’m fine with it. I’m okay because I know that you are too. And so it’s no surprise that AI is too. My name is Julia Mann. I work in the field of AI, but I’m not an expert in the ethics of AI. I have experienced the biases in AI myself, because I belong to a minority group in my field. So today I’d like to talk to you about bias: what it is, where it comes from, why AI is biased too, and, I think most importantly for all of us, how to reduce the biases in AI.

But before we start, I thought I’d do a small experiment. So please join me in closing your eyes and imagining an engineer. Please open your eyes. This is what I imagined. And you probably saw something totally different, right? I see some nodding. Now, there are multiple reasons why your image doesn’t fit my image, and one of them could be language. English is a natural gender language, which means its nouns don’t carry a grammatical gender. German and French, for example, are gendered languages. So in German we’d say “Ingenieur” for a male person and “Ingenieurin” for a female person, right? So if I did my experiment in German, our results would probably be more coherent.

Now let’s do the experiment again, and I’ll try to reduce this language ambiguity and be more precise. So please close your eyes again and imagine a bride at a wedding. And please try to remember the very, very first image that you see, not the one after your brain tries to manipulate it. Now open your eyes, please. Please raise your hand if you saw something similar: a woman in a whitish long dress. Quite a lot of you, including my husband. Thank you. Now, please raise your hand if you saw something similar to this. Just a couple of you. Thank you. Now, I did not want to expose you here as biased. I just wanted to demonstrate what I said at the beginning: that we all are biased somehow.
Because being biased is a human trait. There are a lot of definitions of bias, but essentially it means having a tendency or a preference for something or someone. And there are also a lot of different types of biases, but essentially there are two major groups: conscious and subconscious bias. With the two little experiments we did, I demonstrated that we are mostly subconsciously biased. The first image that comes into my mind when I think of a bride is somebody who looks just like me, right? Of course, my conscious self knows that a bride can look so many different ways, and you probably felt something similar.

Now let’s turn our attention to AI, because that’s why we’re here, right? So let’s repeat the experiment using AI. Please take your phone. I know you’ve been told to keep it on silent, but you can use it now. It’s fine, I checked. So just pick up your phone, please, and use a search engine, whichever you like, I’m not advertising anything here. Just type in “bride” and look for images of a bride. I’ll give you a couple of seconds to do that. Some people might need to switch on their phone first, so I’ll just go ahead. What do you see? I see some reactions already. I assume the first few images you see are very similar to the images you had in your mind just a couple of minutes ago. In humans, we would call that availability bias, which means that my previous thoughts and experiences influence my current ones. And something very similar happens in AI: your previous searches, if you’ve used that search engine before, influence the results you’re seeing now.

Now, why is AI biased? Some of you might know that AI is based on maths, and maths is factual, right? Two plus two is four. It’s a fact. There’s nothing biased about it. So why is AI still biased? Because it uses data. And as you probably know, data can be biased. There are multiple reasons why data sets are biased.
Bias can arise from the way we collect the data: who we collect it from, where we collect it from, who collects it. Bias can also arise from history and from changes in society. And of course, bias arises from our behavior, our culture, and even religion. And sometimes we actually do want biased data sets. Just as an example: if I want to develop medication for a disease that occurs mainly in women, then I want to test it on a group with mainly female participants, don’t I? And that would, strictly speaking, be a biased data set.

Now, another reason why AI is biased is that it’s made by us humans. The developers who program the algorithms are just like you and me. They have their own experiences, their own backgrounds. So they have their own biases that they incorporate into the algorithms, usually subconsciously.

Now, not only are the algorithms made by us humans, but the data is made by us humans too. Let me explain. The big generative models currently used for generating images, such as Midjourney, for example, or DALL·E, you might have heard of them, are trained on a very big data set called LAION, and the current version of the LAION data set consists of almost 6 billion data points, collected from the publicly available internet. Now, when data is collected from the publicly available internet, it’s usually uncurated, which means it’s not ordered, it’s not sorted. It’s just raw and unfiltered. And just as you would struggle with data that is raw and unfiltered, I’d struggle too, and I guess you can imagine what data from the publicly available internet looks like. Let’s not talk about that.

So, in short, AI is biased because of the complexity, but also the diversity, of us humans. And although we know that we are biased, there are currently far more severe consequences when AI is biased.
A very extreme example is where people of color were predicted to be criminals and had to spend multiple days in prison because an AI-powered algorithm thought that they did something they didn’t do. In fact, since 2018, eight people were falsely accused and had to spend time in prison for something they did not do. And you might think: well, in six years, eight people, that’s not a big thing, right? For them, it was a big thing. And for their families. And this happened, in my opinion, mainly for two reasons. First, the data was incomplete. It was lacking the diversity of us humans, lacking all the different types and colors of us humans. And second, the people who used that algorithm relied on it too much, without knowing the risks and what can happen when we use it.

And this is exactly why I chose this topic today, because I think this should be known to everybody. Just like we know we need to wear sunscreen when it’s super sunny outside, right? Not like today. You should know it. All the AI experts know that AI is biased. All the AI experts know that ChatGPT hallucinates. And you should know it too.

Now, I don’t want to scare you, of course. But we still have the question: how can we reduce, or maybe eliminate, the bias in AI? Well, we’ve learned from the past that we should not use AI-powered algorithms in certain situations, like in court, and there are currently regulations on the way. You might have heard of the AI Act by the European Union, which regulates the development and the applications of AI. But fixing the bias is actually not that easy, because of its nature: AI algorithms look for patterns, for things that occur most often, things that are most significant. Things that do not occur often are considered outliers and are disregarded. And here we are again at the data problem.
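The “patterns versus outliers” point can be made concrete with a tiny sketch in Python. The data here is invented purely for illustration: a majority group “A” and a small minority group “B”.

```python
from collections import Counter

# Invented toy data set: 95 examples of a majority group "A"
# and only 5 of a minority group "B".
observations = ["A"] * 95 + ["B"] * 5

# A naive frequency-based "model" just learns the most common pattern.
counts = Counter(observations)
prediction = counts.most_common(1)[0][0]

print(prediction)  # "A": the majority pattern always wins
print(counts["B"] / len(observations))  # 0.05: "B" is rare enough to be treated as an outlier
```

A real model is of course far more sophisticated than a frequency count, but the effect is the same: whatever is rare in the training data has the least influence on the output.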
In fact, there have been attempts to make this LAION data set, this very big data set, less biased. For example, pornographic images were deleted from the data set. Which makes sense, right? We don’t want to have that in it. But what was the result? The data set was then male-dominated, because a lot of images of women had been deleted. So the data set was biased again. As you can see, it’s actually not that easy to fix the bias in AI.

A biased data set has, mathematically speaking, some extreme tails, as we call them, while a data set without bias is uniformly distributed. So if we manage to turn a data set with outliers into a uniformly distributed, very diverse data set, then we can reduce the bias in AI.

And this is actually something you and I can contribute to. If you remember, I said that the data used for these large models is collected from the internet, and that’s where you and I upload our data to, isn’t it? So next time we upload data, let’s make sure it’s real data, without any filters: with red spots on your face, if that’s how it is; with no makeup, if that’s how it is. Together we can show AI how authentic and diverse we humans are. Thank you.
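The filtering effect described in the talk, removing unwanted images and accidentally skewing the gender balance, can be sketched in a few lines of Python. All numbers and labels here are invented for illustration, not taken from the actual LAION cleanup.

```python
# Invented toy "image" data set: each record has a subject label and a flag
# marking content we want to filter out. The flagged images are mostly of women.
records = (
    [{"subject": "woman", "flagged": True}] * 30
    + [{"subject": "woman", "flagged": False}] * 20
    + [{"subject": "man", "flagged": False}] * 50
)

def share_of_women(data):
    """Fraction of records whose subject is labeled 'woman'."""
    return sum(r["subject"] == "woman" for r in data) / len(data)

print(share_of_women(records))  # 0.5: balanced before filtering

# Remove all flagged content -- a well-intentioned cleanup step.
filtered = [r for r in records if not r["flagged"]]
print(share_of_women(filtered))  # ~0.29: the cleaned data set now skews male
```

The cleanup itself is perfectly reasonable; the bias creeps in because the removed content was not evenly distributed across groups in the first place.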