The lesser-talked-about relationship between humans and AI
How humans improve AI model accuracy through labeling and content moderation
We are seeing a new wave of Artificial Intelligence (AI) hype gaining momentum. This hype is largely fueled by applications that use so-called generative AI, a type of AI that can create images or write essays based on just a text prompt that a user provides. The most prominent examples of these applications are DALL-E 2, Stable Diffusion, and Midjourney (text-to-image) and GPT-3 or ChatGPT (text-to-text).
Generative AI makes it easy for anyone to use, even without any prior knowledge. But despite the recent advancements in AI, these applications have a dependency that is not often talked about: humans.
AI’s dependence on humans in the loop
While media coverage focuses on the potential AI holds, or on fears that it will take our jobs, AI’s actual dependence on humans to deliver more accurate results is far less widely known.
AI is largely based on recognizing patterns in vast amounts of data and comparing new data points against these patterns. To do that with a high level of accuracy, AI often needs labeled data. Imagine feeding the AI model a picture of a wrench during model training and providing the metadata (aka label) “wrench” along with it. In many cases these labels have to be created by someone (e.g. wrench, hammer, toolbox, invoice, contract).
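To make this concrete, here is a minimal sketch of what labeled training data can look like. The file names and labels below are purely illustrative, not from any real dataset.

```python
# A minimal sketch of what "labeled data" means for supervised training.
# File names and labels are illustrative, not from any real dataset.
labeled_examples = [
    ("images/img_0001.jpg", "wrench"),
    ("images/img_0002.jpg", "hammer"),
    ("images/img_0003.jpg", "toolbox"),
    ("docs/scan_0101.png", "invoice"),
    ("docs/scan_0102.png", "contract"),
]

# Models work with numeric targets, so each label is mapped to a class index.
classes = sorted({label for _, label in labeled_examples})
label_to_index = {label: i for i, label in enumerate(classes)}

for path, label in labeled_examples:
    print(path, "->", label, "(class index", label_to_index[label], ")")
```

Someone has to attach each of those labels to each of those files, and that someone is usually a human.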
The business models for this kind of “data labeling” have been in place for a number of years and are mainly based on human labor. The most prominent platform is Amazon Mechanical Turk, which allows anyone to contract out jobs (such as labeling) to a virtual workforce. As the submitter of a job, you define the tasks, provide the data, and assign the work to a pool of workers. It’s not uncommon for people in emerging economies to be the ones who take on these tasks. Typical data labeling tasks include reviewing images or drawing bounding boxes around areas of interest in an image or document, for example an object or a damaged part of a car, so an AI model can recognize these regions as relevant for its training. Other examples include humans listening to audio snippets from a smart speaker to improve the accuracy of an AI-generated transcription (CNN).
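To give a sense of what the output of such a bounding-box task might look like, here is a hedged sketch of an annotation record. The schema is loosely COCO-style and purely illustrative, not any specific platform’s format.

```python
# A hedged sketch of a bounding-box annotation produced by a human labeler.
# The schema is illustrative (loosely COCO-style), not a specific platform's format.
annotation = {
    "image": "images/car_damage_0042.jpg",
    "annotations": [
        {
            "label": "dent",
            # Box as [x, y, width, height] in pixels, drawn by a human reviewer.
            "bbox": [312, 145, 180, 96],
        },
        {
            "label": "scratched_bumper",
            "bbox": [95, 410, 260, 70],
        },
    ],
    "annotator_id": "worker_1234",  # the human who drew the boxes
}

print(f"{len(annotation['annotations'])} labeled regions in {annotation['image']}")
```

But it doesn’t stop there.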
Human-assisted content review
This model of human labor to label, annotate, or review data is also known as “human in the loop”. The next level of humans in the AI loop is improving the responses of generative AI models such as ChatGPT. At the most basic level, this is a classification problem: Is this piece of content permitted by the company’s guidelines, yes or no? Although AI models might already achieve high accuracy in identifying permitted and non-permitted content, they are not able to classify every piece of content correctly every time. This is where human content reviewers come in.
If the AI model is in doubt (e.g. its confidence falls below a defined threshold), the content is presented to a reviewer who has to decide whether to accept or reject it. To make this decision, they often have a minute or less. But for an AI model to spot content that it should surface to a human for review, it first needs to be trained to recognize this type of content, which means training on enough examples of what is not permitted. For large language models such as ChatGPT these steps occur during model training, while social media platforms, for example, rely on human content moderation and review in real time.
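In code, this routing logic can be as simple as a confidence threshold. The sketch below is a simplified illustration; the threshold value and the classify() stub are assumptions, not how any particular platform implements it.

```python
# A minimal sketch of threshold-based routing to a human reviewer.
# REVIEW_THRESHOLD and classify() are illustrative assumptions, not a real system.
REVIEW_THRESHOLD = 0.90

def classify(content: str) -> tuple[str, float]:
    """Stand-in for a trained content classifier returning (decision, confidence)."""
    if "banned phrase" in content.lower():
        return "reject", 0.97
    return "accept", 0.62  # deliberately low confidence for the demo

def route(content: str) -> str:
    decision, confidence = classify(content)
    if confidence < REVIEW_THRESHOLD:
        # The model is not sure enough, so a human makes the final call.
        return f"human review (model suggested '{decision}' at {confidence:.2f})"
    return f"auto-{decision} (confidence {confidence:.2f})"

print(route("An ordinary product announcement."))
print(route("Some text containing a banned phrase."))
```

Everything that falls below the threshold lands in a queue that a person has to work through, piece by piece.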
Models like ChatGPT are trained on vast amounts of data from the internet. This includes content from the darkest corners of the web: the kinds of topics, language, and views that one would not want the AI to regurgitate. If used commercially, the reputational and financial risks of an AI replying with racial slurs, bias, and the like are very high, as Microsoft’s experiment with its chatbot Tay showed a few years ago (Microsoft). Unlike supervised models, which require labeled data, generative AI models are self-supervised: they derive their labels from the data itself and learn without human labeling. However, this also increases the risk of the model returning unwanted language (or blatant nonsense).
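To make “deriving labels from the data itself” concrete: in next-token prediction, the training target for each position is simply the word that follows it in the raw text. The sketch below uses naive whitespace splitting instead of real tokenization, purely for illustration.

```python
# A minimal sketch of self-supervised label creation via next-token prediction.
# Naive whitespace splitting stands in for real tokenization.
text = "humans help ai models learn from data"
tokens = text.split()

# Each (context, target) pair is derived from the text itself: no human labeling needed.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs:
    print(" ".join(context), "->", target)
```

The upside is that any text can become training data; the downside is that any text, including the worst of it, can become training data.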
To reduce the likelihood of such scenarios upfront, even AI like ChatGPT has received help from humans. Human language is ambiguous: expressions can have a double meaning or depend on context, so someone has to make a judgment call, and that someone is a human. Companies like OpenAI, the creator of ChatGPT, have assigned humans to spot violations of the content guidelines during model training. That means reviewing and flagging violating content and language so it can be filtered out in production.
Last week, TIME published a story about this kind of content review for ChatGPT training being done by human workers in Kenya, covering the working conditions, the impact on the people doing the work, and the hourly pay. I highly encourage you to read it to get a perspective that is not often talked about.
Wherever there’s light, there’s shadow
The AI field is currently going through its next hype cycle with generative AI, and a lot of light shines on the opportunities ahead. But human data labeling and content review are necessary to increase the accuracy of even the most popular generative AI models, and the type of content that people have to review, together with the impact this work has on their mental health, are the darker sides of the AI hype we need to be aware of.
Want to learn what the biggest trends in AI & Intelligent Automation are this year?
When: Tuesday, January 31, 2023 at noon EST | 18:00 CET
» Join 480+ participants LIVE on LinkedIn and have your questions answered on the air. «
Follow me on LinkedIn for daily posts about how you can set up & scale your AI program in the enterprise. Activate notifications (🔔) and never miss an update.
Together, let’s turn hype into outcome. 👍🏻
—Andreas