Generative AI Has A Growing Problem: Humans
How AI Optimizes Task Work, And What That Means For Generative AI Model Quality
There’s a common perception that artificial intelligence knows everything, can do anything, and will eventually take over humankind. This perception is especially prevalent for the latest type of AI: generative AI. But I’m not so convinced.
Here's what most people outside the tech industry don't know: there's still a considerable amount of human labor involved in training AI systems. Task workers doing crowd-sourced labor bear the brunt of it. They ensure that the data AI models are trained on is actually useful, and that the models' responses align with the policies of the organization that builds them (OpenAI, in the case of ChatGPT).
In other cases, vendors apply content filtering techniques and have people review the information to moderate and prevent unwanted input and output. Earlier this year, I wrote about the less talked-about relationship between humans and AI and the need for AI supply chain transparency. But why do AI systems still rely on human labor?
Why Foundation Models Need Humans
The so-called foundation models that power generative AI technology need lots of data — high-quality, accurate data. But that's not what reality looks like. Data tends to be messy and incomplete. Think of the data your sales team enters in your Customer Relationship Management (CRM) system: it is rarely immediately usable for building models. AI models also need to learn from new data, and from examples of what is good and what is bad.
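To see why, here is a minimal sketch, with made-up field names, of the kind of validation and de-duplication a raw CRM export might need before any model training can start:

```python
# Minimal sketch of why raw CRM data is "rarely immediately usable"
# for model building: records must be validated and de-duplicated first.
# Field names are illustrative assumptions.
raw_crm_records = [
    {"company": "Acme Corp", "deal_size": 50000, "stage": "closed-won"},
    {"company": "", "deal_size": None, "stage": "closed-won"},            # incomplete
    {"company": "Acme Corp", "deal_size": 50000, "stage": "closed-won"},  # duplicate
]

def is_complete(record: dict) -> bool:
    # A record is usable only if every required field has a value.
    return all(record.get(field) for field in ("company", "deal_size", "stage"))

seen, clean = set(), []
for record in filter(is_complete, raw_crm_records):
    key = tuple(sorted(record.items()))  # stable fingerprint for de-duplication
    if key not in seen:
        seen.add(key)
        clean.append(record)

print(clean)  # one usable record out of three raw ones
```

Only after steps like these, and often after human review of the ambiguous records, does the data become training-ready.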
There are plenty of humans who provide that data, for example on social media. This is one of the key reasons why companies like Meta are launching AI products that acquire and learn from your data (e.g., Threads). Others are limiting the number of posts that can be viewed in a given period (e.g., X) to prevent misuse or exploitation by AI, or are updating their terms and conditions to allow publicly posted information to be used for future model training (also X).
Social media represents a cross-section of people and perspectives — and hence, of language and worldviews as a whole. This also means the training data contains polarizing, extreme views and content, which is unwanted in language models that are supposed to be more neutral and used for a variety of tasks. It's a major reason why generative AI needs human review1.
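To make that concrete, here is a minimal sketch of what routing scraped training data through human review might look like: an automated score handles the clear-cut cases, and borderline examples are escalated to a task worker. The toy scorer, term list, and thresholds are illustrative assumptions, not any lab's actual pipeline.

```python
# Minimal sketch of curating scraped training data with human review.
# The scorer and thresholds are illustrative assumptions.
POLARIZING_TERMS = {"hateterm", "conspiracyterm"}  # hypothetical flag list

def polarization_score(post: str) -> float:
    """Toy stand-in for a learned classifier that scores extreme content."""
    hits = sum(term in post.lower() for term in POLARIZING_TERMS)
    return min(1.0, hits / 2)

def triage(post: str) -> str:
    score = polarization_score(post)
    if score >= 0.9:
        return "exclude"        # clearly unwanted: drop from the training set
    if score >= 0.3:
        return "human_review"   # borderline: route to a task worker
    return "include"            # clearly fine: keep for training

for post in ["a harmless post", "a post with hateterm in it"]:
    print(post, "->", triage(post))
```

In practice the scorer would be a learned classifier, but the triage pattern (automate the clear cases, escalate the borderline ones) is exactly where the human labor comes in.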
Today, AI models rely on human-created data for training. Historically, human task workers have reviewed images to identify objects that AI by itself could not identify properly.
For example, insurance companies rely on task workers to identify and mark areas of car damage in images taken by owners and adjusters. This information is then used to train a model. It's important that the model identifies areas of damage with a high level of accuracy: in this example, it can mean the difference between paying to replace a headlight and paying to replace the entire front bumper. Humans reviewing the training data can identify difficult cases more easily and thereby help improve model accuracy.
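As a rough illustration of how such review can work, the sketch below compares a hypothetical human-drawn damage box against a model's prediction using intersection over union (IoU). The coordinates and the 0.5 threshold are assumptions for illustration, not an insurer's actual pipeline.

```python
# Sketch of checking a model's damage prediction against a human label
# using intersection over union (IoU). Values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)

def iou(a: Box, b: Box) -> float:
    """Overlap between two boxes: 0 = disjoint, 1 = identical."""
    inter = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    union = a.area() + b.area() - inter.area()
    return inter.area() / union if union > 0 else 0.0

human_label = Box(100, 40, 220, 120)  # task worker marked the damaged headlight
model_pred = Box(90, 35, 400, 160)    # model marked most of the front end

# Low agreement with the human label marks a difficult case: route it
# back for human review and use the corrected label for retraining.
if iou(human_label, model_pred) < 0.5:
    print("flag for human review")
```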
Generative AI For Increased Productivity Of Task Workers
Task work is often anonymous. Task workers typically only have insight into the scope of the task they are working on — not the larger project or the company that will use their work. This can be isolating, and task workers can lack a sense of purpose.
They might work on several tasks on a given day: reviewing images of car damage for an hour, finding invoice numbers in PDFs for another, and then labeling or summarizing transcripts.
With the advent of generative AI-driven productivity gains, however, task workers have started using this technology to speed up their own work as well2 and complete more tasks per day — and hence, increase their pay.
The Vicious Cycle Of AI Learning From AI
The issue with humans using AI to produce that data: the results of their work feed the very foundation models whose output they were hired to improve in the first place. It's a vicious cycle.
If left unchecked, incorrect results accumulate and become entrenched in these models. The foundation models are then used to generate code, create or summarize text, and more. But it's exactly those errors that humans are needed to review and correct. When the reviewers themselves lean on AI, the purpose of their review is defeated, and a downward spiral begins.
Layer on top of that the earlier finding that foundation models learning from data created by other foundation models (instead of humans) degrade with every model generation3, and the cycle becomes even more vicious.
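A toy simulation makes that degradation tangible: fit a simple Gaussian model to data, sample synthetic data from the fit, refit on the synthetic data, and repeat. This is a drastically simplified analogue of the cited findings3, not a reproduction of the paper's experiments.

```python
# Toy illustration of AI learning from AI: fit a Gaussian to data,
# sample synthetic data from the fit, refit on the samples, repeat.
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=0.0, scale=1.0, size=20)  # generation 0: human-created data

for generation in range(1, 21):
    mu, sigma = data.mean(), data.std()      # "train" a model on the current data
    data = rng.normal(mu, sigma, size=20)    # next generation sees only model output
    if generation % 5 == 0:
        print(f"generation {generation:2d}: fitted sigma = {sigma:.3f}")
```

On typical runs, the fitted sigma drifts well below the original 1.0: the variety present in human-created data erodes once models only ever see other models' output.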
Where does that leave us if these foundation models are expected to power large parts of our economy but could actually be compromised?
Develop your AI leadership skills
Join my bi-weekly live stream and podcast for leaders and hands-on practitioners. Each episode features a different guest who shares their AI journey and actionable insights. Learn from your peers how you can lead artificial intelligence, generative AI & automation in business with confidence.
Join us live
September 28 - Steve Wilson, Project Lead, OWASP & Chief Product Officer, Contrast Security, will join to discuss the most common vulnerabilities of LLM-based applications and how to mitigate them.
October 12 - Matt Lewis, Chief AI Officer, will discuss how you can grow your role as a Chief AI Officer.
October 24 - Harpreet Sahota, Developer Relations Expert, will join us to talk about augmenting off-the-shelf LLMs with new data.
November 07 - Tobias Zwingmann, AI Advisor & Author, will share which open source technology you need to build your own generative AI application.
Watch the latest episodes or listen to the podcast
Find me here
September 28 - Moderator of WowDAO Worldwide AI & Web3 Summit, AI Hackathon Winner Ceremony.
October 11 - Put Generative AI to Work, Unveiling Tomorrow's Possibilities – Insights from 30 AI Visionaries on the Future of Generative AI in Business.
Follow me on LinkedIn for daily posts about how you can lead AI in business with confidence. Activate notifications (🔔) and never miss an update.
Together, let’s turn HYPE into OUTCOME. 👍🏻
—Andreas
1. Veselovsky et al. (2023), Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks, last accessed: 31 July 2023
2. TIME, The Workers Behind AI Rarely See Its Rewards. This Indian Startup Wants to Fix That, last accessed: 28 July 2023
3. Shumailov et al. (2023), The Curse of Recursion: Training on Generated Data Makes Models Forget, last accessed: 04 September 2023