The Next Wave In Generative AI Deployment: AI Agents
Six Common Types Of Agents And Concrete Examples For Businesses
The last few weeks have brought several innovations in foundation models, including announcements from OpenAI, Google, and Anthropic. What most of the coverage has been missing: the bigger picture. Yes, these models are another leap forward, especially those that are multi-modal such as OpenAI GPT-4o and Google Gemini. But it’s not about building better chatbots.
It’s rather the answer to “What’s next in Generative AI?” After the initial scenarios, such as generating, summarizing, and translating text (and other types of media), are implemented, the next level of capabilities is just around the corner. And with it comes the next level of productivity gains.
It’s just that this time, it won’t be automating clicks (Robotic Process Automation), individual approval steps in a process (Machine Learning), or language tasks (large language models). This next phase is all about using agents to automate tasks that involve limited uncertainty and complexity. So, let’s jump in…
Six Common Types of AI Agents
Agents are software components that can make decisions under uncertainty based on defined objectives and interact with their environment. Agents have existed for decades.
For example, the thermostat in your house is an agent. A sensor measures the current room temperature, and if the reading falls outside a defined threshold (e.g., colder than what you have set it to), the thermostat fires up your heating until the target temperature is reached.
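In code, that sense-decide-act loop is tiny. Here is a minimal sketch in Python; the target temperature and tolerance are illustrative values:

```python
# The thermostat's sense-decide-act loop; target and tolerance are illustrative.
TARGET_TEMP_C = 21.0   # the temperature you have set
TOLERANCE_C = 0.5      # acceptable deviation before the agent acts

def thermostat_step(current_temp_c: float, heater_on: bool) -> bool:
    """One cycle: sense the temperature, decide, return the new heater state."""
    if current_temp_c < TARGET_TEMP_C - TOLERANCE_C:
        return True        # too cold: fire up the heating
    if current_temp_c >= TARGET_TEMP_C:
        return False       # target reached: switch the heating off
    return heater_on       # in between: keep doing what we were doing

heater_on = thermostat_step(current_temp_c=19.2, heater_on=False)  # -> True
```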
But Generative AI adds unique, new abilities to agents: they use Generative AI models to understand an abstract goal, divide it into subgoals, evaluate possible options for achieving these subgoals, and execute the steps necessary to do so.
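Here is a simplified sketch of that plan-and-execute loop in Python. The `llm` and `execute` functions are hypothetical placeholders for a model call and a tool layer; real agent frameworks add memory, error handling, and guardrails on top:

```python
def llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to a Generative AI model."""
    raise NotImplementedError

def execute(step: str) -> str:
    """Hypothetical placeholder for carrying out a step via tools or APIs."""
    raise NotImplementedError

def run_agent(goal: str) -> list[str]:
    # 1. Understand the abstract goal and divide it into subgoals.
    subgoals = llm(f"Break this goal into subgoals, one per line: {goal}").splitlines()
    results = []
    for subgoal in subgoals:
        # 2. Evaluate possible options and pick the next step.
        step = llm(f"Propose the single best next step for: {subgoal}")
        # 3. Execute the step and collect the outcome.
        results.append(execute(step))
    return results
```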
Agents are built into applications and available as stand-alone extensible frameworks. They can execute tasks on a user’s behalf, which is an important value proposition for businesses that are always looking to increase the level of automation in their operations.
Agents understand the environment they are situated in and its state. How do they work? Different types of agents process information differently. The following types are ordered by an increasing scope of perceiving and interacting with their environment:
Simple reflex agents operate based on a direct mapping of situations to actions, making them suitable for environments with clear cause-and-effect relationships.
— Example: A thermostat. Or basic e-mail filtering based on user-defined “If-Then-Else” conditions.
Model-based agents maintain an internal model of the world, enabling them to plan and adapt to changes more effectively.
— Example: Risk assessment simulating multiple future scenarios and conditions.
Goal-based agents prioritize actions that lead to desired outcomes, ideal for tasks with specified objectives.
— Example: Inventory optimization to determine optimal reorder time and quantity based on market and sales data.
Utility-based agents evaluate actions based on utility functions, making decisions that maximize overall performance or satisfaction.
— Example: Pricing strategy optimization to maximize revenue while accounting for inventory levels and willingness to pay (see the sketch after this list).
Learning agents improve their performance over time through experience, making them valuable in dynamic and unpredictable environments.
— Example: Fraud detection identifying evolving types of anomalies and suspicious transactions.
Hierarchical agents utilize multiple levels of agents to manage complex tasks efficiently, suitable for intricate systems with various subsystems.
— Example: Sales preparation compiling a customized pitch based on historical transactions, preferences, financial data, market data, white spaces, etc.
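To make the decision logic tangible, here is a minimal sketch of a utility-based agent for the pricing example above. The demand curve, candidate prices, and numbers are illustrative assumptions, not a production pricing engine:

```python
def expected_demand(price: float, willingness_to_pay: float) -> float:
    """Toy demand curve: the share of buyers shrinks as the price rises."""
    return max(0.0, 1.0 - price / (2 * willingness_to_pay))

def utility(price: float, inventory: int, willingness_to_pay: float) -> float:
    """Expected revenue, capped by what the inventory can actually serve."""
    demand_units = expected_demand(price, willingness_to_pay) * 100  # market of 100 buyers
    return price * min(demand_units, inventory)

def choose_price(candidates: list[float], inventory: int, willingness_to_pay: float) -> float:
    # The agent scores every possible action and picks the utility-maximizing one.
    return max(candidates, key=lambda p: utility(p, inventory, willingness_to_pay))

best_price = choose_price(candidates=[19.0, 24.0, 29.0], inventory=80, willingness_to_pay=25.0)
print(best_price)  # -> 24.0 under these toy assumptions
```

The pattern is similar for the other types; what changes is how much state the agent keeps about its environment and how it scores its options.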
A Glimpse at Agents in Business
The majority of tasks in a business are knowledge-driven. Whether it is knowledge of where to find information (location), who to talk to (contact), or how to perform a certain task (process), that knowledge relies on information, and on the data that information is based on.
The space has accelerated since the first LLM-based agent frameworks were released last year. So far, emerging agents have largely been driven by text-based prompts, which is not the most efficient way to implement them.
Take a Generative AI-based application that takes images as input and generates audio output. First, what the AI recognizes in an image is converted to text; then, that text is processed by a (text-based) LLM for analysis and reasoning; finally, the generated output is passed to a voice model. Each of these conversions adds delay and cost.
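As a sketch, that text-bridged pipeline is a chain of separate model calls; the three helper functions below are hypothetical stand-ins, and each hop adds latency and cost:

```python
def image_to_text(image: bytes) -> str:
    """Hypothetical vision-model call: image in, caption out."""
    raise NotImplementedError

def reason(caption: str) -> str:
    """Hypothetical text-based LLM call for analysis and reasoning."""
    raise NotImplementedError

def text_to_speech(reply: str) -> bytes:
    """Hypothetical voice-model call: text in, audio out."""
    raise NotImplementedError

def answer_about_image(image: bytes) -> bytes:
    caption = image_to_text(image)   # hop 1: vision model
    reply = reason(caption)          # hop 2: text-based LLM
    return text_to_speech(reply)     # hop 3: voice model

# A natively multi-modal model would collapse these three hops
# into a single call: image in, audio out.
```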
These are key reasons why early multi-modal AI assistants such as the Rabbit R1 or the Humane AI Pin have received mixed to bad reviews, and why consumer excitement has cooled. Due to ongoing miniaturization and scale effects, we can expect the next iteration of these and similar products to be significantly more capable.
In business, multi-modal AI agents could support scenarios like these:
Customer support: Your business manufactures coffee makers. A user interacts with your company’s chatbot and uploads an image of their coffee maker that has stopped working. Using visual search, the chatbot calls an AI agent to identify the model. Based on the model information, it pulls the relevant troubleshooting information from the device’s user manual and returns the recommended steps to the user, who can follow them to get their coffee maker working again (see the sketch after these examples).
Competitive positioning: Your product marketing team wants to understand how their closest competitor messages their product based on publicly available videos. Pointed to a set of videos on YouTube, the AI agent processes the transcripts, analyzes the style and tone, compiles a summary of the anticipated target audience, and explains why the chosen phrases and examples resonate. Taking it a step further, the AI agent can propose effective strategies to counter the positioning — for example, in spoken language, like a marketing consultant.
Sales pitch: You are a junior manufacturing sales rep selling configurable products. You prepare for a meeting with your customer. Your AI assistant (agent) gathers information about the customer, recent financial statements, news, and contracts, and generates a brief for you. Key players at the company you’re going to meet with are also added, as well as a few tips on structuring the conversation for maximum impact.
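The customer support scenario above can be sketched as a small agent with two capabilities; `identify_model` and `troubleshooting_steps` are hypothetical helpers standing in for a visual search service and a manual-retrieval step:

```python
def identify_model(photo: bytes) -> str:
    """Hypothetical visual search: maps a product photo to a model ID."""
    raise NotImplementedError

def troubleshooting_steps(model_id: str, issue: str) -> list[str]:
    """Hypothetical retrieval: pulls the matching section from the user manual."""
    raise NotImplementedError

def support_agent(photo: bytes, issue: str) -> list[str]:
    model_id = identify_model(photo)               # which coffee maker is it?
    return troubleshooting_steps(model_id, issue)  # steps the chatbot returns to the user
```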
Multi-modal models like OpenAI GPT-4o provide AI agents with multi-modal capabilities to perceive their environment in even richer, more complete ways, and they enable them to act in more dimensions beyond just text.
Summary
Generative AI is more than chatbots on steroids. The most recent advancements allow AI agents to perceive and interact with their environment and make decisions under uncertainty.
From six common types of agents to examples of where we will use them in business, recent announcements surrounding multi-modal capabilities will enable the next wave of Generative AI beyond generating, summarizing, and translating text.
Businesses will benefit from agents that can perceive and interact with their environment in modalities other than text, ensuring a richer user experience and a broader set of capabilities for which these agents can be used.
What would you like to learn more about regarding agents?
Become an AI Leader
Join my bi-weekly live stream and podcast for leaders and hands-on practitioners. Each episode features a different guest who shares their AI journey and actionable insights. Learn from your peers how you can lead artificial intelligence, generative AI & automation in business with confidence.
Join us live
June 25 - Srujana Kaddevarmuth, Senior AI CoE Leader, will be on the show to discuss how you can scale your enterprise AI products.
July 02 - Jérémy Ravenel, Founder of naas.ai, will share what to look for when building AI agents for business functions.
Watch the latest episodes or listen to the podcast
Follow me on LinkedIn for daily posts about how you can lead AI in business with confidence. Activate notifications (🔔) and never miss an update.
Together, let’s turn HYPE into OUTCOME. 👍🏻
—Andreas