Multi-Modal Generative AI Applications In Business — A Glimpse At The Future

A Picture Speaks More Than A Thousand Words — Or Tokens.

Sep 27, 2023

The last 12 months have brought us a tidal wave of AI innovation that, on some days, even feels like a hurricane. Consumers were the first group to ride this wave, through applications like DALL-E 2, Midjourney, ChatGPT, and Bard. Slowly but surely, these generative AI innovations are making their way into business software: image generation with assurances of legal compliance [1], or a new category of AI-powered assistants (aka copilots) that users can interact with through chat.

Most of these tools started out with just one modality: text-to-image or text-to-text. This summer, OpenAI introduced support for image-to-text/code capabilities [2]. All of a sudden, ChatGPT had become multi-modal. And this week, OpenAI again demonstrated the kinds of capabilities you can enable when you combine different modalities. Users (and decision makers) will come to expect these capabilities in their in-house applications as well, so here’s where the potential lies.

Starting Your Work With An Image

OpenAI demoed a pretty powerful use case for consumers: taking a picture of an object (e.g. a bike) and asking ChatGPT how to lower the seat, whether you have the right tools, and exactly which one you need to use [3], all based on images uploaded to ChatGPT. But this combination of modalities becomes even more powerful when you apply the same idea in your business. Because, at the end of the day, time saved equals money saved.

Let’s look at two examples:

Field Service: Take the example of a field service technician. A core part of their job is to maintain machines, devices, or, more generally speaking, assets. Machines that have stopped working mean lost revenue and added repair cost. Every minute you can shave off the time it takes to diagnose and repair a failure translates into tangible business value, measured in dollars. Why? Because not only can you reduce the time it takes to complete the repair (labor), you can also identify the needed parts and tools more quickly and further minimize downtime (missed revenue opportunity).
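To make the "time saved equals money saved" argument concrete, here is a minimal back-of-the-envelope sketch. It assumes the value of a faster repair is simply the labor cost avoided plus the revenue not lost while the machine is down. The function name and all rates are hypothetical, made up purely for illustration.

```python
def repair_savings(minutes_saved: float,
                   labor_rate_per_hour: float,
                   downtime_cost_per_hour: float) -> float:
    """Estimate the value of a faster repair.

    Assumption: savings = hours saved x (labor rate + revenue lost per hour
    of downtime). Real cost models will have more terms than this.
    """
    hours_saved = minutes_saved / 60
    return hours_saved * (labor_rate_per_hour + downtime_cost_per_hour)


# Hypothetical inputs: 30 minutes saved per repair, a technician at 80 dollars
# per hour, and 500 dollars of missed revenue per hour of downtime.
print(repair_savings(30, 80.0, 500.0))  # 290.0
```

Multiply that per-repair figure by the number of service calls per year, and even a few minutes saved per job starts to add up.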

Logistics: Early on in my career, one of my tasks was accepting deliveries of laptops and desktops we had ordered from a distributor. Depending on the number of items (and what else needed to be done at the same time), this could be a pretty tedious task. Now, depending on the industry your company is in, this might not just be a monthly or weekly activity, but a daily or even more frequent one.
For example, your company receives a shipment of motors for the lawnmowers that it manufactures. Before officially accepting the delivery, you need to check that the received quantity of motors matches the ordered quantity. A multi-modal AI capability in your logistics app could help:

Identify the item quantity listed in the shipping documents (text-based processing),
Determine the number of items that are actually on the pallet or trailer (parts of an image),
…and compare the two (simple math).

Innovations like this could significantly accelerate the delivery process. They could also reduce the financial loss from incomplete shipments that are accepted by mistake.
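The three steps of the delivery check can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Detection` class stands in for whatever a vision model would return for the pallet photo, the `Qty: <number> <item>` line format is an invented shipping-document layout, and all function names are hypothetical rather than part of any real logistics product or AI API.

```python
import re
from dataclasses import dataclass


@dataclass
class Detection:
    """One object found in the image (hypothetical vision-model output)."""
    label: str         # object class reported by the model
    confidence: float  # model's likelihood for that label, 0.0 to 1.0


def quantity_from_document(text: str, item: str) -> int:
    """Step 1: read the listed quantity for `item` from the document text."""
    # Assumes a line shaped like "Qty: 24 motor"; real documents will differ.
    match = re.search(rf"qty:\s*(\d+)\s+{re.escape(item)}", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"no quantity found for {item!r}")
    return int(match.group(1))


def count_in_image(detections: list[Detection], item: str,
                   min_confidence: float = 0.5) -> int:
    """Step 2: count detections of `item` at or above the confidence cutoff."""
    return sum(1 for d in detections
               if d.label == item and d.confidence >= min_confidence)


def check_delivery(document_text: str, detections: list[Detection],
                   item: str) -> dict:
    """Step 3: compare the listed quantity with the counted quantity."""
    listed = quantity_from_document(document_text, item)
    counted = count_in_image(detections, item)
    return {
        "listed": listed,
        "counted": counted,
        "difference": counted - listed,
        "accept": counted == listed,
    }


# Made-up example: the document lists 3 motors, but only 2 are confidently
# detected on the pallet (the fourth detection falls below the cutoff).
shipping_doc = "Shipment 001\nQty: 3 motor\n"
pallet_detections = [
    Detection("motor", 0.97),
    Detection("motor", 0.91),
    Detection("pallet", 0.88),
    Detection("motor", 0.32),
]
print(check_delivery(shipping_doc, pallet_detections, "motor"))
```

In this made-up case the check flags a shortfall of one motor, which is exactly the kind of discrepancy you would want a person to review before signing for the delivery.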

Requirements And Potential Benefits

On the surface, these use cases look suspiciously similar to the image recognition/segmentation and object detection use cases from a few years ago, before generative AI. Those models can identify objects in images and tell you the likelihood that what they have identified is in fact a hammer, a screwdriver, or a bike.

But here’s the difference with this generation of AI technology: unlike the previous generation of image recognition, the model can now describe what it recognizes in the image and give you instructions on what to do next, without you having to define or train it upfront. This is huge! It saves time and reduces cost when building generative AI-based applications.

These image-based capabilities will work well for everyday objects like bicycles, which the models have seen plenty of examples of. Businesses looking to use images of their proprietary products (e.g. lawnmowers) as input will likely need to augment off-the-shelf models with their own data. And the availability of usable data will, yet again, be the ultimate test.

In line with recent workplace studies that have found productivity, performance, and quality to increase when employees use generative AI for their work, multi-modal AI capabilities could help various groups in a business along these dimensions, from new hires to junior employees and from exceptional to routine tasks. Add the models’ new ability to process and generate audio, and you can stretch the possibilities even further…

Where would you use multi-modal, generative AI in business?

Thank you for reading The AI MEMO. Feel free to share it with your network.

Together, let’s turn HYPE into OUTCOME. 👍🏻
—Andreas

[1] TechCrunch, Adobe indemnity clause designed to ease enterprise fears about AI-generated art, 26 June 2023, Last accessed: 27 September 2023.

[2] Ars Technica, Report: OpenAI holding back GPT-4 image features on fears of privacy issues, 18 July 2023, Last accessed: 27 September 2023.

[3] OpenAI, ChatGPT can now see, hear, and speak, 25 September 2023, Last accessed: 27 September 2023.