What's Real Anymore in the Age of Generative AI? — Generating the Voices and Videos
How Generative AI Helped Me Fabricate a Video News Story for $30 in One Afternoon (Part 3)
Recap: Generating the Script and Images
Earlier this month, I turned my curiosity and learning about generative AI into a creative project using publicly available tools. In between running errands, I sat down to try it out. To create the assets, I used nothing but a smartphone wherever the tools supported mobile devices.
In my last post, I shared how ChatGPT created a convincing story, filled in the details, and provided scenes and prompts for other generative AI tools, all within a matter of minutes. I also shared how Midjourney allowed me to create the characters and scenes (and multiple variations of them) just as quickly.
Four generative AI tools helped me create a full video of a fabricated news story in one afternoon. Today, I’ll share the final video and how I used the two additional tools to create it...
Note: On April 25, 2023, MSNBC and others reported on an AI-generated political video ad published by the U.S. Republican National Committee. AI-generated information is therefore no longer a thought experiment about the future but a pressing matter of the present.
» Watch the latest episodes on YouTube or listen wherever you get your podcasts. «
Disclaimer
The following is a creative project that I created using several generative AI tools. Its goals are to start a discourse about the potential of generative AI technologies, to learn and use the tools, to combine the output of one tool with the input of another, and to think about the ethical implications of AI-generated content at scale. The project also aims to prepare AI leaders to discuss these aspects with their business peers. The use of specific tools is neither an endorsement of nor a position on any product; the tools serve as examples of applications for this medium. Finally, the objectives and motives of this project, as well as the views expressed in this post and the linked videos, are my own.
Despite all the good we can expect generative AI to create, it will also make it a lot easier to create and spread misinformation.
“How will we know what’s real anymore? And how can we tell?”
Here is the final video that I created using four generative AI tools:
The Story
PaperSnap Enterprises, the leader in the “digital paper clip” industry, has had an incident at its main manufacturing facility. A toxic substance leaked into the air after PaperSnap cut corners in its safety program. The company tried to keep the incident a secret, but residents learned about it when reporters started arriving in their town and interviewing locals. In light of these revelations, the CEO of PaperSnap is forced to make a public statement in front of the press. This situation and the surrounding circumstances are part of a breaking news segment covered by anchor Peter Miller and reporter Sarah Johnson.
Adding Depth with Publicly Available AI Tools
In yesterday’s post, I covered how ChatGPT and Midjourney created the text and images (total subscription for both: $30/mo.). I had come across ElevenLabs and D-ID through my social feed, and for the next part of creating the video, I used their free trials to create the characters’ voices and animate their lip movements.
3) Giving Characters A Voice
Technology: Text-to-audio
Duration: approx. 15 minutes
Creating the audio tracks for each scene was a matter of minutes: copy and paste each script or segment from ChatGPT’s video talk track into ElevenLabs, select a different voice for each character, generate the audio, and download the file. That’s it.
Unlike text-to-speech from just a few years ago, which sounded choppy, AI can now add emphasis and modulation to the voice. Depending on the voice you choose, you can add more emotion to make it even more realistic. I went with the default settings, and the result sounds quite convincing. What do you think?
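The copy-and-paste routine above can be thought of as a small batch job. The following is a minimal sketch, assuming the talk track is formatted as `SPEAKER: line`; the voice labels, the sample lines, and the `plan_voice_jobs` helper are all hypothetical and not part of ElevenLabs. It only prepares one segment per spoken line and does not call any text-to-speech service.

```python
import re
from dataclasses import dataclass

# Hypothetical character-to-voice mapping; the labels are placeholders,
# not real ElevenLabs voice names.
VOICES = {
    "PETER MILLER": "voice_anchor",
    "SARAH JOHNSON": "voice_reporter",
    "CEO": "voice_ceo",
}

# Assumed script format: an upper-case speaker name, a colon, then the line.
SPEAKER_LINE = re.compile(r"^([A-Z][A-Z .'-]*):\s*(.+)$")


@dataclass
class VoiceJob:
    index: int
    speaker: str
    voice: str
    text: str
    filename: str


def plan_voice_jobs(talk_track: str, voices: dict[str, str]) -> list[VoiceJob]:
    """Split a talk track into one voice job per spoken line."""
    jobs = []
    for raw in talk_track.splitlines():
        match = SPEAKER_LINE.match(raw.strip())
        if not match:
            continue  # skip blank lines and stage directions
        speaker = match.group(1).strip()
        text = match.group(2).strip()
        if speaker not in voices:
            raise KeyError(f"no voice assigned to {speaker!r}")
        index = len(jobs) + 1
        slug = speaker.lower().replace(" ", "_")
        jobs.append(
            VoiceJob(index, speaker, voices[speaker], text, f"{index:02d}_{slug}.mp3")
        )
    return jobs


# Invented sample lines for illustration; not the script from the video.
SAMPLE = """\
PETER MILLER: Good evening. We begin with breaking news.
[Cut to the factory gate]
SARAH JOHNSON: Thanks, Peter. Residents here are worried.
"""

for job in plan_voice_jobs(SAMPLE, VOICES):
    print(job.filename, job.voice)
```

Keeping one job per spoken line mirrors the manual workflow: each segment gets its own voice and its own downloaded audio file.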
Next, I wanted to make the characters even more realistic by adding what seems like actual lip movements.
4) Bringing 2D Characters To Life With Animation
Technology: Image-to-video, audio-to-video
Duration: approx. 15 minutes
The fastest step in the process was turning the characters I had created in Midjourney into videos. To make the characters move their lips to the voiceover, I uploaded the image from Midjourney and the audio file from ElevenLabs. Within 1-2 minutes, the video clip was ready for download. Frontal shots look the most realistic, but D-ID produced surprisingly good results even for characters shown at an angle. (In the free version, the terms and conditions require the watermark to remain visible in the video.)
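Before uploading, it can help to check that every character portrait has a matching audio clip. Here is a minimal sketch, assuming a hypothetical naming scheme in which a character's image and audio file share the same file stem (for example `anchor.png` and `anchor.mp3`). The `pair_assets` helper is an illustration only: it pairs file names and does not call D-ID.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}
AUDIO_SUFFIXES = {".mp3", ".wav"}


def pair_assets(filenames):
    """Pair image and audio files that share a stem; report the rest."""
    images, audio = {}, {}
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            images[path.stem] = name
        elif suffix in AUDIO_SUFFIXES:
            audio[path.stem] = name
    # Stems present on both sides are ready to upload as an image/audio pair.
    pairs = [(images[s], audio[s]) for s in sorted(images.keys() & audio.keys())]
    # Stems present on only one side still need a portrait or a voice track.
    unmatched = sorted(images.keys() ^ audio.keys())
    return pairs, unmatched


pairs, unmatched = pair_assets(["anchor.png", "anchor.mp3", "reporter.jpg", "ceo.mp3"])
print(pairs)
print(unmatched)
```

A character that speaks in several scenes would need one audio clip per scene, so a real project would likely key the pairs on scene and character rather than on the character alone.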
Time to edit it now into a full clip…
5) Putting It All Together
Duration: approx. 60 minutes
Creating the final video was one of the longest tasks in this process, mainly due to editing and to creating additional elements such as the title screen, the breaking news animation, and the credits.
Conclusion
I’ve been writing about various aspects of generative AI and its societal impact, for example: the use of outsourced labor for content moderation, the early hype around prompt engineering, the ethical implications of generative AI, the impact on communication as it gets easier to combine different types of AI-generated media (e.g. text + audio + video), and the responsibility put into the public’s hands.
For this project, ElevenLabs created realistic synthetic voices within a matter of minutes. D-ID allowed me to bring the characters to life even further by animating their head, eye, and lip movements. Both ElevenLabs and D-ID offer free versions of their tools, bringing the cost of generating information down to zero while delivering high-quality output quickly.
This is the third post in a 4-part series in which I share my thoughts and approach to using generative AI to create a video of a fabricated news story. Tomorrow, I’ll share the final post on the ethical and societal implications. Stay tuned for more...
How will you know what’s real anymore? How can you tell?
What’s next?
Appearances
June 8 - Panel discussion with Transatlantic AI eXchange on Web 3.0 Generative and Synthetic Data Application
Join us for the upcoming episodes of “What’s the BUZZ?”:
May 9 - Brian Evergreen, Founder & CEO of The Profitable Good Company and author, will discuss how manufacturing businesses can Create A Human Future With AI.
June 8 - Ravit Dotan, Director of The Collaborative AI Responsibility Lab at the University of Pittsburgh, will join when we cover how responsible AI practices evolve in times of generative AI.
Follow me on LinkedIn for daily posts about how you can set up & scale your AI program in the enterprise. Activate notifications (🔔) and never miss an update.
Together, let’s turn hype into outcome. 👍🏻
—Andreas