Since publishing the AI Leadership Handbook this fall, several readers have asked me whether there is also an audiobook edition available. So far, the answer has been: “Unfortunately, not.” So far.
Looking at the stats, though, audiobooks are a growing category (at a 26.4% CAGR until 2030 [1]). Leaders and professionals in particular favor audiobooks as a way to upskill and consume information on the go. So, it was about time I looked into creating one. Here is what I’ve found along the way and how AI will further reshape the publishing industry, if publishers let it.
Disrupting Audiobook Production with Voice Cloning
Recently, another author who’s worked with a publisher shared that their publisher released an audiobook narrated by a voice actor. The benefit is that a professional speaker records it—top pronunciation and intonation included. But for many authors (including myself), it lacks authenticity.
For self-published authors, the process is largely the same. Amazon/Audible holds about 40% of the market, followed by Spotify. Traditionally, authors sign up on the platform and post their audiobook projects. Voice actors audition for the narration, and authors select the one they’re most comfortable with. The entire project can easily cost between $2,000 and $5,000.
Within the last year, Amazon/Audible has invited a small group of voice actors to participate in its early adopter AI program. As part of it, voice actors can clone their voice using Amazon’s technology and partially automate the narration [2]. However, authors who want their audiobook narrated in their own voice cannot yet use a voice clone there: they must either work with a professional human voice actor or sit down and narrate it themselves. Given strict requirements for audio files, bit rates, audio engineering, etc., it is not a trivial process. But why?
Voice cloning, the technology to narrate your own audiobook, already exists. It’s commercially available, and it costs just a fraction of what a human voice actor charges per hour. Reason enough to try it out myself.
Andreas Welsch uses real-world knowledge and examples from interviews with over 60 leaders and experts in AI to help you both introduce and incorporate AI into your organization, from aligning it with your business strategy to turning new-to-AI employees into passionate multipliers to making sure humans stay at the center of your AI use. After listening to this book, you will be able to confidently implement AI in your business, no matter your industry.
Recording High-Quality Audio Anywhere and Anytime with AI
I’ve been following ElevenLabs for a while. In my experience, their AI-enabled text-to-speech platform is currently the best on the market. ElevenLabs allows users to create a “Professional Voice Clone,” a synthetic replica of their own voice. The quality is amazing. (My children could not tell the difference when I played an AI-generated recording of my voice.)
Recording myself for about 40 minutes while reading blog posts gave ElevenLabs enough data to create my voice clone. The longer the sample, the better the output quality. In less than two hours, ElevenLabs had automatically fine-tuned a model that I could then use to narrate the manuscript.
From there, production was straightforward: create an audiobook project in ElevenLabs and paste in the chapters of the manuscript. Select the narrator’s voice (e.g., your own), adjust the parameters that control the variance in the generated audio, and generate the audio paragraph by paragraph.
Voice settings in ElevenLabs audiobook project
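If you want to prepare the manuscript before pasting it in, the paragraph-by-paragraph workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ElevenLabs’ API: the names `Paragraph`, `split_into_paragraphs`, and `estimate_credits` are hypothetical, and the default of one credit per character is an assumption you should check against your own plan.

```python
import re
from dataclasses import dataclass


@dataclass
class Paragraph:
    """One unit of narration: a single paragraph of a chapter."""
    chapter: str
    index: int
    text: str


def split_into_paragraphs(chapter: str, body: str) -> list[Paragraph]:
    """Split a chapter on blank lines and drop empty blocks."""
    blocks = re.split(r"\n\s*\n", body)
    non_empty = [block.strip() for block in blocks if block.strip()]
    return [Paragraph(chapter, i, text) for i, text in enumerate(non_empty)]


def estimate_credits(paragraphs: list[Paragraph], credits_per_char: float = 1.0) -> int:
    """Estimate the credits needed to narrate the paragraphs.

    Assumption: credits scale linearly with the number of characters.
    """
    return round(sum(len(p.text) for p in paragraphs) * credits_per_char)


if __name__ == "__main__":
    sample = "Leaders need a plan.\n\nAI is a team sport.\n"
    paragraphs = split_into_paragraphs("Part I", sample)
    for p in paragraphs:
        print(f"{p.chapter} / paragraph {p.index}: {len(p.text)} characters")
    print("Estimated credits:", estimate_credits(paragraphs))
```

Working with one paragraph at a time also makes it easy to regenerate a single passage without touching the rest of the chapter.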
Here’s an example from the final audiobook using ElevenLabs: “Part I: Building Leadership Foundations”
You might not realize it, but your human voice can sound different in the morning or the evening, when you’re excited or when you’re tired. But not your AI voice clone. It’s consistent. It’s always available (day or night), even on the go, and you don’t need to carry your recording studio equipment around. ElevenLabs even lets you export your files in the format that audiobook platforms expect (bit rate, etc.), fully engineered and mastered.
Currently, the majority of distributors explicitly prohibit the distribution of AI-narrated audiobooks, except audiobooks.com and Rakuten Kobo [3]. Once the audio files are ready for export in ElevenLabs, you can upload them to these distributors to build and publish your new audiobook.
Using Voice Cloning as an Economic Alternative
Voice actors are currently enjoying protection from publishers and distributors. There could be several reasons for this:
Volume: Publishers would otherwise be flooded with AI-generated content without having proper technology or mechanisms to audit every submission.
Quality: Publishers are concerned about declining quality in published audiobooks, which could lead to a drop in users. (We’ve all heard the robotic text-to-speech voices of the past that lack emotion and proper intonation, or have seen the ChatGPT-generated books on Amazon.)
Revenue: Publishers don’t share in the economics of creating AI-narrated audiobooks (unlike when they broker voice actors).
Employment: Actors’ unions and associations [4] are advocating heavily for humans to remain the dominant force in recording.
Whether or not to use an AI-generated voice becomes another economic decision. This will be especially true as the technology improves further, shorter audio samples suffice to create a voice clone, and the cost of computing resources continues to drop. As with most economic decisions: if it can be done cheaper (while maintaining the same quality), it will be done cheaper.
However, the business model for voice actors and narrators can change as well. Pricing does not have to be purely transactional (per credit). Instead of being paid “Per Finished Hour (PFH)” of content (the finished product), they can use a clone of their own voice to complete narrations faster and charge for the uniqueness of that voice.
My Learnings From This Project
Producing the audiobook edition of the AI Leadership Handbook myself has been a valuable learning experience, from assessing available voice cloning methods and tools to adapting parts of the manuscript for a narrated version, reviewing distribution agreements, and finally publishing the finished product.
From start to finish, the process took about three to four days: half a day to adapt the manuscript, two days to complete the AI narration, and one day to review distribution agreements and set up the audiobook. Add to that the cost of $99 per month for ElevenLabs’ Pro plan (required for high bit rate output and 500k credits).
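To put these numbers side by side with the $2,000 to $5,000 quoted earlier for a human voice actor, here is a rough back-of-the-envelope sketch in Python. It makes simplifying assumptions: only one month of the subscription is counted, my own time and equipment are not priced in, and the names `Estimate`, `total_days`, and `cost_ratio` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """A cost range in US dollars for producing one audiobook."""
    label: str
    cost_low_usd: float
    cost_high_usd: float


# Figures quoted in this article.
AI_STEPS_DAYS = {
    "adapt manuscript": 0.5,
    "AI narration": 2.0,
    "review agreements and set up": 1.0,
}
# Assumption: a single month of the Pro plan covers the whole project.
ai_clone = Estimate("AI voice clone", 99, 99)
voice_actor = Estimate("Human voice actor", 2000, 5000)


def total_days(steps: dict[str, float]) -> float:
    """Sum the duration of all production steps."""
    return sum(steps.values())


def cost_ratio(baseline: Estimate, alternative: Estimate) -> tuple[float, float]:
    """Return how many times more the baseline costs (lowest and highest case)."""
    lowest = baseline.cost_low_usd / alternative.cost_high_usd
    highest = baseline.cost_high_usd / alternative.cost_low_usd
    return lowest, highest


if __name__ == "__main__":
    low, high = cost_ratio(voice_actor, ai_clone)
    print(f"Total time: {total_days(AI_STEPS_DAYS)} days")
    print(f"A voice actor costs roughly {low:.0f}x to {high:.0f}x the subscription")
```

Under these assumptions, the three steps add up to 3.5 days, and the voice actor range works out to roughly 20 to 50 times the subscription cost.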
Using voice cloning has allowed me to create a final product with the necessary diligence, quality, and consistency in a fraction of the time and for a fraction of the cost while having the flexibility to record it anywhere—for example in the parking lot of my local grocery store.
If you enjoy listening to audiobooks or want to see for yourself how far voice cloning has come, I hope you enjoy listening to the AI Leadership Handbook as much as I have enjoyed producing it.
Findaway Voices by Spotify is another option for AI-generated narration. They distribute to several other sites, but again, the prerequisite is using their AI narration. And lastly, Google allows authors to have their audiobook narrated by AI as well.
If you’re looking to get maximum reach for your audiobook, dominant platforms like Amazon/Audible and Findaway Voices by Spotify are the way to go (for the time being). For author-narrated audiobooks, you will either need to invest the time to record it or distribute your book via publishers such as Kobo who are ahead of the curve. The AI genie is out of the bottle, and there is no putting it back. Publishers know it, and they’re trying to limit the damage (and control the opportunity), for now.