Abstract
The global podcast market is undergoing a structural transformation as audiences increasingly consume audio content through visual platforms. With YouTube surpassing one billion monthly podcast viewers and over half of Americans having watched a podcast, the medium can no longer afford to remain invisible in thumbnail-driven discovery environments. This article examines the visual turn in audio from the perspective of 1UpMedia, a Singapore-based production house specialising in transforming narrative podcasts into video. Drawing on the company's collaboration with BBC World Service on AI-animated adaptations of Witness History - which generated approximately 80,000 organic views in one month, including 51,000 views on a single two-year-old archival episode - the article presents original data on audience behaviour, comment sentiment analysis, and algorithmic compounding effects. It argues that narrative audio faces a distinct packaging problem that differs from talk-show podcasting, proposes a human-guided AI production framework, and identifies five trends shaping the future audio landscape. The article also presents early-stage visual explorations for European public broadcasters including NRK (Norway), Radio France, and ARD (Germany), and reflects on the implications for broadcasters across the Asia-Pacific region.
1. Introduction: When Audio Learned to Show Its Face
At Radiodays Europe 2026 in Riga, I opened my session with a game for the audience: two truths and one lie about podcasting.
· YouTube is the largest podcast platform.
· Younger audiences discover shows through video.
· Narrative audio is dead.
The audience identified the lie immediately, but the exercise made the point. The first two statements are so well established that they barely provoke debate. What remains contested is the third: the persistent assumption that narrative audio - documentaries, audio dramas, investigative series, storytelling podcasts - is in structural decline.
That assumption is wrong. But it persists because of a real and urgent problem that the industry has been slow to address. The problem is not that audiences dislike narrative audio. The appetite is there - for audio dramas, documentaries, fiction, and deep-dive storytelling. The problem is packaging. Discoverability is broken for narrative audio. In a media ecosystem where discovery is overwhelmingly driven by visual signals - thumbnails, autoplay, scrolling feeds - content that cannot show its face risks becoming invisible. Narrative audio cannot compete in a thumbnail-first world, and that is what this article is about.
As the founder of 1UpMedia, a Singapore-based production house that specialises in transforming narrative podcasts into video, I have spent several years working at the intersection of audio storytelling, visual design, and emerging AI technology. Our clients include the BBC World Service and Mediacorp, Singapore's national media network. We became the first podcast production house to be approved for generative AI productions with the BBC. Our own productions - from the award-winning true crime series Heinous to its visualised counterpart Grim Asia - have given us a front-row seat to a transformation that I believe will reshape the audio industry permanently.
2. The Market Shift: Why Audio Is Going Visual
2-1. The Platform Data
The data from the past two years paints a remarkably consistent picture across every major research firm and platform. Edison Research's Infinite Dial 2025 report found that 73% of Americans aged twelve and over have consumed a podcast in either audio or video form. Critically, 51% have watched a podcast - this is now a majority activity, not a niche behaviour. Weekly podcast consumption in the United States reached 40% in 2025, an all-time high, and total time spent with podcasts has grown by 355% since 2015.1)
YouTube reported over one billion monthly podcast viewers in early 2025 and has emerged as the single most-used platform for podcast discovery in the United States. A Deloitte study found that approximately 27% of American consumers were watching video podcasts weekly by autumn 2025, with Generation Z and Millennials leading adoption. Among Gen Z, 59% consume podcast content on YouTube, making it their most-used podcast platform.2)
Spotify's catalogue now includes nearly half a million video podcast shows, up from approximately 250,000 in mid-2024. Over 390 million users have engaged with video podcast content on the platform. The Spotify Partner Programme, launched in January 2025, catalysed an 80% increase in video consumption within its first year.3) Apple Podcasts began supporting video episodes in early 2026 - meaning every major platform now treats video as core infrastructure, not an experiment.
- 1) Edison Research. (2025). The Infinite Dial 2025. Edison Research. https://www.edisonresearch.com/the-infinite-dial-2025/
- 2) Deloitte. (2026). Technology, media and telecom predictions 2026: Video podcasts dominate. Deloitte Insights. https://www.deloitte.com/
- 3) Spotify. (2026, January). Spotify Partner Program and video podcast growth. Spotify Newsroom.
| Platform | Key Metric | Period |
|---|---|---|
| YouTube | 1 billion+ monthly podcast viewers | Early 2025 |
| YouTube | 700M+ hours of video podcasts streamed on TVs (monthly) | October 2025 |
| Spotify | ~500,000 video podcast shows | Late 2025 |
| Spotify | 390M+ users engaged with video podcasts | Q3 2025 |
| Spotify | 80% increase in video consumption post-Partner Programme | Jan 2025–Jan 2026 |
| Apple Podcasts | Video episode support launched | Early 2026 |
YouTube users streamed over 700 million hours of video podcasts on their televisions in October 2025, nearly double the figure from a year earlier.4) Deloitte predicts that global podcast advertising revenues will reach approximately US$5 billion in 2026, up nearly 20% year-on-year. The message from the platforms and the market is unambiguous: the future of podcasting is multi-format, and video is no longer optional.
- 4) EMARKETER. (2026, February 27). FAQ on podcasting: Video's rise, CTV growth, and what it means for advertisers in 2026. EMARKETER. https://www.emarketer.com/
2-2. The Discovery Problem for Narrative Audio
Here is what most industry commentary misses: the visual turn does not affect all podcasts equally. Celebrity-hosted talk shows and interview formats have adapted relatively easily. They point a camera at the host, capture the conversation, and publish the footage. This is why the current video podcasting landscape is dominated by personality-led shows - it is simply easier to film people talking.
Narrative audio - the kind of podcasting that represents the medium's highest artistic achievement - faces a fundamentally different challenge. There is no host sitting behind a desk to film. There are no celebrity guests to thumbnail. There are carefully crafted stories built from archival recordings, sound design, narration, and editorial structure. These are the podcasts that win awards and define what makes audio unique. And they are the podcasts that struggle most in a visual-first world.
This is the gap that 1UpMedia was built to address. We do not add cameras to recording studios. We take narrative audio that was never designed to be visual and transform it into compelling video through motion design, archival visuals, character animation, and art direction. The podcast itself stays exactly as it is. The original MP3 is untouched. We layer visuals on top: scenes and backgrounds, characters and motion, titles and typography. The result is the same audio with visuals that platforms can surface and audiences can discover.
3. From Heinous to Grim Asia: Proving the Model
1UpMedia began as a full-service podcast production house working with television networks and media companies in Southeast Asia. Our flagship production, Heinous: An Asian True Crime Podcast, is a co-production with Mediacorp that has run for over 200 episodes. It became Singapore's largest true crime podcast and was recognised at the Asia Podcast Festival. In 2023, we were named Podcast Publisher of the Year by RadioInfo. In 2024, we became the first Asian podcast production house to be nominated for an Ambie - the industry's equivalent of the Emmy - competing alongside Sony Music and Warner Bros Discovery. We have also received Gold and Bronze awards at the New York Festivals.
But our analytics told us that listeners were increasingly discovering audio content through YouTube and social video. So we asked a question that reshaped our business: what if we could take the narrative audio of Heinous and transform it into a fully visualised series - not a podcast with a static image, but a genuine docu-animation that could stand on its own visually while preserving the audio storytelling?
The result was Grim Asia - a fully visualised docu-animation series built from the narrative audio of Heinous. Each episode required a visual language that complemented the audio without overwhelming it. The original audio remained completely untouched - nothing added, nothing removed. Our YouTube channel, built around this visualised narrative content, now generates 100,000 monthly views organically from approximately 20,000 subscribers. These are not views driven by celebrity guests; they are views driven by long-form narrative true crime - precisely the content that conventional wisdom says cannot succeed on YouTube.
One of the most common objections to podcast visualisation is the fear that video will cannibalise the audio audience. We found the opposite. In the year we launched our YouTube channel, our overall audience grew by 63%, despite Heinous already being Singapore's largest true crime podcast. The video audience was largely additive: these were people who would never have discovered the podcast through audio-only platforms. We consistently observed video viewers converting into audio listeners over time. As one YouTube commenter put it, they had been watching for months before deciding to search for the podcast on Spotify, and had been listening ever since. Video does not replace the podcast. It expands the universe.
Grim Asia became more than a spin-off. It became a proof of concept for an entirely new category of content: narrative podcasts reimagined as visual-first experiences for digital audiences. It demonstrated that audio-to-video transformation, when done thoughtfully, could reach audiences who would never have encountered the original podcast, extend the commercial life of existing audio assets, and create entirely new revenue streams. The experience of building Grim Asia - of confronting the creative and technical challenges of narrative visualisation at production scale - is what positioned us for our most significant international collaboration.
4. The BBC Witness History Collaboration
4-1. Production Process and Results
In early 2026, the BBC World Service released five AI-animated video adaptations of episodes from Witness History, its daily narrative history programme. The project was produced by 1UpMedia, and we became the first podcast production house approved for generative AI productions with the BBC. The five episodes were: The World's First Labradoodle, Brazil's Biggest Bank Heist, Ramesses II's 'Mummy Makeover', The Discovery of Lord Sipan in Peru, and Arrested for Playing Football in Brazil.5)
Our production process follows a strict principle: the podcast stays exactly as it is. The original MP3 is the foundation. We do not re-record, re-edit, or alter the audio. We layer visual elements on top: scenes and backgrounds that establish setting and mood, characters and motion that bring the narrative to life, and titles and typography that guide the viewer. For a broadcaster like the BBC, which operates under rigorous editorial standards, this separation between editorial content and visual adaptation is essential.
The results exceeded expectations. Across five episodes in the first month, the visualised Witness History content generated approximately 80,000 organic views with zero paid promotion. But the most striking result came from a single episode. The Lord of Sipan episode - about how a misadventure led to one of the most important archaeological discoveries in the Americas - was a two-year-old audio episode, buried deep in the BBC's back catalogue. After visualisation and publication on YouTube, it pulled 51,000 views in its first month and continued growing.
This finding directly challenges a persistent industry narrative. There is a widespread assumption that long-form audio is declining, that attention spans are too short, and that narrative podcasts are harder to grow. The Witness History data suggests otherwise: a strong piece of narrative audio, even years old, can reach entirely new audiences when properly packaged for visual platforms. The content was always good. What was missing was the packaging.
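To make the weight of the archival episode concrete, the two figures quoted above can be put side by side. This is a minimal sketch that uses only the approximate numbers reported in this section; the function and variable names are illustrative, and the derived values inherit the rounding of the inputs.

```python
# Approximate first-month figures quoted in the text.
TOTAL_VIEWS = 80_000   # all five visualised Witness History episodes
SIPAN_VIEWS = 51_000   # the Lord of Sipan episode alone
EPISODES = 5


def archive_share(episode_views: int, total_views: int) -> float:
    """Fraction of total views contributed by a single episode."""
    return episode_views / total_views


sipan_share = archive_share(SIPAN_VIEWS, TOTAL_VIEWS)
# Average across the four remaining episodes, for comparison.
others_average = (TOTAL_VIEWS - SIPAN_VIEWS) / (EPISODES - 1)

print(f"Sipan share of first-month views: {sipan_share:.0%}")
print(f"Average for the other four episodes: {others_average:,.0f}")
```

On these rounded inputs, the single archival episode accounts for roughly 64% of the month's views, against an average of about 7,250 views for each of the other four.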
- 5) BBC. (2026a, February 24). BBC World Service to launch AI-animated editions of Witness History. BBC Media Centre. https://www.bbc.com/mediacentre/
4-2. What the Data Actually Taught Us
After the initial results came in, I conducted sentiment analysis on the comment sections of the five Witness History videos to understand how audiences were engaging with the content. The core question was whether viewers were discussing the visuals - the animation style, the art direction - or the story itself.
| Category | Share of Comments | Interpretation |
|---|---|---|
| Subject matter (history, characters, events) | 83% | Viewers engaged with the content as storytelling |
| Visual style (animation, art direction) | 17% | Visual layer noted but not dominant topic |
The result was decisive: 83% of comments were about the subject matter; only 17% addressed the animation. On the Sipan episode, the comments mostly debated the archaeological significance of the discovery, corrected terminology, and contributed additional historical context. Among the 17% that addressed the animation, the top comments focused almost exclusively on what worked in the artwork and how it could be improved, rather than raising the anticipated backlash against "Gen-AI slop".
"The animation is beautiful, but it would have been great to see pictures of the artifacts and the site."
— @cafecitoconazucar · 77 likes · May 4, 2026
This finding is important because it reframes the purpose of visualisation. When the visual treatment is done with care - designed to serve the story rather than compete with it - viewers do not engage with it as art. They engage with it as a window. The visuals disappear. The story takes over.
The practical implication is significant: you do not need a masterpiece to unlock YouTube. You need something good enough to make the story accessible on a visual platform. That bar is lower than the industry thinks, and the returns are higher.
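For readers who want to run a similar check on their own comment sections, a split like the 83/17 figure above can be approximated with a simple tally. The sketch below is not the method used for the Witness History analysis, which is not detailed here; it assumes a crude keyword rule, and the keyword list, function names, and placeholder comments are all hypothetical. Only the first sample comment is taken from this article.

```python
from collections import Counter

# Hypothetical keyword list: an assumption for illustration, not the
# rule used in the analysis described in the text.
VISUAL_TERMS = ("animation", "artwork", "art style", "art direction", "visuals")

CATEGORIES = ("subject_matter", "visual_style")


def label_comment(text: str) -> str:
    """Label a comment as visual_style if it mentions any visual term."""
    lowered = text.lower()
    if any(term in lowered for term in VISUAL_TERMS):
        return "visual_style"
    return "subject_matter"


def category_shares(comments: list[str]) -> dict[str, float]:
    """Return the share of comments falling in each category."""
    counts = Counter(label_comment(c) for c in comments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: counts[category] / total for category in CATEGORIES}


sample_comments = [
    # Quoted in the article.
    "The animation is beautiful, but it would have been great to see "
    "pictures of the artifacts and the site.",
    # Placeholders invented for this sketch.
    "Placeholder comment about the tomb's historical significance.",
    "Placeholder comment correcting a term used in the episode.",
    "Placeholder comment adding further historical context.",
]

shares = category_shares(sample_comments)
print(shares)
```

A keyword rule will mislabel comments that mention both the story and the visuals, as the quoted comment does, so in practice a human pass over the labels is still advisable.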
Three additional findings emerged from the data. First, short narrative formats perform well on YouTube. Even episodes of approximately ten minutes can perform as strongly as longer ones, provided they land on a complete narrative beat. The variable is not length but story closure. Second, we observed that at least five episodes are required to trigger YouTube's algorithmic compounding behaviour. Around episodes five and six, earlier episodes began gaining views as newer ones dropped, and the algorithm started recommending the back catalogue. This is a pattern, not a fluke, and it does not occur below a critical mass. Third, the archival opportunity is being almost entirely ignored across the industry. The instinct is to visualise new releases as talk shows, but the compounding behaviour we observed was driven by a catalogue, not a launch strategy. The backlog is where the volume is.
4-3. The BBC's Broader Visual Podcast Strategy
The Witness History project was part of a much broader strategic move by the BBC. In October 2025, Beatrice Cooke, Service Executive for BBC iPlayer, presented at an EBU session on how iPlayer was harnessing podcast visualisation to build brands and deepen audience engagement.6) By March 2026, the BBC had expanded its video podcast strategy significantly, launching new titles across BBC iPlayer, BBC Sounds, and YouTube, including visualised formats for Uncanny with Danny Robins and cross-platform extensions of television brands like Sort Your Life Out and Race Across the World.7)
Internal BBC audience research found that three in five podcast fans had watched a podcast in the past week. Jonathan Kanagasooriam, Managing Editor for Podcast Strategy and Video Podcasts at BBC Sounds, argued that video podcasts allow audio to drive impact across the entire BBC. For a public service broadcaster navigating a rapidly fragmenting media landscape, visualisation has become a strategic necessity.
- 6) EBU. (2025, October 7). Visualised podcasts on BBC iPlayer [Conference session]. EBU Video Talks. https://www.ebu.ch/video-talks/restricted/2025/10/BDMU/visualised-podcasts-on-bbc-iplayer
- 7) BBC. (2026b, March 13). BBC expands video podcast strategy with multiple new launches. BBC Media Centre.
5. AI in the Middle: A Framework for Responsible Production
The role of AI in podcast visualisation is both the most exciting and the most contested element of the visual turn. How the industry handles AI will determine whether this emerging practice earns and maintains audience trust. At Radiodays Europe 2026, I presented our production framework with a formulation that has since generated significant discussion: AI sits in the middle - not at the start, and not at the end.
| Stage | Led By | Function |
|---|---|---|
| 1. Creative Direction | Human | Editorial intent, story understanding, art direction decisions |
| 2. Production & Iteration | AI (with human guidance) | Iterates visual assets, scales production at speed |
| 3. Quality Control & Sign-off | Human | Curation, editorial review, final approval |
Every project starts with a human. Creative direction, editorial intent, and story understanding require human judgement, cultural sensitivity, and editorial accountability. AI cannot decide how to visually interpret a historical event or what emotional tone an animation should strike. AI then amplifies: it iterates on visual assets, scales production at speed, and enables a small team to produce content that would otherwise require a much larger budget. Every project ends with a human. Curation, quality control, and editorial sign-off are non-negotiable. The two questions we ask of every production are: is this starting from a human? And is this ending with a human?
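The two questions can also be expressed as a simple check on a workflow definition. The following is a minimal sketch, assuming each stage is recorded with the party that leads it; the class and function names are hypothetical and do not describe our internal tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    led_by: str  # "human" or "ai"


# The three stages from the framework table above.
FRAMEWORK = (
    Stage("Creative Direction", "human"),
    Stage("Production & Iteration", "ai"),  # AI works under human guidance
    Stage("Quality Control & Sign-off", "human"),
)


def passes_two_questions(stages: tuple[Stage, ...]) -> bool:
    """True if the workflow starts from a human and ends with a human."""
    if not stages:
        return False
    return stages[0].led_by == "human" and stages[-1].led_by == "human"


print(passes_two_questions(FRAMEWORK))      # True
print(passes_two_questions(FRAMEWORK[:2]))  # False: ends on the AI stage
```

A workflow that drops the final sign-off stage fails the check, which is the situation the framework is designed to rule out.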
Our AI principles extend to tool selection. We operate under two firm rules. First, we only use generative AI tools with proper commercial licences. If the terms are unclear or the usage is not above board, the tool does not make the cut. Second, we vet who is behind the technology - whether the developer operates independently and maintains control over their own data.8)
These principles aligned closely with the BBC's own AI Editorial Guidance, built on three core principles: acting in the best interests of the public, prioritising talent and creatives, and being transparent with audiences about the use of technology. The BBC has ruled out the use of generative AI for news stories or factual research, but supports its use for content production under editorial oversight.9) In the Witness History project, transparency was embedded from the outset.
- 8) Yeo, G. J. (2026, March). The visual turn in audio [Conference presentation]. Radiodays Europe 2026, Riga, Latvia.
- 9) BBC. (2025). Editorial guidance on the use of AI. BBC Media Centre. https://share.google/JyEmYKf1O9LecHpum
6. Beyond the BBC: International Explorations
The visual turn in audio is not confined to the English-speaking world. At Radiodays Europe 2026, I presented early visual explorations of how narrative shows from three major European public broadcasters could translate into video formats. NRK's Hele Historien (Norway), Radio France's Les Odyssées (France), and ARD's Kein Mucks! (Germany) each represent different genres and audiences but share a common challenge: rich, award-winning narrative audio that struggles to reach new audiences in a visual-first environment.
Each programme required a different visual approach. A Norwegian history documentary calls for a different aesthetic and pacing than a French children's adventure series or a German audio drama. This is not a one-size-fits-all process; it is a creative practice requiring deep understanding of both the source material and the target audience. The visual language of podcast visualisation - the art styles, the motion design, the use of colour and typography - must be adapted for each programme's editorial identity and cultural context. What works for a BBC World Service audience may not resonate in the same way with a Norwegian or German audience, and the production process must be flexible enough to accommodate these differences.
These explorations are significant because they demonstrate that narrative podcast visualisation is not an Anglo-American phenomenon. Public service broadcasters across Europe are sitting on decades of high-quality narrative audio - much of it produced in languages other than English - that has struggled to find new audiences in the visual-first digital landscape. The Podcast Index indicates that while English accounts for approximately 55% of all podcasts, Spanish, Portuguese, French, and German collectively represent a significant and growing share.10) Visualisation offers a route to unlock this multilingual audio heritage for new audiences.
From Singapore, I observe the Asia-Pacific region presenting a particularly compelling parallel. Mobile-first internet usage is the norm across Southeast Asia. Data costs have fallen dramatically in markets such as Nigeria, the Philippines, and Indonesia, making video streaming accessible to hundreds of millions of new users who were previously limited to audio-only consumption. Social video platforms - YouTube, TikTok, and regionally dominant platforms - are the primary modes of media consumption for younger demographics. In this context, an audio-only podcast is effectively invisible to many potential audiences. They discover content through visual platforms, and if audio content does not exist on those platforms, it simply does not exist for them.
At the Asia-Pacific Broadcasting Union, where I have contributed to discussions on the future of podcasting, there is growing recognition that visualisation represents a major opportunity for public broadcasters across the region to reach younger audiences who are disengaging from traditional radio and television. The global podcast audience is projected to reach approximately 619 million listeners in 2026 (Beamly, 2026), with China, India, and Latin America expected to become increasingly significant markets. For producers and broadcasters who can master narrative podcast visualisation across languages and cultures, the addressable audience is enormous and growing.
7. What Comes Next: Five Trends Shaping the Future Audio Landscape
7-1. The Convergence of Podcasting and Television
The boundary between podcasts and television is dissolving. Netflix and Spotify announced a partnership in late 2025 to bring select video podcasts to Netflix.11) YouTube users streamed over 700 million hours of video podcasts on televisions in a single month, and YouTube CEO Neal Mohan has noted that users now watch content on TVs more than on smartphones.12) As video podcasts compete on the same screens as prestige television, production expectations will rise. Narrative visualisation, with its emphasis on art direction and visual storytelling, is better positioned for this future than talk-show-format video.
7-2. AI-Powered Production at Scale
The cost and complexity of podcast visualisation will continue to fall as AI tools improve. Within a few years, production timelines may shrink further, potentially enabling near-real-time visualisation. The challenge will be developing editorial frameworks that maintain trust. We may also see new categories of AI-assisted visual content: personalised visual accompaniments, interactive narrative layers, and embedded data visualisations.
7-3. Cross-Platform Content Architectures
The most successful audio brands of the next five years will be those with coherent cross-platform content architectures - strategies that design a single piece of content to function across audio, long-form video, short-form social, and interactive formats simultaneously. Rather than creating a podcast and then asking how to promote it, producers will begin with a cross-platform brief from the outset.
7-4. Monetisation Beyond Advertising
Deloitte predicts global podcast advertising revenues of approximately US$5 billion in 2026,13) but the most significant opportunities may lie beyond advertising. Spotify's Partner Programme offers direct creator compensation. Membership platforms and premium feeds are growing. For broadcasters, visualisation opens licensing and syndication opportunities unavailable for audio-only content. A visualised podcast can be sold as a television format, licensed internationally, and repurposed for educational contexts. The Witness History project creates an entirely new class of asset in the BBC's content library.
7-5. Ethical Frameworks and Industry Standards
As AI becomes more deeply embedded in production, the need for ethical frameworks will grow. The BBC's approach provides a strong model, but it is not yet the industry standard. Many production companies use AI without clear guidelines, transparency, or licensing arrangements. The industry needs shared standards for AI disclosure, editorial accountability, and creative attribution. The RSS.com AI disclosure feature and Apple Podcasts' requirement for AI transparency are promising steps,14) but considerably more work is needed.
- 10) PodcastVideos.com. (2026, March 3). AI enhances podcast accessibility: From visuals to disclosure. PodcastVideos.com. https://www.podcastvideos.com/
- 11) Axios. (2025). Netflix and Spotify partner on video podcasts. Axios. https://www.axios.com/2025/10/14/netflix-spotify-video-podcasts-the-ringer
- 12) EMARKETER. (2026, February 27). FAQ on podcasting: Video's rise, CTV growth, and what it means for advertisers in 2026. EMARKETER. https://www.emarketer.com/
- 13) Deloitte. (2026). Technology, media and telecom predictions 2026: Video podcasts dominate. Deloitte Insights. https://www.deloitte.com/
- 14) PodcastVideos.com. (2026, March 3). AI enhances podcast accessibility: From visuals to disclosure. PodcastVideos.com. https://www.podcastvideos.com/
8. Conclusion
At Radiodays Europe, I concluded my presentation with a line that I believe captures this moment: in a visual-first world, the best audio still wins - it just needs to show its face to the world.
This is not a concession to the primacy of video. It is a recognition that the power of audio storytelling - its intimacy, its ability to engage the imagination, its accessibility - can only be fully realised if it is discoverable by the audiences who would benefit from it. In 2026, discoverability is overwhelmingly visual.
The data from the Witness History collaboration speaks clearly. Eighty thousand organic views in a month. Fifty-one thousand views on a two-year-old archival episode. Eighty-three per cent of audience comments engaging with the subject matter, not the animation. And, from our own Grim Asia launch, 63% audience growth in the year video was introduced alongside an established audio brand. These are not outliers; they are indicators of a structural shift.
Every broadcaster, every podcast network, every media company sitting on years of narrative audio should think carefully about what this means for their backlog. Old audio does not stay old if you repackage it. The content was always good. What was missing was the right packaging to find the audiences it deserves.
For those of us who love audio - who believe in its unique capacity to inform, to move, and to connect - the visual turn is not a threat. It is the greatest opportunity we have ever had. Narrative audio is not dead. It just needs to show its face.
The implications extend beyond individual production companies or broadcasters. For the broader media industry - including regulators, policymakers, and public service media organisations - the visual turn raises important questions about how audio heritage is preserved, valued, and made accessible. Billions of hours of narrative audio content exist in archives around the world. Much of this content was produced at significant public expense and represents irreplaceable cultural and journalistic value. Visualisation offers a practical mechanism for unlocking this heritage for new generations of audiences who may never encounter it in its original audio form.
For Korea's broadcasting industry, which has a rich tradition of radio storytelling and audio content production, the visual turn presents both a challenge and an opportunity. Korean broadcasters are navigating many of the same platform dynamics described in this article: the dominance of YouTube among younger audiences, the growth of video-first consumption habits, and the pressure to extend content across multiple formats and platforms. The lessons from the BBC Witness History collaboration - particularly around the use of structured AI workflows, the importance of editorial oversight, and the surprising commercial potential of archival content - may be directly applicable to Korean broadcasters exploring how to extend the life and reach of their own audio catalogues.
We are still learning. Five episodes is a small sample. But the 83/17 sentiment split, the 51,000 archival views, the compounding algorithmic behaviour, and the 63% audience growth are all pointing in the same direction. The direction is clear enough to act on.