How KQED is using AI to explore the future of news audio


This post was originally published on Google’s The Keyword

Radio reaches more Americans every week than any other platform. Public radio stations in the United States have over 3,000 local journalists, and each day they create audio news reports about the communities they serve. But news audio today is in a position similar to that of newspaper articles in the 1990s: hard to find and difficult to sort by topic, source, relevance or recency. News audio cannot afford to delay improving its discoverability.

In partnership with Google, KQED and KUNGFU.AI, an AI-services provider and leader in applied machine learning, ran a series of tests on KQED’s audio to determine how we might reduce the errors and time to publish our news audio transcripts and, ultimately, make radio news audio more findable. 

“One of the pillars of the Google News Initiative is incubating new approaches to difficult problems,” said David Stoller, partner lead for News & Publishing at Google. “Once complete, this technology and associated best practices will be openly shared, greatly expanding the anticipated impact.” 

What makes finding audio so much harder?

For news audio to be searched or sorted, the speech must first be converted to text. This added step is trickier than it seems, and it currently puts news audio at a disadvantage when it comes to being found quickly and accurately. Transcription takes time, effort and bandwidth from newsrooms, none of which are in abundance these days. Though there have been great advances in speech-to-text, when it comes to news, the bar for accuracy is very high. As someone who works to make KQED’s reporting widely available, I find it frustrating when KQED’s audio isn’t prominent in search engines and news aggregators.

The challenge of correctly identifying who, what and where

For our tests, KQED and KUNGFU.AI applied the latest speech-to-text tools to a collection of KQED’s news audio. News stories try to address the “five Ws”: who, what, when, where and why. Unfortunately, because AI typically lacks the context in which the speech was made (i.e., the identity of the speaker, the location of the story), one of the most difficult challenges of automated speech-to-text is correctly identifying these types of proper nouns, known as named entities. KQED’s local news audio is rich in references to named entities: topics, people, places and organizations specific to the Bay Area region. Speakers use acronyms like “CHP” for the California Highway Patrol and shorthand like “the Peninsula” for the area spanning San Francisco to San Jose. These are especially difficult for artificial intelligence to identify.

When named entities aren’t understood, machine-learning models make their best estimation of what was said. For example, in our test, “The Asia Foundation” was incorrectly transcribed as “age of Foundations,” and “misgendered” was incorrectly transcribed as “Miss Gendered.” For news publishers, these are not just transcription errors but editorial problems that change the meaning of a topic and can cause embarrassment for the news outlet. This means people have to go in and correct these transcriptions, which is expensive to do for every audio segment. Without transcriptions, search engines can’t find these stories, limiting the amount of quality local news people can find online.
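As a simple illustration of how a newsroom might patch such errors after the fact, known mis-transcriptions could be kept in a lookup table and replaced in a post-processing pass. The sketch below is hypothetical (the article doesn't describe KQED's actual tooling); the two phrase pairs are the examples given above.

```python
import re

# Known mis-transcriptions mapped to the correct text.
# These two pairs are the examples from KQED's tests; a real list
# would be maintained by editors and grow over time.
KNOWN_CORRECTIONS = {
    "age of Foundations": "The Asia Foundation",
    "Miss Gendered": "misgendered",
}

def apply_corrections(transcript: str, corrections: dict[str, str]) -> str:
    """Replace known mis-transcriptions in a transcript, ignoring case."""
    for wrong, right in corrections.items():
        pattern = re.compile(re.escape(wrong), flags=re.IGNORECASE)
        transcript = pattern.sub(right, transcript)
    return transcript

print(apply_corrections(
    "A grant from age of Foundations supported the program.",
    KNOWN_CORRECTIONS,
))
# → A grant from The Asia Foundation supported the program.
```

A static lookup like this only fixes errors an editor has already seen; it doesn't teach the model anything, which is exactly the gap the feedback loop described below is meant to close.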

Illustration: a proposed process for audio transcription in which a human’s corrections to the first-pass transcript feed back into the model, making future transcriptions clearer and more accurate.

A machine learning ↔ human ↔ machine learning feedback loop

At KQED, our editors can correct common machine learning errors in our transcripts. But right now, the machine-learning model isn’t learning from its mistakes. We’re beginning to test out a feedback loop in which newsrooms could identify common transcription errors to improve the machine-learning model. 
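The article doesn't specify how that feedback loop would be built; one hypothetical shape for it is sketched below. Editors log each fix, recurring fixes are applied to new transcripts right away, and the accumulated log can be exported as phrase pairs for retraining the speech-to-text model. The class and method names are illustrative, not KQED's.

```python
from collections import Counter

class CorrectionLog:
    """Hypothetical newsroom-side feedback loop: record editor fixes,
    apply recurring ones to new transcripts, and export the log so the
    speech-to-text model can learn from its mistakes."""

    def __init__(self, min_count: int = 2):
        self.counts = Counter()     # (wrong, right) -> times corrected
        self.min_count = min_count  # only recurring fixes are auto-applied

    def record(self, wrong: str, right: str) -> None:
        """An editor corrected `wrong` to `right` in a transcript."""
        self.counts[(wrong, right)] += 1

    def apply(self, transcript: str) -> str:
        """Apply fixes that editors have made at least `min_count` times."""
        for (wrong, right), n in self.counts.items():
            if n >= self.min_count:
                transcript = transcript.replace(wrong, right)
        return transcript

    def export_training_pairs(self) -> list[tuple[str, str]]:
        """Phrase pairs, most frequent first, for model adaptation."""
        return sorted(self.counts, key=self.counts.get, reverse=True)

log = CorrectionLog()
log.record("Miss Gendered", "misgendered")
log.record("Miss Gendered", "misgendered")
print(log.apply("The guest said she was Miss Gendered on air."))
# → The guest said she was misgendered on air.
```

The `min_count` threshold is one guess at a safeguard: a single editor's one-off fix shouldn't silently rewrite every future transcript, but a correction seen repeatedly is probably a systematic model error worth both auto-applying and feeding back into training.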

We’re confident that in the near future, improvements to these speech-to-text models will help convert audio to text faster, ultimately helping people find audio news more effectively.

If you are interested in learning more, you can read the full report here. Thanks to the project collaborators: Lowell Robinson and Ethan Toven-Lindsey at KQED; Andy Cheatwood at KPCC; Freyja Balmerk at WNYC; David Moore at WBUR; Brin Winterbottom at NPR; and Andrew Kuklewicz and Brandon Hundt at PRX.

Tim Olson works on strategic planning and partnerships for KQED, such as Google News on Assistant. He is active in public media, including on the Digital Infrastructure Group and the Major Action Committee. Tim has worked at multiple public media organizations, including KCTS in Seattle and WTTW in Chicago.
