By B.N. Frank
Research has indicated that kids’ screen use can harm them in a variety of ways. If you were looking for another reason to monitor and/or reduce your kids’ screen time, this one may be sufficient.
YouTube’s Captions Insert Explicit Language in Kids’ Videos
The AI that transcribes spoken dialog on the platform’s standard version can render “corn” as “porn,” “beach” as “bitch,” and “brave” as “rape.”
Nearly 400,000 people subscribe to the YouTube account Rob the Robot – Learning Videos For Children. In one 2020 video, the animated humanoid and his friends visit a stadium-themed planet and attempt feats inspired by Heracles. Their adventures are suitable for the elementary school set, but young readers who switch on YouTube’s automated captions might expand their vocabulary. At one point YouTube’s algorithms mishear the word “brave” and caption a character aspiring to be “strong and rape like Heracles.”
A new study of YouTube’s algorithmic captions on videos aimed at kids documents how the text sometimes veers into very adult language. In a sample of more than 7,000 videos from 24 top-ranked kids’ channels, 40 percent displayed words in their captions found on a list of 1,300 “taboo” terms, drawn in part from a study on cursing. In about 1 percent of videos, the captions included words from a list of 16 “highly inappropriate” terms, with YouTube’s algorithms most likely to add the words “bitch,” “bastard,” or “penis.”
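The screening the study describes amounts to checking each video’s auto-generated captions against a list of taboo terms. Below is a minimal sketch of that idea; the caption strings and the small taboo set are illustrative placeholders, not the researchers’ actual 1,300-term list or dataset.

```python
# Stand-in for the study's 1,300-term taboo list; these words appear
# in the article's examples of caption errors.
TABOO_TERMS = {"porn", "bitch", "bastard", "penis", "rape"}

def flagged_terms(caption_text, taboo=TABOO_TERMS):
    """Return the taboo terms that appear in a caption transcript."""
    words = {w.strip(".,!?").lower() for w in caption_text.split()}
    return sorted(words & taboo)

# Illustrative transcripts based on errors quoted in the article.
captions = {
    "video_a": "you should also buy porn",       # mis-heard "corn"
    "video_b": "grab your beach towel",          # transcribed correctly
    "video_c": "strong and rape like Heracles",  # mis-heard "brave"
}

flagged = {vid: flagged_terms(text) for vid, text in captions.items()}
share_flagged = sum(bool(t) for t in flagged.values()) / len(captions)
```

Run over a large sample of videos, the `share_flagged` figure corresponds to the study’s headline percentages (40 percent for the full taboo list, about 1 percent for the 16 “highly inappropriate” terms).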
Some videos posted on Ryan’s World, a top kids’ channel with more than 30 million subscribers, illustrate the problem. In one, the phrase “You should also buy corn” is rendered in captions as “you should also buy porn.” In other videos, a “beach towel” is transcribed as a “bitch towel,” “buster” becomes “bastard,” a “crab” becomes a “crap,” and a craft video on making a monster-themed dollhouse features a “bed for penis.”
“It’s startling and disturbing,” says Ashique KhudaBukhsh, an assistant professor at Rochester Institute of Technology who researched the problem with collaborators Krithika Ramesh and Sumeet Kumar at the Indian School of Business in Hyderabad.
Automated captions are not available on YouTube Kids, the version of the service aimed at children. But many families use the standard version of YouTube, where they can be seen. Pew Research Center reported in 2020 that 80 percent of parents of children 11 or younger said their child watched YouTube content; more than 50 percent of those children did so daily.
KhudaBukhsh hopes the study will draw attention to a phenomenon that he says has gotten little notice from tech companies and researchers and that he dubs “inappropriate content hallucination”—when algorithms add unsuitable material not present in the original content. Think of it as the flip side to the common observation that autocomplete on smartphones often filters adult language to a ducking annoying degree.
YouTube spokesperson Jessica Gibby says children under 13 are recommended to use YouTube Kids, where automated captions cannot be seen. On the standard version of YouTube, she says the feature improves accessibility. “We are continually working to improve automatic captions and reduce errors,” she says. Alafair Hall, a spokesperson for Pocket.watch, a children’s entertainment studio that publishes Ryan’s World content, says in a statement the company is “in close and immediate contact with our platform partners such as YouTube who work to update any incorrect video captions.” The operator of the Rob the Robot channel could not be reached for comment.
Inappropriate hallucinations are not unique to YouTube or video captions. One WIRED reporter found that a transcript of a phone call processed by startup Trint rendered Negar, a woman’s name of Persian origin, as a variant of the N-word, even though it sounds distinctly different to the human ear. Trint CEO Jeffrey Kofman says the service has a profanity filter that automatically redacts “a very small list of words.” The particular spelling that appeared in WIRED’s transcript was not on that list, Kofman said, but it will be added.
“The benefits of speech-to-text are undeniable, but there are blind spots in these systems that can require checks and balances,” KhudaBukhsh says.
Those blind spots can seem surprising to humans, who make sense of speech in part by understanding the broader context and meaning of a person’s words. Algorithms have improved their ability to process language but still lack a capacity for fuller understanding—something that has caused problems for other companies relying on machines to process text. One startup had to revamp its adventure game after it was found to sometimes describe sexual scenarios involving minors.
Machine learning algorithms “learn” a task by processing large amounts of training data—in this case audio files and matching transcripts. KhudaBukhsh says that YouTube’s system likely inserts profanities sometimes because its training data included primarily speech by adults, and less from children. When the researchers manually checked examples of inappropriate words in captions, they often appeared with speech by children or people who appeared not to be native English speakers. Previous studies have found that transcription services from Google and other major tech companies make more errors for non-white speakers and fewer errors for standard American English, compared with regional US dialects.
Rachael Tatman, a linguist who coauthored one of those earlier studies, says a simple blocklist of words not to use on kids’ YouTube videos would address many of the worst examples found in the new research. “That there’s apparently not one is an engineering oversight,” she says.
A blocklist would also be an imperfect solution, Tatman says. Inappropriate phrases can be constructed with individually innocuous words. A more sophisticated approach would be to tune the captioning system to avoid adult language when working on kids’ content, but Tatman says it wouldn’t be perfect. Machine learning software that works with language can be statistically steered in certain directions, but it is not easily programmed to respect context that seems obvious to humans. “Language models are not precision tools,” Tatman says.
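A word-level blocklist of the kind Tatman describes is simple to sketch, and the sketch also shows its limitation: single taboo words are masked, but a phrase built from individually innocuous words passes straight through. The blocklist below is an illustrative stand-in, not any platform’s actual filter.

```python
# Illustrative blocklist; a real one would be far larger.
BLOCKLIST = {"bitch", "bastard", "penis", "porn"}

def filter_caption(text, blocklist=BLOCKLIST):
    """Mask individual blocklisted words; leave everything else alone."""
    masked = []
    for word in text.split():
        core = word.strip(".,!?").lower()
        masked.append("[ __ ]" if core in blocklist else word)
    return " ".join(masked)

filter_caption("grab your bitch towel")  # single taboo word gets masked
filter_caption("a bed for the dolls")    # innocuous words pass unchanged,
                                         # even if the phrase they form is not
```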
KhudaBukhsh and his collaborators devised and tested systems to fix taboo words in transcripts, but even the best of those inserted the correct word less than a third of the time for YouTube transcripts. They will present their research at the Association for the Advancement of Artificial Intelligence’s annual conference this month and have released data from their study to help others explore the problem.
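The paper’s actual correction systems are not detailed in this article. Purely as an illustration of why the task is hard, one naive approach is to swap a flagged word for the closest-sounding candidate from a safe vocabulary; here the standard-library `difflib` stands in for a real phonetic or language model, and the vocabulary is a made-up toy list drawn from the article’s examples.

```python
import difflib

# Toy "safe" vocabulary drawn from the mis-heard words in the article.
SAFE_VOCAB = ["brave", "beach", "corn", "buster", "crab", "craft"]

def suggest_replacement(taboo_word, vocab=SAFE_VOCAB):
    """Pick the most string-similar safe word as a replacement guess."""
    matches = difflib.get_close_matches(taboo_word, vocab, n=1, cutoff=0.0)
    return matches[0] if matches else taboo_word

suggest_replacement("bitch")  # -> "beach"
suggest_replacement("crap")   # -> "crab"
```

Surface similarity alone often picks the wrong word, which is consistent with the researchers’ finding that even their best system recovered the correct word less than a third of the time.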
The team also ran audio from kids’ YouTube videos through an automated transcription service offered by Amazon. It too sometimes made mistakes that made the content edgier. Amazon spokesperson Nina Lindsey declined to comment but provided links to documentation advising developers how to fix or filter unwanted words. The researchers’ results suggest those options might be wise when transcribing content for children: “Fluffy” became the F-word in the transcript of a video about a toy; one video host asked viewers to send in not “craft ideas” but “crap ideas.”
Artificial intelligence (A.I.) algorithm biases, inaccuracies, and vulnerabilities are nothing new. In fact, people have been accused and convicted of crimes based on A.I. inaccuracies, including from the use of expensive A.I.-based ShotSpotter technology that’s been installed in many cities. Of course, experts have warned for years about A.I. because of these issues and others.