The director of a documentary film uses an AI engine so that his celebrated, deceased subject can speak from beyond the grave: theverge.com.
A musician creates a business built around deepfake technology, letting other musicians engage with her voice: rollingstone.com.
Bedroom producers make “fan fiction” songs featuring the AI-engineered voices of actual stars: billboard.com.
Synthetic voices belatedly catch up with CGI, and all-digital animation may be in our near future: technologyreview.com.
Initial vaguely related thoughts:
All bands start as cover bands.
There’s a whole culture of nightclub performers, cover bands, and actors having careers (or partial careers) being other people.
There’s an uncanny valley between John Fogerty being sued for sounding like himself and the verdict against Robin Thicke and Pharrell Williams in the “Blurred Lines” case.
A lot of the voices of fictional robots and androids in film and television are the voices of humans (see: 2001: A Space Odyssey, WarGames, Max Headroom, Colossus: The Forbin Project, and so on).
The future is especially meaningful when viewed through the lens of the past.
Jamming algorithmic econo
Judging by this recent interaction with my phone, speech-to-text (STT) still has a long way to go. Consider that the message being transcribed here was, with the exception of my name (Marc Weidenbaum, apparently aka “Mark we D Bond”), a rote automated robocall script that thousands of people have no doubt received. (That is “Mark we D Bond,” aka “Mark, we didn’t bun,” aka “Mark, we did one.”) You’d think that after that many attempts by a system to transcribe the same audio over and over and over, the system would have accomplished a closer approximation.
A friend who saw me post this on social media, when it was a slightly more inchoate thought, pointed out that this STT incident was like the machine had written lyrics for the band the Minutemen (an especially ironic observation, since the band’s singer was D Boon and the Minutemen recorded for a label called SST). That idea made me think about the cut-up work of William S. Burroughs, and how with STT you could write rough-draft lyrics for a song by saying words along with the melody, and then have the transcription service make mincemeat gibberish of the rough draft, and then you could sing the resulting mincemeat gibberish with full conviction.
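For the curious, that cut-up workflow can be approximated in a few lines of code. This is a minimal, playful sketch of the idea, not anything Burroughs specified: the function name, the three-word fragment size, and the use of a simple shuffle are all my own arbitrary choices.

```python
import random

def cut_up(text, fragment_size=3, seed=None):
    """Split text into short word fragments, shuffle them, and rejoin.

    A crude software take on the cut-up method: slice the draft
    into fragments, scramble their order, and read the result
    back with full conviction.
    """
    words = text.split()
    fragments = [
        words[i:i + fragment_size]
        for i in range(0, len(words), fragment_size)
    ]
    rng = random.Random(seed)  # seed for a repeatable scramble
    rng.shuffle(fragments)
    return " ".join(word for fragment in fragments for word in fragment)

# A hypothetical rough draft, fed through the scrambler:
draft = "Mark we D Bond Mark we didn't bun Mark we did one"
print(cut_up(draft, seed=1))
```

The output keeps every word of the draft but rearranges the fragments, which is roughly the mincemeat-gibberish effect described above.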
Oh, and the “Prima Newman” sentence was in Spanish, as was part of the preceding sentence. “Newman on the way day” is “número nueve” (number nine, number nine, number nine …). It is fascinating, and revealing, that the automated STT service can’t switch gears particularly well when the speaker, even an automated one, changes language so quickly. For individuals in situations (protests, authoritarian regimes, etc.) where they are trying to avoid STT surveillance (scenario: audio > STT > algorithmic filter > legal/police action), this multilingual approach seems like a potential tactic (along the lines of playing copyrighted material to avoid footage being archived publicly on YouTube, TikTok, etc.).
Even if “I” seem to be off, or even a little off
Time and again I’ve learned that the surest way to get people to interact with technology is for them to experience the technology not just as a useful tool, but as a presence that is eager to help. I saw a fascinating presentation a few years back by a researcher who showed that simply adding a pair of stick-on googly eyes to a device would significantly increase the likelihood that people would interact with it.
There is a lot of discussion about matters of gender in the roles of today’s personal digital assistants, such as Siri and Alexa, though less so about the tone of those interactions, the balance they strike between authoritative resource and obsequious servant. The intelligences that animate our phones and “smart home” devices walk a tightrope that is suspended across the deep chasm we have come to call the Uncanny Valley (the scenario in which certain approximations of the human by the digital have the opposite effect of the googly eyes: repelling us rather than enchanting us).
As time passes, these digital assistants will serve as interpersonal middleware, along the lines of the Google Duplex service, which can initiate and make a call on your behalf, communicating a request to someone on the other end — and perhaps at some future date, to a digital assistant serving your intended interlocutor. The two parties’ mutual assistants might have numerous communications before their human guardians ever speak to each other directly.
The humble doorbell, a device that serves as a technological messaging tool, is a model of such interaction. The traditional intercom, by extension, facilitates communication without itself participating, to varying degrees. In the case of this apartment building’s aftermarket solution, one takes for granted that the “I” in “I’m on” is not one of the human inhabitants, but the device itself. This intercom’s screen may have gone dead, but its purpose, its utility, lives on, and someone sorted out that telling the visitor so in an enthusiastic tone would improve the intended interaction.
Also: Note the nose-like protuberance that is the exposed lock mechanism, a bit of chance anthropomorphism. Also: Note that one doesn’t “just” push the apartment number; one also pushes the hashtag (né pound, as in one must pound the button to make certain it has its desired effect).
Nor a robot
I spent an hour in the park and took a half dozen pictures. Then on the walk home I took more than five dozen. If you can’t identify any forest in these images, it’s not because you’re a robot.
What music I make often aspires to the hum of a refrigerator, and apparently my photos aspire to those of a captcha lineup.
From the past week
I do this manually each Saturday, collating recent tweets I made at twitter.com/disquiet, my public notebook. Some tweets pop up (in expanded form or otherwise) on Disquiet.com sooner. It’s personally informative to revisit the previous week of thinking out loud.
▰ Replacing the sonic boom with a “sonic thump,” a new supersonic airplane gets even more streamlined. NASA is due to begin “acoustic validation” tests in two years: cnet.com.
▰ I love the sound of the city digging up the neighbor’s sidewalk with something aggressively pneumatic this early in the morning. It sounds like infrastructure.
▰ Tried to record the sizzle of the bibimbap at lunch, but too much chatter. And that’s OK. Chatter is itself a nice return to reality.
▰ I wonder if the neighbor yelling at the city employee working on the other neighbor’s sidewalk is aware the yelling is in many ways just as disturbing if not more so in the morning.
▰ Have a good weekend, folks. At this rate I’ll have 4,000 photos to cull through by Monday. And I haven’t even started shooting at night yet. Perhaps I’ll have written that many words by then, too.