Judging by this recent interaction with my phone, speech-to-text (STT) still has a long ways to go. Consider that the message being transcribed here was, with the exception of my name (Marc Weidenbaum, apparently aka “Mark we D Bond”), a rote automated robocall script that thousands of people have no doubt received. (That is “Mark we D Bond,” aka “Mark, we didn’t bun,” aka “Mark, we did one.”) You’d think that after that many attempts by a system to transcribe the same audio over and over and over, the system would have accomplished a closer approximation.
A friend who saw me post this on social media, when it was a slightly more inchoate thought, pointed out that this STT incident was like the machine had written lyrics for the band the Minutemen (an especially ironic observation, since the band’s singer was D Boon and the Minutemen recorded for a label called SST). That idea made me think about the cut-up work of William S. Burroughs, and how with STT you could write rough-draft lyrics for a song by saying words along with the melody, and then have the transcription service make mincemeat gibberish of the rough draft, and then you could sing the resulting mincemeat gibberish with full conviction.
Oh, and the “Prima Newman” sentence was in Spanish, as was part of the preceding sentence. “Newman on the way day” is “número nueve” (number nine, number nine, number nine …). It is fascinating, and revealing, that the automated STT service can’t switch gears particularly well when the speaker, even an automated one, changes language so quickly. For individuals in situations (protests, authoritarian regimes, etc.) where they are trying to avoid STT surveillance (scenario: audio > STT > algorithmic filter > legal/police action), this multilingual approach seems like a potential tactic (along the lines of playing copyrighted material to avoid footage being archived publicly on YouTube, TikTok, etc.).