Tue. 01/10 – Now We Have VALL-E To Take Over My Podcast Voice
Tech Brew Ride Home
Amalgamated Internets, LLC
4.7 • 1K Ratings
🗓️ 10 January 2023
⏱️ 20 minutes
Transcript
| 0:00.0 | Welcome to the Techmeme Ride Home for Tuesday, January 10th, 2023. I'm Brian McCullough. Today: |
| 0:09.0 | Well, now there's VALL-E, a text-to-speech technology that could fully replace me as this podcast's narrator. |
| 0:16.6 | It looks like Microsoft wants to do everything just short of buying OpenAI entirely. |
| 0:21.9 | More layoffs at Coinbase, why the whole 5G interfering with airplanes thing still |
| 0:26.4 | isn't resolved, and not everything that says it's ChatGPT is really ChatGPT. Here's what you missed today in the world of tech. |
| 0:35.0 | Well, it seems as though once again my instinct to investigate deeper into a topic was perfectly timed. |
| 0:44.6 | Microsoft has unveiled VALL-E, |
| 0:47.3 | a text-to-speech AI model trained on 60,000 hours of English speech that can simulate a person's voice from just three seconds of sample audio. |
| 0:57.0 | Quoting Ars Technica: |
| 0:59.0 | Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything and do it in a way that |
| 1:07.4 | attempts to preserve the speaker's emotional tone. Its creators speculate that |
| 1:12.1 | VALL-E could be used for high-quality text-to-speech applications, |
| 1:16.0 | speech editing where a recording of a person could be edited and changed from a text transcript, |
| 1:22.0 | making them say something they originally didn't, |
| 1:26.3 | and audio content creation when combined with other generative AI models like GPT-3. |
| 1:33.6 | Microsoft calls VALL-E a neural codec language model, |
| 1:38.7 | and it builds off of a technology called EnCodec, |
| 1:42.0 | which Meta announced in October 2022. |
| 1:45.0 | Unlike other text-to-speech methods that typically synthesize speech by manipulating waveforms, |
| 1:51.0 | VALL-E generates discrete audio codec codes from text and acoustic |
| 1:56.3 | prompts. |
| 1:57.6 | It basically analyzes how a person sounds, breaks that information into discrete components called tokens, thanks to EnCodec, |
... |

