I may have to change the name of my newsletter again. Shortly after returning from paternity leave, I bid farewell to MSL audio to join the MTIA (custom silicon) program to help Meta kick its Nvidia addiction.
The most consistent piece of feedback I got on my way out was that people hoped I would continue writing this newsletter, so here we are with the first post-voice issue. I hope to keep following voice AI closely but plan on including more content around chips, AI for healthcare, and other topics that have interested me of late.
Spoken LLM from scratch with NanoGPTaudio
Andrej Karpathy took the world by storm with NanoGPT, his simple tutorial for training a chatbot from scratch. Our friends at Kyutai recently extended this repo to audio alongside an excellent blog post explaining the details and tradeoffs around tokenization.
While I haven’t carved out the time to go through the code, Kyutai’s blog post is the clearest explanation and demonstration of different encoding strategies I’ve seen, and it shows that you can get a really good native audio model by gluing together an off-the-shelf OSS tokenizer (Mimi) and some OSS data (Libri-Light). NTP truly is all you need…
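The core idea is worth spelling out: once a neural codec like Mimi discretizes a waveform into tokens, "audio modeling" is the same next-token-prediction objective as text. The toy sketch below is not Kyutai's code; it uses a made-up token stream and a bigram count model standing in for the transformer, just to show that the objective doesn't care where the tokens came from.

```python
from collections import Counter, defaultdict

# Hypothetical stream of discrete audio codes, as a codec tokenizer
# (e.g. Mimi) might emit. The values here are made up for illustration.
audio_tokens = [3, 1, 4, 1, 4, 9, 2, 6, 5, 3, 5, 1, 4, 1, 5]

# Count bigram transitions: for each token, tally what followed it.
counts = defaultdict(Counter)
for prev, nxt in zip(audio_tokens, audio_tokens[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    """Greedy next-token prediction from bigram statistics."""
    return counts[token].most_common(1)[0][0]

print(predict_next(1))  # token 1 is most often followed by 4 -> prints 4
```

Swap the bigram table for a transformer and the made-up integers for real codec tokens, and you have the recipe the blog post describes.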
And after that, someone thinks TTS is worth billions?
On that note, how on Earth is Cartesia raising another $100M? Six months ago, I wrote that Cartesia got lucky: they raised $64M just as the world was figuring out that NTP was all you needed for great TTS. I would have thought the world had figured it out by now.
Yes, Cartesia’s product is good. Yes, they are an interesting bet on SSMs. But there are literally dozens of very strong TTS/voice agent/audio companies right now that are indistinguishable to this guy who has spent years building audio evals. I would love to see their pitch deck and be a fly on the wall in the investor meeting. My best guess is that the market saw Meta’s acquisitions of Play and Waveforms and assumed more corporate dollars were coming.
But that strikes me as fairly optimistic. Until voice has its Nano Banana or Claude Code moment and a breakout consumer product emerges, I can’t imagine big tech continuing this buying spree. Time will tell.
Dr. ChatGPT retires
OpenAI quietly dropped support for healthcare queries in ChatGPT over the weekend. Given the shortage of doctors across the US and world, the collection of evals that show frontier AIs significantly outperform humans at medical diagnostic tasks, and the strong PMF of AI in consumer health, this seems like a bad decision for both OpenAI and the world.
If this doesn’t reverse, it is a huge opportunity for a startup to fill.
Qualcomm’s $25B press release
In the most obvious news of the year, Qualcomm announced that it was working on its own AI accelerator. Despite omitting any mention of specs, prototypes, customers, or launch dates, its stock ripped 11%, adding $25B to its market cap.
Good for them. They are selling what the market is buying.
Minimax cooks and shows NTP is all you need
For those with their heads in the sand, Minimax, the Chinese company that has been sitting on top of the TTS arena for months, just released a very competitive text model, M2. Like most model releases these days, it essentially saturates the benchmarks and appears competitive with the top open and closed labs. Properly evaluating a model takes quite a lot of diligence these days, but after spending 5 minutes with their website and a collection of hard prompts, I generally agree that they are at the frontier.
Even last year, a TTS lab pivoting to releasing a frontier text model would have been unheard of. Today, it barely registers. NTP truly is all you need.
Shots fired in the AI platform wars
Running a platform is hard. The owner needs to charge enough rent to pay the bills but not so much that it discourages application developers. Windows is perhaps the best example: Microsoft both became the largest company in the world and enabled developers to capture, in aggregate, a large multiple of that value. Apple has enabled a new generation of app companies, but many resent paying a 30% toll.
WhatsApp is a strange hybrid. It is both a messaging product that Meta owns and is highly incentivized to grow but also a platform on which many millions of people create businesses. If you’ve traveled in the developing world, you’ve almost certainly encountered businesses that exist solely on/because of WhatsApp. These businesses can pay for extra tools, but in general, the platform is free and both sides benefit when they are on it.
But what does one do when a competitor arrives on your platform? Meta offers Meta AI through WhatsApp, which overlaps almost entirely in features with ChatGPT. OpenAI recognized that many people on WhatsApp might enjoy a conversation with ChatGPT, so it built a bot that seems to be quite successful, perhaps even more so than Meta’s own. What is Meta to do? It’s certainly a hard call and not black and white, but we opted to boot OpenAI off, and 1-800-CHATGPT will have to find a new home.
As these chatbot interfaces start to look more and more like platforms, I do wonder what the future holds and what precedent this will set. If I want to create an OpenAI plugin that competes with some future OpenAI business, will they boot me off?
Is Meta pricing in 67-133% earnings growth?
I’m trying to wrap my head around the scale of AI data center investments across the tech industry. To take Meta as an example, we committed to $600B in spending.
If we assume a discount rate equal to the interest on the bonds we just issued (6.58%), we would need to generate an incremental $40B/year to be NPV-neutral with no depreciation. With 10-year depreciation, this roughly doubles to $80B/year.
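The back-of-envelope math checks out in a few lines. The $600B commitment and 6.58% rate are from above; treating "no depreciation" as a perpetuity and "10-year depreciation" as a level annuity is my reading of the setup:

```python
capex = 600e9   # committed AI data center spend
r = 0.0658      # discount rate ~ coupon on the recent bond issue
years = 10      # assumed depreciation horizon

# No depreciation: the asset lasts forever, so NPV-neutrality just
# requires returning the cost of capital each year (a perpetuity).
perpetuity = capex * r
print(f"${perpetuity / 1e9:.0f}B/year")  # ~$39B/year

# 10-year depreciation: amortize the capex as a level annuity at rate r.
annuity = capex * r / (1 - (1 + r) ** -years)
print(f"${annuity / 1e9:.0f}B/year")     # ~$84B/year
```

The exact figures come out at roughly $39B and $84B per year, which round to the $40B/$80B quoted above.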
The market recently punished the stock to the tune of 10%, but even so, measured against roughly $60B of annual net income, this means we are pricing in 60%-120% earnings growth from AI. Wild times.