Google is spending billions to turn its TPU chips into a real challenger to Nvidia
TechSpot

TL;DR: Google is pushing its in-house AI chips much more aggressively, turning years of tensor processing unit development into a direct challenge to Nvidia's hold on the AI hardware market.

For years, Google built its chips mostly for internal use. Those tensor processing units, or TPUs, sat behind products like search and speech recognition, handling some of the company's heavier AI workloads. Now, Google is trying to turn that in-house advantage into a business that can stand up to Nvidia.

One clear example of that shift is in western New York at an AI data-center cluster called Lake Mariner, on Lake Ontario's southern shore near Niagara Falls. Alphabet's Google has provided a $3.2 billion financial guarantee for the project, whose developers plan to rent computing power from thousands of Google's chips to Anthropic, according to people familiar with the matter who spoke to The Wall Street Journal.

The basic playbook is similar to Nvidia's: support data-center financing and then benefit when those sites buy your chips. That kind of financing has become more important as the market for AI compute has tightened. Over the past year, the AI race has become less about models and more about sheer access to computing power.

"You have all these very well-capitalized companies who are big believers that this market around compute is going to have tremendous value," said Nazar Khan, co-founder and chief technology officer of TeraWulf, which is developing Lake Mariner with FluidStack, a Google-backed cloud provider. "They want to be in the game, they don't want to be left behind," Khan told The WSJ.

The origins of Google's TPU program

The story behind Google's push goes back to 2013. Jeff Dean, now chief scientist at Google's DeepMind lab, recalled working on speech recognition systems built on the neural-network techniques that later evolved into today's large language models.

"I said, 'OK, if we want to have this speech model that we roll out to 100 million users, and they use it a few minutes a day, that would require doubling the number of computers Google had,'" he said. "We need to build specialized hardware."

That conclusion helped spur Google's TPU program, which has since produced multiple generations of the chips. Google kept those chips to itself at first, then started offering them through Google Cloud as demand for AI computing exploded. That step helped drive growth in the cloud business and set the stage for more direct competition with Nvidia.

The seventh-generation TPU and market positioning

Research firm SemiAnalysis asked in a November note whether the release of Google's seventh-generation TPU – which Anthropic uses to train its models – marked "the end of Nvidia's dominance." Google's latest moves suggest it is willing to test that question.

Google recently struck a $5 billion deal with Blackstone to create a new cloud-services business designed to compete with Nvidia-aligned providers such as CoreWeave and Nebius. It has also decided to sell chips directly to customers rather than only through its cloud and has rolled out its first TPU designed specifically for inference.

Mark Lohmeyer, vice president of AI and computing infrastructure for Google Cloud, said the new inference chip and improvements in how TPUs work across different systems have generated new interest in using them. "We're seeing a set of customers that might not have considered it in the past," he said.

Citadel Securities, a longtime Google Cloud client, recently began using TPUs for some of its research software. Josh Woods, the firm's chief technology officer, said the company can run key workloads at 30% lower cost and up to four times faster with TPUs.

Nvidia's response and market dynamics

Nvidia, for its part, is not treating TPUs as an existential threat. The company is still estimated to control more than 90% of the AI chip market, helped by its CUDA software stack and a hardware ecosystem that many AI labs already rely on.

In an April appearance on podcaster Dwarkesh Patel's show, CEO Jensen Huang said Nvidia has a much wider reach than any custom chip or ASIC. "I would love to hear them demonstrate the cost advantage of TPUs," he said. "It makes no sense in my mind."

Some cloud providers worry they are locked into Nvidia's full stack, concerned that shifting spend elsewhere could cost them access to its most coveted chips. Adam Fisher, a partner at Bessemer Venture Partners, said some so-called neo-clouds fear ending up in what insiders half-jokingly call "Jensen jail."

"Not all the Nvidia neo-clouds would say it this way – some would say Nvidia gives them what they need – but there are others that are dying for something else, but they can't get it from another supplier," he said.

Google's financial commitment and leadership

Google is trying to counter that inertia with money and focus. The company has said it plans to raise $85 billion in equity, largely to support AI infrastructure. It is backing another Anthropic-related project called River Bend, a $7 billion development near Baton Rouge, La., and is providing $1.4 billion in guarantees for an AI computing lease in Colorado City, Texas.

Inside Google, the TPU business has taken on a higher profile under Amin Vahdat, who in December became chief technologist in charge of the company's AI infrastructure build-out. His portfolio now spans chip design, supply and deployment, and he reports to both Google Cloud chief Thomas Kurian and Alphabet CEO Sundar Pichai.

People who have worked with him describe Vahdat as demanding but quietly competitive, and say he is pushing for steady performance gains and clearer commercial goals for Google's silicon efforts.

Vahdat says he is not setting out to knock Nvidia off its pedestal. Google still runs Nvidia GPUs in its data centers, and he describes the relationship as both cooperative and competitive. "For me and for us, it's not zero-sum," he said. "There's so much demand out there."

With AI workloads growing faster than any one supplier can handle, that may be the opening Google needs. If Google keeps improving its chips, lines up long-term customers such as Anthropic and Citadel, and uses its balance sheet to help build data-center capacity, TPUs could become more than an internal tool. They could be a genuine second option in a market that has mostly belonged to Nvidia.
