Why intent prediction needs more than an LLM
Stack Overflow Blog

Ryan sits down with Frank Portman, CTO at Yobi, to talk about why next-token prediction, though great for language, isn't the right inductive bias for forecasting human behavior. They discuss how Yobi builds a "foundation model of behavior" using transformers and graph neural networks instead of chat-style LLMs, and what it takes to run millions of personalization decisions per second while keeping consumer data private.

Yobi is a behavioral AI company building foundation models that predict future behavior for ad tech, marketing, and more. Connect with Frank via fportman.com or at yobi.ai.

From math to machine learning

Ryan Donovan: Let's get to know our guest. How did you get into software and technology?

Frank Portman: It's a good question. I've been doing this for a little while now, but actually way back in the day, I wanted to be a mathematician.

Ryan Donovan: Close enough, right?

Frank Portman: Close enough. I think I have a very similar origin story to many people working in technology - when I was very young, I liked computers, I liked video games, I liked to tinker, but I definitely wasn't one of those people that was coding something at 13. I just knew how to open the terminal on my computer, which was advanced but not too crazy. Technology is cool, but I like math, so I actually went to college for pure math.

I finished and I still find it super interesting, but along the way we had to take a couple of courses outside of our direct major. I thought, okay, let's do applied math - specifically the ones where we have to implement our own matrix multiplication. Suddenly I wasn't the one asking people for help; someone was asking me for help. I thought, maybe there's something here. Writing software and solving problems with software is like a series of mini puzzles - though that's not the software engineering job, which is all about solving business problems.

This was right around the time data science and machine learning were becoming very hot buzzwords. Despite having no formal training or education in them, I had the math and stats background and seemed to be good at the coding thing. Nothing's harder than getting your first job, but ever since then, that's what I've been doing.

Ryan Donovan: Matrix multiplication is pretty hot these days, isn't it?

Frank Portman: It is pretty hot these days. That's what everything comes down to.

Why LLMs fall short for intent prediction

Ryan Donovan: I've heard of folks using LLMs for a sort of generic abstracted next-token prediction where the next token could be intent signals, steps in a process, or behaviors in a chain. Why do you think that is a flawed approach?

Frank Portman: There are a couple of things. One, the nice thing about models is they'll do exactly what you train them to do. That's also the worst thing about models.

It's not clear to me why the inductive bias of gathering all the text in the world, getting very good at predicting long sequences, and then fine-tuning it to be more pleasant or correct in conversation would make these LLMs good for forecasting, prediction, or decision making. They are phenomenal at synthesizing information within your context, writing code, and emergent things like writing rap in the form of Shakespeare.

But modeling intent and making predictions comes down to decision-making under uncertainty. It's not clear to me that the inductive bias of "let's just train to predict the next token" can build that into existence. However, I do think LLMs in combination with the right tools could start to get there - that's why people are excited about agentic everything versus the "one LLM rules them all" approach.

Ryan Donovan: To get this sort of inductive bias, you need a lot of foundational data. I've talked to people doing next-token prediction for intent on very constrained use cases like job hiring. You're looking at more generalized intent prediction - is that right?

Frank Portman: The way I think about what we're building: I think of us as a behavioral AI company, building a foundation model of behavior - which is different from a foundation model of text generation, video generation, or image generation.

The main differences in inductive bias come from the data that goes into the model and the way we train it. The data we train on is proprietary and sensitive - oftentimes tied to an identifier, even if only an anonymous ID or browser session. The model's input isn't always text; sometimes it's just "product A," which we do for privacy considerations.

We're not trying to create a chatbot experience. We're trying to create a base representation that we call "broadly predictive of future behavior." That's our variant of the foundation model. If it's broadly predictive, you can run an ad campaign for a product that was nowhere near the training data set and it does well - the same way that "foundation," for an LLM, means it's pretty good at writing rap in the form of Shakespeare despite never having seen that in the training data.

Behavior prediction beyond ads

Ryan Donovan: When you say behavior prediction, is that just limited to ads?

Frank Portman: No Minority Report. We are not an ad tech company - we are a behavioral AI company. We can talk about why we're working in the ad space to start. I've been here four years now, but I'm new to the ads world. We see ourselves building products anywhere a personalization decision or recommendation could be made. Ads are an economically viable place to play first.

Ryan Donovan: The internet is built on ads, so that makes sense.

The architecture: transformers and graph neural networks

Ryan Donovan: If not LLMs, what sort of models are you working with?

Frank Portman: At the end of the day, these architectures all tend to look very similar. "Attention Is All You Need" is the title of the paper from back in the day, and it's kind of true. We use large-scale transformers. We also use graph models because we have interesting questions around identity that language models don't.

For example, there are anonymous identifiers that we want to connect in some way while still respecting privacy. We have flavors of graph neural networks in our stack, but at the end of the day, attention is all you need - a transformer at scale, the right training process, and the right amount of electricity to train all this.
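To make the graph idea concrete, here is a minimal sketch of one round of mean-aggregation message passing, the basic building block behind many graph neural networks. It is an illustration under stated assumptions, not Yobi's method: the node names, edges, and feature vectors are hypothetical, and the learned weights and nonlinearity of a real GNN layer are left out.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical graph: anonymous identifiers linked by hypothetical edges.
NEIGHBORS: Dict[str, List[str]] = {
    "id_1": ["id_2"],
    "id_2": ["id_1", "id_3"],
    "id_3": ["id_2"],
}

# Hypothetical per-node feature vectors.
FEATURES: Dict[str, Vector] = {
    "id_1": [1.0, 0.0],
    "id_2": [0.0, 1.0],
    "id_3": [1.0, 1.0],
}


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def message_pass(features: Dict[str, Vector],
                 neighbors: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One round: blend each node's vector with the mean of its neighbors' vectors."""
    updated: Dict[str, Vector] = {}
    for node, own in features.items():
        aggregated = mean([features[n] for n in neighbors[node]])
        updated[node] = [0.5 * a + 0.5 * b for a, b in zip(own, aggregated)]
    return updated


if __name__ == "__main__":
    print(message_pass(FEATURES, NEIGHBORS))
```

After one round, each identifier's vector carries information from the identifiers it is linked to, which is what lets a graph model reason about connected nodes together.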

Training process differences

Ryan Donovan: How does the training process differ from a transformer based on language?

Frank Portman: The literal training process - running PyTorch on GPUs distributed in our data centers - is very similar. We have interesting questions in the graphical world where certain tokens are discrete and high cardinality, such as all the websites or specific hashed user IDs.
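One common way to handle discrete, high-cardinality tokens such as websites or hashed user IDs is the hashing trick: map every token into a fixed number of buckets and share one embedding table across them. The sketch below illustrates that general technique; the interview does not say whether Yobi uses it, and the table size, embedding width, and example tokens are all hypothetical.

```python
import hashlib
import random
from typing import List

NUM_BUCKETS = 1024  # hypothetical table size, far smaller than the raw ID space
EMBED_DIM = 4       # hypothetical embedding width

# One fixed-size table shared by every possible token. Random values stand in
# for weights that training would normally learn.
_rng = random.Random(0)
EMBEDDING_TABLE: List[List[float]] = [
    [_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(NUM_BUCKETS)
]


def bucket_for(token: str) -> int:
    """Map an arbitrary string token to a stable bucket index."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def embed(token: str) -> List[float]:
    """Look up the vector for a token, including tokens never seen before."""
    return EMBEDDING_TABLE[bucket_for(token)]


if __name__ == "__main__":
    print(embed("example.com"))
    print(embed("user:3f9a"))
```

The trade-off is that distinct tokens can collide in the same bucket, so the table size balances memory against how often unrelated tokens share a vector.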

We have questions around inductive versus transductive models. An inductive model can induct any new node or row in a satisfying way. A transductive model only has representations for what it was trained with, and you need heuristics to induct new representations into it.

For the behavior side, it's maybe not rapidly changing - a new franchise opens up, we should include it - but the user side changes quite a bit. You're doing new things, diverging from the paths of users you might have looked similar to before. We have to be able to induct new nodes reasonably well, so we spend a lot of time working on architectures that are inductive and not just transductive.
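The inductive versus transductive distinction can be sketched in a few lines. In this illustration, the transductive model is a lookup table keyed by the node IDs seen in training, while the inductive model is a function of a node's features, so it can produce a representation for a node it has never seen. The node names, stored vectors, and weight matrix are hypothetical stand-ins for learned values.

```python
from typing import Dict, List

Vector = List[float]

# Transductive: one stored vector per node that was present at training time.
TRAINED_EMBEDDINGS: Dict[str, Vector] = {
    "user_a": [0.9, 0.1],
    "user_b": [0.2, 0.8],
}


def transductive_embed(node_id: str) -> Vector:
    """Return the stored vector; raises KeyError for any node not seen in training."""
    return TRAINED_EMBEDDINGS[node_id]


# Inductive: a function of the node's features. This hypothetical 2x3 matrix
# stands in for weights that training would normally learn.
WEIGHTS: List[Vector] = [
    [0.5, -0.25, 0.0],
    [0.0, 0.25, 0.5],
]


def inductive_embed(features: Vector) -> Vector:
    """Compute a representation from features, so new nodes need no retraining."""
    return [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]


if __name__ == "__main__":
    print(inductive_embed([2.0, 4.0, 0.0]))  # works for a brand-new node
    try:
        transductive_embed("user_new")
    except KeyError:
        print("transductive model has no representation for user_new")
```

A transductive model needs some heuristic to cover the `KeyError` case, such as averaging the vectors of similar known nodes, whereas the inductive function applies to any node that comes with features.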

LLMs are kind of inductive by definition. Every transformer model, somewhere inside it, has a lookup table of embeddings for its tokens. In the language world, that's maybe hundreds of thousands of tokens. The way those are combined with transformers in contextually aware ways - it's amazing that given 300,000 or 500,000 base tokens, you can get the emergent behavior we see.
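The combination described here, a fixed lookup table plus context-aware mixing, can be sketched with a toy version of scaled dot-product self-attention. This is a simplified illustration: the three-word vocabulary and embedding values are hypothetical, and the learned query, key, and value projections of a real transformer are replaced by the identity.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical tiny vocabulary and embedding lookup table.
VOCAB: Dict[str, int] = {"the": 0, "bank": 1, "river": 2}
EMBEDDINGS: List[Vector] = [
    [1.0, 0.0],  # "the"
    [0.0, 1.0],  # "bank"
    [1.0, 1.0],  # "river"
]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[str]) -> List[Vector]:
    """Look up each token's base vector, then mix vectors by attention weights."""
    x = [EMBEDDINGS[VOCAB[t]] for t in tokens]
    dim = len(x[0])
    outputs: List[Vector] = []
    for query in x:
        # Scaled dot product of this token against every token in the sequence.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in x]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, x)) for i in range(dim)])
    return outputs


if __name__ == "__main__":
    print(self_attention(["the", "bank"])[1])
    print(self_attention(["river", "bank"])[1])
```

The token "bank" starts from the same row of the table in both calls, but its output vector differs depending on its neighbors, which is the contextual awareness the lookup table alone cannot provide.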
