Airbnb Engineering

Scaling beyond one: How Airbnb evolved its data architecture for a multi-product world

How Airbnb’s data engineers and analytics engineers built a consistent and flexible data modeling framework to support the expansion into Homes, Experiences, and Services.

By: Patrick Lam, Namrata Lamba, Jamie Stober

With the May 2025 Summer Release, Airbnb redesigned its app, relaunched Experiences, and debuted Services, pushing us beyond our traditional Homes focus. For the data teams, this meant rapidly evolving a decade-old infrastructure to integrate two brand-new product pillars. Our data engineers and analytics engineers rose to the challenge by building a consistent and flexible framework to serve as a robust and scalable data foundation for the next decade of growth.

But getting there wasn’t straightforward. This fundamental shift surfaced a critical question for our data organization: How do you evolve your offline data architecture to support new product lines without introducing disorder in vital analytics services?

We knew the approach we took would have long-lasting implications. A fragmented strategy risked creating data silos, inconsistent analytics, and a tangled web of technical debt that would likely slow down future innovation.

In this post, we’ll take you behind the scenes to share key decisions that we made, the framework that emerged, and the lessons that helped reshape our offline data warehouse for the future. Note that we focus specifically on our offline data warehouse (the analytics-oriented data infrastructure owned by our data engineers and analytics engineers) rather than the online data systems that serve the app directly, as the two domains have fundamentally different requirements, constraints, and design philosophies that warrant separate treatment.

The core dilemma: separate vs. monolithic

The first and most critical question was how to structure offline data for the new, three-product world, with Homes, a refreshed Experiences product, and the new Services offering.
This involved a trade-off between two main approaches:

Separate data models: This approach creates distinct sets of tables for each product line, keeping the data for each business clean and highly tailored, but incurring a higher incidence of duplicated logic across models.

Monolithic model: This approach combines all product lines into a single, unified set of tables to maximize code reusability and ensure consistency, but risks becoming unwieldy and less well-suited to the unique attributes of each product.

It became clear that neither approach was universally superior. The optimal choice depended heavily on the specific business domain. A model that was perfect for guest data, for example, would be suboptimal for payments data.

We chose a path that balanced consistency with flexibility. We established a framework that combined firm, centralized principles with decentralized modeling guidelines, empowering each data team to make the right choice for its domain.

Our three foundational principles

To ensure a baseline of consistency across all teams, and to keep the door open for any new product categories that emerge in the future, we established three foundational principles. These principles ensured that no matter which modeling path a team chose, the results would be consistent, scalable, and easy for all data consumers to understand:

Principle 1: No hybrid data models. A domain’s data model had to be either completely separate by product type or completely monolithic. We viewed this choice between two distinct paths as key for future scalability. It was important to avoid confusing, inconsistent situations down the road where some products might use combined data tables while others do not.

Principle 2: Consistent identifier naming. To ensure reliable table joins and prevent confusion, we established a strict convention where the structure of primary identifiers was directly dependent on the modeling choice.
Teams using separate models were required to use product-specific IDs (e.g., id_experience, id_service). In contrast, teams using a monolithic model had to use a generic product descriptor ID (e.g., id_product_listing) and include a product type column (e.g., dim_product_type) to differentiate between Homes, Experiences, and Services.

Principle 3: Clear namespace organization. We used namespaces to define clear placement for every table. Core, product-specific tables were placed in dedicated product namespaces while monolithic, cross-cutting tables lived in a global namespace. This structure was supplemented by team-specific namespaces, giving individual teams the flexibility to manage their own assets and intermediate tables.

These principles set firm boundaries for our modeling efforts and ensured a consistent foundation across the company.

The modeling guidelines

With the foundational principles in place, we gave each team a set of guidelines to help them decide how to model their data. This empowered them to pick the right model for their specific domain, using a common set of consideratio
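
To make the identifier convention from Principle 2 concrete, here is a minimal Python sketch of how such a rule could be checked against a table's column list. It is an illustration under stated assumptions, not a description of Airbnb's actual tooling: the `Table` class, `ModelingChoice` enum, `check_identifier_naming` function, and the sample table names are all hypothetical. Only the column names (`id_experience`, `id_service`, `id_product_listing`, `dim_product_type`) come from the examples in the post. The post does not name the Homes identifier, so it is left out, and treating a mix of generic and product-specific IDs as a violation is our reading of Principles 1 and 2 rather than a stated rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class ModelingChoice(Enum):
    SEPARATE = "separate"
    MONOLITHIC = "monolithic"


# Column names taken from the examples in the post; the set is illustrative,
# not exhaustive (the Homes identifier is not named in the text).
PRODUCT_SPECIFIC_IDS = frozenset({"id_experience", "id_service"})
GENERIC_ID = "id_product_listing"
PRODUCT_TYPE_COLUMN = "dim_product_type"


@dataclass(frozen=True)
class Table:
    name: str  # hypothetical table name
    columns: FrozenSet[str]


def check_identifier_naming(table: Table, choice: ModelingChoice) -> List[str]:
    """Return the Principle 2 violations for a table (empty list if none)."""
    problems: List[str] = []
    specific = table.columns & PRODUCT_SPECIFIC_IDS

    if choice is ModelingChoice.SEPARATE:
        if not specific:
            problems.append(f"{table.name}: missing a product-specific ID")
        # Assumption: a separate model should not also carry the generic ID.
        if GENERIC_ID in table.columns:
            problems.append(f"{table.name}: generic ID in a separate model")
    else:
        if GENERIC_ID not in table.columns:
            problems.append(f"{table.name}: missing {GENERIC_ID}")
        if PRODUCT_TYPE_COLUMN not in table.columns:
            problems.append(f"{table.name}: missing {PRODUCT_TYPE_COLUMN}")
        # Assumption: a monolithic model should not use product-specific IDs.
        if specific:
            problems.append(f"{table.name}: product-specific ID in a monolithic model")

    return problems


if __name__ == "__main__":
    # Hypothetical tables used only to exercise the check.
    separate = Table("experience_bookings", frozenset({"id_experience", "ds"}))
    monolithic = Table("product_listings", frozenset({GENERIC_ID, PRODUCT_TYPE_COLUMN}))
    print(check_identifier_naming(separate, ModelingChoice.SEPARATE))
    print(check_identifier_naming(monolithic, ModelingChoice.MONOLITHIC))
```

A check like this could run in a schema review or CI step so that a domain's declared modeling choice and its identifier columns stay in agreement, which is the property the convention is meant to guarantee for joins.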
