When deep research isn't enough for your business: Sakana AI launches 'ultra deep research' agent for 100+ page reports in 8 hours
Tokyo-based AI startup Sakana AI has officially launched its first commercial product, Sakana Marlin. Billed as a "Virtual CSO" (Chief Strategy Officer), Marlin is an autonomous B2B research agent that deliberately abandons the instantaneous text generation of modern chatbots in favor of deep, long-horizon reasoning.

What sets Marlin apart from the current ecosystem of AI tools is its temporal scale: instead of returning an answer in seconds, it runs continuous, self-governing reasoning loops for up to eight hours at a time to deliver deeply researched, well-cited, 100-page strategy reports and executive slides. The company has posted sample reports generated by Marlin on its product website.

Available immediately via the company's website, with pricing starting at a pay-as-you-go tier, the platform is designed strictly for enterprise use, specifically targeting corporations, financial institutions, and think tanks.

The generative AI hype cycle has largely been defined by speed. For the past two years, the industry standard has been the ability to generate a poem, a line of code, or a surface-level summary in mere seconds. But the enterprise frontier is rapidly shifting from shallow, rapid generation to deep, methodical reasoning. With Marlin, major businesses are no longer asking how fast an AI can answer, but how deeply it can think.

The Product: A Virtual CSO

What exactly is a business getting when it deploys Sakana Marlin? The workflow is fundamentally different from typical large language model (LLM) interactions. Rather than engaging in a tedious back-and-forth prompt-engineering session, the user simply provides a core research topic. Following a brief initial exchange to sharpen the scope and direction of the investigation, the human steps away entirely. For the next several hours, Marlin operates as a self-contained digital strategy team.
It formulates its own initial hypotheses, navigates the web to gather data, cross-references sources to verify findings, and maps the causal dynamics within complex business environments. In effect, it searches for the "winning formula" within a sea of noise.

Think of it less like a search engine and more like a junior strategy consultant locked in a room with a whiteboard and an internet connection. You provide the strategic prompt in the morning, and by the end of the workday, the system delivers a comprehensive, professional-grade portfolio. The final output is not a generic block of text; it is a structured set of strategic options, complete with executive summary slides, appendices, references, and a deeply researched report.

The company highlighted several example use cases to demonstrate Marlin's capacity for complex synthesis, including generating detailed resolution scenarios for a hypothetical blockade of the Strait of Hormuz, mapping out the fragmented global AI regulation patchwork, and analyzing macroeconomic trends like the return of "bond vigilantes."

Sakana says Marlin relies on multiple AI models but did not provide specific model names or providers. I've reached out on X to find out more and will update this article when I receive a response.

The Engine of Long-Horizon Reasoning

Under the hood, Marlin is the commercial culmination of Sakana AI's extensive laboratory breakthroughs over the past two years.
The product is powered by an exploration engine built on Sakana's own earlier research breakthrough, Adaptive Branching Monte Carlo Tree Search (AB-MCTS), and leverages frameworks derived from "The AI Scientist," an earlier Sakana AI research project featured in the journal Nature that automated the scientific discovery process from ideation to peer review.

To understand how this works in practice, consider an analogy: modern chess engines. A chess engine doesn't just look at the board and guess; it plays out thousands of potential future moves, evaluating the strength of each resulting position before committing to an action. Marlin's AB-MCTS engine does something similar for research.

Inside the Engine: The Mechanics of AB-MCTS

The technology traces back to June 2025, when Sakana AI first introduced the framework to the public alongside the research paper "Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search." At that time, to encourage developer experimentation with collective AI intelligence, the company released the underlying algorithm as an open-source software library called TreeQuest, distributed under the permissive Apache 2.0 license. This open-source release laid the technical foundation for what would evolve into the proprietary, enterprise-grade Marlin product a year later.

Traditionally, when developers try to extract higher-quality reasoning from large language models, they rely on a brute-force method called "repeated sampling": running the model dozens of times in parallel and hoping one of the answers is correct. However, repeated sampling operates blindly; it cannot evaluate its own intermediate steps or pivot based on external feedback. AB-MCTS replaces this with a principled, multi-turn approach driven by a Bayesian decision framework.
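To make the contrast concrete, here is a toy Python sketch of repeated sampling next to an adaptive-branching tree search in the spirit of AB-MCTS. It is not Sakana's implementation, does not use the TreeQuest API, and should not be read as the paper's exact formulation. Candidate "answers" are reduced to quality scores in [0, 1], and `generate_fresh` and `refine` are hypothetical stand-ins for LLM calls plus an external evaluator. At each node, Thompson sampling compares a draw for "create a new child here" (going wider) against one draw per existing child (going deeper into that branch). A new child of the root is a fresh draft; a new child anywhere else is a refinement of its parent's answer. The fractional Beta-posterior update is a simplification chosen for illustration.

```python
import random
from dataclasses import dataclass, field


# Hypothetical stand-ins for an LLM call plus an external evaluator.
# An "answer" is reduced to its quality score in [0, 1].
def generate_fresh(rng: random.Random) -> float:
    """Draft a brand-new candidate from scratch (toy: random quality)."""
    return rng.random() ** 2


def refine(parent_score: float, rng: random.Random) -> float:
    """Refine an existing candidate (toy: noisy step toward 1.0)."""
    step = rng.uniform(-0.1, 0.3) * (1.0 - parent_score)
    return min(1.0, max(0.0, parent_score + step))


@dataclass
class Node:
    score: float = 0.0
    # Beta posterior over scores observed in this node's subtree;
    # the parent samples from it when considering "go deeper here".
    alpha: float = 1.0
    beta: float = 1.0
    # Beta posterior over scores of new children created at this node;
    # sampled when considering "go wider here".
    gen_alpha: float = 1.0
    gen_beta: float = 1.0
    children: list = field(default_factory=list)


def repeated_sampling(budget: int, rng: random.Random) -> float:
    """Baseline: draft `budget` independent answers and keep the best."""
    return max(generate_fresh(rng) for _ in range(budget))


def adaptive_branching_search(budget: int, rng: random.Random) -> "tuple[float, Node]":
    """Toy tree search; each iteration spends exactly one generation call."""
    root = Node()
    best = 0.0
    for _ in range(budget):
        node, path = root, []
        while True:
            # Thompson sampling: one draw for "new child here" (wider)
            # versus one draw per existing child (deeper into that branch).
            wider = rng.betavariate(node.gen_alpha, node.gen_beta)
            deeper = [rng.betavariate(c.alpha, c.beta) for c in node.children]
            if not deeper or wider >= max(deeper):
                break
            node = node.children[deeper.index(max(deeper))]
            path.append(node)
        # A new child of the root is a fresh draft; anywhere else it is a
        # refinement of the selected node's answer.
        s = generate_fresh(rng) if node is root else refine(node.score, rng)
        node.children.append(Node(score=s, alpha=1.0 + s, beta=2.0 - s))
        # Fractional Beta update with the new score (a simplification).
        node.gen_alpha += s
        node.gen_beta += 1.0 - s
        for visited in path:
            visited.alpha += s
            visited.beta += 1.0 - s
        best = max(best, s)
    return best, root


def count_nodes(node: Node) -> int:
    """Count a node and all of its descendants."""
    return 1 + sum(count_nodes(c) for c in node.children)


if __name__ == "__main__":
    budget = 50
    baseline = repeated_sampling(budget, random.Random(0))
    best, root = adaptive_branching_search(budget, random.Random(0))
    print(f"repeated sampling best score:  {baseline:.3f}")
    print(f"adaptive branching best score: {best:.3f} ({count_nodes(root) - 1} nodes)")
```

Both functions spend the same budget of generation calls; the difference is that the adaptive search can keep pouring calls into a branch whose refinements have scored well, while still reserving some draws for fresh drafts. Which approach scores higher on a given seed depends on the toy quality model, so no particular result is claimed here.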
As the AI constructs a strategy report, the system treats the research process as a branching tree of possibilities. At each node of the tree, the algorithm dynamically balances two distinct behaviors based on external feedback signals:

Going Wider (Exploration): Spawning entirely new, alternative hypotheses or candidate responses when the current path yields diminishing returns or unresolved contradictions.
Going Deeper (Exploitation): Methodically refining, auditing, and building upon an existing candidate solution that shows high strategic promise.

What turns this from a laboratory experiment into a commercial engine is its extension into Multi-LLM AB-MCTS. Sakana AI's architecture adds a third dimension to the search tree: the ability to dynamically choose which model to invoke for a specific sub-task, treating the industry's leading frontier models as a plug-and-play collective intelligence network. According to technical documentation published by the company, the engine can coordinate highly heterogeneous models, allowing an orchestration model to delegate initial ideation to one LLM while using a reasoning-heavy model to audit, verify, and correct intermediate errors generated earlier in the search tree.

By scaling up compute at inference time, drawing on the distinct "personalities" and strengths of multiple foundation models over thousands of automated cycles, AB-MCTS gives Marlin a structured way to decide where that compute is spent. The aim is for the resulting 100-page strategy reports to be not merely long-winded AI generations, but the vetted product of systematic, automated trial and error.

Licensing, Data, and Enterprise Implications

Sakana Marlin is not a general consumer tool; it is a commercial software-as-a-service (SaaS) offering restricted to corporate entities, organizations, and sole proprietors.
For enterprises, licensing and data handling terms are often the deciding factors in software adoption. Unlike many consumer-grade AI tools that silently harvest user inputs and proprietary data to train future foundation models, Sakana Marlin operates under a strict, enterprise-grade data policy. Neither Sakana AI nor its external AI service providers will use customer data or inputs for model training or fine-tuning unless the client provides explicit opt-in consent. Even with consent, data is processed to remove personally identifiable information. This closed-loop approach matters for companies handling sensitive M&A research, unreleased product strategies, or proprietary market analyses.

The commercial licensing is structured into tiered pricing models that reflect its enterprise nature:

Pay-as-you-go: Users purchase credits on demand at ¥98 ($0.61 USD) each. A single run costs 100 credits, or ¥9,800 (about $61 USD).
Pro Plan: At ¥150,000 ($935.68 USD) per month, businesses receive 2,000 credits (about 20 runs at 100 credits each), and add-on credits drop to ¥90 ($0.56 USD) each.
Team Plan: Geared toward larger departments, this ¥400,000 ($2,495.14 USD) per month tier includes 6,000 credits (about 60 runs), lowering add-on costs to ¥85 ($0.53 USD) per credit.
Enterprise: Fully custom quotes with dedicated support and customized credit allocations.

Why Sakana Is Worth Watching

Sakana AI's move into commercial products is rooted in the pedigree of its founders, who helped spark the current generative AI boom. Formed in Tokyo in 2023, the startup was co-founded by Llion Jones, a co-author of Google's seminal 2017 "Attention Is All You Need" paper who coined the term "transformer," and David Ha, a former Google Brain researcher and head of research at Stability AI. The decision to build a new laboratory outside the Silicon Valley bubble was a deliberate rejection of the current AI ecosystem.
At a TED AI conference in late 2025, Jones said he was "absolutely sick" of transformers, warning that intense pressure from investors and the fixation on scaling single, monolithic models had calcified the industry's creativity and blinded researchers to the next major breakthrough. To break free from this "big company-itis," Jones and Ha structured Sakana AI around principles of biomimicry and evolutionary computing. The company's name, derived from the Japanese word for fish, reflects its core technical