Adopting AV1 for Real-Time Communication (RTC) at Scale
Facebook Engineering

Adopting AV1 for real-time communication at Meta has been a multi-year effort spanning codec selection, device eligibility, rate control, and error resilience. In this post, we share the technical and operational challenges we faced while deploying AV1 and expanding its coverage, and how we addressed them. We also present several techniques for improving AV1 call quality, including rate control and error resilience.

The AV1 video codec, first standardized by AOMedia in 2018, has rapidly evolved and gained widespread industry support. Today, leading companies like YouTube, Netflix, and Meta stream video using AV1 at scale. Meta introduced AV1 for real-time video calls on high-end devices in 2023, aiming to deliver superior call quality. Since then, we have made notable progress in expanding AV1's reach and improving the experience for AV1-powered calls. Today, AV1 is enabled on the majority of mobile devices in Meta Real-Time Communication (RTC) applications such as Messenger and WhatsApp.

Why Is Meta Interested in Adopting AV1 for RTC?

The motivation for switching to a more advanced video codec is straightforward - it delivers the same visual quality while using much less bandwidth. In offline tests, we observed at least a 20% bitrate reduction with AV1 compared with H.264/AVC under our product settings on low-end and mid-range devices. If devices can accommodate higher encoding complexity, the bitrate reductions are even greater.

For real-time video calls, this means people on slower or limited networks can enjoy significantly better video quality. This is important to our users because, to meet low-latency requirements, the RTC product must handle bitrate fluctuations. In real-world networks - especially in emerging markets - video bitrates for RTC products typically range from 10 kbps to 400 kbps. Maintaining good video quality below 100 kbps remains challenging.

To evaluate the user experience across codecs, we enabled AV1 in the Messenger app and conducted a side-by-side comparison using two Android phones. In the examples below, AV1 is displayed on the right and H.264/AVC on the left, both limited to 100 kbps. The H.264/AVC video appears noticeably blurry, while the AV1 video remains much clearer - highlighting the significant advantage of AV1 for video calls under bandwidth constraints.

Figure: H.264/AVC (left) versus AV1 (right).

Screen content is an increasingly important use case, and it requires high-quality encoding of computer-generated content. Traditional video encoders are not well suited to content such as text, which contains a lot of high-frequency detail, and people are very sensitive to blurry text. AV1 has a set of coding tools - palette mode and intra-block copy - that drastically improve performance for screen content.

Palette mode is based on the observation that the pixel values in a screen-content frame usually concentrate on a limited number of color values. It can represent screen content efficiently by signaling the color clusters instead of the quantized transform-domain coefficients. In addition, typical screen content often contains repetitive patterns within the same picture. Intra-block copy allows a block to be predicted from another region of the same frame, which can improve compression efficiency significantly. AV1 has the benefit of providing these two tools in the main profile.
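
To make the two ideas concrete, here is a minimal Python sketch. It is not how any AV1 encoder implements these tools: the function names, the `max_colors=8` default, and the "search only the rows above the current block" rule are assumptions chosen for illustration. It shows a block being stored as a small palette plus an index map, and a repeated block being located elsewhere in the same frame.

```python
def palette_encode(block, max_colors=8):
    """Return (palette, index_map), or None if the block has too many colors."""
    palette = sorted({pixel for row in block for pixel in row})
    if len(palette) > max_colors:
        return None  # too many distinct values for a palette to pay off
    lookup = {color: index for index, color in enumerate(palette)}
    index_map = [[lookup[pixel] for pixel in row] for row in block]
    return palette, index_map


def palette_decode(palette, index_map):
    """Rebuild the block from the palette and the per-pixel indices."""
    return [[palette[index] for index in row] for row in index_map]


def find_intra_block_copy(frame, y0, x0, size):
    """Look for an identical block in the rows above (y0, x0).

    Returns the (dy, dx) displacement to the match, or None. Searching only
    the rows above the current block is a simplification for this sketch.
    """
    target = [row[x0:x0 + size] for row in frame[y0:y0 + size]]
    width = len(frame[0])
    for y in range(0, y0 - size + 1):
        for x in range(0, width - size + 1):
            candidate = [row[x:x + size] for row in frame[y:y + size]]
            if candidate == target:
                return (y - y0, x - x0)
    return None


# A text-like block with only two distinct values.
block = [
    [255, 255, 0, 0],
    [255, 0, 0, 255],
    [255, 255, 255, 255],
    [0, 0, 0, 0],
]
palette, index_map = palette_encode(block)

# A frame whose top-left 2x2 pattern repeats two rows lower.
frame = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [1, 2, 9, 9],
    [5, 6, 9, 9],
]
displacement = find_intra_block_copy(frame, y0=2, x0=0, size=2)
```

In the palette case the encoder only has to signal two colors and a map of small indices. In the intra-block copy case it only has to signal a displacement vector pointing at content it has already coded.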

The Challenges in Adopting AV1

While the comparison clearly illustrates AV1's advantages, there are significant challenges to its adoption in RTC. Unlike video on demand (VOD), RTC systems must manage end-to-end video latency, which ideally should remain below 300 milliseconds. If latency exceeds this threshold, people begin to notice delays in the conversation.

Maintaining both high video quality and low latency is challenging. For example, multi-pass encoding techniques - which can improve quality - introduce additional delay. On the decoder side, extensive buffering further increases latency. Additionally, any sudden spikes in bitrate can cause video freezes during calls, degrading the user experience.

RTC products must also dynamically adapt to network conditions during a call. Two challenges are fluctuations in network bandwidth and packet loss. To cope with bandwidth changes, the video encoder adjusts parameters such as resolution and frame rate. However, switching resolutions typically requires a new key frame, which can cause a sudden bitrate spike and temporary video freezing. Similarly, packet loss can trigger retransmissions or force the encoder to send another key frame, both of which may lead to video freezes. Effectively managing these issues helps enable delivery of high-quality, uninterrupted video calls.

Additionally, the RTC client must perform both real-time encoding and decoding, both of which consume significant power - making power efficiency important, especially on mobile devices.

Encoder and Decoder Selection

Choosing the right encoder and decoder is the most critical step in adopting a new codec. The computational complexity of video codecs is a significant consideration for mobile devices. While AV1 offers improved compression efficiency through advanced coding tools, these benefits come at the cost of increased computational demands, particularly during encoding.

To assess this increased complexity, we integrated an open-source AV1 encoder in an offline experiment and measured power consumption on a Pixel 8 device during a video call. The results showed a 14% increase in power usage compared to H.264/AVC - a significant challenge for mobile deployment. To address this, we adopted an internal low-complexity encoder whose power consumption is similar to that of the H.264 baseline, as detailed in the next section.

Beyond power, AV1 encoding also increases memory usage compared to H.264/AVC, leading to app crash regressions that further complicate mobile adoption.

Low-Complexity Encoder

A strong encoder should balance visual quality against computational complexity. Low-complexity encoding helps enable AV1 encoding on mid-range and low-end devices. Compared to older codecs like H.264/AVC, newer codecs such as AV1 deliver better compression efficiency. However, these benefits are often assumed to come only with higher computational complexity, which is an obstacle to extending AV1 coverage to low-end devices.

However, a newer codec should not necessarily require a higher-complexity encoder. Because modern codecs support a larger set of coding tools, a well-designed encoder has more opportunities to find better trade-offs between quality and complexity. These trade-offs are also referred to as presets. Ideally, the encoder offers multiple presets, spanning a range from high to low complexity while still maintaining a consistent compression efficiency gain. An ultra-low-complexity preset comparable to H.264/AVC could enable shipping AV1 on low-end phones.

To address this, we adopted a low-complexity encoder implementation of AV1 for the RTC use cases. In addition to optimizing the quality of the high-complexity preset, we developed an ultra-low-complexity preset. This new preset delivers encoding complexity comparable to H.264/AVC. With it in place, we designed a mechanism that adjusts the encoder preset based on device capabilities, enabling us to ship AV1 to a much broader range of devices.

Decoder Selection

After selecting the encoder, the next step is choosing the decoder. Although video decoders are generally less complex than encoders, we found that decoding complexity remains significant for video calling use cases on mobile devices, especially low-end models. In our initial A/B tests, some low-end devices could not perform real-time decoding, resulting in video freezes and audio/video synchronization issues.

We compared several open-source decoders and, after A/B testing, we selected dav1d for its superior power efficiency and reliability. Our experiments also showed an increase in talk time with the dav1d decoder.

Binary Size

Integrating the AV1 encoder and decoder into the mobile app introduces another challenge: binary size. Using libAOM as an example, AV1 support adds 1.7 MB to the application (600 kB compressed). While this may sound negligible, it's a major challenge for a company that serves billions of users. Binary size affects update success rates, application startup time, and software health metrics like memory usage and crash rates, all of which can negatively impact user experience. A larger binary leaves more people on older app versions and delays incoming call setup. For example, a 600 kB increase could consume an entire year's binary size budget for a large organization.

We explored several approaches to reduce the binary size:

  • Our initial approach was to use a dynamic-download framework to deliver AV1 as a separate component. However, download failures - whether from poor network conditions, device issues, or random occurrences - degraded the user experience, making this approach insufficient.
  • We then focused on direct binary size optimizations. For example, the quantization matrix (QM) tool accounts for about 10% of the encoder's library size; optimization could halve it. We also contributed size-reduction optimizations to the dav1d project. This strategy extends to end-to-end pipeline optimization, removing unused tools from the library entirely. For instance, removing QM frees 60 kB of binary space.
  • At the application level, we can share codec libraries across features - such as video message transcoding - and leverage built-in platform codec support to avoid bundling additional libraries.

Expanding AV1 Coverage

After selecting the encoder and decoder, the next challenge was identifying which devices are eligible to use AV1. Compiling eligible iOS models was straightforward given the limited number of variants, but Android posed a far greater challenge due to the vast number of device models. We initially tried selecting devices based on memory, release year, and Android OS version, but none of these strategies proved sufficiently reliable. Ultimately, we leveraged Meta's in-house ML-based device eligibility framework to generate a reliable list of eligible Android devices.

AV1 Device Eligibility

We created a machine learning (ML)-based device eligibility framework to support advanced video and audio features based on device capability. The idea is to use large-scale real-world statistical data to categorize device capabilities, rather than relying on lab data. This helps us scale our device eligibility system and make more accurate decisions.

Our ML-based device eligibility approach uses low-level performance statistics collected through our logging pipeline to assess a device's AV1 capability. The model takes these measurements as input features and outputs an rtc_score, which quantifies the device's overall AV1 performance. This score then informs decisions such as optimizing call settings and determining whether a device can run the AV1 codec efficiently.
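
The flow from logged measurements to an rtc_score to an eligibility decision can be sketched as follows. This is only an illustration: the feature names, the logistic form, the weights, and the 0.5 threshold are all hypothetical, since the post does not describe the real model or its inputs.

```python
import math
from dataclasses import dataclass


@dataclass
class DeviceStats:
    # Hypothetical per-device aggregates; the real input features are not
    # listed in the article.
    avg_encode_ms: float
    avg_decode_ms: float
    freeze_rate: float  # fraction of call time with frozen video


# Hypothetical weights for illustration only; these are not trained values.
WEIGHTS = {"avg_encode_ms": -0.05, "avg_decode_ms": -0.08, "freeze_rate": -10.0}
BIAS = 4.0


def rtc_score(stats):
    """Map device statistics to a score in (0, 1); higher means more capable."""
    z = BIAS + sum(weight * getattr(stats, name) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_av1_eligible(stats, threshold=0.5):
    """Hypothetical decision rule: enable AV1 when the score clears a threshold."""
    return rtc_score(stats) >= threshold


fast_device = DeviceStats(avg_encode_ms=20, avg_decode_ms=8, freeze_rate=0.01)
slow_device = DeviceStats(avg_encode_ms=70, avg_decode_ms=30, freeze_rate=0.20)
```

The point of the sketch is the shape of the pipeline: statistics aggregated from real calls go in, a single score comes out, and product decisions are keyed on that score instead of on device name or release year.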

In 2025, we iteratively refined our model using AV1-specific data and significantly expanded device support. Our first milestone, Model V1.1, rolled out in August 2025 and broadened AV1 traffic across an increasing set of devices. That additional traffic contributed to a dedicated AV1-only dataset that became both larger and more representative over time. With this richer data, we built Model V2, introducing a two-tier approach that differentiates between higher-end and lower-end devices - reflecting the reality that entry-level phones and flagship devices can have very different AV1 encoding capabilities.

Across these iterations, we substantially increased AV1 enablement across the device landscape, with an approach designed to keep improving as traffic grows and more data becomes available. As AV1 traffic continues to grow, we expect iterative optimization will further improve both call duration and quality.

Codec Complexity Adaptation

Device eligibility lets us identify capable devices, but we discovered an additional challenge: During A/B tests, we observed calls with significant audio/video sync regressions, primarily caused by devices unable to encode or decode video in real time. Surprisingly, even a 2023 smartphone with an octa-core processor could not handle encoding at 320×180@15fps. This issue affected both H.264 and AV1, though it was more prevalent with AV1. We suspect these devices throttle CPU frequency during calls, reducing their effective capability.

As a result, enabling AV1 purely based on device name is not sufficient. We needed a more robust mechanism to adjust codec complexity based on both local and peer device status. We developed three mechanisms: adaptive encoder preset adjustment, encoding latency-aware codec switching, and decoding latency-aware codec switching.

Adaptive Encoder Preset Adjustment

We designed multiple encoder presets ranging from low to high complexity. A monitoring mechanism continuously tracks encoding latency during calls to select the appropriate preset. If encoding latency becomes too high - meaning the device is close to being unable to encode in real time - we reduce encoder complexity. Conversely, if the device can sustain higher complexity, we increase the preset to achieve better quality.
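
A minimal sketch of such a feedback loop is shown below. The class name, the preset numbering (0 is the lowest complexity), the averaging window, and the 80%/40% thresholds are assumptions made for this example; the post does not specify the actual control logic or its parameters.

```python
from collections import deque


class PresetController:
    """Picks an encoder preset from recent encoding latency.

    Preset 0 is the lowest-complexity preset; higher indices cost more CPU.
    All thresholds are hypothetical, not values from the article.
    """

    def __init__(self, num_presets=4, start_preset=2, fps=15,
                 high_ratio=0.8, low_ratio=0.4, window=30):
        self.max_preset = num_presets - 1
        self.preset = start_preset
        self.budget_ms = 1000.0 / fps  # time available to encode one frame
        self.high_ms = high_ratio * self.budget_ms
        self.low_ms = low_ratio * self.budget_ms
        self.samples = deque(maxlen=window)

    def on_frame_encoded(self, latency_ms):
        """Record one frame's encoding latency and return the preset to use next."""
        self.samples.append(latency_ms)
        if len(self.samples) < self.samples.maxlen:
            return self.preset  # not enough measurements yet
        average = sum(self.samples) / len(self.samples)
        if average > self.high_ms and self.preset > 0:
            self.preset -= 1  # close to missing real time: reduce complexity
            self.samples.clear()
        elif average < self.low_ms and self.preset < self.max_preset:
            self.preset += 1  # headroom available: spend it on quality
            self.samples.clear()
        return self.preset


controller = PresetController(window=3)
for latency in (60, 60, 60):  # slow frames at 15 fps (budget is about 66.7 ms)
    preset_after_slow = controller.on_frame_encoded(latency)
for latency in (10, 10, 10):  # fast frames with plenty of headroom
    preset_after_fast = controller.on_frame_encoded(latency)
```

Clearing the window after each change gives the new preset time to produce its own measurements before the next decision, and the gap between the two thresholds keeps the controller from oscillating between adjacent presets.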

Local Device Encoding Latency-Aware Codec Switch

If lowering the encoder preset still does not reduce encoding
