Meta's Un-Stable Signature

The Basic Algorithm

Traditional (non-AI) invisible watermarks typically hide in subtle locations, such as the least significant bits, changes in brightness (e.g., Digimarc) or the frequency spectrum (DCT or FFT). There is always the risk that image encoding could corrupt the hidden data, so these algorithms typically rely on repetition over the image to help identify the true signal. In addition, they may include error correction code (extra bits in the data) to fix any minor data errors.
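
As a minimal sketch of the repetition idea (not any specific product's algorithm), the following hides each payload bit in the least significant bit of several pixels and recovers it by majority vote. The function names, the `repeat` parameter, and the choice of consecutive pixels are assumptions made for illustration; real systems typically spread the copies across the image.

```python
def embed_lsb(pixels, bits, repeat=5):
    """Write each payload bit into the LSB of `repeat` consecutive pixels."""
    out = list(pixels)
    if len(bits) * repeat > len(out):
        raise ValueError("image too small for payload")
    for i, bit in enumerate(bits):
        for r in range(repeat):
            idx = i * repeat + r
            out[idx] = (out[idx] & ~1) | bit  # clear the LSB, then set it
    return out


def extract_lsb(pixels, n_bits, repeat=5):
    """Recover each bit by majority vote over its repeated copies."""
    bits = []
    for i in range(n_bits):
        votes = sum(pixels[i * repeat + r] & 1 for r in range(repeat))
        bits.append(1 if votes * 2 > repeat else 0)
    return bits


# Stand-in for 8-bit grayscale pixel values (hypothetical data).
pixels = [(37 * i + 11) % 256 for i in range(64)]
payload = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_lsb(pixels, payload)
marked[0] ^= 1  # corrupt one copy of payload bit 0
marked[7] ^= 1  # corrupt one copy of payload bit 1
recovered = extract_lsb(marked, len(payload))
print(recovered == payload)
```

With five copies per bit, a single corrupted copy is outvoted by the other four. Heavier damage (three or more bad copies of the same bit) would flip the recovered bit, which is why these schemes often add error correction on top.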

However, there is a problem with the traditional approaches: injecting hidden data in the image could create visible distortions. The modern approach uses an AI system to better hide the data with less added distortion.

As with SynthID and TrustMark, Stable Signature encodes binary data and uses an AI model to decide where to hide it in the image. The AI is tuned to minimize visible distortions when embedding the data. Later, an AI-based decoder looks at the image, identifies the likely locations where bits are stored, and extracts the data. There is always a chance that the extracted data is mixed with noise.

Different AI-based watermarking systems rely on different techniques for reducing the noise. For example:

  • Google's SynthID only stores a few bits of data (effectively a flag or version number). This lets them spend most of the available capacity on repetition, which increases the accuracy rate.
  • Adobe's TrustMark uses the Bose-Chaudhuri-Hocquenghem (BCH) algorithm. This acts as a combination of checksum and error correcting code that should reduce the number of errors.

Meta's Stable Signature relies on the Hamming distance. The Hamming distance between two codes is the number of bits that must be flipped to turn one code into the other. In effect, it defines a set of stable states (e.g., 10110 and 11000) and places a ring around each state that represents the single-bit changes. If you change enough bits, then you will reach a different stable state.

According to Meta's Stable Signature research paper, the 48 bits should be uniformly distributed, and the paper cites a "false positive rate below 10⁻⁶", or 1 in one million. This means you can choose a 48-bit sequence to use as your signature. Every picture will generate a 48-bit sequence, and the sequence can vary a little based on noise in the picture. However, if you find a code that is within a short Hamming distance of your code (e.g., within 6 bits), then you can conclude with high reliability that it is the same code. At least, that's the theory.
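
The matching step described above can be sketched in a few lines. This is an illustration, not Meta's code: the signature value is a made-up 48-bit number, and the function names and the default threshold of 6 are assumptions.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where a and b differ."""
    return bin(a ^ b).count("1")


def matches_signature(extracted: int, signature: int, max_distance: int = 6) -> bool:
    """Treat the extracted code as the signature if it is close enough."""
    return hamming_distance(extracted, signature) <= max_distance


signature = 0xA5C3_9F01_7E42           # hypothetical 48-bit signature
noisy = signature ^ 0b1011             # same code with three bits flipped
unrelated = signature ^ ((1 << 48) - 1)  # every bit flipped

print(hamming_distance(0b10110, 0b11000))   # the two example states above
print(matches_signature(noisy, signature))
print(matches_signature(unrelated, signature))
```

The XOR leaves a 1 wherever the two codes disagree, so counting the 1 bits gives the distance directly.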

Empirical Testing

I went into this experiment assuming that everything works like they claim. I want to be able to reliably identify invisible watermarks associated with Meta. What I don't know is what sequence they use, or whether they use multiple codes depending on whether it comes from Meta's AI system, Facebook, Instagram, WhatsApp, etc.

Fortunately, this is something I can test! I grabbed an uncurated sample of pictures from FotoForensics: the first 10,000 unique images uploaded last month (May 2026). If the bit sequences are uniformly distributed with a "1 in 1 million" collision rate, then I should see a huge number of unique bit sequences and a few small clusters around pictures from Meta (Meta AI, Facebook, Instagram, etc.). Those clusters will represent the invisible watermarks used by Meta.

The results from my empirical test were definitely not what I expected. I found:

  • Their paper assumes a binomial distribution. That is, given an arbitrary image, each of the 48 bits behaves like an independent random coin flip. The math becomes:

    P(d ≤ T) = (1 / 2⁴⁸) × Σ C(48, k), summed over k = 0 to T

    This computes the probability that 48 random bits land within a Hamming distance (T) of a given code. The probabilities table becomes:

    Meta's paper says that they use a Hamming distance of 7 bits (requiring 41 of 48 bits to match), which is consistent with their claim of a "false positive rate below 10⁻⁶". However, I'm seeing problems at a Hamming distance of 6 (should be 1 in 20 million) and even collisions at 0 (1 in 281 trillion)!
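
The binomial figures quoted above can be reproduced with a short calculation. This sketch assumes exactly what the paper assumes (48 independent, fair bits); the function name is mine.

```python
from math import comb

N_BITS = 48


def false_positive_probability(threshold: int, n_bits: int = N_BITS) -> float:
    """Probability that n_bits fair coin flips land within `threshold`
    bit flips of one fixed target code."""
    favorable = sum(comb(n_bits, k) for k in range(threshold + 1))
    return favorable / 2 ** n_bits


for t in range(9):
    p = false_positive_probability(t)
    print(f"T={t}: p={p:.3e} (about 1 in {1 / p:,.0f})")
```

Under the independence assumption, T=0 is 1 in 2⁴⁸ (about 281 trillion), T=6 is roughly 1 in 19.8 million, and T=7 is roughly 1 in 3.2 million. T=7 is the largest threshold that stays below 10⁻⁶; at T=8 the probability rises above it.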

According to Meta's paper, each of the 48 bits is independent. In a perfectly independent 48-bit hypercube, un-watermarked images should scatter uniformly across all 2⁴⁸ possible values. However, neural networks map a non-linear manifold (a multi-dimensional wavy surface) through this hypercube. This mathematical landscape is warped with its own peaks, ravines, and valleys. It has attractors that form clusters, and repellers that form voids where stable values can never exist; this is a feature of a neural network. And most importantly, the output bits are not independent.

The left diagram illustrates the expected uniform distribution if all of the bits were independent. The right diagram shows the types of theoretical clusters that form when the bits are dependent. There should be clusters around attractors and voids (areas with no dots) in the repelling regions.

Moving from theoretical to empirical, I graphed the data. The 48 bits can be represented as bytes. I took the first 24 bits and converted them into 8-bit red, green, and blue pixel colors. If the data is truly random, then the colored dots should be distributed across the RGB cube. However, if the bits are dependent, then there should be very clear clusters, structures, and voids. Here's the graph:
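
The bits-to-color conversion can be sketched as follows. I am assuming here that "first 24 bits" means the 24 most significant bits of the 48-bit value and that they map to red, green, and blue in that order; the function name is hypothetical.

```python
def signature_to_rgb(signature: int, n_bits: int = 48) -> tuple:
    """Map the top 24 bits of a signature to an (R, G, B) triple."""
    top24 = signature >> (n_bits - 24)   # drop the low 24 bits
    red = (top24 >> 16) & 0xFF
    green = (top24 >> 8) & 0xFF
    blue = top24 & 0xFF
    return (red, green, blue)


# Hypothetical extracted signatures, one per image.
signatures = [0xFF8000123456, 0x0000FF000000, 0x123456ABCDEF]
points = [signature_to_rgb(s) for s in signatures]
print(points)
```

Each triple is both a coordinate in the RGB cube and the color of its dot, so the list can be handed to any 3D scatter plot.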

Yes, there are very clear structures that look like planes and lines. Within the planes are clusters, and outside the planes are very large voids - areas where there are no dots at all. The data generated by Meta's Stable Signature implementation fails this basic test for independence.

The biggest cluster that I found represents a Zero Signal Bias (ZSB). When their neural network doesn't find a watermark, it moves the 48 bits toward a strong attractor, like a massive gravitational well. At 6 bits of error, independent bits should collide at a rate of around 1 in 20 million. But in reality, my 10,000 pictures had a cluster of 450 images within 6 bits due to the ZSB. That's an error rate of around 1 in 22 from the ZSB alone.
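
To put the gap in perspective, this short calculation compares the number of images that independent bits would place within 6 bits of any one fixed code against the 450 observed in the ZSB cluster. The sample size and cluster size come from the text above; the variable names are mine.

```python
from math import comb

sample_size = 10_000          # pictures in the test set
observed_in_cluster = 450     # pictures within 6 bits of the ZSB attractor

# Chance that 48 independent fair bits land within 6 flips of a fixed code.
p_independent = sum(comb(48, k) for k in range(7)) / 2 ** 48

expected = sample_size * p_independent
observed_rate = observed_in_cluster / sample_size

print(f"expected under independence: {expected:.6f} images")
print(f"observed: {observed_in_cluster} images (about 1 in {1 / observed_rate:.0f})")
print(f"observed / expected: {observed_in_cluster / expected:,.0f}x")
```

Under independence, the expected count in a 10,000-image sample is far below a single image, so even one picture in the cluster would be surprising.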

If we add in all of the other clusters that contain at least 10 pictures, then 2,327 pictures are in various clusters; we're looking at an error rate around 1 in 4 - and that's at a Hamming distance of 6, which is more conservative than their paper's Hamming distance of 7. (In AI terms, this is a representation collapse or structural bias that is typical for deep neural networks.)

(As an aside: Given their "1 in 1 million" claim, I could look for any clusters of 2 or more pictures. At clusters of 2 or larger, 5,237 of the 10,000 test images were in clusters, or 52%. If you show their algorithm 10,000 pictures, then there is a better than 50% chance of a false positive match.)

I fed Meta's code the first 10,000 images from May 2026. A few of the images were in unsupported formats (HEIC, WebP, and a few corrupted JPEG files), resulting in 9,847 viable pictures. I evaluated this data with elements from the NIST Statistical Test Suite (SP 800-22) for randomness, including a monobit test and Chi-Squared (χ²) test for independence.

The monobit test checks the baseline frequency of each bit: whether it is set to 1 about as often as it is set to 0. The second test is the Chi-Squared (χ²) Test for Serial Independence, which checks whether adjacent bits are independent. If the bits were independent, the probability of each transition pair would just be the product of the two bits' individual probabilities. This table shows the occurrence rate of the transition pairs across all of the observed 10,000 (well, 9,847) pictures:
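
Here is a minimal sketch of both checks, applied per bit position across a set of extracted signatures. It is not the NIST reference implementation, and how the tests were applied to the real data is my assumption; the function names and the synthetic data are for illustration only.

```python
import random


def bit_at(value: int, pos: int, n_bits: int = 48) -> int:
    """Return the bit at position pos, counting from the most significant bit."""
    return (value >> (n_bits - 1 - pos)) & 1


def monobit_fractions(signatures, n_bits: int = 48):
    """Fraction of signatures with a 1 at each bit position (ideal: 0.5)."""
    n = len(signatures)
    return [sum(bit_at(s, i, n_bits) for s in signatures) / n for i in range(n_bits)]


def chi_square_adjacent(signatures, pos: int, n_bits: int = 48) -> float:
    """Pearson chi-squared statistic for the 2x2 table of bits pos and pos+1."""
    counts = [[0, 0], [0, 0]]
    for s in signatures:
        counts[bit_at(s, pos, n_bits)][bit_at(s, pos + 1, n_bits)] += 1
    n = len(signatures)
    row = [sum(counts[a]) for a in (0, 1)]
    col = [counts[0][b] + counts[1][b] for b in (0, 1)]
    chi2 = 0.0
    for a in (0, 1):
        for b in (0, 1):
            expected = row[a] * col[b] / n  # count expected if independent
            if expected > 0:
                chi2 += (counts[a][b] - expected) ** 2 / expected
    return chi2


# Synthetic data only: uniformly random 48-bit values.
rng = random.Random(0)
random_sigs = [rng.getrandbits(48) for _ in range(10_000)]
print(monobit_fractions(random_sigs)[:4])
print(chi_square_adjacent(random_sigs, 0))
```

With one degree of freedom, a statistic above 3.841 rejects independence at the 5% level. Two fully dependent bits drive the statistic up to the sample size itself, so strong dependence is hard to miss.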

For the TL;DR crowd: Meta's researchers made a fundamental mistake when computing their accuracy rates. The chance of a false match is not "1 in 1 million"; it's closer to 1 in 4, because the 48 bit values in each signature are not independent.

As I re-read Meta's research paper, I realized that the statistical error wasn't an oversight; Meta's researchers explicitly acknowledged the problem. In their paper (Section 4.1), they wrote:

[Quote from Meta's paper acknowledging the issue]

It's also worth noting that, shortly after releasing Stable Signature, Meta developed another algorithm: Pixel Seal. (Not to be confused with my own Secure Evidence Attribution Label / SEAL technology.) Pixel Seal moves to a 256-bit payload to increase the capacity, and their related model, Chunky Seal, pushes up to 1024 bits. While Meta's approach focuses heavily on addressing the invisibility side using an adversarial-only discriminator, the underlying approach still uses a neural network mapping. Using more bits only exacerbates this flaw.

Prior Evaluations

I previously evaluated Google's SynthID and Adobe's TrustMark algorithms. Both of them claim to have incredibly accurate results.

  • According to Google's peer-reviewed and published paper, they claim to have a true positive rate (TPR) above 99.97% - meaning that they will miss their own watermarks fewer than 3 in 10,000 times. However, my own empirical testing found that it is much closer to 1 in 20. Moreover, SynthID is proprietary and only accessible through Google's "Gemini" AI system. Gemini has been observed hallucinating results and providing contradictory conclusions depending on how the question is phrased.

  • According to Adobe's Content Authenticity Initiative, their TrustMark "can exceed 96% bit accuracy at around 42-45dB PSNR quality under severe noise degradations". However, that statistic focuses on resilience and not accuracy. In my empirical tests, I found that TrustMark has a 10%-20% false positive rate, effectively making it useless. (If you see a TrustMark signature, then it is very likely random noise and not an actual signature.)

Implications

However, that same usage does not work with legal cases. For example, consider an insurance company. Most insurance claims today include photographic evidence. The company wants camera-original photos, but has to use whatever the customer submits. The problem is that there is a lot of insurance fraud. In theory, seeing a watermark from an AI system built by Meta, Google, or Adobe should be great for identifying and ruling out fraud. Unfortunately, Stable Signature, SynthID, and TrustMark are so inaccurate that none of them can be trusted; it's not even worth testing to see if customer photos contain these invisible watermarks.

For these watermarking systems, I'm talking about very high error rates: roughly 1-in-4 for Meta, 1-in-5 for Adobe, and 1-in-20 for Google. But let's pretend that they work much better, like a 1-in-20,000 false positive rate. An insurer processing 100,000 claims per month would expect to accuse around 5 completely honest customers of fraud each month. Falsely denying 5 out of 100,000 claims? That creates a toxic customer service nightmare, severe legal liability, and fines from regulatory bodies for bad-faith claim denials. This could even become a class-action lawsuit that they couldn't win.

As bad as it is for insurance and financial institutions, there are much higher stakes at play. The EU AI Act (Article 50(2)), China's GB 45438-2025, California SB 942, and similar legislation are moving toward mandating AI content watermarking. The failure of these three leading systems, from three Fortune-500 companies, to meet their own claimed accuracy rates is not just an academic curiosity. Regulators and courts will employ these systems for attribution and fraud detection.

Reliable AI-based watermarking technology is not ready. Three companies. Three algorithms. Three different research teams. The same fundamental error.

The false positives won't go on trial. People will.
