“Formally verified” is becoming a procurement criterion for post-quantum cryptography. It means the code has been mathematically proven correct by automated tools, not just tested. For ML-KEM and ML-DSA implementations headed into production environments, it is a meaningful differentiator. It is also a phrase that can cover considerably less than it suggests.
A new paper by cryptographer Nadim Kobeissi examines the verification pipeline behind Cryspen’s libcrux, one of the most widely deployed PQ libraries in the world. Libcrux provides the ML-KEM implementation in Signal’s post-quantum ratchet. Google partners with Cryspen on FIPS-standard PQ code. Mozilla ships it in Firefox. The Linux Foundation’s PQ Code Package includes it.
Kobeissi found structural gaps in the verification tool itself. Code passes through the full verification process, produces valid proof artifacts, and still depends on security properties the proofs never checked. He calls this a “verification facade.” The verification happened. It covered less than you would expect.
What the paper found
ML-KEM and ML-DSA rely on rejection sampling: generate a candidate, check it against a security requirement, retry until one passes. This loop is what makes ML-DSA signatures zero-knowledge and ML-KEM key generation correct. If it does not work, private keys leak.
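The pattern is simple to sketch. Here is a minimal Rust illustration, modeled on ML-KEM's coefficient sampling (FIPS 203, Algorithm 7) but not taken from libcrux; the function name and the toy byte stream are this sketch's own:

```rust
// Sketch of the rejection-sampling pattern: 12-bit candidates are
// drawn from a byte stream, and any candidate >= q is rejected, so
// every accepted value is a valid field element. Illustrative only,
// not libcrux's actual implementation.

const Q: u16 = 3329; // the ML-KEM modulus

/// Unpack two 12-bit candidates from each 3-byte group and keep
/// only those below Q, until `n` coefficients have been accepted.
fn sample_coefficients(bytes: &[u8], n: usize) -> Vec<u16> {
    let mut out = Vec::with_capacity(n);
    for c in bytes.chunks_exact(3) {
        if out.len() >= n {
            break;
        }
        // FIPS 203 byte layout: two 12-bit values per 3 bytes.
        let d1 = (c[0] as u16) | (((c[1] as u16) & 0x0F) << 8);
        let d2 = ((c[1] as u16) >> 4) | ((c[2] as u16) << 4);
        // Rejection step: candidates outside the field are discarded
        // and the loop simply moves on to the next candidate.
        if d1 < Q {
            out.push(d1);
        }
        if d2 < Q && out.len() < n {
            out.push(d2);
        }
    }
    out
}

fn main() {
    // Deterministic toy byte stream standing in for XOF output.
    let bytes: Vec<u8> = (0..600u32).map(|i| (i * 7 % 256) as u8).collect();
    let coeffs = sample_coefficients(&bytes, 16);
    // The property the loop enforces: every accepted coefficient
    // is a valid field element.
    assert!(coeffs.iter().all(|&c| c < Q));
    println!("accepted {} coefficients, all < {}", coeffs.len(), Q);
}
```

The security of the scheme rides on that `< Q` check always holding for accepted values, which is exactly the kind of loop property the verification gap leaves unexamined.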
The verification tool cannot currently prove anything about these loops. A missing annotation causes the pipeline to silently generate a default that makes the loop uncheckable. The proof engine does not flag an error. It reports success because it found no violation, but only because it never actually checked. The proof artifacts look valid. Extraction succeeds without warnings. But no proof behind that output says anything about what the loop produces.
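In loop-verification terms, the gap looks roughly like this. The annotation names below are illustrative pseudocode, not the actual syntax of any tool in the pipeline:

```
// What a prover needs in order to reason about a rejection loop:
loop
  invariant: every accepted value satisfies the security check
  decreases: remaining sampling budget   // termination measure
{
    candidate := sample()
    if check(candidate) { accept(candidate) }
}

// What a silently generated default can amount to:
loop
  invariant: true       // vacuous: constrains nothing
{
    ...                  // body is opaque to the prover
}
// The prover finds no violation of "true" and reports success,
// having checked nothing about what the loop produces.
```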
Kobeissi demonstrated this with five proof-of-concept exploits against ML-KEM (FIPS 203), ML-DSA (FIPS 204), Ed25519, and ChaCha20. Each compiles, passes tests, and extracts valid-looking verification output while carrying a security gap the verification cannot detect. He also identified two other gap classes: operations the proof engine accepts on faith without checking, and an annotation mechanism that can make all downstream proofs vacuously true if misused.
What the production code shows
The paper uses simplified models. We examined the production libcrux repository to check whether these gaps apply to the code that ships. They do.
The annotation that would make rejection sampling loops verifiable does not appear anywhere in the libcrux codebase. Six while loops in ML-KEM and ML-DSA, all implementing rejection sampling or coefficient generation for FIPS 203 and FIPS 204, extract without it.
The build system goes further. ML-KEM’s sampling module and its IND-CPA encryption core are configured to skip proof-checking entirely. ML-DSA uses a whitelist of 28 verified modules; the signing module containing the rejection sampling loop is not on it. Forty-five modules across the codebase have their proofs disabled. The entire ARM64 backend, the code path used on every iOS and Android device, has zero proofs checked. Verifying optimized SIMD code is genuinely harder than verifying portable implementations, and industry practice often relies on testing optimized paths against a verified reference. But the public claims did not distinguish between backends.
When Cryspen announced their Google partnership in August 2024, they described the implementations as “verified to be panic free, functionally correct, and secret independent.” When Signal adopted libcrux for SPQR in October 2025, they cited verification as providing “a high degree of assurance.” A more careful description of what verification covers appeared from Cryspen’s co-founder in February 2026, the same week Kobeissi disclosed thirteen bugs in the library in a companion paper. Those were not abstract verification gaps. They included an endianness bug that caused real decryption failures in Signal’s post-quantum ratchet, missing X25519 validation, and FIPS 204 specification violations. Four of the thirteen were inside the formally verified boundary.
What this means for PQ deployment
The Rust code is correct. The rejection loops produce the right answers. The arithmetic is sound. Kobeissi is careful about this. The issue is the distance between what “formally verified” communicates to someone evaluating implementations and what the verification actually establishes.
Every verification tool has boundaries. CompCert does not verify the linker. seL4 does not verify the hardware. The question is not whether boundaries exist. It is whether they are visible to the people making deployment decisions.
Organizations scanning their own PQ posture with [PQ]probe are already asking what their infrastructure actually negotiates versus what vendors claim it supports. The same discipline applies to implementation assurance. “Formally verified” is a statement about a model. How much of the deployed code that model covers is an answerable question.
If the answer requires reading a Makefile, the assurance claim reads more like aspirational marketing than actual engineering.