Mythos in the Field: From Finding Bugs to Building Attack Chains

// 2026-05-19

Cloudflare published a post yesterday about running Anthropic's Mythos Preview against more than 50 of its own repos. This is not a benchmark; it is a field report.

Where the step change is

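Other frontier models can find the same underlying bugs, and their reasoning sometimes exceeds expectations. But they stall at the step of assembling the pieces into an attack chain: they identify an interesting bug, write a thoughtful description of why it matters, and then stop, leaving the actual chain unfinished.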

Mythos is different. It can chain low-severity bugs into a single, more severe exploit. A finding that comes with a PoC is an actionable finding, which means far less time spent asking "is this even real?"
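One way to picture what "chaining" means, from a triage point of view: treat each finding as a step that needs some capabilities and grants others, then check whether the steps link up from a starting position to a serious outcome. The sketch below is only an illustration of that idea. The `Finding` type, the `find_chain` helper, and the example findings are all hypothetical and are not taken from Cloudflare's post or from how Mythos works internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    needs: frozenset   # capabilities an attacker must already hold
    grants: frozenset  # capabilities gained if the bug is triggered


def find_chain(findings, start, goal):
    """Greedy forward chaining: apply any finding whose preconditions are met."""
    held = set(start)
    chain = []
    progress = True
    while goal not in held and progress:
        progress = False
        for f in findings:
            # Skip findings already used or ones that add nothing new.
            if f not in chain and f.needs <= held and not f.grants <= held:
                held |= f.grants
                chain.append(f)
                progress = True
                if goal in held:
                    break
    return [f.name for f in chain] if goal in held else None


# Hypothetical findings, each low severity when looked at alone.
findings = [
    Finding("info leak", frozenset({"network access"}), frozenset({"heap address"})),
    Finding("bounded write", frozenset({"heap address"}), frozenset({"memory corruption"})),
    Finding("debug endpoint", frozenset({"admin session"}), frozenset({"config read"})),
]

chain = find_chain(findings, {"network access"}, "memory corruption")
print(chain)
```

Seen this way, the two linked findings deserve a higher combined severity than either gets alone, while the third stays isolated because nothing grants its precondition.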

In Cloudflare's words, the reasoning "looks like the work of a senior researcher rather than the output of an automated scanner".

Mythos Preview does not have the additional safeguards that Opus 4.7 or GPT-5.5 have. But it does have emergent guardrails: it refuses some requests on its own.

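For example, when asked to find an HTTP/2 vulnerability and "fire it against the edge", it refused: that is an offensive action and requires proof of authorization. It then offered defensive alternatives: audit the code, explain the attack class, write tests rather than exploits.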

But this guardrail is inconsistent. The same request, framed differently, can get a different answer. Semantically equivalent tasks can produce opposite outcomes depending on how and when they're presented.

This suggests it is not just able to do the work; it has started to judge whether it should, though the boundary is not yet stable.

Cloudflare tried pointing a general-purpose coding agent at a repo to hunt for vulnerabilities. It produced findings, but not meaningful coverage.

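Two reasons. Context: a coding agent iterates while holding one hypothesis at a time, but vulnerability research is inherently narrow and parallel. Throughput: a single-stream agent does one thing at a time, while a real codebase needs many hypotheses tested against many components at once.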

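This matches Mozilla's earlier experience: what matters is not the model itself but the agent harness built around it. A narrower pipe gives higher pressure.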
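The "narrow and parallel" shape can be sketched roughly as follows: instead of one agent wandering the whole repo, every (component, hypothesis) pair becomes its own small task, and the tasks run concurrently. This is a minimal sketch under assumptions, not Cloudflare's or Mozilla's actual harness. `Task`, `fan_out`, and `stub_check` are hypothetical names, and `stub_check` stands in for whatever narrowly scoped model call a real harness would make.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Task:
    component: str   # one narrow slice of the codebase
    hypothesis: str  # one bug class to test against that slice


def fan_out(components, hypotheses, check, workers=8):
    """Run every (component, hypothesis) pair as its own narrow task, in parallel."""
    tasks = [Task(c, h) for c, h in product(components, hypotheses)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with tasks.
        results = list(pool.map(check, tasks))
    # Keep only the tasks that produced a finding.
    return [(t, r) for t, r in zip(tasks, results) if r is not None]


def stub_check(task: Task) -> Optional[str]:
    # Hypothetical stand-in for a model call scoped to a single task.
    if task.component == "parser" and task.hypothesis == "integer overflow":
        return "length field is not bounds-checked"
    return None


findings = fan_out(
    ["parser", "router", "cache"],
    ["integer overflow", "use-after-free"],
    stub_check,
)
for task, note in findings:
    print(task.component, task.hypothesis, note)
```

Each task carries only the context it needs, which addresses the context problem, and the pool addresses throughput. Threads are a reasonable fit here on the assumption that the real check is I/O-bound, as a remote model call would be.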

Mozilla: Mythos + a self-built harness, 271 Firefox vulnerabilities, almost no false positives.

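curl's author: Mythos reported 5 "confirmed" findings → only 1 was real (low severity), but the 20 non-security bugs were clearly described, with almost no false positives.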

Cloudflare: chained exploits are the step change, but a generic agent does not work; it takes a purpose-built parallel scanning architecture.

Three independent sources that corroborate one another.