AI Builders Daily, May 24, 2026

📌 X/TWITTER

Box CEO Aaron Levie

AI is making security vulnerabilities easier to find and create, but the new bottleneck is our ability to review, triage, and actually fix them. Far from AI magically solving everything, the follow-on work still requires human judgment. The result: we’re about to enter a security engineer boom. Textbook Jevons paradox.


Y Combinator CEO Garry Tan

GBrain v0.40.0 is live, now with a voice agent powered by Gemini Live. “Mars is a friend, Venus is your EA.” It is a full open-source personal AI you can run locally, under the MIT License. Garry also shared a sharp take on Geoffrey Moore’s chasm theory: when the alternative is literally “we die” and the bar is zero, the traditional framework breaks down. When no existing solution occupies a category, a startup doesn’t need a “whole product” to survive.


Roblox Product Lead Peter Yang

Two threads worth noting. First, Peter has been talking in depth with top solo founders and engineers about how they use agents to 10x their output: AI stack choices, end-to-end workflows, and multi-agent coordination. A podcast series is dropping soon. Second, a blunt post on mass layoffs: read the signals early (flat growth plus talk of “flatter orgs”), use Codex or Claude Code as your training ground for working with AI agents, and treat your job as a gig with an uncertain renewal date.


AI Evangelist Swyx (Shawn Wang)

Swyx co-signs a mental framework for understanding what transformers learn well today and why they hit walls. With Ankit, he has been writing about the need for adversarial world models, laying out rungs of thinking that bring us closer to a “Kolmogorov-limit generator of reality.” In a separate thread, he introduces Kakuna: skills with checklists that harden your codebase. His metaphor is the “mullet factory”: party in the front (ship unique features), dark in the back (timeless production-grade code, audits, subagent parallelism).


Anthropic Claude (official account)

Claude’s official account introduced “The Problem Solvers,” a series featuring founders who tackle hard problems with Claude. This edition profiles Kay Zhu, co-founder and CTO of Genspark AI, an all-in-one AI workspace built entirely on Claude. In a market moving this fast, where anyone can build and everyone is racing on speed, Kay’s conviction is that the team is the real differentiator.


Every CEO Dan Shipper

Dan will be speaking about his essay “After Automation,” a framework for what comes next once AI handles the mechanics of work. The talk explores what human work looks like when automation is table stakes.


OpenClaw Founder Peter Steinberger

GitHub finally shipped native PR limits. Until now, OpenClaw had been using bots to enforce a cap of 10 PRs per “person.” A small feature, but essential for agent-heavy workflows.
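To make the cap concrete, here is a minimal sketch of the check a bot could run before accepting a new PR, assuming it already has the list of currently open PRs with author logins. `PullRequest`, `MAX_OPEN_PRS`, `should_close`, and `over_cap_authors` are hypothetical names; this is not OpenClaw’s bot or GitHub’s native feature, and no GitHub API calls are shown.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical cap mirroring the 10-PRs-per-"person" rule described above.
MAX_OPEN_PRS = 10


@dataclass(frozen=True)
class PullRequest:
    number: int
    author: str  # login of whoever opened it: a human or an agent account


def should_close(new_pr: PullRequest, open_prs: list[PullRequest],
                 cap: int = MAX_OPEN_PRS) -> bool:
    """True if accepting new_pr would push its author past the cap.

    Assumes open_prs is the current set of open PRs and does not include new_pr.
    """
    existing = sum(1 for pr in open_prs if pr.author == new_pr.author)
    return existing + 1 > cap


def over_cap_authors(open_prs: list[PullRequest],
                     cap: int = MAX_OPEN_PRS) -> dict[str, int]:
    """Map each author who already exceeds the cap to their open-PR count."""
    counts = Counter(pr.author for pr in open_prs)
    return {author: n for author, n in counts.items() if n > cap}


if __name__ == "__main__":
    backlog = [PullRequest(i, "agent-7") for i in range(10)]
    print(should_close(PullRequest(99, "agent-7"), backlog))  # True: would be #11
    print(should_close(PullRequest(100, "alice"), backlog))   # False
```

Counting per author login is the simplest reading of “per person”; a real bot might also group several agent accounts under the human who runs them.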


FirstMark VC Matt Turck

Behind the scenes at OpenAI, AI progress is continuous and compounding rather than a series of discrete leaps. “Pretty wild over the last few months but behind the scenes it’s continuous progress compounding.”


FPV Ventures Partner Nikunj Kothari

“This time is too important to NOT be doing your life’s best work.” Nikunj also closed a Series A and wired the funds; notably, the company is not in AI. A useful reminder that great companies are still being built outside the AI hype cycle.


Google Labs

Google Labs refreshed its site after I/O, consolidating the latest AI experiments and tools into a cleaner entry point. The team also shared what they consider the most underrated or surprising features in their products.


Former OpenAI CPO Kevin Weil

“A friend shared this with me, and I love it so much. Make no little plans.” Kevin shares a personal note about big-picture ambition, fitting for someone who has worked at the intersection of product and science at OpenAI, Instagram, and Twitter.


📝 OFFICIAL BLOGS

Claude Blog: New in Claude Managed Agents — Dreaming, Outcomes & Multiagent Orchestration

Anthropic shipped three major updates to Claude Managed Agents:

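Dreaming: a scheduled process that reviews an agent’s past sessions and memory stores, extracts patterns, and curates memories so the agent improves over time. You control how much it changes: memories can be updated automatically, or reviewed by a human before they are applied. Dreaming surfaces patterns that no single agent can see on its own, such as recurring mistakes, workflows that multiple agents converge on, and preferences shared across a team. In short, Memory captures what agents learn as they work, and Dreaming refines that memory between sessions.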
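Anthropic has not published how Dreaming works internally, so the following is only a toy sketch of the idea, assuming each session has already been reduced to short tagged observations. It promotes observations seen by at least two distinct agents (patterns no single agent could confirm alone) and either writes them to memory or queues them for review, mirroring the auto-update versus review-before-apply choice. `Session`, `MemoryStore`, `dream`, and `min_agents` are hypothetical names, not the Managed Agents API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Session:
    agent_id: str
    observations: list[str]  # hypothetical: short tags distilled from one session


@dataclass
class MemoryStore:
    memories: set[str] = field(default_factory=set)
    pending_review: set[str] = field(default_factory=set)


def dream(sessions: list[Session], store: MemoryStore,
          min_agents: int = 2, auto_apply: bool = False) -> set[str]:
    """Promote observations seen by at least `min_agents` distinct agents.

    New patterns are written straight to memory (auto_apply=True) or queued
    for a human to approve (auto_apply=False). Returns the new proposals.
    """
    agents_by_obs: dict[str, set[str]] = defaultdict(set)
    for session in sessions:
        for obs in session.observations:
            agents_by_obs[obs].add(session.agent_id)

    proposed = {
        obs for obs, agents in agents_by_obs.items()
        if len(agents) >= min_agents and obs not in store.memories
    }
    if auto_apply:
        store.memories |= proposed
    else:
        store.pending_review |= proposed
    return proposed
```

Setting `min_agents=1` would turn this into per-agent recall rather than cross-agent pattern finding, which is the part the blog post highlights.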

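Outcomes: you write a rubric describing what success looks like, and the agent works toward it. A separate grader evaluates the output in its own context window, independent of the agent’s reasoning. When something falls short, the grader pinpoints what needs to change and the agent takes another pass. In internal testing, Outcomes improved task success by up to 10 percentage points, with the biggest gains on the hardest tasks. File generation quality also rose: +8.4% on docx tasks and +10.1% on pptx tasks.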
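The work, grade, revise loop can be sketched as below. The agent and grader are plain callables standing in for model calls, and `Rubric`, `Verdict`, `grade`, `run_with_outcome`, and `max_passes` are hypothetical names rather than Anthropic’s API. The property the sketch tries to capture is that the grader sees only the output and the rubric, never the agent’s reasoning, and returns the specific failed checks as feedback.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical rubric: named pass/fail checks over the final output text.
Rubric = dict[str, Callable[[str], bool]]
# Hypothetical agent: (task, feedback from the last grading pass) -> output.
Agent = Callable[[str, list[str]], str]


@dataclass
class Verdict:
    passed: bool
    failed_checks: list[str]  # pinpoints exactly what needs to change


def grade(output: str, rubric: Rubric) -> Verdict:
    """Separate grader: sees only the output and the rubric, never the agent's reasoning."""
    failed = [name for name, check in rubric.items() if not check(output)]
    return Verdict(passed=not failed, failed_checks=failed)


def run_with_outcome(agent: Agent, task: str, rubric: Rubric,
                     max_passes: int = 3) -> tuple[str, Verdict, int]:
    """Work, grade, and retry with pinpointed feedback until the rubric passes."""
    if max_passes < 1:
        raise ValueError("max_passes must be at least 1")
    feedback: list[str] = []
    for attempt in range(1, max_passes + 1):
        output = agent(task, feedback)
        verdict = grade(output, rubric)
        if verdict.passed:
            break
        feedback = verdict.failed_checks
    return output, verdict, attempt
```

Real rubrics are usually prose judged by a model rather than Python predicates; the predicates here just keep the loop runnable.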

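Multiagent Orchestration: when a task is too big for one agent, a lead agent breaks it into pieces and delegates them to specialist agents, each with its own model, system prompt, and tools. Specialists work in parallel on a shared filesystem, and their results flow back into the lead agent’s context. Every step is traceable in the Claude Console: which agent did what, in what order, and why. Harvey (legal AI) improved its completion rate roughly 6x with it. Netflix’s platform team uses it to analyze hundreds of build logs in parallel and surface only the patterns worth attention. Spiral by Every combines multiagent orchestration with Outcomes for its writing agent: a lead agent running Haiku takes the request, subagents running Opus draft in parallel, each draft is scored against Every’s editorial guidelines, and only drafts that clear the bar are returned.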
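A minimal sketch of the lead/specialist pattern, loosely modeled on the build-log example: the lead fans files out to specialist runs in parallel, keeps a trace of who handled what, and lets only non-empty findings back into its context. Threads stand in for agents and a plain dict stands in for the shared filesystem; `Specialist` and `lead_agent` are hypothetical names, not the Managed Agents API. For brevity every file goes to one specialist type, whereas a real lead would route pieces to specialists with different models, prompts, and tools.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Specialist:
    name: str
    model: str          # label only: which model this specialist would run
    system_prompt: str  # label only
    work: Callable[[str], list[str]]  # stand-in for the specialist's own tool loop


def lead_agent(shared_fs: dict[str, str], specialist: Specialist,
               max_workers: int = 8) -> tuple[dict[str, list[str]], list[tuple[str, str]]]:
    """Delegate one file per specialist run, execute in parallel, gather findings.

    Returns (findings, trace). Only files with non-empty findings reach the lead's
    "context", so it sees signal rather than raw logs. The trace records which
    specialist handled which file, in delegation order.
    """
    trace: list[tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for path, text in shared_fs.items():
            trace.append((specialist.name, path))
            futures[path] = pool.submit(specialist.work, text)
        results = {path: fut.result() for path, fut in futures.items()}
    findings = {path: found for path, found in results.items() if found}
    return findings, trace
```

Filtering empty results before they reach the lead is the design choice that keeps the lead’s context small as the number of parallel specialists grows.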


🎙️ PODCASTS

Unsupervised Learning Ep 87: Gemini Co-Lead on World Models, RL’s Next Domains & Continual Learning

Guest: Oriol Vinyals, VP of Research at Google DeepMind and Co-Lead of Gemini (alongside Noam Shazeer and Jeff Dean)

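Takeaway: In some sense AGI is already here: had Oriol been handed today’s models seven years ago, he would have said “this is AGI.” But the real missing capability is continual learning from experience, which is not just another benchmark but the meta-capability of intelligence itself.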

Right after Google I/O, host Jacob Efron sat down with Oriol and put to him the questions founders and investors care most about. Key insights:

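World models haven’t had their “GPT moment” yet. Omni shows impressive video generation and interactive editing, but Oriol believes multimodal models have not yet reached the emergent understanding that language models gained from massive data. The real challenge: can you extract physical rules such as gravity, or cause and effect, purely from video and image data, without relying on text labels? This is machine learning’s “original dream,” but the best approach today is still hybrid training (text plus vision), and pure transfer learning remains at the research stage. “If we could train on all video data and reach the level of understanding of language models, that would be a huge breakthrough.”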

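RL’s next frontier isn’t more domains, it’s meta-capabilities. RL progress on coding and math is stunning, but those domains succeed because their answers are verifiable. Oriol cares more about the meta-capabilities of intelligence: can a model learn, in context, to play a brand-new game that is absent from its training data? His test is to hand the model the instruction manual for a game it has never seen and watch whether it understands the rules, follows them, and gets better as it plays. “The models are not that good at this yet.”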
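Oriol’s test checks two things: does the player follow the manual’s rules, and does it improve as it plays? A small harness can score both. The sketch below is a hypothetical illustration of that protocol, not DeepMind’s eval: the game, `MANUAL`, `run_episode`, `evaluate`, and the toy `SweepWithMemory` player are all made up, the hidden number is deliberately fixed across episodes so that experience is useful, and a real eval would use a richer game with a model as the player.

```python
from statistics import mean
from typing import Optional, Protocol

# Hypothetical "manual" for a made-up game the player has never seen.
MANUAL = "Guess the hidden integer in 1..10. Every guess must be in range. Fewer guesses is better."

History = list[tuple[int, bool]]  # (guess, was_it_a_hit)


class Player(Protocol):
    def guess(self, manual: str, history: History) -> int: ...
    def end_episode(self, history: History) -> None: ...


def run_episode(player: Player, hidden: int, max_turns: int = 10) -> tuple[int, int]:
    """Play one episode; return (turns_used, rule_violations)."""
    history: History = []
    violations = 0
    for _ in range(max_turns):
        g = player.guess(MANUAL, history)
        if not 1 <= g <= 10:
            violations += 1  # broke a rule stated in the manual
        history.append((g, g == hidden))
        if g == hidden:
            break
    player.end_episode(history)
    return len(history), violations


def evaluate(player: Player, hidden: int, episodes: int = 6) -> dict[str, bool]:
    """Score rule-following, and improvement as fewer turns in later episodes."""
    results = [run_episode(player, hidden) for _ in range(episodes)]
    turns = [t for t, _ in results]
    half = episodes // 2
    return {
        "follows_rules": all(v == 0 for _, v in results),
        "improves": mean(turns[half:]) < mean(turns[:half]),
    }


class SweepWithMemory:
    """Toy player: sweeps 1..10 in order, but first retries the last winning guess."""

    def __init__(self) -> None:
        self.last_win: Optional[int] = None

    def guess(self, manual: str, history: History) -> int:
        tried = {g for g, _ in history}
        if self.last_win is not None and self.last_win not in tried:
            return self.last_win
        return next(n for n in range(1, 11) if n not in tried)

    def end_episode(self, history: History) -> None:
        if history and history[-1][1]:
            self.last_win = history[-1][0]
```

Comparing the mean turns of the second half against the first is a crude improvement signal; it is enough to separate a player that carries experience forward from one that does not.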

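One thing he changed his mind on: “I wanted to believe training on a broad distribution would be better, but I didn’t predict that training on narrow, hard problems (math, coding) with RL would generalize so well. That surprised me.”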

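AGI’s missing piece: the ability of models to truly learn from experience, not one-shot learning during pretraining but adapting continuously the way humans do. Yet he also says “in some way AGI is here”; the definition just keeps moving.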

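Advice for founders: even if you don’t train your own model, carefully designed evals and data flywheels are immensely valuable, and frontier labs might even adopt such evals as standards. If you build on top of models, find a vertical the big players aren’t focused on, go deep on your users, and build critical mass; the value can still be enormous.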

“If seven years ago I had to experiment with a model that we have currently, would I have declared this is AGI? I would say probably yes. […] In some way AGI is here. […] I don’t think it is here in the way I want to see it, but it is fairly close.”


Generated by Follow Builders: https://github.com/zarazhangrui/follow-builders

POSTS UPDATED 2026-05-24 #1d60653 📰 Builders Daily 2026-05-24