05-14 AI News Daily

Daily Summary

Former Qwen lead Lin Junyang left Alibaba two months ago and is now back as a founder, with a startup valued at about $2 billion and a bet on world models and embodied AI.
The path from big tech to startups is proving viable for core AI talent, and capital is moving faster into one of the most capital-intensive tracks in AI.
Today’s lead story is worth reading: Lin Junyang’s move is a signal, not just another startup announcement.

Today’s AI News

👀 One-Liner

Former Qwen leader Lin Junyang is back with a $2 billion valuation, this time building “world models.”

🔑 3 Keywords

#Big-shot-turned-founder #Open-source-acceleration #AI-profit-race

🔥 Top 10 Highlights

1. Lin Junyang Launches Startup Valued at ~$2 Billion

Two months ago, he posted “I’m ashamed to lead you all” in a DingTalk group, then vanished. Now he’s back—not as an employee, but as a founder.

Former Alibaba Qwen tech lead Lin Junyang has quietly launched a startup targeting world models and embodied AI, two of the most cutting-edge and capital-intensive tracks in AI right now. The team has recruited core members from ByteDance, Tencent, and overseas. The funding round reportedly values the company at about $2 billion, with interest from Sequoia China and Gaorong Ventures.

He took Qwen to developer communities worldwide and is now betting on next-generation AI infrastructure himself. The second half of this story is worth watching even more closely.

2. Alibaba Open-Sources Ovis2.6-80B-A3B: Vision Multimodal MoE Model

Imagine an AI that crops and rotates an image on its own while analyzing it. Ovis2.6 does exactly that.

Alibaba’s International Digital Commerce team has open-sourced version 2.6 of the Ovis series. The standout feature is proactive vision tool invocation during chain-of-thought reasoning: the model can autonomously crop and rotate image regions to aid inference. This upgrade also swaps the backbone LLM for an MoE architecture, with 80B total parameters of which about 3B are active (80B-A3B), which substantially cuts inference compute.

The model weights are live on HuggingFace, so developers working on vision reasoning tasks can grab them directly.
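A back-of-the-envelope sketch of what "80B-A3B" implies: cheap per-token compute, but still a large memory footprint. It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token and 2 bytes per weight (bf16). Neither figure comes from the Ovis release, and the estimate ignores attention, the vision encoder, and KV cache.

```python
TOTAL_PARAMS = 80e9   # "80B": all experts combined
ACTIVE_PARAMS = 3e9   # "A3B": parameters actually used per token


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in one forward step."""
    return active / total


def flops_per_token(params: float) -> float:
    """Assumption: roughly 2 FLOPs per parameter per generated token."""
    return 2 * params


def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights (bf16 assumed by default)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.2%}")
    print(f"GFLOPs per token, MoE:   {flops_per_token(ACTIVE_PARAMS) / 1e9:.0f}")
    print(f"GFLOPs per token, dense: {flops_per_token(TOTAL_PARAMS) / 1e9:.0f}")
    print(f"weight memory: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

Under these assumptions the per-token compute is a small fraction of a dense 80B model's, while all 80B weights still have to be stored, which is why local deployment remains demanding.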

3. Alipay AI Checkout Adds Merchant Onboarding Skill; Tencent Yuanbao Upgrades; Baidu Launches Miaoda App Mobile Version

Integrating payments into your website used to mean registration, applications, and documentation hell. Now Alipay’s “AI Checkout” has added a merchant onboarding Skill: describe your needs in natural language, and it handles app creation, payment integration, and merchant signup in one go.

On the same day, Tencent Yuanbao gained the ability to analyze WeChat chat history, and Baidu’s Miaoda App launched on mobile. China’s big tech AI products all moved forward on May 13: not one company’s breakthrough, but the whole front advancing.

For regular users, Tencent Yuanbao reading WeChat records is probably today’s most tangible update.

4. Guizhou PPT Skills Update: AI Can Mark Anywhere on Maps

Making a report with maps used to mean screenshots, annotations, then pasting into PPT—multiple tedious steps. Guizhou’s PPT Skills update now includes map components where AI can mark directly, with zoom and drag support.

Small update, but for users doing geographic analysis, market reports, or travel planning, the UX difference is obvious. Update your AI’s Skills and you’re good—almost zero friction.

5. South Korea’s AI Windfall: After the Crash, the Real Suspense Begins

KOSPI hit 7999.68, just 0.32 points (about 0.004%) short of the historic 8000 mark. Then a Facebook post ended the celebration early.

South Korea’s presidential policy advisor floated the “citizen dividend” concept—Samsung and SK Hynix made AI fortunes, why not share with all citizens? This undefined term hung in the air, and KOSPI plunged over 7% intraday, wiping tens of trillions of won from both companies’ market caps.

This isn’t just Korea’s story. How to distribute AI’s excess profits will eventually be asked in more countries.

6. TOHA: Detecting LLM Hallucinations via Topological Structure

Everyone knows LLMs say wrong things, but how do you catch it? TOHA offers a new angle: skip the content and look at the topology of the attention matrices.

The paper reports that when a model hallucinates, the topological divergence between the prompt and response subgraphs of the attention graph shifts predictably: high divergence often means the model is “making stuff up.” The method is especially useful in RAG scenarios. RAG exists precisely to reduce hallucinations, and this gives it a more precise detection tool.

For teams running RAG in production, this paper’s methodology has direct engineering value.
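One way to make "divergence between prompt and response subgraphs" concrete is a spanning-forest cost: treat tokens as vertices, use 1 minus attention as edge length, and measure how expensive it is to attach every response token to the prompt. The sketch below is an illustrative simplification under those assumptions, not the paper's reference implementation. The function name `attach_cost`, the symmetrization step, and the toy matrices are all hypothetical.

```python
def attach_cost(attn, n_prompt):
    """Total edge length of a minimum spanning forest that attaches each
    response token to the prompt block (Prim's algorithm, with all prompt
    tokens treated as already connected at zero cost).

    attn: square list-of-lists of attention weights in [0, 1].
    n_prompt: number of leading tokens that belong to the prompt.
    """
    n = len(attn)
    if not 1 <= n_prompt <= n:
        raise ValueError("n_prompt must be between 1 and the number of tokens")

    # Assumption: strong attention in either direction means a short edge.
    dist = [[1.0 - max(attn[i][j], attn[j][i]) for j in range(n)]
            for i in range(n)]

    in_tree = [i < n_prompt for i in range(n)]
    # Cheapest known edge from each response token to the growing tree.
    best = [0.0 if in_tree[i] else min(dist[i][j] for j in range(n_prompt))
            for i in range(n)]

    total = 0.0
    for _ in range(n - n_prompt):
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        total += best[v]
        in_tree[v] = True
        for u in range(n):
            if not in_tree[u] and dist[v][u] < best[u]:
                best[u] = dist[v][u]
    return total


# Toy causal attention maps: 2 prompt tokens followed by 2 response tokens.
grounded = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.6, 0.3, 0.1, 0.0],   # response attends mostly to the prompt
    [0.2, 0.5, 0.2, 0.1],
]
ungrounded = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.05, 0.05, 0.9, 0.0],  # response attends mostly to itself
    [0.05, 0.05, 0.5, 0.4],
]

if __name__ == "__main__":
    print("grounded:  ", attach_cost(grounded, n_prompt=2))
    print("ungrounded:", attach_cost(ungrounded, n_prompt=2))
```

In this toy setup the response that barely attends to the prompt gets the higher cost. A real detector would compute such a score per attention head on actual model attention maps and calibrate a threshold on labeled data.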

7. SpotIt+: Database-Constrained Validation Tool for Text-to-SQL Evaluation

Judging by eye whether a Text-to-SQL model’s output is correct is tough. SpotIt+ flips the script: it actively searches for database instances that distinguish the generated SQL from the ground truth, using bounded equivalence verification to assess whether the two queries are truly equivalent.

Crucially, it introduces a constraint mining pipeline combining rule mining and LLMs, ensuring generated counterexamples reflect practically relevant differences, not theoretical edge cases that never happen.

Developers building database-related AI apps can use this open-source tool directly as an evaluation benchmark.
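The idea of bounded equivalence checking with constraints can be illustrated with a brute-force toy: enumerate every small database up to a size bound, skip the ones that violate a constraint, and return the first instance on which the two queries disagree. This is a minimal sketch using Python's built-in `sqlite3`, standing in for whatever search procedure the tool actually uses. The `emp` table, the two queries, and the `salary >= 2` constraint are hypothetical examples.

```python
import itertools
import sqlite3
from collections import Counter


def run(query, rows):
    """Run a query on a fresh in-memory table and return its rows as a bag."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)
    result = Counter(conn.execute(query).fetchall())
    conn.close()
    return result


def find_counterexample(q_gold, q_pred, domain, max_rows,
                        constraint=lambda rows: True):
    """Search all tables with at most max_rows rows over a small value domain.

    Returns the first table contents on which the queries differ, or None if
    they agree on every instance within the bound that meets the constraint.
    """
    candidate_rows = list(itertools.product(domain, repeat=2))
    for n in range(max_rows + 1):
        for rows in itertools.combinations_with_replacement(candidate_rows, n):
            if not constraint(rows):
                continue  # instance is not realistic for this schema
            if run(q_gold, rows) != run(q_pred, rows):
                return list(rows)
    return None


GOLD = "SELECT id FROM emp WHERE salary > 1"
PRED = "SELECT id FROM emp WHERE salary >= 1"

if __name__ == "__main__":
    # Without constraints, a row with salary = 1 separates the two queries.
    print(find_counterexample(GOLD, PRED, domain=(0, 1, 2), max_rows=2))

    # With a hypothetical mined constraint (salary is always >= 2),
    # no instance within the bound distinguishes them.
    always_high = lambda rows: all(salary >= 2 for _, salary in rows)
    print(find_counterexample(GOLD, PRED, (0, 1, 2), 2, always_high))
```

The second call shows why constraints matter: a difference that only shows up on data the real database can never contain is not a practically relevant counterexample.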

8. Codex Image Trick: Search First, Then Use Reference Images

What does a Yunnan jiamafu charm look like? GPT doesn’t know, but given reference images it draws one well.

The logic is simple: when facing an obscure subject, have Codex search for related images first, then generate new ones based on the search results. That keeps the output authentic while still meeting proportion and clarity needs. Pair it with Guizhou’s PPT Skill for even better results.

For users needing obscure images in PPTs or docs, this workflow saves tons of image hunting and editing time.

9. RLVR Training Instability Research: “Cheating” Mechanisms at the Objective Level

Reinforcement learning with verifiable rewards (RLVR) boosts reasoning, but training is often unstable, especially with MoE models. This paper dissects the issue and traces it to objective-level “cheating”: models learn to game the reward score without actually improving.

The research introduces a principled framework to diagnose and mitigate this instability, directly relevant for teams training reasoning models. As MoE becomes mainstream, solving this problem grows more urgent.

10. Sim-to-Real Gap Benchmark for Tool-Using Agents

Agents that ace the lab crash in real deployment. User typos cause tool name hallucinations, timeout configs freeze agents, cross-server tool name collisions break SDKs—these “dirty” issues never appear in standard benchmarks.

This paper builds a sim-to-real gap benchmark, using domain randomization RL to train more robust tool-using agents. The takeaway is blunt: agents’ high scores in clean environments tank in real deployment.

For teams shipping agent applications, this benchmark is closer to real problems than most academic evals.
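Domain randomization here means sampling a differently "dirty" environment for each training or evaluation episode. The sketch below covers only that sampling step, using the three failure sources named above (user typos, tool name collisions, timeouts). The `Episode` fields, the `mirror/` server prefix, and the probability and timeout values are hypothetical, and the RL training loop itself is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Episode:
    query: str        # user request, possibly with a typo
    tools: list       # tool names in "server/tool" form
    timeout_s: float  # per-call tool timeout


def inject_typo(text, rng):
    """Swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def randomize(query, tools, rng, p_typo=0.5, p_collision=0.5):
    """Sample one perturbed episode from a clean query and tool list."""
    if rng.random() < p_typo:
        query = inject_typo(query, rng)

    tools = list(tools)
    if tools and rng.random() < p_collision:
        # Expose an existing tool name under a second (hypothetical) server,
        # so the agent must disambiguate by server rather than by name alone.
        name = rng.choice(tools).split("/")[-1]
        tools.append(f"mirror/{name}")

    timeout_s = rng.choice([0.5, 2.0, 30.0])
    return Episode(query, tools, timeout_s)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    for _ in range(3):
        print(randomize("get weather in Seoul", ["a/search", "b/weather"], rng))
```

Running an agent across many sampled episodes instead of one clean configuration is what exposes the gap between lab scores and deployment behavior.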

📌 Worth Watching

[Product] AI Sharing Class for Kindergarteners — A dad built an HTML PPT to teach his daughter’s class what AI is and what mistakes it makes—this “AI literacy from childhood” content is more thoughtful than most adult explainers.

[Open Source] gstack: Garry Tan’s Claude Code Original Config — YC’s boss’s 23 role-based tool configs hit nearly 100k stars in a day. Solo devs and small teams can fork directly.

[Research] Multi-Layer Representation Fusion for Vision Tokenization — Current vision encoders use only the last layer; this paper shows middle layers hold discarded detail. Fusing multiple layers significantly boosts reconstruction and generation quality.

[Business] Unitree Releases GD01 Piloted Mecha, Starting at $3.9M — Wang Xing personally sat in for a demo. World’s first mass-produced piloted mecha—the AI + robotics hardware ceiling just got pushed higher.

😄 AI Fun

A Blog Sat in TODO for Half a Year, Forked and Running in Half a Day

“Build my own writing platform” sits in many people’s TODO lists, then stays there.

One user saw Qiaomu’s open-source blog, forked it, tweaked config, deployed to Cloudflare, and went live in half a day: AI auto-generates summary tags, one-click WeChat draft push, zero server costs. He said it’s not that he couldn’t build it—he was stuck reinventing the wheel.

That probably hits home for a lot of people. Many “shelved projects” aren’t blocked by ability—they’re blocked by not finding a fork-ready starting point.

🔮 AI Trend Predictions

Former Big Tech AI Leaders’ Startup Exodus Will Accelerate

Timeline: Q2-Q3 2026
Probability: 80%
Rationale: Today’s news (Lin Junyang Launches Startup Valued at ~$2 Billion), plus the quiet departure of core AI talent from Baidu, Alibaba, and ByteDance over the past six months. Lin’s case shows the path works: two months after leaving, a $2B valuation. This signal will pull more people “stuck” at big tech out the door.

World Models Track Will See Its First Wave of Closed Funding Rounds

Timeline: Q3 2026
Probability: 70%
Rationale: Today’s news (Lin Junyang’s startup targets world models), plus embodied AI and world models being among capital’s hottest bets. Lin’s entry will amplify attention; expect multiple early-stage world model projects to close Series A rounds around Q3.

AI Tool “Role-Based Configs” Will Become Mainstream Workflow

Timeline: Q2 2026
Probability: 65%
Rationale: Today’s news (gstack hit ~100k stars in a day). This explosive growth shows that developer demand for “assigning clear roles to AI” is mature. Expect more role-config templates and tools, with Claude Code and Cursor likely adding official support.

AI Profit “Citizen Dividend” Disputes Will Spread to More Countries

Timeline: Q2-Q3 2026
Probability: 55%
Rationale: Today’s news (South Korea’s AI Windfall crash). South Korea has just put the question of how to distribute corporate excess profits from AI on the political agenda. Regulators in Europe and parts of Asia will likely follow with similar framework discussions.

❓ Related Questions

How to Experience Alibaba’s Open-Source Ovis2.6 Vision Multimodal Model?

Ovis2.6-80B-A3B is an open-source vision multimodal MoE model from Alibaba’s International Digital Commerce team. The model weights are on HuggingFace (AIDC-AI/Ovis2.6-80B-A3B) and can in principle be downloaded for local deployment, but an 80B-parameter model demands serious GPU resources, which is a high barrier for typical users.

Last updated on 2026/05/14 01:18:22
