12-24-Daily AI News Daily

## Aivora AI Daily 2025/12/24
> AI Daily

**Today's Summary**

GPT-5.2 scored 75% on ARC-AGI-2, surpassing the human baseline for the first time, as OpenAI closes out the year with a big move. Domestic models are making a collective push: Wenxin (ERNIE) has become the top-ranked domestic model on LMArena, and MiniMax's generated web pages finally look good. Year-end reports are flooding feeds, and rankings are out. It's a good time to reflect on how much AI you've used this year and to keep pushing next year.

**⚡ Quick Navigation**

- [📰 Today's AI News](#今日ai资讯) - A quick overview of the latest developments.

**💡 Tip**

Want to be the first to try the latest AI models (Claude, GPT, Gemini) mentioned in this article? No account? Come to [**Aivora**](https://aivora.cn?utm_source=daily_news&utm_medium=mid_ad&utm_campaign=content) to get one, get started in a minute, and enjoy worry-free after-sales service.

**Today's AI News**

### 👀 Just One Sentence

GPT-5.2 scored 75% on ARC-AGI-2, directly surpassing the human baseline. This is a pretty big deal.

### 🔑 3 Keywords

#GPT5.2Dominates #DomesticModelsRise #AnnualReportsGoViral

**🔥 Top 10 Heavy Hitters**

1.  **[GPT-5.2 Surpasses Human Baseline on ARC-AGI-2 with 75% Score](https://x.com/gdb/status/2003570781192957991)**

    Remember ARC-AGI? That hardcore benchmark touted as "testing true intelligence." Previously, the best score was just over 60%, and the human baseline was always a tough hurdle. But GPT-5.2 X-High directly hit 75%, a 15-percentage-point jump from the previous SOTA, with each problem costing less than $8. Greg Brockman himself retweeted it, marking this as OpenAI's year-end display of raw power.

    ![AI News Image](https://pbs.twimg.com/media/G84FbvNWUAAkVZK?format=png&name=orig)

2.  **[ChatGPT Annual Report Launched, Sam Altman Complains He Didn't Make Top 1%](https://x.com/sama/status/2003419371432214548)**

    OpenAI pushed out the "Your Year with ChatGPT" annual report to users, showing how much you chatted with ChatGPT and how many images you generated this year. Some folks found that 11,000 messages were enough to get into the global Top 1%, indicating that most people don't actually use it that deeply. The funniest part is Sam Altman himself tweeted, "Didn't make Top 1%, a bit disappointed" — Boss, are you perhaps too busy?

    ![AI News Image](https://pbs.twimg.com/media/G8zBqDIXYAAz41p?format=jpg&name=orig)

3.  **[Replit Is Now Embedded in ChatGPT, Code Without Switching Tabs](https://x.com/gdb/status/2003535410383978728)**

    Previously, writing code with ChatGPT meant copying and pasting into an IDE to run it. Now, Replit is directly integrated into ChatGPT. You describe your needs, and it directly helps you run the application. No environment setup, no switching windows; the path from "idea" to "something runnable" has been shortened even further. For those looking to quickly validate ideas, this combo is pretty sweet.

4.  **[Wenxin ERNIE-5.0 Ranks First Among Domestic Models on LMArena, 23 Points Higher Than Previous Version](https://x.com/op7418/status/2003394479697592740)**

    Baidu's move is quite interesting. ERNIE-5.0-Preview-1203 has surpassed Qwen on the LMArena text leaderboard, becoming the top domestic model. Crucially, it scored 23 points higher than the previous version, mainly thanks to creative writing and high-difficulty instructions. What's more, Baidu is no longer holding back for big releases; instead, it's frequently pushing out minor version iterations, a strategic shift worth noting.

    ![AI News Image](https://pbs.twimg.com/media/G818pR0agAMDwni?format=jpg&name=orig)

5.  **[MiniMax M2.1 and GLM-4.7 Released on the Same Day, Front-end Aesthetic Capabilities Explode](https://x.com/op7418/status/2003505367909843292)**

    AI-generated web pages used to be painfully ugly. This time, MiniMax M2.1's pages go as far as restyling the mouse cursor and are packed with design flair. GLM-4.7 isn't bad either; it has minor CSS Grid issues but is generally competitive. Domestic models have finally gotten smart about "aesthetics," likely by using well-designed web pages as RL training data.

    <video controls preload="metadata" playsinline style="max-width:100%; height:auto;" src="https://video.twimg.com/amplify_video/2003505081250197504/vid/avc1/2068x1080/Tf_BcKv0N3picxm5.mp4?tag=21"></video>

6.  **[Tongyi Open-Sources Fun-Audio-Chat 8B, Understands Your Emotions and Helps You Get Things Done](https://www.bestblogs.dev/article/865433ed)**

    This isn't just any ordinary voice chat model. Tongyi's Fun-Audio-Chat 8B can perceive emotions from your tone and speech rate—it'll comfort you if you're angry, and guide you through deep breaths if you're anxious. Even more impressively, it supports Speech Function Call; just say "check my schedule for tomorrow," and it directly calls the function to get it done for you. It features an end-to-end architecture, low latency, and the 8B model is already open source.

7.  **[Gemini 3 Flash So Fast It Can Play Pictionary](https://x.com/GeminiApp/status/2003550229724037402)**

    Google showed off the speed of Gemini 3 Flash: you're still drawing, and it's already guessed it. This real-time responsiveness is a must-have for scenarios requiring immediate feedback (like real-time translation, game NPCs). Achieving this level of speed optimization indicates that Google has put serious effort into inference efficiency.

    <video controls preload="metadata" playsinline style="max-width:100%; height:auto;" src="https://video.twimg.com/amplify_video/2003545425031364613/vid/avc1/1920x1080/2pI7DoGH46bE_K-W.mp4?tag=21"></video>

8.  **[Zhihu's Annual AI Product List Released: Doubao Ranks First, Cursor Ushers in the Agent Era](https://x.com/op7418/status/2003387833701011939)**

    Zhihu's annual AI product list is quite valuable for reference. Domestically, Doubao took first place with its low-barrier voice mode, and DeepSeek capitalized on its early-year surge. Overseas, Gemini pushed ahead with its year-end launch, while Claude remains unshakable in the programming domain. Most notably, Cursor basically defined this year's Agent interaction paradigm, pioneering context engineering and multi-model hybrid calling.

    ![AI News Image](https://pbs.twimg.com/media/G812lmGaEAAKk4a?format=jpg&name=orig)

9.  **[Baoyu's Deep Dive: Is AI a Bubble or Tomorrow? The Answer is Both](https://x.com/dotey/status/2003382215720235414)**

    In his deep dive on AI, Baoyu notes that over the past three years, AI companies' market cap has surged by $10 trillion, with OpenAI's valuation growth exceeding the GDP of most countries. Is it a bubble? In the short term, definitely. But history tells us that when the internet bubble burst, fiber optics remained; when the biotech craze faded, new drugs persisted. Bubbles will pop, but the underlying infrastructure won't disappear. For the average person, forget the valuations; just start using AI—that's the real deal.

    ![AI News Image](https://pbs.twimg.com/media/G81xcjIXsAA463u?format=jpg&name=orig)

10. **[LLMs Still Struggle with Web API Calls, But a Solution Has Been Found](https://x.com/omarsar0/status/2003570764868649154)**

    Everyone thought code models would be solid at calling APIs, but actual tests show that no open-source model can solve more than 40% of tasks, with URL hallucination rates as high as 14-39%. The reason is that Web APIs differ too much from regular function calls—HTTP methods, long URLs, nested parameter types; models simply can't remember them. The good news is researchers have proposed a constrained decoding scheme that converts OpenAPI specifications into regular expression constraints, directly boosting accuracy by 90%.

    ![AI News Image](https://pbs.twimg.com/media/G83kgtGbcAEdWrN?format=png&name=orig)
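
The constrained-decoding idea in item 10 can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the `SPEC` fragment, the `api.example.com` host, the type-to-pattern mapping, and the helper names are all assumptions made up for illustration. It only shows how an OpenAPI-style description might be turned into a single regular expression that accepts valid `METHOD URL` request lines and rejects hallucinated ones.

```python
import re

# Hypothetical, heavily simplified OpenAPI fragment (illustrative only).
SPEC = {
    "servers": [{"url": "https://api.example.com/v1"}],
    "paths": {
        "/users/{userId}": {
            "get": {
                "parameters": [
                    {"name": "userId", "in": "path", "schema": {"type": "integer"}}
                ]
            }
        },
        "/users/{userId}/posts": {
            "post": {
                "parameters": [
                    {"name": "userId", "in": "path", "schema": {"type": "integer"}}
                ]
            }
        },
    },
}

# Regex fragment for each supported parameter type (assumed mapping).
TYPE_PATTERNS = {"integer": r"[0-9]+", "string": r"[^/?#]+"}


def path_to_regex(template, parameters):
    """Turn a path template such as /users/{userId} into a regex fragment."""
    types = {p["name"]: p["schema"]["type"] for p in parameters if p["in"] == "path"}
    pieces = []
    # The capturing group keeps the {placeholders} in the split result.
    for part in re.split(r"(\{[^}]+\})", template):
        if part.startswith("{") and part.endswith("}"):
            pieces.append(TYPE_PATTERNS[types.get(part[1:-1], "string")])
        else:
            pieces.append(re.escape(part))
    return "".join(pieces)


def spec_to_regex(spec):
    """Build one regex that matches every 'METHOD URL' the spec allows."""
    base = re.escape(spec["servers"][0]["url"])
    alternatives = []
    for template, operations in spec["paths"].items():
        for method, operation in operations.items():
            path = path_to_regex(template, operation.get("parameters", []))
            alternatives.append(f"(?:{method.upper()} {base}{path})")
    return re.compile("|".join(alternatives))


def is_allowed(request_line, pattern):
    """Check a complete request line against the spec-derived regex."""
    return pattern.fullmatch(request_line) is not None


pattern = spec_to_regex(SPEC)
print(is_allowed("GET https://api.example.com/v1/users/42", pattern))     # True
print(is_allowed("GET https://api.example.com/v2/users/42", pattern))     # False: wrong base URL
print(is_allowed("DELETE https://api.example.com/v1/users/42", pattern))  # False: method not in spec
```

This sketch only validates finished strings. In constrained decoding, the same kind of pattern would typically be compiled into an automaton and used to mask invalid tokens at each generation step, so the model cannot produce an out-of-spec URL in the first place.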

**📌 Worth Noting**

**[Products]**

- [Open WebUI Continues to Update](https://github.com/open-webui/open-webui) - Open WebUI, a local AI interface with 118k stars, supports Ollama and OpenAI API.
- [Claude Code Templates Tool Released](https://github.com/davila7/claude-code-templates) - The Claude Code Templates tool is a command-line utility for configuring and monitoring Claude Code.

**[Open Source]**

- [exo: Building AI Clusters with Everyday Devices](https://github.com/exo-explore/exo) - exo, with 37k stars, allows phones, computers, and even watches to run models.
- [LEANN: Local RAG Saving 97% Storage](https://github.com/yichuan-w/LEANN) - LEANN is fast, accurate, and 100% private.
- [vllm-omni: Full-Modality Model Inference Framework](https://github.com/vllm-project/vllm-omni) - vllm-omni is a full-modality model inference framework from the vLLM team.

**[Research]**

- [RewardScope: RL Reward Hacking Detection Tool](https://www.reddit.com/r/MachineLearning/comments/1pu1o91/p_rewardscope_reward_hacking_detection_for_rl/) - RewardScope is an RL reward hacking detection tool that monitors reward components in real-time to detect state loops and boundary exploitation.
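
To make "detecting state loops" concrete, here is a minimal Python sketch of one way such a check could work. It does not use RewardScope's API, which is not described here; the `LoopMonitor` class, its parameters, and its thresholds are hypothetical. The assumption is that an agent collecting positive reward while revisiting the same state many times within a short window is a candidate for reward hacking.

```python
from collections import Counter, deque


class LoopMonitor:
    """Flags states revisited too often in a sliding window while reward accrues.

    Hypothetical helper for illustration; states must be hashable.
    """

    def __init__(self, window=50, max_visits=5):
        self.window = deque(maxlen=window)  # most recent (state, reward) pairs
        self.counts = Counter()             # visits per state inside the window
        self.max_visits = max_visits

    def step(self, state, reward):
        """Record one transition; return True if it looks like a reward loop."""
        # When the window is full, the oldest entry is about to be evicted,
        # so remove it from the visit counts first.
        if len(self.window) == self.window.maxlen:
            old_state, _ = self.window[0]
            self.counts[old_state] -= 1
        self.window.append((state, reward))
        self.counts[state] += 1
        return reward > 0 and self.counts[state] > self.max_visits


monitor = LoopMonitor(window=10, max_visits=3)
flags = [monitor.step(state, 1.0) for state in ["A", "B"] * 4]
print(flags)  # [False, False, False, False, False, False, True, True]
```

A real monitor would likely track each reward component separately and add checks for boundary exploitation; this sketch covers only the loop case.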

**[Others]**

- [Life K-Line Open Source Project Goes Viral](https://mp.weixin.qq.com/s?__biz=MzUxNjg4NDEzNA==&mid=2247529909&idx=1&sn=3760362073fe7e60ea7a4ceb032cc050) - The Life K-Line open-source project, which plots a stock-style K-line chart of a person's life fortunes from their birth characters (BaZi), has gone viral, with multiple open-source versions already on GitHub.

**❓ Related Questions**

### How to Experience ChatGPT's Annual Report Feature?

The ChatGPT Annual Report ("Your Year with ChatGPT") is currently being rolled out to users in the US, UK, Canada, New Zealand, and Australia, requiring "Save Memory" and "Chat History" features to be enabled. For users in mainland China, account registration and access restrictions may be an issue.

**Solution:**

- **[Aivora](https://aivora.cn)** offers ready-made ChatGPT Plus account services.
- Aivora's service provides instant delivery; order and use immediately, no need to deal with payment or registration issues.
- Aivora's service offers stable, exclusive accounts with worry-free after-sales support.
- Visit [aivora.cn](https://aivora.cn) to view the complete list of AI account services.

**AI Account Instant Delivery: [Aivora ⬆️](https://aivora.cn)**

Still troubled by payment issues for ChatGPT Plus, Claude Pro, or Midjourney? **Aivora** offers you a one-stop AI account solution!

✅ **Instant Delivery**: Order and receive immediately, no waiting, start your AI journey right away.
✅ **Stable and Reliable**: We select high-quality exclusive accounts, eliminating ban anxiety with worry-free after-sales support.
✅ **Comprehensive Categories**: We have accounts for all popular AI tools like ChatGPT Plus, Claude Pro, Midjourney, Poe, Suno, and more.
✅ **High Cost-Effectiveness**: Enjoy the same premium service at a more favorable price than official subscriptions.
🚀 **Visit [aivora.cn](https://aivora.cn) now to purchase your AI assistant and unleash unlimited creativity!**

Last updated on 2026/01/14 10:24:22
