Our First Proof submissions
#First Proof #AI reasoning #OpenAI #mathematical proofs #expert verification #research-level problems #GPT models
📌 Key Takeaways
- OpenAI shared AI-generated proof attempts for the First Proof math challenge
- At least five of the ten proofs are believed to be correct by experts
- The problems are research-level and require expert verification
- OpenAI is developing more rigorous models for complex reasoning
📖 Full Retelling
OpenAI shared their AI model's proof attempts for the First Proof math challenge on February 14, 2026, at 12:00 AM PT, as part of research into whether advanced AI systems can produce correct, checkable proofs on complex, expert-level mathematical problems. The First Proof challenge consists of 10 research-level mathematics problems designed to test AI systems' ability to build end-to-end arguments in specialized domains; unlike short-answer or competition-style math, these problems demand sustained, sophisticated reasoning, and their correctness is hard to establish without expert review. At least some of the problems remained open for years before their authors found solutions. Based on initial expert feedback, the researchers believe at least five of the proof attempts (problems 4, 5, 6, 9, and 10) have a high probability of being correct, while others remain under review. The researchers initially believed their solution to problem 2 was likely correct, but revised this assessment after official First Proof commentary and community analysis identified it as incorrect. OpenAI's published attempts include a newly added appendix with prompt patterns and examples that simulate the team's manual interactions with the models during the research process.
🏷️ Themes
AI Research, Mathematical Reasoning, Scientific Advancement
📚 Related People & Topics
OpenAI
Artificial intelligence research organization
**OpenAI** is an American artificial intelligence (AI) research organization headquartered in San Francisco, California. The organization operates under a unique hybrid structure, comprising the non-profit **OpenAI, Inc.** and its controlled for-profit subsidiary, **OpenAI Global, LLC** (a...
Entity Intersection Graph
Connections for OpenAI:
- 🌐 Artificial intelligence (9 shared)
- 🌐 ChatGPT (8 shared)
- 👤 Wall Street (4 shared)
- 🏢 Nvidia (4 shared)
- 🏢 Anthropic (3 shared)
Original Source
February 20, 2026 · Research Conclusion

Our First Proof submissions

We're sharing our proof attempts for First Proof, a math challenge testing if AI can produce checkable proofs on domain-specific problems.

We ran an internal model on all 10 First Proof problems, a research-level math challenge designed to test whether AI systems can produce correct, checkable proof attempts. Unlike short-answer or competition-style math, these problems require building end-to-end arguments in specialized domains, and correctness is hard to establish without expert review. The authors of the First Proof problems are leading experts in their respective fields, and at least a couple of the problems were open for years before the authors found solutions. An academic department with substantial overlap with the subject areas could conceivably solve many of the problems in one week.

We shared our proof attempts on Saturday, February 14, 2026 at 12:00 AM PT. Based on feedback from experts, we believe at least five of the model's proof attempts (problems 4, 5, 6, 9, and 10) have a high chance of being correct, and several others remain under review. We initially believed our attempt for problem 2 was likely correct. Based on the official First Proof commentary and further community analysis, we now believe it is incorrect. We're grateful for the engagement and look forward to continued review.

Our full set of proof attempts can be found in the preprint, which includes all ten proof attempts, plus a newly added appendix with prompt patterns and examples that aim to simulate our manual interactions with the models during the process.

We believe novel frontier research is perhaps the most important way to evaluate capabilities of next-generation AI models. Benchmarks are useful, but they can miss some of the hardest parts of research: sustaining lo...