Agentic AI Models: The Latest Developments and News
Agentic AI Models: What's New?
The agentic AI market is moving faster than almost any technology market in history. New versions, new capabilities, and new players emerge every few weeks. This page tracks the 12 most prominent agentic AI model families, their version history, and the latest developments from each.
1. GPT Series (OpenAI)
Parent company: OpenAI
Versions so far: GPT-4, GPT-4o, o1, o3, GPT-5, GPT-5.1, GPT-5.2, GPT-5.3, GPT-5.4, GPT-5.4 mini, GPT-5.4 nano
Latest development news:
GPT-5.4 launched as OpenAI's most capable frontier model (March 2026). GPT-5.4 combines the coding capabilities of GPT-5.3-Codex with a 1 million-token context window, native computer use, tool search, and compaction support for long-running agentic workflows. It is described as capable of getting complex professional work done more accurately and with less back-and-forth than any previous model, matching or exceeding industry professionals in 83% of knowledge work comparisons on GDPval.
Source: openai.com
GPT-5.2-Codex released as the most advanced agentic coding model to date (March 2026). Built on GPT-5.2 and optimized for Codex, this variant introduced context compaction for long-horizon coding tasks, stronger performance on large code migrations and refactors, and significantly improved cybersecurity capabilities. A security researcher using an earlier Codex variant independently discovered and responsibly disclosed three React framework vulnerabilities.
Source: openai.com
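Context compaction, mentioned for both GPT-5.4 and GPT-5.2-Codex above, is conceptually simple: when the transcript outgrows the context budget, fold older messages into a summary and keep the recent turns verbatim. A minimal sketch follows; the token counting (by whitespace words) and the summarizer are stand-ins, and OpenAI's actual mechanism is not public.

```python
# Sketch of context compaction: when the transcript outgrows its budget,
# fold older messages into a summary and keep recent turns verbatim.
# Token counting and the summarizer are simplified stand-ins.

def compact(history, budget, keep_recent=2):
    def tokens(msgs):
        return sum(len(m.split()) for m in msgs)
    if tokens(history) <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # Stand-in summarizer: a real harness would ask the model to summarize.
    summary = "summary: " + "; ".join(m.split()[0] for m in old)
    return [summary] + recent
```

In a long-running agent, a step like this runs between turns so the model never sees a transcript larger than its window.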
GPT-5 launched in August 2025 as a unified intelligent system. GPT-5 unified OpenAI's previous reasoning and general-purpose models into a single system with an internal router that decides when to answer quickly and when to think longer. It set new benchmarks in coding (74.9% on SWE-bench Verified) and math (94.6% on AIME 2025), and is available to all users, including the free tier.
Source: openai.com
Agents SDK for TypeScript released (2025). OpenAI launched a TypeScript version of its Agents SDK, alongside support for remote MCP servers and code interpreter as built-in tools in the Responses API, making it significantly easier for developers to build multi-step agentic workflows integrating external services.
Source: developers.openai.com
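SDKs like this automate the core agent loop: the model either requests a tool call or emits a final answer, and the harness executes tools and feeds results back until the task is done. Below is a minimal sketch of that loop with a scripted stand-in for the model; all names (`run_agent`, `TOOLS`, `fake_model`) are illustrative, not the real Agents SDK API.

```python
# Minimal sketch of the tool-calling loop that agent SDKs automate.
# fake_model scripts what an LLM would decide at each step.

TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def fake_model(history):
    """Stand-in for the model: request tool calls, then a final answer."""
    results = [r for kind, r in history if kind == "tool_result"]
    if not results:
        return ("tool_call", "add", (2, 3))
    if len(results) == 1:
        return ("tool_call", "upper", (f"sum is {results[0]}",))
    return ("final", results[-1])

def run_agent(max_steps=10):
    history = []
    for _ in range(max_steps):
        action = fake_model(history)
        if action[0] == "final":
            return action[1]
        _, name, args = action
        # Execute the requested tool and feed the result back to the model.
        history.append(("tool_result", TOOLS[name](*args)))
    raise RuntimeError("step limit reached")

print(run_agent())  # → SUM IS 5
```

The value of an SDK is everything around this loop: streaming, retries, tracing, and wiring in external tools such as remote MCP servers.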
2. Claude (Anthropic)
Parent company: Anthropic
Versions so far: Claude 1, Claude 2, Claude 3 (Haiku, Sonnet, Opus), Claude 3.5, Claude 3.7, Claude 4, Claude 4.1, Claude 4.5, Claude 4.6, Claude 4.7, Claude Mythos Preview
Latest development news:
Claude Opus 4.7 released as Anthropic's most capable generally available model (April 2026). Opus 4.7 brings stronger performance across software engineering, complex long-running coding tasks, and higher-resolution vision. Anthropic also launched Claude Design alongside it, a new product that lets users collaborate with Claude to create visual outputs like designs, prototypes, and slides.
Source: anthropic.com
Claude Managed Agents launched in public beta (April 2026). Anthropic launched a fully managed agent harness for running Claude as an autonomous agent, with secure sandboxing, built-in tools, and server-sent event streaming. The platform handles the infrastructure complexity of running persistent agents, letting developers focus on task logic rather than orchestration.
Source: platform.claude.com
Claude Mythos Preview launched as an invitation-only cybersecurity model (April 2026). Described as Anthropic's most powerful model to date, Mythos Preview is available only to vetted organizations through Project Glasswing for defensive cybersecurity work. Anthropic has stated it does not plan to make Mythos generally available but is using the restricted deployment to learn how to safely scale models of this capability class.
Source: CNBC
Claude Cowork reached general availability (April 2026). Cowork, Anthropic's desktop automation tool for non-technical users, launched on macOS and Windows with expanded analytics, role-based access controls for enterprise plans, and computer use capabilities that let Claude take actions directly on a user's desktop.
Source: support.claude.com
3. Gemini (Google DeepMind)
Parent company: Google DeepMind
Versions so far: Gemini 1.0, Gemini 1.5, Gemini 2.0, Gemini 2.5 (Pro, Flash, Flash-Lite), Gemini 3.0, Gemini 3.1 (Pro Preview, Flash, Flash-Lite Preview)
Latest development news:
Gemini 3.1 Pro Preview launched as Google's latest frontier model (February 2026). Released on February 19, 2026, Gemini 3.1 Pro Preview succeeded Gemini 3 Pro with improvements in reasoning, multimodal understanding, and agentic capabilities. It also introduced a dedicated endpoint optimized for custom tools, relevant for developers building agents that combine Gemini with proprietary APIs and data sources.
Source: ai.google.dev
Gemini 2.5 Flash Native Audio reached general availability on Vertex AI (December 2025). The updated model brought sharper function calling, stronger instruction following, and smoother multi-turn conversations to live voice agent applications. United Wholesale Mortgage reported that integrating Gemini 2.5 Flash Native Audio helped generate over 14,000 loans for broker partners since May 2025.
Source: blog.google
Gemini 3 series launched with agentic and coding capabilities (December 2025). The first Gemini 3 Pro Preview launched with state-of-the-art reasoning and multimodal understanding, alongside support for the Computer Use tool. Gemini Code Assist also transitioned from static tools to a fully agentic mode powered by Model Context Protocol (MCP) servers, replacing the previous tool-based integration approach.
Source: ai.google.dev
Apple announced plans to integrate Gemini into Siri (January 2026). Apple confirmed it would use the Gemini AI model in the upcoming version of Siri, marking one of the most significant third-party integrations for the Gemini family and significantly expanding its reach beyond Google's own products.
Source: Wikipedia: Gemini (language model)
4. DeepSeek (DeepSeek AI)
Parent company: DeepSeek AI (owned by High-Flyer hedge fund, Hangzhou, China)
Versions so far: DeepSeek-V2, DeepSeek-V2.5, DeepSeek-V3, DeepSeek-R1, DeepSeek-V3.1, DeepSeek-V3.2, DeepSeek-V3.2-Speciale, DeepSeek-R1-0528
Latest development news:
DeepSeek V4 expected to launch imminently on Huawei chips (April 2026). Reuters, citing The Information, reported on April 3, 2026 that DeepSeek V4 is likely to launch within weeks and will run on Huawei's latest domestic chips rather than Nvidia hardware. DeepSeek reportedly spent months working with Huawei and Cambricon to rewrite parts of its model stack, and is developing two additional V4 variants optimized for different capabilities.
Source: Reuters
DeepSeek-V3.2 released with agent training across 1,800+ environments (December 2025). V3.2 introduced a new massive agent training data synthesis method covering over 1,800 environments and 85,000+ complex instructions. It was also DeepSeek's first model to integrate thinking directly into tool use, supporting agentic tasks in both thinking and non-thinking modes. A special research variant, V3.2-Speciale, achieved gold-medal-level results in IMO, CMO, ICPC World Finals, and IOI 2025.
Source: api-docs.deepseek.com
DeepSeek-R1-0528 updated with 685 billion parameters (May 2025). The updated R1 model brought performance on par with OpenAI's o3 and Google's Gemini 2.5 Pro on reasoning benchmarks, continuing DeepSeek's pattern of achieving frontier-level reasoning at a fraction of US labs' training costs; V3 was reportedly trained for approximately $6 million.
Source: Computerworld
DeepSeek-R1 triggered a global industry reckoning upon launch (January 2025). When DeepSeek-R1 launched, it surpassed ChatGPT as the most-downloaded app on the US iOS App Store within days and caused an 18% drop in Nvidia's share price. The model's delivery of GPT-4-level reasoning at dramatically lower training cost prompted industry-wide discussion about AI efficiency and challenged assumptions about compute requirements for frontier AI.
Source: Wikipedia: DeepSeek
5. Llama (Meta)
Parent company: Meta
Versions so far: Llama 1, Llama 2, Llama 3 (3.1, 3.2, 3.3), Llama 4 (Scout, Maverick), Llama 4 Behemoth (in training), Muse Spark
Latest development news:
Llama 4 Scout and Maverick launched as Meta's first natively multimodal open-weight models (April 2025). Released on April 5, 2025, Scout is a model with 17 billion active parameters and a 10 million-token context window that runs on a single NVIDIA H100 GPU. Maverick uses 128 experts with 400 billion total parameters and a 1 million-token context window. Both are multimodal and multilingual across 12 languages, using a Mixture-of-Experts architecture.
Source: ai.meta.com
Llama API launched at LlamaCon with fine-tuning and security tools (April 2025). At Meta's first-ever LlamaCon developer conference, the company launched a Llama API in limited preview with fine-tuning capabilities, evaluation suites, and early inference access powered by Cerebras and Groq. Meta also released new security tools including Llama Guard 4, LlamaFirewall, and Llama Prompt Guard 2.
Source: ai.meta.com
Llama 4 Behemoth announced as a 2-trillion-parameter teacher model (April 2025). Behemoth, still in training at launch, is described as having 288 billion active parameters and approximately 2 trillion total parameters, and is designed as a teacher model to distill capabilities into smaller variants. Meta claimed it rivals top closed-source models in reasoning and math.
Source: TechCrunch
Meta Superintelligence Labs launched Muse Spark as a successor to Llama (April 2026). In April 2026, Meta's newly formed Superintelligence Labs released Muse Spark, described as a multimodal reasoning model built for speed, efficiency, and real-world AI applications, signaling a new chapter beyond the Llama brand.
Source: Wikipedia: Llama (language model)
6. Grok (xAI)
Parent company: xAI (founded by Elon Musk)
Versions so far: Grok 1, Grok 1.5, Grok 2, Grok 3, Grok 4, Grok 4.1, Grok 4.1 Fast, Grok 4.20, Grok 4.3 Beta, grok-code-fast-1
Latest development news:
Grok 4.3 Beta released with native video input and document generation (April 2026). Released April 17, 2026 for SuperGrok Heavy subscribers, Grok 4.3 introduced native video understanding; generation of downloadable PDFs, spreadsheets, and PowerPoint files directly from conversation; and tighter integration with Grok Computer, xAI's autonomous desktop agent. It retains the 2 million-token context window and 16-agent collaboration system from Grok 4.20.
Source: DEV Community
Grok 4.20 launched as a multi-agent collaborative system (February 2026). Rather than a single model, Grok 4.20 is a council of four specialized AI agents that deliberate in parallel before responding. xAI reported it topped Alpha Arena Season 1.5, a live stock-trading competition, achieving approximately 12% returns while all competing models posted losses, and placed second on ForecastBench, a global AI forecasting leaderboard.
Source: NextBigFuture
xAI selected by US Department of War for Frontier AI deployment (December 2025). xAI was chosen to deliver Frontier AI systems to the US Department of War's Chief Digital and Artificial Intelligence Office (CDAO), with plans to deploy Grok across all 3 million military and civilian employees at IL5 security clearance. xAI also announced partnerships with Saudi Arabia, HUMAIN, and the Government of El Salvador.
Source: x.ai
grok-code-fast-1 launched as a dedicated agentic coding model (2026). xAI released grok-code-fast-1, described as a fast and economical reasoning model specifically designed for agentic coding tasks. It was made available through the xAI API and integrated into Microsoft Copilot Studio for US-based enterprise users building coding-intensive agents.
Source: docs.x.ai
7. Mistral (Mistral AI)
Parent company: Mistral AI (Paris, France)
Versions so far: Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Mistral Large, Codestral, Pixtral, Magistral (Small, Medium), Devstral 2, Mistral Large 3, Ministral 3 family, Mistral Small 4, Voxtral TTS
Latest development news:
Mistral Small 4 released as a unified reasoning, coding, and multimodal model (March 2026). Small 4 is the first Mistral model to merge the capabilities of Magistral (reasoning), Devstral (agentic coding), and Pixtral (vision) into a single model with a configurable reasoning effort parameter. Released under Apache 2.0, it is priced at $0.15 per million input tokens, making it one of the most cost-efficient reasoning models on the market.
Source: mistral.ai
Voxtral TTS launched as Mistral's first open-weight speech model (March 2026). Voxtral supports nine languages including English, French, German, Spanish, Hindi, and Arabic, and can clone a custom voice from a sample as short as three seconds. It is built for enterprise voice agent use cases with a time-to-first-audio of 90ms, and is based on Ministral 3B, making it runnable on edge devices.
Source: TechCrunch
Mistral raised $830 million to build European GPU infrastructure (March 2026). Mistral secured debt financing to purchase 13,800 NVIDIA GB300 GPUs for a data center near Paris, with operations planned by mid-2026. A separate €1.2 billion deal with EcoDataCenter will bring AI compute capacity to Sweden by 2027. The company aims to secure 200MW of capacity across Europe by end of 2027.
Source: IntuitionLabs
Mistral Large 3 launched as a 675-billion-parameter open-weight frontier model (December 2025). With 41 billion active parameters across a Mixture-of-Experts architecture and a 256K context window, Large 3 is one of the largest open-weight multimodal frontier models from a major Western lab. It was co-optimized with NVIDIA GB200 hardware, achieving a 10x performance gain over the prior H200 generation.
Source: VentureBeat
8. Qwen (Alibaba Cloud)
Parent company: Alibaba Cloud (Alibaba Group)
Versions so far: Qwen 1, Qwen 1.5, Qwen 2, Qwen 2.5, Qwen 3 (0.6B to 235B), Qwen 3.5, Qwen3-Max-Thinking, Qwen3.6-Plus
Latest development news:
Qwen3.6-Plus announced as an efficiency breakthrough model (April 2026). Alibaba's Tongyi Lab announced Qwen3.6-Plus claiming it achieves performance close to Claude Opus 4.5 while being under half the size of Moonshot's Kimi K2.5. The model prioritizes efficiency over scale, continuing Alibaba's shift from parameter maximalism toward practical deployability.
Source: Gentic News
Qwen3-Max-Thinking launched as Alibaba's largest model with agentic tool use (January 2026). Qwen3-Max-Thinking, Alibaba's biggest model to date with over 1 trillion parameters, was released with enhanced agentic and tool-selection capabilities. Alibaba claimed it outperformed major US rivals on the "Humanity's Last Exam" benchmark and can automatically select the best AI tool for a given task while drawing on past conversations as context.
Source: South China Morning Post
Qwen 3 series launched with thinking/non-thinking mode switching (April 2025). The Qwen 3 family ranged from 0.6 billion to 235 billion parameters at launch, with all models supporting seamless switching between reasoning-intensive "thinking" mode and fast "non-thinking" mode. The series features a 256K token context window expandable to 1 million tokens, and is available under Apache 2.0 licensing for commercial use.
Source: AI Hub
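The thinking/non-thinking switch can be pictured as a single request parameter that routes between a deliberate path, which spends a reasoning budget before answering, and a fast path that skips it. The toy sketch below illustrates the idea only; the parameter names are invented, and Qwen's actual API may differ.

```python
# Toy sketch of a thinking/non-thinking mode switch. Parameter names are
# invented for illustration; the real API may differ.

def solve(prompt, thinking=True, max_thinking_tokens=1024):
    if thinking:
        # Deliberate path: build an internal reasoning trace (bounded by
        # the thinking budget) before producing the final answer.
        trace = f"considering: {prompt}"[:max_thinking_tokens]
        return {"mode": "thinking", "trace": trace, "answer": f"answer({prompt})"}
    # Fast path: skip reasoning entirely for latency-sensitive calls.
    return {"mode": "non-thinking", "answer": f"answer({prompt})"}
```

The appeal of a single model with this switch is operational: one deployment serves both cheap high-throughput calls and harder reasoning tasks.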
Qwen app integrated with Alibaba's e-commerce ecosystem (January 2026). Alibaba updated the Qwen app to allow users to shop, order food, and make payments without leaving the app, leveraging integration with Taobao and other Alibaba platforms. With over 100 million monthly active users, the integration gives Qwen a distinct monetization path compared to standalone AI assistants.
Source: CNBC
9. Kimi (Moonshot AI)
Parent company: Moonshot AI (Beijing, China; backed by Alibaba)
Versions so far: Kimi K1.5, Kimi K2, Kimi K2-Instruct-0905, Kimi K2 Thinking, Kimi K2.5
Latest development news:
Kimi K2.5 launched as a multimodal model with Agent Swarm capability (January 2026). K2.5 introduced the ability to handle text, images, and video from a single prompt, along with a feature called Agent Swarm that breaks down complex tasks and delegates them to up to 100 parallel AI sub-agents operating simultaneously with tool access. Moonshot claimed K2.5's agentic capabilities outperformed the top three US AI models and topped SWE-bench Verified at 71.3%.
Source: CNBC
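The fan-out pattern behind a feature like Agent Swarm can be sketched with a thread pool: split the task, hand each subtask to a parallel sub-agent, and merge the results in order. Here `sub_agent` is a stand-in function; a real system would call a model with tool access at that point.

```python
# Sketch of the fan-out/fan-in pattern behind multi-agent "swarm" features.
from concurrent.futures import ThreadPoolExecutor

def sub_agent(subtask):
    """Stand-in sub-agent: handles one delegated subtask."""
    return f"done: {subtask}"

def swarm(task, subtasks, max_agents=100):
    # Run up to max_agents sub-agents in parallel; map preserves input order.
    with ThreadPoolExecutor(max_workers=min(len(subtasks), max_agents)) as pool:
        results = list(pool.map(sub_agent, subtasks))
    return {"task": task, "results": results}
```

The hard parts a product has to solve beyond this sketch are task decomposition, conflict resolution between sub-agents, and merging partial results coherently.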
Moonshot launched an automated coding tool to compete with Claude Code (January 2026). Alongside K2.5, Moonshot introduced a dedicated agentic coding tool designed to compete directly with Anthropic's Claude Code. The tool is compatible with OpenAI's API protocols, reducing the barrier for developers already building on OpenAI's ecosystem to switch or test Moonshot's offering.
Source: Kaohoon International
Kimi K2 Thinking released with 200-300 sequential tool calls (November 2025). K2 Thinking was trained for approximately $4.6 million and supports executing 200 to 300 sequential tool calls autonomously, with a 1-trillion-parameter MoE architecture and 256K-token context. Benchmarks showed it outperforming GPT-5 and Claude Sonnet 4.5 on Humanity's Last Exam (44.9%), BrowseComp (60.2%), and SWE-bench Verified (71.3%).
Source: CNBC
Moonshot secured $500 million in funding at a $4.3 billion valuation (January 2026). Investors including Alibaba and IDG Capital backed the round as Chinese AI competition intensified. The funding announcement coincided with the K2.5 launch, reinforcing Moonshot's position as one of China's leading frontier AI labs alongside DeepSeek and Alibaba's Qwen team.
Source: Kaohoon International
10. Microsoft Copilot / MAI (Microsoft)
Parent company: Microsoft
Versions so far: Copilot (powered by GPT-4 Turbo, GPT-4o, o-series models); Phi-3, Phi-4 (on-device); MAI-1-preview (Microsoft's own foundation model)
Latest development news:
Microsoft unveiled agentic AI across Dynamics 365, Power Platform, and Microsoft 365 Copilot in 2026 Release Wave 1. The wave introduced agents across sales, service, finance, supply chain, HR, and ERP workflows in Dynamics 365, alongside agent authoring, optimization, and self-healing capabilities in Power Automate. Copilot Studio added advanced governance, multi-agent orchestration, and evaluation features with admin controls for configuring team-level access.
Source: Cloud Wars
Microsoft began building persistent AI agents for Copilot inspired by OpenClaw (March 2026). Microsoft created a team under VP Omar Shahine to develop persistent agents for Microsoft 365 that monitor inboxes, surface documents, and flag time-sensitive items without waiting for a user prompt. The move was partly a competitive response to Claude's direct integration into Word, Excel, and PowerPoint.
Source: Winbuzzer
MAI-1-preview, Microsoft's first homegrown foundation model, entered public testing (August 2025). After years of relying on OpenAI's models for Bing and Windows, Microsoft began testing its own in-house foundation model trained on approximately 15,000 Nvidia H100 GPUs. MAI-1-preview was rolled into Copilot text use cases with early developer access, representing Microsoft's first step toward reducing dependency on any single external AI provider.
Source: Radical Data Science
Grok 4.1 Fast from xAI added to Copilot Studio's multi-model lineup (March 2026). Microsoft added xAI's Grok 4.1 Fast to Copilot Studio, expanding the platform's model options for US-based enterprise users building agents requiring fast reasoning and large context handling. The addition reflects Microsoft's multi-model strategy, allowing businesses to route tasks to the most appropriate model.
Source: microsoft.com
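Multi-model routing of this kind boils down to picking a model whose context window fits the task and whose strengths match it. The sketch below is hypothetical: the model names echo products mentioned on this page, but the capability table and selection logic are invented for illustration.

```python
# Hypothetical model router in the spirit of a multi-model lineup.
# The capability table and scoring are invented for illustration.

MODELS = {
    "gpt-5.4":         {"context": 1_000_000, "strength": "coding"},
    "grok-4.1-fast":   {"context": 2_000_000, "strength": "fast-reasoning"},
    "claude-opus-4.7": {"context": 200_000,   "strength": "long-form"},
}

def route(task_tokens, needed_strength):
    """Pick a model whose context fits the task, preferring a strength match."""
    fits = {name: m for name, m in MODELS.items() if m["context"] >= task_tokens}
    if not fits:
        raise ValueError("no model can hold this task")
    for name, meta in fits.items():
        if meta["strength"] == needed_strength:
            return name
    # No strength match: fall back to the largest context window.
    return max(fits, key=lambda name: fits[name]["context"])
```

Real platforms layer cost, latency, and compliance constraints on top of a fit check like this, but the shape of the decision is the same.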
11. Amazon Nova / Amazon Q (AWS)
Parent company: Amazon Web Services (AWS)
Versions so far: Amazon Q Developer, Amazon Nova (Act, Lite, Micro, Pro), Amazon Bedrock agent infrastructure
Latest development news:
Amazon Nova Act launched as a production UI automation agent (2025-2026). Nova Act is designed to build and manage fleets of agents for automating production UI workflows with high reliability. Powered by a custom Nova computer use model, it is positioned for enterprises that need agents to navigate software interfaces, fill forms, and execute multi-step UI tasks at scale rather than just in testing environments.
Source: aws.amazon.com
Amazon Q Developer's coding agents saved 4,500 developer-years of work at Amazon (2024). Amazon CEO Andy Jassy disclosed in the Q2 2024 earnings call that using Q Developer's code transformation agents, Amazon migrated 30,000 internal applications from Java 8/11 to Java 17, saving over 4,500 developer-years of work and $260 million annually. Developers accepted 79% of auto-generated code without any additional changes.
Source: Digiday
Amazon Bedrock expanded to support Claude Opus 4.7 and multi-model agentic workflows (April 2026). AWS made Claude Opus 4.7 available through the Messages API on Amazon Bedrock across 27 AWS regions, allowing enterprise customers to run Anthropic's latest frontier model within AWS infrastructure with regional data routing. Bedrock now supports a wide portfolio of frontier models for agentic workflow orchestration.
Source: platform.claude.com
AWS launched the Strands Agents open-source SDK for simplified agent development (2026). Strands Agents is an open-source SDK that allows developers to build AI agents with just a few lines of code, without requiring custom orchestration logic. It is part of AWS's broader push to make agentic AI more accessible for production deployments without building agent infrastructure from scratch.
Source: aws.amazon.com
12. IBM Granite (IBM)
Parent company: IBM
Versions so far: Granite 3.0, Granite 3.1, Granite 3.2, Granite 3.3, Granite 4.0 (Nano, Micro, Tiny, Small), Granite 4.0 Vision, Granite Code, Granite Embedding, Granite Guardian, Granite Time Series, Granite Speech
Latest development news:
Granite 4.0 3B Vision released as a modular enterprise document extraction model (April 2026). IBM released Granite 4.0 3B Vision as a lightweight LoRA adapter of approximately 0.5 billion parameters built on the Granite 4.0 Micro backbone. It is designed specifically for enterprise document understanding tasks, including converting complex charts to code, converting tables to HTML, and extracting structured data from PDFs. The model ranked third in the 2-4B parameter class on the VAREX leaderboard as of March 2026.
Source: MarkTechPost
Granite 4.0 launched as IBM's new generation of hybrid enterprise models optimized for agentic workflows (late 2025). The Granite 4.0 collection uses a hybrid architecture combining Mamba-2 layers with standard transformer attention, requiring over 70% less memory than similarly sized conventional models. The flagship Granite-4.0-H-Small has 32 billion total parameters (9 billion active), while the Tiny and Micro variants are designed for edge and local deployments and fast function calling within larger agentic pipelines. All models are released under the Apache 2.0 license.
Source: ibm.com
Granite 4.0 Nano series released with top-tier agentic benchmarks for sub-2B models (October 2025). IBM released eight Granite 4.0 Nano models ranging from 350M to approximately 1.5B parameters, in both standard transformer and hybrid Mamba-2 variants. On IFEval (instruction following) the Granite 4.0 H 1B scored 78.5, outperforming Qwen3 1.7B at 73.1 and Gemma 3 1B at 59.3. On the Berkeley Function Calling Leaderboard v3, it scored 54.8 against Qwen3 at 52.2, demonstrating strong tool-use capability for an ultra-compact model.
Source: SiliconANGLE
Granite 3.2 launched with programmable reasoning and multimodal document understanding (February 2025). IBM debuted Granite 3.2 with the ability to toggle chain-of-thought reasoning on or off programmatically, reducing unnecessary compute overhead for simpler tasks. A new vision language model matched or exceeded Llama 3.2 11B and Pixtral 12B on enterprise benchmarks including DocVQA, ChartQA, and OCRBench. All models are available on Hugging Face, IBM watsonx.ai, Ollama, and Replicate under Apache 2.0.
Source: PR Newswire
