News · June 29, 2026 · 2 min read

xAI launches Grok 4.5 with 1.5 trillion parameter V9 model

xAI released Grok 4.5, powered by a 1.5 trillion parameter V9 model. The upgrade marks the company's largest frontier model to date. Here's what changed.

Our Take

xAI announced a model upgrade with no independent benchmarks, published performance metrics, or head-to-head comparisons against Claude, GPT-4, or Gemini.

Why it matters

xAI competes directly with OpenAI, Anthropic, and Google in frontier LLMs. A scale announcement without published eval data or user-facing capability claims leaves practitioners unable to judge whether Grok 4.5 should change their deployment decisions.

Do this week

LLM buyers: request xAI's benchmark data (MMLU, MATH, coding tasks, latency) before committing to evaluation pilots; compare directly against your current baseline before contract renewal.

xAI announces Grok 4.5 powered by larger model

xAI unveiled Grok 4.5, described as a "significant upgrade" built on a 1.5 trillion parameter V9 model (company-reported). The release represents xAI's largest frontier model to date. No public launch date, availability details, API pricing, or rollout timeline were disclosed in the announcement.

Earlier Grok versions ran on smaller models. The jump to 1.5 trillion parameters is framed as bringing Grok closer in scale to competitors such as Claude 3.5 Sonnet, GPT-4, and Gemini 2.0, but Anthropic, OpenAI, and Google have not published parameter counts for those models, so any scale comparison rests on outside estimates. xAI did not publish performance benchmarks on standard evaluations like MMLU, MATH, or coding tasks, nor did it share inference latency, throughput, or cost-per-token metrics.

Scale without proof

Parameter count is infrastructure, not capability. A larger model proves investment and compute but does not prove performance gains. Practitioners evaluating frontier LLMs rely on published benchmarks (independent or vendor-verified) to rank alternatives. xAI's lack of released evals makes comparison impossible. Teams considering Grok 4.5 for production inference cannot answer basic questions: Does it outperform the model it replaces? By how much? On which tasks? At what cost?
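
To make the "infrastructure" point concrete, here is a back-of-envelope sketch of the memory needed just to store 1.5 trillion weights. It assumes every parameter is held at a single common precision. xAI has not disclosed the model's architecture or serving precision, so the figures are illustrative only, and the helper name `weight_memory_tb` is hypothetical.

```python
# Back-of-envelope: memory to store model weights only
# (no KV cache, no activations, no optimizer state).
# Assumption: every parameter is stored at the same precision.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_tb(num_params: float, precision: str) -> float:
    """Return weight storage in terabytes (1 TB = 1e12 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e12


if __name__ == "__main__":
    # 1.5e12 is the company-reported parameter count from the announcement.
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_tb(1.5e12, precision):.1f} TB")
```

Under these assumptions the weights alone come to 6.0 TB at fp32, 3.0 TB at bf16, and 1.5 TB at int8. None of those numbers says anything about accuracy. They size the hardware bill, not the result.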

Withholding evals breaks from the pattern at other frontier model launches: Anthropic, OpenAI, and Google publish benchmarks at release. xAI's silence on metrics is unusual and may suggest either that the model is still in closed testing or that the performance gains are modest enough to warrant cautious disclosure.

What to do before evaluating

If Grok is on your LLM shortlist, request benchmark data from xAI before spending engineering time. Ask for: MMLU, MATH, code generation (HumanEval), reasoning (ARC), and latency and cost at your expected token volumes. Compare raw numbers against your current production model. If xAI cannot provide independent or internally validated evals within one week, deprioritize evaluation and wait for public results. Do not assume parameter count translates into performance gains for your use case.
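
The comparison step above can be sketched as a small script. This is a minimal illustration, not a vetted evaluation tool: the metric names, the `compare` and `missing_metrics` helpers, and every number in the demo are made-up placeholders, not published results for any model.

```python
from dataclasses import dataclass

# Metrics where a lower value is better (assumed names; adapt to your own report).
LOWER_IS_BETTER = {"latency_ms_p50", "usd_per_1m_tokens"}


@dataclass
class Delta:
    metric: str
    baseline: float
    candidate: float
    relative_change: float  # positive means the candidate is better


def compare(baseline: dict[str, float], candidate: dict[str, float]) -> list[Delta]:
    """Compare only the metrics reported for both models."""
    deltas = []
    for metric in sorted(baseline.keys() & candidate.keys()):
        b, c = baseline[metric], candidate[metric]
        if b == 0:
            raise ValueError(f"baseline value for {metric!r} is zero")
        change = (c - b) / b
        if metric in LOWER_IS_BETTER:
            change = -change  # a drop in latency or cost is an improvement
        deltas.append(Delta(metric, b, c, change))
    return deltas


def missing_metrics(required: list[str], reported: dict[str, float]) -> list[str]:
    """List the requested metrics the vendor did not supply."""
    return sorted(set(required) - set(reported))


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    baseline = {"mmlu": 0.80, "humaneval": 0.70, "latency_ms_p50": 400.0}
    candidate = {"mmlu": 0.84, "latency_ms_p50": 500.0}
    required = ["mmlu", "math", "humaneval", "arc", "latency_ms_p50"]

    for d in compare(baseline, candidate):
        print(f"{d.metric}: {d.relative_change:+.1%}")
    print("not reported:", missing_metrics(required, candidate))
```

The list of unreported metrics matters as much as the deltas. If it is still long after a week, that is the signal to deprioritize the evaluation.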
