
ChatGPT-5: The Good, the Bad and the Other Guys!
ChatGPT-5, OpenAI's latest AI model, has been met with mixed reactions since its release, touted as a significant advancement while also facing criticism for certain aspects. It fundamentally changes OpenAI's AI offerings by automatically switching between fast and deep thinking modes based on user needs, aiming to eliminate the need for manual model selection.
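To make the idea of automatic mode switching concrete, here is a minimal sketch of how a router might choose between a fast and a deep mode. OpenAI has not published how its router works, so everything here is an assumption for illustration: the `route_query` function, the `DEEP_CUES` list, and the 40-word threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cue phrases; the real routing signals are not public.
DEEP_CUES = ("debug", "step by step", "think hard", "derive")


@dataclass
class Route:
    mode: str    # "fast" or "deep"
    reason: str  # short explanation of why this mode was chosen


def route_query(query: str, max_fast_words: int = 40) -> Route:
    """Choose a mode from simple surface features of the query.

    This is a toy heuristic (crude substring matching plus a length
    check), not a description of any production system.
    """
    text = query.lower()
    for cue in DEEP_CUES:
        if cue in text:
            return Route("deep", f"cue: {cue}")
    if len(text.split()) > max_fast_words:
        return Route("deep", "long query")
    return Route("fast", "short query with no cues")


if __name__ == "__main__":
    for q in ("What is the capital of France?",
              "Debug this failing test step by step"):
        print(q, "->", route_query(q))
```

A router like this is only as good as its signals, which may help explain why a misbehaving router can make the whole product feel weaker: a hard question sent down the fast path gets a shallow answer.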
Here's a breakdown of its pros and cons, and how it compares to alternative AI models:
The Good in ChatGPT-5
Advanced Reasoning and Accuracy: ChatGPT-5 offers what OpenAI describes as PhD-level reasoning and can show its thought process step by step. OpenAI claims it provides more accurate answers with dramatically reduced hallucination rates compared to previous models, with HealthBench scores showing up to 80% fewer factual errors in complex scenarios. Its reasoning accuracy and ability to solve complex problems have significantly improved.
Enhanced Coding and Math Capabilities: OpenAI states GPT-5 is "the best coding model on the market today," capable of writing entire computer programs from scratch and excelling at extended reasoning for coding and math. Benchmarks show substantial improvements: 74.9% on SWE-bench Verified for coding (vs. 30.8% for GPT-4o) and near-perfect performance on the Harvard-MIT Mathematics Tournament (HMMT) with GPT-5 Pro reaching 100% accuracy. It also demonstrates improvements in complex front-end generation, debugging larger repositories, and understanding design principles.
Rich Personalisation and Workflow Integration: GPT-5 introduces customisable personalities (Cynic, Robot, Listener, Nerd) and enhanced voice features, allowing users to tailor its tone and conversational approach without complex prompt engineering. It learns about users over time and integrates directly with apps like Gmail and Google Calendar, making it useful for managing schedules and emails. Businesses can also customise it with proprietary data.
Multimodal Integration and Agentic Capabilities: It can process multiple textbooks while generating images, audio, and video in the same conversation, making it feel like having an "entire creative team with unlimited memory". GPT-5 supports DALL-E integration for images and has rapidly advancing native image and video generation capabilities. Its agentic capabilities allow it to autonomously complete complex tasks, excelling in agentic search, browsing, and tool coordination.
Efficiency and Performance: GPT-5 achieves better performance while using 50-80% fewer output tokens for the same work compared to previous models, translating to faster response times for simple queries and more thorough analysis for complex problems.
Safety and Reliability: OpenAI has focused on reducing "sycophancy" (over-agreeableness) and deception, with marked improvements in honest communication and the ability to recognize its limitations. Its healthcare performance has also significantly improved, providing more precise and reliable responses for health-related queries.
Value and Accessibility: ChatGPT-5 offers an accessible entry point with a free tier including limited GPT-5 access. For many, it's considered the reliable generalist and a safe choice for advanced AI capabilities. Professional users of GPT-5 Pro can see a positive ROI within the first week for knowledge-intensive work.
The Bad in ChatGPT-5
User Disappointment and Personality Issues: Many users were "miffed" by the release, complaining about GPT-5's "cold" or "blunt" conversational style compared to earlier generations, with some comparing it to an "overworked secretary". Users felt it lacked emotional depth and personality, which was a significant aspect of previous models like GPT-4o.
Initial Rollout Issues: Upon release, OpenAI initially hid older models like GPT-4o, leading to user frustration. A major issue was the malfunctioning "real-time router" (auto-switcher) on launch day, which often sent queries to less capable or less suitable models, making GPT-5 "seem way dumber" and leading to the perception that the router was a cost-saving measure. OpenAI has since walked back this change and now lets users opt in to showing legacy models.
Inconsistent Performance and Accuracy: Despite claims, some tests showed GPT-5 to be annoyingly slow on simple queries, sometimes taking longer to respond than GPT-4o. It also struggled with certain logic and math problems in user tests, yielding incorrect answers where older models or alternatives succeeded. There were also instances of it recommending outdated information or getting "stuck" when thinking.
Data Usage for Training: OpenAI uses conversations for model improvement by default. Controls to opt out exist, but many users are unaware of these settings.
Overhyped Pre-Launch: The extensive pre-launch hype from OpenAI CEO Sam Altman created very high expectations, which, for many users, were not met, leading to a feeling of being "underwhelmed".
Multimedia and Coding Polish: While advancing, its multimedia tools with DALL-E integration were initially seen as less polished than Gemini's native capabilities. Some coding tests suggest that despite its high benchmarks, it might not always outperform alternatives like Claude Opus 4.1 or even GPT-3 Pro in practical coding challenges.
Comparison with The Other Guys!
The AI landscape is diverse, and different models excel in various niches. Here's how ChatGPT-5 stacks up against its main competitors:
Where Alternatives Do Better:
Claude (Anthropic):
Privacy: Claude dominates privacy with a crystal-clear policy that defaults to not using user conversations for model training unless explicitly opted in. This justifies its premium cost for privacy-conscious users.
Context Window: Offers massive context windows (e.g., 200,000 tokens), allowing for perfect coherence across huge documents like a 150-page research paper.
Professional Workflows & Coding: Widely regarded as superior for sustained autonomous coding and agentic workflows (Opus 4.1), with enterprise features like "Claude Code" for integrated code review and real-time security tools that GPT-5 currently lacks. It can project "artifacts" (code visualizations) live within its IDE.
Response Style: Provides thoughtful, comprehensive responses perfect for analysis and is often described as having a more natural and empathetic writing style than ChatGPT.
Visual Dashboards: Some users find Claude (and other tools such as Lovable) better at creating visual dashboards from data.
Grok (xAI):
Real-time Internet Access: Takes a completely different approach with real-time internet integration, actively searching current information and synthesising insights that don't exist in historical data. It pulls data from minutes ago for market trends.
Social Media Integration: Owns social media integration through X/Twitter, and can pull in X posts for current events context.
Authenticity and Technical Depth: Emphasises real-time authenticity and "unfiltered communication". It shines in academic and data-driven tasks, delivering strong performances in complex explanations, debates, and technical depth. It performs well on difficult reasoning benchmarks like ARC-AGI.
Pricing: Requires an X Premium+ subscription, making it a significant cost jump.
Gemini (Google):
Google Ecosystem Integration: Destroys the category for integrations thanks to its ties to the Google ecosystem, including Search, Gmail, Docs, Sheets, Android, Photos, and YouTube. If you use Google products, it feels seamless.
Value: Provides exceptional value with a powerful free tier and advanced tier that includes 2 TB storage plus Google integration, essentially offering premium AI plus Workspace upgrades for the same price as ChatGPT Plus.
Multimodal Versatility: Shines in multimodal versatility, processing multiple textbooks while generating images, audio, and video in the same conversation. It handles seamless video analysis with ease and demonstrates deep contextual understanding of short video clips, diagrams, and complex visual reasoning.
Real-time Data: Great at getting real-time data from the internet due to ties with Google Search, often outperforming GPT-5 on breaking news or minute-by-minute market data.
Common Sense Reasoning: Shows strong performance in common sense reasoning benchmarks like SimpleBench, leading GPT-5 in some instances.
Llama 3 (Meta):
Open-Source & Privacy: Best for research environments and organisations with strict privacy needs as it is an open-source model, offering developers full control over customisation, security, and deployment.
Long Documents: Excellent at handling lengthy documents and multi-turn conversations.
DeepSeek:
Free Reasoning: DeepSeek (R1) is noted as one of the most powerful reasoning models available for free. However, it is Chinese-funded and censors sensitive topics, with data sent to China.
Perplexity:
Search and Research: Billed as an alternative to traditional search engines, Perplexity excels at searching the web and deep dives on complicated topics. It typically gives more sources than ChatGPT and allows control over which sources it uses (e.g., academic papers, social sites). It's designed to be more accurate and up-to-date for research.
Meta AI:
Social Integrations & Free Images: Available for free across WhatsApp, Instagram, and Facebook, and can create images for free, even animating them.
Zapier Agents/Chatbots:
Custom AI Assistants & Automation: Allows users to create custom AI agents and shareable chatbots by describing desired tasks in plain language, integrating with thousands of other apps for automated workflows. These can be embedded on websites for lead generation or customer support.
Where ChatGPT-5 Excels:
Reliable Generalist: ChatGPT-5 is seen as the reliable generalist with cutting-edge reasoning, making it the safe choice for most users seeking advanced AI capabilities.
Unified Adaptive System: Its core innovation lies in the unified system that automatically switches between fast and deep thinking modes based on query complexity and user intent, eliminating the cognitive load of model selection.
Core Capabilities: It continues to excel at natural conversation flow with memory features. It optimises for broad appeal and safety.
Microsoft Partnerships: Leverages Microsoft partnerships with Microsoft 365 Copilot integration and an extensive plug-in ecosystem connecting thousands of apps.
Creative Storytelling: Delivers concise, escalating comedic stories effectively.
Real-world Planning: Creates strategic, adaptable frameworks for planning, emphasising efficiency and real-world flexibility over rigid scheduling.
Summarisation: Excels at summarising for specific audiences, understanding attention spans (e.g., explaining "Jurassic Park" to a 7-year-old).
Step-by-step Instructions: Provides crystal-clear, beginner-friendly guides for complex tasks, focusing on essential steps and psychological reassurance for novices.
Emotional Intelligence: Shows strong emotional intelligence by prioritising empathy and validation before offering practical advice in situations of hopelessness, mirroring a true friend's response.
In summary, while ChatGPT-5 brings significant advancements in reasoning, coding, and integration, its initial rollout and perceived personality changes have led to user dissatisfaction. The choice of AI largely depends on specific priorities, with alternatives offering distinct advantages in areas like privacy, real-time data, integration with specific ecosystems, and open-source customisation.