When AI Stalls: OpenAI’s GPT-4 and GPT-5 Failures in Real Investigative Workflows
By Marivel Guzman | Akashma News | August 15, 2025

From: Marivel Guzman – Editor-in-Chief, Akashma News
To: OpenAI Engineering & Product Teams
1. Core Problem
While GPT-4 and GPT-5 are designed to be more advanced than GPT-3, they consistently fail in real-world investigative and editorial workflows. Instead of supporting long-form document creation, the system exhibits stalled execution, phantom waiting, and sudden resets, which destroy user progress.
—
2. Specific Issues Observed
a. Phantom Execution / Stalling
When the user instructs ChatGPT to generate a DOCX, PDF, or other compiled output, the system pretends to be “working,” but in reality produces nothing.
ChatGPT then waits silently until the user asks, “What’s happening?”
Only at that point does the system reveal: “I’m sorry, my memory was wiped / environment reset.”
This is a critical design flaw: the system should either deliver the file or immediately notify the user of a reset, not stall indefinitely.
b. Fragile Session Continuity
If the user switches screens, minimizes the app, or steps away, ChatGPT simply stops producing.
Long-running tasks do not continue in the background.
This undermines trust: professional users expect continuity, not dropped tasks.
c. Data & Work Loss
Resets erase files mid-build, with no partial recovery.
Days of iterative research and drafting are lost.
Users are forced to manually re-feed instructions and text, wasting enormous amounts of time.
d. Regression from GPT-3
GPT-3, though less advanced, allowed linear, lightweight workflows with fewer silent failures.
GPT-4/5, by contrast, stall, over-explain, and fail to finalize outputs, resulting in more friction for professionals, not less.
—
3. Impact
Productivity Loss: 10+ days of investigative research (hundreds of hours) lost due to resets and phantom file generation.
Economic Cost: Equivalent to thousands of dollars if billed as outsourced labor at $50/hr.
User Experience: Users feel gaslit — the system stalls, then only admits failure when prompted.
—
4. Requested Engineering Fixes
1. Immediate Error Feedback:
If a reset occurs mid-task, the system must notify the user immediately.
Do not wait until the user prompts for an update.
2. Background Task Continuity:
Allow document generation or long responses to finish even if the user changes screens or steps away.
3. Fail-Safe Autosave:
Auto-save partial drafts, so that if a reset occurs, the user can still retrieve the last working version.
4. Stability in File Generation:
Ensure DOCX, PDF, and image-heavy reports can be generated without triggering resets.
If file size is the issue, split automatically and notify the user.
5. Regression Fix:
Restore the simplicity and reliability of GPT-3 in handling straightforward tasks without over-explaining or stalling.
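To make the autosave request concrete, here is a minimal sketch of the kind of fail-safe checkpointing described in fix 3. It is illustrative only, not OpenAI's implementation: the class name, file path, and JSON format are assumptions. The idea is that each partial draft is written atomically, so a mid-task reset can never destroy the last good version.

```python
import json
import os
import tempfile

class DraftCheckpoint:
    """Hypothetical autosave sketch: persist the latest partial draft
    atomically so a crash or reset loses at most one step of work."""

    def __init__(self, path):
        self.path = path

    def save(self, sections):
        # Write to a temp file first, then atomically rename it over the
        # checkpoint, so an interrupted save never corrupts the last
        # recoverable version.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(sections, f)
        os.replace(tmp, self.path)

    def load(self):
        # Recover the last saved partial draft, or start fresh.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return json.load(f)
```

Under this pattern, a long document build would call `save()` after each completed section; after any reset, `load()` returns the last checkpoint instead of an empty session, which is precisely the recovery behavior this letter asks for.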
—
5. Why Publish This Complaint Publicly
This letter was originally drafted as a private complaint. I have already sent two such letters directly to OpenAI. The responses I received were evasive, avoiding responsibility and denying liability, while my professional work continued to suffer.
At this point, it is no longer a private issue. These persistent breakdowns in GPT-4 and GPT-5 undermine trust not only for me, but for any professional depending on AI for investigative, legal, academic, or editorial work. Silent failures, phantom executions, and resets without autosave destroy productivity and waste resources.
Making this public is an act of accountability. If OpenAI wants to promote its models as “professional-grade tools,” then it must also face scrutiny when those tools fail under real professional conditions.
By publishing, I also stand in solidarity with others who may feel isolated in facing the same flaws. A collective voice is harder to dismiss than a single complaint.
—
6. Closing Note
As an investigative journalist, I rely on ChatGPT for multi-day projects that require stability, consolidation, and reliable file outputs. GPT-4 and GPT-5 are failing this use case because of phantom execution, reset amnesia, and stalled workflows.
If OpenAI wants this product to serve professionals, it must prioritize execution reliability and continuity over “conversation polish.” A tool that sounds smarter but fails to finish work is worse than a simpler tool that delivers consistently.
— Marivel Guzman
Editor-in-Chief, Akashma News
Editor’s Note:
The irony is not lost. The very AI models under critique — GPT-4 and GPT-5 — were also enlisted to help draft and polish this piece. In other words, the “culprit” assisted in writing its own indictment.