GPT-5.4 just launched. Here's what creative teams need to know.
OpenAI released a new model this week. The coverage has been predictably breathless, with the usual claims about AI reaching human-level performance and changing everything about how we work. So before any of that noise reaches your leadership team, your producer, or your next planning meeting, here's a grounded read on what GPT-5.4 is, what it's genuinely better at, and where the limits still are.
No fandom or fear, just a practical assessment for people who make things for a living.
What OpenAI is actually saying
For once, OpenAI's own positioning is relatively honest. They're not selling this as a general chat upgrade. The official framing is "designed for professional work," with the emphasis on documents, spreadsheets, presentations, research, and multi-step tasks. That's a more modest claim than usual, and it's roughly borne out by the evidence.
The version most people will encounter in ChatGPT is called GPT-5.4 Thinking. There's also a GPT-5.4 Pro version available in both ChatGPT and the API (the programmatic interface that lets developers build tools on top of the model, rather than going through the ChatGPT app). The differences between tiers matter for cost and capability, but for most creative teams, the Thinking version in ChatGPT is the relevant starting point.
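For anyone curious what "building on the API" looks like in practice, here's a minimal sketch using OpenAI's Python SDK. The model identifier is an assumption for illustration (OpenAI's published names may differ), and the prompt is a placeholder:

```python
# A minimal sketch of what "building on the API" means, using the
# OpenAI Python SDK. The model identifier "gpt-5.4-thinking" is an
# assumption for illustration; check OpenAI's docs for the real name.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-5.4-thinking",  # hypothetical identifier
    messages=[
        {"role": "system", "content": "You are a production assistant for a creative team."},
        {"role": "user", "content": "Summarise this brief in five bullet points: ..."},
    ],
)
print(response.choices[0].message.content)
```

Most creative teams won't write this themselves; it's the layer your developers or tool vendors work at.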
What's genuinely different
It can read and work with much more material at once.
The headline feature is a dramatically larger context window. In plain terms: every time you interact with an AI model, it can only "see" a certain amount of text at once. Think of it like a desk. Some models have a small desk; they can only work with a few documents at a time. GPT-5.4 has an enormous desk, technically capable of holding roughly 750,000 words of content simultaneously, which is longer than the full text of War and Peace.
In practice, this means you can feed it a full production brief, a research report, interview transcripts, and a script all at once, and ask it to synthesise across all of them. That's genuinely useful for the kind of multi-source work that eats time across every creative discipline, whether you're in an agency, a production company, a theatre, a publishing house, or an in-house team.
One caveat worth knowing: the standard desk size is actually closer to 200,000 words. The full 750,000-word capacity is opt-in and costs significantly more. More on cost below.
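If you're deciding whether a pile of source material fits the standard window before paying for the larger one, a crude word count answers most of it. A minimal sketch in Python, assuming plain-text files and the roughly 200,000-word standard capacity described above:

```python
# Rough check: will this material fit the standard context window?
# The 200,000-word figure comes from the article above; adjust it to
# whatever limit your account actually has.
from pathlib import Path

STANDARD_CAPACITY_WORDS = 200_000  # approximate, per the figures above

def total_words(folder: str) -> int:
    """Count words across every .txt file in a folder."""
    return sum(
        len(p.read_text(encoding="utf-8").split())
        for p in Path(folder).glob("*.txt")
    )

words = total_words("project_materials")  # hypothetical folder name
print(f"{words:,} words; fits standard window: {words <= STANDARD_CAPACITY_WORDS}")
```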
It's better at producing structured deliverables.
According to OpenAI's own testing, GPT-5.4 is noticeably stronger at generating spreadsheets, presentations, and documents compared to its predecessor. Human reviewers preferred its presentation output 68% of the time. It's also producing fewer factual errors on structured tasks, with OpenAI claiming individual facts are 33% less likely to be wrong.
Worth noting: these figures come from OpenAI's internal benchmarks, and the specific benchmark used for documents and spreadsheets is an investment banking task, not a creative workflow. The direction of improvement is credible and independently corroborated; the exact numbers should be treated as indicative, not gospel.
It can now operate software directly.
This is the technically significant new capability: GPT-5.4 can look at a screenshot of any piece of software and then operate it, clicking buttons, filling in fields, and navigating menus. It can see a screen and use the tools on it.
For most creative teams right now, this is less immediately relevant than it sounds. It's most useful for building controlled internal automations where a model needs to move between tools, complete repetitive steps across systems, or handle structured data entry at scale. That's genuinely powerful in the right context. It's also a capability that deserves careful thought before anyone lets it near live production systems, content management platforms, or anything touching client or audience data. When something with this level of access gets something wrong, the consequences are proportionally larger.
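For the technically curious, "operating software" boils down to a loop: the model sees a screenshot, proposes an action, your code performs it, and the cycle repeats. The sketch below is schematic only; all three helper functions are hypothetical stand-ins for a real automation layer and the provider's actual computer-use API, which will differ in its details:

```python
# Schematic of a screenshot-in, action-out loop. take_screenshot,
# ask_model_for_action, and perform_action are hypothetical stand-ins;
# the real API shape will differ, so treat this as the concept only.

def take_screenshot() -> bytes:
    """Capture the current screen (stand-in for a real capture library)."""
    raise NotImplementedError("wire up to your automation tooling")

def ask_model_for_action(screenshot: bytes, goal: str) -> dict:
    """Send the screenshot and goal to the model; get back an action
    like {'type': 'click', 'x': 120, 'y': 340} or {'type': 'done'}."""
    raise NotImplementedError("wire up to the provider's computer-use API")

def perform_action(action: dict) -> None:
    """Execute the proposed click/type/scroll on the machine."""
    raise NotImplementedError("wire up to your automation tooling")

def run(goal: str, max_steps: int = 20) -> None:
    # Bounded loop: screenshot -> model proposes action -> execute -> repeat.
    for _ in range(max_steps):
        action = ask_model_for_action(take_screenshot(), goal)
        if action.get("type") == "done":
            return
        perform_action(action)
```

Note the step cap: bounding what the loop is allowed to do is exactly the kind of guardrail this capability needs before it goes anywhere near live systems.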
It's better at research.
GPT-5.4 Thinking has improved web search integration and can combine information from many sources more coherently. The model can pull together a briefing across multiple sources in a fraction of the time it would take manually. The practical applications are in the next section.
What it's useful for in a creative context
The honest answer is that GPT-5.4 is most interesting as an operational tool, not a creative one. Its improvements are on the production and synthesis side of the work, not the ideation side.
Turning source material into working documents. You have a folder of research, some development notes, interview transcripts, and a brief. GPT-5.4 can read across all of that and produce a working summary, a decision document, or a stakeholder pack. Whether you're a producer pulling together a treatment, a creative director building a brief, or an ops lead prepping a project plan, this is where the improvement is most tangible.
The operational grind. Schedules. Checklists. Version logs. Naming conventions. Rollout plans. The stuff that's essential and unglamorous, whatever industry you're in. The document and spreadsheet improvements make GPT-5.4 more useful precisely here, where much of the delivery work actually lives.
Pitch and presentation work. Restructuring storylines, drafting slide copy, turning development notes into something pitch-ready. This applies whether you're pitching a campaign, a commission, a production, or a funding application. The presentation improvements are real enough to make this a legitimate use case, with human editorial judgement at the end.
Research and desk work. Audience insight synthesis, competitor scanning, trend summaries, background reading on a new sector or subject before a pitch. The research improvements mean you can ask it to pull together a briefing across multiple sources and get something genuinely usable, rather than a vague overview. Still verify anything consequential before it shapes a decision.
Where the limits are
This section matters as much as the capabilities, so please read it.
Better performance doesn't mean safe to publish. The model performs better on structured tasks. That doesn't mean it understands your audience, your voice, your context, or your particular sensitivities. Whether you're writing for a brand, a publication, a stage, or a screen, anything going out under your name needs human editorial review. That hasn't changed.
The error rate is lower, not zero. OpenAI has improved accuracy on fact-heavy tasks. Hands-on testers have still found consequential errors, including a ten-times dosage calculation error in a pharmaceutical-style task. "Less likely to be wrong" is different from "reliable enough to skip review." Treat it as a capable first drafter, not a source of truth.
Personal ChatGPT accounts and confidential work don't mix. If you or your team are using personal ChatGPT accounts (the free or paid individual plans), OpenAI's default policy allows those conversations to be used to train future models, unless you opt out. For any confidential work, whether that's an unreleased script, a client brief, a commission in development, or commercially sensitive material, that's a real risk. The business, enterprise, and API versions don't have this issue; they don't train on your data by default. The practical rule is simple: use the right account type for sensitive work, and make sure your team knows the difference.
A bigger desk doesn't make you a better thinker. Several developers have already flagged this. When you can throw everything at the model at once, the temptation is to do exactly that, with no structure or curation. That tends to produce expensive, unfocused output. A large context window is a capability, not a strategy. The discipline of thinking carefully about what you're asking and what you're giving it still matters, regardless of how much it can technically hold.
It doesn't replace judgement. No benchmark OpenAI published for GPT-5.4 tests for taste, cultural nuance, editorial instinct, or creative accountability. Those remain human responsibilities. The model is a strong production assistant. It is not a director, an editor, a dramaturg, or a creative lead.
A quick word on cost
AI providers charge based on the volume of text processed: both what you send in and what comes back, measured in tokens (each token is roughly three-quarters of a word). The GPT-5.4 Thinking model costs more per unit of text than its predecessors, reflecting the capability improvements.
For light, bounded tasks, this probably won't be noticeable. For teams running high-volume workflows, or anyone tempted to feed the model enormous amounts of material routinely, it adds up quickly. Before rolling out any workflow at scale, it's worth understanding what it costs at volume.
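To make "it adds up quickly" concrete, the arithmetic looks like this. Providers price per million tokens, and the prices below are placeholders for illustration, not GPT-5.4's actual rates:

```python
# Back-of-envelope cost estimate for a document-heavy workflow.
# The per-million-token prices are PLACEHOLDERS for illustration;
# substitute the real rates from your provider's pricing page.
INPUT_PRICE_PER_M_TOKENS = 10.00   # hypothetical, in dollars
OUTPUT_PRICE_PER_M_TOKENS = 30.00  # hypothetical, in dollars

def estimate_cost(input_words: int, output_words: int) -> float:
    """Words -> tokens (~4/3 tokens per word), then tokens -> dollars."""
    input_tokens = input_words * 4 / 3
    output_tokens = output_words * 4 / 3
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M_TOKENS
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M_TOKENS
    )

# One big synthesis job: 150,000 words in, 3,000 words out...
per_run = estimate_cost(150_000, 3_000)
# ...looks cheap once, but 50 runs a week changes the picture.
print(f"per run: ${per_run:.2f}; weekly at 50 runs: ${per_run * 50:.2f}")
```

A single large job is trivial. The same job, run routinely across a team, is a line item.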
Is this the only option?
No. GPT-5.4 is one strong tool in a competitive market. Claude Sonnet and Opus (from Anthropic) and Google's Gemini are also capable performers on structured document work. Many organisations already hold Microsoft 365 Copilot licences as part of their existing Microsoft subscriptions, and Copilot covers some of the same ground.
Before defaulting to GPT-5.4 for everything, it's worth mapping the task against what's already available and already paid for. The best tool is often the one already in the stack.
The honest summary
GPT-5.4 is a meaningful improvement for creative teams doing document-heavy, research-heavy, or operations-heavy work. The synthesis capabilities, the research improvements, and the stronger structured output are genuinely useful across industries, whether you're in advertising, film, publishing, theatre, music, design, or anywhere else where creative work gets made.
It's not magic, and it still needs clear workflows, appropriate account types, and human review on anything that matters.
The creative teams that get value from it will be the ones who use it deliberately: clear tasks, clear inputs, clear review steps. The ones who don't will be the ones who throw everything at it and wonder why the output feels expensive and slightly off.
Use it to reduce the drag on operational work. Keep the thinking human.
KINTAL helps creative businesses adopt AI without losing what makes them good. If you're working out where tools like this fit in your team, get in touch.
A note on how this was made: the research was validated using Perplexity Deep Research, drafting and editing was done with Claude, and the image was generated with Gemini. The framing, the audience, the editorial judgement, and every significant decision along the way were mine.

