The First AI War: How Algorithms Are Fighting in Iran
The US-Israel strikes on Iran aren't just a geopolitical escalation, they're the live debut of AI-driven warfare at scale. Here's what's actually happening.
Claude 4.x takes you literally. Here's how to use that to your advantage instead of fighting it.
Muunsparks
2026-04-14
If your Claude prompts still start with "Act as a helpful assistant," you're leaving performance on the table. Claude 4.x models follow instructions with uncomfortable precision — which means the vague prompts that worked fine eighteen months ago now produce vague output. The fix isn't more words. It's better structure.
Something fundamental changed between Claude 3.5 and the 4.x family. Earlier versions would infer your intent and fill in gaps generously. Ask for a "dashboard" and you'd get charts, filters, and data tables without asking. Claude 4.x builds you a frame with a title — because that's what you asked for.
This isn't a regression. It's a feature. Literal instruction-following means you get predictable, controllable output. But it requires a different mental model. You're no longer coaxing a creative collaborator. You're writing a contract with a very competent executor.
The shift has a name in the community now: context engineering. The idea, popularized by Andrej Karpathy in mid-2025, reframes the LLM as a CPU and the context window as RAM. Your job isn't to write clever prompts — it's to load the right information, in the right structure, at the right time.
Here's what that looks like in practice.
The single highest-ROI change you can make is separating your prompts into four explicit sections. Instead of writing one paragraph that mixes instructions, context, constraints, and output requirements, break them apart:
## INSTRUCTIONS
Analyze the following API error logs and identify the root cause.
## CONTEXT
These logs are from a production FastAPI service handling ~2,000 req/s.
The errors started appearing after a deployment at 14:32 UTC.
## CONSTRAINTS
- Focus only on 5xx errors
- Ignore rate-limiting (429s) — that's expected behavior
- If the root cause is ambiguous, say so rather than guessing
## OUTPUT FORMAT
- One-paragraph root cause summary
- Bullet list of affected endpoints
- Recommended fix with code snippet
This works because Claude parses structured prompts more reliably than freeform prose. Mixing context and instructions in one block forces the model to disambiguate intent — a task where it sometimes guesses wrong. Separation removes the guesswork.
Anthropic trains Claude on structured inputs internally, and XML tags activate pattern recognition that plain text doesn't. This matters most when your prompt combines multiple types of content — instructions, reference data, examples, and variable inputs.
<instructions>
Compare the two API designs below. Evaluate on: consistency,
discoverability, error handling, and versioning strategy.
</instructions>
<api_design_a>
<!-- paste first API spec -->
</api_design_a>
<api_design_b>
<!-- paste second API spec -->
</api_design_b>
<output_requirements>
Return a markdown table comparing both designs across the four criteria.
Declare a winner for each criterion with a one-sentence justification.
</output_requirements>
You don't need XML for simple questions. The rule of thumb: if your prompt has more than two distinct sections, tags will improve response quality. If you're asking "What's a mutex?", skip them.
Few-shot examples remain one of the most reliable ways to steer Claude's output. One well-chosen example consistently outperforms three paragraphs describing what you want.
The key insight: examples don't just demonstrate format — they calibrate tone, depth, and judgment. If your example response is two crisp sentences, Claude will be concise. If it's a detailed analysis, Claude will go deep. The example is the specification.
<examples>
<example>
<input>Review this PR: adds retry logic to the payment service</input>
<output>
**Approve with comments.**
The retry logic is sound, but the backoff strategy is linear (100ms, 200ms, 300ms).
For a payment service hitting external APIs, exponential backoff with jitter
would be more resilient under load. Also: the retry count is hardcoded to 3.
Consider extracting it to config.
</output>
</example>
</examples>
Anthropic's own documentation recommends three to five diverse examples for complex tasks. Start with one. Add more only if the output drifts.
Claude has a tendency toward epistemic honesty — it will hedge when unsure. For research and analysis, this is valuable. For tasks where you need a decision, it's frustrating.
You can control this behavior explicitly in either direction:
When you want honesty:
Analyze this financial data and identify trends.
If the data is insufficient to draw conclusions, say so
rather than speculating.
When you want commitment:
Give me your best assessment even if you're not fully certain.
Flag low-confidence claims with [LOW CONFIDENCE] but don't
omit them.
This works because Claude responds well to explicit permission structures. Telling it when to hedge and when to commit produces better calibration than either extreme.
Here's something non-obvious: explaining why a rule exists makes Claude follow it more reliably than just stating the rule. The model generalizes better from motivated instructions.
Weak:
NEVER use ellipses in your response.
Strong:
Your response will be read aloud by a text-to-speech engine,
so avoid ellipses since the engine won't know how to handle them.
The same principle applies to output format, length, and tone. "Keep it under 200 words" is fine. "Keep it under 200 words because this goes into a Slack notification that truncates at 300 characters" is better — Claude will also optimize for information density, even though you didn't explicitly ask.
A few techniques that dominated prompt engineering discourse in 2023-2024 have quietly become unnecessary or counterproductive with current models.
Aggressive emphasis is counterproductive. Phrases like "CRITICAL!", "YOU MUST", and "NEVER EVER" used to be standard practice. With Claude 4.x, they overtrigger and produce worse results than calm, direct instructions. The model is already literal — yelling at it doesn't make it more literal, it makes it anxious.
Role prompting is weaker than you think. "Act as a senior software engineer" is fine as a framing device, but it's not doing heavy lifting. What actually matters is the context you load: the specific codebase, the architectural constraints, the team conventions. Role prompting without context is like hiring a consultant and not giving them access to your systems.
Chain-of-thought prompting is now built in. You no longer need to manually instruct Claude to "think step by step" for most tasks. The model does this internally, especially with extended thinking enabled. Adding explicit chain-of-thought instructions to a model that already reasons this way can sometimes degrade performance by constraining its natural reasoning path.
The biggest gap between casual users and power users isn't prompt syntax. It's context preparation.
The developers getting the best results from Claude in 2026 are doing something unsexy: they're building persistent context systems. An identity file describing the project and its conventions. A style guide capturing voice and tone. A constraints file listing anti-patterns to avoid. These get loaded automatically into every conversation through system prompts or project instructions.
This is context engineering in its purest form. The model hasn't changed between your mediocre prompt and your excellent one — the information it's working with has.
One practical workflow worth stealing: instead of writing an exhaustive prompt upfront, start with a minimal description of your task and ask Claude to interview you. Let it ask about implementation details, edge cases, and tradeoffs you haven't considered. This consistently produces better final outputs than the "perfect prompt on the first try" approach, because it surfaces assumptions you didn't know you were making.
Tags: prompt-engineering, claude, llm, developer-tools
// RELATED ARTICLES
The US-Israel strikes on Iran aren't just a geopolitical escalation, they're the live debut of AI-driven warfare at scale. Here's what's actually happening.
Most people use Claude like a search engine. These 10 prompts show what it actually looks like to use AI as a daily thinking partner.
Everyone knows attention scales quadratically. Almost nobody talks about why that's a memory problem, not a math problem — and why it matters.