AI employees are real and fragile. Here's a practical engineering guide to building agentic workflows that survive production in 2026.
Muunsparks
2026-03-12
Everyone wants an AI employee. What you actually build is a carefully supervised state machine with a language model at the center, one that will confidently call the wrong API, loop forever on an ambiguous task, and occasionally do exactly what you asked in the worst possible way. Building one that works is an engineering problem, not a prompt problem.
The marketing term "AI employee" does a lot of work to obscure what's really happening under the hood. An AI employee, in any practical sense, is an agentic workflow: a system where a language model decides which actions to take, in what order, based on intermediate results — rather than following a fixed execution path you defined in advance.
That distinction matters because it changes the failure mode entirely. A traditional automation pipeline fails predictably: step 3 errors, you fix step 3. An agentic system fails combinatorially. The model makes a reasonable-looking choice at step 2 that creates a subtle problem at step 5, which only surfaces at step 8. The state space explodes. Debugging becomes archaeology.
In 2025, agentic AI was mostly demos and research prototypes. In 2026, the tooling has matured enough that production deployments are genuinely common — not just at frontier labs but at mid-sized engineering teams, growth-stage startups, and enterprise ops teams. The primitives are stable. What's still hard is the engineering discipline to build something that doesn't embarrass you when it runs unsupervised.
This guide is for teams building their first AI employee — or for teams whose first one broke and want to understand why.
A production agentic workflow has four components. Get any one wrong and the whole system becomes a liability.
At the center is a loop. The model receives a task, selects a tool, observes the result, and decides what to do next. This continues until the model either produces a final answer or hits a stopping condition.
Every production agent needs an explicit stopping condition. By default, models will keep going.
import anthropic

client = anthropic.Anthropic()

def run_agent(task: str, tools: list, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        # Model finished — return final answer
        if response.stop_reason == "end_turn":
            return response.content[0].text
        # Model wants to call a tool
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = execute_tool_calls(response.content)
            messages.append({"role": "user", "content": tool_results})
            continue
        # Anything else (e.g. max_tokens) is unexpected. Surface it instead
        # of falling through to a misleading "max iterations" message.
        return f"Stopped unexpectedly (stop_reason={response.stop_reason})."
    return "Max iterations reached — task incomplete. Review agent state."
The max_iterations guard is not optional. Without it, a confused model will loop until you hit a token limit or a rate limit — whichever is more expensive. Set it low during development; you'll be surprised how rarely a well-designed agent needs more than 6-7 steps.
Tools are the hands of your AI employee — the actions it can take in the world. Each tool needs three things: a name, a description precise enough that the model calls it in the right context, and a parameter schema. The description is the interface. Treat it like an API contract.
tools = [
    {
        "name": "web_search",
        "description": (
            "Search the web for current information. Use when you need facts, "
            "prices, recent events, or data that may have changed recently. "
            "Do NOT use for general knowledge you already have — only for "
            "information that requires verification or recency."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query. Be specific — 3 to 7 words."
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "read_file",
        "description": (
            "Read the full contents of a file by path. Use when you need to "
            "analyze existing documents or data. Do NOT use to check if a file "
            "exists — use list_files for that."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Absolute file path."
                }
            },
            "required": ["path"]
        }
    },
    {
        "name": "write_output",
        "description": (
            "Write the final output to a specified location. Call this only "
            "when the task is fully complete — not for intermediate drafts."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "content": {"type": "string"},
                "destination": {"type": "string", "description": "File path or channel ID."}
            },
            "required": ["content", "destination"]
        }
    }
]
The "Do NOT use for..." constraints matter. Without negative guidance, models reach for tools unnecessarily — adding latency, cost, and compounding error risk. Every tool definition should describe when not to use it.
By default, agents are stateless. Their only memory is whatever fits in the context window of a single session. For anything requiring continuity, you need external storage. There are three patterns, each with a different trade-off:
In-context (short-term): Just the conversation history. Zero infrastructure, works for single sessions under ~50k tokens. Falls apart for long-running tasks or cross-session continuity.
Semantic memory: Embeddings in a vector store. The agent retrieves relevant past context before each turn. Good for knowledge retrieval; unreliable for sequential task state — you can't reconstruct "what step am I on" from a vector search.
Episodic/structured memory: A key-value store or database where the agent explicitly writes and reads state. More setup, dramatically more reliable for anything that spans sessions or requires auditable execution history.
import json
from pathlib import Path
from datetime import datetime, timezone

def utc_now() -> str:
    # datetime.utcnow() is deprecated; use an explicit timezone
    return datetime.now(timezone.utc).isoformat()

def load_agent_state(session_id: str) -> dict:
    state_file = Path(f"agent_states/{session_id}.json")
    if state_file.exists():
        return json.loads(state_file.read_text())
    return {
        "session_id": session_id,
        "created_at": utc_now(),
        "completed_steps": [],
        "context": {},
        "history": []
    }

def save_agent_state(session_id: str, state: dict):
    Path("agent_states").mkdir(exist_ok=True)
    state["updated_at"] = utc_now()
    Path(f"agent_states/{session_id}.json").write_text(
        json.dumps(state, indent=2)
    )

def record_step(session_id: str, step: str, result: str):
    state = load_agent_state(session_id)
    state["completed_steps"].append({
        "step": step,
        "result": result,
        "timestamp": utc_now()
    })
    save_agent_state(session_id, state)
Implement structured memory early. Retrofitting it into a running agent is painful — state is implicit in the conversation history, and extracting it reliably is harder than it sounds.
The system prompt is not a formality. It is your primary control surface — the document that defines what your AI employee is, what it's allowed to do, and how it handles ambiguity. Treat it like an employment contract, not a product description.
A minimal but complete agent system prompt looks like this:
You are a research analyst working for [company name].
YOUR JOB:
Research competitive landscapes on assigned topics and produce
structured reports using the provided template.
TOOLS AVAILABLE:
- web_search: find current information and recent developments
- read_file: access briefing documents and templates
- write_output: deliver the completed report
WHEN UNCERTAIN:
Ask exactly one clarifying question before starting. Do not begin
work on a task you cannot complete with high confidence.
WHAT YOU MUST NEVER DO:
- Send emails or messages to external parties
- Access files outside the /research directory
- Make purchasing decisions or commit to any agreements
- Continue past 8 iterations without producing a partial result
OUTPUT FORMAT:
All reports must follow the template at /templates/research-report.md
"Be helpful" is not a policy. "Ask exactly one clarifying question before starting" is a policy. The difference between a working AI employee and an unreliable one is often just the precision of this document.
Not all AI employees are built the same. The architecture above is the foundation — how you configure it varies significantly by function.
The research agent. Task profile: Browse, read, synthesize, write. Long context, low stakes on individual actions, high stakes on final output quality.
Key design decisions: Give it generous iteration limits (12-15). Invest in retrieval quality — bad search results compound badly over a long research task. Add an explicit "draft first, then revise" step in the system prompt to improve output quality. Human review before delivery, not during.
The support triage agent. Task profile: Classify incoming tickets, route to the right queue, draft initial responses, flag escalations. High volume, low variance, very fast feedback loop.
Key design decisions: Keep tools minimal — read ticket, check knowledge base, write response, set priority. Strict output schema. This is the case where a tightly defined agent outperforms a general one by a wide margin. Add a confidence threshold: below 0.8, escalate to human rather than auto-responding.
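The escalation logic itself can stay trivially simple; the hard part is getting the model to emit a calibrated confidence in the first place. A sketch of the routing side, assuming the classifier returns a dict with a `confidence` field (the dict shape is an assumption; the 0.8 threshold matches the guidance above):

```python
ESCALATION_THRESHOLD = 0.8  # below this, a human handles the ticket

def route_ticket(classification: dict) -> str:
    """Decide what happens to a classified ticket.

    Assumed shape: {"queue": "...", "confidence": 0.0-1.0, "draft_response": "..."}
    """
    if classification["confidence"] < ESCALATION_THRESHOLD:
        return "escalate_to_human"  # low confidence: human reviews the draft
    return "auto_respond"           # high confidence: send the drafted reply
```

The point of the hard threshold is auditability: you can tune one number against your escalation logs instead of re-prompting.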
The coding agent. Task profile: Read code, run tests, suggest fixes, open PRs. High stakes — mistakes have real downstream consequences.
Key design decisions: Human checkpoint before any write operation. Sandbox all code execution. Log every tool call with inputs and outputs for audit. Start in read-only mode and earn write permissions incrementally. This is the slowest use case to deploy safely, and the one where the "move fast" instinct is most dangerous.
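The audit-log requirement is easy to bolt on as a wrapper around each tool function. A sketch (the tool name and log path are illustrative) that appends one JSON line per call:

```python
import json
import time

def audited(tool_name: str, fn, log_path: str = "agent_audit.jsonl"):
    """Wrap a tool function so every call appends one JSON line to an audit log."""
    def wrapper(**kwargs):
        result = fn(**kwargs)
        entry = {
            "ts": time.time(),
            "tool": tool_name,
            "input": kwargs,
            "output": str(result)[:2000],  # truncate large outputs
        }
        with open(log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")
        return result
    return wrapper
```

Wrapping at registration time (`REGISTERED_TOOLS = {name: audited(name, fn), ...}`) means the agent loop never has to know the log exists.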
The sales operations agent. Task profile: Enrich lead data, research target accounts, draft personalized outreach. Medium stakes, high volume.
Key design decisions: Strong data validation on CRM writes — garbage in, garbage out and you've now corrupted your database. Rate limit external API calls. Cache aggressively — the same company profile shouldn't be re-researched every time it appears in a lead list.
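The caching advice can start as an in-process TTL cache keyed by company domain. A minimal sketch, where the one-week TTL and the `research_fn` callable are assumptions to adapt:

```python
import time

CACHE_TTL_SECONDS = 7 * 24 * 3600  # assumption: a profile stays fresh ~a week
_profile_cache: dict = {}          # domain -> (fetched_at, profile)

def get_company_profile(domain: str, research_fn) -> dict:
    """Return a cached profile for `domain`, re-researching only after the TTL."""
    now = time.time()
    hit = _profile_cache.get(domain)
    if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]                    # fresh cache hit: no agent run, no API spend
    profile = research_fn(domain)        # miss or stale: do the expensive research
    _profile_cache[domain] = (now, profile)
    return profile
```

Swap the dict for Redis or a database table once more than one process needs the cache; the interface stays the same.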
Once you have a working single agent, the next temptation is to orchestrate multiple agents — an orchestrator that breaks tasks down and dispatches to specialized sub-agents working in parallel.
The appeal is real: parallel execution, specialization, fault isolation. So is the cost: every inter-agent handoff is a potential failure point, context doesn't transfer cleanly between agents, and debugging requires tracing execution across multiple threads.
# Minimal multi-agent orchestration pattern
import concurrent.futures

def orchestrate(task: str) -> str:
    # Step 1: Planner agent decomposes the task
    plan = planner_agent.run(
        f"Break this task into 2-4 parallel subtasks: {task}"
    )
    subtasks = parse_subtasks(plan)

    # Step 2: Worker agents execute subtasks in parallel
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = {
            executor.submit(worker_agent.run, subtask): subtask
            for subtask in subtasks
        }
        results = {
            futures[f]: f.result()
            for f in concurrent.futures.as_completed(futures)
        }

    # Step 3: Synthesizer agent combines results
    combined = "\n\n".join(
        f"Subtask: {subtask}\nResult: {result}"
        for subtask, result in results.items()
    )
    return synthesizer_agent.run(
        f"Combine these parallel research results into one report:\n{combined}"
    )
Multi-agent architectures are worth the complexity when tasks have genuinely parallelizable subtasks with clean interfaces between them, when specialization meaningfully improves quality, or when fault isolation is a hard requirement.
They're not worth it when you're distributing a sequential task across agents because it feels more scalable. Start with one agent that works. Split it when you have a concrete, measurable reason.
You can wire together the architecture above in a weekend. Production is where theory meets reality.
Models occasionally call tools with plausible-but-wrong parameters, or invent tool names that don't exist. Validate at the execution layer — treat model output as untrusted input.
REGISTERED_TOOLS = {
    "web_search": execute_web_search,
    "read_file": execute_read_file,
    "write_output": execute_write_output,
}

def execute_tool_calls(content_blocks: list) -> list:
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue
        if block.name not in REGISTERED_TOOLS:
            result = f"Error: tool '{block.name}' is not available."
        else:
            try:
                result = REGISTERED_TOOLS[block.name](**block.input)
            except TypeError as e:
                result = f"Invalid parameters for '{block.name}': {str(e)}"
            except Exception as e:
                result = f"Tool execution failed: {str(e)}"
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": str(result)
        })
    return results
Returning errors as tool results — rather than crashing — lets the model recover. A well-written system prompt will tell the model explicitly what to do when a tool fails: retry with different parameters, escalate, or stop and report.
Long-running agents accumulate conversation history. Eventually context fills up, older context gets truncated, and the model loses track of what it's already done. This manifests as repeated tool calls, forgotten constraints, or silently restarting tasks from scratch.
Mitigations: summarize intermediate results into structured state rather than keeping raw history; use an explicit save_progress tool so the model records milestones; set a context budget at ~70% of the model's limit and trigger a summarization step before you hit the ceiling.
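A budget check doesn't need exact token counts; a coarse character-based estimate is enough to decide when to trigger summarization. A sketch, assuming a 200k-token window (adjust both constants to your model):

```python
CONTEXT_LIMIT_TOKENS = 200_000  # assumption: the model's advertised window
BUDGET_RATIO = 0.7              # summarize once ~70% of the window is used

def estimate_tokens(messages: list) -> int:
    # Rough heuristic: ~4 characters per token. Coarse, but fine for a budget check.
    return sum(len(str(m.get("content", ""))) for m in messages) // 4

def needs_summarization(messages: list) -> bool:
    """True when the agent loop should compact history into structured state."""
    return estimate_tokens(messages) > CONTEXT_LIMIT_TOKENS * BUDGET_RATIO
```

Call `needs_summarization(messages)` at the top of each loop iteration; when it fires, replace older turns with a summary message before the next API call.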
Errors compound across steps. Step 3's slightly wrong output becomes step 5's broken input. By step 8, the output is confidently wrong. Each individual step looked reasonable. This is the hardest failure mode to catch.
The reliable mitigation is human checkpoints on irreversible actions — not every step, but any action that can't be undone: sending a message, writing to a database, making an API call with side effects. Build these in from the start, then remove them selectively as the agent demonstrates consistent behavior on each class of action.
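One way to sketch such a checkpoint: route every tool call through a gate that consults a human only for tools marked irreversible. The tool names and the `approve` callable here are illustrative; `approve` could be a CLI prompt, a Slack button, or a review queue.

```python
IRREVERSIBLE_TOOLS = {"write_output", "send_message"}  # send_message is hypothetical

def gated_execute(tool_name: str, tool_input: dict, execute, approve) -> str:
    """Run one tool call, pausing for human approval on irreversible actions."""
    if tool_name in IRREVERSIBLE_TOOLS and not approve(tool_name, tool_input):
        # Return the rejection as a tool result so the model can adapt
        return f"Action '{tool_name}' rejected by reviewer. Choose another approach."
    return execute(tool_input)
```

"Earning" autonomy then means removing a tool name from `IRREVERSIBLE_TOOLS` once its rejection rate stays near zero, rather than rewriting the agent.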
The honest answer to "should I build an agent?" is: probably not, for most tasks.
If you can enumerate the steps in advance, build a pipeline. Pipelines are faster, cheaper, deterministic, and dramatically easier to debug. The model's reasoning ability is valuable. Its autonomy is a liability you should only accept when you need it.
Build an agentic workflow when:
- The steps genuinely depend on intermediate results and can't be enumerated in advance
- Inputs vary too widely to encode every branch as explicit logic
- The task requires judgment calls mid-execution, not just data transformation

Stick with a structured pipeline when:
- You can list the steps before the work starts
- The same sequence handles every input you expect to see
- Determinism, cost, and debuggability matter more than flexibility
The agentic AI vs. structured pipeline decision is not about capability — it's about how much of the task structure you can specify in advance. The more you can specify, the less you need an agent.