AI agents are powerful but unpredictable. Without proper guardrails, they can make costly mistakes: sending emails to the wrong recipients, deleting data, or making unauthorized API calls.
The principle of least privilege
Every tool an agent can use should be scoped to the minimum permissions its task requires. If an agent only needs to read emails, don't give it send permissions.
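
As a rough illustration of scoping, the sketch below filters a tool registry by the scopes granted to an agent. The scope names (`email:read`, `email:send`), the `Tool` shape, and the `toolsFor` helper are all hypothetical, invented for this example rather than taken from any particular framework.

```typescript
// Hypothetical permission scopes; a real system would define its own.
type Scope = "email:read" | "email:send";

interface Tool {
  name: string;
  requiredScope: Scope;
  run: (args: Record<string, unknown>) => string;
}

// Every tool the platform knows about (illustrative stubs only).
const allTools: Tool[] = [
  { name: "readInbox", requiredScope: "email:read", run: () => "3 unread messages" },
  { name: "sendEmail", requiredScope: "email:send", run: () => "sent" },
];

// Expose only the tools whose required scope was granted to this agent.
function toolsFor(granted: ReadonlySet<Scope>): Tool[] {
  return allTools.filter((tool) => granted.has(tool.requiredScope));
}

// A read-only agent is never handed sendEmail, so it cannot call it by mistake.
const readOnlyTools = toolsFor(new Set<Scope>(["email:read"]));
const readOnlyToolNames = readOnlyTools.map((tool) => tool.name);
```

Filtering the tool list before the agent sees it means a missing permission is a tool that does not exist from the agent's point of view, rather than a call that has to be rejected later.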
Three layers of safety
- Input validation: Sanitize and validate all agent inputs before they reach your tools
- Tool-level guards: Each tool checks permissions and rate limits independently
- Human approval: Destructive actions require explicit human confirmation
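
One way the three layers might be chained is sketched below. Everything here is an assumption made for illustration: the `ToolCall` and `Verdict` types, the tool names, the limit of three calls per tool, and the `approvedByHuman` flag that stands in for a real approval flow.

```typescript
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
  destructive: boolean;
}

type Verdict = { allowed: true } | { allowed: false; reason: string };

// Layer 1: input validation. Reject calls to tools we do not recognize.
const KNOWN_TOOLS = new Set<string>(["readInbox", "deleteThread"]);

function validateInput(call: ToolCall): Verdict {
  if (!KNOWN_TOOLS.has(call.tool)) {
    return { allowed: false, reason: `unknown tool: ${call.tool}` };
  }
  return { allowed: true };
}

// Layer 2: tool-level guard. A simple per-tool call budget (hypothetical limit).
const MAX_CALLS_PER_TOOL = 3;
const callCounts = new Map<string, number>();

function checkRateLimit(call: ToolCall): Verdict {
  const used = callCounts.get(call.tool) ?? 0;
  if (used >= MAX_CALLS_PER_TOOL) {
    return { allowed: false, reason: `rate limit reached for ${call.tool}` };
  }
  callCounts.set(call.tool, used + 1);
  return { allowed: true };
}

// Layer 3: human approval. Destructive calls need an explicit yes.
function requireApproval(call: ToolCall, approvedByHuman: boolean): Verdict {
  if (call.destructive && !approvedByHuman) {
    return { allowed: false, reason: "destructive action needs human approval" };
  }
  return { allowed: true };
}

// Run the layers in order and stop at the first one that refuses.
function guard(call: ToolCall, approvedByHuman: boolean): Verdict {
  const layers: Array<() => Verdict> = [
    () => validateInput(call),
    () => checkRateLimit(call),
    () => requireApproval(call, approvedByHuman),
  ];
  for (const layer of layers) {
    const verdict = layer();
    if (!verdict.allowed) return verdict;
  }
  return { allowed: true };
}
```

Because each layer returns its own verdict, a failure in one layer never depends on another layer having run correctly. Note that in this sketch a call refused at the approval step still counts against the rate limit; whether that is desirable is a design choice.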
Human-in-the-loop patterns
The most effective pattern we've found is the "propose, don't execute" model. The agent proposes an action and shows its reasoning, but a human must approve before execution.
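```typescript
// Agent proposes, human approves
interface AgentAction {
  tool: string;
  args: Record<string, unknown>;
  reasoning: string;
  risk: "low" | "medium" | "high";
}

// Assumed to be provided elsewhere in the application.
// requestHumanApproval is assumed to resolve to true only when a human approves.
declare function requestHumanApproval(action: AgentAction): Promise<boolean>;
declare function invokeTool(tool: string, args: Record<string, unknown>): Promise<unknown>;

async function executeAction(action: AgentAction): Promise<unknown> {
  // High-risk actions stop here unless a human explicitly approves them.
  if (action.risk === "high") {
    const approved = await requestHumanApproval(action);
    if (!approved) {
      throw new Error(`Action "${action.tool}" was rejected by the reviewer`);
    }
  }
  return invokeTool(action.tool, action.args);
}
```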
Monitoring and rollback
Every agent action should be logged with its inputs, outputs, and the agent's reasoning. This audit trail is essential for debugging and for reverting bad decisions.
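
A minimal sketch of such an audit trail follows, assuming an in-memory log and an optional `undo` callback stored with each entry. The `AuditEntry` shape, the `record` and `rollback` helpers, and the `archiveThread` tool are hypothetical; a production system would persist the log and would not be able to undo every kind of action.

```typescript
interface AuditEntry {
  id: number;
  tool: string;
  input: Record<string, unknown>;
  output: unknown;
  reasoning: string;
  timestamp: string;
  reverted: boolean;
  undo?: () => void; // how to revert this action, when that is possible
}

const auditLog: AuditEntry[] = [];

// Append an entry, filling in the bookkeeping fields.
function record(entry: Omit<AuditEntry, "id" | "timestamp" | "reverted">): AuditEntry {
  const full: AuditEntry = {
    ...entry,
    id: auditLog.length + 1,
    timestamp: new Date().toISOString(),
    reverted: false,
  };
  auditLog.push(full);
  return full;
}

// Revert a logged action once; returns false if it cannot be reverted.
function rollback(id: number): boolean {
  const entry = auditLog.find((e) => e.id === id);
  if (!entry || !entry.undo || entry.reverted) return false;
  entry.undo();
  entry.reverted = true;
  return true;
}

// Example tool: an in-memory label store stands in for a real mail system.
const labels = new Map<string, string>([["thread-1", "inbox"]]);

function archiveThread(threadId: string, reasoning: string): AuditEntry {
  const previous = labels.get(threadId);
  labels.set(threadId, "archived");
  return record({
    tool: "archiveThread",
    input: { threadId },
    output: { label: "archived" },
    reasoning,
    undo: () => {
      if (previous === undefined) labels.delete(threadId);
      else labels.set(threadId, previous);
    },
  });
}
```

Capturing the previous state at the moment the action runs is what makes the rollback possible; actions with no meaningful inverse, such as a sent email, would simply be logged without an `undo`.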