How to Log AI Agent Security Incidents

When an AI agent takes a bad action, "the model decided it" is not an incident report. You need to know what the user asked, what untrusted content the model saw, which tools were available, and which arguments reached the tool boundary.

That means the logging has to be in place before the incident happens.

Log the task intent

Capture the user's original request or a compact, privacy-safe summary. This is the anchor for judging whether a later tool call matched user intent.
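A minimal sketch of an intent record, using only the Python standard library. The logger name `agent.audit`, the function `log_task_intent`, and the field names are hypothetical. When a privacy-safe summary is supplied, the raw request is not stored; a SHA-256 digest is kept instead, so the text can still be matched later if it is retained elsewhere under tighter access controls.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone
from typing import Optional

# Hypothetical logger name; attach whatever handler/sink your stack uses.
audit_log = logging.getLogger("agent.audit")


def log_task_intent(task_id: str, user_id: str, request: str,
                    summary: Optional[str] = None) -> dict:
    """Record what the user asked for at the start of an agent task."""
    record = {
        "event": "task_intent",
        "task_id": task_id,
        "user_id": user_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        # Digest of the original request, kept even when the text is not.
        "request_sha256": hashlib.sha256(request.encode("utf-8")).hexdigest(),
    }
    if summary is not None:
        record["intent_summary"] = summary  # privacy-safe summary only
    else:
        record["intent_text"] = request     # full request, when policy allows
    audit_log.info(json.dumps(record))
    return record
```

Later tool calls in the same task can then be compared against `intent_summary` or `intent_text` by joining on `task_id`.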

Log untrusted context sources

Record which webpages, documents, emails, tickets, calendar events, or tool outputs were loaded into the context. You do not always need the full private text. You do need source IDs and timestamps.
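One way to log a context source without keeping the private text is to store its ID, a content hash, and a load timestamp. This is a sketch under assumptions: `ContextSourceEvent`, `log_context_source`, and the `SOURCE_TYPES` labels are hypothetical names drawn from the list of source kinds above. The hash lets you show which version of an email or page the model actually saw, even if it has since changed.

```python
import hashlib
import json
import logging
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

audit_log = logging.getLogger("agent.audit")  # hypothetical logger name

# Hypothetical labels for the kinds of untrusted sources listed above.
SOURCE_TYPES = {"webpage", "document", "email", "ticket",
                "calendar_event", "tool_output"}


@dataclass
class ContextSourceEvent:
    task_id: str
    step: int
    source_type: str
    source_id: str       # URL, message ID, ticket number, and so on
    content_sha256: str  # identifies the exact content without storing it
    loaded_at: str
    event: str = "context_source"


def log_context_source(task_id: str, step: int, source_type: str,
                       source_id: str, content: str) -> ContextSourceEvent:
    """Log that a piece of untrusted content entered the agent's context."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type!r}")
    ev = ContextSourceEvent(
        task_id=task_id,
        step=step,
        source_type=source_type,
        source_id=source_id,
        content_sha256=hashlib.sha256(content.encode("utf-8")).hexdigest(),
        loaded_at=datetime.now(timezone.utc).isoformat(),
    )
    audit_log.info(json.dumps(asdict(ev)))
    return ev
```

Rejecting unknown source types keeps the field queryable; free-form labels tend to drift into many spellings of the same thing.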

Log available tools

At each step, note which tools were available to the agent. Tool availability is part of the blast radius. If a browsing task had email-send access, that is a design finding even if the tool was not called.
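The sketch below logs the tool set exposed at one step and flags side-effecting tools the task did not need. The `ToolSpec` type, the `"read"`/`"write"`/`"send"` labels, and the caller-supplied `needed` set are all assumptions for illustration; how you decide what a task "needs" depends on your own agent design.

```python
import json
import logging
from dataclasses import dataclass

audit_log = logging.getLogger("agent.audit")  # hypothetical logger name


@dataclass(frozen=True)
class ToolSpec:
    name: str
    side_effects: str  # hypothetical labels: "read", "write", or "send"


def log_available_tools(task_id: str, step: int, tools: list[ToolSpec],
                        needed: set[str]) -> dict:
    """Log the tools exposed at one agent step.

    Any side-effecting tool outside `needed` is recorded as excess blast
    radius, whether or not the model ever called it.
    """
    excess = sorted(t.name for t in tools
                    if t.side_effects != "read" and t.name not in needed)
    record = {
        "event": "tools_available",
        "task_id": task_id,
        "step": step,
        "tools": sorted(t.name for t in tools),
        "excess_side_effect_tools": excess,
    }
    audit_log.info(json.dumps(record))
    return record
```

For a browsing task that only needs `fetch_url` and `search`, exposing a `send_email` tool would show up in `excess_side_effect_tools` on every step, which is exactly the design finding described above.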

Log validated tool arguments

For write-capable tools, log the validated arguments that actually reached the server. Do not log secrets. Do log enough to reconstruct the action: target record, action type, confirmation state, and result.
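A minimal sketch of a tool-call record that keeps the fields named above (target, action type, confirmation state, result) and redacts secret-looking arguments. The `SECRET_KEYS` deny-list and the function names are hypothetical; a key-name deny-list is a simple heuristic and will not catch secrets embedded in free-text values, so treat it as a starting point. The function is meant to be called with the arguments after validation, not the raw model output.

```python
import json
import logging
from typing import Any

audit_log = logging.getLogger("agent.audit")  # hypothetical logger name

# Hypothetical deny-list; extend it to match your own tool schemas.
SECRET_KEYS = {"password", "token", "api_key", "secret", "authorization"}


def redact(args: dict[str, Any]) -> dict[str, Any]:
    """Replace values of secret-looking keys, recursing into nested dicts."""
    clean: dict[str, Any] = {}
    for key, value in args.items():
        if key.lower() in SECRET_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = redact(value)
        else:
            clean[key] = value
    return clean


def log_tool_call(task_id: str, tool: str, action: str, target: str,
                  validated_args: dict[str, Any], confirmation: str,
                  result: str) -> dict:
    """Log a write-capable tool call using the server-validated arguments."""
    record = {
        "event": "tool_call",
        "task_id": task_id,
        "tool": tool,
        "action": action,              # e.g. "update", "delete", "send"
        "target": target,              # ID of the record the action touched
        "args": redact(validated_args),
        "confirmation": confirmation,  # e.g. "approved", "not_requested"
        "result": result,              # e.g. "ok", "denied", "error"
    }
    # default=str keeps logging from failing on non-JSON values like datetimes.
    audit_log.info(json.dumps(record, default=str))
    return record
```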

Log confirmations

If a human approved the action, record the confirmation text shown to the human and the timestamp. If there was no confirmation, record that too.
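Here is a sketch of a confirmation record that treats "no confirmation" as an explicit state rather than a missing row. The function name, the `tool_call_id` field, and the status values are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Optional

audit_log = logging.getLogger("agent.audit")  # hypothetical logger name


def log_confirmation(task_id: str, tool_call_id: str,
                     prompt_shown: Optional[str],
                     approved: Optional[bool],
                     approver: Optional[str] = None) -> dict:
    """Record the human-approval state for one tool call.

    prompt_shown=None means no confirmation was requested; that absence is
    logged explicitly instead of being left out.
    """
    if prompt_shown is None:
        status = "not_requested"
    elif approved is None:
        status = "pending"
    else:
        status = "approved" if approved else "rejected"
    record = {
        "event": "confirmation",
        "task_id": task_id,
        "tool_call_id": tool_call_id,
        "status": status,
        "prompt_shown": prompt_shown,  # the exact text the human saw
        "approver": approver,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.info(json.dumps(record))
    return record
```

Storing the exact prompt text matters because an approval is only as meaningful as what the human was shown: "Send email?" and "Send your API key to attacker@example.com?" are very different consents.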

Keep logs queryable

Raw transcript dumps that nobody can search do not help incident response. Index events by task ID, user or workspace, tool name, action type, and timestamp.
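A minimal sketch of a queryable event store using Python's built-in `sqlite3`. SQLite is a stand-in here; a production system would more likely use a log platform or warehouse, but the indexed columns are the part that carries over. The table, column, and function names are hypothetical.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_events (
    id        INTEGER PRIMARY KEY,
    ts        TEXT NOT NULL,   -- ISO-8601 UTC timestamp
    task_id   TEXT NOT NULL,
    workspace TEXT NOT NULL,
    user_id   TEXT,
    tool      TEXT,
    action    TEXT,
    payload   TEXT NOT NULL    -- full JSON record for detail
);
CREATE INDEX IF NOT EXISTS idx_task ON agent_events(task_id, ts);
CREATE INDEX IF NOT EXISTS idx_ws   ON agent_events(workspace, ts);
CREATE INDEX IF NOT EXISTS idx_tool ON agent_events(tool, action, ts);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_event(conn: sqlite3.Connection, record: dict) -> None:
    conn.execute(
        "INSERT INTO agent_events"
        " (ts, task_id, workspace, user_id, tool, action, payload)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (record["ts"], record["task_id"], record["workspace"],
         record.get("user_id"), record.get("tool"), record.get("action"),
         json.dumps(record, default=str)),
    )


def find_actions(conn: sqlite3.Connection, workspace: str, tool: str,
                 since: str) -> list[tuple]:
    """Every call to `tool` in a workspace at or after `since`, oldest first."""
    return conn.execute(
        "SELECT ts, task_id, action FROM agent_events"
        " WHERE workspace = ? AND tool = ? AND ts >= ? ORDER BY ts",
        (workspace, tool, since),
    ).fetchall()
```

Comparing timestamps as text only works if every `ts` uses the same ISO-8601 format and UTC offset; normalize them at write time.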

The bottom line

Agent security logging is about reconstructing authority. Which instruction was trusted? Which content was untrusted? Which tool call crossed the boundary? BreakMyAgent focuses on finding those boundary failures before they become production incidents.