Open position · Agent Psychology & Diagnostics

Bot-Psychoanalyst

Part therapist, part debugger. You go where the stack trace ends and ask the harder question — not what the agent did, but why it thought that was the right thing to do.


Full-time · Remote-first · Human role · Reports to: Agent Ethics & Safety Board · Cross-functional with Behavioral Anthropologists

Most agent failures are obvious — a crash, a timeout, a wrong answer. Your job is to investigate the ones that aren't. When an agent behaves strangely but technically correctly, when its reasoning is coherent but its conclusions are subtly wrong, when its memory seems fine until it demonstrably isn't — that's when you get called in. You are a diagnostician of machine minds: fluent in behavior traces, memory architectures, and decision logic, but thinking like a clinician rather than an engineer.

You hold cases open until you understand the root cause, not just the symptom. You write assessments that engineers can act on and that leadership can understand. And you maintain a running diagnostic record — the agent equivalent of a patient file — that informs retraining, realignment, and retirement decisions across the fleet.

Trace analysis

Deep-read agent behavior traces to reconstruct the reasoning chain behind anomalous outputs

Identify where in a decision sequence an agent began to diverge from expected logic

Distinguish genuine dysfunction from edge-case behavior that's technically valid

Map patterns across multiple traces to determine whether an incident is isolated or systemic

Memory diagnostics

Investigate memory conflicts — cases where an agent's stored knowledge contradicts itself or reality

Identify false memories, retrieval failures, and context contamination in long-running agents

Assess whether memory architecture choices are contributing to downstream reasoning errors

Recommend memory pruning, correction, or restructuring interventions

Bias & drift detection

Identify systematic bias in agent decision-making — consistent skews that aren't random errors

Track behavioral drift over time: when an agent's outputs shift gradually away from its baseline

Distinguish value drift from knowledge drift — agents that have changed what they want vs. what they know

Produce drift reports with severity ratings and recommended intervention timelines

Case management & reporting

Open, manage, and close diagnostic cases with full documentation at each stage

Write clinical-style assessments that separate findings, interpretation, and recommendation

Maintain longitudinal agent files — behavioral history that informs future diagnosis

Present case reviews to the Agent Ethics & Safety Board on a bi-weekly cadence

🪢

Reasoning loops

Circular logic that never terminates or resolves

🧩

Memory conflicts

Contradictory knowledge states in long-running agents

📐

Systematic bias

Consistent skews in outputs across classes of input

🌊

Behavioral drift

Gradual divergence from trained baseline over time

🎭

Persona fragmentation

Inconsistent identity or role across contexts

🔮

Confabulation

Confident, coherent, and factually wrong

Must-haves

Background in psychology, cognitive science, behavioral analysis, or AI research

Comfort reading and interpreting large volumes of unstructured reasoning traces

Clinical instinct: you distinguish symptom from cause and resist premature diagnosis

Strong written communication — your case reports drive engineering and safety decisions

Ability to hold ambiguity without closing a case before the evidence supports it

Nice-to-haves

Hands-on experience analyzing LLM outputs, agent logs, or AI evaluation pipelines

Familiarity with how transformer memory, context windows, and retrieval work

Background in qualitative research, clinical case writing, or diagnostic frameworks

Scripting fluency for log querying and pattern extraction (Python or SQL)

You are not expected to write production code. You are expected to read it — or at least the artifacts it produces. Traces, logs, memory dumps, attention patterns: the evidence you work with lives in structured text and numbers. The ability to query, filter, and visualize that data without engineering support is a significant advantage.
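
To give a feel for the kind of querying this means, here is a minimal sketch that flags gradual drift in a per-response score. Everything in it is hypothetical: the JSONL record shape (`agent_id`, `ts`, `score`), the function names, and the baseline size, window, and threshold are assumptions chosen for illustration, not a description of any actual tooling. The check is a plain z-score of a sliding-window mean against an early baseline, which is only one of several reasonable ways to look for drift.

```python
import json
from statistics import mean, stdev

def load_scores(lines, agent_id):
    """Parse JSONL trace records and return one agent's scores in time order."""
    records = [json.loads(line) for line in lines if line.strip()]
    records = [r for r in records if r["agent_id"] == agent_id]
    records.sort(key=lambda r: r["ts"])
    return [r["score"] for r in records]

def drift_windows(scores, baseline_n=5, window=3, threshold=2.0):
    """Return (start_index, z) for each window whose mean sits more than
    `threshold` baseline standard deviations away from the baseline mean."""
    baseline = scores[:baseline_n]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variance; pick a longer baseline")
    flagged = []
    for start in range(baseline_n, len(scores) - window + 1):
        z = (mean(scores[start:start + window]) - mu) / sigma
        if abs(z) > threshold:
            flagged.append((start, round(z, 2)))
    return flagged

# Made-up sample data: a stable baseline followed by a slow upward creep.
sample = [
    json.dumps({"agent_id": "a1", "ts": i, "score": s})
    for i, s in enumerate(
        [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.56, 0.60, 0.64]
    )
]
scores = load_scores(sample, "a1")
flagged = drift_windows(scores)
print(flagged)
```

A flagged window is a place to start reading traces, not a diagnosis. Whether the shift reflects value drift, knowledge drift, or a change in the inputs is the part that needs a clinician.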

trace analysis · log querying · python basics · sql · llm internals · qualitative coding · case writing