When enterprise AI deployments fall short, failures often span several layers at once: weak task decomposition, tool unreliability, retrieval gaps, model reasoning limits, poor evaluation coverage, and permissions design. Organizational context is one significant contributor among these, and one that is consistently underestimated. Understanding why requires thinking clearly about what each layer of the stack can and cannot know.
Practitioners who build agentic systems have converged on a useful taxonomy: model, harness, and context. Each layer can improve independently, and each one has a different update cost, speed, and failure mode. Harrison Chase, Co-Founder and CEO of LangChain, articulated this recently:
"Most discussions of continual learning in AI focus on one thing: updating model weights. But for AI agents, learning can happen at three distinct layers: the model, the harness, and the context. Understanding the difference changes how you think about building systems that improve over time."
The taxonomy maps cleanly to what we see in production enterprise deployments. And it surfaces a gap that most agent frameworks have not addressed: context at the organizational level is categorically different from context at the user or agent level, and it is the layer that most often determines whether an agent actually behaves correctly inside a real company.
The three layers of enterprise agent orchestration
Chase defines the three layers as follows. The model is the weights themselves: Claude, GPT-4o, Gemini. The harness is everything that is always on around the model: every piece of code, configuration, and execution logic that is not the model itself, including prompts, tools, MCPs, orchestration logic, memory, state, hooks, and the inner tool-calling loop. LangChain's framing makes this explicit: the harness is what operationalizes the model for every instance of the agent. The context is configuration that lives outside the harness and customizes it per tenant: instructions, skills, and memory that are not always-on but get pulled in based on who is running the agent.
"Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage... The most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns."
Chase maps real products to this taxonomy clearly. Claude Code: model is claude-sonnet, harness is Claude Code itself, user context is CLAUDE.md, /skills, mcp.json. OpenClaw: model is many, harness is Pi plus scaffolding, agent context is SOUL.md and skills from ClawHub. You can identify similar layers across many agent stacks, including LangGraph, CrewAI, and OpenAI Agents SDK, though the exact boundaries differ by framework. LangGraph, for instance, positions itself primarily as a low-level orchestration runtime for durable execution and human-in-the-loop workflows, not a harness in the narrow sense. The value of the taxonomy is the conceptual separation, not a precise mapping onto any single framework.
How learning works at each layer
Each layer has its own improvement mechanism, its own speed, and its own failure modes.
Model layer: SFT, RL (GRPO), fine-tuning. Slow and expensive. The central challenge is catastrophic forgetting: updating on new data tends to degrade performance on things the model previously knew. High ceiling of impact, but the highest cost and the least human-inspectable process.
Harness layer: The Meta-Harness paper (Lee et al., 2026) demonstrates this approach concretely. The pattern: run the agent over a set of tasks, evaluate results, store all execution traces to a filesystem, then run a coding agent over those traces to propose changes to the harness code. The harness improves through its own traces. Medium cost, medium speed. Usually done at the agent level, meaning one harness for all users, though you could in principle learn per-user harness variants.
"The performance of LLM systems depends not only on model weights, but also on their harness: the code that determines what information to store, retrieve, and present to the model."
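The loop described above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: `agent`, `evaluate`, and `coding_agent` are hypothetical callables standing in for the real components, and the trace layout is invented.

```python
import json
from pathlib import Path

TRACE_DIR = Path("traces")  # hypothetical layout; the paper's actual setup may differ

def run_and_record(agent, tasks, evaluate):
    """Steps 1-2: run the agent over tasks, score results, persist every trace."""
    TRACE_DIR.mkdir(exist_ok=True)
    for i, task in enumerate(tasks):
        result = agent(task)
        trace = {"task": task, "result": result, "score": evaluate(task, result)}
        (TRACE_DIR / f"trace_{i}.json").write_text(json.dumps(trace))

def propose_harness_changes(coding_agent):
    """Step 3: run a coding agent over the stored traces to propose harness edits."""
    traces = [json.loads(p.read_text()) for p in sorted(TRACE_DIR.glob("trace_*.json"))]
    failures = [t for t in traces if t["score"] < 1.0]
    # The coding agent reads the failing traces and returns a patch to the harness code.
    return coding_agent(prompt="Propose harness changes for these failures", traces=failures)
```

The key property is that the harness improves from its own traces: the artifacts the agent produces while working become the input for the next harness revision.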
Context layer: This is where the most flexibility lives. Context updates are low cost, fast, and human-inspectable. They can happen offline (batch over recent traces, extract insights, update configuration (what OpenClaw calls "dreaming")) or in the hot path (the agent updates its own memory as it runs). And critically, they can happen at multiple levels of granularity: per agent, per user, per team, per org.
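As a hedged sketch of the offline mode, a "dreaming" pass might batch over recent traces and append distilled insights to a per-user context file. The function and file names here are hypothetical; `extract_insights` stands in for an LLM summarization call.

```python
def dream(traces, extract_insights, context_file="USER_CONTEXT.md"):
    """Offline batch update: distill recent execution traces into durable context.

    `extract_insights` is any callable returning a list of short strings the
    agent should remember (in practice, an LLM call over the traces).
    """
    insights = extract_insights(traces)
    with open(context_file, "a") as f:
        for line in insights:
            f.write(f"- {line}\n")  # appended as plain markdown bullets
    return insights
```

The hot-path variant is the same write, performed by the agent mid-run instead of by a batch job; the trade-off is freshness versus the risk of polluting context with unvetted conclusions.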
"Context engineering is the delicate art and science of filling the context window with just the right information for the next step... Doing this well involves task descriptions and explanations, few shot examples, RAG, related data, tools, state and history, compacting — doing this well is highly non-trivial."
| Dimension | Model | Harness | User / Team Context | Org Context (BehaviorGraph) |
|---|---|---|---|---|
| Form factor | Model weights | Code | Config files (agent, user, team) | Dynamic behavioral graph |
| Level of granularity | Agent | Agent | Agent, user, org, team | Org-wide, continuously updated |
| Cost to update | High | Medium | Low | Low |
| Speed to update | Slow | Medium | Fast | Continuous |
| Human inspectable | No | Yes | Yes | Yes |
| Ceiling of impact | Highest | High | Medium | High (routing correctness) |
| Update pattern | Batch offline | Batch offline job | Batch offline; hot path | Continuous signal ingestion + batch graph refresh |
| What it teaches the agent | General reasoning | How to run reliably | User/team preferences & skills | Who to route to, who is trusted, how decisions actually move |
The table above lays out four layers. The first three (model, harness, and user/team context) are what most agent stacks already address. The fourth column is what the rest of this piece is about: why org-level context is categorically different from user context, why existing tools cannot substitute for it, and what it actually takes to build it.
The missing granularity: org-level context
Most agent stacks recognize three granularities at which context can apply: agent level, user level, and team level. Products like Hex's Context Studio, Decagon's Duet, and Sierra's Explorer do this well.
But in enterprise deployments, there is a fourth granularity that is categorically different from the others: organizational context. Not "what does this user prefer?" but "how does work actually route inside this org, who is trusted for what type of decision, which approval paths are real versus nominal, and when is a bottleneck forming?"
This is not information you can extract from user prompts, CLAUDE.md files, or even traces of individual agent runs. It is a property of the organization as a whole, and it changes continuously as teams restructure, people change roles, and projects shift priority. It requires a different kind of context layer.
Consider how much a well-integrated enterprise AI system can learn about a person. It can read their emails, infer their writing style, understand their background and domain expertise, observe their communication patterns, and build a detailed profile of who they are and how they work. With Slack and email access it can map who they exchange messages with most often. But there is a hard limit on what that picture can tell you.
People do not write "I don't trust Jordan's judgment on vendor contracts" in their work email. They do not tell the company AI model that they think their manager's approval is a rubber stamp. They do not explain in Slack that the person listed as team lead has not actually made a call in six months. The things that matter most about professional relationships, including peer trust, informal authority, and who someone actually goes to when they need something unblocked, almost never surface as explicit statements in any tool.
So even a model that knows a person extremely well does not know how that person is perceived by their peers, which of their relationships carry real weight, or where they sit in the org's actual decision network. You might know their communication network from email and Slack metadata. But knowing who someone exchanges messages with is not the same as knowing who they rely on, who trusts their judgment, or who can move something forward by picking up the phone. That is a different layer of signal, one that can only be inferred from behavioral patterns across the whole organization, not reconstructed from any single person's digital exhaust.
Why existing tools don't fill this gap
Engineers who have worked in enterprise software will recognize the problem and immediately reach for a familiar solution: role-based access control. Workday defines who can approve a purchase order. ServiceNow defines who can resolve a ticket. GitHub defines who can merge to main. Okta defines who can access which system. Every mature enterprise tool has a permissions model. Engineers know how to build these. Why isn't that enough?
Because RBAC answers a different question. Permissions define who is allowed to do something. Org context is about who actually does it. The two are not the same thing, and they often diverge over time, especially after re-orgs, role changes, and project transitions.
The formal approver for budget requests is the VP of Finance. The actual approver, the person whose reply unblocks things, is her chief of staff. That is not in Workday. The person with merge rights on the core auth service is a staff engineer who left six months ago and whose account was never deprovisioned. The team listed as owner of the data pipeline has not touched it in a year; the two engineers who actually maintain it are in a different org unit. These gaps are not edge cases. They are the steady state of how organizations work.
Hard-coded permissions also fail to scale for a second reason: the org changes faster than anyone updates the rules. Re-orgs, role changes, new hires, departures, project transitions: each one creates drift between the permissions model and organizational reality. Human teams absorb this drift through informal knowledge: people know to ask Marcus even though his title says something else. An AI agent has no such informal network unless you give it one.
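A toy sketch makes the divergence concrete. The data below is invented: an RBAC table answers whether an action is permitted, while a log of observed approval events answers who actually approves.

```python
from collections import Counter

# Formal permissions (RBAC-style): who is *allowed* to approve each action.
RBAC = {"budget_request": {"vp_finance"}}

# Observed approval events (behavioral metadata only): who *actually* approved.
APPROVAL_LOG = ["chief_of_staff", "chief_of_staff", "vp_finance", "chief_of_staff"]

def is_permitted(user, action):
    return user in RBAC.get(action, set())

def actual_approver(log):
    # Most frequent approver across recent observed events.
    return Counter(log).most_common(1)[0][0]

# The two answers diverge: the formal approver is not the operational one.
assert is_permitted("vp_finance", "budget_request")
assert actual_approver(APPROVAL_LOG) == "chief_of_staff"
```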
| Dimension | RBAC / Static permissions | Behavioral org context |
|---|---|---|
| What it captures | Who is allowed to act | Who actually acts, and how |
| Defined by | Admin configuration at setup time | Observed behavioral signals, continuously |
| Reflects re-orgs | Only if admin updates it | Automatically, via signals |
| Captures informal authority | No | Yes: trusted deputies, shadow experts |
| Works for agent-to-agent routing | Not designed for it | Yes, queryable at runtime |
| Scales with org complexity | Role explosion, thousands of fine-grained rules | Graph-based, scales with signal volume |
| Useful for compliance | Yes, authoritative for access control | Yes, and complementary to it |
To be clear: RBAC and static permissions are not wrong. They are the right tool for access control, compliance, and security boundaries, and they should stay. The problem is using them as a substitute for organizational context when routing decisions. They answer "is this permitted?" not "is this the right path?"
How AI accelerates the problem
Even if you have good org context today, autonomous agents introduce new failure modes that either do not exist in the same form for human workers, or exist for humans but operate at a scale and speed that makes them far more dangerous when AI is involved.
Agent drift. An agent's behavior shifts over time as it accumulates context, patterns, and implicit reinforcement from past interactions. A human employee who learns "always escalate billing disputes to Jordan" carries that forward; if Jordan leaves, a good manager corrects the mental model. An agent left unmonitored keeps routing to Jordan's old account, or to whoever the system now maps that label to. Drift in human organizations is a management problem. Drift in agent organizations is a reliability and governance problem that compounds at machine speed.
Context cross-contamination. In multi-tenant or multi-user agent deployments, context from one user's session or one team's workflow can bleed into another's. An agent that handled a sensitive restructuring query for one executive should not carry implicit priors about that team's authority structure into the next user's completely unrelated request. This is a well-known technical problem in shared-context systems, but it has an organizational dimension that is often missed: the contamination is not just data leaking, it is organizational assumptions leaking.
Stale authority maps. The most common failure. An agent was trained or configured when Sarah was the approver for vendor contracts. Sarah was promoted six months ago. The agent still routes to Sarah. Sarah's approval queue is a graveyard of stalled requests that no human would have let pile up because everyone informally knows to go to David now. AI does not know that unless the org context layer is continuously updated.
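One way to avoid the stale-map failure is to weight routing evidence by recency, so old approvals fade as new ones accumulate. A minimal sketch, assuming an exponential half-life; the names, numbers, and scoring rule are illustrative, not a production algorithm:

```python
import math

def routing_target(events, half_life_days=30.0):
    """Pick the approver with the highest recency-weighted event count.

    Each event is (person, days_ago). With a 30-day half-life, Sarah's
    six-month-old approvals decay to near zero while David's recent ones
    dominate, so the authority map tracks the org as it changes.
    """
    scores = {}
    for person, days_ago in events:
        weight = math.exp(-math.log(2) * days_ago / half_life_days)
        scores[person] = scores.get(person, 0.0) + weight
    return max(scores, key=scores.get)

events = [("sarah", 200), ("sarah", 190), ("sarah", 185), ("david", 10), ("david", 3)]
```

A static configuration, by contrast, is a snapshot: it is exactly right on the day it is written and drifts every day after.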
None of these are novel human problems. Organizations have dealt with outdated process documentation, siloed knowledge, misrouted escalations, and authority confusion for as long as organizations have existed. What AI does is accelerate all of it. A human making a wrong routing decision makes it once, gets corrected, adjusts. An agent making the same wrong routing decision makes it ten thousand times before anyone notices the pattern in the logs.
Why organizational behavior science matters for enterprise AI
The informal organization, meaning the trust networks, influence patterns, and real decision paths that sit beneath the formal org chart, has been documented and studied for decades. It does not appear in job descriptions or org charts. It emerges from repeated interaction, shared experience, and social trust. This is the core insight behind Organizational Network Analysis (ONA), a field with roots in the 1970s–80s. Organizations that ignore the informal network make systematically wrong decisions. AI systems that cannot see it make those wrong decisions at scale.
There is a body of knowledge that has spent fifty years studying exactly this problem: how authority flows, how trust forms, how information actually moves through groups, why some people become informal hubs regardless of their title, and what happens when organizations change faster than people's mental models of them. It goes by several names: organizational behavior, organizational network analysis, knowledge management, and change management. It produced rigorous methods for measuring and mapping the informal org. The three-tier model of organizational data (structural, transactional, behavioral) is one way to frame what these fields have contributed; The Layer Every Enterprise AI Platform Is Missing unpacks that framework and explains why Tier 3 is the layer every AI category keeps hitting the wall on.
That field exists for a reason. Every large organization has discovered, usually the hard way, that the formal structure and the operational reality are two different systems running in parallel. Decisions that optimize for the formal structure and ignore the informal one tend to fail. The org chart tells you the reporting lines. The behavioral graph tells you how things actually get done.
Enterprise AI is now rediscovering this lesson at scale. The difference is that the tools to observe and map organizational behavior have improved dramatically, and the downstream system that needs this context, the AI agent stack, has a much lower tolerance for ambiguity than a human workforce does. A human new hire takes three months to build an adequate mental model of how the org really works. An AI agent has no onboarding period. It acts immediately, on whatever context it has.
This is where the intersection of organizational science and AI engineering becomes practically important, not as an academic curiosity but as an infrastructure problem. People who have studied both how organizations actually behave and how AI systems are built are in a position to do something that neither discipline can do alone: teach enterprise AI to understand beyond documents. Not just what was written down, but how the organization actually moves.
The documents tell you what was decided. The behavioral graph tells you how decisions actually get made, who they go through, and whose judgment counts. Enterprise AI needs both.
ONA established that authority maps and influence patterns can be derived from observed communication behavior, not from org charts or job titles. BehaviorGraph applies that same logic to the agent stack: infer who actually owns and routes decisions by watching how work moves, then surface that as queryable context at runtime. Here is what that looks like in practice.
What org-level context contains
BehaviorGraph is a dynamic organizational knowledge and behavior graph: a continuously updated map of real routing behavior, trust relationships, authority patterns, and escalation paths across an enterprise. It operates on behavioral metadata only: collaboration patterns, calendar signals, approval sequences, response latency, workflow timestamps. No message content, no document text.
Two caveats worth stating clearly. First, behavioral metadata still carries organizational sensitivity; the metadata-only scope reduces content exposure, not governance requirements. Second, the graph is probabilistic: it infers operational reality from observed signals, not ground truth. Routing frequency is directional evidence, not certification of who should own a decision. What it offers is continuously updated organizational inference that is more useful to an agent than stale system metadata, and more honest than treating the org chart as current.
The signals it ingests are the behavioral metadata described above: collaboration patterns, calendar signals, approval sequences, response latency, and workflow timestamps. From those signals, the graph learns authority maps, trust relationships, and real routing paths.
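A hedged sketch of what that inference could look like mechanically: fold metadata-only events into a weighted edge map, then query it for the operationally likely approver. All field names and the scoring rule are illustrative assumptions, not BehaviorGraph's actual schema.

```python
from collections import defaultdict

def build_graph(signals):
    """Fold behavioral metadata events into weighted edges between people.

    Each signal carries metadata only, e.g.
    {"kind": "approval", "from": "requester", "to": "approver", "latency_s": 3600}
    No message content or document text is ever ingested.
    """
    edges = defaultdict(lambda: {"count": 0, "total_latency_s": 0})
    for s in signals:
        e = edges[(s["from"], s["to"])]
        e["count"] += 1
        e["total_latency_s"] += s.get("latency_s", 0)
    return edges

def likely_approver(edges, requester):
    """Infer the operational approver: the requester's highest-frequency edge."""
    candidates = {to: e["count"] for (frm, to), e in edges.items() if frm == requester}
    return max(candidates, key=candidates.get) if candidates else None
```

Because the output is probabilistic, a production version would return confidence alongside each answer rather than a bare name, consistent with the caveats above.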
Where org context fits in the stack
A harness is the always-on execution layer around a model. BehaviorGraph adds a different kind of context: organizational reality at runtime.
The relationship between BehaviorGraph and the harness is not competitive. The harness runs the agent. BehaviorGraph answers the questions the harness cannot answer on its own at the moment of action: who owns this decision in practice, who is trusted to approve it, what path is shortest and legitimate, when should the agent stop and defer.
Each layer still does its job. MCP standardizes access to tools and data sources. RAG retrieves content. The harness (everything around the model that is always on) manages execution. The model reasons. Org context tells the agent what to do with all of that inside a real organization.
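In the agent's loop, that division of labor can be sketched as a pre-action hook: before routing, the harness queries the org-context layer, and below a confidence threshold the agent defers instead of guessing. `org_context.resolve` is a hypothetical runtime call (e.g. over REST or MCP), not a published API.

```python
def route_task(task, org_context, fallback_owner, min_confidence=0.5):
    """Ask the org-context layer who owns this decision in practice.

    `org_context.resolve(decision_type)` is assumed to return a
    (candidate_owner, confidence) pair inferred from behavioral signals.
    """
    owner, confidence = org_context.resolve(task["decision_type"])
    if confidence < min_confidence:
        # Low confidence: stop and defer to a human rather than misroute.
        return {"action": "defer", "to": fallback_owner}
    return {"action": "route", "to": owner}
```

The defer branch matters as much as the route branch: "stop and ask" is the behavior that distinguishes an agent with honest org context from one that confidently misroutes.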
The failure mode when org context is missing
With a clear picture of what each layer does, it becomes easy to see what goes wrong when the org context layer is absent.
Deloitte's State of AI in the Enterprise survey finds that just 34% of organizations are truly reimagining their business with AI. Its findings suggest the main bottlenecks are less about raw model quality and more about scaling, governance, skills, and operational integration, including the challenge of connecting AI systems to how work actually moves through the company.
Enterprise AI failures rarely trace to a single layer. Models reason poorly on edge cases, tools fail, retrievals miss, evals are thin. But a recurring pattern in enterprise deployments is context-poor execution about the organization itself. The agent retrieved the right document, identified the formal owner from system metadata, routed the task to that person, and the task stalled. The actual approver was different in practice. The trusted deputy was not in any system. The owner had quietly changed roles. We mapped this failure pattern in detail, including the six specific forms it takes across enterprise deployments, in A Large Behavior Model for AI Governance and Agent Orchestration Platforms. If the routing failure is specifically about RAG pipelines returning the right document but the wrong person, Your AI Agents Route to the Wrong Person goes deeper on that layer.
How org context stays current
BehaviorGraph's org context layer stays current the same way any well-designed context layer does: two update modes, different signal sources.
"Rigorous and systematic evaluation is the most important part of the whole system. A failure to create robust evaluation systems is the common root cause of unsuccessful LLM products... Success with AI hinges on how fast you can iterate."
Behavioral signals from collaboration patterns, approval sequences, and workflow metadata arrive continuously. The graph is refreshed in batch to update authority maps, trust scores, and routing paths. The result is context that reflects how the organization actually works right now, not how the org chart said it worked six months ago, and not what a user typed into a CLAUDE.md file.
This is the distinction that makes org-level context different from user-level context. User context is explicit and authored. Org context is observed and inferred: it is the accumulated shape of how the organization actually behaves.
Add org-level context to your agent stack
BehaviorGraph is designed to integrate with agent stacks through REST, MCP, or retrieval enrichment. It ingests no message content or document bodies; the metadata-only design reduces content exposure.
Talk to us →

References

- Chase, H. (2026). Continual learning in agentic systems: model, harness, and context layers. X / Twitter thread. Harrison Chase is Co-Founder & CEO of LangChain, the company behind the LangChain and LangGraph agent frameworks.
- Schluntz, E. & Zhang, B. (2024, December). Building Effective Agents. Anthropic Engineering Blog. Schluntz and Zhang are Members of Technical Staff at Anthropic; Schluntz leads work on tool use and computer use for Claude.
- Karpathy, A. (2025, June 25). On context engineering. X / Twitter. x.com/karpathy. Karpathy is Founder of Eureka Labs; former founding team at OpenAI and former Director of AI at Tesla.
- Willison, S. (2025, June 27). Context Engineering. simonwillison.net. Willison is co-creator of the Django web framework and creator of Datasette.
- Lee, Y., Nair, R., Zhang, Q., Lee, K., Khattab, O., & Finn, C. (2026). Meta-Harness: End-to-End Optimization of Model Harnesses. arXiv:2603.28052. Omar Khattab is an incoming Assistant Professor at MIT and creator of DSPy; Chelsea Finn is Associate Professor at Stanford and co-founder of Physical Intelligence.
- Husain, H. (2024). Your AI Product Needs Evals. hamel.dev. Husain is an independent consultant at Parlance Labs specializing in LLM evaluation and operationalization.
- Deloitte. (2026). State of AI in the Enterprise. Survey conducted August–September 2025. Deloitte US.