The Agent Training Substrate

Pipeline #3 — The Synthetic Task-to-Action Factory

The meeting that is about to keep happening

Your team built the agent demo. It worked. It answered the expense question, drafted the purchase order, pulled the data from three systems, and showed leadership what the future looked like. The room was impressed. Somebody said the word "transformational." Budget was discussed.

Then came the follow-up meeting. The one where leadership asked the real questions.

Which jobs in our company can this agent actually do? Not demo — do, in production, across the full surface of the role, without breaking when the task varies. What are the tasks that make up each job, and how do we know the decomposition is right? For each task, which actions should the agent take autonomously, which need human approval, and which are advisory-only? When the task involves five upstream systems and three downstream approvals, where does the agent fit in the sequence, and what does it actually call? And how do we prove to governance — audit, security, compliance — that we have thought through all of this with rigor?

These are not modeling questions. They are not infrastructure questions. They are structural questions about how real work decomposes into real tasks and real actions, and how an agent fits into that decomposition responsibly.

Almost no team building agents in 2026 has good answers to these questions. The reason is simple: the structured training data that would answer them — the Job-to-Task graphs, the execution schemas, the MCP tool definitions, the automation scorecards — does not exist in the form enterprises need it. Every serious agent team is building this substrate from scratch, in every vertical, for every role, at enormous cost and with wildly uneven quality.

This is the agent training substrate problem. It is the bottleneck the agentic AI wave is about to crash into, and it is not going to be solved by better models. It is going to be solved by better data — specifically, by structured, governance-grade Job-to-Task graphs shipped as productized SKUs.

Pipeline #3 at XpertSystems.ai is that substrate.

Why the agent wave is structurally different from the model wave

The last decade of enterprise AI was a modeling story. Better architectures, better embeddings, better fine-tuning, better retrieval. The training data question was important but manageable — ship a tabular dataset, train a classifier, move on. Pipeline #1 solved that problem at industrial scale.

The agent wave is a different story. Agents do not classify. They act. And action inside an enterprise requires something the prior AI waves did not: a structural model of what the work actually is.

Consider the difference. Training a fraud-detection classifier requires a dataset of transactions with fraud labels. Training an AP-automation agent requires the structural decomposition of the entire accounts-payable role — the dozens of tasks that make up the job, the sequence of actions within each task, the upstream data dependencies, the downstream approval chains, the governance constraints on each action, the realistic failure modes, and the full surface of tools and systems the role touches in a given week.

The classifier needs data about transactions. The agent needs data about the shape of work. These are fundamentally different data shapes, and the second one is almost entirely unproductized.

This is why every serious agent team — whether inside a hyperscaler, an agent-platform startup, or an enterprise AI organization — is spending an enormous fraction of its engineering capacity building Job-to-Task decompositions internally. Deloitte's AI practice is doing it. LangChain's customers are doing it. Every F500 building internal agents is doing it. And every single one of them is doing it in isolation, from scratch, with uneven methodology and no shared ontology.

The agentic AI wave, in other words, is running on the same data problem that the modeling wave ran on a decade ago — before productized datasets existed. It is waiting for a factory.

What the Synthetic Task-to-Action Factory actually ships

Pipeline #3 is the most ambitious of the three pipelines, and structurally the most important. It is a catalog of Job-to-Task-to-Action SKUs: one-hundred-twenty-plus productized decompositions across twelve functional verticals, each shipped with the complete training and evaluation substrate an agent builder needs to train, evaluate, and govern an agent for that role.

Every Pipeline #3 SKU ships a six-file package. The package is designed to be drop-in for agent training and drop-in for governance review — because the same artifact has to satisfy both audiences, which is part of why the data shape has been so hard for internal teams to produce.

1. The job definition. One named role — AP clerk, recruiter, SRE, medical coder, compliance analyst — with a complete specification of scope, inputs, outputs, success criteria, and governance constraints. This is the document that grounds everything downstream. A job definition is not a job description. It is a structured specification that answers: what does this role own, what does it produce, what constitutes successful performance, and what constraints bind the autonomous execution of this role?

2. The task graph. A directed acyclic graph of tasks that compose the job. Tasks are structured with dependencies (task B cannot start until task A completes), branching logic (if invoice matches three-way, proceed; otherwise route to exception handling), failure modes (what breaks, how often, how the failure propagates), and frequency distributions (this task happens daily, this one weekly, this one monthly with a quarterly spike). A real AP clerk job decomposes into fifty to eighty tasks, not five. The graph reflects that reality.
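To make the data shape concrete, here is a minimal sketch using Python's standard-library graphlib. The task names and the fragment of AP structure are illustrative assumptions, not the actual SKU schema:

```python
from graphlib import TopologicalSorter

# Hypothetical fragment of an AP-clerk task graph: each task maps to the
# set of upstream tasks it depends on. Task names are illustrative.
task_graph = {
    "receive_invoice": set(),
    "extract_invoice_fields": {"receive_invoice"},
    "three_way_match": {"extract_invoice_fields"},
    "route_exception": {"three_way_match"},   # branch: match failed
    "schedule_payment": {"three_way_match"},  # branch: match succeeded
    "post_to_ledger": {"schedule_payment"},
}

# A topological sort yields a valid execution order and, as a side effect,
# proves the graph is acyclic (graphlib raises CycleError otherwise).
order = list(TopologicalSorter(task_graph).static_order())
print(order[0])  # receive_invoice, the only task with no upstream dependency
```

The acyclicity check is the point: a graph that fails it cannot be compiled into an agent execution plan.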

3. The execution schema. For every task in the graph, a formal specification of input schema, output schema, preconditions, postconditions, and expected latency bands. This is the file that makes the task graph machine-actionable — an agent training system can compile against it, an evaluation harness can benchmark against it, and a governance review can audit against it. Execution schemas are written to a consistent JSON specification across every SKU, which means an agent platform consuming twenty Pipeline #3 SKUs does not have to write a different parser for each one.
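A minimal sketch of what "machine-actionable" means in practice, with hypothetical keys and types rather than the actual SKU JSON specification:

```python
# Hypothetical execution-schema fragment for one task. The shipped SKUs use
# a consistent JSON specification; the fields here are illustrative only.
three_way_match_schema = {
    "task_id": "three_way_match",
    "input_schema": {"invoice_id": str, "po_id": str, "receipt_id": str},
    "output_schema": {"matched": bool, "variance_pct": float},
    "preconditions": ["invoice_fields_extracted"],
    "postconditions": ["match_result_recorded"],
    "latency_band_ms": (200, 2000),
}

def validate_input(schema: dict, payload: dict) -> bool:
    """Check that a payload carries exactly the declared fields, correctly typed."""
    expected = schema["input_schema"]
    return set(payload) == set(expected) and all(
        isinstance(payload[key], typ) for key, typ in expected.items()
    )

ok = validate_input(
    three_way_match_schema,
    {"invoice_id": "INV-91", "po_id": "PO-12", "receipt_id": "R-7"},
)
print(ok)  # True
```

The same validator runs unchanged against every task, which is what a consistent schema buys a platform consuming many SKUs.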

4. The MCP tool definitions. For every action the role takes against an external system, a canonical Model Context Protocol tool specification — parameter types, auth scopes, expected side-effects, error taxonomy, rate-limit semantics. This is the file that lets agent builders wire the role's action surface into their runtime with minimal integration engineering. A recruiter SKU ships with MCP tool definitions for ATS actions, calendar actions, email actions, and CRM actions. The tool definitions are consistent with the emerging MCP ecosystem and designed to plug directly into Anthropic's MCP, LangChain's tool interface, and native agent runtime patterns.
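As a rough sketch of the shape: the outer structure below (name, description, JSON-Schema "inputSchema") follows the MCP tool convention, while the specific tool and parameters are hypothetical, not taken from a real SKU:

```python
# A hypothetical MCP-style tool definition for one recruiter action.
# The tool name and parameter fields are illustrative placeholders.
schedule_interview_tool = {
    "name": "ats_schedule_interview",
    "description": "Create an interview event in the ATS and invite the panel.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "candidate_id": {"type": "string"},
            "panel_emails": {"type": "array", "items": {"type": "string"}},
            "start_iso": {"type": "string", "description": "RFC 3339 start time"},
        },
        "required": ["candidate_id", "panel_emails", "start_iso"],
    },
}

print(schedule_interview_tool["name"])  # ats_schedule_interview
```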

5. The automation scorecard. Per-task automation-feasibility scoring — full-auto, human-in-the-loop, or advisory-only — with explicit rationale for each score. This is the governance artifact. It answers, for every task in the job, the question every audit committee will eventually ask: which of these tasks is the agent permitted to execute autonomously, which require human approval, and which should the agent only advise on? The scorecard is grounded to the governance constraints in the job definition and to the risk profile of each action.
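One way a scorecard becomes a runtime gate can be sketched as follows; the tasks, levels, and rationales here are illustrative assumptions:

```python
from enum import Enum

class AutomationLevel(Enum):
    FULL_AUTO = "full_auto"
    HUMAN_IN_LOOP = "human_in_the_loop"
    ADVISORY_ONLY = "advisory_only"

# Hypothetical scorecard fragment: per-task level plus the rationale a
# governance review audits against.
scorecard = {
    "extract_invoice_fields": (AutomationLevel.FULL_AUTO, "read-only, reversible"),
    "schedule_payment": (AutomationLevel.HUMAN_IN_LOOP, "moves money; approval required"),
    "vendor_dispute_response": (AutomationLevel.ADVISORY_ONLY, "external legal exposure"),
}

def may_execute(task: str, human_approved: bool = False) -> bool:
    """Runtime gate: an agent may act only within the task's scored level."""
    level, _rationale = scorecard[task]
    if level is AutomationLevel.FULL_AUTO:
        return True
    if level is AutomationLevel.HUMAN_IN_LOOP:
        return human_approved
    return False  # advisory-only tasks are never executed by the agent

print(may_execute("schedule_payment"))  # False until a human approves
```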

6. The dependency map. Upstream data dependencies (what systems feed this role), downstream system touchpoints (what systems this role writes to), and governance escalation paths (which exceptions route to which human roles). This is the file that situates the agent inside the broader enterprise architecture, and it is the file that lets integration teams plan the deployment rather than discovering the dependencies in production.
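A minimal illustrative fragment of the shape; the system names and escalation targets are placeholders, not actual SKU content:

```python
# Hypothetical dependency-map fragment for the AP-clerk role.
dependency_map = {
    "upstream": ["ERP purchase orders", "procurement receipts", "vendor master"],
    "downstream": ["general ledger", "payment rails", "vendor portal"],
    "escalations": {
        "three_way_match_variance_gt_5pct": "AP supervisor",
        "suspected_duplicate_invoice": "controller",
        "vendor_bank_detail_change": "treasury",
    },
}

def escalation_target(exception_code: str) -> str:
    """Resolve which human role an exception routes to; default to supervisor."""
    return dependency_map["escalations"].get(exception_code, "AP supervisor")

print(escalation_target("suspected_duplicate_invoice"))  # controller
```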

Six files. Every SKU. Every role. Every vertical.

The twelve functional verticals

Pipeline #3 is organized by functional vertical rather than industry vertical, because the shape of work at the role level is far more consistent across industries than across roles within a single industry. A senior SRE at a bank and a senior SRE at a retailer share more structural DNA than a senior SRE and a senior AP clerk at the same bank. The taxonomy reflects that reality.

Operations. Ops analyst, supply-chain planner, vendor-ops lead, facilities manager. Ten target SKUs.

Finance and accounting. AP clerk, AR analyst, FP&A analyst, controller, tax analyst. Twelve target SKUs. This vertical is particularly important — it has the clearest ROI story, the clearest governance requirements, and the clearest path to measurable enterprise deployment.

Data and analytics. Analytics engineer, data engineer, BI analyst, ML engineer. Ten target SKUs. Roles here are both targets of agent automation and builders of the agents themselves, which creates interesting substrate dynamics.

Engineering. Backend software engineer, SRE, QA engineer, release manager, platform engineer. Twelve target SKUs. The engineering vertical has the highest density of tool-intensive roles and the most complex execution schemas.

HR and people. Recruiter, HRBP, compensation analyst, L&D specialist, onboarding operations. Ten target SKUs. HR is the vertical most actively courted by HR-tech platforms looking for substrate to embed into talent, reskilling, and compensation products.

Sales and marketing. SDR, AE, CSM, marketing operations, content operations. Twelve target SKUs. The revenue-facing vertical, with the highest density of customer-interacting tasks and the clearest integration surface to existing GTM infrastructure.

Healthcare operations. Medical coder, prior authorization specialist, claims analyst, scheduler. Ten target SKUs. High-governance vertical with explicit regulatory grounding in every automation scorecard.

Retail operations, manufacturing operations, legal operations, IT operations, and executive and strategy round out the catalog at eight-to-ten SKUs each. Each vertical has a defined ontology, a defined governance model, and a defined set of integration targets.

One-hundred-twenty target SKUs. One governance-grade ontology. One consistent six-file package. Every SKU plugs into the same agent training and evaluation stack.

Pricing — and the three-tier strategy

Pipeline #3 pricing reflects three distinct go-to-market motions, each priced to match a specific buyer and a specific deployment pattern.

The single-job SKU is the atomic purchase — an agent platform company buying an HR recruiter SKU to bootstrap their HR vertical, or an enterprise AI team piloting a single role before committing to a broader deployment. At $8K–$15K per SKU, the purchase decision is manager-level, the evaluation cycle is two-to-four weeks, and the procurement path is light.

The vertical bundle is where the economics get interesting. A buyer who needs to cover all of finance-and-accounting — AP, AR, FP&A, controller, tax — pays $60K–$112K for the entire vertical rather than $96K–$180K for the individual SKUs. The discount is structural, not negotiated: a vertical bundle ships with a shared ontology and cross-role dependency graph, which a stack of individual SKUs does not. Consultancies and HR-tech platforms are the dominant buyers at this tier, because they are productizing agent capability across a full job family and need the structural coherence of a bundle to do so.

The Enterprise Workforce Twin is the strategic product. One buyer, all twelve verticals, unified governance, quarterly refresh. The deal size is $100K–$250K. The buyer is always an F500 enterprise AI team that is building the structural model of their own workforce to ground their internal agent rollout. Workforce Twin deals are the anchor — they define the Pipeline #3 narrative, they generate the reference logos, and they establish the precedent for governance-grade decomposition at enterprise scale.

The reference-agent add-on is deliberate and constrained. It is a thin, open-source agent scaffold that consumes the SKU's six files and demonstrates the substrate in action. It is not a production agent. It is not maintained as a product. It exists as a sales-enablement artifact — proof that the substrate works, that the execution schemas compile, that the MCP tool definitions wire up cleanly. It is priced to cover the engineering effort and nothing more, because the product is the substrate, not the agent.

The four buyer profiles — and why the motion works across all four

Pipeline #3 is the only pipeline in the XpertSystems.ai platform with four distinct buyer profiles that each engage at different tiers. The economics work across all four because the underlying data shape is invariant — what changes is how the buyer integrates it.

AI agent platform companies. Teams building horizontal agent runtimes — LangChain, CrewAI, the emerging agent-platform category — need task-graph training data to bootstrap their vertical agent offerings. They typically buy at the vertical-bundle tier, because they need the full job family to ship a credible vertical agent suite. Typical deal: $60K–$112K per bundle, often repeated across three-to-five verticals in a twelve-month window.

Workforce-automation consultancies. Deloitte's AI practice, Accenture's agent teams, boutique automation consultancies. These firms resell task-graph decompositions to their enterprise clients, usually with white-label rights and custom overlay. They buy at the white-label vertical bundle or full Workforce Twin tier, with pricing at the high end of the published bands because white-label carries a premium. Typical deal: $100K–$250K, with annual renewal and expansion into additional verticals.

HR-tech and future-of-work platforms. Platforms building talent, reskilling, or compensation products need task graphs embedded into their data model to support role-level automation analysis, skill-to-task mapping, and compensation-to-complexity calibration. They buy at the vertical bundle tier, typically starting with the HR vertical and expanding to adjacent verticals as their platform feature surface grows. Typical deal: $60K–$112K, with strong renewal economics.

Enterprise AI teams at F500 companies. The flagship buyer. These teams are building internal agents against their own workforce, and they need governance-grade task graphs to satisfy audit, compliance, and risk review. They buy the full Enterprise Workforce Twin. Typical deal: $100K–$250K, with the deal structure typically involving a pilot vertical first (often finance-and-accounting), followed by expansion to the full twelve-vertical Workforce Twin over twelve-to-eighteen months.

The motion is horizontal across these four buyer types because the substrate is invariant. One HR-recruiter SKU sells to an agent platform, to a consultancy, to an HR-tech company, and to an enterprise AI team. This is the single most important economic property of Pipeline #3 — the data is the product, the product is reusable across the full stack of agent buyers, and the engineering investment compounds across every sale.

What we sell — and what we explicitly do not sell

The most important clarification in the Pipeline #3 positioning is what the product is not. We sell the substrate. We do not operate the agents.

Pipeline #3 ships Job-to-Task graphs, execution schemas, MCP tool definitions, and automation scorecards. These are the training and evaluation inputs for someone else's agent runtime. This is a data business — the same engineering discipline, the same validation rigor, and the same margin profile as Pipelines #1 and #2.

Building the agents that execute in a customer's Salesforce, NetSuite, Jira, or Workday instance is a different company. It requires a per-customer authentication surface, customer-specific evaluation suites, long copilot-mode ramps before autonomous execution is approved, and an operational liability profile that is fundamentally incompatible with a factory model. The moment your agent sends the wrong email or creates the wrong purchase order, you are on the hook in a way that a data business is not. That liability profile is a legitimate business — Decagon, Sierra, Harvey, and Cresta are all building serious companies in that space — but it is not the business that sits inside Pipeline #3.

Selling the substrate keeps our motion horizontal. One HR task-graph SKU sells to every agent platform and every HR-tech company in the market. Selling executing agents would force a vertical, one-customer-at-a-time motion that would dilute both the factory velocity and the margin profile that Pipeline #3 is built on.

This is a deliberate strategic boundary. We understand the higher ceiling of the executing-agent market — vertical agent SaaS clears $50K–$500K ARR per deployment, and Workforce Twin agent platforms clear seven figures — and we have modeled it carefully. The path we have chosen instead is the substrate path: ship the training and evaluation data that every agent in the market needs, to every team building them, without taking on the operational complexity that would compromise the factory. If we pursue executing agents, it will be as a deliberate, bounded exception in exactly one vertical — most likely AP and invoice verification, where the ERP-grade SKUs in our existing catalog create a natural adjacency — and it will be structured as a separate product line with a separate economics profile.

For now, the product is the substrate. The substrate is the product.

What this means for your 2026 agent strategy

If you are an AI leader, a CTO, or a head of automation planning against the agentic AI wave in 2026, there are three claims in this article that should shape your planning.

First, the agent wave will be bottlenecked on task-graph data, not on models. The models are good enough — Claude, GPT, and the next generation will be more than good enough for production agent deployment. The gap between demo agents and production agents is not in the model. It is in the structural substrate that tells the agent what the work is, how it decomposes, which actions are authorized, and how to fit into the governance fabric of the enterprise. Teams that underestimate this bottleneck will spend 2026 and 2027 rebuilding it from scratch, vertical by vertical, at enormous internal cost.

Second, buying the substrate is structurally cheaper than building it. A full vertical bundle at $60K–$112K replaces a six-to-nine month internal engineering effort that, at loaded cost, runs into the seven figures. An Enterprise Workforce Twin at $100K–$250K replaces a two-to-three year internal effort to build the structural model of your own workforce. The math favors the buy decision by an order of magnitude, and that math only gets more extreme as the number of verticals scales.
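The order-of-magnitude claim can be sanity-checked with back-of-envelope numbers. The team size, duration, and loaded cost below are assumptions for illustration, not published figures:

```python
# Back-of-envelope build cost under stated assumptions: an 8-person team at
# a $250K loaded annual cost per engineer, spending 9 months building one
# vertical's substrate internally. These inputs are assumptions, not data.
engineers = 8
loaded_cost_per_year = 250_000
months = 9

build_cost = engineers * loaded_cost_per_year * (months / 12)
buy_cost_high = 112_000  # top of the published vertical-bundle band

print(f"build ~ ${build_cost:,.0f} vs buy <= ${buy_cost_high:,.0f}")
print(f"ratio ~ {build_cost / buy_cost_high:.1f}x")
```

Under these assumptions the internal build lands around $1.5M against a $112K bundle ceiling, roughly a 13x gap, and the gap widens with every additional vertical.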

Third, the substrate is a long-term compounding asset. Unlike a model, which is replaced every generation, a well-built Job-to-Task graph is stable — the shape of accounts-payable work changes slowly, the structure of engineering roles changes slowly, the decomposition of a medical coder's job changes slowly. An organization that invests in the substrate early builds an asset that compounds across every subsequent agent rollout, every governance review, every workforce-planning exercise, and every automation-to-role mapping their strategy team will run for the next decade.

The Synthetic Task-to-Action Factory is our strategic bet on the agentic AI wave. It is the training substrate that the next generation of enterprise agents will be built on, and it is the data layer that will determine which enterprises ship agents credibly and which ones stall at the demo-to-production transition. We built Pipeline #3 to be that substrate — governance-grade, horizontally reusable, priced to compound — so that when your team reaches the moment where the demo is done and the real questions begin, the substrate is already on the shelf.

The three-pipeline portfolio

This article closes a three-part series covering the complete XpertSystems.ai Synthetic Data Platform.

Pipeline #1 — the Synthetic Data Factory — is the volume engine. One-hundred-plus tabular ML training SKUs across twelve verticals, priced $2.5K–$175K, shipped weekly. The factory you use when your next ML model is waiting on data.

Pipeline #2 — the Synthetic Knowledge Base Factory — is the enterprise anchor. Five-to-ten high-depth conversational and knowledge-retrieval SKUs, priced $150K–$250K, shipped quarterly. The substrate you use when your enterprise RAG system needs an evaluation corpus that survives the security review.

Pipeline #3 — the Synthetic Task-to-Action Factory — is the strategic bet. One-hundred-twenty-plus Job-to-Task-to-Action SKUs across twelve functional verticals, priced $8K–$250K+, shipped monthly. The substrate you use when your team stops building demos and starts shipping agents into production.

Three pipelines. One platform. One consistent engineering discipline, one consistent validation rigor, one consistent brand of institutional-grade synthetic data — shipped as productized SKUs, at catalog velocity, across every data shape the modern AI stack requires.

XpertSystems.ai ships the Synthetic Data Platform — three pipelines of institutional-grade synthetic data across twelve verticals. To explore Pipeline #3, discuss a specific agent use case, or review a Workforce Twin architecture for your enterprise, reach us at pradeep@xpertsystems.ai or visit xpertsystems.ai.
