Overview
This page follows the live model from role construction to final forecast. It starts with how the role is built, then explains how task pressure is scored, and ends with how those signals become the structural-state read shown on the page.
At a high level, the model works in eight steps.
- Resolve the occupation. We map your role to an O*NET occupation and load the default role data.
- Build the role. We pull your tasks (from O*NET data and public job postings), functions, task-to-function links, dependency edges, and allow you to choose a role variant baseline.
- Apply user edits. You can add or remove tasks, change the variant, edit function anchors, and answer a questionnaire so the model can capture your role makeup more accurately.
- Resolve task evidence. The model decides how much direct evidence exists for each task and how much it should trust that evidence.
- Score pressure and retention. It scores task difficulty, direct pressure, dependency spillover, retained leverage, and a timing frontier for current, next, and distant scenarios.
- Build the trajectory. It converts those diagnostics into execution compression P(s), continuous compression growth P(t), a rate-of-change read dP/dt, continuous demand response D(t), structural necessity S, role viability L(t), a graph-ready timeline export, and threshold timing ranges.
- Build the structural state layer. It reuses the same scored role to estimate dimensionality, bottleneck fragility, retained-core lift, demand offset, firm incentive to finish automation, and hierarchy persistence, then turns those into a continuous role-integrity timeline, a scenario band, state runs, and structural checkpoints at years 0, 2, and 5.
- Classify the role. It combines those task and function signals into:
  - a structural-state read (retained, complemented, compressed, rebundled, and displaced) and the trajectory for those states
  - threshold timing ranges
  - per-function trajectory contribution groups
  - an accession map of shrinking vs. growing work bundles
  - a transition-trigger diagram
  - a separate recomposition estimate
The input tuple is roughly (tasks, functions, dependency graph).
In shorthand: tasks describe how the work gets done, functions describe what the role exists to own, and the graph describes how pressure moves between tasks.
Key terms
- Task: a concrete unit of work, such as drafting a client memo, reconciling accounts, triaging a support request, or reviewing a contract clause.
- Function: the human responsibility the role exists to own, such as client stewardship, compliance judgment, technical direction, or issue resolution.
- Role variant: a reviewed starting baseline for occupations that cover different work mixes under one title (e.g., audit vs. financial reporting accountant).
- Prior: the model's starting estimate before it sees strong task-specific evidence.
- Shrinkage: the rule that blends a noisy observation toward a broader default (to avoid trusting thin evidence too strongly).
- Spillover: pressure on a task because connected tasks become cheaper, smaller, or less necessary.
- Recomposition: the separate question of how much exposed work is likely to turn into workflow redesign or fewer labor hours.
The model is intentionally layered to reflect competing states: a task can be exposed while its function survives, and a function can survive while the role still compresses.
Data sources by model stage
Stages 1 through 4 are part of the core scoring path, while stages 5 and 6 are outside checks: they help test whether that scoring story looks plausible, but they do not directly score tasks inside the main engine.
1. Occupation matching
- Feeds: O*NET occupation taxonomy, aliases, and work descriptors.
- Primary source: O*NET Database v30.1.
- Why it is used: it is the public backbone that tells the model what occupation the user most closely matches.
2. Role builder
- Feeds: baseline task inventory, reviewed task additions, dependency edges, function anchors, task-to-function links, and reviewed role variants.
- Primary sources: O*NET tasks, reviewed job-description expansions, reviewed role-graph expansions, reviewed function-accountability priors, and reviewed role variants.
- Why they are used: O*NET gives the starting structure, but the reviewed layers repair places where O*NET lacks specificity or plausible empirical coverage.
3. Evidence hierarchy
- Feeds: task-level AI evidence, reviewed task estimates, benchmark task labels, task-family defaults, and occupation-level defaults.
- Primary sources: the Anthropic Economic Index (2026-03-24 release; the main task-level empirical source, normalized from Claude.ai and first-party API task telemetry with O*NET task classification, with the 2026-01-15 release retained as backfill where the later release does not map cleanly) and GPTs are GPTs (Eloundou et al., 2023), both of which cover the 61-occupation launch library; reviewed task estimates; task-family prior tables; and occupation prior tables.
- Why they are used: the model should use the most specific evidence it has for the exact task row, but it still needs carefully bounded defaults when direct evidence is thin or ambiguous.
- What they do not do: benchmark task labels and defaults are fallback structure; they stand in for empirical evidence rather than supplying it.
4. Runtime labor and adoption context
- Feeds: outer demand, labor tightness, adoption realization, and recomposition context.
- Primary sources: BLS OEWS (May 2024, employment counts and median wages), BLS Occupational Projections (2024-2034, projected employment growth and openings), BLS CPS (monthly unemployment series by occupation group), and Census BTOS-derived occupation context built through ACS PUMS sector mix.
- Why they are used: they help answer how quickly technical pressure is likely to turn into workflow change and labor-market change.
- What they do not do: they do not directly tell the model whether a task is automatable.
5. External calibration checks
- Feeds: accountability checks, fragmentation checks, and adoption realism checks.
- Primary sources: BLS ORS (2025 preliminary with 2023 backstop), ACS PUMS (2024 via Census API), and Census BTOS.
- Why they are used: they test whether the model's structural story looks plausible against outside data. ORS pressure-tests accountability and autonomy guardrails, while ACS checks whether the model is treating an occupation as too uniform or too fragmented relative to observed wage, education, industry, and worker-mix spread. BTOS helps test whether the model is turning technical pressure into organizational change too aggressively or too mildly. The calibration stack also uses observed individual AI usage as a review-only tempering signal when sector BTOS adoption is clearly outrunning worker-level uptake in an occupation, so journalism-style sector overhang does not become a recomposition miss. In that review layer,
individual_higher gaps are treated as more actionable than review-flagged org_higher gaps, because the latter more often reflect enterprise rollout outrunning worker-level usage. The same review-routing logic also downweights known wage-leverage floor/ceiling cases when the report's weak wage proxy is outrunning the model's structural bargaining floor or ceiling.
External role heterogeneity check
One outside check asks whether the model's role fragmentation risk is plausible given how much an occupation actually varies across employers. The check uses a heterogeneity index built from ACS PUMS 2024 microdata, then computes a target fragmentation pressure and compares it to the model's runtime output.
Heterogeneity index
For each occupation with sufficient ACS PUMS coverage, a composite heterogeneity index is derived from five observed dispersion measures:
- Wage dispersion percentile: how spread out earnings are within the occupation, relative to the full distribution.
- Education dispersion: share-weighted spread across education attainment levels.
- Industry dispersion: how evenly workers are spread across industries (a high value means the role is common in many sectors).
- Worker-mix dispersion: spread across age bands, sex mix, and class-of-worker categories.
The ACS pipeline first tries an exact SOCP query, then a grouped fallback, and in rare cases a small reviewed code override where ACS uses an alternate code for the same occupation family. Only when those paths still produce no usable employment rows does the heterogeneity layer fall back to a proxy estimate derived from O*NET SOC definition breadth, BLS industry distribution data, and cross-occupation wage dispersion evidence. Proxy rows are flagged with acs_confidence: 0.45 (versus 0.85–0.95 for ACS-observed rows) and document the evidence basis in their source notes. Lawyers is the one occupation where the ACS/BTOS external mapping does not resolve cleanly: the audit layer keeps its ORS-backed guardrail check, but treats the missing ACS heterogeneity and BTOS sector mix as a documented external-source exception rather than forcing an artificial join.
Target formula
The heterogeneity index is combined with the occupation's people-intensity share (from the adaptation layer) to produce a calibration target for role fragmentation pressure:
Where H is the heterogeneity index and p_people is the people-intensity share. The people-intensity term conditions the signal downward for roles where human interaction is the primary work mode: those roles are less likely to fragment into separable AI-and-human sub-roles even if within-occupation wage spread is high. The target range runs roughly 0.15–0.45 across the occupation set.
What the check flags
The check computes the absolute gap between the model's role_fragmentation_risk and this target, then grades each gap against the check's confidence. A gap is flagged medium at ≥0.15 × confidence and high at ≥0.25 × confidence. Flagged occupations are reviewed at the role_shape_heterogeneity layer, especially where the model may be treating a role as more uniform or more fragmented than observed within-occupation variation supports.
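The flag grading above can be sketched as a small helper. The thresholds (0.15 and 0.25, scaled by the check's confidence) come directly from the text; the function name and the return labels are illustrative, not the live API.

```python
def grade_fragmentation_gap(model_risk: float, target: float,
                            confidence: float) -> str:
    """Grade the gap between the model's role_fragmentation_risk and the
    ACS-derived calibration target, scaled by check confidence."""
    gap = abs(model_risk - target)
    if gap >= 0.25 * confidence:
        return "high"
    if gap >= 0.15 * confidence:
        return "medium"
    return "ok"
```

A gap of 0.30 against a full-confidence check would flag high; the same gap against a 0.5-confidence check would still flag high only because 0.30 ≥ 0.25 × 0.5.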
6. Benchmark comparison layer
- Feeds: directional comparison against prior AI exposure literature.
- Primary sources: AIOE (AI Occupational Exposure, 2023), Webb AI Exposure (2020, NBER WP 24762), Brynjolfsson, Mitchell & Rock (SML, AEA P&P 2018), and GPTs are GPTs (Eloundou et al., 2023).
- Why they are used: they are useful audit references when checking whether the runtime ranking looks directionally strange.
- What they do not do: they do not determine the final role-fate label on their own.
How the role builder works
Each task family starts with a blend of two share estimates: the cluster's structural share prior and the task inventory's observed share. The model normalizes these into a starting composition:
That baseline is then filtered through your composition edits. Removing tasks takes them out of the active run; adding tasks puts them in. The model renormalizes both the task shares and the function weights after edits so the active composition always sums to one:
Plainly, the baseline is just a starting point; the user's edits decide what stays in the active run. Removing a task removes it from scoring; adding a task inserts it; changing the variant swaps in a different reviewed starting bundle when the occupation supports that split. Adding a support link tells the model that one task partly exists because another task exists, while adding a task-to-function link tells the model that a task contributes to a more central function, which can increase that task's effective centrality and bargaining weight.
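The renormalization step after edits can be sketched minimally. The dict-based signature is an assumption for illustration; the same rescaling applies to both task shares and function weights.

```python
def renormalize(shares: dict) -> dict:
    """Rescale the active composition so it sums to one after edits."""
    total = sum(shares.values())
    if total == 0:
        return shares  # nothing active; leave untouched
    return {name: value / total for name, value in shares.items()}
```

For example, removing a task that held 0.5 of the role leaves the remaining shares to be scaled back up to a full composition.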
This stage is where the model enforces its first scientific discipline: represent the role before trying to predict the role. If the task mix or function map is wrong, no later evidence layer can rescue the output.
How the evidence hierarchy works
For each task, the evidence hierarchy decides two things: what the best available evidence source is, and how much the model should trust that source.
Evidence order
- Live task evidence: direct task-level evidence from the Anthropic Economic Index stack.
- Reviewed task estimate: a reviewed judgment about the exact task row in the exact occupation.
- Benchmark task label: a mapped task label from external literature that is useful but usually broader than the row being scored.
- Cluster prior: a family-level default for tasks like this.
- Occupation prior: the broadest fallback when task and cluster evidence are both weak.
The model prefers the strongest claim about the exact row it is scoring; a reviewed estimate can outrank a broader benchmark label for that reason. A task-family default is still useful, but it is a fallback, not proof about the exact task.
Implementing shrinkage
Direct evidence is not always strong enough to trust at face value. The model therefore shrinks thin evidence toward a broader prior:
λ is the reliability weight. When reliability is high, the task-level evidence carries most of the score. When reliability is low, the model falls back toward the broader default (not pretending to know more than it does).
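The shrinkage rule is a convex blend; a minimal sketch, with symbol names following the text (λ as the reliability weight):

```python
def shrink(evidence: float, prior: float, lam: float) -> float:
    """Blend a noisy task-level observation toward a broader default.
    lam = 1.0 trusts the evidence fully; lam = 0.0 falls back to the prior."""
    return lam * evidence + (1.0 - lam) * prior
```

At λ = 0.5 the score lands halfway between the task evidence and the broader default, which is exactly the "don't pretend to know more than you do" behavior described above.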
Task evidence reliability
For Anthropic task evidence, the model weights stated confidence by observation volume. Internal stub sources (placeholder rows with no real observation) receive zero reliability:
A task with 40 observations gets roughly half-weight from its own evidence; a task with 400 observations gets roughly 91% weight. For reviewed task estimates, benchmark task labels, and fallback rows, the model keeps reliability and evidence weight separate: reliability comes from stated confidence plus a source-role multiplier (0.88 for reviewed estimates, 0.76 for benchmark labels, 0.42 for task-family proxies, 0.25 for occupation-level fallbacks) and a source-specific eligibility adjustment, while evidenceWeight enters once later when the runtime builds the weighted task-level consensus.
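The worked numbers above (40 observations giving roughly half-weight, 400 giving roughly 91%) are consistent with a saturating weight of the form n/(n+40). A minimal sketch for the Anthropic tier only; the half-saturation constant of 40 is inferred from those worked numbers, not stated directly, and the other source tiers use the separate multiplier scheme described in the text.

```python
def anthropic_task_reliability(stated_confidence: float, n_obs: int,
                               is_stub: bool = False, k: float = 40.0) -> float:
    """Weight stated confidence by observation volume, saturating in n_obs.
    Internal stub sources (placeholder rows) receive zero reliability."""
    if is_stub or n_obs <= 0:
        return 0.0
    return stated_confidence * (n_obs / (n_obs + k))
```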
Task evidence blend gate
Resolved task evidence only affects task difficulty or task pressure when reliability clears a threshold of 0.20. Above that threshold the evidence blend weight ramps linearly up to a cap of 0.85:
If a task already clears the stronger task-first baseline gate, the runtime reduces the remaining blend weight by the amount already consumed in that baseline promotion instead of applying the same evidence twice.
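The gate and ramp can be sketched as follows. The 0.20 threshold and 0.85 cap come from the text; the assumption is that the linear ramp runs from zero at the threshold up to the cap at full reliability, and that any weight already consumed by a baseline promotion is subtracted afterward.

```python
def evidence_blend_weight(reliability: float, consumed: float = 0.0,
                          threshold: float = 0.20, cap: float = 0.85) -> float:
    """Blend weight for resolved task evidence: zero below the gate,
    linear ramp up to the cap above it, minus weight already consumed
    by a task-first baseline promotion."""
    if reliability <= threshold:
        return 0.0
    ramp = (reliability - threshold) / (1.0 - threshold)
    return max(0.0, min(cap, cap * ramp) - consumed)
```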
Task evidence signal weights
The model computes two weighted signal composites from each task's resolved evidence. The ease signal feeds the difficulty formula; the direct evidence signal feeds the direct-pressure formula:
When more than one high-specificity task-level source is available, the runtime resolves a weighted task-level consensus using source reliability, evidence weight, and source-role multipliers before applying the blend:
The runtime only averages over the highest-specificity task-level tiers: Anthropic evidence, reviewed task estimates, and benchmark task labels. Task-family defaults and occupation-level fallbacks remain visible as fallback metadata, but they do not become part of the task-evidence blend itself.
Cluster prior reliability
Cluster prior reliability starts from stated evidence confidence but penalizes sources that rely on internal stubs rather than real observations:
From evidence to empirical resistance
Some types of work have historically resisted automation more than others, regardless of what AI can do today. The model captures that as empirical resistance: a score derived from the cluster prior layer that reflects how much a task family has structurally resisted automation so far. Higher resistance means the work has more built-in friction against organizational absorption. The prior values used, π_partial and π_high, are exposed in the cluster result output as prior_partial_automation_likelihood and prior_high_automation_likelihood, alongside prior_reliability, so the prior layer is auditable for any specific occupation.
When cluster-level task evidence has enough coverage and reliability, the model can shift the empirical ease estimate toward the task-first cluster average instead of relying only on the prior layer. The task-first cluster weight activates when coverage exceeds 0.35 and mean reliability exceeds 0.30, up to a maximum weight of 0.90:
Here ĉ_i and r̂_i are the normalized coverage and reliability after subtracting their thresholds.
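A sketch of the activation logic under stated thresholds (coverage 0.35, mean reliability 0.30, maximum weight 0.90). Combining the normalized terms by taking the minimum is an assumption; the live runtime may combine ĉ_i and r̂_i differently.

```python
def task_first_cluster_weight(coverage: float, reliability: float,
                              cov_t: float = 0.35, rel_t: float = 0.30,
                              w_max: float = 0.90) -> float:
    """Weight for shifting the empirical ease estimate toward the
    task-first cluster average. Zero unless both thresholds are cleared."""
    if coverage <= cov_t or reliability <= rel_t:
        return 0.0
    c_hat = (coverage - cov_t) / (1.0 - cov_t)      # normalized coverage
    r_hat = (reliability - rel_t) / (1.0 - rel_t)   # normalized reliability
    return w_max * min(c_hat, r_hat)                # conservative combination
```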
Human advantage
Each task family carries a static human advantage score hi that reflects how much the work type inherently depends on human presence, relationships, or embodied judgment. The current values are:
- Client interaction: 1.0; Relationship management: 1.0
- Oversight/strategy: 0.85; Coordination: 0.65
- Decision support: 0.55; QA/review: 0.45
- Research/synthesis: 0.35; Analysis: 0.30
- Documentation: 0.20; Drafting: 0.15
- Workflow admin: 0.12; Execution/routine: 0.05
The engine applies hi directly inside the difficulty formula. Human advantage also interacts with function retention in the task-graph adjustment step.
Task-first baseline promotion
When an individual task has strong enough direct evidence, it can start from its own task-specific baseline rather than inheriting only the task-family seed. The task-first weight is source-aware: live task evidence can clear the threshold at lower reliability (0.38) than reviewed estimates (0.42), while benchmark labels face a stricter threshold (0.55) and a lower maximum weight (0.65 vs 1.00). Task mapping confidence also damps the weight so ambiguous mappings do not over-promote:
Here w_t^tf is the task-first task weight and D_c(t) is the cluster seed difficulty. If the task does not clear the promotion threshold, w_t^tf = 0 and the task inherits the cluster baseline directly.
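The source-aware promotion can be sketched as below. The thresholds (0.38 live, 0.42 reviewed, 0.55 benchmark) and the benchmark weight cap (0.65 vs 1.00) come from the text; the linear ramp above the threshold and the multiplicative mapping-confidence damping are assumed forms.

```python
# (threshold, max weight) per evidence source tier, per the text
PROMOTION_RULES = {
    "live":      (0.38, 1.00),
    "reviewed":  (0.42, 1.00),
    "benchmark": (0.55, 0.65),
}

def task_first_weight(source: str, reliability: float,
                      mapping_confidence: float) -> float:
    """Task-first promotion weight: zero below the source's threshold,
    ramping to the source's cap, damped by task mapping confidence."""
    threshold, w_max = PROMOTION_RULES[source]
    if reliability < threshold:
        return 0.0
    ramp = (reliability - threshold) / (1.0 - threshold)
    return min(w_max, w_max * ramp) * mapping_confidence
```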
How your answers enter the model
The questionnaire does not directly choose the final label. It adjusts the conditions around the work. The model uses those answers to estimate how much of the role still needs human ownership, how decomposable the work is, how much trust or sign-off remains, how strong the workflow bottlenecks are, and how readily the surrounding organization can absorb AI into the workflow.
The hierarchy input mainly works through the s term. Higher hierarchy levels tell the model that your version of the occupation carries more sign-off, coordination, stakeholder alignment, and retained human accountability. Lower levels push the same occupation toward more decomposable execution.
The live model now also adds a narrow hierarchy persistence effect on top of that profile. This is not a blanket reduction in AI exposure for senior titles. It only adds protection when higher hierarchy is paired with real retained ownership signals such as accountability, decision authority, and coordination centrality. In other words, hierarchy helps the model treat senior seats as slower to dissolve once execution compresses, but it does not make their tasks inherently less exposed. A related shared calibration pass also adds a smaller people-and-authority lift inside retained accountability itself, so manager-like seats can hold together more credibly without turning every high-judgment profession into an automatic safe case.
Composite signals
The model converts your responses into a structured profile, then derives four composite signals that feed all later scoring stages:
Friction dimensions
The model also derives five friction dimensions from your profile. These become the user-side input to the cluster friction blending in the difficulty formula:
The friction weights used in the difficulty formula are: tacit 0.28, judgment 0.26, accountability 0.18, exception 0.15, inverse document intensity 0.13.
Which answers matter most
The 14 questionnaire inputs are not equally independent. Sign-off frequency and exception/context load each appear in the derivation of most profile factors, so they carry more weight through to the final score than any other single question. Task independence (decomposability) and organizational adoption readiness are the next most influential. This is not a flaw — those inputs genuinely covary with the others in real roles — but it means answering those four questions carefully matters more than fine-tuning the rest. If your role is atypical for its cluster, concentrate your attention there.
Friction cluster blending
The model blends your questionnaire-derived friction dimensions with a cluster-profile default for the task family being scored. The cluster profile anchors the result so that a single misread question cannot flip the difficulty score. The user weight w adapts based on how consistently your answers deviate from neutral across all five friction dimensions:
At w = 0.30 a fully extreme answer shifts the cluster default by at most ±0.15. At w = 0.55, reached only when all five friction dimensions are consistently high or consistently low, the user can shift the default by up to ±0.275. This prevents outlier answers from dominating while still letting someone describe a role that is structurally atypical for its cluster.
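The blend can be sketched as a delta-from-neutral shift, consistent with the stated bounds: a fully extreme answer (distance 0.5 from neutral) shifts the default by w × 0.5, i.e. 0.15 at w = 0.30 and 0.275 at w = 0.55. The clipping to [0, 1] is an assumption.

```python
def blend_friction(cluster_value: float, answer: float, w: float,
                   neutral: float = 0.5) -> float:
    """Shift the cluster anchor by the answer's distance from neutral,
    scaled by the adaptive user weight w, clipped to [0, 1]."""
    shifted = cluster_value + w * (answer - neutral)
    return min(1.0, max(0.0, shifted))
```

An answer exactly at the midpoint leaves the cluster default unchanged, which matches the "anchor" behavior described above.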
Function-layer blending
When the occupation has a reviewed function graph, the model blends function-level accountability signals into the questionnaire-derived values before scoring. This anchors the signals to the occupation's actual authority and delegation structure rather than relying on user responses alone:
The same blending applies to three friction dimensions: accountability_load, judgment_requirement, and tacit_context_dependence each blend at 0.75 questionnaire / 0.25 function-graph. Because every occupation covered by the model ships with a reviewed multi-anchor default function graph, this structural blend is part of the score for the full library rather than only for a small reviewed subset.
How task difficulty is scored
Task difficulty is the model's estimate of how hard it is for an organization to absorb a task into AI-assisted workflow. It is not the same thing as raw model capability. A task can be technically reachable and still be difficult to hand off because of judgment, tacit context, accountability, or workflow coupling.
Cluster friction profiles
Each task family has a baseline friction profile across five dimensions: exception burden, accountability load, judgment requirement, document intensity, and tacit context dependence. Those baselines are then adjusted by your questionnaire responses using a delta-from-neutral blend:
The cluster's structural friction profile is the anchor. Your answers can shift each dimension by up to ±0.15 from the baseline. An answer at the midpoint changes nothing; extremes pull the friction up or down.
Friction weights
The five friction dimensions are combined into a single intrinsic friction score using empirically grounded weights:
The weights are calibrated against external labor market evidence. Dallas Fed research (February 2026) found that wages rise specifically in AI-exposed occupations that require tacit knowledge and experience, while employment falls where those qualities are absent. OECD job-posting analysis (AI-WIPS, November 2024) found that originality, which maps closely to judgment requirement, saw the largest skill demand increase in high-AI-exposure occupations. Together these point to tacit context and judgment as the friction dimensions that actually protect roles in practice. The current weights reflect that: tacit (0.28) and judgment (0.26) together outweigh accountability (0.18).
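A sketch of the weighted combination, using the weights stated above. Document intensity enters inverted (the weight list earlier names it "inverse document intensity"); the dictionary keys are illustrative names, not the live schema.

```python
FRICTION_WEIGHTS = {
    "tacit_context_dependence": 0.28,
    "judgment_requirement":     0.26,
    "accountability_load":      0.18,
    "exception_burden":         0.15,
    "document_intensity":       0.13,  # enters inverted below
}

def intrinsic_friction(profile: dict) -> float:
    """Combine the five friction dimensions into one intrinsic friction score.
    Document-heavy work is easier to absorb, so that dimension is inverted."""
    score = 0.0
    for dim, w in FRICTION_WEIGHTS.items():
        value = profile[dim]
        if dim == "document_intensity":
            value = 1.0 - value
        score += w * value
    return score
```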
Difficulty base formula
The cluster-level difficulty score combines four components with the following weights:
The AUTOMATION_DIFFICULTY_WEIGHTS are: intrinsic friction 0.40, human advantage 0.25, empirical resistance 0.25, coupling protection 0.10.
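The four-component combination can be sketched as a weighted sum using the stated AUTOMATION_DIFFICULTY_WEIGHTS; the component key names are illustrative.

```python
AUTOMATION_DIFFICULTY_WEIGHTS = {
    "intrinsic_friction":   0.40,
    "human_advantage":      0.25,
    "empirical_resistance": 0.25,
    "coupling_protection":  0.10,
}

def cluster_difficulty(components: dict) -> float:
    """Cluster-level difficulty: weighted sum of the four components,
    each expected in [0, 1]."""
    return sum(w * components[name]
               for name, w in AUTOMATION_DIFFICULTY_WEIGHTS.items())
```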
Task-graph adjustments
After the base difficulty, the model adjusts for task-graph signals. Tasks with high bargaining power or value centrality resist automation, while high AI-support observability and substitution risk lower difficulty:
Here B_i is bargaining-power weight, C_i is core-task share, V_i is value centrality, and A_i is AI-support observability.
Criticality boost
Task difficulty blend
That cluster-family difficulty is still the baseline. The model then projects it onto each active task row and lets reliable resolved task evidence alter the task's own difficulty:
In plain language, each task starts from its family-level difficulty, then reliable task evidence can push it easier or harder. Lower difficulty means the task is easier to absorb sooner. Higher difficulty means the task keeps more friction.
Timing frontier
After difficulty is scored, the runtime derives a task-derived cluster timing frontier and asks when that bundle clears an explicit hurdle. This timing layer helps determine when different parts of the role begin moving and later feeds the threshold timing ranges shown in the five-year read.
Each cluster gets four timing components:
- Capability readiness: difficulty complement, direct pressure, evidence confidence, evidence coverage, and the questionnaire capability signal.
- Supervision readiness: capability plus decomposability, observability, and low exception/accountability burden.
- Economic pressure: role share, wage context, direct pressure, economic-pressure context, and the complement of demand expansion.
- Organizational friction: accountability, judgment, tacit context, dependency pressure, retained leverage, function retention, and sign-off load.
The cluster timing base is:
Where C is capability readiness, S is supervision readiness, E is economic pressure, and Forg is organizational friction.
The outer recomposition layer then turns occupation context into scenario activation levels. That layer uses effective adoption pressure, workflow compression, organizational conversion, AI adoption context, adoption-realization context, and the derived recomposition-context fields next_scenario_lift, distant_scenario_lift, and organizational_adoption_ceiling.
The earliest scenario with a nonnegative margin is the first point where that bundle becomes active in the timing frontier:
Current, next, and distant are internal scenario checkpoints. They mark the earliest stage where a bundle clears the timing frontier, rather than a specific calendar promise or a generic difficulty bucket. The live timing score underneath them is continuous rather than just a categorical label. It blends assist, delegate, compress, and structural-break readiness with scenario-activation lift, which is why two occupations can share the same broad checkpoint label while still landing at different timing strengths.
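The "earliest scenario with a nonnegative margin" rule can be sketched directly; only the checkpoint ordering and the nonnegativity test come from the text.

```python
SCENARIOS = ("current", "next", "distant")

def first_active_scenario(margins: dict):
    """Return the earliest scenario checkpoint whose timing margin
    is nonnegative, or None if the bundle never activates."""
    for scenario in SCENARIOS:
        if margins.get(scenario, float("-inf")) >= 0.0:
            return scenario
    return None
```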
Adoption realization and absorption rate
Within each task family, the model estimates an absorption rate: how much of that family's share is likely to be absorbed into AI-assisted workflows. This is not binary; it captures the realistic pace of organizational uptake.
Adoption realization is a rate multiplier, not a probability. The min(1, ...) cap keeps it bounded at 1.0, with the current coefficients only reaching that ceiling at the maximum adoption-pressure setting.
The floor is 0.25 (not 0.45) so that high-friction clusters with strong difficulty scores can realistically show minimal absorption. The absorbed share feeds into the recomposition layer's workflow compression estimate via a soft-weighted sum described below.
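The bounding behavior described above can be sketched as a simple clamp. The 0.25 floor and the cap at 1.0 come from the text; how the raw rate is computed from the adoption-pressure coefficients is not shown here.

```python
def adoption_realization(raw_rate: float, floor: float = 0.25,
                         ceiling: float = 1.0) -> float:
    """Bounded rate multiplier: floored so high-friction clusters can show
    minimal absorption, capped at 1.0 (a rate multiplier, not a probability)."""
    return max(floor, min(ceiling, raw_rate))
```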
How pressure moves through the role
The model distinguishes between direct pressure and spillover. Direct pressure means AI can reach the task itself. Spillover means the task becomes less necessary because connected tasks are changing.
Task share derivation
Each task's share of the role is derived from its cluster share and its within-cluster time-share prior, then renormalized across the active task set:
Your role's task and function structure matters as much as your task or occupation-level exposure. Each task carries a share of your role, contributes to one or more human-owned functions, and can support or depend on other tasks in the graph. A task that is small, peripheral, or weakly linked can be heavily exposed without decomposing the role. A task that carries a larger share, supports a central function, or contributes to many connected tasks matters much more.
Task direct pressure
The baseline direct pressure for each task starts from three core signals: the complement of difficulty (how easy the task is to absorb), the cluster's absorption rate, and the task's AI-support observability. The runtime then adds narrow structural lifts when the active role mix makes routine workflow-admin, documentation, or clerical execution work especially reachable. A later calibration pass also lets part of high-pressure drafting, documentation, and research-synthesis work count toward that same structural pressure path in lower-people, knowledge-heavy roles, so content-heavy occupations do not read as "not clerical, therefore structurally untouched." Reliable task evidence can then blend into the final score, but the runtime also dampens that evidence weight in those same routine and clerical contexts so proxy-heavy office work does not look more precise than it is:
Here α_c(t) is the cluster absorption rate, R_t is the routine-reachability lift, U_t is the administrative-routine lift, and C_t is the clerical-execution lift. The effective direct-evidence weight w_t^evidence is already discounted by routine, administrative, and clerical evidence dampening before the final blend is applied.
Spillover
This is why the dependency graph exists. Support work often survives the capability test but fails the dependency test. If upstream work compresses, support tasks can still lose value:
The term says: spillover rises when a task depends on another task that is under direct pressure and central to the role's value.
The dependency graph is not a full dense cross-product. Direct reviewed/manual edges stay explicit, but the seeded proxy layer is capped: for any cluster pairing the builder uses at most two anchor tasks, prefers a mixed authored-plus-seeded anchor set when both exist, and skips generic proxy links between two authored tasks. That keeps newly reviewed tasks from generating broad synthetic spillover loops just because they share a cluster.
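A minimal sketch of the dependency test: spillover accumulates over a task's upstream dependencies in proportion to their direct pressure and value centrality. The additive form and the cap at 1.0 are assumptions; the text states only that spillover rises with upstream pressure and centrality.

```python
def spillover(task_id: str, depends_on: dict,
              direct_pressure: dict, value_centrality: dict) -> float:
    """Pressure a task inherits because the tasks it depends on are
    both exposed (direct pressure) and central to the role's value."""
    total = sum(direct_pressure[u] * value_centrality[u]
                for u in depends_on.get(task_id, ()))
    return min(1.0, total)
```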
Retained leverage
After pressure and spillover are scored, the model asks how much meaningful human work is still left to hold the role together:
The model captures a concentration effect in the remaining human work. When routine or lower-value execution tasks decompose, the human worker may spend a larger share of time on tasks that require judgment, ownership, coordination, or escalation. The remaining work can become more central or valuable because the role is no longer spread across as many routine tasks.
Cluster summaries
After the task table is scored, the model aggregates those task rows back into cluster summaries. The exposed-cluster and retained-cluster views are derived from the scored task table rather than inherited from a separate pre-task stack:
The browser does not have to expose those summaries under raw cluster names. After aggregation, the runtime also synthesizes a task-derived public bundle label from the highest-share task text plus linked function anchors, so user-facing readouts can say things like software development or documentation authoring while still preserving the structural cluster id under the hood.
Residual role integrity
The role-level residual integrity score combines retained leverage, retained core share, low spillover, and low dependency penalty:
This is the stage where the model becomes a role-transformation model instead of a pure exposure model. It is no longer only asking what AI can do. It is asking what pressure does to the structure of the role. A role with many partially retained tasks can stay coherent even when some work is exposed. A role built around one narrow core activity can collapse much faster once that activity is absorbed or weakened. That is why the model tracks role integrity separately from raw exposure.
How the forecast is built
The runtime produces three main output layers, but they are not all equally primary.
- Task and function diagnostics: direct pressure, spillover, retained leverage, exposed core share, retained core share, retained accountability strength, retained bargaining power, and role fragmentation risk.
- Canonical trajectory layer: execution compression P(t), demand response D(t), structural necessity S, viability L(t), and the transformed-share timeline used by the structural-state layer.
- Structural-state layer: dimensionality, bottleneck fragility, retained-core lift, demand offset, firm incentive, hierarchy persistence, a continuous role-integrity timeline, tipping-point logic, curve-family logic, and the normalized state shares used by the forecast.
Retained function strength
Retained function strength combines the complement of function exposure pressure, the function-level delegability guardrail, and weighted bargaining:
Here $p_{fn}^{\text{exposure}}$ is the weighted function exposure pressure (0.78 direct + 0.22 indirect), $G$ is the delegability guardrail, and $\bar{B}$ is the weighted bargaining score across tasks.
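The exposure blend is stated in the text, but how the complement, guardrail, and bargaining terms combine is engine-internal, so the product form below is an illustrative assumption only:

```javascript
// Weighted function exposure uses the 0.78 direct / 0.22 indirect blend from
// the text. The product combination is an assumption, not the shipped formula.
function retainedFunctionStrength(fn) {
  const exposure = 0.78 * fn.directPressure + 0.22 * fn.indirectPressure;
  return (1 - exposure) * fn.guardrail * fn.weightedBargaining;
}
```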
Retained accountability strength
That formula does more of the human-guardrail work than older trust-heavy versions did, so reviewed support anchors can be softened when calibration shows the graph is overstating real sign-off ownership. In practice that means some service-sales, billing, lending-intake, executive-support, bookkeeping, and statistical-support anchors can keep coordination or quality-control value without automatically reading as strong formal authority or analyst-owned sign-off.
Retained bargaining power
Bargaining power is not just an average of task bargaining weights. It leans more on pressure-adjusted retained task leverage, then blends in function-level bargaining retention, guardrails, and a specialization lift from the adaptation layer. Routine or support-heavy work already under high pressure pulls bargaining down:
Here σ is the specialization context, a blend of knowledge share (0.42), learning intensity (0.33), and adaptive capacity (0.25). The specialization lift means high-knowledge, high-learning roles can retain leverage even when some execution work is exposed. The max(0, σ − 0.72) term adds an extra lift for roles whose specialization context exceeds the 0.72 threshold.
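A sketch of the specialization context and its threshold lift. Only the blend weights and the 0.72 threshold come from the text; the lift's scale factor is an illustrative assumption:

```javascript
// σ: specialization context (blend weights from the text).
function specializationContext(role) {
  return 0.42 * role.knowledgeShare
       + 0.33 * role.learningIntensity
       + 0.25 * role.adaptiveCapacity;
}

// Extra bargaining lift only above the 0.72 threshold; `scale` is illustrative.
function specializationLift(sigma, scale = 0.5) {
  return scale * Math.max(0, sigma - 0.72);
}
```

Below the threshold the lift is exactly zero, so ordinary roles are unaffected; the term only rewards the high-specialization tail.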
Empirical function-context blending
The runtime adds an outer empirical function-context layer on top of those formulas. ORS structural guardrails feed an accountability context, labor/quality/adaptation signals feed a bargaining context, and ACS heterogeneity plus adaptation structure feed a fragmentation context (see Calibration-only role heterogeneity check above for how the heterogeneity index is constructed). These inputs do not replace the reviewed function map. They act as confidence-weighted constraints:
The accountability and bargaining weights scale with channel confidence as w = 0.10 + 0.18 × confidence. Fragmentation uses a slightly lighter blend, w_f = 0.08 + 0.18 × confidence, so the ACS outer context constrains the authored fragmentation read without overwhelming the task-and-function graph. In the accountability channel, ORS dominates the outer blend when ORS exists: the accountability backstop is 0.82 ORS plus 0.18 quality, and that quality backstop is intentionally narrow (autonomy, social interaction, and working-environment structure only). That keeps the context layer useful as a guardrail while preventing broad labor-security or learning-opportunity proxies from inflating sign-off ownership for support/admin roles that lack a clearer basis for it. When no context is available, the model uses the authored values directly.
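A minimal sketch of that constraint pattern, assuming a simple linear blend (the real channel wiring in the runtime may differ):

```javascript
// Confidence-weighted context blend: the outer context constrains the authored
// value but never replaces it. The weight formula matches the text for the
// accountability/bargaining channels.
function blendWithContext(authored, context, confidence) {
  if (context === null || context === undefined) return authored; // no context row
  const w = 0.10 + 0.18 * confidence;
  return (1 - w) * authored + w * context;
}
```

Even at full confidence the context carries only a 0.28 weight, which is what keeps the reviewed function map primary.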
Runtime demand and adoption context
The trajectory and structural-state layers both use runtime demand and adoption signals. When pre-computed runtime context rows are available, those override the defaults. The fallback formulas are:
The demand expansion signal combines adaptive capacity (0.22), transferability (0.16), learning intensity (0.12), demand expansion context (0.34), and labor tightness (0.16).
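As a sketch, assuming all inputs are already normalized to [0, 1], the fallback blend is a plain weighted sum (the field names here are illustrative):

```javascript
// Fallback demand-expansion blend; weights are from the text above.
function demandExpansionSignal(s) {
  return 0.22 * s.adaptiveCapacity
       + 0.16 * s.transferability
       + 0.12 * s.learningIntensity
       + 0.34 * s.demandExpansionContext
       + 0.16 * s.laborTightness;
}
```

The weights sum to 1, so the signal stays on the same [0, 1] scale as its inputs.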
Structural-state layer inputs
The state layer is built on top of the shared task/function scorer rather than on the older fate labels. It estimates five structural signals first.
Here the d terms are normalized effective counts from task breadth, cluster breadth, function breadth, and retained-task breadth. Roles with more complementary anchors score higher; roles dominated by one task or cluster are penalized.
This is the mechanism for the “AI strips lower-value execution first” read. It goes up when routine work is large, next-step compression is real, and the retained human core is still strong enough to concentrate on.
Here ε is the trajectory layer's demand-expansion score, and b_demand is the user-facing demand-assumption slider on the range [-1, 1].
The investment slider b_invest is also continuous on the range [-1, 1]. It lets the user stress-test how aggressively firms would pursue the last mile of automation once a bottleneck is reachable.
State timeline math
The engine then turns the trajectory layer's transformed-share timeline into a new time series for role integrity. That internal signal still helps drive the state checkpoints, the five-state support chart, and the broader state classifier. In the live calibration, role integrity now decays more sharply once transformed share rises and bottleneck risk compounds, so the model is less willing to leave medium- and high-pressure roles parked in a vague middle state.
Here r(t) = t/10 is the adoption ramp over the 10-year horizon, b_adopt is the adoption-speed control, and b_stay is the role-staying-power control.
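Only the ramp itself is specified here; how the adoption-speed and staying-power controls modulate the decay is engine-internal. Evaluating the ramp at the structural checkpoints (years 0, 2, and 5) gives the horizon fractions the state math works with:

```javascript
// Adoption ramp r(t) = t/10, clamped to [0, 1] over the 10-year horizon.
const adoptionRamp = (t) => Math.min(1, Math.max(0, t / 10));

const checkpoints = [0, 2, 5].map(adoptionRamp); // → [0, 0.2, 0.5]
```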
The role-integrity band is still computed by rerunning the same point calculation against the lower and upper transformed-share trajectories from the canonical trajectory layer. It is therefore a compression-growth band, not a full probabilistic uncertainty interval, even though the live page no longer shows it as a dedicated support chart.
From continuous state metrics to the top forecast chart
The engine does not emit the main hero chart directly. It emits the continuous timeline above. The client then converts each yearly point into a five-state forecast across retained, complemented, compressed, rebundled, and displaced, and compresses that into three outcome bands for the hero chart: what stays mostly intact, what changes but still points toward a surviving seat, and what reads as downside pressure. The full five-state mix remains visible in the smaller stacked support chart. In the current runtime, the displaced share in that client layer is also gated by transition pressure and by the engine's own current state, so retained or complemented roles do not inherit large early displaced share from low integrity alone. That means the chart's downside band is broader than the engine's formal displacement_plausible tipping point: downside pressure can rise through compression and displaced-share weight before the engine decides true seat-level displacement has become structurally plausible.
In the time model underneath, adoption speed and task exposure growth play different roles. Adoption speed enters task readiness and shifts the main logistic midpoint earlier or later as organizations operationalize AI faster or slower. Task exposure growth then governs a broader capability-driven process: it still changes the steepness of already-moving exposure curves, but it also controls a frontier-unlock term that lets moderately hard tasks become more exposed later as model capability expands. Both assumptions affect P(t), but they do so through different mechanisms.
The feature vector x(t) contains role integrity, structural support, transformed share, transition pressure, demand offset, bottleneck risk, and firm incentive. The weights live in a named STATE_FORECAST_WEIGHTS table in app.js, and the engine checkpoint state adds a moderate dominant-state boost so the chart stays directionally aligned with the engine's discrete state read without drowning out the continuous signals. The boost is set low enough that the continuous inputs (integrity, compression, demand, bottleneck) visibly compete in the chart, making sliders and questionnaire answers meaningfully reflected in the output. In that simplified top chart, demand_expanding folds into complemented, and rebalanced folds into rebundled. In the engine classifier underneath, rebalanced is intentionally narrower than before: it is meant for roles that still survive through a changed core, not as a catch-all resting place for any role that is neither plainly retained nor fully displaced.
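The client mapping can be sketched as follows. The STATE_FORECAST_WEIGHTS name comes from the text; everything else (feature names, weight values, boost size, normalization) is illustrative:

```javascript
// Illustrative sketch only: the shipped STATE_FORECAST_WEIGHTS table in app.js
// has its own feature names and values; weights and boost below are placeholders.
const STATES = ["retained", "complemented", "compressed", "rebundled", "displaced"];

function stateShares(features, weights, dominantState, boost = 0.15) {
  const raw = STATES.map((state) => {
    let score = 0;
    for (const [name, w] of Object.entries(weights[state])) {
      score += w * features[name];
    }
    if (state === dominantState) score += boost; // moderate dominant-state boost
    return Math.max(0, score);
  });
  const total = raw.reduce((a, b) => a + b, 0) || 1;
  // Normalize so the five shares sum to 1 for the chart.
  return Object.fromEntries(STATES.map((s, i) => [s, raw[i] / total]));
}
```

The boost only nudges the dominant state's raw score before normalization, which is why the continuous features still visibly compete in the output.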
Why recomposition is separate
Recomposition is a narrower read on how much exposed work looks compressible and how likely that compression is to turn into real workflow redesign or fewer labor hours. A role can have meaningful recomposition pressure while still keeping a strong human-owned function, which is why the model keeps that layer separate from the final state label.
Workflow compression
The recomposition layer uses a soft-weighted sum of absorbed shares rather than a hard filter on the current-wave label. Each cluster's contribution is scaled by a linear ramp over a ±0.10 window around its current scenario margin, eliminating the cliff at the current/next boundary:
A cluster solidly in the current scenario (M_i(current) ≥ 0.10) contributes its full absorbed share, while a cluster solidly in later scenarios (M_i(current) ≤ -0.10) contributes nothing. Clusters near the boundary get a weight between 0 and 1, proportional to how far their margin clears the hurdle. That weight is used only in this aggregation.
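A minimal sketch of the linear ramp weight, assuming the input is the cluster's current-scenario margin:

```javascript
// Linear ramp over a ±0.10 window around the scenario margin:
// fully in (≥ +0.10) → 1, fully out (≤ −0.10) → 0, linear in between.
function rampWeight(margin, halfWindow = 0.10) {
  return Math.min(1, Math.max(0, (margin + halfWindow) / (2 * halfWindow)));
}
```

A cluster sitting exactly on the boundary (margin 0) contributes half of its absorbed share, which is exactly the cliff the soft weighting removes.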
The context blend ensures that compression is not purely runtime-derived. The occupation-level workflow compression context, built from adaptation structure and routine share, constrains the estimate from outside the task graph. Compression does not necessarily mean the role disappears. It often means the same worker can spend less time on routine execution and more time on the smaller set of tasks that still require human ownership. That is why the model treats compression separately from full role breakdown: fewer tasks can mean a weaker role, but it can also mean a more concentrated and judgment-heavy role. Once the runtime has task-derived cluster summaries, it recenters compression on the task-graph read and then applies a stronger final context blend. The task-graph-stage blend is 0.40 / 0.60 for workflow compression, which helps knowledge-work roles avoid looking artificially static after the task graph already shows meaningful narrowing.
Organizational conversion
As with workflow compression, that is the base blend. In the task-graph path the runtime then recenters organizational conversion on task-derived exposure and retained-leverage structure and applies a stronger final context blend. The current task-graph-stage blend is 0.50 / 0.50 for organizational conversion.
The substitution gap is the portion of workflow compression that has not yet converted into real organizational change. A large gap means technical pressure exists, but the organization has not yet restructured around it.
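Read literally, the gap is a difference of two shares; a minimal sketch:

```javascript
// Substitution gap: technical compression not yet realized organizationally.
function substitutionGap(workflowCompression, orgConversion) {
  return Math.max(0, workflowCompression - orgConversion);
}
```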
The outer adoption layer that feeds organizational conversion is intentionally hard to inflate in low-BTOS occupations. Adoption realization is driven mainly by the occupation-level BTOS adoption signal itself, with smaller covered-sector current-use and workflow-change terms, and labor tightness only adds a meaningful lift once adoption is already nontrivial. That keeps thin or low-adoption clerical roles from inheriting a broad realization floor just because the labor market is tight.
Default baseline runs are conservative: if the user does not complete the questionnaire, the model still uses reviewed default questionnaire settings for that occupation and hierarchy. But it does not treat organizational adoption readiness as an implicit midpoint answer layered on top of those defaults. The default-profile path uses a lower baseline adoption-readiness term before blending with occupation-level adoption realization context, so low-adoption occupations do not read as if the user selected "medium adoption" by doing nothing.
Reviewed role variants also behave conservatively in plain baseline runs. If the user has not supplied questionnaire answers, a structured questionnaire profile, or material composition edits, the runtime holds the reviewed default variant instead of letting a synthetic default questionnaire preset override that curated baseline shape. Variant recommendation activates once real profile or edit signal exists.
Technical appendix
This appendix collects the most important mathematical objects and explains what the UI does with them.
Role structure
- Active tasks $\mathcal{T}^{*}$, active functions $\mathcal{F}^{*}$, and active dependency edges $\mathcal{D}^{*}$.
- Evidence terms: prior score, observed task evidence, reliability weight $\lambda$, and evidence blend weight $w_t^{\text{evidence}}$.
- Task-scoring terms: difficulty $D_t$, direct pressure $p_t^{\text{direct}}$, spillover $p_t^{\text{spill}}$, and retained leverage $L_t$.
- Role-level summaries: direct exposure pressure, indirect dependency pressure, residual role integrity, retained accountability strength, retained bargaining power, role fragmentation risk, and recomposition outputs.
Top-K task rankings
The walkthrough panels and appendix use task-level rankings to show the most important rows:
Task evidence notes
The supporting detail shows a task-level breakdown and task map. Those surfaces show which tasks are under the most direct pressure, which ones are mainly moving through spillover, and which rows are anchored by direct task evidence versus broader fallback structure. At the task-row level, the page explains in plain English whether a task is mostly following a task-family fallback model or whether stronger task-level evidence is pulling the read.
Coverage and confidence
The runtime tracks direct_coverage_ratio (how much of the task graph resolved to task-level evidence rather than broader fallback structure) and composite confidence. That matters because the model is designed to say when it is reading a role mostly through direct task evidence and when it is still leaning on broader defaults.
When direct coverage is unusually thin, high-specificity evidence is scarce, and fallback rows dominate the active role mix, the engine activates a narrow thin-evidence guardrail. That guardrail lowers confidence in the structural read, makes timing less sharp, and widens recomposition bands rather than presenting the output as equally sharp. In the reviewed baseline occupation library this mostly acts as a sparse-evidence backstop, but it remains important for weaker-support or more heavily edited compositions.
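The trigger can be sketched as a conjunction of the three conditions; all thresholds below are illustrative assumptions, not the shipped values:

```javascript
// Thin-evidence guardrail trigger — thresholds are illustrative assumptions.
function thinEvidenceGuardrail(directCoverageRatio, highSpecificityCount, fallbackRowShare) {
  return directCoverageRatio < 0.25   // direct coverage unusually thin
      && highSpecificityCount < 3     // high-specificity evidence is scarce
      && fallbackRowShare > 0.60;     // fallback rows dominate the active mix
}
```

Because all three conditions must hold, a role with thin coverage but plenty of specific evidence does not trip the guardrail.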
Epistemic limitations
- It does not output calibrated probabilities. The top structural-state chart shows normalized forecast shares from a calibrated client-side mapping, not trained probabilities of employment states.
- It does not have equal evidence coverage everywhere. Some occupations and task families still rely more heavily on fallback defaults than others. The library currently covers 61 launch occupations.
- User edits are powerful, but they still operate inside a reviewed structural model.
- Function detail still varies across occupations: some occupations already have rich multi-anchor function graphs, while others still use thinner defaults.
- Calibration sources do not replace the core scoring logic by default. ORS, ACS, and most of BTOS remain outside the task-scoring loop because they answer different questions. BTOS contributes to the outer adoption-realization layer through a derived context row, but it does not drive task automability directly.
- It does not claim that exposed work automatically becomes job loss. The entire model exists to avoid that shortcut.
- The classifier is calibrated heuristically against anchor occupations, not trained on a labeled dataset. The same is true for the new state-forecast share mapping in the client: it is a documented calibrated transformation of engine outputs, not a learned probabilistic model.
Structural assumptions
The model embeds several structural assumptions that are not derived from data:
- Logistic adoption. AI adoption within each task cluster follows an S-curve. Adoption could instead be step-function (sudden capability jumps), linear, or multi-modal. The logistic shape is parameterized but the functional form itself is an assumption.
- Monotonic transformation. The transformed-share curve only goes up. The model has no mechanism for AI capabilities regressing, regulatory rollback, or organizational reversion. This is reasonable over a 10-year horizon but is still an assumption about the directionality of technological change.
- Firm rationality. The firm-incentive signal assumes organizations act on economic incentives to automate remaining bottlenecks. This ignores organizational inertia, institutional politics, union resistance, or deliberate decisions to preserve human roles for non-economic reasons.
- Occupation as unit. The model treats an occupation as a meaningful unit of analysis. The same occupation title can span very different work across employers, industries, and geographies. Role variants address this for a small reviewed subset; most occupations are treated as structurally homogeneous.
- Task independence within clusters. Tasks within a cluster are scored independently and aggregated. In practice, automating one task can make adjacent tasks simultaneously easier and more important. The dependency graph captures some of this, but only for explicit edges.
Sensitivity and precision
- Occupation selection dominates the output. The occupation you select determines the task inventory, cluster composition, function anchors, dependency graph, all priors, and all context layers. The questionnaire adjusts the result within a bounded range (friction delta is ±0.15 from the occupation baseline). The assumption sliders stress-test but cannot override the structural occupation profile. If two users select different occupations but give identical questionnaire answers, the results will differ far more than if they select the same occupation and give opposite questionnaire answers.
- Sub-year precision is a rendering artifact. The timeline plots at 0.1-year intervals from subjective Likert-scale inputs. The charts are drawn continuously for readability, but the underlying inputs do not support distinguishing year 3.4 from year 3.5 of an AI adoption curve. Threshold crossings shown in the UI are approximate structural markers, not precise temporal predictions.
- The 10-year forecast has no planned retrospective validation. The calibration framework targets structural plausibility against current labor-market data, not predictive accuracy. There is currently no mechanism to check whether these forecasts were right or wrong. Evidence that would count against the model: if an occupation classified as "displaced" at year 5 shows strong employment growth and stable task composition at that point, or if an occupation classified as "retained" undergoes rapid structural change well ahead of the model's timeline.
Weight provenance
The model contains dozens of hand-tuned weight vectors. Their provenance falls into three categories:
- Empirically motivated. The friction dimension weights (tacit context 0.28, judgment 0.26, accountability 0.18, exception 0.15, inverse document intensity 0.13) are directionally supported by Dallas Fed research on wages in AI-exposed occupations and OECD AI-WIPS data on originality and skill demand. The direction is evidence-based; the specific ratios are proxy estimates.
- Calibrated against the occupation set. Timing thresholds, state-classification boundaries, and the classifier's discrete thresholds were tuned so the 61 shipped occupations produce structurally plausible outputs. Novel occupations or unusual questionnaire profiles may fall into regions of the parameter space that were never tested.
- Proxy defaults. Most other weights (role integrity coefficients, structural necessity terms, firm incentive terms, fate margin terms) are proxy estimates or fallback values — they are not derived from empirical data and there is no external constraint on their specific values. They sum to 1.0 for mathematical hygiene and are documented in the source code, but a different set of reasonable proxy values would produce different outputs.
Input sensitivity band
Point estimates in this model — role integrity at year 5, exposed task share, retained share at the next scenario — chain through many proxy-weight constants and depend on self-reported questionnaire answers that have realistic precision limits. A user who answered one question differently, or whose situation sits between two Likert steps, would get a different number.
The engine exposes a computeResultWithBand function that quantifies this. It reruns the full engine twice more — once with all questionnaire answers shifted up one Likert step, once shifted down — and computes the min and max of six key metrics across the three runs:
- `residual_role_strength_score`
- `exposed_task_share`
- `next_checkpoint_role_integrity`
- `distant_checkpoint_role_integrity`
- `next_checkpoint_retained_share`
- `distant_checkpoint_retained_share`
The result is returned as sensitivity_band on the nominal result object, with _lo and _hi suffixes on each metric name. This is an input-sensitivity band, not a probability interval. It does not represent a range of plausible futures — it represents how much the output moves when the inputs are off by one questionnaire step, which is a realistic estimate of self-report precision. A wide band means the result is sensitive to small input changes and should be read as a rough range. A narrow band means the result is stable across nearby inputs.
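The mechanics can be sketched as follows. The `runEngine` callback, the 1-to-5 Likert clamp, and the answer shape are assumptions; the metric list and the `_lo`/`_hi` naming come from the text:

```javascript
// Hypothetical sketch of the ±1 Likert-step sensitivity band described above.
// The real computeResultWithBand lives in the engine source.
const BAND_METRICS = [
  "residual_role_strength_score",
  "exposed_task_share",
  "next_checkpoint_role_integrity",
  "distant_checkpoint_role_integrity",
  "next_checkpoint_retained_share",
  "distant_checkpoint_retained_share",
];

function computeResultWithBandSketch(runEngine, answers) {
  const shift = (delta) =>
    Object.fromEntries(Object.entries(answers).map(([k, v]) =>
      [k, Math.min(5, Math.max(1, v + delta))])); // clamp to an assumed 1–5 scale
  const runs = [runEngine(answers), runEngine(shift(+1)), runEngine(shift(-1))];
  const band = {};
  for (const metric of BAND_METRICS) {
    const values = runs.map((r) => r[metric]);
    band[`${metric}_lo`] = Math.min(...values);
    band[`${metric}_hi`] = Math.max(...values);
  }
  // Nominal result plus the input-sensitivity band.
  return { ...runs[0], sensitivity_band: band };
}
```

Because the band is the min/max over three deterministic reruns, it measures input sensitivity only; nothing in it models uncertainty about the future itself.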