Beyond Automation: Stanford's New Framework Redefines the Future of AI at Work

A groundbreaking audit of 1.5k workers and 52 AI experts shows we're building the wrong tools. Here's the map to get it right.

ArXiv In-depth Analysis

Towards Dev

~11 min read · June 17, 2025 (Updated: June 26, 2025)

Stanford University's "Future of Work with AI Agents" paper introduces WORKBank, a new framework and dataset auditing automation vs. augmentation. By analyzing worker desires and AI capabilities, the study reveals a major disconnect between tech investment and what the labor market actually needs, proposing a human-centric path forward.

The narrative surrounding AI in the workplace is at a fever pitch. We are told, with a mix of breathless optimism and existential dread, that autonomous AI agents are on the verge of revolutionizing every job, from coding and marketing to law and logistics. The prevailing question in boardrooms and venture capital pitches seems to be a simple, powerful one: "What can we automate next?"

This question, driven by the pursuit of efficiency and profit, has fueled a gold rush to build ever-more capable AI systems. But what if it's the wrong question? What if, in our haste to build the future of work, we've forgotten to consult the very people who will inhabit it?

A landmark new paper from Stanford University, "Future of Work with AI Agents," delivers a stunning reality check. It argues that the current, technology-first approach is not just incomplete; it's dangerously misguided. By focusing solely on technical capability, we risk creating tools that workers don't want, misallocating billions in R&D, and fostering a future of work that diminishes human agency rather than enhancing it.

Overview of the auditing framework and key insights. The framework captures dual perspectives on automation and augmentation by eliciting both worker desires and expert assessments of technological capabilities. It guides participant reasoning through structured prompts and an audio-enhanced interface. We instantiate this framework to build the WORKBank database, enabling a data-driven analysis of worker-centered needs, the desire–capability landscape, the Human Agency Scale (HAS) spectrum, and implications for core human skills.

The paper doesn't just diagnose the problem. It provides a solution: a novel, systematic framework for auditing the workplace, not from the top down, but from the worker up. The result is the most comprehensive map we have to date of the true landscape of opportunity for AI agents — a map that reveals critical mismatches, hidden gems, and a fundamentally more collaborative vision for our AI-powered future.

A New Blueprint for the Future of Work: The WORKBank Project

The Stanford researchers recognized that to understand the real potential of AI, you need to reconcile two powerful forces: what is technologically possible and what is humanly desirable. To capture this, they built WORKBank, a massive database constructed by engaging two distinct groups:

1,500 Domain Workers: Real people from 104 different occupations — from accountants to artists — were surveyed about the specific tasks they perform. Using an innovative audio-enhanced interview process, they shared nuanced views on which tasks they would want an AI agent to take over and where they'd prefer to remain in control.
52 AI Experts: Leading researchers and practitioners in AI agent development were asked to assess the current technical capability of AI to perform those very same tasks.

This dual-perspective approach is a radical departure from the norm. It moves the conversation from a monologue by technologists to a dialogue with the workforce. And to facilitate this dialogue, the researchers introduced their most powerful conceptual tool: The Human Agency Scale (HAS).

The Human Agency Scale (HAS) is the key. It's a shared language to move beyond the crude "automate or not" dichotomy and into a more sophisticated discussion about collaboration.

Think of it like the SAE levels for autonomous driving. We don't just talk about cars that "drive themselves"; we have a nuanced scale from Level 0 (no driving automation) to Level 5 (full automation). The HAS does the same for workplace tasks, but with a crucial "human-first" twist.

Levels of Human Agency Scale (HAS). We introduce the Human Agency Scale (i.e., H1-H5) to quantify the team dynamics and degree of human involvement required. HAS provides a shared language to quantify automation vs. augmentation, complementing the traditionally "AI-first" perspective used in defining levels of automation. Importantly, higher HAS levels are not inherently better: different levels suit different AI roles.

H1: Full Automation (No Human): The AI agent handles the task entirely on its own.
H2: Minimal Human Input: The AI drives the task but needs a few key inputs for optimal performance.
H3: Equal Partnership: The human and AI collaborate closely, outperforming either one alone. This is the realm of true augmentation.
H4: Human-Led with AI Assistance: The human takes primary responsibility but requires AI input to successfully complete the task.
H5: Essential Human Involvement: The AI cannot function without continuous human involvement.

This scale is transformative. It acknowledges that the goal isn't always to reach H1. For many complex, creative, or high-stakes tasks, the optimal outcome might be a balanced H3 partnership. It provides a framework for designing AI not just as a replacement, but as a true collaborator.
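For readers who think in code, the five levels can be written down as a small ordered type. This is only a minimal sketch: the `HAS` enum mirrors the level names listed above, while the `agency_gap` helper and the example ratings are hypothetical and do not come from the paper or the WORKBank data.

```python
from enum import IntEnum


class HAS(IntEnum):
    """Human Agency Scale levels, named as in the list above."""
    H1 = 1  # Full automation: the agent handles the task on its own
    H2 = 2  # Minimal human input
    H3 = 3  # Equal partnership
    H4 = 4  # Human-led with AI assistance
    H5 = 5  # Essential human involvement


def agency_gap(worker_desired: HAS, expert_feasible: HAS) -> int:
    """Hypothetical helper: positive when workers want more human agency
    than experts consider technically necessary."""
    return int(worker_desired) - int(expert_feasible)


# Made-up ratings for one task, for illustration only.
gap = agency_gap(worker_desired=HAS.H4, expert_feasible=HAS.H2)
print(gap)
```

Using an ordered enum rather than free-form strings keeps comparisons such as `HAS.H3 < HAS.H4` meaningful, which matters because the scale describes degrees of human involvement rather than unrelated categories.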

Technical Deep Dive: Deconstructing the Human Agency Scale (HAS)

For those of us building and deploying these systems, the HAS is more than an academic concept — it's a practical design specification. Why is it so much more useful than a simple capability score?

A traditional "automation potential" score, like those used in many economic studies, typically asks, "What percentage of this task's activities can be performed by an algorithm?" This is a purely technical and reductionist view. The HAS, by contrast, captures the nature and quality of the human-computer interaction required.

Consider two tasks:

Task A: Transcribe audio from a meeting into text.
Task B: Mediate a dispute between two team members to find a resolution.

A simple automation metric might score both as having low automation potential if current tech isn't perfect. But the HAS reveals a deeper truth. Task A is an H2 or H1 candidate. The goal is to remove the human as much as possible to achieve speed and accuracy. The AI's role is to replace a tedious human function.

Task B is fundamentally different. It's an H4 or H5 task. An AI might assist by providing relevant documents or suggesting compromise frameworks, but the core work — empathy, nuanced communication, trust-building — is essentially human. The AI's role is to enhance human capabilities, not replace them.

By asking workers to place tasks on this scale, the WORKBank project captures crucial context that a simple capability assessment misses:

Risk and Stakes: Workers desire more agency (H4/H5) in high-stakes decisions.
Enjoyment and Creativity: Tasks that are central to a worker's sense of purpose and enjoyment are preferred as H3 or higher.
Interpersonal Dynamics: Tasks requiring empathy, negotiation, or complex social understanding demand high human agency.

Designing an AI agent for an H2 task is a different engineering challenge than designing for an H3 or H4 task. The latter requires sophisticated human-computer interaction (HCI), explainability, and features that support seamless collaboration. The HAS gives us the vocabulary to define these requirements.

The Four Zones of AI at Work: A GPS for R&D and Investment

By plotting every one of the 844 tasks on two axes — Worker Desire for Automation (low to high) and AI Expert-Rated Capability (low to high) — the study creates a powerful "Desire-Capability Landscape." This map is effectively a GPS for the entire industry, highlighting where we should be accelerating, where we should be cautious, and where the biggest untapped opportunities lie.

Integrating worker and AI expert perspectives divides the automation landscape into four zones: Automation "Green Light" Zone, Automation "Red Light" Zone, R&D Opportunity Zone, and Low Priority Zone. a, Tasks from WORKBank are plotted in this desire-capability landscape. b, We collect Y Combinator (YC) companies and map them to tasks based on the description on their official YC detail pages using gpt-4.1-mini. The average number of YC companies per task shows no significant difference across zones, highlighting the importance of steering more investment toward the Automation "Green Light" Zone and R&D Opportunity Zone. c, We collect AI agent research papers from arXiv and evaluate their applicability to each occupational task in our database using gpt-4.1-mini. Encouragingly, the paper-task mappings are concentrated more in the R&D Opportunity Zone, though increased emphasis on this area remains desirable.

The landscape is divided into four distinct zones:

1. Automation "Green Light" Zone (High Desire, High Capability):

What it is: These are the no-brainers. Tasks that are tedious, repetitive, and workers are eager to offload, and for which the technology is already mature.
Examples: Scheduling appointments, converting file types, generating routine reports.
Implication: This is the low-hanging fruit. Deploying AI agents here offers immediate, widespread productivity gains and improves worker satisfaction. We should be flooring the accelerator in this zone.

2. R&D Opportunity Zone (High Desire, Low Capability):

What it is: The holy grail for researchers. These are tasks that workers desperately want to automate or augment, but where current AI technology falls short.
Examples: Creating complex production schedules, approving operational budgets, arranging for the distribution of materials.
Implication: This is where our brightest minds and R&D dollars should be focused. Solving these problems meets a proven, articulated need and represents the next frontier of valuable AI development.

3. Automation "Red Light" Zone (Low Desire, High Capability):

What it is: The danger zone. Technology makes it possible to automate these tasks, but workers are highly resistant. This is where AI is perceived as a threat to job security, enjoyment, or quality.
Examples: Contacting vendors to build relationships, tracing lost baggage for frustrated customers, creating original designs.
Implication: Proceed with extreme caution. Forcing automation here is a recipe for backlash and failure. The better approach might be to design for high-agency (H3/H4) collaboration, using AI to assist humans rather than replace them.

4. Low Priority Zone (Low Desire, Low Capability):

What it is: The back burner. Workers don't want these tasks automated, and the tech isn't ready anyway.
Examples: Complex, creative, or interpersonal tasks that are core to a professional's identity.
Implication: This is where we should invest the least effort for now. The market isn't asking for it, and the technology isn't there.
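The four-zone split above amounts to a two-threshold classifier, which can be sketched in a few lines. The sketch assumes both scores sit on a 1-5 scale and that 3.0 separates "low" from "high"; that cut-off, the `Task` type, the `zone` function, and every score below are illustrative assumptions, not values taken from WORKBank.

```python
from dataclasses import dataclass

# Assumption: scores are on a 1-5 scale and 3.0 is the low/high cut-off.
THRESHOLD = 3.0


@dataclass(frozen=True)
class Task:
    name: str
    worker_desire: float      # average worker desire for automation
    expert_capability: float  # average expert-rated AI capability


def zone(task: Task, threshold: float = THRESHOLD) -> str:
    """Place a task in one of the four desire-capability zones."""
    high_desire = task.worker_desire >= threshold
    high_capability = task.expert_capability >= threshold
    if high_desire and high_capability:
        return "Green Light"
    if high_desire:
        return "R&D Opportunity"
    if high_capability:
        return "Red Light"
    return "Low Priority"


# Made-up scores for illustration only; these are not WORKBank values.
tasks = [
    Task("Schedule appointments", 4.2, 4.5),
    Task("Approve operational budgets", 3.8, 2.1),
    Task("Create original designs", 1.9, 3.7),
    Task("Hypothetical low-desire, low-capability task", 1.5, 1.8),
]
for t in tasks:
    print(f"{t.name}: {zone(t)}")
```

Keeping the threshold as a parameter makes it easy to check how sensitive a zone assignment is to where the line between "low" and "high" is drawn.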

This framework is not just an academic exercise. It's a powerful diagnostic tool for anyone in the tech industry. What zone does your product or startup operate in?

Four Revelations That Could Reshape the Labor Market

Digging into the WORKBank data yields a series of profound insights that should challenge the core assumptions of technologists, investors, and policymakers.

Revelation 1: Workers Want a Partner, Not a Replacement

Across the board, the primary motivation for wanting automation (cited in 69.4% of pro-automation responses) isn't to do less work, but to "free up time for high-value work." Workers see AI not as a way to become obsolete, but as a tool to offload the monotonous and repetitive parts of their jobs so they can focus on what humans do best: strategy, creativity, and complex problem-solving. This aligns perfectly with the augmentation-focused vision of AI.

The data also reveals a fascinating disconnect. When comparing WORKBank's findings to actual usage data from Anthropic's Claude.ai, the study found that the top 10 occupations with the highest desire for automation account for a mere 1.26% of total usage. This suggests that current AI usage patterns are skewed by early adopters and don't reflect the vast, latent demand across the broader economy. There's a huge, underserved market of workers waiting for the right tools.

First-hand data from domain workers reveals positive attitudes towards AI agent automation on certain occupational tasks, particularly due to perceived benefits such as freeing up time for high-value work. However, the sentiment varies notably across sectors. a, Automation desire scores A_{w}(t) over 844 occupational tasks, ranked based on WORKBank data, together with sector-specific breakdowns. The distribution indicates a mixed attitude, revealing high diversity of needs and preferences of workers that should be considered in AI agent R&D. b, Reported reasons for responses with A_{w}(t) ≥ 3. The most selected reason, "Automating the task would free up my time for high-value work," accounts for 69.38% of the responses. c, Comparison with usage data from Claude.ai, an LLM-based chatbot (Dec 2024-Jan 2025, from Handa et al. (2025)), shows that the top 10 occupations with the highest average automation desire represent only 1.26% of total usage. This highlights the importance of directly soliciting worker input, as usage data may lag behind actual workplace needs.
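To make the two statistics in this caption concrete, here is a rough sketch of how a per-task desire score and the share of each stated reason could be computed from raw survey rows. It assumes each response carries a 1-5 rating and a list of selected reasons, and that the "≥ 3" filter applies per response; the row layout, the function names, and all data are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up responses: (task, rating on a 1-5 scale, selected reasons).
responses = [
    ("Generate routine reports", 5, ["free up time", "task is repetitive"]),
    ("Generate routine reports", 4, ["free up time"]),
    ("Create original designs", 2, []),
    ("Create original designs", 3, ["task is repetitive"]),
]


def desire_scores(rows):
    """Average the ratings per task to get one desire score per task."""
    by_task = defaultdict(list)
    for task, rating, _ in rows:
        by_task[task].append(rating)
    return {task: mean(ratings) for task, ratings in by_task.items()}


def reason_shares(rows, min_rating=3):
    """Share of favourable responses (rating >= min_rating) citing each reason."""
    favourable = [reasons for _, rating, reasons in rows if rating >= min_rating]
    if not favourable:
        return {}
    counts = Counter(r for reasons in favourable for r in set(reasons))
    return {reason: n / len(favourable) for reason, n in counts.items()}


print(desire_scores(responses))
print(reason_shares(responses))
```

Because a respondent can select several reasons, the shares are computed per response and need not sum to one.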

Revelation 2: The Glaring Mismatch Between Silicon Valley's Bet and Worker Needs

This is perhaps the most damning finding. The researchers analyzed Y Combinator (YC) companies as a proxy for current startup investment. The results were shocking: 41% of YC company-to-task mappings fall into the "Red Light" or "Low Priority" zones.

Let that sink in. A significant portion of venture capital and entrepreneurial energy is being poured into building AI agents for tasks that workers don't want automated, some of which the tech can't yet handle either.

Meanwhile, the "Green Light" and "R&D Opportunity" zones — where real worker demand and massive value creation lie — remain comparatively under-addressed. This is a colossal market failure in the making, driven by a technology-first mindset that is out of sync with human reality. The data screams for a realignment of investment priorities toward solving problems that people actually have.

Revelation 3: The Future is Collaborative — The Dominance of "Equal Partnership"

When looking at the desired Human Agency Scale (HAS) levels, a clear pattern emerges. The most dominant desired level across occupations is H3 (Equal Partnership), which is the top choice for 45.2% of the 104 occupations studied. This isn't a call for full automation; it's a resounding vote for collaboration.

However, the study also surfaces a critical point of future friction. In general, workers prefer higher levels of human agency than AI experts deem technically necessary. As AI capabilities advance (shifting tasks from low-capability to high-capability), many tasks may cross into the "Red Light" zone, where workers resist the level of automation that technology now allows. Navigating this tension will be a key challenge for responsible AI deployment.

Top 10 occupations with the largest discrepancies between the worker-desired Human Agency Scale levels H_{w}(t) and AI expert-assessed feasible levels H_{e}(t). The discrepancy is measured by the Jensen–Shannon divergence (JSD) between these two distributions, where distributions are computed based on all annotated tasks within each occupation.
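The Jensen–Shannon divergence used here has a standard definition: with M = (P + Q) / 2, JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M). A minimal sketch of that calculation for two sets of HAS ratings follows. The base-2 logarithm (which bounds the result between 0 and 1), the way ratings are pooled into a distribution, and the sample ratings are all assumptions for illustration; the caption does not specify them.

```python
import math
from collections import Counter

LEVELS = (1, 2, 3, 4, 5)  # H1..H5


def distribution(ratings):
    """Turn a list of HAS levels into a probability distribution over H1..H5."""
    counts = Counter(ratings)
    total = len(ratings)
    return [counts[level] / total for level in LEVELS]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence between p and q via their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Made-up ratings for one occupation's tasks, for illustration only.
worker_levels = [3, 4, 4, 5, 3, 4]
expert_levels = [2, 2, 3, 3, 1, 2]
jsd = jensen_shannon_divergence(distribution(worker_levels),
                                distribution(expert_levels))
print(round(jsd, 3))
```

Unlike plain KL divergence, JSD is symmetric and stays finite when one side assigns zero probability to a level, which makes it a natural fit for comparing sparse five-bucket histograms.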

Revelation 4: The Great Skill Reset — From Information Crunching to Human Connection

What happens to the value of human skills in an AI-augmented world? By mapping tasks to the core skills required to perform them (as defined by the O*NET database), the researchers provide a glimpse into a potential "Great Skill Reset." They compare the current economic value of skills (proxied by average wages) with the required human agency as AI enters the workplace.

Comparing skill rankings by average wage and required human agency. Each line represents a skill (Generalized Work Activity) mapped from O*NET tasks. Based on the skill-task mapping, we compute the average wage using data from the U.S. Bureau of Labor Statistics (May 2024) and the average expert-assessed human agency level H_{e}(t) to indicate the degree of human involvement required as AI agents enter the workplace. Skills are ranked by average wage (left) and average required human agency (right). The figure highlights the top five skills with the largest upward (green) and downward (red) shifts in rank, suggesting a potential shift in valued workplace skills — from information processing toward interpersonal and organizational competencies.
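The rank comparison described in this caption can be sketched as two sorts and a subtraction. The skill names below are taken from the article, but the wage and agency numbers are invented purely to show the mechanics, and the `rank` and `rank_shifts` helpers are hypothetical names rather than anything from the paper.

```python
def rank(values):
    """Rank keys from highest value (rank 1) to lowest."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {skill: position for position, skill in enumerate(ordered, start=1)}


def rank_shifts(wage, agency):
    """Positive shift: the skill ranks higher by required human agency
    than it does by average wage."""
    wage_rank, agency_rank = rank(wage), rank(agency)
    return {skill: wage_rank[skill] - agency_rank[skill] for skill in wage}


# Invented numbers for illustration only; not BLS or WORKBank figures.
average_wage = {
    "Analyzing Data or Information": 45.0,
    "Organizing, Planning, and Prioritizing Work": 35.0,
    "Training and Teaching Others": 30.0,
}
average_agency = {
    "Analyzing Data or Information": 2.4,
    "Organizing, Planning, and Prioritizing Work": 3.6,
    "Training and Teaching Others": 3.9,
}

for skill, shift in rank_shifts(average_wage, average_agency).items():
    print(f"{skill}: {shift:+d}")
```

With these invented inputs, the data-analysis skill drops in rank and the teaching skill rises, which is the direction of movement the figure highlights.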

The findings are stark:

Shrinking Demand for Information Processing: Skills like "Analyzing Data or Information," while currently highly compensated, rank much lower on the required human agency scale. These are precisely the skills that AI agents excel at, suggesting their economic premium may decline.
Greater Emphasis on Interpersonal & Organizational Skills: Skills like "Organizing, Planning, and Prioritizing Work," "Training and Teaching Others," and "Communicating with Supervisors, Peers, or Subordinates" surge to the top of the human agency ranking. These are skills rooted in human interaction, coordination, and leadership — areas where AI is a tool, not a replacement.

The implication is clear: the most valuable human workers of the future won't be the ones who can process information the fastest, but the ones who can lead, teach, empathize, and organize most effectively. This has profound consequences for our education systems and corporate training programs. Are we preparing people for the skills of the past or the skills of the future?

Why This Matters: A Call for Human-Centered AI Development

The Stanford "Future of Work" paper is more than a study; it's a manifesto. It calls for a fundamental shift in how we approach the development of AI for the workplace — a shift from a purely technological perspective to a deeply human-centered one.

The path of least resistance is to continue down the current road: building what's technically possible and forcing it onto the market. This study shows that path leads to wasted investment, worker alienation, and a future that fails to live up to AI's true potential.

The better path, illuminated by this research, is to use this new map to guide our efforts.

For AI Developers & Entrepreneurs: Focus your efforts on the "Green Light" and "R&D Opportunity" zones. Solve the problems workers are asking you to solve. When building for tasks in the "Red Light" zone, design for H3/H4 collaboration and human agency.
For Investors: Use the Desire-Capability Landscape as a due diligence framework. Is this startup addressing a real, articulated need, or are they pushing a technical solution in search of a problem?
For Workers & Educators: The "Great Skill Reset" is coming. Double down on developing uniquely human skills: communication, leadership, creative problem-solving, and empathy. These are the competencies that will be augmented, not automated.

The future of work is not yet written. We have a choice. We can build a future where AI is a force that displaces and diminishes, or one where it acts as a powerful collaborator, freeing us from drudgery and empowering us to reach new heights of creativity and impact. For the first time, thanks to this work, we have a data-driven map to help us choose wisely.

Key Takeaways

There is a major disconnect between what AI can do, what workers want it to do, and where R&D investment is flowing.
The Human Agency Scale (HAS) provides a crucial framework for moving beyond automation vs. augmentation to a nuanced discussion of human-AI collaboration.
41% of AI startup efforts (proxied by YC company-to-task mappings) are focused on tasks workers don't want automated, including some where the tech is not ready either.
The most common desire among workers is for an "Equal Partnership" (H3) with AI, using it to free up time for high-value tasks.
The future of work will likely see a "Great Skill Reset," devaluing routine information processing skills and increasing the premium on interpersonal, organizational, and leadership skills.

References

Shao, Y., Zope, H., Jiang, Y., et al. (2025). Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce. arXiv:2506.06576v2.

