Flagship research · v1.0

The SMB Agent Economy Benchmark 2026.

How 30–300-person firms are actually building, buying and operating AI agents in 2026. Eight benchmarks we observe across our engagements, nine recurring failure patterns, and a three-stage operating action set. Anchored against Microsoft platform documentation, the analyst frame, and the named regulator work. Free, no gating, methodology-first.

Published July 21, 2026 · ~22 minute read · By The Star Nova AI Specialists

What this benchmark is — and isn’t

This is a synthesis of patterns observed across our own agent inventory sweeps, Copilot Studio reviews, multi-agent orchestration migrations and governance retrofits with 30–300-person firms running Microsoft 365 + Power Platform across the UK, EU, US and MENA. It is anchored against Microsoft platform documentation, the named analyst frames (Forrester, Gartner, McKinsey), and our own shipped governance pack and ROI methodology. It is not a primary survey with an N=, not vendor-funded, not legal or fiscal advice.

When we say “~60%” or “9–14”, these are pattern descriptions from a non-random sample of engagements, presented for diagnostic value, not statistical inference. Runtime choice, governance posture and retire decisions remain a matter for the practice running them. This benchmark helps you frame the questions; it does not answer them.

Executive summary

Eight benchmarks, in one page, for the board pack.

  1. SMBs underestimate their live agent inventory by 3–5×. 9–14 live agents in a typical 100-person SMB tenant on first inventory sweep.
  2. The wrong tool is picked 60% of the time: Copilot Studio where Power Automate fits, and vice versa. ~60% of new SMB agent builds we audit are on the wrong runtime for the workload.
  3. Shadow-agent procurement now outpaces shadow-SaaS, because the agent shipped on the SaaS you already paid for. ~70% of SMB AI agent footprint enters via a SaaS update, not a procurement decision.
  4. Agents carry the same governance load as Copilot but get a fraction of the treatment. <15% of SMB agents we audit have a use-register entry, owner, eval set and incident log.
  5. Agents go live with no offline evaluation set; failures surface in production, on customer turns. ~85% of SMB agents in production have no offline eval set or regression-test corpus.
  6. SMBs are hand-rolling multi-agent orchestration that the platform now ships as primitives. ~50% of multi-step SMB agent workflows we see are hand-rolled where 2026 platform primitives would do it.
  7. Agent ROI gets credited to the wrong department, or to no department at all. ~3 of 4 agent projects we review have no defensible ROI attribution to a P&L owner.
  8. Agents accumulate, nothing gets decommissioned; cost and risk both compound. ~0% of SMBs we audit have ever formally decommissioned an agent.

Benchmarks

Each benchmark: the pattern, what we observe, the public anchor it sits against, and what to actually do.

Benchmark 1
9–14
live agents in a typical 100-person SMB tenant on first inventory sweep

SMBs underestimate their live agent inventory by 3–5×

The pattern. When asked "how many AI agents do you run?" the SMB buyer answers 2–3 — typically a customer-facing chatbot, an internal HR Q&A bot, and "Copilot." A 30-minute sweep across Copilot Studio, Power Platform connectors, browser extensions and SaaS integrations surfaces 9–14 agent-shaped workflows in flight.

What we observe. The delta lives in three places: SaaS-bundled agents that shipped on the next vendor update (HubSpot Breeze, Salesforce Agentforce, Intercom Fin, Notion AI, Zendesk Answer Bot, ServiceNow Now Assist), browser-extension agents installed by individuals (Magical, Sider, Merlin, Monica), and Power Automate flows that quietly added an LLM step in 2025. None of these surface on the IT inventory because none of them required IT to install.

Anchored against: Microsoft Power Platform Release Wave 1 2026 (Copilot Studio expansion into multi-agent orchestration); Microsoft Build 2026 "year of agents" keynote; Forrester State of AI Agents 2026 (median enterprise agent count 47, SMB segment under-studied).

So what. Run an agent inventory sweep as a discrete artefact, not as a sub-section of the use register. Power Platform admin centre + browser-extension audit + SaaS subscription review. Repeat quarterly. You cannot govern, secure or measure what you have not counted.
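
One way to keep the sweep as a discrete artefact is a small consolidated list. The sketch below is illustrative only: it assumes the findings from each source have been copied by hand into simple records, and the record shape, source labels and sample agent names are hypothetical rather than any export format of the Power Platform admin centre.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRecord:
    name: str    # agent or workflow name as found during the sweep
    source: str  # where it was found: "power_platform", "browser_extension" or "saas"


def consolidate(records: list[AgentRecord]) -> list[AgentRecord]:
    """Drop duplicates (same name, case-insensitive), keeping the first sighting."""
    seen: set[str] = set()
    unique: list[AgentRecord] = []
    for record in records:
        key = record.name.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def underestimate_multiple(stated_count: int, found_count: int) -> float:
    """How many times larger the swept inventory is than the buyer's stated count."""
    if stated_count <= 0:
        raise ValueError("stated_count must be positive")
    return found_count / stated_count


# Hypothetical sweep output, for illustration only.
sweep = [
    AgentRecord("Support chatbot", "power_platform"),
    AgentRecord("HR Q&A bot", "power_platform"),
    AgentRecord("support chatbot", "saas"),  # same agent sighted twice
    AgentRecord("Invoice triage flow", "power_platform"),
    AgentRecord("Meeting summariser", "browser_extension"),
]
inventory = consolidate(sweep)
by_source = Counter(record.source for record in inventory)
multiple = underestimate_multiple(stated_count=2, found_count=len(inventory))
```

Matching on name alone is a deliberate simplification; in practice two sightings of the same agent may carry different names and need a manual merge.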

Benchmark 2
~60%
of new SMB agent builds we audit are on the wrong runtime for the workload

The wrong tool is picked 60% of the time — Copilot Studio where Power Automate fits, and vice versa

The pattern. Copilot Studio gets reached for when the work is a deterministic 6-step flow with one approval gate; Power Automate gets reached for when the work needs conversational reasoning across unstructured artefacts. Both run, neither is cost-defensible at scale, and the chosen tool encodes the wrong cost curve for the eventual workload.

What we observe. The cheap diagnostic: if the workflow has a fixed sequence of steps and structured inputs, the answer is Power Automate (with an AI Builder card if needed). If the workflow has variable steps, requires multi-turn clarification, or composes results from multiple unstructured sources, the answer is Copilot Studio (or a custom agent on Foundry). The decision matters because the per-invocation cost on Copilot Studio is 5–20× a Power Automate run, and the per-flow cost on Power Automate scales linearly with the steps you tried to encode.

Anchored against: Microsoft Copilot Studio + Power Automate pricing pages (2026 update); Microsoft Power Platform "decision matrix" guidance for partners; Microsoft AI Builder per-call meter; Forrester Wave for Low-Code AI Development Platforms 2026.

So what. Adopt a one-page decision matrix for new agent builds: structured vs unstructured, fixed vs variable steps, deterministic vs reasoning. Apply it before any prototype starts. We publish ours in the Copilot Studio patterns post; copy it or write your own.
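
The cheap diagnostic above can be written down as a few lines of logic. This is a minimal sketch, not the published matrix: the `Workload` fields and the `recommend_runtime` function are hypothetical names, and letting reasoning signals take precedence over structured inputs is our assumption for cases the diagnostic does not cover.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    fixed_steps: bool          # the sequence of steps is known in advance
    structured_inputs: bool    # inputs arrive as forms, rows or typed fields
    multi_turn: bool           # needs back-and-forth clarification with a user
    unstructured_sources: int  # number of unstructured sources composed into the answer


def recommend_runtime(w: Workload) -> str:
    """Apply the diagnostic; returns a runtime name or a manual-review flag."""
    # Assumption: any reasoning signal outweighs a fixed, structured shape.
    needs_reasoning = (not w.fixed_steps) or w.multi_turn or w.unstructured_sources > 1
    if needs_reasoning:
        return "Copilot Studio"
    if w.fixed_steps and w.structured_inputs:
        return "Power Automate"
    # Fixed steps but unstructured input: the diagnostic does not decide this case.
    return "review manually"


approval_flow = Workload(fixed_steps=True, structured_inputs=True,
                         multi_turn=False, unstructured_sources=0)
claims_assistant = Workload(fixed_steps=False, structured_inputs=False,
                            multi_turn=True, unstructured_sources=3)
print(recommend_runtime(approval_flow))
print(recommend_runtime(claims_assistant))
```

The manual-review branch is there on purpose: a matrix that always returns an answer hides the workloads that most need a design conversation.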

Benchmark 3
~70%
of SMB AI agent footprint enters via a SaaS update, not a procurement decision

Shadow-agent procurement now outpaces shadow-SaaS — because the agent shipped on the SaaS you already paid for

The pattern. In 2024 the shadow-AI conversation was about end-users opening ChatGPT. In 2026 it is about agents that activated themselves on the next vendor release: Salesforce Agentforce switched on under existing Sales Cloud licences, HubSpot Breeze rolled into Service Hub, Intercom Fin sits inside Inbox by default. There was no procurement gate because there was no procurement event.

What we observe. The SaaS-bundled agent typically retains your data through the vendor’s standard contract, not through an AI-specific addendum. Most SMB DPAs do not contemplate LLM inference on customer data; the SaaS vendor is now doing exactly that. The EU AI Act Article 4 literacy obligation, the use-register entry and the Annex III triage have not been done, because no one inside the SMB was asked to do them.

Anchored against: Salesforce Agentforce, HubSpot Breeze, Intercom Fin, Zendesk Answer Bot, Notion AI, ServiceNow Now Assist 2025–2026 product announcements; EDPB 2025 opinion on LLM inference under existing data processing agreements; ENISA 2025 supply-chain risk note on agentic SaaS.

So what. Build a SaaS-bundled-agent review into your quarterly vendor cadence. Pull the AI feature list from every renewal-eligible SaaS, score it as a deployer obligation, decide before the next renewal whether to opt out, gate-open, or re-paper.

Benchmark 4
<15%
of SMB agents we audit have a use-register entry, owner, eval set and incident log

Agents carry the same governance load as Copilot but get a fraction of the treatment

The pattern. When Microsoft 365 Copilot rolled out, SMBs built a one-time governance pack: AUP, vendor DPA, use register, literacy programme. Agents — because they ship one at a time, often by line-of-business teams — inherit none of that scaffolding. Each agent is a new AI system that has not been classified, not been Annex-III-triaged, not been entered in the register.

What we observe. The minimum-viable per-agent governance artefact is four lines in the register (name, owner, purpose, data classes touched), one rationale paragraph (Annex III in-scope? FRIA triggered? human-oversight model?), and a short incident-log section. Most SMBs have none of these for any agent built since January 2026. The Copilot pack covers Copilot. Everything else is uncovered.

Anchored against: EU AI Act Art. 26 (deployer obligations), Art. 27 (FRIA), Art. 50 (transparency); Microsoft Responsible AI Standard v3 (2025); ICO AI auditing framework; Forrester State of AI Governance 2026.

So what. Extend the existing Copilot governance pack to be agent-aware. Per-agent register entry, per-agent rationale, per-agent owner, per-agent incident log. The artefact is short. The discipline of writing it forces the design conversation that should have happened before deployment.
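
To show how short the per-agent artefact is, here is one possible shape for it. This is a sketch under our own assumptions: the class and field names are hypothetical, and the register could just as well live in a spreadsheet or a SharePoint list.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Incident:
    occurred_on: date
    summary: str


@dataclass
class AgentRegisterEntry:
    # The four register lines named in the text.
    name: str
    owner: str
    purpose: str
    data_classes: list[str]
    # One rationale paragraph: Annex III scope, FRIA trigger, human-oversight model.
    rationale: str = ""
    # An empty log is valid: it means no incidents have been recorded yet.
    incident_log: list[Incident] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of required fields that are still empty."""
        required = {
            "name": self.name,
            "owner": self.owner,
            "purpose": self.purpose,
            "data_classes": self.data_classes,
            "rationale": self.rationale,
        }
        return [label for label, value in required.items() if not value]


# Hypothetical entry for illustration: no owner or rationale recorded yet.
entry = AgentRegisterEntry(
    name="Support triage agent",
    owner="",
    purpose="Route inbound tickets to the right queue",
    data_classes=["customer contact data"],
)
print(entry.missing_fields())
```

A check like `missing_fields` is enough to turn the register from a document into a gate: an entry with gaps does not ship.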

Benchmark 5
~85%
of SMB agents in production have no offline eval set or regression-test corpus

Agents go live with no offline evaluation set — failures surface in production, on customer turns

The pattern. A traditional deterministic flow gets unit tests. An agent gets a demo to the steering committee and a thumbs-up. The agent then meets reality — ambiguous user phrasing, edge-case data, prompt-injection in inbound content, model upgrades that silently change behaviour. The first time the gap shows up is the first customer escalation.

What we observe. The minimum viable eval loop for an SMB-scale agent is a 30–80 turn corpus, hand-curated, covering the top 5–10 expected intents plus 5–10 deliberate edge cases. Run it nightly. Diff outputs across model upgrades. Sample 1–3% of live production turns weekly into the corpus. Total weekly maintenance cost: ~2 hours. Companies that ship this catch behavioural drift before the customer does. Companies that do not ship it learn about drift from the support queue.

Anchored against: Microsoft Azure AI Foundry evaluation tooling (2026 GA); Microsoft Agent Framework eval primitives; Anthropic + OpenAI published guidance on agent eval; Gartner Hype Cycle for Composable AI 2026 (eval discipline named as the through-the-trough lever).

So what. Make a 30-turn eval corpus a release-gate artefact for every agent. No corpus, no production. Nightly run, weekly review, sample-from-production into the corpus. This is the single highest-leverage operating discipline available to an SMB agent programme.
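
Two pieces of that loop, the diff across runs and the weekly production sample, are small enough to sketch. The code below is illustrative and assumes outputs are stored as plain strings keyed by a turn ID; both function names are hypothetical. Exact-string comparison is a crude stand-in for whatever grading your evaluation tooling provides, since generated text can vary without the behaviour changing.

```python
import random


def diff_runs(baseline: dict[str, str], candidate: dict[str, str]) -> list[str]:
    """Return the IDs of corpus turns whose output changed or went missing."""
    return sorted(
        turn_id
        for turn_id in baseline
        if candidate.get(turn_id) != baseline[turn_id]
    )


def sample_production_turns(turns: list[str], rate: float, seed: int = 0) -> list[str]:
    """Pick a random fraction of live turns to review and add to the corpus."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    if not turns:
        return []
    k = max(1, round(len(turns) * rate))
    return random.Random(seed).sample(turns, k)


# Hypothetical outputs before and after a model upgrade.
before = {"t1": "refund approved", "t2": "escalate to human"}
after = {"t1": "refund approved", "t2": "refund approved"}
changed = diff_runs(before, after)

# Sample 2% of a hypothetical week of 200 production turns.
week = [f"turn-{i}" for i in range(200)]
picked = sample_production_turns(week, rate=0.02)
```

The fixed seed keeps the weekly sample reproducible for review; change it each week so the same turns are not picked repeatedly.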

Benchmark 6
~50%
of multi-step SMB agent workflows we see are hand-rolled where 2026 platform primitives would do it

SMBs are hand-rolling multi-agent orchestration that the platform now ships as primitives

The pattern. Copilot Studio multi-agent orchestration, Microsoft Agent Framework, Azure AI Foundry connected agents, and AutoGen-on-Foundry shipped or matured in 2025–2026. SMBs that started agent work in 2024 wrote their own glue: a Power Automate flow calling a Copilot Studio agent calling a Logic App calling a function calling another agent. The glue is fragile, undocumented, and only one of two developers can maintain it.

What we observe. The retrofit cost is real but bounded. Most hand-rolled glue can be migrated to platform primitives in a 3–5 day sprint per multi-agent workflow. The result is fewer surface areas to audit, observability for free, native handoff semantics, and a system the next consultant can pick up without a tribal-knowledge handover.

Anchored against: Microsoft Agent Framework GA (Build 2026); Copilot Studio connected agents and multi-agent orchestration (Release Wave 1 2026); Azure AI Foundry agent service; Semantic Kernel agent abstractions; Forrester Wave for Low-Code AI Development Platforms 2026.

So what. Audit every multi-agent workflow against the platform primitives shipped in 2026. Where a primitive exists, migrate. Where one is coming (release-notes confirmed), schedule the migration into the roadmap rather than building more bespoke glue on top.

Benchmark 7
~3 of 4
agent projects we review have no defensible ROI attribution to a P&L owner

Agent ROI gets credited to the wrong department — or to no department at all

The pattern. The Copilot ROI conversation evolved from "hours saved across the company" to "hours saved per role, by team, with a CFO-grade business case". Agents have regressed to the earlier framing. The customer-support agent saved X hours — but those hours sit on the support team's line, while the build cost sits on IT, and the licence sits on procurement, and nobody can point to the net P&L line that moved.

What we observe. Three corrections fix the bulk of the misattribution. First, name the P&L owner before building — not after. Second, decide attribution between deflection (didn't need a human), augmentation (faster human turn) and elimination (the role does less of this task now). Third, write the counterfactual: what would have happened without the agent, in the same week, in the same volume? The third one is the one nobody does and it is the one CFOs ask for.

Anchored against: Microsoft Copilot Studio business value calculator (2026); Forrester TEI for AI Agents 2026; McKinsey "Operating model for the agentic enterprise" (2026); our own SMB ROI methodology at /ai-roi-calculator.

So what. Add an "Agent ROI sheet" to every agent build kickoff: P&L owner, attribution model (deflect / augment / eliminate), baseline volume, counterfactual statement, measurement cadence. The sheet is one page. The discipline of completing it removes 70% of the post-deployment ROI argument.
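
The sheet can be captured as a small record with a completeness check. This is a sketch, not our ROI methodology: the class name, field names and the `gaps` check are hypothetical, and the three attribution values simply mirror the deflect / augment / eliminate split described above.

```python
from dataclasses import dataclass
from enum import Enum


class Attribution(Enum):
    DEFLECT = "deflect"      # the task no longer needs a human
    AUGMENT = "augment"      # the human turn is faster
    ELIMINATE = "eliminate"  # the role does less of this task


@dataclass
class AgentRoiSheet:
    agent: str
    pnl_owner: str
    attribution: Attribution
    baseline_volume_per_week: int
    counterfactual: str       # what would have happened without the agent
    measurement_cadence: str  # for example "monthly"

    def gaps(self) -> list[str]:
        """List the sheet fields that would block a kickoff sign-off."""
        problems = []
        if not self.pnl_owner.strip():
            problems.append("pnl_owner")
        if self.baseline_volume_per_week <= 0:
            problems.append("baseline_volume_per_week")
        if not self.counterfactual.strip():
            problems.append("counterfactual")
        if not self.measurement_cadence.strip():
            problems.append("measurement_cadence")
        return problems


# Hypothetical sheet with the counterfactual left blank, the most common gap.
sheet = AgentRoiSheet(
    agent="Customer-support agent",
    pnl_owner="Head of Support",
    attribution=Attribution.DEFLECT,
    baseline_volume_per_week=400,
    counterfactual="",
    measurement_cadence="monthly",
)
print(sheet.gaps())
```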

Benchmark 8
~0%
of SMBs we audit have ever formally decommissioned an agent

Agents accumulate, nothing gets decommissioned — cost and risk both compound

The pattern. New agents ship. Old agents stay. The Q3 2024 sales-prospecting agent that nobody uses sits idle, still has API keys, still has data access, and still costs $40/month on an inactive licence. Multiply by 9–14 agents per tenant, and the unmanaged agent footprint is non-trivial in cost, security surface, and audit liability.

What we observe. The retire-rate is zero because there is no quarterly review that asks "is this agent still earning its keep?" The criteria for retirement are not hard — usage frequency, ROI realised vs projected, ownership still claimed, governance still current. The discipline of running the review, and acting on its output, is the missing piece.

Anchored against: Microsoft Power Platform admin centre usage telemetry; Copilot Studio analytics; Forrester State of AI Agents 2026 (retire-rate listed as the single most-missing operating discipline); our own agent inventory sweep methodology.

So what. Run a quarterly agent retire review. Four criteria, four columns, fifteen agents to a page. Agents that fail two criteria get a 30-day notice; agents that fail three get decommissioned this quarter. The retire-rate is the closest agent-economy equivalent of cost discipline in software portfolios.
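
The decision rule is simple enough to express directly. The sketch below is illustrative: the class and field names are hypothetical, each criterion is reduced to a pass/fail flag that a reviewer sets, and treating four failures the same as three is our assumption.

```python
from dataclasses import dataclass


@dataclass
class RetireReview:
    agent: str
    used_recently: bool       # usage frequency above whatever floor you set
    roi_on_track: bool        # ROI realised vs projected
    owner_confirmed: bool     # someone still claims ownership
    governance_current: bool  # register entry and rationale are up to date

    def failures(self) -> int:
        """Count how many of the four criteria the agent fails."""
        checks = (self.used_recently, self.roi_on_track,
                  self.owner_confirmed, self.governance_current)
        return sum(1 for passed in checks if not passed)

    def decision(self) -> str:
        failed = self.failures()
        if failed >= 3:  # assumption: four failures are handled like three
            return "decommission this quarter"
        if failed == 2:
            return "30-day notice"
        return "keep"


# Hypothetical review rows for illustration.
idle_prospector = RetireReview("Sales-prospecting agent", False, False, False, True)
hr_bot = RetireReview("HR Q&A bot", True, False, False, True)
print(idle_prospector.decision())
print(hr_bot.decision())
```

Keeping the criteria as reviewer-set flags, not computed metrics, matches the point of the review: the judgement is quick, and the discipline lies in acting on it.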

The pattern catalogue

Nine recurring shapes we name and watch for during agent portfolio audits. Diagnostic shorthand for fast triage.

The inventory-of-three

Buyer names 2–3 agents off the top of their head. The actual inventory is 9–14. Sweep surfaces the delta in 30 minutes.

Where we see it: Every first agent audit. Universal across verticals.

Implication: Run the sweep before the strategy. Strategy on top of a wrong inventory is wasted.

The wrong-runtime trap

Copilot Studio used for a 6-step deterministic flow. Power Automate used for conversational reasoning. The chosen tool encodes the wrong cost curve.

Where we see it: ~60% of agent builds we triage. Common across the Microsoft-stack SMB segment.

Implication: One-page decision matrix before any prototype. Structured + fixed + deterministic → Power Automate. Unstructured + variable + reasoning → Copilot Studio.

The SaaS-bundled bypass

New agent ships on the next SaaS update under existing licence. No procurement event, no IT review, no governance entry.

Where we see it: Salesforce, HubSpot, Intercom, Zendesk, Notion, ServiceNow customers. ~70% of SMB agent footprint.

Implication: Quarterly SaaS-AI-feature review. Each agent that ships gets a deployer-obligations decision before the next renewal.

The governance-pack gap

Copilot governance pack written. Agents shipped after the pack inherit none of it. Per-agent register, rationale, owner, incident log all missing.

Where we see it: 85% of SMBs that completed a Copilot governance retrofit in 2024–2025.

Implication: Extend the pack to be agent-aware. Per-agent artefact is short. Discipline of writing it is the lever.

The demo-to-prod skip

Agent shown in a steering-committee demo, thumbs-up given, agent in production within a week. No eval corpus, no regression set, no measurement plan.

Where we see it: 85% of SMB agents in production.

Implication: 30–80 turn eval corpus is a release-gate artefact. No corpus, no production. Nightly run, weekly sample-from-prod.

The hand-rolled-glue debt

Multi-agent orchestration written as a Power Automate flow calling a Copilot Studio agent calling a Logic App calling a function calling another agent. One person can maintain it.

Where we see it: ~50% of multi-step SMB agent workflows started before mid-2025.

Implication: Audit against 2026 platform primitives. Migrate where a primitive exists. Schedule where a primitive is coming.

The orphan-P&L misattribution

Hours saved sit on one team’s line, build cost on another, licence on procurement, ROI argument unfinished. CFO declines the next agent budget.

Where we see it: ~3 of 4 SMB agent projects we review.

Implication: Name P&L owner at kickoff. Attribution model (deflect / augment / eliminate). Counterfactual statement. One page. Mandatory.

The zero-retire ratchet

Agents accumulate. Nothing decommissioned. Cost, security surface and audit liability all compound.

Where we see it: ~100% of SMBs with more than 6 months of agent build activity.

Implication: Quarterly retire review. Four criteria, four columns. Agents failing two → 30-day notice; failing three → decommissioned this quarter.

The "AI strategy" without an agent strategy

Board signs off "AI strategy". Underneath it is a Copilot rollout plan. Agents — the operating unit of 2026 AI — are not mentioned. The strategy is a year behind its own date.

Where we see it: Mid-2024 strategy documents that have not been refreshed against the 2026 platform reality.

Implication: Refresh the AI strategy with an explicit agent strategy: build-vs-buy, runtime choice, governance posture, retire discipline, eval discipline, multi-agent posture.

Recommendations, by portfolio stage

The action set depends on where you are. Three stages, four actions each. No 50-item playbook.

Inventorying (0–3 months, no current agent picture)

SMBs that have shipped agents one at a time without a portfolio view

  • Run a 30-minute agent inventory sweep: Power Platform admin centre, Copilot Studio analytics, browser-extension audit, SaaS-AI-feature review per renewal-eligible vendor.
  • Score each agent against a one-page rubric: runtime correctness, P&L owner named, register entry exists, eval corpus exists, retire-criteria documented.
  • Identify the top 3 mis-runtimed agents (most expensive on the wrong runtime). Schedule replatforming in the next quarter.
  • Name one internal agent lead with portfolio accountability. Not a builder, an owner.

Operating (3–12 months in)

SMBs with an inventory and a lead; building the operating discipline

  • Make a 30–80 turn eval corpus a release-gate artefact for every new agent. Nightly run, weekly sample-from-prod. No corpus, no production.
  • Adopt platform primitives for multi-agent orchestration. Migrate hand-rolled glue to Copilot Studio connected agents, Agent Framework or Foundry agent service one workflow at a time.
  • Extend the Copilot governance pack to be agent-aware. Per-agent register entry, per-agent rationale, per-agent owner, per-agent incident log.
  • Stand up the quarterly retire review. Four criteria, four columns. Act on it; do not let it become a ceremony.

Compounding (12+ months, portfolio in motion)

SMBs running an agent portfolio with discipline; converting the discipline into compounding advantage

  • Publish your agent portfolio posture externally — inventory size, retire rate, eval discipline, governance posture. Converts to procurement, to talent, and to trust.
  • Run an annual external review of the agent portfolio. Independent eyes; not the same team that built or operates.
  • Contribute anonymised benchmark data to industry reports (this one, Forrester, McKinsey). The benchmarks improve when SMBs participate.
  • Make agent literacy a tier in the staff development plan. Build-track, operate-track, govern-track. Different paths, same goal: the practice should not hinge on one person.

The SMBs that win the agent economy in 2026 are not the ones with the most agents. They are the ones that counted their agents, named the runtime correctly, evaluated nightly, retired quarterly, and kept a one-page P&L attribution per build.
— the through-line across every benchmark in this report.

What this benchmark is grounded in

The shipped surface of the practice. Every benchmark in this report is observable inside one or more of the following.

  • 17 long-form posts
  • 7 case studies
  • 8 industry landings
  • 9 governance pack documents

Supporting writing

Primary and secondary research anchors

The platform documentation, analyst work and regulator notes this benchmark leans on.

Methodology & limits

Scope

30–300-person firms running Microsoft 365 + Power Platform with any combination of Copilot Studio agents, Power Automate flows with AI Builder, custom agents on Azure AI Foundry, GPAI API integrations, and SaaS-bundled agents. Geographic concentration of engagements: UK, EU, US, MENA.

Method

Pattern synthesis from agent inventory sweeps, Copilot Studio reviews, multi-agent orchestration migrations, governance retrofits and post-deployment reviews. Each benchmark is anchored against (a) Microsoft platform documentation or release notes, (b) one or more analyst publications, and (c) one or more of our own published posts.

What this is not

Not a randomised primary survey. Not an N= study. Not legal or fiscal advice. Not vendor-funded. We do not claim statistical representativeness across the SMB population. We claim diagnostic utility: if you recognise the pattern in your own portfolio, the recommended action is the one we would prescribe at the start of an engagement.

Conflicts

We are a Microsoft-aligned SMB AI consulting practice. We deploy Copilot Studio, Power Platform and Azure AI as our default toolchain. This shapes the benchmarks we see; the report would look different from a vendor-agnostic generalist, an Anthropic-first build shop, or a public-sector seat.

Versioning

v1.0, published July 21, 2026. Minor bumps as Power Platform release waves and analyst publications land through 2026–2027. Major bumps for added benchmarks or revised recommendations.

Grow this into primary research

If you run a 30–300-person firm and would contribute anonymised agent-portfolio data points (inventory size, runtime mix, eval discipline, retire rate, governance posture, ROI attribution model), we will fold them into v2.0 and credit you (or keep you anonymous, your call). Ten minutes, no sales follow-up unless you ask.

Companion: SMB AI Adoption Pattern Report 2026

The adoption layer underneath the agent layer. Seat utilisation, Champion patterns, ROI defensibility, build-vs-buy posture.

Read the adoption report →

Companion: SMB EU AI Act Readiness Index 2026

The regulator layer. AI Act readiness gaps, FRIA scope, Annex III triage, GPAI deployer obligations for SMBs with EU exposure.

Read the readiness index →

