Chapter 11

Enterprise at Fleet Scale

Gate and govern APM usage across an organization without turning developer setup into a bottleneck.

Objective

After this chapter you can operate APM across a whole organization — not one repo, but a fleet. You will wire apm audit --ci into branch protection as a required, unbypassable pull-request check (the same integrity gate from Chapter 7 and policy gate from Chapter 9, re-run in CI as defence in depth), stand it up with microsoft/apm-action, route dependency traffic through an approved registry proxy (or build for a fully air-gapped network), publish and own an org apm-policy.yml baseline, and roll it all out with a measured adoption playbook. This is the capstone: it introduces no new property. Instead it asks whether the four you already know — Portability, Reproducibility, Provenance / security, and above all Governance — hold for every repo, enforced and operated at scale. Governance is the star of the chapter: the org’s third promise is that “every AI package your developers install is governed by org policy before it touches disk” (Enterprise overview). One honesty note up front, at apm v0.23.1: the apm audit --ci baseline is stable, but three surfaces you will meet here are preview — the --policy flag is experimental, the dedicated registry (distinct from the proxy) is experimental behind a feature gate, and policy enforcement is early preview — so pin the CLI (and microsoft/apm-action, latest v1.10.0) before you lean on any of them as a production gate.

Concept/Theory

From “can this repo install?” to “can every repo install?”

A single team gets almost all of APM’s value from three files and one habit: apm.yml, apm.lock.yaml, and apm install. Everything in Chapters 5–9 answered a repo-sized question: does my project install, reproduce, stay current, stay safe, and stay within policy? An enterprise asks a different, larger question, and the whole chapter turns on that shift: not “can this repo install?” but “can every repo install safely and predictably, without the platform team in the loop?”

That reframing surfaces five things the single-team story never forces you to build (Making the case):

  • Repeatable rollout — a runbook any team follows, not a hero who hand-holds each repo.
  • Policy ownership — a named, protected owner of the org’s apm-policy.yml.
  • Audit gates — a required CI check on every pull request, authoritative regardless of what a developer does locally.
  • Registry strategy — controlled, mirrored, and (when needed) offline dependency traffic.
  • Exception handling — a visible, reviewable way to grant a waiver.

The docs’ worked figure is a mid-to-large org: 50 repositories, 200 developers, five AI coding tools. Without central management a predictable failure set emerges — manual config drifting per repo, no audit trail (“what agent configuration was active at release 4.2.1?” has no answer), version drift between developers and CI, onboarding friction, and ungoverned dependencies: “the same problem regulated industries spent a decade solving for application code, now back in a new form” (Making the case). These are the Chapter 1 pains, multiplied by the repo count.

Two boundaries keep the rest of the chapter honest. First, consuming policy is not owning policy: a fleet team (including the original pilot) consumes the org policy discovered from its git remote — authorship lives in <org>/.github behind CODEOWNERS and branch protection, exactly the org-remote model from Chapter 1 and Chapter 9. Second, nothing here is a new capability: fleet scale is Chapters 6–10 made repeatable, owned, gated, and measured. The common trap is to think “fleet scale is just the single-repo setup, copied N times” — but N copies without central ownership simply reproduce the drift problem at higher cost. The missing pieces are ownership, gates, a registry strategy, and exceptions, none of which a single repo ever needs.

In APM

The fleet gate: apm audit --ci

The authoritative enforcer at fleet scale is one command run on every pull request: apm audit --ci. It is not a new engine — it is the same gate you already know, re-run in CI as defence in depth. On a PR it runs the eight baseline lockfile checks (lockfile-exists, ref-consistency, deployed-files-present, no-orphaned-packages, skill-subset-consistency, config-consistency, content-integrity, includes-consent), the install-replay drift check, and — if an apm-policy.yml is discovered from the git remote — the org Governance checks. Exit code is 0 clean, 1 on any violation (Enforce in CI).

Why re-run a gate the developer already passed on their machine? Because the install-time gate from Chapter 8 and Chapter 9 protects only the developer’s own disk, and it is locally bypassable: a developer can pass --no-policy, --force, or set APM_POLICY_DISABLE=1. CI re-runs the identical checks on the pull request itself — and “--no-policy does not work here — CI ignores the local bypass flag” (Enforce in CI). Wired into GitHub Rulesets as a required status check, a violating PR simply cannot merge (GitHub rulesets). That is what makes an org rule actually authoritative on every merge across every repo.

The apm audit --ci flags that matter in a fleet gate. Verified against apm v0.23.1.
| Flag | Job | Fleet note |
| --- | --- | --- |
| --policy <src> | Explicit policy ref: org, owner/repo, https://…, or a local path | [experimental]. Omit it and APM auto-discovers from the git remote, like apm install. |
| --no-cache | Force a fresh policy fetch | Recommended in CI: a cached policy file must not mask a same-day org update. |
| --no-policy | Skip policy discovery (baseline + drift only) | Not a bypass of the org gate: CI wires the unflagged command; baseline is never bypassable. |
| --no-fail-fast | Run every check even after one fails | Use for full reports and drift sweeps; the default stops at the first failure. |
| -f sarif or -f json, -o <path> | Structured output; write to a file (format inferred from extension) | SARIF feeds Code Scanning. Markdown is not supported in --ci mode. |

Two properties of the gate are worth holding onto. The eight baseline checks plus drift always run and are never bypassable — that is the tie-back to Chapter 6: Reproducibility is enforced in CI even when Governance is off. And there is no per-PR override flag, by design. Exceptions are visible or they do not exist: you either amend <org>/.github/apm-policy.yml through normal review (allow-list the package, raise a cap) or lower enforcement from block to warn for that scope — findings still appear in SARIF, they just stop failing the job. “Bypass must be visible in the policy file’s history” (Enforce in CI). That is the exception handling the concept promised, made concrete.
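One way to keep the "unflagged command" property checkable is a small preflight that runs just before the gate. The sketch below is an assumption layered on top of the text, not an APM feature: the function name assert_unflagged_gate is hypothetical, and it only looks for the two bypass levers named above (APM_POLICY_DISABLE in the job environment and --no-policy on the command line).

```shell
# Hypothetical preflight (not part of APM): refuse to run the gate if a
# local bypass lever has leaked into the CI job.
assert_unflagged_gate() (
  gate_cmd="$1"   # the exact command line the job is about to run

  # A bypass variable in the job environment should be loud, not silent.
  if [ "${APM_POLICY_DISABLE:-}" = "1" ]; then
    echo "preflight: APM_POLICY_DISABLE=1 is set in the job environment" >&2
    exit 1
  fi

  # The required check should wire the plain command, without --no-policy.
  case " $gate_cmd " in
    *" --no-policy "*)
      echo "preflight: gate command carries --no-policy" >&2
      exit 1
      ;;
  esac

  echo "preflight: gate command is unflagged"
)

# Usage in a CI step:
#   assert_unflagged_gate "apm audit --ci --no-cache" && apm audit --ci --no-cache
```

The function body runs in a subshell, so its exit only ends the preflight and hands a non-zero status back to the calling step.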

microsoft/apm-action: the turnkey CI path

microsoft/apm-action is the convenience wrapper that installs the CLI, runs apm install, and can emit SARIF for Code Scanning (microsoft/apm-action). It is not the enforcer — the gate is apm audit --ci; the action just stands it up. Pin the action to the major tag @v1 and pin the CLI with apm-version: for reproducible runs. A GitHub Action cannot be executed locally, so every workflow below is SKIPPED-needs-network — documented, not run — while the underlying apm audit --ci gate it invokes is verified (see the worked example).

backend/examples/ch11/workflows/apm-audit.yml: the minimal required-check gate. Once the job is made a required status check via GitHub Rulesets, a violating PR cannot merge. Needs network. apm v0.23.1
# .github/workflows/apm-audit.yml   -- SKIPPED-needs-network (the `apm audit --ci` it runs IS verified)
name: APM audit
on:
  pull_request:
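    # Caveat: a PR that touches none of these paths skips the workflow, and a
    # required check that never reports stays pending and blocks the merge.
    paths: ['apm.yml', 'apm.lock.yaml', '.apm/**', '.github/**', '.claude/**', '.cursor/**']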
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: microsoft/apm-action@v1        # pin to the major tag; installs CLI + runs `apm install`
        with:
          apm-version: '0.23.1'              # pin the CLI for reproducible CI
      - run: apm audit --ci --no-cache       # verified gate: exit 0 clean / 1 on violation
        env:
          GITHUB_APM_PAT: ${{ secrets.APM_PAT }}   # same-org private repos work with zero config

For findings to appear inline on the PR diff and in the repo’s Security tab, emit SARIF and upload it. The if: always() step is load-bearing — SARIF must upload even when the audit exits 1, or the failing run produces no Code Scanning entry:

backend/examples/ch11/workflows/apm-audit-sarif.yml: SARIF for Code Scanning. Needs network. apm v0.23.1
# .github/workflows/apm-audit-sarif.yml   -- SKIPPED-needs-network (documented)
name: APM audit (SARIF)
on:
  pull_request:
jobs:
  audit:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write                 # required to upload SARIF to Code Scanning
    steps:
      - uses: actions/checkout@v4
      - uses: microsoft/apm-action@v1
        with: { apm-version: '0.23.1' }
      - name: Audit
        run: apm audit --ci --no-cache -o apm-audit.sarif   # format inferred from the .sarif extension
        env: { GITHUB_APM_PAT: '${{ secrets.APM_PAT }}' }
      - name: Upload SARIF
        if: always()                          # upload even on exit 1, or a failing run has no alert
        uses: github/codeql-action/upload-sarif@v3
        with: { sarif_file: apm-audit.sarif, category: apm-audit }

Two refinements are worth knowing. The default action runs apm install first, which overwrites managed files and can hide bytes that were hand-edited after the last install; setup-only: true puts the CLI on PATH only, so the committed bytes are audited as ground truth (content-integrity still verifies each file’s SHA-256 against the lockfile). And because apm audit --ci is a plain CLI call, the same gate is vendor-neutral — it runs in Azure Pipelines, GitLab CI, and Jenkins, not only GitHub Actions (CI/CD integration); microsoft/apm-action is just the GitHub-shaped convenience.
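Because the gate is a plain CLI call, a non-GitHub pipeline can wrap it in a few lines of shell. The following is a minimal sketch under stated assumptions: run_gate and the APM_BIN override are hypothetical names introduced here for illustration, and the only APM flags used are the ones already listed in this chapter (--ci, --no-cache, -o).

```shell
# Sketch of a CI-agnostic wrapper: run the gate, keep the SARIF report, and
# hand the gate's own exit code back to the CI system.
# APM_BIN and the report path are illustrative knobs, not APM settings.
run_gate() (
  apm_bin="${APM_BIN:-apm}"
  report="${1:-apm-audit.sarif}"

  status=0
  "$apm_bin" audit --ci --no-cache -o "$report" || status=$?

  if [ "$status" -eq 0 ]; then
    echo "gate: clean"
  else
    echo "gate: violation (exit $status), report at $report" >&2
  fi
  exit "$status"   # non-zero fails the CI job, which is what blocks the merge
)

# Usage from any pipeline step (Azure Pipelines, GitLab CI, Jenkins):
#   run_gate apm-audit.sarif
```

Capturing the status with `|| status=$?` keeps the wrapper working under `set -e`, so the report message is still printed when the audit fails.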

Registry strategy: proxy and air-gapped

Because “enterprise networks rarely allow agents to reach github.com directly,” APM routes dependency traffic through composable, environment-variable controls (Registry proxy). The single most important verified fact: these are environment variables, not apm config keys. At v0.23.1 apm config persists only auto-integrate, temp-dir, and mcp-registry-url — so at fleet scale you pin the proxy settings in CI secrets, dev-container env, and shell profiles, “in the same place you pin Python and APM versions,” not via apm config set.

The registry-strategy knobs, with exact environment-variable names. All SKIPPED-needs-network at v0.23.1.
| Need | Knob | Note |
| --- | --- | --- |
| Allow outbound traffic at the firewall | HTTPS_PROXY / HTTP_PROXY / NO_PROXY | Standard forward-proxy vars: “if git clone works through the proxy, apm install works too.” |
| Mirror every archive for audit + replay | PROXY_REGISTRY_URL | Rewrites every GitHub-hosted download to fetch via the mirror (e.g. an Artifactory GitHub remote). |
| Authenticate to the mirror | PROXY_REGISTRY_TOKEN | Bearer token on proxy requests; independent of GITHUB_APM_PAT. |
| Refuse direct-VCS fallback (mandatory + auditable) | PROXY_REGISTRY_ONLY=1 | APM refuses to fall back to github.com; the lockfile records a registry_prefix, and replay aborts on a directly-pinned entry until re-resolved. |
| Silence the plaintext-token warning on http:// | PROXY_REGISTRY_ALLOW_HTTP=1 | Use only inside an isolated network. (Deprecated ARTIFACTORY_* aliases still work with a warning.) |
| Fully air-gapped CI (no egress at all) | apm pack on a connected host → restore offline | Chapter 10’s bundle, reused as an air-gapped delivery mechanism. |
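Since these knobs live in the environment rather than in apm config, a missing variable is easy to overlook. A small preflight can make that visible before apm install runs. This is a sketch only: check_proxy_env is a hypothetical helper, not something APM ships, and treating a token over http:// as an error is a stricter reading than the warning the table describes.

```shell
# Hypothetical preflight for the proxy settings. The variable names are the
# ones listed in the table; the function itself is not part of APM.
check_proxy_env() (
  problems=0

  if [ -z "${PROXY_REGISTRY_URL:-}" ]; then
    echo "PROXY_REGISTRY_URL is not set" >&2
    problems=$((problems + 1))
  fi

  if [ "${PROXY_REGISTRY_ONLY:-}" != "1" ]; then
    echo "PROXY_REGISTRY_ONLY is not 1: direct github.com fallback is still possible" >&2
    problems=$((problems + 1))
  fi

  # A token over plain http:// should be an explicit, isolated-network opt-in.
  case "${PROXY_REGISTRY_URL:-}" in
    http://*)
      if [ -n "${PROXY_REGISTRY_TOKEN:-}" ] && [ "${PROXY_REGISTRY_ALLOW_HTTP:-}" != "1" ]; then
        echo "token would travel over http:// without PROXY_REGISTRY_ALLOW_HTTP=1" >&2
        problems=$((problems + 1))
      fi
      ;;
  esac

  if [ "$problems" -gt 0 ]; then exit 1; fi
  echo "proxy env: ok"
)

# Usage: check_proxy_env && apm install
```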

The reproducibility tie-back matters here more than anywhere: the proxy is not the trust anchor — the lockfile is. “Every install verifies the content_hash recorded in apm.lock.yaml regardless of where the bytes came from. A tampered proxy that rewrites archive contents is caught by the lockfile guard, not the cache” (Registry proxy). So routing through Artifactory does not weaken Reproducibility; if anything PROXY_REGISTRY_ONLY=1 makes the proxy path mandatory and auditable. Be honest about coverage, though: the proxy covers apm install for GitHub-hosted deps and marketplace fetches, but not Azure DevOps deps, not MCP servers (a separate registry), and not the apm-policy.yml fetch (which uses the GitHub API directly). And do not confuse the proxy with the dedicated registry: the proxy fronts an upstream git host and is stable; the dedicated registry is a separate, additive package source with no git upstream and is experimental (v0.2 API). APM’s distribution is git-based today — there is no npm-style central registry.
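The "lockfile, not proxy" principle can be illustrated with a plain digest comparison. This is an illustration of the idea only, not APM's implementation: APM does its own content_hash verification during install, and the helper names sha256_of and verify_pinned, as well as passing the expected digest as a bare hex string, are assumptions made for the sketch.

```shell
# Illustration only: compare a file's SHA-256 with a digest pinned elsewhere.
# Where the bytes came from (proxy, cache, direct) plays no part in the check.
sha256_of() (
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'   # fallback where sha256sum is absent
  fi
)

# verify_pinned <file> <expected-hex-digest>
verify_pinned() (
  actual=$(sha256_of "$1")
  if [ "$actual" = "$2" ]; then
    echo "ok: $1 matches the pinned digest"
  else
    echo "MISMATCH: $1 is $actual, pinned digest is $2" >&2
    exit 1
  fi
)
```

A mirror that rewrote the archive would change the computed digest, so the comparison fails no matter how trustworthy the transport looked.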

Org policy at fleet scale

The mechanism of org policy is already fully covered in Chapter 9: the schema, the warn → block dial, tighten-only inheritance. Fleet scale changes only where it lives and who owns it: the authoritative policy sits in <org>/.github/apm-policy.yml behind CODEOWNERS and branch protection, and every consuming repo (the pilot included) only consumes it, discovered from its git remote. Past the pilot, the org file is set to enforcement: block with fetch_failure: block, so a repo whose policy cannot be fetched fails closed rather than silently failing open. Two parts of this need infrastructure a reader’s sandbox will not have, namely org-remote discovery and the tighten-only extends: merge, so both are SKIPPED-needs-network here (recall from Chapter 9 that a local-file extends: does not merge a parent; real inheritance needs an org / owner/repo / https:// ref). You author and test the file locally, then land it in <org>/.github.

When to use / pitfalls

Roll out in phases, measure by leading indicators

Rolling APM out to a fleet is a staged program, not a switch. The official adoption playbook is five phases, each with a single owner, a deliverable, and a gate to clear before advancing (Adoption playbook). The four-word mnemonic pilot → measure → standardize → gate is a compressed version of the same sequence:

The five-phase adoption playbook. Each phase buys evidence the next one needs — do not skip phases. From the APM adoption playbook.
| Phase | Owner | Gate to advance |
| --- | --- | --- |
| Discover | Platform team | Shadow apm install --dry-run + apm audit on representative repos; answer “what breaks if we turn this on tomorrow, and for whom?” |
| Pilot | One product team + platform | Manifest, lockfile, CI audit, and policy in warn; two consecutive weeks of clean pilot CI with every warning triaged. |
| Harden | Security + platform | Flip warn → block, add the registry proxy, stand up marketplaces; a fresh repo installs against org policy + proxy with no manual help. |
| Scale | Product teams (self-service) | Platform is no longer in the critical path; new repos onboard from a checklist. |
| Sustain | A named on-call | Steady state: weekly drift triage, monthly lockfile review, quarterly marketplace refresh. |
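The Discover phase's shadow run can be scripted as a read-only sweep over local checkouts. The sketch below is hedged accordingly: shadow_sweep, the APM_BIN override, and the result labels are hypothetical; it assumes a non-zero exit from apm install --dry-run signals that the install would fail, and it uses apm audit --ci --no-fail-fast so each repo yields a plain exit status.

```shell
# shadow_sweep <repo-dir>...
# Prints "<repo> <result>" per repo. Nothing is enforced; the sweep observes.
shadow_sweep() (
  apm_bin="${APM_BIN:-apm}"   # use an absolute path when overriding
  for repo in "$@"; do
    result=$(
      cd "$repo" 2>/dev/null || { echo "unreachable"; exit 0; }
      if ! "$apm_bin" install --dry-run >/dev/null 2>&1; then
        echo "install-would-fail"     # assumed meaning of a non-zero dry run
      elif "$apm_bin" audit --ci --no-fail-fast >/dev/null 2>&1; then
        echo "clean"                  # exit 0: every check passed
      else
        echo "violations"             # non-zero: at least one check failed
      fi
    )
    echo "$repo $result"
  done
)

# Usage: shadow_sweep ~/src/payments ~/src/ledger > sweep.txt
```

Each repo is visited inside a command substitution, so the directory change never leaks into the next iteration.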

Success is measured by leading indicators, not package counts. The docs are explicit that “measuring apm.yml count and nothing else” is a vanity metric — a repo with a manifest but a failing audit or rising drift is not adopted, it is at risk. Watch audit pass rate, the drift trend (findings closed vs. opened), and marketplace uptake instead (Adoption playbook).
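Audit pass rate is simple enough to compute from whatever per-repo results the platform team already collects. As a rough sketch, assume a hypothetical text format of one "<repo> <result>" line per repository, where the result "clean" means the gate exited 0; the function name audit_pass_rate and that input format are illustrative, not something the playbook prescribes.

```shell
# audit_pass_rate: read "<repo> <result>" lines on stdin and print the share
# of repos whose result is "clean". The input format is an assumption.
audit_pass_rate() {
  awk '
    NF >= 2 { total++; if ($2 == "clean") passed++ }
    END {
      if (total == 0) { print "no repos"; exit }
      printf "%d/%d repos clean (%.1f%%)\n", passed, total, 100 * passed / total
    }
  '
}

# Usage:
#   printf "payments clean\nledger violations\n" | audit_pass_rate
```

Tracking this number week over week, next to findings opened versus closed, gives the trend the playbook asks for, where a manifest count would only show how many repos have started.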

Worked example

Meridian’s four moves, in order. Only the first is executable in a reader’s sandbox — the CI gate itself — so it is shown running against a local project (offline, one public package, no tokens). The proxy, the workflow, and the org-policy merge each need infrastructure and are marked SKIPPED-needs-network.

Move 1 — the required gate (RUNNABLE offline)

A clean project — one pinned public dependency plus its committed lockfile — passes the gate. The clean run is the eight baseline checks plus the drift check: nine checks, exit 0. (The Could not determine org… line is benign: a scratch directory has no git remote, so org auto-discovery is skipped.)

apm audit --ci on the clean project — nine checks pass, exit 0. This is the gate the required PR check runs. Replays from cache and scans locally, so it runs offline. Transcript abbreviated for space. apm v0.23.1
$ apm audit --ci
  [!] Could not determine org from git remote; enforcement skipped (set policy.fetch_failure_default=block in apm.yml to fail closed)
  [>] Replaying install (cache-only)...  [+] Replayed 2 package(s)  [+] No drift detected

                           [>] APM Policy Compliance
  │ [+] │ lockfile-exists          │ Lockfile present                                  │
  │ [+] │ ref-consistency          │ All dependency refs match lockfile                │
  │ [+] │ deployed-files-present   │ All deployed files present on disk                │
  │ [+] │ no-orphaned-packages     │ No orphaned packages in lockfile                  │
  │ [+] │ skill-subset-consistency │ Skill subset selections match lockfile            │
  │ [+] │ config-consistency       │ No MCP configs to check                           │
  │ [+] │ content-integrity        │ No critical hidden Unicode or hash drift detected │
  │ [+] │ includes-consent         │ No local content deployed -- includes … skipped  │
  │ [+] │ drift                    │ no drift detected against lockfile                │

  [*] All 9 check(s) passed          # exit 0 -- 8 baseline checks + drift

Now the case that makes the required check authoritative: a block policy (enforcement: block, require_pinned_constraint: true) against an unpinned direct dependency on a bare #main branch. The dependency-pinned-constraint check fails and the gate exits 1 — a required PR check would block the merge, and no local --no-policy can rescue it in CI:

apm audit --ci --policy ./pol-block.yml when a direct dependency tracks a bare branch — the check fails, exit 1. Runs offline against a local project + policy file. Transcript abbreviated for space. apm v0.23.1
$ apm audit --ci --policy ./pol-block.yml   # enforcement: block, require_pinned_constraint: true
  [>] Replaying install (cache-only)... [+] No drift detected
  ...                                        # baseline + policy checks run
  │     │ dependency-pinned-constraint │ 1 dependency(ies) use unbounded constraints
  │     │                              │ (hint: pin to a semver range, literal tag, or SHA) │
  │     │                              │   - microsoft/apm-sample-package: bare branch 'main' tracks a moving tip │

  [x] 1 of 18 check(s) failed          # exit 1 -- the required check blocks the PR

The same block policy against a pinned direct dep — even one that pulls an unpinned transitive — passes, because require_pinned_constraint is direct-only. Note the higher check count: with no failure, fail-fast never trips, so every check is enumerated:

Same policy, but the direct dep is pinned (#v1.0.0) — the unpinned transitive does not trip the direct-only rule; all checks pass, exit 0. Runs offline. apm v0.23.1
$ apm audit --ci --policy ./pol-block.yml   # DIRECT dep pinned to #v1.0.0; a transitive dep is unpinned
  │ [+] │ dependency-pinned-constraint │ All dependencies use pinned constraints │
  ...
  [*] All 29 check(s) passed          # exit 0 -- direct-only rule; count higher because nothing fails
                                      #          (fail-fast never trips; use --no-fail-fast for a full report)

Move 2 — make it required in CI (SKIPPED-needs-network)

Meridian stands the verified gate up on every PR in all three product groups with microsoft/apm-action, then makes the job a required status check via GitHub Rulesets so a violating PR cannot merge. The workflow is the one shown earlier (backend/examples/ch11/workflows/apm-audit.yml); the SARIF variant surfaces each finding on the PR diff. Both are documented, not run — but the apm audit --ci they invoke is the command proven in Move 1.

Move 3 — route traffic through the approved proxy (SKIPPED-needs-network)

Meridian’s security org already mandates Artifactory for npm and PyPI, so APM joins the same operating model. The platform team pins these environment variables in CI secrets and the shared dev container — not in apm config — so the whole fleet resolves identically:

Routing APM dependency traffic through the approved registry proxy. These are environment variables, not apm config keys. Needs an Artifactory / private host to run. Needs network. apm v0.23.1
# SKIPPED-needs-network: pin in CI secrets / dev-container env, not `apm config`.
export HTTPS_PROXY="http://proxy.meridian.example:8080"
export PROXY_REGISTRY_URL="https://artifactory.meridian.example/artifactory/github-remote"
export PROXY_REGISTRY_TOKEN="$ARTIFACTORY_TOKEN"   # bearer token; independent of GITHUB_APM_PAT
export PROXY_REGISTRY_ONLY=1                        # refuse direct github.com fallback (mandatory + auditable)

apm install                                        # every archive fetched via the mirror;
                                                   # content_hash from apm.lock.yaml still verified (Ch6)

A fully air-gapped group would instead receive a pre-built bundle: apm pack on a connected host (Chapter 10), restored offline. Either way, integrity is anchored to the lockfile’s content_hash, so the mirror is a routing and audit convenience, never a new trust anchor.

Move 4 — publish and own the org baseline (SKIPPED-needs-network)

Finally the platform team lands the org apm-policy.yml in meridian-finance/.github, behind branch protection. This is the Chapter 9 schema, relocated and set to fail closed; the three product-group repos consume it from their git remote. Authoring and testing happen locally; the org-remote discovery and tighten-only extends: merge that make it fleet-wide need infrastructure, so they are SKIPPED-needs-network:

meridian-finance/.github/apm-policy.yml: the fleet baseline, owned behind CODEOWNERS + branch protection; consuming repos only consume it. Org-remote discovery + extends: merge need infrastructure. Needs network. apm v0.23.1
# meridian-finance/.github/apm-policy.yml   -- SKIPPED-needs-network (org-remote discovery + extends: merge)
name: meridian-org-baseline
version: "1.0.0"
enforcement: block               # past the pilot: fail closed on violations
fetch_failure: block             # if the policy can't be fetched, fail closed (do not fail-open)
dependencies:
  require_pinned_constraint: true   # fires on a bare-branch DIRECT dep (Move 1) -- direct-only
  allow:                            # only these sources (deny still wins)
    - meridian-finance/**
    - microsoft/**
    - github/awesome-copilot/**
  deny:
    - sketchy-org/**
compilation:
  target:
    allow: [copilot, claude, cursor]   # target rules live HERE, not a top-level `targets:` (Ch9)

With the baseline in warn first, Meridian watches Code Scanning across the three groups for two sprints, remediates the top offenders, then flips to block — the same measured rollout as Chapter 9, now fleet-wide, with the pilot repo as the canary. Their status report to leadership is audit pass rate and drift trend, not “repos with a manifest.”

Recap & next

Recap

  • The question changes at scale. From “can this repo install?” to “can every repo install safely and predictably?” A fleet needs five things one team never forces: repeatable rollout, policy ownership, audit gates, a registry strategy, and exception handling. This is the capstone for Governance — enforced and operated across the org.
  • The fleet gate is apm audit --ci. The same integrity gate (Chapter 7) and policy gate (Chapter 9), re-run in CI as defence in depth: eight baseline checks + drift (+ discovered policy), exit 0/1. Made required and unbypassable via GitHub Rulesets — local --no-policy does not work in CI. Stand it up with microsoft/apm-action@v1; it is vendor-neutral.
  • Registry strategy is environment-variable driven. HTTPS_PROXY + PROXY_REGISTRY_URL + PROXY_REGISTRY_ONLY=1 route and mirror traffic; apm pack serves air-gapped networks. Integrity stays anchored to the lockfile content_hash (Reproducibility), not the proxy — the proxy is never the trust anchor.
  • Adoption is change management. Discover → Pilot → Harden → Scale → Sustain, each with an owner, a deliverable, and a gate. Measure by leading indicators — audit pass rate, drift trend, marketplace uptake, setup time — not by manifest count, which is a vanity metric. Carrot before stick; a named on-call, never “the platform team.”
  • Mind the preview edges at apm v0.23.1. require_pinned_constraint is direct-only; the baseline is stable and unbypassable but --policy is experimental; the proxy knobs are env-vars, not apm config; and a GitHub Action cannot be executed locally, so never report a workflow result as if it had been run.

Next

You have now consumed, locked, maintained, secured, governed, produced, and operated agent context across a fleet — the full arc from one repo to an organization. Chapter 12 — The Landscape & What’s Next steps back to place APM among the standards it builds on (AGENTS.md, Agent Skills, MCP, OpenAPM v0.1) and the roadmap ahead — including the dedicated registry API this chapter flagged as experimental — so you can decide what to adopt, watch, or build around.