[ 00 ] Why a ladder

The North Stack Ladder.

AI doesn't fail in regulated operations because the models are weak. It fails because nobody sequenced the risk.

The North Stack Ladder is our five-rung method for adopting AI in complex, regulated operations. Each rung adds exactly one new thing the operation has to trust, and earns that trust before the next rung is built. We map your operation, score every workflow, prove the hard part before you commit to a build, and climb one rung at a time, with the governance tightening as the autonomy rises.

Build only what pays back. Never automate past the point a human can still be accountable.

Figure: The North Stack Ladder. An isometric illustration of an ascending five-step staircase from a cluttered operation to an orchestrated summit.

less manual effort, PMD Finance

less document-review time, Marshall Peters

operational mins/day returned to one client (≈£120,000/yr)

Every critical action human-approved, with a full audit trail

If we can't prove the hard part on your real documents, you don't pay for a build. We'll put that in writing.

[ The five rungs ]

Each rung adds one thing to trust.

The top rung is supervised agents operating closed portals you'll never get an API for: the most-stuck, highest-value work in a regulated operation. We earn our way up to it; we don't lead with it.

Mapped

you can see the work

Before anything is built, the whole operation is visible and every workflow is scored.

What it means

You have an honest, end-to-end picture of how work actually flows: the shared inboxes, the shadow spreadsheets, the portal swivel-chairs, the periodic compilations. Every workflow carries a number that says whether it's worth automating, in what order, and at what cost.

What changes

The conversation stops being 'we should do something with AI' and becomes 'we build these three, in this order, for this payback, and we deliberately leave these alone.' Decisions get made on evidence instead of the loudest opinion.

What we build

An operations map of your core functions, with the four universal organs flagged: the shared inbox, the shadow spreadsheet, the portal swivel-chair, the periodic compilation.
A scored workflow register. Every candidate is rated on the six axes, with a build / assist / decline call made explicitly.
A costed, sequenced plan mapping each workflow to a rung of the Ladder.
A written 'do not automate' list: the workflows where AI won't pay back, and why.

Governance

Read-only access to a representative sample. Nothing in your systems is changed during the audit.
The score thresholds are fixed and the same for every client, so the recommendation can't be talked up to suit a budget.

When you're ready

You're ready the moment you suspect parts of your operation are slow and manual but can't say which parts are worth automating or in what order. This is the entry rung, and almost everyone starts here.

The human role

Humans decide; nothing is touched

Assisted

AI reads, humans act

AI retrieves, summarises and drafts. Your people still make every call.

What it means

AI does the reading and the finding; humans keep the decision. A private system answers questions from your own documents with citations, or drafts a first version a person edits. It speeds the human up without ever acting on its own.

What changes

New staff get productive in weeks instead of months. Your experts stop being a human search engine for everyone else. The team learns to trust AI on low-stakes reading before anything depends on it.

What we build

A private, access-controlled knowledge base built from your own documents.
A retrieval interface that answers in plain English with citations back to the source passage, so the answer is checkable, not just plausible.
Drafting and summarisation that a human reviews and owns.

Governance

Runs inside your boundary; nothing trains a public model or leaves your environment.
Role-based access control, so people only retrieve what they're cleared to see.
Citations on every answer, so a claim can always be traced to its source.
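As a rough illustration of how role-based access and citations can fit together, here is a minimal Python sketch. It assumes a simple keyword-overlap ranking, and the `Passage`, `retrieve` and `cite` names are hypothetical; a production retrieval system would rank differently. The point it shows is ordering: the access check runs before ranking, and every hit carries a pointer back to its source passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str                  # hypothetical document identifier
    index: int                   # position of the passage within its document
    text: str
    allowed_roles: frozenset     # roles cleared to see this passage


def retrieve(question: str, passages: list, role: str, top_k: int = 3) -> list:
    """Return the best-matching passages that `role` is cleared to see."""
    terms = set(question.lower().split())
    scored = []
    for passage in passages:
        if role not in passage.allowed_roles:
            continue  # access is checked before ranking, never after
        overlap = len(terms & set(passage.text.lower().split()))
        if overlap:
            scored.append((overlap, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


def cite(passage: Passage) -> str:
    """Build a citation a reader can follow back to the source passage."""
    return f"{passage.doc_id}#passage-{passage.index}"


passages = [
    Passage("claims-handbook", 0, "Windscreen claims need two photos",
            frozenset({"claims", "admin"})),
    Passage("hr-policy", 0, "Salary review claims are confidential",
            frozenset({"hr"})),
]
hits = retrieve("how many photos for windscreen claims", passages, role="claims")
citations = [cite(p) for p in hits]
```

Because the filter runs first, a passage the user is not cleared for cannot influence the answer at all, not even through ranking.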

When you're ready

You're ready when the answer is reliably somewhere in your documents but finding it is slow, and when the cost of a confidently-wrong answer is low enough that a human reviewing the output is sufficient.

The human role

AI reads, humans act

Automated

pipelines do the typing, humans approve

Pipelines read the inbound flood and compile the periodic returns. Humans approve before anything commits.

What it means

A pipeline takes over the repetitive typing that was previously done by hand: reading inbound mail and documents and re-keying them, or assembling the quarterly declaration. A person reviews the result against the original, with field-level confidence scores, and approves. The output is a clean import file or record, never an auto-sent action.

What changes

The single biggest source of operational drag disappears. The team stops typing and starts approving. The days-long quarterly scramble becomes a review-and-approve session.

What we build

An ingestion and extraction pipeline tuned to your real document types and the fields that matter.
A human-in-the-loop review workspace showing the original alongside extracted data, with field-level confidence scores and low-confidence flagging.
Compilation engines that assemble periodic declarations, returns and board packs from live data on a schedule.
A structured output (import file or record) ready for the downstream system.
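Low-confidence flagging in the review workspace might look something like the Python sketch below. The `ExtractedField` type, the example field names and the 0.90 threshold are all assumptions for illustration; the real threshold would be set per document type during the proof.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


# Hypothetical threshold; not a figure taken from the text above.
REVIEW_THRESHOLD = 0.90


def flag_low_confidence(fields: list, threshold: float = REVIEW_THRESHOLD) -> list:
    """Return the fields a reviewer should look at first."""
    return [field for field in fields if field.confidence < threshold]


fields = [
    ExtractedField("policy_number", "PX-1042", 0.99),
    ExtractedField("renewal_date", "2025-03-01", 0.72),
]
flagged = flag_low_confidence(fields)
```

Flagging only directs the reviewer's attention. In this sketch every field still goes in front of a human before anything is approved.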

Governance

A human approves before anything is committed; output is never an auto-sent action.
Every extraction, edit, approval and rejection is logged in a full audit trail.
Anything bound for a regulator passes a hard pre-submission human gate.
Each period's compilation is versioned and reproducible, so it's defensible after the fact.
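One way to make a period's compilation reproducible is to fingerprint its inputs deterministically, so the same records always yield the same digest. The Python sketch below assumes records are plain JSON-serialisable dictionaries; `compile_period` and its field names are hypothetical.

```python
import hashlib
import json


def compile_period(period: str, records: list) -> dict:
    """Summarise a period's records with an order-independent SHA-256 digest."""
    # Canonical JSON per record, then sorted, so input order cannot change the result.
    canonical = sorted(json.dumps(record, sort_keys=True) for record in records)
    body = json.dumps({"period": period, "records": canonical}, sort_keys=True)
    return {
        "period": period,
        "record_count": len(records),
        "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


first = compile_period("2025-Q1", [{"id": 1, "amount": "10.00"},
                                   {"id": 2, "amount": "5.50"}])
rerun = compile_period("2025-Q1", [{"id": 2, "amount": "5.50"},
                                   {"id": 1, "amount": "10.00"}])
changed = compile_period("2025-Q1", [{"id": 1, "amount": "10.01"},
                                     {"id": 2, "amount": "5.50"}])
```

Storing the digest alongside each versioned compilation lets a later rerun be checked against what was actually approved.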

When you're ready

You're ready when a high volume of repetitive work runs through a shared inbox or a periodic compilation, the rules are knowable, and a human reviewing flagged items is an acceptable control. This rung is behind most of our headline numbers.

The human role

Pipelines type, humans approve

Integrated

systems agree with each other

Your sources stop contradicting each other, and approved results write straight back into your core system.

What it means

First we reconcile the data. The same customer, vehicle, policy or matter that exists in three slightly different versions becomes one golden record with full lineage. Then, once a human approves a result, it's written back into your core system through its API, so the last manual re-keying step disappears.

What changes

Every downstream automation, report and decision stops inheriting contradictory data. The final 'copy the approved answer into the core platform by hand' step is gone. That step was slow, and it was where the errors you just removed crept back in. The savings compound: the workflow goes from much faster to genuinely transformed.

What we build

A harmonisation layer with entity matching, including fuzzy and probabilistic matching for the near-duplicates exact matching misses.
A golden record per entity with full field-level lineage, and conflict-resolution rules you set and we encode.
An API write-back from your approved review workspace into your system of record, sourced from the golden record.
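To give a feel for the near-duplicate matching mentioned above, here is a small Python sketch using the standard library's `difflib.SequenceMatcher`. It is a deliberate simplification: it compares a single name field, and the 0.85 threshold and function names are assumptions. Probabilistic matching in practice would weigh several fields together.

```python
import re
from difflib import SequenceMatcher

# Hypothetical threshold for treating two names as the same entity.
MATCH_THRESHOLD = 0.85


def normalise(name: str) -> str:
    """Lower-case, strip punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0 after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def is_probable_match(a: str, b: str, threshold: float = MATCH_THRESHOLD) -> bool:
    return similarity(a, b) >= threshold


same_after_cleanup = similarity("Acme Haulage Ltd", "ACME  Haulage Ltd.")
typo_match = is_probable_match("Acme Haulage Ltd", "Acme Haulge Ltd")
unrelated = is_probable_match("Acme Haulage Ltd", "Northern Freight plc")
```

Pairs that land near the threshold are the ones to surface to a person rather than merge automatically.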

Governance

Conflicts the rules can't resolve are surfaced to a human, never silently guessed.
Writes are validated before they land, idempotent so nothing is written twice, and reversible via rollback paths.
A complete audit trail maps every write back to the person who approved it.
We insist on reconciliation before any write-back. Writing bad data faster is not a win.
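The write-back controls above (validated, idempotent, reversible, attributed to an approver) can be sketched in Python against an in-memory stand-in. `CoreSystemStub` is hypothetical and is not any real core system's API; the validation rule is also a placeholder.

```python
from typing import Optional


class CoreSystemStub:
    """In-memory stand-in for a system of record (hypothetical)."""

    def __init__(self) -> None:
        self.records = {}    # record_id -> current value
        self._previous = {}  # idempotency key -> (record_id, value before write)
        self.audit = []      # (idempotency key, event, approver)

    def write(self, key: str, record_id: str, value: str, approved_by: str) -> str:
        # Validate before anything lands: a named approver and a non-empty value.
        if not approved_by or not value.strip():
            return "rejected"
        # Idempotency: replaying the same key is a no-op.
        if key in self._previous:
            return "duplicate"
        previous: Optional[str] = self.records.get(record_id)
        self._previous[key] = (record_id, previous)
        self.records[record_id] = value
        self.audit.append((key, "write", approved_by))
        return "written"

    def rollback(self, key: str, approved_by: str) -> bool:
        """Restore the value seen before the write identified by `key`."""
        if key not in self._previous:
            return False
        record_id, previous = self._previous.pop(key)
        if previous is None:
            self.records.pop(record_id, None)
        else:
            self.records[record_id] = previous
        self.audit.append((key, "rollback", approved_by))
        return True


core = CoreSystemStub()
first = core.write("k1", "policy-7", "Renewed", "j.smith")
replay = core.write("k1", "policy-7", "Renewed", "j.smith")
blank = core.write("k2", "policy-7", "   ", "j.smith")
unapproved = core.write("k3", "policy-7", "Lapsed", "")
```

Every audit entry carries the approver's name, so each write can be traced back to the person who released it.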

When you're ready

You're ready when a reviewed pipeline is proven and live, but a person is still re-keying the approved output into your core system. The prerequisite is clean, reconciled data, and we won't write back without it.

The human role

Humans approve every write-back

Orchestrated

agents work the portals, humans hold the gate

Supervised agents operate the closed portals you can't integrate with, the ones locked behind multi-factor authentication (MFA), with a human approving every critical action.

What it means

Some of your most painful work happens in systems you'll never get an API for: the insurer portal, the regulator submission system, the legacy core behind a login. Supervised Claude agents operate them the way a trained member of staff would, but every critical action pauses for explicit, per-action human approval before it happens.

What changes

A £50-a-month SaaS can never touch the swivel-chair work, because the system is closed by design. Supervised agents get it done at machine pace, while a human stays accountable for every consequential step. This is the highest-value, most-stuck terrain in a regulated operation, and almost no one else will go near it.

What we build

Supervised agents that log in, navigate, read and enter data in your closed, MFA-gated or un-integratable portals.
A supervision and exception workspace where your team approves, edits or halts the agent.
A failure and incident playbook so any stuck or unexpected state escalates safely to a human.

Governance

Per-action human approval on every critical step, enforced before execution. Nothing high-stakes is autonomous.
Permanent red lines: no automated payments, no automated bank-detail changes, full stop.
Regulator submissions sit behind a hard pre-submission gate.
Role-based access control and a complete, exportable audit trail of every agent action.
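A per-action approval gate of the kind described above could be sketched in Python as follows. The action names, the `CRITICAL` set and the `run_action` signature are assumptions made for illustration. The sketch shows two separate checks: red-line actions are refused even if someone approves them, and critical actions run only after a human says yes.

```python
from typing import Callable

# Red lines from the text: never automated, whatever the approval.
RED_LINES = frozenset({"make_payment", "change_bank_details"})
# Hypothetical examples of critical actions that need per-action approval.
CRITICAL = frozenset({"submit_form", "submit_to_regulator"})


def run_action(action: str,
               perform: Callable[[], str],
               ask_human: Callable[[str], bool],
               audit_log: list) -> tuple:
    """Return (status, result); `perform` runs only if the gate allows it."""
    if action in RED_LINES:
        audit_log.append((action, "refused"))
        return ("refused", None)
    if action in CRITICAL and not ask_human(action):
        audit_log.append((action, "halted"))
        return ("halted", None)
    result = perform()
    audit_log.append((action, "executed"))
    return ("executed", result)


log = []
read = run_action("read_page", lambda: "ok", lambda a: False, log)
held = run_action("submit_form", lambda: "sent", lambda a: False, log)
sent = run_action("submit_form", lambda: "sent", lambda a: True, log)
paid = run_action("make_payment", lambda: "paid", lambda a: True, log)
```

The approval check sits in front of `perform`, so a declined action never reaches the portal, and every outcome is logged whether or not anything ran.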

When you're ready

You're ready only when the rungs below are proven: clean reconciled data, a reviewed pipeline that works, and controls a board has signed off. We build this last, on purpose.

The human role

Humans hold the gate

[ How we engage ]

Audit. Proof. Build. Run.

Regulated buyers buy the process before they buy the product. You can stop after any stage with something you own, and no build is ever priced before the hard part is proven.

Audit

We map the whole operation and score every workflow.

What happens

A senior consultant and an engineer walk your operation end to end, score every candidate workflow on the six axes, and produce a costed, sequenced plan, including the workflows we tell you not to touch.

What you get

An operations map with the four universal organs flagged
A scored workflow register with build / assist / decline calls made in writing
A costed, sequenced roadmap mapped to the five rungs
A written 'do not automate' list

What it costs

Fixed fee. Roughly two to three weeks. A handful of half-day sessions with the people who do the work, plus read access to a representative sample.

Who owns what

North Stack runs it end to end. You provide access and the right people in the room. You own every artefact, whether or not you go on to build.

Exit state

You can stop here with a roadmap you own and act on yourself. Many of the recommendations need no further spend with us.

Proof

We prove the single hardest part on your real data before anyone commits to a build.

What happens

For any workflow worth building, we run a small, fixed-fee proof of the hardest thing. Usually that's extraction accuracy on your actual documents, or whether a closed portal can be operated safely. The risk is tested before the budget is.

What you get

Measured accuracy or feasibility on your own real data, not a generic demo
An honest go / no-go, with the evidence behind it
A firm build scope and price if it's a go

What it costs

Small fixed fee, deliberately a fraction of a build. A short window. A sample of real items from you.

Who owns what

North Stack builds and runs the proof. You supply representative data and confirm what 'good enough' means.

Exit state

If the proof fails, there is no build and you don't pay for one. That's a successful outcome, not a wasted one. No build is ever priced before this passes.

Build

We build the production system in controlled milestones.

What happens

We build the pipeline, workspace, integration or agent to production standard, not a prototype, with governance built in from the first commit. Delivery is staged in milestones so you see working software early and pay against it.

What you get

A production system your team actually uses, not a proof-of-concept
Governance built in: role-based access control (RBAC), audit trail, confidence scores, and human approval where it's needed
Milestone-based delivery and payment, so spend tracks working software

What it costs

Fixed fee, anchored to what the manual version of the work costs you each year, not to our day rate. Paid against milestones. Typical builds run weeks, not quarters.

Who owns what

North Stack delivers; your subject-matter experts validate against real cases at each milestone; your team signs off go-live. You own the result.

Exit state

You go live with a working system. A retainer to run and improve it is agreed now, at sign-off, never bolted on afterwards once you're dependent.

Run

We run, monitor and improve it, and help you climb the next rung when the first has paid for itself.

What happens

The system is monitored, maintained and tuned as your real-world inputs shift. We track the metrics that prove payback, handle the edge cases that emerge, and tee up the next rung once the current one has earned it.

What you get

Monitoring, maintenance and tuning as inputs change
The metrics that show the payback, reported back to you
A sequenced path to the next rung, when you're ready

What it costs

A fixed monthly retainer, tiered to the scope of what's running. Agreed at sign-off, not after.

Who owns what

North Stack runs and improves; you approve changes and decide the pace of climbing. Nothing escalates in autonomy without your sign-off.

Exit state

A system that keeps paying back, and a clear, evidence-led decision about whether and when to climb.

[ The score ]

Six axes. One number out of a hundred.

Every workflow is scored on the same six axes, with the same weights, for every client. The number decides whether we build, assist, or decline, not the loudest opinion in the room.

Volume

How often does this workflow run?

High-frequency work is where automation compounds. A painful job done twice a year rarely pays back; the same effort done a hundred times a day may well.

Repetitiveness

How similar is each instance to the last?

Repeatable patterns are automatable. Genuine one-offs and bespoke judgement calls are not, and pretending otherwise is how AI projects produce confident nonsense.

Cost of error

What happens if this goes wrong?

High error cost doesn't rule a workflow out; it dictates the rung and the controls. The more it costs to be wrong, the heavier the human gate must be.

Data messiness

How clean and consistent is the input?

Messy, contradictory data is the quiet killer of AI projects. We score it honestly because it usually means reconciliation has to come first.

Compliance load

What does the regulator require here?

Compliance shapes the design: audit trails, sign-off gates, what may never be automated at all. We'd rather price that in from the start than discover it in production.

Integration difficulty

Can the surrounding systems be connected, and how?

This axis sets the rung. An import file is the easy path, an API is harder, and a closed portal that needs a supervised agent is the hardest and highest-value of all.

70 and above

Build

The payback is real and the risk is controllable. This is where we propose a proof.

50 to 69

Assist, don't automate

Worth AI support, but a human stays in the seat. We'll tell you which rung.

Below 35

Decline, in writing

Judgement, negotiation, relationship, or genuinely one-off work. Automating it would cost more than it returns. We name it out loud.
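The scoring and the bands above can be sketched in Python. Several things here are assumptions: the text does not publish the weights, so equal weights are used as a placeholder, and each axis rating is assumed to run from 0 to 100 with higher meaning a stronger automation candidate. The bands as stated give no call for scores from 35 up to 50, so the sketch returns "unspecified" there rather than guessing.

```python
AXES = (
    "volume",
    "repetitiveness",
    "cost_of_error",
    "data_messiness",
    "compliance_load",
    "integration_difficulty",
)

# Placeholder equal weights; the real weights are not given in the text.
WEIGHTS = {axis: 1 / len(AXES) for axis in AXES}


def workflow_score(ratings: dict) -> float:
    """Combine six per-axis ratings (each 0 to 100) into one score out of 100."""
    missing = set(AXES) - set(ratings)
    if missing:
        raise ValueError(f"missing axes: {sorted(missing)}")
    for axis in AXES:
        if not 0 <= ratings[axis] <= 100:
            raise ValueError(f"{axis} rating must be between 0 and 100")
    return sum(WEIGHTS[axis] * ratings[axis] for axis in AXES)


def call_for(score: float) -> str:
    """Map a score to the build / assist / decline call."""
    if score >= 70:
        return "build"
    if score >= 50:
        return "assist"
    if score < 35:
        return "decline"
    return "unspecified"  # the stated bands do not cover this range


strong = workflow_score({axis: 90 for axis in AXES})
strong_call = call_for(strong)
```

Keeping the weights and thresholds as module-level constants mirrors the claim that they are fixed and identical for every client.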

We say no in writing.

We hand you a written 'do not automate' list with every audit. A vendor whose audit always concludes you should buy their product is not running an audit. Concluding you shouldn't build yet is one of the most useful things an operations review can do.

[ Red lines ]

The lines we will not cross.

These do not move for a deadline, a client, or a confidence threshold. The exclusions are the credibility: they're how you know the yes is honest.

We never automate a payment.
Not with approval, not with a confidence threshold, not just this once. A human makes payments by hand, every time.
We never automate a change to bank details.
The single highest-fraud-risk action in any operation. It stays manual, permanently, by design.
No submission reaches a regulator without explicit human sign-off.
Every regulatory filing sits behind a hard pre-submission gate. The pipeline can assemble it; a named person releases it.

[ The stack ]

Seven services, mapped to the rungs.

You don't buy the whole stack. Every productised service maps to a rung. Start where the payback is clearest, prove it, then climb.

Service	Rung
Operations Audit & AI Roadmap	01 Mapped
Knowledge Retrieval & Private AI	02 Assisted
Inbox & Document Automation	03 Automated
Compilation & Reporting Engines	03 Automated
Reconciliation & Data Harmonisation	04 Integrated
Core-System Integration & API Write-Back	04 Integrated
Supervised Portal Agents	05 Orchestrated

[ What holds it together ]

Six principles, never bent for a deal.

Only build what pays back

Every workflow gets a score out of 100 against six axes. The score decides, and the threshold is the same for every client, so it can't be talked up to suit a budget.

Prove the hard part before you commit

No build is ever priced before we've proven the single hardest thing on your real data. If the proof fails, there's no build and you don't pay for one.

Climb one rung at a time

Autonomy only ever increases behind heavier controls, never ahead of them. Most clients start at Mapped or Automated and climb once the first pipeline has paid for itself.

Governance is the structure, not a slide

Role-based access, audit trails, confidence scores and human approval are built into every rung from Assisted up. The red lines never move.

We say no in writing

Every audit ends with a 'do not automate' list. Naming the work AI shouldn't touch is how you know to trust us on the work it should.

Anchor price to your cost, not our rate

Build fees are anchored to what the manual version of the work costs you each year. A retainer to run and improve it is agreed at sign-off, never bolted on once you're dependent.

[ Questions ]

The method, answered.

How long does an engagement take, end to end?

The audit runs two to three weeks. A proof is a short window after that. A typical build runs weeks, not quarters, delivered in milestones so you see working software early. Most clients are live within roughly four to eight weeks of a build starting, and you can stop after any stage with something you own.

What does it cost, and how is it priced?

Every stage is fixed-fee. The audit and the proof are each a defined fee you agree up front. A build is priced against what the manual version of the work actually costs you each year, not against our day rate, and paid across milestones. A monthly retainer to run and improve it is agreed at sign-off.

What if the proof shows it won't work?

Then there's no build, and you don't pay for one. That's a successful outcome: you've learned where AI doesn't pay back for the price of a proof, not the price of a project. We'd rather tell you no early than sell you a build that disappoints.

Do we need a technical team to work with you?

No. We handle the build. From you we need access, the right people for a few sessions, and a decision-maker to set rules and sign off go-live. We build systems your whole team can use, not ones that need an engineer to operate.

Will this replace our core systems?

No. We build around the systems you already use. We sit on top of your CRM, case-management, policy or ERP platform, and write back into them through their API where one exists. Where a system is closed and has no usable API, a supervised portal agent operates it instead.

What will you refuse to do?

We never automate payments or changes to bank details, and nothing reaches a regulator without an explicit human sign-off. We also decline to build for workflows that score below threshold, and we put that in writing. The exclusions are the credibility: they're how you know the yes is honest.

How do you keep an AI agent safe inside a regulated portal?

Every critical action pauses for explicit, per-action human approval before it executes; nothing high-stakes is autonomous. There's role-based access, a complete exportable audit trail, and a failure playbook that escalates any stuck or unexpected state to a human. We only deploy agents on top of reconciled data and proven lower rungs.

[ Start where you can see the work ]

Start where you can see the work.

A fixed-fee operations review maps your whole operation, scores every workflow on the six axes, and hands you a costed, sequenced plan, including the workflows we'd tell you to leave alone. No build is priced before the hard part is proven.

Book an operations review

UK STUDIO · REMOTE-FIRST ACROSS THE UK AND EUROPE
