Risk · QSRA · 9 Jun 2026 · 11 min read

Building a risk-loaded schedule: three-point estimates that mean something

A Monte Carlo engine will produce a beautiful S-curve from any inputs whatsoever. Whether the P80 it prints deserves to be believed depends entirely on work done before the first iteration runs: the state of the network, the honesty of the ranges, and a handful of modelling decisions most workshops skip. This is the practical companion to our explainer on what the simulation actually does.

Step 1: fix the network before you range anything

The simulation recalculates your network exactly as built, several thousand times. Every structural defect is therefore repeated several thousand times, and the defects are not neutral — they are systematically flattering.

An activity with no successor can absorb any sampled overrun without moving the completion date: its risk simply vanishes from the analysis. An activity with no predecessor starts on the data date in every iteration, however late the work feeding it would really run. And constraints are worse, because they don't just leak risk — they clamp the distribution. A Must-Finish-On milestone caps every iteration at the pinned date; the S-curve rises and then goes obediently vertical, and the analysis reports a near-certain finish that is an artefact of the constraint, not a property of the project.

So the first step of a credible QSRA contains no probability at all: run the DCMA 14-point check, close the open ends, justify or delete the constraints, and confirm the network actually transmits delay end to end. If the deterministic float values look wrong — and float is where broken logic shows up first — the simulation will be wrong in the same places, with more decimal places.
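
The open-end and constraint checks can be automated before any ranging starts. What follows is a minimal sketch, assuming a hypothetical in-memory network (activity IDs mapped to predecessor lists, plus a dictionary of constraint labels). The IDs, the label strings and the function names are illustrative assumptions, not a real P6 or MS Project export format, and this covers only the open-ends and hard-constraint portion of a logic check, not the full 14-point assessment.

```python
# Hypothetical toy network: activity ID -> list of predecessor IDs.
# IDs and constraint labels are illustrative assumptions, not an export format.
predecessors = {
    "START": [],
    "A": ["START"],
    "B": ["A"],
    "C": [],           # open start: no predecessor
    "D": ["A"],        # open finish: nothing depends on it
    "FINISH": ["B", "C"],
}
constraints = {"FINISH": "MUST_FINISH_ON"}
HARD_CONSTRAINTS = {"MUST_FINISH_ON", "MUST_START_ON"}


def find_open_ends(predecessors, start="START", finish="FINISH"):
    """Return (activities with no predecessor, activities with no successor)."""
    has_successor = {p for preds in predecessors.values() for p in preds}
    no_pred = sorted(a for a, preds in predecessors.items()
                     if not preds and a != start)
    no_succ = sorted(a for a in predecessors
                     if a not in has_successor and a != finish)
    return no_pred, no_succ


def find_hard_constraints(constraints):
    """Return activities carrying a constraint that pins a date."""
    return sorted(a for a, c in constraints.items() if c in HARD_CONSTRAINTS)


no_pred, no_succ = find_open_ends(predecessors)
hard = find_hard_constraints(constraints)
print("No predecessor:", no_pred)
print("No successor:", no_succ)
print("Hard constraints to justify or delete:", hard)
```

Anything this reports is a candidate for the list of items to close, justify or delete before the first iteration runs.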

The vertical S-curve. If your cumulative curve hits a wall and climbs vertically at one date, you have almost certainly simulated a constraint, not a project. We see this regularly in submitted risk analyses: a P90 that equals the contract date to the day, courtesy of a Must-Finish-On nobody declared. It looks reassuring and means nothing.
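
A quick screen for the wall is to ask what share of iterations finish on the single most common date, and whether that date is also the latest one seen. The sketch below assumes finish dates expressed as integer day numbers; the 30% threshold, the function names and the sampled data are illustrative assumptions rather than any standard.

```python
import random
from collections import Counter


def modal_share(finish_days):
    """Return (most common finish day, share of iterations landing on it)."""
    day, count = Counter(finish_days).most_common(1)[0]
    return day, count / len(finish_days)


def looks_clamped(finish_days, threshold=0.30):
    """Flag a distribution piled up against its own latest date.

    The threshold is an illustrative assumption, not an industry standard.
    """
    day, share = modal_share(finish_days)
    return share >= threshold and day == max(finish_days)


# Synthetic finish days: an unconstrained run, then the same run with a
# Must-Finish-On style cap at day 400.
rng = random.Random(0)
free = [round(rng.triangular(380, 460, 395)) for _ in range(5000)]
clamped = [min(day, 400) for day in free]

print("unconstrained flagged:", looks_clamped(free))
print("capped flagged:", looks_clamped(clamped))
```

A flagged result is a prompt to go looking for the constraint, not proof that one exists.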

Step 2: three-point estimates done honestly

The three-point estimate — minimum, most likely, maximum — is where most QSRAs are quietly won or lost. The mechanics are trivial; the honesty is not.

Where ranges should come from

Reference-class data beats workshop intuition, every time you can get it. If your organisation has actuals from previous projects — how long piling packages of this size really took, across ten jobs — the spread of those actuals is your range, and no further debate is required. The workshop's job is then to argue for adjustments ("this site has better ground conditions"), not to invent numbers from a standing start. Most teams have more historical data than they think; they just store it in the heads of people who are about to retire.
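
Turning a set of actuals into a range can be as simple as scaling the historical actual-to-planned ratios onto the new planned duration. The sketch below uses invented numbers for ten comparable packages and takes the median ratio as the most likely point; both the data and that choice are assumptions made for illustration.

```python
import statistics

# Invented history: (planned days, actual days) for ten comparable packages.
# Replace with your organisation's own actuals.
history = [
    (20, 21), (25, 24), (18, 22), (30, 41), (22, 23),
    (20, 27), (28, 29), (24, 35), (26, 26), (21, 25),
]


def range_from_actuals(history, planned_days):
    """Scale the spread of actual/planned ratios onto a new planned duration."""
    ratios = sorted(actual / planned for planned, actual in history)
    return {
        "min": round(planned_days * ratios[0], 1),
        # Median ratio as the most likely point is a simplifying assumption.
        "most_likely": round(planned_days * statistics.median(ratios), 1),
        "max": round(planned_days * ratios[-1], 1),
    }


estimate = range_from_actuals(history, planned_days=20)
print(estimate)
```

With only ten data points the observed extremes are used directly; with a larger reference class, percentiles (for example the 5th and 95th) would be a more defensible choice than the single best and worst jobs.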

The too-narrow-range disease

Where workshop judgement is unavoidable, it arrives with a well-documented defect: people anchor on the deterministic duration and offer a polite ribbon around it. Ask an engineer for a range and you will get ±10%, almost regardless of the activity. Compare claimed ranges against realised outcomes on almost any portfolio and the realised spread is typically closer to −5%/+50% of the deterministic duration: the downside is short because work rarely finishes much early (and when it can, it isn't reported), while the upside runs long because there are a hundred ways for an activity to go wrong and three ways for it to go right.

Fig 1. The claimed ±10% range versus where outcomes actually land. Symmetric, narrow ranges are an anchoring artefact, not a property of construction work. Synthetic data, indicative.

Three consequences for how you set the points:

Asymmetry is the normal case. A symmetric range on a non-trivial activity should be challenged by default. If the most likely is 20 days, "18–22" is an admission that nobody thought about it; "18–32" at least looks like the real world.
The minimum is not the best case ever recorded. It is a duration that would plausibly occur on this project, with this team, perhaps one time in twenty. Heroic minimums stretch the left tail and quietly drag the P50 earlier.
The maximum is where the courage goes. Ask "what happened the last time this went badly?" rather than "how bad could it get?" — the first question retrieves a memory, the second invites a negotiation.
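
The first of these checks is mechanical enough to run across a whole register of ranges before the workshop reconvenes. A minimal sketch, assuming each estimate is a (minimum, most likely, maximum) triple; the function name and the 10% narrowness threshold are illustrative assumptions.

```python
def review_range(minimum, most_likely, maximum, narrow_pct=0.10):
    """Return a list of challenge notes for one three-point estimate."""
    if not minimum <= most_likely <= maximum:
        return ["points out of order"]
    notes = []
    downside = most_likely - minimum
    upside = maximum - most_likely
    if upside <= downside:
        notes.append("symmetric or left-skewed: challenge by default")
    if upside <= narrow_pct * most_likely:
        notes.append("narrow upside: probable anchoring")
    return notes


print(review_range(18, 20, 22))  # the anchored ribbon
print(review_range(18, 20, 32))  # the asymmetric range
```

An empty list does not make a range honest; it only means the range has not failed the cheapest test.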

Triangular or PERT? Mostly: it doesn't matter

The triangular distribution takes your three points literally and linearly; BetaPERT fits a smooth curve that concentrates weight near the most likely and thins the tails, so the same three points produce a slightly tighter spread and a mean pulled less towards the extremes. People argue about the choice with surprising energy. They shouldn't: the difference between triangular and PERT on the same three points is dwarfed by the difference between an honest range and an anchored one. Pick PERT if your maximums are genuinely rare worst cases, triangular if you want the extremes to carry real weight, and spend the saved meeting time on the ranges themselves.
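
The size of the difference is easy to check for yourself. The sketch below samples both distributions over the 18/20/32 example using only the Python standard library. It assumes the common BetaPERT parameterisation with a shape weight of 4, which is a conventional choice rather than something fixed by the three points; the function names are illustrative.

```python
import random
import statistics


def sample_triangular(rng, low, mode, high):
    # Note the argument order of random.triangular: (low, high, mode).
    return rng.triangular(low, high, mode)


def sample_pert(rng, low, mode, high, lam=4.0):
    """BetaPERT sample; lam=4 is the conventional shape weight (assumed)."""
    alpha = 1 + lam * (mode - low) / (high - low)
    beta = 1 + lam * (high - mode) / (high - low)
    return low + (high - low) * rng.betavariate(alpha, beta)


def p80(values):
    """80th percentile: the eighth of nine decile cut points."""
    return statistics.quantiles(values, n=10)[7]


rng = random.Random(42)
low, mode, high = 18, 20, 32
tri = [sample_triangular(rng, low, mode, high) for _ in range(20000)]
pert = [sample_pert(rng, low, mode, high) for _ in range(20000)]

for name, xs in (("triangular", tri), ("PERT", pert)):
    print(name, round(statistics.mean(xs), 2),
          round(statistics.stdev(xs), 2), round(p80(xs), 2))
```

The theoretical means are (min + mode + max) / 3 for the triangular and (min + 4 × mode + max) / 6 for this PERT parameterisation, so PERT sits closer to the most likely value. Changing the maximum from 22 to 32 moves the result far more than switching between the two shapes does.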

Fig 2. Triangular versus BetaPERT over an identical three-point estimate. The shapes differ; the conclusions rarely do. The range is where the analysis is won.

Step 3: don't range everything

On a 4,000-activity network, eliciting 4,000 bespoke three-point estimates is neither possible nor useful. The standard answer is banding: group activities by risk class — trade, design maturity, procurement route, weather exposure — and assign each band a percentage range (say, −10%/+35% for groundworks, −5%/+15% for proven M&E installation). Reserve bespoke estimates for the twenty or thirty activities that the deterministic critical path, the float profile and your own judgement nominate as the ones that matter. A QSRA with eight honest bands and thirty considered ranges beats one with four thousand copy-pasted ±10%s, and takes a tenth of the time.
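
In code, banding is a lookup with an override. A minimal sketch, using the two example bands quoted above; the activity IDs, field names and the `banded_range` helper are hypothetical.

```python
# Bands as (low %, high %) applied to the deterministic duration.
# The two bands mirror the examples in the text; names are illustrative.
BANDS = {
    "groundworks": (-0.10, 0.35),
    "m_and_e": (-0.05, 0.15),
}


def banded_range(duration, band, bespoke=None):
    """Return (min, most likely, max); a bespoke range overrides the band."""
    if bespoke is not None:
        return bespoke
    low_pct, high_pct = BANDS[band]
    return (duration * (1 + low_pct), duration, duration * (1 + high_pct))


activities = [
    {"id": "GW-010", "duration": 20, "band": "groundworks"},
    {"id": "ME-120", "duration": 40, "band": "m_and_e"},
    # One of the activities that matter: carries its own considered range.
    {"id": "GW-045", "duration": 20, "band": "groundworks",
     "bespoke": (18, 20, 32)},
]

ranges = {
    a["id"]: banded_range(a["duration"], a["band"], a.get("bespoke"))
    for a in activities
}
for activity_id, points in ranges.items():
    print(activity_id, points)
```

Keeping the band name on every activity also gives you a ready-made grouping key when correlation is added later.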

Step 4: correlation, the honesty multiplier

Here is the quiet scandal of cheap risk analysis: sampling every activity independently makes the output narrower than the truth. If each duration is drawn separately, one activity's overrun is forever being cancelled by another's early finish, and the extremes average away. But real overruns travel in packs — they share causes. A wet winter hits every earthworks activity at once. A weak subcontractor is slow on all of their packages. Immature design bleeds into every fabrication duration downstream. When the bad versions happen together, the project's bad tail is much worse than independence predicts.

Modelling this properly is a research topic; modelling it adequately is an afternoon. Define correlation groups for the obvious common causes — weather-exposed work, each major subcontractor, design-dependent packages — and set a moderate positive correlation within each group. The S-curve widens at both ends, the P80 moves right, and the analysis stops pretending your risks have agreed to take turns.
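
The effect can be demonstrated with a single shared factor per group. The sketch below is one simple way to induce positive correlation (a one-factor Gaussian copula over triangular durations), not a description of how any particular tool does it. It assumes ten identical 18/20/32 activities in series and a correlation of 0.6 in the underlying normal space; all of those numbers and names are illustrative.

```python
import math
import random
import statistics


def triangular_ppf(u, low, mode, high):
    """Inverse CDF of the triangular distribution."""
    cut = (mode - low) / (high - low)
    if u < cut:
        return low + math.sqrt(u * (high - low) * (mode - low))
    return high - math.sqrt((1 - u) * (high - low) * (high - mode))


def simulate_chain(rho, n_activities=10, iterations=5000, seed=1):
    """Total duration of a serial chain whose activities share one cause."""
    rng = random.Random(seed)
    totals = []
    for _ in range(iterations):
        shared = rng.gauss(0, 1)  # the common cause, e.g. a wet winter
        total = 0.0
        for _ in range(n_activities):
            own = rng.gauss(0, 1)
            z = math.sqrt(rho) * shared + math.sqrt(1 - rho) * own
            u = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # normal CDF
            total += triangular_ppf(u, 18, 20, 32)
        totals.append(total)
    return totals


def p80(values):
    return statistics.quantiles(values, n=10)[7]


independent = simulate_chain(rho=0.0)
correlated = simulate_chain(rho=0.6)
for name, xs in (("independent", independent), ("correlated", correlated)):
    print(name, round(statistics.stdev(xs), 1), round(p80(xs), 1))
```

The mean barely moves between the two runs; what changes is the spread, and with it every P-level away from the middle.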

Fig 3. Ignoring correlation doesn't make the analysis neutral — it makes it optimistic in a specific, predictable direction. The widened curve is the honest one. Synthetic data, indicative.

Step 5: risk events are not duration uncertainty

Ranging durations captures background uncertainty — the ordinary variability of work you are definitely doing. It does not capture the things that might or might not happen at all: the planning judicial review, the main bearing failure on test, the supplier insolvency. Folding those into wider duration ranges smears a discrete event into a permanent fog and gets both wrong.

Model them instead as risk events: each one carries a probability of occurring and an impact range if it does, attached to the network as a fragnet-style insertion that only exists in the iterations where the dice say it fired. The simulation then handles both layers at once: every iteration samples the background ranges, and some iterations additionally suffer the events. Modelling risks as discrete events also keeps the risk register and the schedule risk analysis honest with each other, which is rarer than it should be: if a register risk can't be expressed as probability, impact and a point of attack in the network, it is a worry, not a risk.
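
The two layers can be sketched in a few lines. The example below assumes a single activity with a 90/100/130-day background range and one hypothetical register risk with a 25% probability and a 20/30/60-day impact; every number is invented for illustration, and a real model would insert the delay into the network rather than add it to one duration.

```python
import random
import statistics


def simulate(iterations=10000, seed=7):
    """Return (background-only durations, durations including the event)."""
    rng = random.Random(seed)
    background_only, with_event = [], []
    for _ in range(iterations):
        # Layer 1: background uncertainty, sampled in every iteration.
        # random.triangular takes (low, high, mode).
        base = rng.triangular(90, 130, 100)
        # Layer 2: the event only exists in iterations where it fires.
        delay = 0.0
        if rng.random() < 0.25:
            delay = rng.triangular(20, 60, 30)
        background_only.append(base)
        with_event.append(base + delay)
    return background_only, with_event


def percentile(values, pct):
    return statistics.quantiles(values, n=100)[pct - 1]


background_only, with_event = simulate()
fired_share = sum(w > b for b, w in zip(background_only, with_event)) / len(with_event)
p50_shift = percentile(with_event, 50) - percentile(background_only, 50)
p90_shift = percentile(with_event, 90) - percentile(background_only, 90)
print(round(fired_share, 3), round(p50_shift, 1), round(p90_shift, 1))
```

The event moves the upper percentiles much more than the median, which is the behaviour a uniformly widened duration range cannot reproduce.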

Step 6: a QSRA is a forecast, not a gate deliverable

The least respected step. A risk analysis run once, at sanction, and filed alongside the business case is a photograph of what the team believed before reality started voting. Ranges should tighten as design matures and work completes; risk events should fire, retire or escalate; the completion distribution should narrow update by update, and if it doesn't, that is a finding in itself. Treat the QSRA as part of the update cycle — re-run it when the schedule is statused, and trend the P-levels the same way you trend everything else against a properly managed baseline. A P80 that drifts three weeks to the right across two updates is the earliest, cheapest delay warning you will ever get.
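
Trending the P-levels needs nothing more than the dated results of each run. A minimal sketch, assuming the P80 completion date is recorded at every update; the update labels and dates are invented, and the function names are illustrative.

```python
from datetime import date

# Invented P80 completion dates from three successive schedule updates.
p80_history = [
    ("2026-03", date(2027, 5, 14)),
    ("2026-04", date(2027, 5, 21)),
    ("2026-05", date(2027, 6, 4)),
]


def p80_drift_days(history):
    """Movement in P80 between consecutive updates (positive = later)."""
    return [
        (later[1] - earlier[1]).days
        for earlier, later in zip(history, history[1:])
    ]


def cumulative_drift(history):
    """Total movement in P80 from the first recorded update to the latest."""
    return (history[-1][1] - history[0][1]).days


drift = p80_drift_days(p80_history)
total = cumulative_drift(p80_history)
print("Per-update drift (days):", drift)
print("Cumulative drift (days):", total)
```

Here the P80 has moved three weeks to the right across two updates, and the second step is larger than the first, which is the pattern worth escalating.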

Running the workshop without harvesting nonsense

Finally, the elicitation craft itself — because most bad inputs are manufactured in a single optimistic afternoon:

Interview disciplines separately. In the plenary workshop, the piling contractor will not give a 28-day maximum while the client's project manager maintains eye contact. Small sessions produce wider, truer ranges; the plenary is for reconciling them.
Challenge symmetry on sight. A ±10% range is a default, not an estimate. Ask what the worst realised outcome on comparable work was, and make the range explain why this time is different.
Record the basis of estimate. One line per range: actuals from project X, supplier quote, engineering judgement. Six months later, when the P80 is being argued about, the ranges with a written basis survive and the ones without get re-litigated from scratch.

Input	Good practice	Smell test
Network	14-point check passed; constraints justified; delay propagates end to end	S-curve goes vertical at one date — you've simulated a constraint
Ranges	Reference-class actuals first; asymmetric by default; basis recorded	Every range is ±10% — anchoring, not estimating
Coverage	Risk-class bands plus bespoke ranges on the activities that matter	4,000 identical copy-pasted ranges
Correlation	Groups for weather, subcontractors, design maturity	None applied — distribution suspiciously narrow
Risk events	Probability × impact, fragnet-style, mapped from the live register	Register risks "covered" by fatter duration ranges
Cadence	Re-run each update; trend P50/P80 over time	Last run dated the same month as the gate review

A workable minimum. If the full programme above is out of reach, do three things: fix the network, widen the maximums until they cover the worst comparable outcome you can actually name, and add correlation groups for weather and your two biggest subcontractors. That alone moves you out of decorative-S-curve territory.

Key takeaways

The network comes first: open ends leak risk out of the analysis and constraints clamp the distribution. No range survives broken logic.
Ranges from reference-class actuals beat workshop guesses; where judgement is needed, fight anchoring — honest ranges are wide and asymmetric, more −5%/+50% than ±10%.
Triangular vs PERT matters far less than range honesty. Band big networks by risk class instead of ranging everything.
Independent sampling falsely narrows the output. Simple correlation groups beat none.
Model discrete risks as probability × impact events, separate from background uncertainty — and re-run the analysis every update cycle, not once per gate.

Range it, run it, read it — in your browser

ScheduleInsight's Monte Carlo QSRA runs on your P6 XER or MS Project file entirely in the browser: three-point ranges, S-curve, P-levels and tornado, nothing uploaded. The tutorials walk through a full analysis.
