Building a risk-loaded schedule: three-point estimates that mean something
A Monte Carlo engine will produce a beautiful S-curve from any inputs whatsoever. Whether the P80 it prints deserves to be believed depends entirely on work done before the first iteration runs: the state of the network, the honesty of the ranges, and a handful of modelling decisions most workshops skip. This is the practical companion to our explainer on what the simulation actually does.
Step 1: fix the network before you range anything
The simulation recalculates your network exactly as built, several thousand times. Every structural defect is therefore repeated several thousand times, and the defects are not neutral — they are systematically flattering.
An activity with no successor can absorb any sampled overrun without moving the completion date: its risk simply vanishes from the analysis. An activity with no predecessor starts on the data date in every iteration, however late the work feeding it would really run. And constraints are worse, because they don't just leak risk — they clamp the distribution. A Must-Finish-On milestone caps every iteration at the pinned date; the S-curve rises and then goes obediently vertical, and the analysis reports a near-certain finish that is an artefact of the constraint, not a property of the project.
So the first step of a credible quantitative schedule risk analysis (QSRA) contains no probability at all: run the DCMA 14-point check, close the open ends, justify or delete the constraints, and confirm the network actually transmits delay end to end. If the deterministic float values look wrong (and float is where broken logic shows up first), the simulation will be wrong in the same places, with more decimal places.
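The open-end and constraint checks are easy to script before anyone opens a risk tool. What follows is a minimal sketch, assuming the network has been exported as a list of activity IDs, a list of (predecessor, successor) pairs and a dict of hard constraints. Those structures, the sample IDs and the function name `find_structural_defects` are hypothetical, not the export format of any particular scheduling package.

```python
def find_structural_defects(activities, links, constraints, start_id, finish_id):
    """Return open ends and constrained activities in a schedule network.

    activities  -- iterable of activity IDs
    links       -- iterable of (predecessor_id, successor_id) pairs
    constraints -- dict: activity ID -> constraint type (hypothetical layout)
    start_id / finish_id -- the only activities allowed to be open-ended
    """
    has_pred = {succ for _, succ in links}
    has_succ = {pred for pred, _ in links}

    # No predecessor: would start on the data date in every iteration
    no_predecessor = sorted(a for a in activities
                            if a not in has_pred and a != start_id)
    # No successor: sampled overruns never reach the completion date
    no_successor = sorted(a for a in activities
                          if a not in has_succ and a != finish_id)
    # Constraints to justify or delete before simulating
    constrained = sorted(a for a in activities if a in constraints)

    return {"no_predecessor": no_predecessor,
            "no_successor": no_successor,
            "constrained": constrained}


# Illustrative network: B has no successor, C is not tied in at all
activities = ["START", "A", "B", "C", "FINISH"]
links = [("START", "A"), ("A", "FINISH"), ("START", "B")]
constraints = {"FINISH": "Must Finish On"}

report = find_structural_defects(activities, links, constraints, "START", "FINISH")
print(report)
```

An empty report does not prove the logic is right, only that the most flattering defects are absent.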
Step 2: three-point estimates done honestly
The three-point estimate — minimum, most likely, maximum — is where most QSRAs are quietly won or lost. The mechanics are trivial; the honesty is not.
Where ranges should come from
Reference-class data beats workshop intuition, every time you can get it. If your organisation has actuals from previous projects — how long piling packages of this size really took, across ten jobs — the spread of those actuals is your range, and no further debate is required. The workshop's job is then to argue for adjustments ("this site has better ground conditions"), not to invent numbers from a standing start. Most teams have more historical data than they think; they just store it in the heads of people who are about to retire.
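Turning a pile of actuals into a starting range is mostly arithmetic. The sketch below assumes each historical record is a (planned, actual) duration pair and takes the 5th, 50th and 95th percentiles of the actual/planned ratio as minimum, most likely and maximum. The percentile choices, the use of the median as a stand-in for the most likely value, the function name and the ten sample records are all assumptions for illustration, not real project data.

```python
import statistics

def range_from_actuals(records, planned_duration):
    """Derive a three-point range from (planned, actual) duration pairs.

    Uses the 5th, 50th and 95th percentiles of the actual/planned ratio.
    Those percentiles are an assumption, and the median is only a rough
    proxy for the most likely value.
    """
    ratios = sorted(actual / planned for planned, actual in records)
    # quantiles(n=20) returns 19 cut points: index 0 is P5, 9 is P50, 18 is P95
    cuts = statistics.quantiles(ratios, n=20, method="inclusive")
    low, mid, high = cuts[0], cuts[9], cuts[18]
    return tuple(round(planned_duration * r, 1) for r in (low, mid, high))


# Ten illustrative piling packages: (planned days, actual days)
records = [(20, 19), (25, 27), (30, 31), (18, 24), (22, 22),
           (40, 52), (35, 36), (28, 33), (20, 21), (24, 35)]

minimum, likely, maximum = range_from_actuals(records, planned_duration=20)
print(minimum, likely, maximum)
```

The workshop then argues about adjustments to these three numbers rather than inventing them.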
The too-narrow-range disease
Where workshop judgement is unavoidable, it arrives with a well-documented defect: people anchor on the deterministic duration and offer a polite ribbon around it. Ask an engineer for a range and you will get ±10%, almost regardless of the activity. Compare claimed ranges against realised outcomes on almost any portfolio and the truth is closer to −5%/+50%: the early side is short because work rarely finishes much early (and when it can, it isn't reported), while the late side runs long because there are a hundred ways for an activity to go wrong and three ways for it to go right.
Three consequences for how you set the points:
- Asymmetry is the normal case. A symmetric range on a non-trivial activity should be challenged by default. If the most likely is 20 days, "18–22" is an admission that nobody thought about it; "18–32" at least looks like the real world.
- The minimum is not the best case ever recorded. It is a duration that would plausibly occur on this project, with this team, perhaps one time in twenty. Heroic minimums stretch the left tail and quietly drag the P50 earlier.
- The maximum is where the courage goes. Ask "what happened the last time this went badly?" rather than "how bad could it get?" — the first question retrieves a memory, the second invites a negotiation.
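The symmetry challenge can be automated as a first pass over the range sheet. This is a minimal sketch that flags any range whose overrun side is not clearly longer than its early-finish side. The `tolerance` threshold, the function name and the sample activities are hypothetical, and a flag means "ask the question", not "the range is wrong".

```python
def flag_anchored_ranges(ranges, tolerance=0.25):
    """Return IDs whose range looks symmetric around the most likely value.

    ranges    -- dict: activity ID -> (minimum, most_likely, maximum)
    tolerance -- hypothetical threshold: flag when the overrun side is less
                 than (1 + tolerance) times the early-finish side
    """
    flagged = []
    for act_id, (lo, ml, hi) in ranges.items():
        early = ml - lo   # room to finish early
        late = hi - ml    # room to overrun
        if late < (1 + tolerance) * early:
            flagged.append(act_id)
    return flagged


ranges = {
    "PILING": (18, 20, 22),        # the polite ribbon
    "SUBSTRUCTURE": (18, 20, 32),  # asymmetric
    "FIT-OUT": (27, 30, 33),       # exactly ±10%
}
flagged = flag_anchored_ranges(ranges)
print(flagged)  # ['PILING', 'FIT-OUT']
```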
Triangular or PERT? Mostly: it doesn't matter
The triangular distribution takes your three points literally and linearly; BetaPERT fits a smooth curve that concentrates weight near the most likely and thins the tails, so the same three points produce a slightly tighter spread and a mean pulled less towards the extremes. People argue about the choice with surprising energy. They shouldn't: the difference between triangular and PERT on the same three points is dwarfed by the difference between an honest range and an anchored one. Pick PERT if your maximums are genuinely rare worst cases, triangular if you want the extremes to carry real weight, and spend the saved meeting time on the ranges themselves.
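To see how small the difference is, compare the two distributions on the same three points using their closed-form mean and standard deviation. This sketch assumes the conventional BetaPERT shape weight of 4 on the most likely value; some tools let you change that weight, in which case the numbers shift.

```python
import math

def triangular_stats(a, m, b):
    """Mean and standard deviation of a triangular(min=a, mode=m, max=b)."""
    mean = (a + m + b) / 3
    var = (a * a + m * m + b * b - a * m - a * b - m * b) / 18
    return mean, math.sqrt(var)

def pert_stats(a, m, b):
    """Mean and standard deviation of a BetaPERT with shape weight 4."""
    mean = (a + 4 * m + b) / 6
    var = (mean - a) * (b - mean) / 7
    return mean, math.sqrt(var)


tri_mean, tri_sd = triangular_stats(18, 20, 32)
pert_mean, pert_sd = pert_stats(18, 20, 32)
print(f"triangular: mean {tri_mean:.2f}, sd {tri_sd:.2f}")  # mean 23.33, sd 3.09
print(f"PERT:       mean {pert_mean:.2f}, sd {pert_sd:.2f}")  # mean 21.67, sd 2.33
```

On the 18/20/32 example the choice of distribution moves the mean by under two days. Replacing an anchored 18/20/22 with 18/20/32 moves the triangular mean from 20 to about 23.3 days.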
Step 3: don't range everything
On a 4,000-activity network, eliciting 4,000 bespoke three-point estimates is neither possible nor useful. The standard answer is banding: group activities by risk class — trade, design maturity, procurement route, weather exposure — and assign each band a percentage range (say, −10%/+35% for groundworks, −5%/+15% for proven M&E installation). Reserve bespoke estimates for the twenty or thirty activities that the deterministic critical path, the float profile and your own judgement nominate as the ones that matter. A QSRA with eight honest bands and thirty considered ranges beats one with four thousand copy-pasted ±10%s, and takes a tenth of the time.
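Banding is a lookup, with bespoke ranges overriding the band where someone has actually thought about the activity. The sketch below uses the two example bands quoted above. The activity IDs, the dict layouts and the function name `apply_bands` are hypothetical.

```python
BANDS = {  # risk class -> (low %, high %) around the deterministic duration
    "groundworks": (-10, 35),
    "proven_mne": (-5, 15),
}

def apply_bands(activities, bands, bespoke=None):
    """Build three-point estimates: bespoke range if given, else the band.

    activities -- dict: ID -> (deterministic duration, risk class)
    bespoke    -- dict: ID -> (min, most likely, max) for activities that matter
    """
    bespoke = bespoke or {}
    estimates = {}
    for act_id, (duration, risk_class) in activities.items():
        if act_id in bespoke:
            estimates[act_id] = bespoke[act_id]
        else:
            low_pct, high_pct = bands[risk_class]
            estimates[act_id] = (round(duration * (1 + low_pct / 100), 1),
                                 duration,
                                 round(duration * (1 + high_pct / 100), 1))
    return estimates


activities = {
    "EXC-010": (20, "groundworks"),
    "PIL-030": (20, "groundworks"),
    "MNE-200": (40, "proven_mne"),
}
bespoke = {"PIL-030": (18, 20, 32)}  # a considered range beats the band

estimates = apply_bands(activities, BANDS, bespoke)
for act_id, three_point in estimates.items():
    print(act_id, three_point)
```

An activity with an unknown risk class raises a `KeyError`, which is deliberate: an unbanded activity should be a visible gap, not a silent default.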
Step 4: correlation, the honesty multiplier
Here is the quiet scandal of cheap risk analysis: sampling every activity independently makes the output narrower than the truth. If each duration is drawn separately, one activity's overrun is forever being cancelled by another's early finish, and the extremes average away. But real overruns travel in packs — they share causes. A wet winter hits every earthworks activity at once. A weak subcontractor is slow on all of their packages. Immature design bleeds into every fabrication duration downstream. When the bad versions happen together, the project's bad tail is much worse than independence predicts.
Modelling this properly is a research topic; modelling it adequately is an afternoon. Define correlation groups for the obvious common causes — weather-exposed work, each major subcontractor, design-dependent packages — and set a moderate positive correlation within each group. The S-curve widens at both ends, the P80 moves right, and the analysis stops pretending your risks have agreed to take turns.
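The narrowing effect of independent sampling is easy to demonstrate with the standard library alone. This sketch sums ten identical activities in series and correlates them through one shared driver (a one-factor Gaussian copula). That technique is swapped in here for illustration and is not necessarily how any given risk tool implements correlation groups. The value `rho=0.5`, the activity ranges and the function names are assumptions.

```python
import math
import random
from statistics import NormalDist, quantiles

def triangular_ppf(u, a, m, b):
    """Inverse CDF of the triangular distribution on [a, b] with mode m."""
    if u < (m - a) / (b - a):
        return a + math.sqrt(u * (b - a) * (m - a))
    return b - math.sqrt((1 - u) * (b - a) * (b - m))

def simulate_group_total(ranges, rho, iterations=5000, seed=1):
    """Total duration of a serial chain of activities sharing one common cause.

    rho is the correlation between the activities' underlying normal scores;
    rho=0 reproduces independent sampling.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    totals = []
    for _ in range(iterations):
        common = rng.gauss(0, 1)           # shared driver, e.g. a wet winter
        total = 0.0
        for a, m, b in ranges:
            own = rng.gauss(0, 1)          # activity-specific variation
            z = math.sqrt(rho) * common + math.sqrt(1 - rho) * own
            total += triangular_ppf(std_normal.cdf(z), a, m, b)
        totals.append(total)
    return totals

def p_level(values, pct):
    """Percentile of the simulated totals, e.g. pct=80 for the P80."""
    return quantiles(values, n=100)[pct - 1]


earthworks = [(18, 20, 32)] * 10   # ten weather-exposed activities in series
independent = simulate_group_total(earthworks, rho=0.0)
correlated = simulate_group_total(earthworks, rho=0.5)

for label, totals in (("independent", independent), ("correlated", correlated)):
    print(label, round(p_level(totals, 10), 1), round(p_level(totals, 80), 1))
```

The correlated run should show a lower P10 and a higher P80 than the independent one. The marginal ranges are identical in both runs; only the assumption about whether bad days arrive together has changed.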
Step 5: risk events are not duration uncertainty
Ranging durations captures background uncertainty — the ordinary variability of work you are definitely doing. It does not capture the things that might or might not happen at all: the planning judicial review, the main bearing failure on test, the supplier insolvency. Folding those into wider duration ranges smears a discrete event into a permanent fog and gets both wrong.
Model them instead as risk events: each one carries a probability of occurring and an impact range if it does, attached to the network as a fragnet-style insertion that only exists in the iterations where the dice say it fired. The simulation then handles both layers at once — every iteration samples the background ranges, and some iterations additionally suffer the events. This risk-driver approach also keeps the risk register and the schedule risk analysis honest with each other, which is rarer than it should be: if a register risk can't be expressed as probability, impact and a point of attack in the network, it is a worry, not a risk.
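The two layers can be sketched in a few lines. This is a deliberately simplified model: the schedule is reduced to a serial chain, and the fragnet insertion is reduced to extra days added when the event fires, so it illustrates the sampling logic rather than a real network calculation. The chain, the 20% event probability and its impact range are invented for illustration.

```python
import random
from statistics import quantiles

def simulate_finish(activities, risk_events, iterations=5000, seed=7):
    """Sample a serial chain with background ranges plus discrete risk events.

    activities  -- list of (min, most_likely, max) durations, run in series
    risk_events -- list of (probability, (min, most_likely, max) impact days)
    """
    rng = random.Random(seed)
    results = []
    for _ in range(iterations):
        # Layer 1: background uncertainty, sampled in every iteration
        total = sum(rng.triangular(lo, hi, ml) for lo, ml, hi in activities)
        # Layer 2: an event adds its impact only in iterations where it fires
        for probability, (lo, ml, hi) in risk_events:
            if rng.random() < probability:
                total += rng.triangular(lo, hi, ml)
        results.append(total)
    return results

def p_level(values, pct):
    """Percentile of the simulated totals, e.g. pct=80 for the P80."""
    return quantiles(values, n=100)[pct - 1]


chain = [(18, 20, 32), (27, 30, 40), (9, 10, 14)]
events = [(0.20, (15, 25, 60))]   # e.g. a supplier insolvency, hypothetical figures

background_only = simulate_finish(chain, [])
with_events = simulate_finish(chain, events)

print("P50:", round(p_level(background_only, 50), 1), round(p_level(with_events, 50), 1))
print("P80:", round(p_level(background_only, 80), 1), round(p_level(with_events, 80), 1))
```

Note the argument order of `random.triangular`, which is low, high, mode. Passing the three points in min, most likely, max order is a common slip that silently produces the wrong shape.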
Step 6: a QSRA is a forecast, not a gate deliverable
The least respected step. A risk analysis run once, at sanction, and filed alongside the business case is a photograph of what the team believed before reality started voting. Ranges should tighten as design matures and work completes; risk events should fire, retire or escalate; the completion distribution should narrow update by update, and if it doesn't, that is a finding in itself. Treat the QSRA as part of the update cycle — re-run it when the schedule is statused, and trend the P-levels the same way you trend everything else against a properly managed baseline. A P80 that drifts three weeks to the right across two updates is the earliest, cheapest delay warning you will ever get.
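Trending the P-levels needs nothing more than a record of each run. A minimal sketch, assuming every update stores its data date and forecast completion dates per P-level. The record layout, the dates and the function name `p_level_drift` are hypothetical.

```python
from datetime import date

def p_level_drift(history, level="P80", window=2):
    """Days the chosen P-level has moved over the last `window` updates.

    history -- list of dicts ordered oldest first, each holding a 'data_date'
               and forecast completion dates keyed by P-level
    A positive result means the forecast has moved later.
    """
    if len(history) <= window:
        raise ValueError("need more updates than the window length")
    earlier = history[-1 - window][level]
    latest = history[-1][level]
    return (latest - earlier).days


history = [
    {"data_date": date(2024, 1, 31), "P50": date(2025, 3, 10), "P80": date(2025, 4, 14)},
    {"data_date": date(2024, 2, 29), "P50": date(2025, 3, 12), "P80": date(2025, 4, 24)},
    {"data_date": date(2024, 3, 31), "P50": date(2025, 3, 17), "P80": date(2025, 5, 5)},
]

drift = p_level_drift(history)
print(drift)  # 21 days: three weeks to the right across two updates
```

In this invented history the P50 has moved only 7 days while the P80 has moved 21, so the distribution is widening as well as drifting, and that is worth a conversation on its own.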
Running the workshop without harvesting nonsense
Finally, the elicitation craft itself — because most bad inputs are manufactured in a single optimistic afternoon:
- Interview disciplines separately. In the plenary workshop, the piling contractor will not give a 28-day maximum while the client's project manager maintains eye contact. Small sessions produce wider, truer ranges; the plenary is for reconciling them.
- Challenge symmetry on sight. A ±10% range is a default, not an estimate. Ask what the worst realised outcome on comparable work was, and make the range explain why this time is different.
- Record the basis of estimate. One line per range: actuals from project X, supplier quote, engineering judgement. Six months later, when the P80 is being argued about, the ranges with a written basis survive and the ones without get re-litigated from scratch.
| Input | Good practice | Smell test |
|---|---|---|
| Network | 14-point check passed; constraints justified; delay propagates end to end | S-curve goes vertical at one date — you've simulated a constraint |
| Ranges | Reference-class actuals first; asymmetric by default; basis recorded | Every range is ±10% — anchoring, not estimating |
| Coverage | Risk-class bands plus bespoke ranges on the activities that matter | 4,000 identical copy-pasted ranges |
| Correlation | Groups for weather, subcontractors, design maturity | None applied — distribution suspiciously narrow |
| Risk events | Probability × impact, fragnet-style, mapped from the live register | Register risks "covered" by fatter duration ranges |
| Cadence | Re-run each update; trend P50/P80 over time | Last run dated the same month as the gate review |
Key takeaways
- The network comes first: open ends leak risk out of the analysis and constraints clamp the distribution. No range survives broken logic.
- Ranges from reference-class actuals beat workshop guesses; where judgement is needed, fight anchoring — honest ranges are wide and asymmetric, more −5%/+50% than ±10%.
- Triangular vs PERT matters far less than range honesty. Band big networks by risk class instead of ranging everything.
- Independent sampling falsely narrows the output. Simple correlation groups beat none.
- Model discrete risks as probability × impact events, separate from background uncertainty — and re-run the analysis every update cycle, not once per gate.