June 23, 2026 · Colm Byrne

'It Would Have Been Warmer If...': Counterfactual Explanations From Robot State

A teacher in a special education classroom watches the companion robot interact with a new student. The student approaches. The robot dims its LED, reduces servo amplitude, and stays still. The teacher asks: "Why is it being so reserved? It was friendly with the other students."

You could show her the causation packet. Twelve fields. C_ctx = 0.09 for this student's context. Phase I. Low output envelope. Technically correct, but it does not answer her real question. Her question is not "what are the numbers." Her question is: "what would need to be different for it to be warmer?"

This is a counterfactual question. And it requires a different mechanism than the causation trail alone can provide.

The counterfactual explanation system combines two CCF components: the causation packet (Claims AL-AO, sections [E7-0004] through [E7-0006] of US Provisional 64/039,655) and the shadow simulator (Claims AM-AN, sections [E7-0007] through [E7-0008]). Together, they produce differential explanations -- precise statements about what would have changed if one factor had been different, grounded in the actual state of the system at the moment of action.

The Shadow Simulator

The shadow simulator is a read-only copy of the CCF state. It does not affect the live system. It does not accumulate trust. It does not produce motor commands. It exists solely to answer "what if" questions.

The process:

Step 1. Take the causation packet for a specific event. This packet contains all 12 fields that determined the robot's behaviour at that moment -- the context key, the coherence values, the bridge state, the conflict resolution, the output envelope. The packet is a complete snapshot of the decision mechanism.

Step 2. Copy the relevant CCF state into the shadow workspace. This includes the coherence accumulators, the active bridge structures, the compiled routine registry, and the current sensor quantisation tables. The copy is a deep clone -- modifications to the shadow workspace do not affect the live system.

Step 3. Modify ONE factor in the shadow workspace. Change C_ctx to a higher value. Or remove the sponsor bridge. Or change the social phase boundary. Or add a compiled routine. Only one factor changes. Everything else remains identical.

Step 4. Re-evaluate the behaviour. Run the shadow state through the same computation graph: minimum gate, bridge adjustment, phase classification, conflict resolution, envelope computation. The shadow workspace produces a hypothetical output envelope.

Step 5. Compare. The actual output and the hypothetical output differ by a specific amount, attributable to the single factor that was changed. This difference IS the counterfactual explanation.
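The five steps can be sketched in Rust, the language of the ccf-core crate. The struct fields and the bare minimum gate below are illustrative stand-ins for the full computation graph, not the crate's actual API:

```rust
// Sketch of the copy-modify-evaluate-discard cycle. Names are illustrative.

#[derive(Clone, Debug)]
struct CcfState {
    c_inst: f64, // instantaneous environmental coherence
    c_ctx: f64,  // accumulated familiarity with the active context
}

// The minimum gate: effective coherence is the lesser of the two inputs.
fn effective_coherence(s: &CcfState) -> f64 {
    s.c_inst.min(s.c_ctx)
}

// Steps 2-5: clone the live state, change exactly one factor, re-evaluate,
// return the difference. The clone is dropped on return (the discard step).
fn coherence_counterfactual(live: &CcfState, hypothetical_c_ctx: f64) -> f64 {
    let mut shadow = live.clone();      // step 2: deep copy, never a reference
    shadow.c_ctx = hypothetical_c_ctx;  // step 3: one factor only
    effective_coherence(&shadow) - effective_coherence(live) // steps 4-5
}
```

With the teacher's numbers (C_inst = 0.55, C_ctx = 0.09), raising C_ctx to 0.45 in the shadow yields a differential of +0.36 in effective coherence, matching the worked numbers that follow.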

For the teacher's question, the shadow simulator modifies C_ctx:

ACTUAL STATE:
  C_ctx = 0.09, C_inst = 0.55, C_eff = min(0.55, 0.09) = 0.09
  output: {motor: 0.09, LED: 0.12, audio: 0.08, verbal: 0.06}

SHADOW STATE (C_ctx raised to 0.45):
  C_ctx = 0.45, C_inst = 0.55, C_eff = min(0.55, 0.45) = 0.45
  output: {motor: 0.45, LED: 0.50, audio: 0.42, verbal: 0.38}

The differential:

delta_motor = 0.45 - 0.09 = +0.36
delta_LED   = 0.50 - 0.12 = +0.38
delta_audio = 0.42 - 0.08 = +0.34
delta_verbal = 0.38 - 0.06 = +0.32

The natural-language translation: "The robot stayed reserved because active-context familiarity was low. With current environmental stability held fixed, a higher familiarity value would have enabled a warmer greeting."

This is not a generated rationalisation. It is a state-based re-evaluation. The shadow workspace performed the same computation with one input changed. The difference in output is the causal contribution of that input.

The Discard Guarantee

The shadow workspace is destroyed after use. This is not a policy. It is a structural constraint. The shadow workspace operates on cloned state, not on references to live state. When the counterfactual computation completes, the cloned accumulators, bridges, and routines are deallocated. No trust was accumulated in the shadow. No coherence was transferred. No bridge was created or decayed. The live system continued running during the shadow computation and was not affected by it.

This matters because trust contamination is the central risk of counterfactual systems. If exploring "what would happen with higher trust" actually increased trust, the system would be gaming itself. The discard guarantee prevents this. The shadow simulator is a pure function: state in, hypothetical output out, state destroyed.
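In Rust terms, the discard guarantee can be expressed structurally through ownership: the shadow owns a clone, a single evaluation consumes it, and the clone is deallocated on drop. A minimal sketch with illustrative types (not the ccf-core definitions):

```rust
// Sketch: the discard guarantee as an ownership property, not a policy.

#[derive(Clone)]
struct Accumulators {
    c_ctx: f64, // stand-in for the full coherence/bridge/routine state
}

struct Shadow {
    state: Accumulators, // owned clone, never a borrow of live state
}

impl Shadow {
    fn from_live(live: &Accumulators) -> Self {
        Shadow { state: live.clone() } // deep copy into the workspace
    }

    // Pure with respect to the live system: reads and mutates only the clone.
    fn evaluate_with_c_ctx(mut self, c_ctx: f64) -> f64 {
        self.state.c_ctx = c_ctx;
        self.state.c_ctx // hypothetical output
    } // `self` is consumed here: the shadow is destroyed after one use
}
```

Calling `evaluate_with_c_ctx` a second time on the same shadow is a compile error, because the first call moved it.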

For the mathematical proof that trust cannot be created from nothing in the live system, see The Trust Farming Impossibility Result.

Five Types of Counterfactual Explanation

The shadow simulator supports single-factor modifications across five categories. Each produces a different class of explanation.

1. Coherence Counterfactual

Modify C_ctx. Hold everything else fixed. The explanation takes the form: "The robot was reserved because familiarity with this context was X. With familiarity at Y, the behaviour would have been Z."

This is the most common explanation request. When someone asks "why isn't it warmer?", the answer is almost always: not enough accumulated trust. The counterfactual quantifies exactly how much trust is needed for the desired behaviour.

threshold_for_Phase_II = 0.35 + hysteresis / 2
current_C_ctx = 0.09
interactions_needed = (threshold_for_Phase_II - current_C_ctx) / avg_delta_per_interaction

For a typical learning rate and interaction frequency, this might be: "Approximately 50-60 positive interactions over 3-5 days would raise familiarity to the Phase II threshold." The teacher now has a concrete timeline. Not "it will get better eventually." This many interactions, this many days.
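The timeline arithmetic is simple enough to state as a function. The hysteresis width and per-interaction gain used in the usage note are assumed values chosen only to land in the quoted 50-60 band; they are not published CCF parameters:

```rust
// Sketch of the interactions-to-Phase-II estimate. The 0.35 base threshold
// comes from the formula above; hysteresis and avg gain are caller-supplied.

fn interactions_to_phase_ii(
    current_c_ctx: f64,
    hysteresis: f64,
    avg_delta_per_interaction: f64,
) -> u32 {
    let threshold_for_phase_ii = 0.35 + hysteresis / 2.0;
    // Rounded estimate of how many positive interactions close the gap.
    ((threshold_for_phase_ii - current_c_ctx) / avg_delta_per_interaction).round() as u32
}
```

With an assumed hysteresis of 0.04 and an assumed average gain of 0.005 per interaction, `interactions_to_phase_ii(0.09, 0.04, 0.005)` gives 56, inside the 50-60 range quoted above.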

2. Bridge Counterfactual

Add or remove a sponsor bridge. The explanation: "The robot's hesitation was partly compensated by the sponsor bridge from [sponsor]. Without the bridge, retreat behaviour would likely have occurred."

Using the worked example from the patent (Example 4, section [E8-0005a]):

WITH BRIDGE:
  C_S = 0.82 (sponsor coherence)
  C_T = 0.05 (target coherence)
  q_event = 0.85 (verbal introduction + sustained co-presence)
  b_0 = min(0.30, 0.50 * 0.85 * 0.82 * 0.95) = min(0.30, 0.331) = 0.30
  C_eff_adjusted = C_eff + bridge_contribution = 0.14 + 0.072 = 0.212
  output: {motor: 0.21, LED: 0.25, audio: 0.18, verbal: 0.15}

WITHOUT BRIDGE (shadow):
  C_eff = 0.14
  output: {motor: 0.14, LED: 0.17, audio: 0.12, verbal: 0.10}

DIFFERENCE:
  The sponsor bridge added 0.072 to effective coherence.
  Motor output increased by 50%.
  The bridge is responsible for the difference between
  "barely moving" and "cautiously approaching."

The bridge counterfactual is particularly useful in eldercare contexts, where a caregiver wants to understand whether their introduction of the robot to a new person is having an effect. The answer is not qualitative. It is: the bridge contributed exactly this much to the robot's willingness to engage. For the full sponsor bridge mechanism, see A Child Introduces a New Caregiver.
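The b_0 computation above can be sketched as a function. The post does not name the fourth factor in the product (0.95 here, 0.92 in the eldercare example later); it is labelled `recency` below purely as a placeholder, and the 0.30 cap and 0.50 base rate are read off the worked numbers:

```rust
// Sketch of the initial bridge strength b_0 from the worked example.
// Parameter names are assumptions; only the arithmetic is taken from the post.

fn initial_bridge_strength(q_event: f64, c_sponsor: f64, recency: f64) -> f64 {
    const CAP: f64 = 0.30;      // hard ceiling on any new bridge
    const BASE_RATE: f64 = 0.50; // scaling on the introduction product
    (BASE_RATE * q_event * c_sponsor * recency).min(CAP)
}
```

For the patent's Example 4 inputs (q_event = 0.85, C_S = 0.82, fourth factor 0.95), the uncapped product is 0.331, so the cap binds and b_0 = 0.30.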

3. Tension Counterfactual

Modify the tension level. Hold coherence fixed. The explanation: "The robot became protective because environmental tension was high. In a calmer environment with the same familiarity, it would have remained in companion mode."

This is useful during transitions -- a patient being moved between hospital wards, a child arriving at a new classroom, a resident being introduced to a new wing of a care facility. The tension counterfactual separates "the environment is stressful" from "the robot doesn't know this person." Often, the behaviour people attribute to unfamiliarity is actually caused by environmental instability.

4. Conflict Counterfactual

Remove the pathway conflict. The explanation: "The robot hesitated because its two processing pathways disagreed. If both had agreed, the delay would not have occurred."

For the full mechanism of reflexive-deliberative conflict, see When the Robot Hesitates. The conflict counterfactual quantifies the cost of the hesitation:

WITH CONFLICT (divergence = 0.4):
  amplitude = baseline * (1.0 - 0.4) = 0.60 * baseline
  hesitation_duration = 2.1 seconds

WITHOUT CONFLICT (shadow):
  amplitude = baseline
  hesitation_duration = 0 seconds

COST OF UNCERTAINTY:
  40% amplitude reduction for 2.1 seconds
  Equivalent to approximately 0.84 seconds of full-amplitude operation
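The cost figure is the amplitude shortfall integrated over the hesitation window, which can be stated as a one-line function (a sketch of the arithmetic above, not a ccf-core API):

```rust
// Cost of uncertainty: the fraction of amplitude lost to the conflict,
// times how long the hesitation lasted, in equivalent full-amplitude seconds.
fn cost_of_uncertainty(divergence: f64, hesitation_secs: f64) -> f64 {
    divergence * hesitation_secs
}
```

For the example above, `cost_of_uncertainty(0.4, 2.1)` reproduces the 0.84-second figure.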

5. Routine Counterfactual

Add or modify a compiled routine. The explanation: "The robot had no compiled greeting routine for this context. If a greeting had been available, it would have produced these outputs instead."

This counterfactual guides the configuration process. A deployment engineer can ask: "What would happen if I compiled a greeting routine for the therapy room?" The shadow simulator shows the hypothetical output. If the result is desirable, the engineer compiles the routine. If not, they adjust parameters before deployment.

Grounded, Not Generated

The critical distinction between CCF counterfactuals and language-model-generated explanations is groundedness. A language model asked "why did the robot back away?" produces a plausible narrative. The narrative might be accurate. It might not. There is no formal relationship between the generated text and the actual computation that produced the behaviour.

A CCF counterfactual is formally grounded in the system's state. The shadow simulator uses the same computation graph as the live system. The only difference is one input. The output difference is the exact causal contribution of that input, computed through the same equations, with the same quantisation tables, with the same phase boundaries.

The explanation is not "we think the robot backed away because of low trust." The explanation is: "C_ctx was 0.14. With C_ctx at 0.45, motor output would have been 0.45 instead of 0.14. The difference is 0.31 motor amplitude units, attributable entirely to the coherence deficit in this context."

No narrative. No plausibility assessment. No "the model probably considered." Just: this number was this. If it had been that, the output would have been that. Computed, not generated.

The Eldercare Scenario

Mrs. Hennessy's family visits on Saturdays. They have noticed that the companion robot is noticeably more animated when their mother's regular aide, Sinead, is present, but becomes reserved when the weekend aide, Roisin, arrives. The family asks: "Does the robot not like Roisin?"

The counterfactual analysis runs three shadow evaluations:

Shadow 1: Remove Sinead's coherence. Set Sinead's context coherence to 0. Result: motor output drops by 0.28. The robot is animated partly because of Sinead's accumulated trust.

Shadow 2: Raise Roisin's coherence to Sinead's level. Result: motor output increases by 0.31. If Roisin had the same familiarity as Sinead, the robot would behave nearly identically.

Shadow 3: Add a sponsor bridge from Mrs. Hennessy to Roisin. With Mrs. Hennessy's coherence at 0.79 and a verbal introduction quality of 0.85:

b_0 = min(0.30, 0.50 * 0.85 * 0.79 * 0.92) = min(0.30, 0.309) = 0.30

Result: motor output increases by 0.15 immediately. The robot would be moderately more engaged with Roisin if Mrs. Hennessy explicitly introduces them.

The explanation for the family: "The robot is not expressing a preference. It has accumulated more familiarity with Sinead through more interactions. Roisin is newer. If Mrs. Hennessy introduces Roisin to the robot while present, the robot will warm up faster. In approximately 3-4 weeks of regular weekend visits, Roisin's familiarity will approach Sinead's."

This is a specific, actionable, reassuring answer. It is not "the algorithm works in mysterious ways." It is: this number is lower, here is how to raise it, here is how long it will take.

The School Scenario

A child with autism has been working with the companion robot in a resource room. The child is transitioning to a mainstream classroom. The special education coordinator wants to know: will the robot's behaviour change in the new room?

Shadow evaluation: copy the current CCF state. Replace the context key with the mainstream classroom's sensor signature (brighter lights, more ambient noise, more proximity events from other children).

RESOURCE ROOM:
  C_inst = 0.78 (quiet, consistent environment)
  C_ctx = 0.62 (many sessions of accumulated trust)
  C_eff = min(0.78, 0.62) = 0.62
  tension = 0.12

MAINSTREAM CLASSROOM (shadow):
  C_inst = 0.41 (noisier, more variable environment)
  C_ctx = 0.00 (new context, no accumulated trust)
  C_eff = min(0.41, 0.00) = 0.00
  tension = 0.38

The robot would start from zero in the new room. Phase I. Minimal output. The coordinator now knows: the transition needs scaffolding. Options: (1) bring the robot to the new classroom in advance for acclimatisation visits, (2) have the child introduce the robot to the new context (sponsor bridge from a trusted entity), (3) accept a 2-3 week re-familiarisation period.

The counterfactual made the transition plan concrete before the child experienced any disruption.

Combining Causation and Counterfactual

The causation packet tells you WHY. The counterfactual tells you WHAT IF. Together, they form a complete explanation system:

  1. An event occurs. The causation packet records the 12-field causal chain.
  2. A stakeholder asks "why?" The packet answers directly.
  3. The stakeholder asks "what would change this?" The shadow simulator evaluates one-factor modifications.
  4. The stakeholder receives a grounded, differential answer with specific numbers and timelines.
  5. The shadow state is discarded. No contamination.

This is the explanatory architecture described in sections [E7-0007] through [E7-0008] of the patent. It is not a bolted-on explainability layer. It is the system explaining itself, through its own computational mechanism, with formal guarantees about the accuracy of the explanation.

The full implementation is available in ccf-core on crates.io. For the causation packet structure, see Why Did the Robot Back Away?. For privacy protections on the explanation system, see Explainable But Private.


FAQ

Can the shadow simulator evaluate multiple factor changes simultaneously?

The standard protocol is single-factor modification -- change one input, measure the difference. This isolates the causal contribution of each factor. Multi-factor modifications are possible but produce interaction effects that are harder to interpret. If you change both C_ctx and tension simultaneously, the output difference is not the sum of the individual differences because the minimum gate is nonlinear. For deployment, single-factor counterfactuals are recommended. For research, multi-factor analysis can be enabled with appropriate caveats about interaction effects.
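A toy demonstration of that interaction effect, using an illustrative envelope model (minimum gate scaled by 1 - tension, which is an assumption for this sketch, not the ccf-core formula):

```rust
// Illustrative envelope: min gate on coherence, damped by tension.
fn motor_output(c_inst: f64, c_ctx: f64, tension: f64) -> f64 {
    c_inst.min(c_ctx) * (1.0 - tension)
}

// Returns (delta from raising C_ctx alone, delta from lowering tension alone,
// delta from changing both) for a fixed C_inst, to expose the interaction term.
fn interaction_demo() -> (f64, f64, f64) {
    let base = motor_output(0.55, 0.09, 0.30);
    let d_ctx = motor_output(0.55, 0.45, 0.30) - base; // familiarity only
    let d_ten = motor_output(0.55, 0.09, 0.10) - base; // calmer environment only
    let d_both = motor_output(0.55, 0.45, 0.10) - base; // both at once
    (d_ctx, d_ten, d_both)
}
```

With these numbers the single-factor deltas are 0.252 and 0.018, but the joint delta is 0.342, not 0.270: the min gate makes the contributions non-additive.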

How long does a shadow evaluation take?

The shadow simulator runs the same computation graph as the live system. On an ARM Cortex-M microcontroller, a single evaluation takes under 2 milliseconds. The copy-modify-evaluate-discard cycle completes in under 5 milliseconds total. This is fast enough for real-time use -- a dashboard can display counterfactual explanations as events occur, with less than one tick of latency.

Can counterfactual explanations be generated for historical events?

Yes, provided the causation packet chain is intact. Each packet contains the full state needed for shadow evaluation. A compliance auditor reviewing an event from three months ago can load the packet, reconstruct the shadow workspace, and evaluate counterfactuals against the historical state. The cryptographic hash chain ensures the packet has not been tampered with. For the tamper-evident chain mechanism, see Why Did the Robot Back Away?.

Does the shadow simulator account for the Sinkhorn-Knopp mixing step?

Yes. The shadow workspace includes a copy of the mixing matrix. When the modified factor affects coherence values that participate in mixing (e.g., raising one context's coherence changes the doubly stochastic distribution), the shadow evaluation re-runs the Sinkhorn-Knopp projection. The counterfactual output reflects the full downstream effect of the modification, including cross-context trust redistribution.
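For reference, the classic Sinkhorn-Knopp iteration alternately normalises the rows and columns of a positive matrix until it converges to a doubly stochastic one. A minimal sketch of that projection follows; how ccf-core wires it into coherence mixing is not shown:

```rust
// Minimal Sinkhorn-Knopp: repeatedly normalise rows, then columns, of a
// strictly positive square matrix. Converges to a doubly stochastic matrix.
fn sinkhorn_knopp(mut m: Vec<Vec<f64>>, iters: usize) -> Vec<Vec<f64>> {
    let n = m.len();
    for _ in 0..iters {
        for row in m.iter_mut() {
            let s: f64 = row.iter().sum();
            for x in row.iter_mut() {
                *x /= s; // each row now sums to 1
            }
        }
        for j in 0..n {
            let s: f64 = m.iter().map(|row| row[j]).sum();
            for row in m.iter_mut() {
                row[j] /= s; // each column now sums to 1
            }
        }
    }
    m
}
```

After enough iterations every row and column sums to approximately 1, so raising one entry (one context's coherence) necessarily redistributes mass across the others, which is the cross-context effect the shadow evaluation has to re-run.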

Could someone use counterfactual analysis to find the minimum trust needed to bypass a safety constraint?

The counterfactual analysis reveals the exact coherence threshold for each phase transition. This is by design -- the thresholds are not secrets. They are published parameters of the CCF configuration. The safety guarantee is not that thresholds are hidden but that trust cannot be created from nothing. Knowing that Phase II requires C_ctx of 0.35 does not help an adversary achieve C_ctx of 0.35 any faster. The accumulation dynamics are bounded by real interactions over real time. For the impossibility proof, see The Trust Farming Impossibility Result.


Patent pending. US Provisional 64/039,655.

-- Colm Byrne, Founder -- Flout Labs, Galway, Ireland