May 10, 2026 · Colm Byrne

The 8-Component Identity Fingerprint: What Each Number Means and Why It's Enough

You deploy 200 autonomous crop monitors across a farming co-op. Within a week, you need answers. Which robots are in the right field? Which ones have settled into productive routines? Which environments are causing problems? And you need those answers without streaming video, without raw sensor logs, without any data that reveals what -- or who -- is in the environment.

The CCF identity fingerprint gives you all of this from about 20 scalar values per robot per reporting period. No camera feeds. No microphone recordings. No personally identifiable information. Just eight components derived from the robot's own operational state -- the same state it already maintains for its primary behavioural function.

This post walks through each of the eight components, explains what it measures and why it matters, and uses simulation data to show how the fingerprint discriminates between radically different environments.

The fingerprint architecture is described in US Provisional 64/039,623, sections [0014]-[0015a] and [0027]-[0029].

The Eight Components

The identity fingerprint is not a feature vector extracted from raw sensor data. It is a summary of the robot's operational experience -- how many situations it has encountered, how familiar it is with them, how its trust accumulates over time, and how social its environment is. The robot computes it locally from its own CCF state. No cloud processing required.

Here are the eight components:

| # | Component | Symbol | Measures |
|---|-----------|--------|----------|
| 1 | Vocabulary cardinality | \|K\| | Environmental complexity |
| 2 | Phase distribution | p_I .. p_IV | Operational maturity |
| 3 | State matrix density | rho | Operational interconnection |
| 4 | Context group count | g | Structural complexity |
| 5 | Mean familiarity | mu_f | Operational depth |
| 6 | Familiarity variance | sigma^2_f | Experience distribution |
| 7 | Temporal rhythm | r_m / r_a / r_e / r_n | Diurnal structure |
| 8 | Presence pattern | pi_a / pi_s / pi_r / pi_ab | Social density |

Total scalar values: 20 (1 + 4 + 1 + 1 + 1 + 1 + 4 + 4 = 17 from the eight components, plus 3 extras from internal accounting). In practice, fewer than 200 bytes on the wire. A LoRa packet. A single BLE advertisement. Trivial bandwidth for any transport layer.
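To make the payload size concrete, here is a minimal packing sketch. The field layout, integer widths, and the `pack_fingerprint` helper are illustrative assumptions, not the patent's wire format:

```python
import struct

def pack_fingerprint(vocab, phases, density, groups, mu_f, var_f, rhythm, presence):
    """Pack the eight components into a compact binary payload.

    Hypothetical layout: vocabulary as u16, group count as u8, and the
    remaining 15 scalars as float32 (little-endian, no padding).
    """
    assert len(phases) == 4 and len(rhythm) == 4 and len(presence) == 4
    return struct.pack(
        "<HB15f",  # u16, u8, then 15 x float32
        vocab, groups,
        *phases, density, mu_f, var_f, *rhythm, *presence,
    )

# Forest values from the simulation tables in this post
payload = pack_fingerprint(
    vocab=148, groups=20,
    phases=[0.614, 0.180, 0.0, 0.0],
    density=0.24, mu_f=0.31, var_f=0.02,
    rhythm=[0.27, 0.28, 0.16, 0.29],
    presence=[0.08, 0.09, 0.03, 0.80],
)
print(len(payload))  # 63 bytes -- comfortably inside a single LoRa packet
```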

Let me walk through each one.

1. Vocabulary Cardinality: How Complex Is This World?

|K| = count of distinct context keys observed

The vocabulary cardinality is the simplest component: how many distinct operational situations has the robot encountered? Each context key is a quantised composite of sensor readings -- light level, sound, proximity, motion, time of day, social presence. When the robot enters a situation it has never seen before, the vocabulary grows by one.

This single number tells you about environmental complexity. A controlled indoor environment -- a hospital room, a laboratory, a warehouse zone -- produces a small vocabulary. The robot sees the same situations repeatedly. An outdoor environment -- a forest, a farm field, a construction site -- produces a larger vocabulary because conditions change more.
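A context key can be sketched as a quantised tuple of sensor features. The bucket boundaries and feature set below are invented for illustration; the actual quantisation scheme may differ:

```python
def context_key(light_lux, sound_db, proximity_m, hour, presence):
    """Quantise raw readings into a discrete context key.

    Bucket boundaries here are illustrative, not CCF's actual scheme.
    """
    light = min(int(light_lux // 250), 4)   # 5 light bands
    sound = min(int(sound_db // 20), 4)     # 5 sound bands
    prox = 0 if proximity_m < 1 else (1 if proximity_m < 3 else 2)
    tod = hour // 6                          # night/morning/afternoon/evening
    return (light, sound, prox, tod, presence)

vocabulary = set()
readings = [
    (120, 35, 0.5, 9, "static"),
    (130, 38, 0.4, 9, "static"),   # same buckets -> same key, no growth
    (900, 70, 5.0, 21, "absent"),  # genuinely new situation -> |K| grows
]
for r in readings:
    vocabulary.add(context_key(*r))
print(len(vocabulary))  # 2
```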

In our simulation (seed 20260426):

| Environment | Vocabulary |
|-------------|------------|
| Forest | 148 |
| Mars habitat | 76 |
| Bedroom | 295 |

The Mars habitat has the smallest vocabulary: 76 distinct situations. This is a controlled, enclosed environment with regulated lighting, temperature, and limited variation. The forest has moderate complexity at 148 -- varied but with recurring natural patterns. The bedroom has the largest vocabulary at 295, which might seem surprising until you consider that a domestic environment has high granularity: different people, different activities, different times, devices turning on and off, variable lighting, pets, visitors.

For a fleet operator, vocabulary cardinality is the first triage metric. If a robot deployed in a controlled warehouse suddenly reports vocabulary growth from 50 to 200, something changed. New zone, new layout, equipment relocation, or the robot itself was moved.

2. Phase Distribution: How Mature Is This Deployment?

p = [p_I, p_II, p_III, p_IV]
where p_I + p_II + p_III + p_IV = 1

CCF defines four social phases based on accumulated familiarity and environmental tension. Phase I is unfamiliar territory -- the robot is cautious, behavioural envelope is constrained. Phase II is established familiarity -- the robot has settled in. Phase III is familiar but tense -- something is off in a known context. Phase IV is both unfamiliar and tense -- maximum caution.

The phase distribution tells you what fraction of the robot's operational time is spent in each phase. This is an operational maturity profile.
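As a sketch, the four-phase classification can be modelled as a two-threshold rule over familiarity and tension. The thresholds and the `phase` helper are illustrative assumptions, not CCF's actual boundaries:

```python
from collections import Counter

FAMILIAR, TENSE = 0.5, 0.5  # illustrative thresholds

def phase(familiarity, tension):
    """Classify one observation into the four CCF phases."""
    if familiarity >= FAMILIAR:
        return "III" if tension >= TENSE else "II"
    return "IV" if tension >= TENSE else "I"

# Toy log of (familiarity, tension) observations
log = [(0.1, 0.2), (0.7, 0.1), (0.8, 0.9), (0.2, 0.8), (0.6, 0.3)]
counts = Counter(phase(f, t) for f, t in log)
dist = {p: counts[p] / len(log) for p in ("I", "II", "III", "IV")}
print(dist)  # the four fractions always sum to 1
```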

| Environment | Phase I | Phase II | Phase III | Phase IV |
|-------------|---------|----------|-----------|----------|
| Forest | 61.4% | 18.0% | — | — |
| Mars habitat | 52.1% | 21.1% | — | — |
| Bedroom | 76.1% | 0.1% | — | — |

The bedroom shows 76.1% Phase I and only 0.1% Phase II. This robot is spread thin across 295 contexts and has barely established deep familiarity anywhere. Domestic environments are complex and variable -- the robot encounters many situations but revisits each one infrequently.

The Mars habitat shows the highest Phase II at 21.1%. Fewer contexts, repeated visits, deep familiarity accumulation. This is exactly what you expect from a controlled habitat with structured routines.

For fleet operations, the phase distribution is your deployment health metric. A cohort stuck at high Phase I means the environment is too variable or the robot is not staying in place long enough to accumulate familiarity.

3. State Matrix Density: How Connected Is the Trust Network?

rho = non_zero_entries / (|K| * |K|)

The state matrix density measures how interconnected the robot's trust network is. When the robot transfers trust between contexts using the Sinkhorn-Knopp doubly stochastic matrix, the density of that matrix tells you how many context-to-context trust pathways exist.
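The density itself is a one-line computation. A minimal sketch, assuming the mixing matrix is available as a NumPy array:

```python
import numpy as np

def matrix_density(m, eps=1e-9):
    """Fraction of non-zero entries in the context-mixing matrix."""
    return np.count_nonzero(np.abs(m) > eps) / m.size

# A small, mostly block-diagonal mixing matrix (values illustrative)
m = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
print(matrix_density(m))  # 0.375 -- 6 of 16 entries are non-zero
```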

| Environment | Density |
|-------------|---------|
| Forest | 24.0% |
| Mars habitat | 63.0% |
| Bedroom | 4.2% |

Mars at 63% density: a homogeneous environment where many contexts share sensor features. Trust earned in one part of the habitat transfers meaningfully to others. The mixing matrix has many non-zero entries because the contexts are operationally similar.

Bedroom at 4.2% density: a diverse environment where contexts are dissimilar. Cooking in the kitchen at night shares almost nothing with reading in bed in the afternoon. The mixing matrix is sparse because few trust transfer pathways are meaningful.

For fleet monitoring, a sudden density change indicates environmental restructuring -- a layout change that separated previously adjacent zones.

4. Context Group Count: How Structured Is This Environment?

g = number of clusters discovered by min-cut analysis

The min-cut boundary algorithm (see the compositional closure post for the underlying mathematics) discovers natural groupings in the robot's context space. The group count tells you how many structurally distinct zones the robot operates in.

| Environment | Groups |
|-------------|--------|
| Forest | 20 |
| Mars habitat | 14 |
| Bedroom | 9 |

Forest has 20 groups -- many distinct habitat types. Clearings, tree cover, stream banks, ridgelines, morning vs. afternoon conditions in each. Mars has 14 -- different modules, workstations, transition corridors. Bedroom has 9 -- fewer but more distinct activity zones.

The ratio of vocabulary to groups is telling. Bedroom: 295/9 = 32.8 contexts per group -- high variety within few structural categories. Mars: 76/14 = 5.4 -- uniform contexts evenly distributed across zones.

5. Mean Familiarity: How Deep Is the Experience?

mu_f = (1/|K|) * sum of familiarity values across all contexts

Mean familiarity measures the average depth of operational experience. A robot that has been deployed for months in a stable environment has high mean familiarity. A robot in a chaotic, rapidly changing environment has low mean familiarity even after the same deployment duration.

| Environment | Mean Familiarity |
|-------------|------------------|
| Forest | 0.31 |
| Mars habitat | 0.38 |
| Bedroom | 0.12 |

Mars leads at 0.38 -- deep experience in fewer contexts, repeated visits accumulating trust. Bedroom trails at 0.12 -- thin experience spread across many contexts. The robot knows a little about a lot of situations.

This is one of the most actionable fleet metrics. If mean familiarity plateaus well below expected levels, the environment may be too variable for the robot's sensor resolution. If it climbs rapidly, the robot has found its rhythm and is operating efficiently.

6. Familiarity Variance: How Evenly Distributed Is the Experience?

sigma^2_f = variance of familiarity values across all contexts

Familiarity variance tells you whether the robot is a specialist or a generalist. Low variance means uniform experience across all contexts. High variance means deep knowledge of some zones and shallow knowledge of others.

Combined with mean familiarity, this creates a two-dimensional profile: high mean + low variance = settled generalist; high mean + high variance = specialist; low mean + low variance = new deployment.
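That two-dimensional profile can be sketched as a simple classifier. The thresholds (`mean_hi`, `var_hi`) and the labels are invented for illustration:

```python
import statistics

def experience_profile(familiarities, mean_hi=0.3, var_hi=0.02):
    """Classify a robot's experience profile from per-context familiarity.

    Thresholds are illustrative assumptions, not CCF-defined values.
    """
    mu = statistics.fmean(familiarities)
    var = statistics.pvariance(familiarities)
    if mu >= mean_hi:
        label = "specialist" if var >= var_hi else "settled generalist"
    else:
        label = "uneven early deployment" if var >= var_hi else "new deployment"
    return label, mu, var

# Deep knowledge of two contexts, shallow everywhere else
label, mu, var = experience_profile([0.9, 0.8, 0.05, 0.1, 0.02])
print(label)  # specialist
```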

7. Temporal Rhythm: What's the Daily Pattern?

r = [r_morning, r_afternoon, r_evening, r_night]
where r_morning + r_afternoon + r_evening + r_night = 1

The temporal rhythm captures the robot's diurnal activity pattern -- what fraction of meaningful operational events occur in each quarter of the day.

| Environment | Morning | Afternoon | Evening | Night |
|-------------|---------|-----------|---------|-------|
| Forest | 0.27 | 0.28 | 0.16 | 0.29 |
| Mars habitat | 0.31 | 0.30 | 0.17 | 0.23 |
| Bedroom | 0.32 | 0.22 | 0.22 | 0.24 |

Forest shows near-uniform distribution -- outdoor activity is not time-locked. The robot encounters new situations at all hours. Mars shows morning+afternoon dominance (0.31 + 0.30 = 0.61) -- consistent with shift-based operations where crew activity peaks during work hours. Bedroom shows morning dominance (0.32) -- consistent with domestic routine: wake up, morning activities, then more distributed activity through the rest of the day.

For eldercare fleet monitoring, temporal rhythm drift is a diagnostic signal. If a companion robot that normally shows morning-peaked activity suddenly shifts to night-dominated activity, the resident's routine has changed. That might be medically relevant -- and it was detected without any camera or microphone data, purely from the robot's operational state.
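One way to quantify rhythm drift is the total variation distance between a baseline rhythm vector and the current one; the alert threshold in the comment is an illustrative assumption, not a CCF-specified value:

```python
def rhythm_drift(baseline, current):
    """Total variation distance between two diurnal rhythm vectors.

    0 means identical rhythms; 1 means completely disjoint ones.
    """
    return 0.5 * sum(abs(b - c) for b, c in zip(baseline, current))

morning_peaked = [0.45, 0.30, 0.15, 0.10]  # baseline routine
night_shifted = [0.10, 0.15, 0.25, 0.50]   # current reporting period
drift = rhythm_drift(morning_peaked, night_shifted)
print(drift)  # 0.5 -- an operator might alert above, say, 0.25
```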

8. Presence Pattern: How Social Is This Environment?

pi = [pi_approaching, pi_static, pi_retreating, pi_absent]
where pi_approaching + pi_static + pi_retreating + pi_absent = 1

The presence pattern captures the social density of the robot's environment -- how often people (or other agents) are approaching, present and stationary, retreating, or absent.

| Environment | Approaching | Static | Retreating | Absent |
|-------------|-------------|--------|------------|--------|
| Forest | 0.08 | 0.09 | 0.03 | 0.80 |
| Mars habitat | 0.16 | 0.47 | 0.11 | 0.26 |
| Bedroom | 0.36 | 0.23 | 0.06 | 0.35 |

Forest is overwhelmingly solitary: 80% absent. This robot operates outdoors with rare human contact. Mars shows high static presence (47%) -- crew at fixed positions, the robot operating alongside people who are working. Bedroom shows high approaching (36%) -- a domestic environment where people move toward the robot regularly, interact, then leave (only 6% retreating vs. 36% approaching suggests short, frequent interactions).

The presence pattern is the social fingerprint. It distinguishes between solitary outdoor deployment, structured co-working environments, and intimate domestic settings -- all without identifying any individual.

Putting It Together: Environment Discrimination

The full fingerprint creates a high-dimensional profile that discriminates between environments with striking clarity. The pairwise Jaccard distances between our three simulated environments:

| Pair | Jaccard Distance |
|------|------------------|
| Forest -- Mars | 0.95 |
| Forest -- Bedroom | 0.78 |
| Mars -- Bedroom | 0.95 |

A Jaccard distance of 0.95 means the fingerprints share almost no overlap. Forest and Mars are operationally distinct on nearly every dimension. Mars and Bedroom are equally distinct despite both being indoor environments -- the social density, vocabulary complexity, and phase distributions are completely different.

Forest and Bedroom are the closest pair at 0.78, which still represents strong discrimination. They share some properties (moderate-to-high vocabulary, morning activity peaks) but diverge on social density, matrix density, and familiarity depth.
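For intuition, here is how Jaccard distance behaves on two context-key vocabularies. This sketch assumes the distance is computed over sets of discretised fingerprint features; the post does not specify the exact set construction:

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| between two sets."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

# Toy vocabularies sized like the simulation's forest (148) and
# Mars habitat (76), with only 8 keys in common.
forest = {"ctx_%d" % i for i in range(148)}
mars = {"ctx_%d" % i for i in range(140, 216)}
print(round(jaccard_distance(forest, mars), 3))  # 0.963 -- almost no overlap
```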

These 20 scalar values are sufficient to cluster a fleet of hundreds of robots into environment types, detect when a robot has been relocated, and monitor operational drift over time. No raw sensor data required. No privacy-invasive telemetry. No bandwidth-intensive data streams.

Why Eight Is Enough

Why not transmit the full state matrix? Three reasons:

- Bandwidth: roughly 200 bytes for the fingerprint vs. roughly 200 KB for full state. Over LoRa or satellite, only the fingerprint is feasible.
- Privacy: statistical summaries cannot be reverse-engineered to identify individuals. This is structural, not policy-based.
- Sufficiency: Jaccard distances above 0.78 from 20 scalars. Additional dimensions offer diminishing returns.

The ccf-core crate on crates.io computes all eight components from the robot's existing CCF state. No additional sensors, no additional computation. The how it works page describes the underlying architecture. The fingerprint builds on the trust mathematics in the Sinkhorn-Knopp post and the compositional closure proof.

FAQ

Q: Can the fingerprint be reverse-engineered to identify individuals in the environment?

No. The fingerprint contains aggregate statistics: phase percentages, mean familiarity, temporal distributions. There is no path from these summary statistics back to individual sensor readings or individual identities. This is structural -- not policy-based. The information is not present in the fingerprint.

Q: How often should fingerprints be computed and transmitted?

The patent specifies configurable intervals. Daily is typical for stable deployments. Hourly is appropriate for dynamic environments where rapid change detection matters. The computation is lightweight -- it reads existing CCF state, it does not trigger additional sensor processing.

Q: Does the fingerprint work for non-robotic agents?

Yes. Any system running CCF -- software agents, IoT devices, autonomous vehicles -- produces the same eight-component fingerprint from its operational state. The components are defined in terms of CCF primitives (context keys, familiarity accumulators, phase classifications), not hardware-specific sensor types.

Q: How does vocabulary cardinality relate to sensor resolution?

Higher sensor resolution (more quantisation levels per feature) produces larger vocabularies for the same environment. The vocabulary cardinality should be interpreted relative to the sensor configuration, not in absolute terms. Comparing robots with identical sensor configurations is straightforward. Cross-configuration comparison requires normalisation.

Q: What clustering algorithm works best for fingerprint vectors?

Standard approaches work: k-means for known cluster counts, DBSCAN for unknown counts with density-based discovery, hierarchical clustering for taxonomy building. The fingerprint dimensions have different scales, so normalisation or distance metrics that handle heterogeneous scales (e.g., Gower distance) are recommended.
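A minimal sketch of the normalisation step on a toy two-cluster fleet. The values are invented and the nearest-seed assignment stands in for a real k-means or Gower-distance pipeline:

```python
import numpy as np

# Toy fleet: each row is part of a flattened fingerprint
# (vocab, phase I, density, groups, mean familiarity) -- values invented.
fleet = np.array([
    [148, 0.61, 0.24, 20, 0.31],   # forest-like
    [150, 0.60, 0.25, 19, 0.30],
    [76,  0.52, 0.63, 14, 0.38],   # habitat-like
    [80,  0.53, 0.61, 15, 0.37],
])

# Z-score each dimension so vocabulary (hundreds) does not swamp
# the fractional components when distances are computed.
z = (fleet - fleet.mean(axis=0)) / fleet.std(axis=0)

# Assign each robot to the nearest of two seed rows; no iteration is
# needed for this well-separated toy data.
seeds = z[[0, 2]]
labels = np.argmin(((z[:, None, :] - seeds[None]) ** 2).sum(-1), axis=1)
print(labels)  # [0 0 1 1]
```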


Patent pending. US Provisional 64/039,623.

-- Colm Byrne, Founder -- Flout Labs, Galway, Ireland