The 1,110:1 Privacy Ratio: Why This Fleet Monitoring System Can't Spy On Your Patients
Every fleet monitoring vendor will tell you they take privacy seriously. They have a privacy policy. They encrypt data in transit. They anonymise logs. They have SOC 2 Type II.
None of that matters.
Encryption protects data in transit and at rest. It does not help when the data is decrypted for analysis — which is the entire point of collecting it. Anonymisation is reversible in most real-world deployments. De-identification of time-series sensor data has been broken repeatedly in the literature. SOC 2 certifies process compliance, not mathematical impossibility.
The question is not "do you promise not to spy?" The question is "can you spy if you want to?" If the answer is yes — if the raw data exists in any form at any point in the pipeline — then the privacy guarantee depends on policy, not structure. Policies change. Employees are compromised. Subpoenas arrive. Data brokers are persistent.
The CCF identity fingerprint takes a different approach. The privacy is not a promise. It is a ratio.
The Dimensionality Argument
A robot operating in a real environment maintains an operational state with the following dimensions, as defined in patent section [0012a]:
D = |K| + |K|^2 + |K| + 1
Where |K| is the vocabulary cardinality — the number of distinct environmental contexts the robot has encountered. The terms represent:
| Term | Meaning |
|---|---|
| \|K\| | familiarity accumulator values (one per context) |
| \|K\|^2 | pairwise context transition weights (the state matrix) |
| \|K\| | phase classification parameters |
| 1 | global operational state scalar |
For our three simulated environments (seed 20260426):
| Environment | \|K\| | Full State D | Fingerprint d | Ratio D:d |
|---|---|---|---|---|
| Forest | 148 | 22,201 | ≤ 20 | 1,110:1 |
| Mars | 76 | 5,929 | ≤ 20 | 296:1 |
| Bedroom | 295 | 87,616 | ≤ 20 | 4,381:1 |
The forest robot's full operational state has 22,201 dimensions. The fingerprint compresses this to at most 20 scalar values. That is a 1,110-to-one compression ratio.
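The arithmetic is easy to check. A minimal sketch in Rust that reproduces the table above from the formula in [0012a] (the function name here is illustrative, not part of the ccf-core API):

```rust
/// Full operational state dimension from patent section [0012a]:
/// D = |K| + |K|^2 + |K| + 1
fn full_state_dimension(vocab_cardinality: u64) -> u64 {
    let k = vocab_cardinality;
    k + k * k + k + 1
}

fn main() {
    let fingerprint_dim = 20.0; // the fingerprint is at most 20 scalars
    for (name, k) in [("Forest", 148u64), ("Mars", 76), ("Bedroom", 295)] {
        let d = full_state_dimension(k);
        // Ratio of full state dimensions to fingerprint dimensions
        let ratio = (d as f64 / fingerprint_dim).round();
        println!("{name}: |K| = {k}, D = {d}, ratio ~ {ratio}:1");
    }
}
```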
This compression is not lossless. It is not designed to be. The fingerprint is a many-to-one projection. It maps from a 22,201-dimensional space to a 20-dimensional space. The mapping is irreversible by construction. You cannot reconstruct 22,201 values from 20 values. Not with unlimited compute. Not with quantum computers. Not with the full algorithm source code. The information does not exist in the output.
This is the privacy guarantee. Not "we won't look." Instead: "the data to look at does not exist outside the device."
What the Fingerprint Preserves and What It Destroys
The fingerprint preserves statistical structure: how many contexts, how interconnected, how familiar, what temporal pattern. These are the properties a fleet operator needs.
The fingerprint destroys: which specific contexts, what sensor readings defined them, when specific transitions occurred, who or what was present during any specific interaction. These are the properties that constitute personal data.
This is not a design trade-off. It is a mathematical consequence of the projection. The eight fingerprint components from [0012b] are summary statistics:
Vocabulary cardinality: |K| (a count)
Phase distribution: p_I, p_II, p_III, p_IV (four proportions summing to 1)
State matrix density: rho = nnz / |K|^2, where nnz is the number of non-zero transition weights (a ratio)
Context group count: g (a count)
Mean familiarity: mu_f = (1/|K|) * sum(f_k) (an average)
Familiarity variance: sigma^2_f = (1/|K|) * sum((f_k - mu_f)^2) (a variance)
Temporal rhythm: (m, a, e, n) (four proportions summing to 1: morning, afternoon, evening, night)
Presence pattern: (a, s, r, ab) (four proportions summing to 1)
Every one of these is an aggregate. A count, a proportion, a mean, a variance. Aggregates destroy individual records. The mean familiarity of 0.31 does not tell you the familiarity of any specific context. The phase distribution of 61.4% Phase I does not tell you which contexts are in Phase I. The temporal rhythm does not tell you what happened at 3am on Tuesday.
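As a sketch of what that projection looks like in code, here is a hypothetical layout in Rust. The type and field names are assumptions for illustration, not the ccf-core interface; the point is that every field is an aggregate, and the per-context values are consumed by the function that summarises them without ever being emitted.

```rust
/// Illustrative fingerprint layout (hypothetical names, not the ccf-core API).
/// Every field is a count, a proportion, a mean, or a variance.
#[allow(dead_code)]
struct Fingerprint {
    vocab_cardinality: usize,     // |K|
    phase_distribution: [f64; 4], // p_I..p_IV, summing to 1
    state_matrix_density: f64,    // rho = nnz / |K|^2
    context_group_count: usize,   // g
    mean_familiarity: f64,        // mu_f
    familiarity_variance: f64,    // sigma^2_f
    temporal_rhythm: [f64; 4],    // (m, a, e, n)
    presence_pattern: [f64; 4],   // (a, s, r, ab)
}

/// Mean and variance of the per-context familiarity accumulators.
/// The individual f_k values are consumed here and never emitted.
fn familiarity_stats(familiarities: &[f64]) -> (f64, f64) {
    let n = familiarities.len() as f64;
    let mean = familiarities.iter().sum::<f64>() / n;
    let variance = familiarities.iter().map(|f| (f - mean).powi(2)).sum::<f64>() / n;
    (mean, variance)
}

fn main() {
    // Toy per-context familiarities; a real deployment has |K| of them.
    let f = [0.10, 0.45, 0.30, 0.39];
    let (mu_f, var_f) = familiarity_stats(&f);
    println!("mu_f = {mu_f:.3}, sigma^2_f = {var_f:.3}");
}
```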
The Adversary Models
Patent section [0012c] defines two adversary tiers. This is how we reason about what an attacker can learn from a captured fingerprint.
Tier 1: Knows Deployment Context, Not Algorithm
The Tier 1 adversary knows that the robot is deployed in an eldercare facility. They know the facility's layout, staffing schedule, and resident population. They capture the fingerprint: |K|=148, p_I=0.614, rho=0.24, g=20, mu_f=0.31, sigma^2_f=0.04, rhythm=(0.22, 0.31, 0.28, 0.19), presence=(0.15, 0.45, 0.12, 0.28).
What can they learn? The robot has encountered 148 distinct environmental contexts. It is mostly in Phase I (low familiarity). Its environment has moderate interconnection and 20 structural clusters. These are properties of the deployment, not of any individual.
What can they NOT learn? Which rooms. Which residents. Which times residents were present. Whether Mrs. O'Brien was in the common room at 2pm or in her bedroom. The Tier 1 adversary cannot extract individual-level information because the fingerprint does not contain individual-level information.
Tier 2: Knows Algorithm, Sensor Types, Quantisation Scheme
The Tier 2 adversary is maximally informed. They have the full CCF source code. They know the sensor types (camera, microphone, infrared, touch, accelerometer). They know the quantisation scheme that maps raw sensor readings to context keys. They capture the same fingerprint.
Can they reconstruct the operational state?
No. The fingerprint provides at most 14 independent scalar constraints on the D-dimensional operational state [0012b]: the three four-component distributions (phase, rhythm, presence) each sum to 1 and so contribute three independent values apiece, and the remaining five components (|K|, rho, g, mu_f, sigma^2_f) contribute one each. For the forest environment:
Independent constraints from fingerprint: 14
Dimensions of full operational state: 22,201
Unconstrained degrees of freedom: 22,187
Twenty-two thousand one hundred and eighty-seven degrees of freedom are unconstrained by the fingerprint. The Tier 2 adversary, with full knowledge of the algorithm, can narrow the space of possible operational histories to a 22,187-dimensional subspace. Every point in that subspace is equally consistent with the observed fingerprint.
Can they determine who was present? No. The presence pattern (0.15, 0.45, 0.12, 0.28) tells them that 45% of operational time involved static nearby presence. It does not tell them who, when, or where.
Can they recover temporal sequences? No. The temporal rhythm (0.22, 0.31, 0.28, 0.19) tells them that 31% of activity was in the afternoon. It does not tell them what happened during any specific afternoon.
Can they distinguish two operational histories that produce the same fingerprint? No. The many-to-one projection means that an astronomically large number of distinct histories map to the same fingerprint. The adversary cannot select among them.
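To make the many-to-one point concrete, here is a toy illustration in plain Rust (not the ccf-core API): two different assignments of familiarity values to contexts, which are distinct operational histories, yet produce identical mean and variance and are therefore indistinguishable from those fingerprint components alone.

```rust
/// Mean and population variance of a set of familiarity values.
fn mean_and_variance(values: &[f64]) -> (f64, f64) {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    (mean, var)
}

fn main() {
    // The same familiarity values assigned to different contexts: two distinct
    // histories (which room is the familiar one?) that the aggregates cannot separate.
    let history_a = [0.25, 0.50, 0.75, 1.00];
    let history_b = [1.00, 0.75, 0.50, 0.25];
    assert_eq!(mean_and_variance(&history_a), mean_and_variance(&history_b));
    println!("Identical (mu_f, sigma^2_f): {:?}", mean_and_variance(&history_a));
}
```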
Why This Is Better Than Cryptographic Privacy
Cryptographic and statistical privacy techniques (homomorphic encryption, secure multi-party computation, differential privacy) protect data while it is being processed or queried. The data still exists somewhere in an uncompressed form — on the device, in an enclave, in a trusted execution environment. The protection depends on the integrity of the cryptographic implementation and the trustworthiness of the parties involved.
The fingerprint approach is categorically different. The raw operational state never leaves the device. The projection happens on-device. The fingerprint that is transmitted is not an encrypted version of the state — it is a lossy summary. There is no decryption key. There is no trusted party. There is no enclave to compromise.
This distinction matters for compliance.
HIPAA (US healthcare). The Privacy Rule applies to Protected Health Information (PHI). A robot's operational fingerprint — vocabulary cardinality, phase distribution, state matrix density — does not constitute PHI. It contains no individually identifiable health information. It cannot be used to identify a patient. A covered entity deploying CCF-equipped robots can transmit fingerprints to a fleet analytics service without triggering HIPAA obligations on the analytics service, because the fingerprint is not PHI.
GDPR (EU). The Data Minimisation principle (Article 5(1)(c)) requires that personal data be "adequate, relevant and limited to what is necessary." The fingerprint is the mathematical minimum — fewer than 20 numbers, none of which constitute personal data. The Privacy by Design principle (Article 25) requires data protection measures to be built into the system architecture. The 1,110:1 compression ratio is not a retrofit. It is the architecture.
Emerging AI governance. The EU AI Act, NIST AI RMF, and ISO/IEC 42001 all emphasise transparency and proportionality in data collection. A system that transmits 20 numbers per day from a robot operating in a patient bedroom is proportionate. A system that streams video from the same bedroom is not.
The Bedroom Problem
The hardest case is the domestic bedroom. This is where privacy expectations are highest and where the surveillance risk from raw telemetry is most acute.
From the simulation data:
Bedroom |K| = 295
Full state D = 87,616
Fingerprint d = 20
Privacy ratio: 4,381:1
A robot in a bedroom encounters 295 distinct environmental contexts — far more than a forest or Mars habitat. Domestic environments are complex: lighting changes throughout the day, multiple people come and go, furniture gets moved, the TV plays different programmes, the heating cycles on and off. Each of these generates distinct context keys.
The 4,381:1 ratio means the fingerprint captures 0.023% of the operational state's dimensions. The remaining 99.977% is discarded at the point of projection. An adversary with the fingerprint, the algorithm, the sensor types, and the quantisation scheme can constrain the bedroom robot's state only to an 87,602-dimensional subspace. That is not a useful amount of constraint.
The bedroom robot's fingerprint tells the fleet operator: this robot has a large vocabulary (complex environment), very low interconnection (sparse context transitions), mostly Phase I (still building familiarity), 9 structural clusters, and a strong presence pattern with 28% absence. The fleet operator learns: this is a domestic deployment that is developing normally for its age. They learn nothing about the resident.
What This Means for Fleet Procurement
If you are writing an RFP for autonomous agent fleet management, here are the questions to ask vendors:
- What data leaves the device? If the answer includes raw sensor streams, video, audio, or individually attributable telemetry, the system is surveillance infrastructure regardless of what privacy policy wraps it.
- What is the compression ratio? The ratio between the device's operational state and the transmitted analytics data quantifies the privacy guarantee. A 1,110:1 ratio is structural privacy. A 1:1 ratio is raw telemetry with a different name.
- Can a Tier 2 adversary reconstruct sensor readings? If yes, the privacy depends on keeping the algorithm secret. Security through obscurity.
- Does compliance depend on policy or architecture? A HIPAA BAA is a policy instrument. A 22,187-dimensional unconstrained subspace is a mathematical fact.
The CCF fingerprint is available through ccf-core on crates.io for evaluation. The architecture is described at /how-it-works. The patent claim structure is at /patent.
For the mathematical foundations underpinning the familiarity accumulators that generate the fingerprint, see Sinkhorn-Knopp for Trust and The Forced Convergence Theorem.
— Colm Byrne, Founder — Flout Labs, Galway, Ireland
Patent pending. US Provisional 64/039,623.
FAQ
If the fingerprint is just 20 numbers, can't an attacker brute-force all possible states that produce those numbers?
No. The operational state space has 22,201 dimensions (for a 148-context agent). Each dimension is a floating-point value. The number of possible states consistent with a given fingerprint is not just large — it is a continuous 22,187-dimensional subspace. You cannot enumerate it. This is not a combinatorial search problem with a finite number of solutions. It is a continuous inverse problem with infinitely many solutions, none distinguishable from the others using only the fingerprint.
Does differential privacy provide a stronger guarantee?
Differential privacy and the fingerprint approach solve different problems. Differential privacy adds calibrated noise to query results so that the presence or absence of any individual record cannot be detected. It requires the raw data to exist somewhere for the query to operate on. The fingerprint approach eliminates the raw data at the point of projection — there is no dataset to query, so differential privacy is not applicable. For fleet analytics, the fingerprint provides a stronger practical guarantee: the raw data literally does not exist outside the device.
What if a regulator demands the raw operational state?
The raw operational state exists only on the device. It is not transmitted, not stored externally, and not backed up to any cloud service. A regulator can request the fingerprint data (which is stored in the fleet analytics service), but the raw state is only available by physically accessing the device. This is the same posture as any on-device computation — you can demand the output, but you cannot demand data that was never transmitted.
Does the privacy ratio change over time as the robot encounters more contexts?
Yes. As vocabulary cardinality grows, the full state dimension D grows quadratically (because of the |K|^2 state matrix term), while the fingerprint dimension remains at most 20. The privacy ratio improves over time. A robot that starts with 50 contexts has a ratio of roughly 130:1. As it matures to 148 contexts, the ratio reaches 1,110:1. At 295 contexts, it reaches 4,381:1. The longer the robot operates, the stronger the privacy guarantee becomes.
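The closed form makes the growth explicit: from [0012a], D = |K| + |K|^2 + |K| + 1 = (|K| + 1)^2, so with a fixed fingerprint size of at most 20 the ratio grows as (|K| + 1)^2 / 20, quadratically in the vocabulary.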
Can the fleet analytics service be subpoenaed for individual patient data?
The fleet analytics service stores only fingerprints — 20 numbers per robot per day. A subpoena for patient data directed at the analytics service would yield no patient data, because the service does not possess any. The 1,110:1 ratio quantifies why: the service holds a handful of aggregates, not the underlying records. A court-appointed expert examining the fingerprint data and the algorithm can independently verify that reconstruction of individual-level information is impossible. This is a defensible position in litigation, not a policy promise.