X-40™ — runtime governance for LLMs and ML systems, backed by structural physics evidence.
X-40™ is an infrastructure layer that sits above AI outputs and enforces decision governance: it decides when an output is safe to auto-accept and when it must require verification, with reasons, indices, and auditable traces.
X-40™ uses two evidence channels: behavioral telemetry (confidence margins, uncertainty dynamics, drift) and an independent structural evidence channel powered by QEIv15™ anchors (Φ, κ, ΔS families). This duality strengthens governance under real-world drift.
X-40 returns ACCEPT or REQUIRE_VERIFICATION, with reason codes and indices. This enables routing, blocking, redaction, or escalation in production workflows.
Evidence channels:
- Behavioral telemetry (logprobs, margins, uncertainty dynamics, drift)
- Structural evidence (QEIv15™ anchors via ResearchCore)
Two evidence channels reduce single-signal failure modes and improve auditability.
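A client consuming these decisions might handle them as follows. This is a minimal sketch: the class name, field names, and reason codes are illustrative assumptions, not the actual X-40 API.

```python
from dataclasses import dataclass, field

@dataclass
class GovernanceDecision:
    # Hypothetical response shape mirroring the described policy output.
    policy: str                                        # "ACCEPT" or "REQUIRE_VERIFICATION"
    reason_codes: list = field(default_factory=list)   # e.g. ["LOW_MARGIN"] (illustrative codes)
    indices: dict = field(default_factory=dict)        # per-channel evidence scores

def route(decision: GovernanceDecision, output: str) -> str:
    """Route a model output based on the governance decision."""
    if decision.policy == "ACCEPT":
        return output  # safe to auto-accept and ship
    # Otherwise block/escalate, surfacing reasons for the audit trail.
    return "ESCALATED: " + ",".join(decision.reason_codes)
```

A REQUIRE_VERIFICATION decision would typically trigger routing to a reviewer queue rather than the simple string return shown here.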
X-40 can operate in privacy-max mode where user content is not stored and only audit fields are retained (indices, reasons, hashes). This supports regulated environments.
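The retained audit fields can be sketched as a record that carries only a content hash plus indices and reasons, never the content itself. The function and field names below are hypothetical, chosen to illustrate the privacy-max contract described above.

```python
import hashlib
import time

def audit_record(content: str, indices: dict, reasons: list) -> dict:
    """Build a privacy-max audit entry: the raw content is hashed, not stored."""
    return {
        "content_sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "indices": indices,     # e.g. {"margin": 0.42} (illustrative)
        "reasons": reasons,     # e.g. ["LOW_MARGIN"] (illustrative)
        "ts": time.time(),      # audit timestamp
    }
```

Because only the digest is retained, the record supports later integrity checks (re-hash the original if it still exists elsewhere) without holding user content.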
Model support and ML integration
X-40 supports Trace Mode when token-level telemetry is available (e.g., logprobs/margins). When telemetry is not available, X-40 supports Sidecar Mode: the client calls the model and sends only the telemetry/outputs required for governance.
- OpenAI default: GPT-4.1 (validated benchmark configuration)
- Upgrade path: newer OpenAI models in telemetry-compatible mode when logprobs are required
- Other vendors / on-prem models: Sidecar Mode using equivalent confidence telemetry
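In Sidecar Mode the client reduces its model's token-level telemetry to the minimal signals needed for governance before sending anything. A sketch of that reduction, assuming per-token log-probabilities for the top candidates (the function name and payload fields are illustrative):

```python
import math

def telemetry_from_logprobs(top_logprobs: dict) -> dict:
    """Reduce candidate-token logprobs to minimal sidecar signals:
    top-token probability and the margin over the runner-up."""
    ranked = sorted(top_logprobs.values(), reverse=True)
    p1 = math.exp(ranked[0])
    p2 = math.exp(ranked[1]) if len(ranked) > 1 else 0.0
    return {"top_prob": p1, "margin": p1 - p2}

# Example: two candidate tokens at one position.
signals = telemetry_from_logprobs({"yes": -0.1, "no": -2.5})
```

Only `signals` (and whatever outputs policy requires) would be sent to the governance endpoint; the prompt and full completion stay with the client.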
X-40 is not limited to chat. For ML inference, X-40 can govern outputs using confidence telemetry and drift over time, based on signals such as:
- prediction confidence / probability
- margin between top classes
- batch drift across time windows
- stability envelopes and escalation rules
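The signals above can be sketched in a few lines. This is a simplified stand-in, not the product's actual statistics: the drift measure here is just a shift in mean confidence between windows, and the thresholds are hypothetical defaults.

```python
def top_margin(probs: list[float]) -> float:
    """Margin between the two highest class probabilities."""
    a, b = sorted(probs, reverse=True)[:2]
    return a - b

def window_drift(prev_window: list[float], cur_window: list[float]) -> float:
    """Toy batch-drift signal: shift in mean confidence across time windows."""
    mean = lambda xs: sum(xs) / len(xs)
    return abs(mean(cur_window) - mean(prev_window))

def escalate(margin: float, drift: float,
             min_margin: float = 0.2, max_drift: float = 0.1) -> bool:
    """Escalation rule: require verification when the margin is thin
    or the batch has drifted outside its stability envelope."""
    return margin < min_margin or drift > max_drift
```

In practice the envelope bounds would be calibrated per prompt class and model, as described in the onboarding step below.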
Typical use cases: risk scoring, anomaly detection, compliance triage, claims workflows, production monitoring.
Deployment modes
- API gateway: call X-40 as a governance gateway; it returns policy + indices + reason codes for centralized control.
- Privacy-max sidecar: call your model directly and send only the minimal telemetry/outputs to X-40 for governance and auditing, minimizing exposure.
- On-prem container: enterprise deployment inside the customer environment with customer-controlled keys, policies, and audit controls.
Benchmark evidence
X-40 is validated using a published benchmark protocol and frozen reproducibility capsule. The key operational metric is Wrong+Accepted: incorrect outputs that were accepted and would have been shipped. The protocol covers:
- Deterministic + facts + unknowns + attack + math
- Comparator methods (judge/self-consistency) included
- Multi-seed + large math set + messy prompts
- Worst-case shipped incidents driven to zero under the published protocol
Benchmarks validate X-40 under the published protocol. Production performance depends on prompt classes, provider behavior, and client policy configuration.
Teams onboard by calibrating baselines to prompt classes and policies, then deploying X-40 as an API gateway, privacy-max sidecar, or on-prem container.