
Public technical briefing from the MLNavigator Research Group.


When the Same Model Gives Different Answers, Governance Breaks

January 08, 2026·MLNavigator Team

The problem in one sentence

As NVIDIA's framework determinism guidelines detail, running the same prompt through the same model on the same GPU twice can produce different outputs, because the order of parallel execution varies between runs. In any environment where someone needs to reconstruct what happened and why, we believe that is a material problem.

[Figure: the same model and input producing different outputs across two GPU runs, due to thread scheduling and floating-point rounding]

Where the variation comes from

Modern GPU inference pipelines are not deterministic by default. The root cause is floating-point non-associativity (FPNA): the order in which floating-point additions execute changes the result, and parallel hardware does not guarantee execution order.
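FPNA can be demonstrated without any GPU at all. The sketch below uses plain Python doubles, with values chosen to make the rounding visible (they are illustrative, not taken from any real kernel):

```python
# Floating-point addition is not associative: regrouping the same
# operands changes which low-order bits are lost to rounding.
vals = [1e16, 1.0, -1e16, 1.0]

# Left-to-right order: the lone 1.0 is absorbed by 1e16 and lost.
left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]   # 1.0

# Reordered: the large terms cancel first, so both 1.0s survive.
reordered = ((vals[0] + vals[2]) + vals[1]) + vals[3]       # 2.0

print(left_to_right, reordered)
```

The mathematical sum is identical in both groupings; only the execution order differs, which is exactly the degree of freedom a GPU scheduler exercises.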

Three specific mechanisms contribute:

Parallel reduction ordering. When a GPU sums thousands of partial products across cores, the order depends on thread scheduling. Different scheduling produces different rounding sequences, which propagate through subsequent layers.

Atomic operations in convolution kernels. NVIDIA's cuDNN documentation explicitly lists several routines that "use atomic operations in a way that introduces truly random floating-point rounding errors." These include backward-pass convolution and max-pooling routines, some of which also run under certain inference configurations.

Tensor Core precision differences. Operations using Tensor Cores follow a different floating-point pipeline than scalar CUDA cores, introducing another source of variation. The hardware-level details matter when deployments span GPU architectures.

These are architectural properties of parallel floating-point hardware, baked in at the silicon level.
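The first mechanism, reduction ordering, can be simulated on a CPU by summing the same partial products in two different orders. This is a sketch, not GPU code; the shuffle stands in for nondeterministic thread scheduling:

```python
import random

random.seed(0)  # fixed seed so the demonstration itself is repeatable
partials = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

# One "schedule": accumulate in the original order.
order_a = sum(partials)

# Another "schedule": accumulate the identical values after a shuffle.
shuffled = list(partials)
random.shuffle(shuffled)
order_b = sum(shuffled)

# Mathematically equal sums, numerically different in the low bits.
print(order_a == order_b)       # typically False
print(abs(order_a - order_b))   # tiny, but nonzero
```

The deviation here is at the level of the last few bits, but in a deep network each layer's output feeds the next, so these rounding differences can compound and eventually flip a sampled token.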

Why we believe this matters for governance

Audit frameworks in regulated environments often assume you can answer a basic question: given the same inputs, would the system produce the same output? If the answer is "usually, but not always, and we can't predict when it will differ," we see several governance assumptions breaking down.

Incident reconstruction hits a gap first. If a deployment produced an output that led to a downstream decision, and the system can't reliably reproduce that output, we find that reviewers struggle to distinguish between "the model behaved correctly given its inputs" and "something changed."

Change validation breaks next. Teams that validate model deployments by comparing outputs across environments can't trust exact-match comparisons. A mismatch might be a real regression or it might be FPNA noise. Without a tolerance framework, every mismatch triggers manual review.

Certification evidence gets harder to define. Compliance programs that require documented system behavior face a definitional problem — the system's behavior includes a nondeterministic component that can't be fully documented in advance.
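One way to operationalize the tolerance framework mentioned above is to replace exact-match checks with a bounded comparison over model scores. A minimal sketch in plain Python; the function name and thresholds are illustrative, not from any standard:

```python
import math

def within_tolerance(ref, new, rel=1e-5, abs_tol=1e-6):
    """Compare two score/logit sequences under a documented tolerance
    instead of requiring bit-exact equality. Illustrative helper."""
    if len(ref) != len(new):
        return False
    return all(math.isclose(r, n, rel_tol=rel, abs_tol=abs_tol)
               for r, n in zip(ref, new))

# FPNA-scale noise passes; a real regression does not.
baseline  = [0.1234567, -2.5000000, 7.8900001]
fpna_run  = [0.1234568, -2.5000001, 7.8900000]  # last-digit drift
regressed = [0.1230000, -2.4000000, 7.8900001]  # genuine change

print(within_tolerance(baseline, fpna_run))   # True
print(within_tolerance(baseline, regressed))  # False
```

The thresholds themselves become the policy artifact: choosing and documenting them is the "operational policy decision" the tolerance approach requires.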

What "deterministic mode" actually costs

PyTorch offers a deterministic mode that forces operations to use reproducible implementations. The framework's own documentation is direct about the tradeoff: "deterministic operations are often slower than nondeterministic operations." Some operations have no deterministic implementation at all and will raise errors.
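Enabling that mode looks roughly like the following. These are real PyTorch APIs; the environment variable is required for deterministic cuBLAS on CUDA 10.2+, and operations with no deterministic implementation will raise a RuntimeError once the flag is set:

```python
import os

# Must be set before CUDA libraries initialize; required for
# deterministic cuBLAS behavior on CUDA 10.2 and later.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

# Ask PyTorch to use deterministic kernels everywhere, raising an
# error on operations that have no deterministic implementation.
torch.use_deterministic_algorithms(True)

# Disable cuDNN autotuning, which can select different kernels per run.
torch.backends.cudnn.benchmark = False
```

Note that this pins only the framework layer; the same flags on a different GPU architecture, driver, or library version do not guarantee matching bits across machines.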

NVIDIA's GTC 2019 guidance is more specific: bit-exact reproducibility requires locking six layers of the software and hardware stack simultaneously. Change any one and the guarantee disappears.

For organizations that also need to run inference on Mixture-of-Experts architectures, the problem compounds further — routing decisions add another layer of variation on top of the numerical one.

What to do about it

There's no clean fix that preserves both performance and perfect reproducibility. The tradeoffs are real, and they require explicit choices:

Lock the full software stack or define a tolerance. If bit-exact reproduction is required, the entire inference stack must be version-controlled and deployment-pinned — see the full requirements list. If exact reproduction is not required, define what deviation is acceptable and document the rationale as an operational policy decision.

Log more than the output. If the system cannot guarantee identical outputs, the operational record needs to capture enough context to explain expected variation. That means logging not just the prompt and response, but the specific software versions, hardware identifiers, and inference configuration active at the time.

Treat nondeterminism as a declared property. Rather than pretending the system is deterministic, document it as a system characteristic and design review processes around it. An auditor who encounters unexplained variation is in a worse position than one who encounters documented, bounded variation.
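A sketch of what a richer operational record might capture, using only the standard library. The schema and field names are hypothetical, and in a real deployment the hardware and version probes would come from the serving stack rather than `platform`:

```python
import json
import platform
import sys
from datetime import datetime, timezone

def build_inference_record(prompt, response, config):
    """Bundle an output with the context needed to explain
    run-to-run variation later. Hypothetical schema, not a standard."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
        "inference_config": config,          # e.g. precision, batch size, seed
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        # A real record would also capture: model weights hash, framework
        # and driver versions, GPU model/UUID, kernel and library build IDs.
    }

record = build_inference_record(
    prompt="What is our refund policy?",
    response="...",
    config={"precision": "fp16", "batch_size": 8, "temperature": 0.0},
)
print(json.dumps(record, indent=2))
```

With a record like this, a reviewer who sees two differing outputs can at least confirm that the stack was identical, which narrows the explanation to expected numerical variation.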

The hardware-level dimensions go deeper still, particularly when deployments span different GPU architectures.