Problem Statement
Nondeterminism in AI systems refers to the condition where identical inputs do not produce identical outputs across executions. This property has consequences for repeatability, traceability, and audit compliance that extend beyond model performance metrics.
When an AI system produces different outputs for the same input under the same configuration, several operational requirements become difficult or impossible to satisfy:
- Repeatability: The ability to reproduce a specific system decision for investigation, validation, or dispute resolution.
- Traceability: The ability to establish a causal chain from input data through system processing to output decision.
- Audit reconstruction: The ability to demonstrate, after the fact, that a system behaved as documented and approved.
These are not research concerns. They are operational, legal, and regulatory requirements that organizations must satisfy when deploying AI systems in regulated contexts.
The sources of nondeterminism are well understood. Floating-point arithmetic on parallel hardware can produce different results depending on execution order. Stochastic decoding in language models introduces intentional randomness. Caching and batching can alter execution paths. Hardware variations across deployment environments introduce additional variance.
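As a minimal illustration of the floating-point source, the following Python sketch sums the same values in two different orders and obtains slightly different totals; parallel reductions on GPUs exhibit the same effect whenever the reduction order is not fixed.

```python
# Minimal sketch: floating-point addition is not associative, so changing the
# order of a reduction can change the result. Hardware that does not fix the
# reduction order can therefore produce run-to-run variance.
import random

random.seed(0)
values = [random.uniform(-1e6, 1e6) for _ in range(100_000)]

sum_forward = sum(values)              # one summation order
sum_reverse = sum(reversed(values))    # the same values, summed in reverse

print(sum_forward == sum_reverse)      # typically False for large, mixed-sign inputs
print(abs(sum_forward - sum_reverse))  # a small but nonzero discrepancy
```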
The question is not whether nondeterminism exists. The question is whether organizations have controls sufficient to satisfy their regulatory and governance obligations when nondeterminism is present.
What Regulators and Standards Bodies Explicitly Require
Multiple regulatory frameworks and standards bodies have published requirements that bear directly on system traceability, logging, and behavior reconstruction. These requirements do not use the term "determinism" uniformly, but they establish expectations that nondeterministic systems may struggle to meet.
European Union AI Act
The EU AI Act, formally Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024, establishes binding requirements for high-risk AI systems.
Article 12 mandates automatic logging capabilities:
"High-risk AI systems shall technically allow for the automatic recording of events (logs) over the lifetime of the system."
The regulation specifies that logging capabilities must enable recording of events relevant for:
- Identifying situations that may result in the system presenting a risk or undergoing substantial modification
- Facilitating post-market monitoring
- Monitoring the operation of high-risk AI systems by deployers
Article 12 further requires that logs enable the identification of situations in which the system may need to be modified and support the reconstruction of system behavior for regulatory oversight.
According to published guidance from the European Commission's AI Act Service Desk, these logs must be retained for a minimum of six months and must be sufficient to reconstruct how AI-driven decisions were made, including who triggered them, with what data, at what time, and under whose authority.
Source: EU AI Act Article 12
Source: Regulation (EU) 2024/1689 - EUR-Lex
NIST AI Risk Management Framework
The National Institute of Standards and Technology published the AI Risk Management Framework (AI RMF 1.0) in January 2023, with supplementary guidance for generative AI (NIST-AI-600-1) released in July 2024.
The framework emphasizes traceability as a core characteristic:
"Provides traceable data to manage trade-offs among trustworthiness characteristics and inform risk management actions."
The GOVERN function of the framework requires organizations to document how AI system behaviors are monitored, how risks are identified, and how decisions are made regarding system deployment and modification. The MEASURE function requires that measurement methodologies follow scientific, legal, and ethical norms with emphasis on transparency.
The framework does not mandate specific technical implementations, but it establishes expectations that system behavior should be explainable, documented, and subject to ongoing monitoring.
Source: NIST AI RMF
Source: NIST AI 100-1 (PDF)
ISO/IEC 42001:2023
ISO/IEC 42001:2023 specifies requirements for an Artificial Intelligence Management System (AIMS). Organizations seeking certification must demonstrate controls across governance, risk management, and lifecycle monitoring.
The standard identifies traceability as a key factor, encompassing:
- Data provenance
- Model traceability
- Explainability
- Audit logs
Organizations must maintain model cards, audit logs, decision records, and compliance reports to support accountability. The standard requires that documentation enable traceability, creating a clear record of how AI projects are governed over time.
Certification requires a comprehensive audit by an accredited conformity assessment body, with annual surveillance audits to ensure ongoing compliance. AI impact assessments and threat modeling should be conducted at least annually on existing systems, and prior to the deployment of any new AI function.
Source: ISO/IEC 42001:2023
FDA Guidance on AI-Enabled Medical Devices
On December 3, 2024, the U.S. Food and Drug Administration published final guidance titled "Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions."
The guidance addresses how manufacturers can document planned modifications to AI-enabled devices while maintaining regulatory compliance. The FDA requires that:
- Study results include adequate subgroup analyses for relevant demographics
- Devices demonstrate repeatability and reproducibility in performance validation
- Independent datasets be used for performance validation
The guidance explicitly addresses reproducibility as a requirement for AI-enabled medical devices, noting that deployed models must be monitored for performance and that re-training risks must be managed.
Source: FDA PCCP Guidance
Source: FDA AI in Medical Devices Overview
U.S. Financial Services: SR 11-7 and SEC Examination Priorities
The Federal Reserve's supervisory guidance SR 11-7, originally published in 2011 and subsequently adopted by the OCC and FDIC, establishes model risk management expectations for banks. The guidance requires:
- Independent validation by objective parties
- Ongoing monitoring comparing outputs to actual outcomes
- Documentation detailed enough that unfamiliar parties can understand the model's operation
While SR 11-7 predates modern AI systems, the Office of the Comptroller of the Currency has clarified that AI tools are covered under this framework.
The SEC's Division of Examinations announced in October 2024 that artificial intelligence is a priority focus for 2025 examinations. The SEC will assess whether firms have implemented adequate policies and procedures to monitor and supervise their use of AI technologies, including for fraud prevention, back-office operations, anti-money laundering, and trading functions.
The SEC expects algorithm governance to mirror the rigor applied to traditional investment processes, with additional controls given the speed and scale of automated systems. Algorithms must be tested across multiple market regimes, and testing should be documented with sufficient detail for examination.
Source: SEC 2025 Examination Priorities
Source: FDIC FIL-17-2022 on Model Risk Management
Bank of England PRA SS1/23
The Prudential Regulation Authority published Supervisory Statement 1/23 in May 2023, establishing model risk management principles for banks. The statement became effective May 17, 2024.
The five principles cover:
- Model identification and model risk classification
- Governance
- Model development, implementation, and use
- Independent model validation
- Model risk mitigants
While the principles are technology-agnostic, the PRA explicitly identifies AI as requiring attention under them. According to the Bank of England's 2024 AI survey, 84% of firms reported having an accountable person for their AI framework.
Source: Bank of England SS1/23
Source: Bank of England 2024 AI Survey
What Enterprises Report in Practice
Enterprise survey data provides insight into the gap between regulatory expectations and operational reality.
Gartner Findings
According to a Gartner survey conducted in 2024, while 80% of large organizations claim to have AI governance initiatives, fewer than half can demonstrate measurable maturity. Most lack a structured way to connect policies with practice.
A subsequent Gartner survey conducted in May-June 2025 among 360 respondents found that organizations performing regular audits and assessments of AI system performance and compliance are over three times more likely to achieve high value from generative AI than organizations that do not.
Gartner's research found that in high-maturity organizations, almost 60% of leaders said they have centralized their AI strategy, governance, data, and infrastructure capabilities.
Source: Gartner AI Governance Survey 2025
McKinsey State of AI
McKinsey's State of AI report indicates that 65% of organizations now use generative AI regularly—double the previous year's rate—but 74% still struggle to scale AI deployments from pilot to production.
McKinsey's research has found that while 75% of executives view AI as strategically critical, fewer than 25% have moved from pilots to production.
Source: McKinsey State of AI
Audit Capability Gaps
Enterprise survey data suggests significant gaps in audit capability. According to published research, 51% of organizations acknowledge they lack sufficient AI audit expertise.
The AI Incident Database recorded 233 cases of AI risks and misuse in 2024, representing more than a 50% increase from 2023.
Source: AI Incident Database
Data Limitations
The available enterprise survey data has limitations. Most surveys measure governance initiative presence rather than governance effectiveness. Few surveys specifically measure reproducibility or determinism as operational capabilities. The correlation between governance maturity and actual compliance outcomes is not well documented in public data.
Organizations should treat enterprise survey statistics as directional indicators rather than precise benchmarks.
Industry-Specific Signals
Different industries have varying exposure to nondeterminism-related governance failures. The following summarizes documented regulatory attention and industry practice.
Financial Services
Financial services has the most mature regulatory framework for model governance. SR 11-7, the PRA's SS1/23, and SEC examination priorities collectively establish expectations for model validation, documentation, and ongoing monitoring.
The specific challenge for financial services is that many AI applications—credit scoring, fraud detection, trading algorithms—operate in contexts where individual decisions may be subject to dispute, audit, or regulatory examination. A credit decision that cannot be reproduced creates liability exposure that extends beyond model performance.
The Bank of England's 2024 AI survey indicates that UK financial firms are actively implementing governance frameworks, with 84% reporting designated accountability for AI. However, the survey does not measure whether these frameworks address nondeterminism specifically.
Healthcare and Medical Devices
The FDA's guidance on predetermined change control plans establishes reproducibility as an explicit requirement for AI-enabled medical devices. The requirement for repeatability in performance validation implies that devices must produce consistent outputs for validation to be meaningful.
Healthcare AI operates under additional constraints from HIPAA, clinical trial requirements, and professional liability standards. A diagnostic AI that produces different results on repeated evaluation of the same patient data creates clinical and legal complications that extend beyond regulatory compliance.
Defense and National Security
The Department of Defense has acknowledged that test, evaluation, verification, and validation (TEVV) of AI-enabled systems requires new approaches. The National Security Commission on AI recommended that military services establish TEVV frameworks that integrate testing as a continuous part of requirements specification, development, deployment, training, and maintenance.
However, formalized DoD policy on AI testing is still under development. According to published DoD documentation, "Formally approved DTE&A policy is not yet available, with interim guidebook and emerging guidance available in the meantime."
The August 2024 Defense Science Board report concluded that a strategic shift in DoD's approach to test and evaluation is needed, moving beyond acquisition-based frameworks toward continuous development and testing.
Source: DoD DTE&A AI-Enabled Systems
Source: Defense Science Board Report August 2024 (PDF)
Critical Infrastructure
Critical infrastructure sectors—energy, transportation, telecommunications—are subject to sector-specific regulations that may not explicitly address AI but establish requirements for system reliability, incident response, and operational documentation.
Publicly available data on AI governance practices in critical infrastructure is limited. Organizations in these sectors should evaluate whether existing operational reliability frameworks adequately address AI-specific nondeterminism concerns.
Why This Is a Governance Failure
The gap between regulatory expectations and operational capability constitutes a governance failure. This failure manifests in several ways.
Audit Breakdown
When an auditor or regulator requests evidence of how a system produced a specific decision, the organization must be able to reconstruct the decision pathway. If the system is nondeterministic, reconstruction may produce a different result than the original decision.
This creates a compliance evidence gap. The organization cannot prove what actually happened; it can only demonstrate what the system does now, which may differ from what it did then.
The EU AI Act's requirement that logs enable reconstruction of system behavior implies that organizations must be able to replay system decisions. Nondeterministic systems may not satisfy this requirement even if they maintain complete input logs.
Incident Response Limitations
When an AI system produces an incorrect, harmful, or disputed output, incident response requires understanding what happened and why. This understanding informs remediation, liability assessment, and preventive measures.
If the system is nondeterministic, engineers cannot reliably reproduce the failure condition. Root cause analysis becomes speculative, and the organization cannot verify that a proposed fix addresses the actual failure mode.
According to engineering teams working with LLM systems, "When incidents happen, deterministic record/replay is a superpower. Given a request trace, you can reproduce the failure locally and verify the fix before rolling forward."
Source: Propel Engineering Blog
Compliance Evidence Gaps
Regulatory frameworks increasingly require that organizations demonstrate, not merely assert, that their AI systems operate within approved parameters. This demonstration requires evidence.
Evidence of system behavior requires either:
- Complete logs of all system state sufficient to reconstruct any decision, or
- Deterministic execution that guarantees the same inputs produce the same outputs
The first option may be technically infeasible for large-scale AI systems due to state complexity. The second requires architectural choices that many production systems have not made.
Organizations that cannot provide either form of evidence face compliance gaps that may not be remediable through policy changes alone.
Missing Controls
The governance failure is not primarily a failure of model design. It is a failure of operational controls.
Organizations may have:
- Documented AI policies that do not address nondeterminism
- Governance frameworks that assume reproducibility without verifying it
- Audit procedures that cannot detect nondeterminism-related compliance gaps
- Incident response processes that assume failures are reproducible
These control gaps persist because nondeterminism is often treated as a technical implementation detail rather than a governance-relevant system property.
Implications for System Design
Organizations seeking to address nondeterminism as a governance concern should consider the following design implications.
Deterministic Execution
Deterministic execution means that identical inputs produce identical outputs across all executions, environments, and time periods. This is a strong property that requires explicit architectural choices.
For systems built on GPU-accelerated inference, deterministic execution may require:
- Fixed-seed random number generation
- Deterministic algorithm selection in numerical libraries
- Single-threaded or order-preserving parallel execution
- Version-locked model weights, tokenizers, and inference code
These choices may impose performance costs. Organizations must evaluate whether the governance benefits justify the operational tradeoffs.
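As a concrete sketch of what these choices can look like, the example below assumes a PyTorch-based inference stack; the framework choice and the helper name are illustrative, and other frameworks expose analogous controls.

```python
# Sketch (assumes PyTorch): enabling deterministic execution for inference.
# These settings trade some throughput for run-to-run reproducibility.
import os
import random

import numpy as np
import torch

def enable_determinism(seed: int = 1234) -> None:
    # Fixed-seed random number generation across all RNG sources.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    # Required by cuBLAS for deterministic GPU matrix multiplication;
    # must be set before the first CUDA call.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    # Deterministic algorithm selection; raises an error if an operation
    # has no deterministic implementation.
    torch.use_deterministic_algorithms(True)

    # Disable cuDNN autotuning, which can select different kernels per run.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
```

Even with settings like these, determinism typically holds only for a fixed hardware configuration and version-locked libraries, which is why version locking appears alongside them in the list above.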
Versioned Inputs and Outputs
Even without full deterministic execution, organizations can improve auditability by maintaining versioned records of inputs and outputs.
This requires:
- Immutable storage of all inputs at time of processing
- Cryptographic hashing or content-addressable storage for integrity verification
- Timestamps with sufficient precision to establish ordering
- Association between input records and output records
Versioned records do not guarantee reproducibility, but they establish what the system received and produced, which supports some audit requirements.
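A minimal sketch of such a record is shown below; the record layout and function names are illustrative rather than a standard schema.

```python
# Sketch: a content-addressed record of one inference call (illustrative schema).
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class InferenceRecord:
    input_hash: str     # SHA-256 of the exact input payload
    output_hash: str    # SHA-256 of the exact output payload
    model_version: str  # version-locked model identifier
    timestamp: str      # UTC timestamp, microsecond precision

def sha256_of(payload: dict) -> str:
    # Canonical JSON serialization so identical content always hashes identically.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def record_inference(inputs: dict, outputs: dict, model_version: str) -> InferenceRecord:
    return InferenceRecord(
        input_hash=sha256_of(inputs),
        output_hash=sha256_of(outputs),
        model_version=model_version,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```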
Replayability
Replayability means the ability to re-execute a historical decision with the same result. This requires either deterministic execution or complete capture of all stochastic state.
For systems with intentional randomness (such as language model decoding), replayability may require:
- Logging of random seeds or complete random state
- Capture of any external state that influenced the decision
- Version-locked execution environment
Some regulatory frameworks, including the EU AI Act's logging requirements, imply an expectation of replayability. Organizations should evaluate whether their current systems can satisfy this expectation.
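The sketch below illustrates one way to capture and replay that state; the generate function, model version string, and log layout are placeholders, not a specific vendor API.

```python
# Sketch: capturing the stochastic state needed to replay a sampled decision.
# generate() stands in for whatever seeded inference function is in use.
import random
from typing import Callable, Optional, Tuple

def run_with_replay_log(generate: Callable[[str, int], str], prompt: str,
                        seed: Optional[int] = None) -> Tuple[str, dict]:
    # Draw and record a seed if the caller did not supply one.
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)

    output = generate(prompt, seed)

    # Everything needed to re-execute the decision later, assuming a
    # version-locked model and execution environment.
    replay_entry = {
        "prompt": prompt,
        "seed": seed,
        "model_version": "model-v1.2.3",  # illustrative placeholder
        "output": output,
    }
    return output, replay_entry

def replay(generate: Callable[[str, int], str], entry: dict) -> bool:
    # Re-run with the recorded seed and compare against the logged output.
    return generate(entry["prompt"], entry["seed"]) == entry["output"]
```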
Verifiable Records
Where supported by system architecture, cryptographic verification of records can provide stronger audit guarantees.
This may include:
- Hash chains or Merkle trees linking records in tamper-evident sequences
- Signed timestamps from trusted time sources
- Third-party attestation of record integrity
These mechanisms add complexity and cost. Their appropriateness depends on the regulatory context and risk profile of the application.
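As an illustration of the first mechanism, the sketch below links audit records into a simple hash chain; signed timestamps and third-party attestation are omitted, and the record layout is illustrative.

```python
# Sketch: linking audit records into a tamper-evident hash chain.
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first link

def chain_records(records: list) -> list:
    # Each entry commits to its own content and to the previous entry's hash,
    # so altering any historical record breaks every later link.
    chained, prev_hash = [], GENESIS
    for record in records:
        payload = json.dumps(record, sort_keys=True) + prev_hash
        entry_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        chained.append({"record": record, "prev_hash": prev_hash, "hash": entry_hash})
        prev_hash = entry_hash
    return chained

def verify_chain(chained: list) -> bool:
    prev_hash = GENESIS
    for entry in chained:
        payload = json.dumps(entry["record"], sort_keys=True) + prev_hash
        if hashlib.sha256(payload.encode("utf-8")).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```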
What Is Still Unknown or Unmeasured
Several questions relevant to nondeterminism governance remain unanswered in public data.
Prevalence of Nondeterminism-Related Compliance Failures
No public data source systematically tracks compliance failures attributable to nondeterminism. Regulatory enforcement actions rarely identify nondeterminism as a root cause, even when it may have contributed to the underlying issue.
It is not currently possible to estimate what fraction of AI governance failures involve nondeterminism as a contributing factor.
Effectiveness of Mitigation Strategies
While engineering best practices for deterministic AI execution exist, there is limited public data on their effectiveness at scale. Questions remain:
- What is the actual performance cost of deterministic execution in production systems?
- Do deterministic systems demonstrate better compliance outcomes in regulatory examinations?
- Are current logging practices sufficient to support the reconstruction requirements of frameworks like the EU AI Act?
These questions lack definitive answers in available literature.
Regulatory Enforcement Trajectory
It is unclear how regulators will interpret requirements for behavior reconstruction when applied to inherently nondeterministic systems. The EU AI Act's Article 12 requirements are new, and enforcement practice has not yet established precedent.
Organizations should monitor regulatory guidance and enforcement actions as this area matures.
Sector-Specific Exposure
Different industries have different exposure to nondeterminism governance risks, but sector-specific data is limited. Financial services and healthcare have more developed frameworks, but actual compliance status is not publicly reported.
Organizations in less-regulated sectors should not assume that the absence of explicit AI regulation means the absence of nondeterminism-related liability.
Nondeterminism in AI systems creates measurable governance challenges. Regulatory frameworks increasingly establish expectations for traceability, logging, and behavior reconstruction that nondeterministic systems may not satisfy. Enterprise survey data suggests that most organizations have not implemented controls sufficient to demonstrate compliance.
This is a governance failure—a gap between what regulations require and what systems deliver. Addressing this gap requires treating determinism not as a technical preference but as a compliance-relevant system property.
The controls exist. The frameworks exist. The question is whether organizations will implement them before regulatory examination or incident response reveals the gap.