Vibe Coding and Cyber Assurance: Why AI-Generated Software Needs a Different Kind of Security Assessment

3 Jul

The Barriers to Building Software Have Fallen. The Barriers to Trusting It Have Not.

Artificial intelligence is reshaping software development at a pace that has outrun most organisations' ability to assure what they are building. What began as autocomplete for developers has become something fundamentally different: agentic systems capable of designing architecture, generating code, writing tests and deploying applications with minimal human intervention. Over 55 per cent of new code on GitHub is now AI-assisted in some form. Non-technical founders are shipping products. Business analysts are building internal tools. The assumption that the people creating software understand what they have created no longer holds universally.

This is, in many respects, a positive development. The democratisation of software creation opens opportunities that were previously inaccessible. But the same forces that accelerate development also accelerate risk. Generating code is no longer the bottleneck. Understanding whether the resulting software is trustworthy has become the critical challenge — and the security profession has not yet caught up.

The AI Development Risk Spectrum

AI-assisted development is not binary. It exists on a continuum, and risk changes in both nature and severity as organisations move along it. This paper introduces the AI Development Risk Spectrum: a four-level framework for understanding how assurance requirements change as AI involvement increases.

Level 1 — Human-Led Development with AI Assistance. Humans define architecture, review all code, and retain control of design decisions. AI tools accelerate repetitive tasks. Existing SDLC controls remain substantially intact. Risk is relatively low and incremental.

Level 2 — AI-Augmented Engineering. AI generates significant portions of code. Humans review outputs but may not fully understand all generated logic. Development velocity increases substantially. Key risks: technical debt accumulation, hidden vulnerabilities in generated code, inconsistent application of secure coding standards.

Level 3 — AI-Driven Development. AI agents create significant components or entire features autonomously. Human review becomes selective rather than comprehensive. Development becomes orchestration and validation rather than implementation. Key risks: substantially reduced traceability between requirements and implementation, vulnerabilities that enter undetected because review is selective, reduced accountability for design decisions.

Level 4 — Full Vibe Coding. Users describe requirements in natural language. AI creates software. Code review is minimal or absent. The creator may not understand the resulting codebase at all. Key risks: all risks from lower levels, compounded by complete opacity of implementation, no defensible accountability chain, high probability of systemic vulnerabilities, near-certain technical debt, regulatory non-compliance.
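
As a rough illustration, the four levels above can be expressed as a small classification function. This is a sketch under stated assumptions, not part of the framework itself: the attribute names (ai_generated_share, review_coverage, creator_understands_code) and the numeric thresholds are hypothetical, chosen only to make the boundaries between levels concrete.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskLevel(IntEnum):
    HUMAN_LED = 1
    AI_AUGMENTED = 2
    AI_DRIVEN = 3
    FULL_VIBE_CODING = 4


@dataclass(frozen=True)
class DevelopmentProfile:
    # All fields are hypothetical, self-reported estimates.
    ai_generated_share: float       # fraction of code produced by AI, 0.0 to 1.0
    review_coverage: float          # fraction of code reviewed by a human, 0.0 to 1.0
    creator_understands_code: bool  # can the creator explain the codebase?


def classify(profile: DevelopmentProfile) -> RiskLevel:
    """Map a profile to a level; thresholds are illustrative assumptions."""
    # Level 4: review is minimal or absent, or the creator cannot explain the code.
    if not profile.creator_understands_code or profile.review_coverage < 0.1:
        return RiskLevel.FULL_VIBE_CODING
    # Level 3: most code is AI-generated and review is selective.
    if profile.review_coverage < 1.0 and profile.ai_generated_share >= 0.5:
        return RiskLevel.AI_DRIVEN
    # Level 2: a significant share is AI-generated, with human review.
    if profile.ai_generated_share >= 0.25:
        return RiskLevel.AI_AUGMENTED
    # Level 1: AI only accelerates repetitive tasks.
    return RiskLevel.HUMAN_LED
```

In practice an organisation would rarely sit cleanly at one level. A real assessment would look at individual products or components rather than apply a single label.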

Most organisations operating at Levels 3 and 4 have no assurance infrastructure commensurate with the risks they are running. That gap is not theoretical: it is a live regulatory exposure under the EU Cyber Resilience Act and the forthcoming UK Cyber Security and Resilience Bill.

Why Traditional Assurance Approaches Fail Here

The dominant assurance paradigm in software security is compliance-based: organisations are assessed against defined processes, controls and documentation requirements. The implicit assumption is that adherence to a defined process produces a secure product. Evidence of compliance is taken as evidence of security.

This model has limitations even in traditional development environments. In AI-assisted environments, it fails more fundamentally. Compliance frameworks assume that humans write code, that development follows predictable SDLC stages, that design decisions are documented and traceable, and that developers understand what they have built. None of these assumptions reliably hold in AI-augmented or vibe coding environments.

When software is generated rather than written, organisations frequently cannot answer basic questions that any competent security assessment would ask: what prompts were used to generate this code? Which AI models were involved? What validation was performed on the outputs? Who reviewed the result? Who accepted the architectural risk? Who is responsible for a vulnerability introduced by an AI model, not reviewed by a human, and now exploited in production?
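
One way to make these questions answerable is to record provenance at the moment code is generated. The sketch below is a minimal, hypothetical record structure: the class name GenerationRecord, its field names and the helper unanswered_questions are assumptions for illustration, not an established schema or tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class GenerationRecord:
    """Hypothetical provenance record for one AI-generated change."""
    prompts: list[str] = field(default_factory=list)  # prompts used to generate the code
    model: Optional[str] = None        # which AI model was involved
    validation: Optional[str] = None   # what validation was performed on the output
    reviewer: Optional[str] = None     # who reviewed the result
    risk_owner: Optional[str] = None   # who accepted the architectural risk


def unanswered_questions(record: GenerationRecord) -> list[str]:
    """Return the names of fields that are empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = GenerationRecord(prompts=["Add a login endpoint"], model="example-model")
gaps = unanswered_questions(record)
```

Here gaps would list validation, reviewer and risk_owner: the questions an assessor would ask and the organisation could not yet answer for this change.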

These are not edge cases. They are characteristics of mainstream AI-assisted development practice at Levels 2, 3 and 4. An assurance framework that cannot accommodate them is not fit for purpose in the current environment. A compliance checklist designed for human developers cannot provide meaningful confidence in a product built by an AI agent. The question it answers is not the question that matters.

What Principles-Based Assurance Asks Instead

Principles-Based Assurance (PBA) starts from a different premise. Rather than asking "did you follow a defined process?", PBA asks: "can you demonstrate, through objective evidence, that the security outcome has been achieved?"

This distinction is not merely semantic. It has profound implications for what counts as evidence, who can provide assurance, and how assurance findings translate into genuine confidence in a product. Under PBA, a manufacturer that uses AI-assisted development and has strong outcome-focused controls can receive credible assurance. A manufacturer that follows a traditional SDLC but produces demonstrably insecure software cannot.

PBA evaluates security outcomes across five principles that remain constant regardless of how software was developed.

Secure Development — does the manufacturer have appropriate controls to ensure security is embedded in the development process? For AI-assisted development, this includes prompt governance frameworks, controls on which AI models can be used, review processes applied to AI-generated outputs, and AI-specific security testing.

Secure Build and Release — can the manufacturer demonstrate integrity in how software is assembled and released? This principle is largely development-method-agnostic: regardless of how code is created, the software delivered to customers should be the software that was produced and tested, with appropriate controls over the dependency chain.

Vulnerability Management — does the manufacturer have an effective capability to identify, assess and remediate vulnerabilities throughout the product lifecycle, including those introduced by AI models rather than human developers?

Security Update Capability — can the manufacturer deliver timely, effective security updates throughout the supported lifecycle, including for AI-generated components that may not be fully understood by the engineering team?

Governance and Accountability — does the manufacturer have defined accountability for security decisions and a governance framework that ensures security risks are understood and managed? As AI adoption increases, governance becomes more important, not less. The question is not whether AI was used. The question is whether the resulting risks were understood and managed by people who are accountable for the product.
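
To illustrate what objective evidence can look like for one of these principles, the sketch below shows a simple check relevant to Secure Build and Release: confirming that the artefact being released is byte-for-byte the artefact that was tested, by comparing SHA-256 digests. The function names are hypothetical. A real pipeline would typically add signing and dependency controls, which this example does not attempt to cover.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of an artefact's bytes."""
    return hashlib.sha256(data).hexdigest()


def release_matches_tested(tested_digest: str, release_artefact: bytes) -> bool:
    """True only if the release artefact hashes to the digest recorded at test time."""
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(tested_digest, sha256_digest(release_artefact))


# Digest recorded when the build passed testing (illustrative bytes only).
tested = sha256_digest(b"build-1.0")
ok = release_matches_tested(tested, b"build-1.0")
tampered = release_matches_tested(tested, b"build-1.0-modified")
```

The check is indifferent to whether a human or an AI agent wrote the code, which is the sense in which this principle is development-method-agnostic.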

The NCSC CRTF Scheme: Built for Outcomes, Not Methods

The NCSC Cyber Resilience Test Facility (CRTF) scheme provides the structured delivery mechanism for Principles-Based Assurance in the UK. Facilities operating under the scheme are accredited by UKAS, the UK national accreditation body, ensuring consistent delivery standards and methodological rigour.

The CRTF scheme was designed from the outset to be outcome-focused rather than prescriptive about development methodology. This makes it one of the few structured assurance frameworks that is genuinely applicable to AI-assisted development without requiring interpretive adaptation. A CRTF assessment evaluates whether a manufacturer can provide credible, objective evidence that security principles are being met across the product lifecycle — regardless of whether that lifecycle involved human developers, AI agents, or both.

The assurance report produced by a CRTF assessment provides independent evidence of security outcomes that can be presented to customers, regulators and procurement partners without requiring disclosure of commercially sensitive development detail. It is recognised by the NCSC and can be referenced in procurement requirements and regulatory submissions. It is also a foundation for ongoing assurance that can be updated as products evolve and development practices change — which, in an AI-assisted development environment, they will.

The Regulatory Direction

The two regulatory frameworks that will most significantly affect UK and EU product manufacturers are both moving in the same direction as PBA.

The EU Cyber Resilience Act requires manufacturers to conduct security assessments, maintain vulnerability management programmes, and provide security updates throughout the supported product lifecycle. For products in the higher-risk categories (important Class II and critical), third-party conformity assessment is generally required. The CRA's requirements align closely with the principles assessed under the CRTF scheme.

The UK Cyber Security and Resilience Bill adopts an outcome-based approach, requiring manufacturers to demonstrate that products meet security objectives rather than prescribing the development methods used to achieve them. The Bill is expected to introduce regulatory powers that will make third-party assurance a de facto requirement for products serving critical national infrastructure and government customers.

Manufacturers using AI-assisted development without corresponding assurance infrastructure face significant regulatory exposure under both frameworks. Manufacturers that engage with CRTF-based PBA proactively will find that the evidence frameworks developed through assurance engagement directly support regulatory compliance, and that the structured assurance reports produced provide defensible documentation for regulatory inquiries.

Key Takeaways

AI-assisted development now spans a wide spectrum from augmented engineering to fully autonomous software generation. Security risk escalates significantly as human oversight diminishes.
Traditional compliance frameworks assume human-led development, documented processes and linear SDLC stages. These assumptions no longer hold in AI-augmented environments at Levels 2, 3 and 4.
Principles-Based Assurance evaluates security outcomes, not development methods — making it uniquely well-suited to a world where how software is built may be opaque, automated or AI-generated.
The NCSC CRTF scheme provides a structured, UKAS-accredited delivery mechanism for PBA. It is one of the few assurance frameworks explicitly designed to remain valid regardless of the underlying development model.
Emerging regulation — the UK Cyber Security and Resilience Bill and the EU Cyber Resilience Act — is moving toward mandatory outcome-based assurance. Organisations that engage with PBA now will be better positioned for regulatory compliance.
Manufacturers using AI-assisted development without assurance infrastructure face a live regulatory exposure, not a future one.
As AI lowers the barriers to software creation, assurance becomes not less important, but more so.

Download the Full White Paper

The full paper covers the complete AI Development Risk Spectrum with risk characterisation at each level; the core assumptions of traditional assurance that break down in AI-assisted environments; the five PBA principles and how evidence requirements evolve for AI-generated code; the CRTF scheme assessment areas mapped to AI development risk; regulatory implications for CRA, CS&R Bill, NCSC CAF and NIS2; and practical guidance for manufacturers, technology buyers and boards.

Free to download. No registration required.

Andrew Jones