APRA Doesn't Care About Your Cloud Architecture — Until It Does
Why prudential standards are the most powerful design constraints in your toolkit. If you know how to read them.

You're in a board risk committee. The CRO turns to the Head of Technology with a single question.
"We've been on public cloud for three years. Can you demonstrate right now—not after a review—that our critical operations will stay within tolerance if our primary cloud provider degrades for six hours?"
The room shifts. Not because anyone doubts the architecture. Because nobody can connect what's deployed to what the regulator will ask next.
This is the gap that prudential standards expose. Not a technology gap. A translation gap between what your cloud platform does and what your institution must prove it can survive.
The obligation that reshapes everything
APRA's prudential standards don't mention Infrastructure as Code (IaC), Kubernetes, or availability zones. They don't specify encryption algorithms or prescribe how many regions you need. They care about outcomes: tolerance, continuity, evidence, accountability.
CPS 230 requires you to identify critical operations, define tolerance levels for each, maintain those operations through disruption, and prove you can do it under severe but plausible scenarios. CPS 234 requires information security controls proportionate to risk, tested with commensurate frequency, including controls operated by third parties.
No architecture is prescribed. Every architecture is constrained.
The first post in this series described a 2:17am incident. A major cloud dependency degrading. Customer channels stuttering. Payments backing up. Someone senior asked the only question that mattered: "Are we still operating within tolerance?"
The second post explored why answering that question with certainty is a leadership problem, not a technical one.
This post is about the architectural preconditions that make the answer possible. And the tooling that makes it continuous.
Prudential Obligations in Cloud Terms: What CPS 230 and CPS 234 Actually Demand
Read CPS 230 and CPS 234 not as compliance documents but as architecture requirements specifications. Both make the same foundational demand: controls must be designed, operating effectively, monitored continuously, and producing evidence without human intervention. Here's what each standard requires in cloud engineering language.
From CPS 230—Operational Resilience and Continuity
Critical operations register with defined tolerances. Every institution must maintain a register of critical operations: payments, settlements, clearing, deposit-taking, customer enquiries, and the systems supporting them. Each must have explicit tolerance levels. Maximum allowable disruption time. Maximum acceptable data loss. Minimum service level during degradation.
In cloud terms, this means your SLOs aren't aspirational targets. They're prudential commitments. Your error budgets aren't engineering conveniences. They're the operational mechanism that proves you're inside the boundary.
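To make that concrete, here's a minimal sketch of a critical operations register entry that carries its tolerance as data. The names and figures are hypothetical, not prescribed by the standard:

```python
from dataclasses import dataclass

@dataclass
class CriticalOperation:
    name: str
    max_disruption_minutes: int  # prudential tolerance: maximum allowable disruption
    slo_availability: float      # the engineering SLO that backs the tolerance

    def within_tolerance(self, observed_downtime_minutes: float) -> bool:
        """True while cumulative disruption stays inside the prudential boundary."""
        return observed_downtime_minutes <= self.max_disruption_minutes

# Hypothetical register entry: payments must not be disrupted for more than 120 minutes.
payments = CriticalOperation("payments", max_disruption_minutes=120, slo_availability=0.9995)

# Mid-incident, "are we within tolerance?" becomes a computation, not a meeting.
print(payments.within_tolerance(observed_downtime_minutes=37))   # True
print(payments.within_tolerance(observed_downtime_minutes=140))  # False
```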
Business continuity tested under severe but plausible scenarios. Annual exercises across critical operations. Including scenarios involving material service providers. In cloud, your cloud provider is the material service provider. Failover that has never been tested isn't a capability. It's a hypothesis. CPS 230 requires you to prove it works—not describe how it should.
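One way to turn the hypothesis into a capability is to script the drill and assert the tolerance. A sketch under stated assumptions: the endpoint, the tolerance, and the polling cadence are placeholders, not a prescribed pattern:

```python
import time
import urllib.request

# Hypothetical drill: fail traffic over to the secondary region, then assert that
# recovery completes inside the documented tolerance. The URL and numbers are placeholders.
TOLERANCE_SECONDS = 2 * 60 * 60  # e.g. a two-hour maximum disruption for this operation
HEALTH_URL = "https://payments.secondary.example.internal/health"

def wait_for_recovery(deadline_s: float, poll_s: float = 30.0) -> float:
    """Poll the secondary endpoint until healthy; return elapsed seconds or fail the drill."""
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        try:
            with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except OSError:
            pass  # not healthy yet; keep polling
        time.sleep(poll_s)
    raise AssertionError("failover did not complete within tolerance")

# elapsed = wait_for_recovery(TOLERANCE_SECONDS)
# print(f"Recovered in {elapsed:.0f}s: evidence, not hypothesis.")
```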
Material service provider governance. Your cloud provider isn't just a vendor. Under CPS 230, material arrangements require formal, legally binding agreements with defined service levels, audit access, subcontractor transparency, and termination rights. The standard requires contract provisions that support APRA's access to information and its ability to conduct on-site visits.
Fourth-party risk (the providers your providers rely on) must be managed. An orderly exit path must exist and be credible. You do not outsource accountability. You outsource execution and govern it as if it is yours. Because under the standard, it is.
From CPS 234—Information Security and Threat Detection
Security controls proportionate to risk. Not maximum controls. Proportionate controls. The architecture must enforce controls at the configuration level—not through documentation or periodic reviews. Encryption of data at rest and in transit. Network segmentation isolating critical workloads. Access controls limiting who can operate or modify what. All enforced structurally, not hoped for.
Controls tested with frequency commensurate to risk. Annual testing isn't enough for high-risk systems. For systems handling critical data or controlling critical operations, control effectiveness must be verified continuously. A control that was compliant at deployment and drifted last Tuesday should be flagged the same day, not discovered during an annual audit.
Threat detection and incident response. Proportionate to criticality. Systems handling critical operations require real-time detection of anomalies, unusual access patterns, or unauthorized changes. The standard doesn't prescribe how. It prescribes the outcome: capability to detect and respond to material security incidents within your response SLA.
Common Requirement Across Both Standards
Evidence that controls are operating effectively, continuously. This is where both CPS 230 and CPS 234 converge. A policy that exists in a wiki but isn't enforced by automation isn't operating effectively. A security control that depends on someone remembering to check it fails this test. A compliance check run quarterly and discovered to have drifted isn't meeting the standard's intent.
The standard demands evidence that controls work. Continuously, without human variability.
The notification clocks. Twenty-four hours after a disruption to a critical operation outside tolerance (CPS 230). Seventy-two hours after becoming aware of a material operational risk incident (CPS 230). Seventy-two hours after becoming aware of a material information security incident (CPS 234).
These aren't just incident response timelines. They're hard design constraints on your detection, classification, and escalation capability. If your observability can't meet those clocks, the architecture isn't finished.
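The clocks are simple enough to encode directly. A sketch mapping incident classes to their deadlines; the class names are illustrative, the durations come from the standards themselves:

```python
from datetime import datetime, timedelta, timezone

# Notification deadlines per incident class. The class names are illustrative;
# the durations come from CPS 230 and CPS 234.
NOTIFICATION_CLOCKS = {
    "tolerance_breach": timedelta(hours=24),               # CPS 230: critical operation outside tolerance
    "material_operational_incident": timedelta(hours=72),  # CPS 230
    "material_security_incident": timedelta(hours=72),     # CPS 234
}

def notification_deadline(incident_class: str, aware_at: datetime) -> datetime:
    """The clock starts when you become aware, which makes detection latency a design input."""
    return aware_at + NOTIFICATION_CLOCKS[incident_class]

aware = datetime(2025, 3, 4, 2, 17, tzinfo=timezone.utc)  # the 2:17am incident
print(notification_deadline("tolerance_breach", aware))   # 2025-03-05 02:17:00+00:00
```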
Preventive Controls: Policy-as-Code Enforcement Across Cloud Platforms
Both CPS 230 and CPS 234 require controls that operate effectively, without human variability. The engineering answer is the same: encode your controls into the deployment pipeline so they execute automatically, every time, without exception.
This is where policy-as-code stops being a DevOps convenience and becomes a regulatory control.
Configuration policy frameworks at the organisation level. All major cloud service providers offer organisation-level policy enforcement mechanisms that let you define the outer boundary of what any workload or team can deploy. These policies block non-compliant actions at the API layer before resources are created. A developer cannot provision a data store with public network access, cannot skip encryption configuration, and cannot omit required logging tags.
The three-layer model works across all platforms: define the control requirement (what must be true), create a policy expression that enforces it, and assign that policy to organisational scopes (teams, departments, workload risk tiers). A team deploying payment systems inherits stricter policy assignments than a team deploying internal analytics. A policy that says "all encryption keys must be customer-managed" applies uniformly across all deployments.
Automated remediation policies are the force multiplier. When a resource drifts from the required configuration, the platform auto-remediates it before an audit discovers the problem. When a resource violates a guardrail, the violation is logged with timestamp, requestor, and reason for rejection. Every control execution creates an immutable audit log.
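The mechanics differ by platform (Azure Policy, AWS service control policies and Config rules, Google Cloud Organization Policy), but the shape is common. Here's a platform-neutral sketch; the policy names, resource fields, and checks are invented for illustration:

```python
import json
from datetime import datetime, timezone

# Illustrative policy set: each control is a predicate over the requested resource spec.
POLICIES = {
    "encryption-at-rest": lambda r: r.get("encryption") == "customer-managed-key",
    "no-public-network":  lambda r: r.get("public_access") is False,
    "required-log-tags":  lambda r: "log_profile" in r.get("tags", {}),
}

def evaluate(resource: dict, requestor: str) -> bool:
    """Deny at the API layer: violations are rejected, and every decision is logged."""
    violations = [name for name, check in POLICIES.items() if not check(resource)]
    audit_event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "requestor": requestor,
        "resource": resource.get("name"),
        "violations": violations,
        "decision": "deny" if violations else "allow",
    }
    print(json.dumps(audit_event))  # in practice: append to an immutable audit store
    return not violations

evaluate({"name": "payments-db", "encryption": "customer-managed-key",
          "public_access": False, "tags": {"log_profile": "critical"}}, "dev-team-a")
evaluate({"name": "scratch-db", "public_access": True}, "dev-team-b")  # denied, and logged
```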
Shift left to the CI/CD pipeline. Layer policy evaluation into your deployment pipeline so developers see policy violations before they submit a pull request for review. An IaC template that violates encryption requirements is rejected at build time. A container image that fails compliance checks never reaches production. Policy evaluation becomes part of the development flow, not a gate at the end.
Extend this with custom policy languages for domain-specific rules. If you define "all databases in the critical zone must have automated backups with 7-day retention," you can encode that rule once and evaluate it against every database deployment. Violations fail the build. Compliance becomes a development standard, not a quarterly audit exercise.
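To illustrate, the backup rule above is a few lines of code once the IaC plan is machine-readable. A sketch against a hypothetical plan format; the field names are assumptions, not any particular tool's schema:

```python
import json
import sys

def check_backup_retention(plan: dict) -> list[str]:
    """Fail the build if any critical-zone database lacks 7-day automated backups."""
    return [
        res["name"]
        for res in plan.get("resources", [])
        if res.get("type") == "database"
        and res.get("zone") == "critical"
        and res.get("backup_retention_days", 0) < 7
    ]

if __name__ == "__main__":
    with open(sys.argv[1]) as f:  # e.g. a rendered IaC plan exported as JSON
        failing = check_backup_retention(json.load(f))
    if failing:
        print(f"Policy violation: backups under 7 days on {failing}")
        sys.exit(1)               # non-zero exit fails the pipeline, before review
```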
What happens at deployment time. When a developer provisions infrastructure, the pipeline validates encryption configuration, network isolation, access controls, and compliance tags before a single resource is created. A non-compliant deployment is rejected automatically. This isn't overhead. It's a control operating effectively, every time, at machine speed. Every rejection is logged, creating the continuous evidence trail both CPS 230 and CPS 234 demand.
Shift compliance left enough and it stops being a gate. It becomes the path.
Detective Controls: Continuous Posture Assessment and Threat Detection
Policy-as-code handles the preventive layer. Blocking bad deployments before they happen. But what about what's already running? Configuration drift. Emerging vulnerabilities. Misconfigured resources that passed validation at deploy time but changed afterwards. Threats surfacing from behaviour patterns or anomalous access.
This is the job of the detective layer, where Cloud Security Posture Management (CSPM) and Cloud-Native Application Protection Platforms (CNAPP) earn their place in the regulatory architecture. Not as dashboards for the security team. But as the continuous evidence engine that CPS 230 and CPS 234 require.
Continuous posture assessment. Cloud service providers and specialised third-party vendors offer tools that scan your environment continuously against baseline configurations. They assess resources against industry benchmarks (CIS, NIST, ISO 27001). They detect misconfigurations and grade them by severity. They map findings to compliance frameworks, creating a live compliance posture score.
The power is in the aggregation and correlation. Threat detection scans for unusual access patterns, anomalous data access, privilege escalation attempts. Vulnerability scanning identifies missing patches and deprecated components. Compliance evaluation detects drift from golden-image configurations. All three streams feed into a single console, with findings cross-referenced and prioritised by actual exploitability and business impact.
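Underneath, a live posture score is severity-weighted arithmetic over open findings. A toy sketch, with invented weights and findings:

```python
# Severity weights and findings are invented; real CSPM tools use richer models.
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 7, "critical": 15}

def posture_score(total_controls: int, open_findings: list[dict]) -> float:
    """A live compliance score: 100 minus a severity-weighted penalty, floored at zero."""
    penalty = sum(SEVERITY_WEIGHT[f["severity"]] for f in open_findings)
    worst_case = total_controls * SEVERITY_WEIGHT["critical"]
    return max(0.0, 100.0 * (1 - penalty / worst_case))

findings = [
    {"control": "CIS-1.4", "severity": "high"},    # illustrative benchmark IDs
    {"control": "CIS-3.1", "severity": "medium"},
]
print(f"{posture_score(total_controls=50, open_findings=findings):.1f}")  # 98.7
```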
Attack path analysis goes further. It doesn't just flag individual misconfigurations. It identifies combinations of low-severity findings that together create an attack path to a high-value resource. An overly permissive role, combined with a public-facing API, combined with a missing encryption setting. Individually low-severity. Together, exploitable. Detection identifies the chain, not just the links.
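At its core, attack path analysis is reachability over a graph whose nodes are resources and findings. A deliberately simplified sketch of that idea; the graph itself is hypothetical:

```python
from collections import deque

# Edges read: "compromising this node gives you a path to that one". All hypothetical.
GRAPH = {
    "public-api": ["over-permissive-role"],          # public-facing API: low severity alone
    "over-permissive-role": ["customer-datastore"],  # broad role: low severity alone
    "customer-datastore": [],                        # high-value target, encryption missing
}

def attack_paths(entry: str, target: str) -> list[list[str]]:
    """Breadth-first search for chains from an internet-facing entry point to a crown jewel."""
    paths, queue = [], deque([[entry]])
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            paths.append(path)
            continue
        for nxt in GRAPH.get(path[-1], []):
            if nxt not in path:
                queue.append(path + [nxt])
    return paths

print(attack_paths("public-api", "customer-datastore"))
# [['public-api', 'over-permissive-role', 'customer-datastore']]: the chain, not the links
```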
Event-driven incident detection. Real-time analysis of your audit logs and network flow data. Unusual access patterns are detected within minutes, not hours. Privilege escalation attempts trigger immediate alerts. Unusual data access is caught the same day. For CPS 234's threat detection requirement, this isn't a quarterly exercise. It's continuous.
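The shape is a rule evaluated per audit event, alerting the moment a pattern crosses a threshold. A sketch with invented action and field names loosely mimicking a generic cloud audit log:

```python
from collections import Counter

# Hypothetical rule: repeated denied privilege-escalation calls by one principal
# should page someone within minutes, not wait for Monday's dashboard.
ESCALATION_ACTIONS = {"AttachRolePolicy", "PutRolePolicy", "UpdateAssumeRolePolicy"}
THRESHOLD = 5

denied_counts: Counter = Counter()

def alert(principal: str) -> None:
    print(f"ALERT: possible privilege escalation attempt by {principal}")  # route to on-call

def on_audit_event(event: dict) -> None:
    """Called once per event by the log stream consumer (e.g. a serverless function)."""
    if event.get("action") in ESCALATION_ACTIONS and event.get("outcome") == "denied":
        denied_counts[event["principal"]] += 1
        if denied_counts[event["principal"]] > THRESHOLD:
            alert(event["principal"])

for _ in range(6):  # six denied attempts in quick succession trips the rule
    on_audit_event({"action": "AttachRolePolicy", "outcome": "denied", "principal": "svc-x"})
```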
What matters is the operating model it enables.
When CSPM runs continuously, you don't prepare for audits. You export them. The control that drifted last Tuesday is flagged the same day. Misconfiguration findings route to incident management, creating automatic remediation workflows. When a vulnerability is published, your platform scans for it across all systems within hours, not weeks. Compliance posture is a live score, not a point-in-time snapshot.
CPS 234's requirement for security controls "tested with frequency commensurate to risk" stops being a scheduling problem. It becomes a platform capability. CPS 230's requirement for incident detection within the notification window (72 hours for material incidents) becomes achievable when detection runs continuously.
The combination of preventive controls (policy-as-code blocking bad deployments) and detective controls (CSPM catching drift and emerging risk, CNAPP catching runtime threats) creates a closed loop. The continuous compliance architecture.
Figure 1: Continuous Compliance Architecture. Regulatory obligations (CPS 230 operational resilience, CPS 234 information security) drive tolerance definitions and SLOs. Preventive controls (policy-as-code at organisation level) block non-compliant deployments at API time. Detective controls (CSPM continuous posture assessment, CNAPP threat detection, audit log analysis) provide real-time monitoring and detection. Evidence and response feed back into regulatory reporting—closing the loop.
This isn't a future state. Every major hyperscaler has the primitives deployed today. The gap isn't tooling. It's connecting the tooling to the regulatory obligation it satisfies and operating it as a control, not a dashboard.
Three myths that get architects in trouble
"If we follow CSP best practice, we'll be compliant." Hyperscaler best practices optimise for capability, not accountability. They'll get you a well-architected platform. They won't get you a platform that can survive a supervisory conversation about tolerance breaches. CSP's Frameworks are excellent engineering guidance but none of them map to CPS 230 obligations. Best practice is necessary. It is not sufficient.
"Compliance is the compliance team's problem." CPS 230 makes the Board ultimately accountable for operational risk management. The cloud architect doesn't carry that accountability directly. But the architectural decisions determine whether the accountability can be discharged. What's encrypted, what's logged, what's isolated, what can be tested, what evidence is produced automatically. Those are architecture decisions with regulatory consequences.
Architecture is the mechanism. Compliance is the obligation. Separate them and you get a platform that passes audits but can't answer the CRO's question in real time.
"We'll add governance after the platform is built." Governance retrofitted onto architecture creates friction. Governance designed into architecture creates speed. Policy-as-code that blocks non-compliant deployments at plan time is faster than a manual review gate. Automated evidence collection is cheaper than quarterly audit scrambles. CSPM that flags drift continuously is less disruptive than a remediation sprint before the next regulatory review.
The platform teams that build governance into the architecture from day one ship faster in year two because they're not fighting their own controls.
How to read a prudential standard like an architect
You don't need to become a compliance expert. You need a translation method. For every obligation in a prudential standard, ask three questions:
What must be structurally true about my architecture for this obligation to be met? Not "what documentation do I need?" But "what property must exist in the platform itself?" When CPS 234 requires cyber controls operating effectively, the structural answer is policy-as-code enforced at the management group or organisation level. Not a wiki page describing intended controls.
How would I prove it's true right now, without a human doing anything? If the answer requires someone to run a script or pull a report manually, the evidence architecture is incomplete. CSPM compliance scores, policy evaluation results, drift detection alerts. These are the automated proof that CPS 230 requires.
What would break this property, and would I know within my notification window? If your CSPM detects a misconfiguration but routes it to a dashboard nobody checks until Monday, you've built a detective control with no operational response. The 24-hour and 72-hour clocks demand detection and classification and escalation within those windows.
Apply these three questions to every significant obligation and you'll produce architecture requirements that are more precise, more testable, and more defensible than anything a generic compliance checklist will give you.
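The method is structured enough to keep as data. A hypothetical mapping for a single obligation, organised around the three questions:

```python
from dataclasses import dataclass

@dataclass
class ObligationMapping:
    obligation: str           # what the standard demands
    structural_property: str  # Q1: what must be true in the platform itself
    automated_evidence: str   # Q2: proof produced with no human in the loop
    break_detection: str      # Q3: how a breach is caught inside the notification window

# Illustrative, not exhaustive: one row of what becomes your architecture requirements.
example = ObligationMapping(
    obligation="CPS 234: controls operating effectively",
    structural_property="policy-as-code enforced at organisation scope",
    automated_evidence="policy evaluation results and CSPM posture score, exported daily",
    break_detection="drift finding routed to incident management within hours",
)
print(example)
```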
The real test isn't the audit. It's the question you can't rehearse.
There's a pattern in regulated cloud engineering. The teams that treat prudential standards as a cost centre build platforms that pass audits and slow down delivery. The teams that treat them as design constraints build platforms that are more observable, more testable, more resilient. And faster. Because the constraints forced clarity that most organisations avoid until it's too late.
Consider where your platform sits right now.
Can you name your critical operations and state the tolerance level for each? Specific numbers, not a document nobody has read since it was written.
Do your SLOs map directly to those prudential tolerance levels, or do they measure platform convenience?
Does your platform produce regulatory evidence automatically through CSPM and policy-as-code, or does someone have to go find it before the next review?
Are your security controls proportionate to workload risk structurally—through landing zone tiering, network segmentation, and policy inheritance—or just on paper?
Can you test your resilience without a maintenance window?
And when something breaks, will your detection and classification capability meet the notification clocks? Twenty-four hours for a tolerance breach. Seventy-two hours for a material security incident.
If you can answer all of those with confidence, you've built a regulated platform. If you can't, the gap between those answers and your current architecture is your real risk posture.
The question is whether you discover that gap in a design review or in a supervisory conversation. One is an engineering problem. The other is an institutional one.
What's the one regulatory obligation that changed how you actually design cloud architecture, not just how you document it?
This is post three of a series on cloud engineering and leadership in financial institutions. Post one covered why cloud in a bank is nothing like cloud anywhere else—the operating model that prudential obligations demand. Post two explored what nobody tells you when you become a cloud lead—the identity shift from technical expert to governance leader. Next: a maturity model for banking cloud—the five layers that separate a platform that is deployed from one that is operated.



