AI Safety and DevOps Strategy
By Jason Booher - Founder, Solution Architect
AI Safety
How We Summit Mountains uses AI safely and responsibly for development purposes.
What are the dangers behind AI for development?
AI can be an effective development tool that speeds developers up tremendously, but using AI for development without safety controls is irresponsible. We Summit Mountains has developed sophisticated environment controls that empower our team to use AI while removing every risk we are aware of.
The biggest risks we have identified so far:
Unprompted Actions In Unauthorized Environments
An AI agent acting beyond the scope of what was asked of it, inside an org or system it should never have touched, is the single most dangerous failure mode. We restrict AI work to a Dev Sandbox under direct human direction, with no write path into production, so this class of mistake is always contained.
Sensitive or Protected Data Transfer to Unauthorized or Uncontrolled Locations
Customer data, PII, and proprietary configuration must never leave the boundaries of the system holding it. Our pipeline ensures AI agents only ever read from production through controlled, read-only channels, never write, so sensitive data cannot be exfiltrated or copied into an environment that is not approved to hold it.
Bad Code, Errors, Problematic Solutions
AI-generated code is not infallible. Without safety nets, a bad solution can compound into a costly cleanup. Smoke testing inside the Dev Sandbox, merge testing in a shared environment, and a final human-controlled production gate all combine to catch errors long before any client sees them.
DevOps With AI
Our process for keeping humans in control of deployment so that no AI mistake can negatively impact production.
Pipeline Overview
Our DevOps pipeline is built on two safety principles:
- AI tools are unable to change production.
- Humans must validate and QA all changes (human- and AI-authored) before deployment.
Every stage is engineered so that an AI mistake cannot silently propagate into a client-facing system. Depending on the client, the pipeline can look different, but the ideal pipeline runs across these sequential stages:
- Development (Dev Sandbox)
- Merge Testing (ideally in a Dev Sandbox, but could also be performed inside of a user testing environment that is not production)
- Training & FIT Testing (User Testing Environment such as a Partial Copy Sandbox or Full Copy Sandbox)
- Production Deployment (100% Human Deployment)
- Production Feedback
All deployments other than deployment to production can be orchestrated by our DevOps AI agent, ASPEN.
Work moves between Salesforce orgs using GitHub as the system of record, which gives us a complete, auditable, and reversible history of every change made by either a human or an AI.
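The two safety principles above can be expressed as a deployment-authorization rule. The sketch below is illustrative only, not our actual tooling; the `Stage` class and `authorize_deploy` function are hypothetical names used to show the shape of the rule:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    ai_may_deploy: bool  # may an AI agent orchestrate deployment INTO this stage?

# Stage order mirrors the pipeline above; only production refuses AI deployment.
PIPELINE = [
    Stage("Dev Sandbox", ai_may_deploy=True),
    Stage("Merge Testing", ai_may_deploy=True),
    Stage("Training & FIT Testing", ai_may_deploy=True),
    Stage("Production", ai_may_deploy=False),  # 100% human deployment
]

def authorize_deploy(target: Stage, actor: str) -> bool:
    """Humans may deploy anywhere; AI agents only where the stage permits it."""
    return actor == "human" or target.ai_may_deploy
```

The key design point is that the production stage is marked human-only in the data itself, so no amount of AI orchestration logic can route around it.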
Stage 1: Development (Dev Sandbox)
The Development stage is where work can be done either by a human developer or by an AI agent directed by a human. All of that work happens inside a Dev Sandbox, which is the only environment where new metadata or new code is authored. The Dev Sandbox is fully isolated from production, so anything that goes wrong here is contained.
Smoke testing happens at the end of this stage so that broken work never leaves the Dev Sandbox. Whether the change was authored by a person or by an AI under human direction, it has to pass smoke testing before it is eligible to move into Merge Testing.
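The promotion gate at the end of this stage is simple to state precisely. A minimal sketch, assuming a change is tracked as a record with a `smoke_test` status field (the field name is hypothetical):

```python
def eligible_for_merge(change: dict) -> bool:
    """A change leaves the Dev Sandbox only with a passing smoke test on record,
    regardless of whether a human or an AI authored it."""
    return change.get("smoke_test") == "passed"
```

Note that a change that was never smoke-tested is treated exactly like a failing one: absence of evidence blocks promotion.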
Stage 2: Merge Testing
Once a change has been smoke-tested in the Dev Sandbox, it needs to be merged with everything else that is in flight and validated as a whole. Merge testing is ideally performed in a Dev Sandbox dedicated to merging, but it can also be performed in a user testing environment (that is not production!).
The goal is simple: confirm that all of the changes work together wherever there is crossover between different development features. Any failures are patched in the originating Dev Sandbox, so the GitHub source of truth always reflects reality. ASPEN can orchestrate this merge and the deployment into the merge environment, because the target is never production.
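Finding the crossover points is essentially a set intersection over the metadata components each feature touches. A hedged sketch, with entirely made-up component names:

```python
def overlapping_components(changeset_a: set, changeset_b: set) -> set:
    """Components touched by both features; these are where combined
    (merge) testing matters most."""
    return changeset_a & changeset_b

# Hypothetical changesets pulled from each feature branch's diff.
feature_a = {"OpportunityService.cls", "Opportunity_Stage_Flow", "Quote.trigger"}
feature_b = {"OpportunityService.cls", "Lead_Assignment_Flow"}
```

An empty intersection does not remove the need for merge testing; it just means failures are more likely to come from shared data or automation side effects than from directly shared components.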
Stage 3: Demos, Training, & FIT Testing
Once the merge is clean, the combined release is deployed into a User Testing Environment (typically a Partial Copy Sandbox or a Full Copy Sandbox). This is where real users from the client team validate the release against their actual workflows, with real-shaped data, before anything is allowed to approach production.
Two kinds of testing happen here:
Training
Client users are trained against the new functionality in an environment that mirrors what production will look like after deployment. This makes go-live a non-event for the team using the system.
FIT Testing (User Acceptance Testing)
Client users validate that the technical merge actually produces the right business outcome. Anything that does not fit gets routed back to the appropriate Dev Sandbox for rework.
ASPEN can orchestrate the deployment from merge testing into the User Testing Environment, because the User Testing Environment is still not production.
Stage 4: Production Deployment (100% Human)
This is the deliberate choke point in our pipeline. Once Training & FIT Testing has signed off, the move to production is performed by a human, every time. ASPEN does not deploy to production. ASPEN does not have credentials to deploy to production. By design.
The deployment uses the same project GitHub repo as the source of truth, deploying the validated package set into the production org. If anything goes wrong, the GitHub history makes the rollback easy.
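Because every validated release exists in git history, rollback is just redeploying the previous known-good release. A trivial sketch under that assumption; the tag names are illustrative, not our actual convention:

```python
def rollback_target(history: list) -> str:
    """Roll back by redeploying the previous validated release from git history.
    `history` is ordered oldest-first; the last entry is the bad deploy."""
    if len(history) < 2:
        raise RuntimeError("no earlier validated release to roll back to")
    return history[-2]
```

In practice the history would come from git tags in the project repo, and the rollback deployment itself is still performed by a human, exactly like any other production deployment.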
Stage 5: Production Feedback
Once a release is live, real users provide feedback against real production data. That feedback shapes the next round of development work back at Stage 1, closing the loop. The AI agent has no write path into the production org at this stage or any other, so there is no way for production feedback to be acted on by an AI without first passing back through human-validated Dev, Merge, and FIT Testing stages.
Read-Only Production Access For AI
Our development AI agents are never allowed to change any data or metadata inside of a production org. That is the single most important rule in our entire DevOps strategy.
However, AI is most useful for debugging when it can see what is actually happening in production. To support that without compromising the rule above, we expose production to AI through a dedicated MCP server that is hard-wired to be read only. Through that read-only channel, an AI agent can:
- Read debug logs from production
- Read Flow errors and Flow execution history
- Read Apex errors and exception emails
- Read object and field metadata for diagnosis
What the MCP server cannot do, by construction:
- Update, insert, or delete records
- Modify metadata or deploy code
- Run anonymous Apex or invoke Flows
- Grant itself any permission it does not already have
The read-only constraint is enforced at the MCP server boundary, not just by policy. There is no API surface, no credential, and no code path by which the AI can write to production.
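Enforcing read-only access at the server boundary amounts to an allowlist check before any request reaches Salesforce. The sketch below is hypothetical; the tool names and the `call_salesforce_read_api` helper are illustrative stand-ins, not our actual MCP server code:

```python
# Only read operations are ever registered; write tools simply do not exist here.
READ_ONLY_TOOLS = {
    "read_debug_logs",
    "read_flow_errors",
    "read_apex_errors",
    "describe_metadata",
}

def call_salesforce_read_api(tool_name: str, args: dict) -> dict:
    # Placeholder: in a real server this issues a read-only API request.
    return {"tool": tool_name, "result": "..."}

def dispatch(tool_name: str, args: dict) -> dict:
    """Refuse anything outside the read-only allowlist before it can reach Salesforce."""
    if tool_name not in READ_ONLY_TOOLS:
        raise PermissionError(f"tool {tool_name!r} is not exposed by this server")
    return call_salesforce_read_api(tool_name, args)
```

The design choice worth noting is allowlist over denylist: rather than blocking known write operations, the server only ever exposes known reads, so a new or unanticipated write path is rejected by default.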
Why This Pipeline Keeps AI Safe
The safety properties of our DevOps pipeline are not accidents. They are the explicit goal of the architecture:
- AI tools are unable to change production. Period.
- Humans validate and QA every change, AI-authored or human-authored, before it deploys.
- AI agents only operate inside a Dev Sandbox and only under direct human direction.
- Every change is smoke-tested in the Dev Sandbox, merge-tested together, and fit-tested with real users before it can reach production.
- GitHub is the system of record, giving full auditability and reversibility for every change.
- Production deployment is 100% human. ASPEN has no credentials to push to prod.
- AI access to production is read-only and enforced at the MCP server boundary, not just by policy.
The result is a development pipeline that captures the speed of AI-assisted development while ensuring that no AI mistake, and no AI hallucination, can ever directly impact a client's production Salesforce org.