Technical
Prodvana Architecture Part 4: Runtime Interface & Overall Conclusion
Naphat Sanguansin
In our previous posts, we explored how the Prodvana Compiler builds a desired state and how the Prodvana Convergence Engine uses the desired state to decide what changes to make. This post examines how those changes are applied across the numerous backends that must be supported for brownfield applications.
Kubernetes, while dominant, is not the only type of workload runner. This is evident from the continued success of AWS ECS and serverless solutions such as Google Cloud Run and AWS Lambda.
Deployment systems that are backend-specific end up creating technical and cultural silos, which in turn leads to organizational inefficiency. We built the Prodvana Runtime Interface to ensure that Prodvana is backend-agnostic and minimizes migration costs for various Runtime types.
Each Runtime needs to be able to satisfy at least two interfaces:
fetch - returns the current state of a Service in the Runtime.
apply - runs the command(s) to bring the Service to a desired state.
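To make the two-operation contract concrete, here is a minimal sketch in Python. Only the names fetch and apply come from this post; ServiceState, InMemoryRuntime, and converge are hypothetical names invented for illustration and are not Prodvana's actual types.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class ServiceState:
    """Hypothetical, simplified state: only the version of a Service."""
    service: str
    version: str


class Runtime(Protocol):
    """The two operations every Runtime must support."""

    def fetch(self, service: str) -> Optional[ServiceState]:
        """Return the current state of the Service, or None if it is absent."""
        ...

    def apply(self, desired: ServiceState) -> None:
        """Run whatever is needed to bring the Service to the desired state."""
        ...


class InMemoryRuntime:
    """Toy Runtime backed by a dict, for illustration only."""

    def __init__(self) -> None:
        self._running: Dict[str, ServiceState] = {}

    def fetch(self, service: str) -> Optional[ServiceState]:
        return self._running.get(service)

    def apply(self, desired: ServiceState) -> None:
        self._running[desired.service] = desired


def converge(runtime: Runtime, desired: ServiceState) -> bool:
    """Apply only when the fetched state differs; return True if apply ran."""
    if runtime.fetch(desired.service) == desired:
        return False
    runtime.apply(desired)
    return True
```

Because the caller only depends on fetch and apply, any backend that can answer those two questions can be plugged in without the caller knowing what kind of Runtime it is talking to.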
Additionally, because the Prodvana Runtime Interface gives us access to user environments, it must also be designed with a trustworthy security model.
The Kubernetes Runtime Interface
Prodvana Kubernetes Agent - Establishing a Secure Connection
To implement support for Kubernetes, we first have to establish a connection. We do so by having users run the Prodvana Agent inside their Kubernetes cluster. The Prodvana Agent securely connects to the Prodvana API and, after a mutual handshake, establishes a secure connection between Prodvana and the Kubernetes API server.
The Agent architecture ensures our connection to the user clusters is secure and trustworthy.
No credentials are exchanged or stored on Prodvana.
The Prodvana APIs the Agent communicates with are behind a user-specific IP address and can be permitted as needed.
By deleting the Prodvana Agent, users can terminate all operations from Prodvana.
Satisfying the Runtime Interface
Once the connection is established, we must implement the Runtime Interface.
fetch
fetch is implemented via the Kubernetes API client. Because the Prodvana Convergence Engine continuously polls the Runtime about Services running in it, we must avoid overloading the Kubernetes API server. We do this by using watchers rather than issuing a new API request on every poll.
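The idea behind watchers can be sketched as a local cache that is updated by change events, so that polls read from memory. This is a simplified illustration, not Prodvana's implementation: the WatchCache class and the tuple event shape are assumptions, and only the event type names (ADDED, MODIFIED, DELETED) mirror the Kubernetes watch API.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical event shape: (event_type, object_name, object_state).
WatchEvent = Tuple[str, str, dict]


class WatchCache:
    """Local cache kept current by watch events; fetch reads only the cache."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def handle(self, event: WatchEvent) -> None:
        """Fold a single watch event into the cache."""
        kind, name, state = event
        if kind in ("ADDED", "MODIFIED"):
            self._objects[name] = state
        elif kind == "DELETED":
            self._objects.pop(name, None)

    def consume(self, events: Iterable[WatchEvent]) -> None:
        """Apply a stream of events in order."""
        for event in events:
            self.handle(event)

    def fetch(self, name: str) -> Optional[dict]:
        """Answer a poll from memory, without contacting the API server."""
        return self._objects.get(name)
```

With this shape, the cost on the API server depends on how often objects change, not on how often the Convergence Engine polls.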
apply
apply is implemented by calling out to kubectl apply. This ensures that apply actions taken by Prodvana match exactly what users would do on their own and can be replicated for debugging purposes. Additionally, kubectl apply wraps various Kubernetes API calls in a non-trivial way that would be fragile to replicate.
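As a rough sketch of what shelling out to kubectl apply can look like, the following Python pipes a rendered manifest to kubectl on stdin. The function names and the injectable run parameter are assumptions made for illustration; the post does not describe how Prodvana invokes kubectl internally.

```python
import subprocess
from typing import List, Optional


def kubectl_apply_command(namespace: Optional[str] = None) -> List[str]:
    """Build the command a user could also type by hand; "-f -" reads stdin."""
    cmd = ["kubectl", "apply", "-f", "-"]
    if namespace:
        cmd += ["--namespace", namespace]
    return cmd


def apply_manifest(manifest: str, namespace: Optional[str] = None,
                   run=subprocess.run) -> str:
    """Pipe a rendered manifest to kubectl apply and return its stdout.

    `run` is injectable so the function can be exercised without a cluster.
    """
    result = run(
        kubectl_apply_command(namespace),
        input=manifest,
        text=True,
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Since the command is an ordinary kubectl invocation, a user debugging a failed rollout can rerun the same command with the same manifest from their own terminal.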
Custom Runtimes
Kubernetes satisfies many user workloads - but not all of them. We found ourselves with many Runtime types to implement, each with unique challenges, different security models, and diminishing user base sizes. Additionally, we expect to need to support users with entirely in-house Runtime implementations.
These two requirements led us to a key insight: make it possible for Runtimes to be built outside the Prodvana codebase and use that interface to implement first-class Runtimes as we detect commonalities between users.
We call this class of Runtimes “Custom Runtimes” to denote that they are built outside the Prodvana codebase.
Kubernetes Jobs as a Building Block
Kubernetes gives us a solid foundation to build on: the Job resource. With a Kubernetes Job, we can enable anyone to implement a Runtime as long as they can create a Docker image. We can even provide an optional configuration interface to abstract away the complexity of Kubernetes Jobs for simple commands.
Building on Kubernetes Jobs means that users will need a Kubernetes cluster, even if they only have non-Kubernetes workloads. This is a tradeoff we accept today based on conversations with users, but one we can remove by adding new in-codebase Runtime implementations that can serve as job runners.
Security Model
Because Custom Runtimes run as Kubernetes Jobs, they can access the same secrets that users already use for application-level code. For example, users can use Kubernetes Secrets or a third-party secret vendor and grant access via a Kubernetes Service Account. No credentials are exchanged or stored on Prodvana.
For Custom Runtimes we implement, we further ensure the use of the best-in-class secrets model for that Runtime. For example, we use role-based credentials instead of service-account-based ones for ECS.
Minimizing Migration Cost: an Incremental Approach
To minimize migration costs onto Prodvana for users with Custom Runtimes, we need to make it simple to define them. To that end, we take a tiered, incremental approach.
apply-only
Users with non-Kubernetes workloads usually already have commands to update the workloads. These commands function exactly like apply in the Runtime Interface, so we make it possible to define a Custom Runtime with just an apply command.
When a Custom Runtime only defines apply, Prodvana will run the apply command once and mark the Service as converged when the command succeeds. Recall that this is precisely the behavior of the Prodvana Convergence Engine when an entity defines an apply and not a fetch.
apply and simple fetch
Many Runtimes can detect if apply would do any work before apply runs. For example, Terraform has a plan command that can determine if there are any changes to be made. We use this ability for the simple fetch interface: run a command that exits 0 or 2, where 0 indicates no work to be done and 2 indicates a drift. The choice of exit code 2 is intentional here, as 1 is commonly used as an unexpected error by various CLIs.
When a Custom Runtime defines both apply and fetch, apply will only run if fetch indicates a drift. This can save expensive, unnecessary work and allow Prodvana to skip Release Channels in the convergence of a Service.
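The exit-code contract and the resulting decision can be sketched as follows. The 0 and 2 semantics come from the text above; FetchResult, interpret_fetch_exit_code, and converge_once are hypothetical names, and treating every other exit code as an error is an assumption consistent with the reasoning given for avoiding exit code 1.

```python
from enum import Enum
from typing import Callable, Optional


class FetchResult(Enum):
    CONVERGED = "converged"  # exit code 0: no work to be done
    DRIFTED = "drifted"      # exit code 2: apply is needed


def interpret_fetch_exit_code(code: int) -> FetchResult:
    """Map a simple fetch exit code onto a result, or fail loudly."""
    if code == 0:
        return FetchResult.CONVERGED
    if code == 2:
        return FetchResult.DRIFTED
    # Assumption: anything else (including 1) is an unexpected error.
    raise RuntimeError(f"fetch failed with unexpected exit code {code}")


def converge_once(fetch: Optional[Callable[[], int]],
                  apply: Callable[[], None]) -> bool:
    """Run one convergence step; return True if apply ran."""
    if fetch is None:
        # apply-only tier: nothing to consult, so apply runs once.
        apply()
        return True
    if interpret_fetch_exit_code(fetch()) is FetchResult.DRIFTED:
        apply()
        return True
    return False
```

Keeping the error case separate from the drift case matters: a crashed fetch command should not be mistaken for a signal to run apply.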
apply and structured fetch Output
Lastly, some Runtimes, like Kubernetes, allow workloads to be annotated. For these Runtimes, we allow fetch to return a JSON explaining exactly what is running at what version.
In this mode, Custom Runtimes function like any natively implemented Runtimes, with the output of fetch being compared to the desired state to determine if apply should run.
For example, suppose fetch reports that version svc-1 is running. If the desired state is for version svc-1, then there is no work to be done, and apply would not run. If the desired state is for version svc-2, then apply will run.
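The comparison between structured fetch output and the desired state can be illustrated with a short Python sketch. The JSON shape used here (a "versions" list with "version" and "replicas" keys) is purely hypothetical and is not Prodvana's actual schema; only the svc-1 and svc-2 version names come from the text.

```python
import json


def needs_apply(fetch_output: str, desired_version: str) -> bool:
    """Return True when the running versions differ from the desired one.

    Assumes a hypothetical fetch output such as:
      {"versions": [{"version": "svc-1", "replicas": 3}]}
    """
    state = json.loads(fetch_output)
    running = {
        entry["version"]
        for entry in state["versions"]
        if entry.get("replicas", 0) > 0  # ignore versions with nothing running
    }
    return running != {desired_version}


FETCH_OUTPUT = '{"versions": [{"version": "svc-1", "replicas": 3}]}'
```

With FETCH_OUTPUT as above, needs_apply(FETCH_OUTPUT, "svc-1") is False and needs_apply(FETCH_OUTPUT, "svc-2") is True. A mix of versions, as seen partway through a rollout, also counts as needing apply in this sketch.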
A Tiered Approach
Notice that each tier of Custom Runtimes is more complex to implement than the one before it. We expect most users to implement only apply, some to implement simple fetch, and very few to implement structured fetch output.
However, by ensuring that the Custom Runtime interface is sufficiently robust, we can implement first-party Runtimes as Custom Runtimes while still providing a first-class experience to our users.
Additionally, because each tier is incrementally built upon the previous tier, users can start simple and “upgrade” by investing in the Custom Runtime as they see fit.
Parameterizing Custom Runtimes
Without a way to parameterize Custom Runtime jobs, Custom Runtimes would not be able to differentiate between Services.
Custom Runtimes accept parameters just like Services do.
Parameters are then passed in from the Service Configuration when using Custom Runtimes.
Additionally, a default set of environment variables carrying Service-level information is injected into both apply and fetch:
PVN_SERVICE
PVN_SERVICE_ID
PVN_APPLICATION
PVN_APPLICATION_ID
PVN_RELEASE_CHANNEL
PVN_RELEASE_CHANNEL_ID
PVN_SERVICE_VERSION
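An apply or fetch script can read these variables to find out which Service it is acting on. The sketch below is one way to do that in Python; the variable names come from the list above, while service_context and the fail-fast behavior on missing variables are assumptions made for illustration.

```python
import os
from typing import Dict, Mapping

# Variable names as listed in this post.
PVN_VARS = (
    "PVN_SERVICE",
    "PVN_SERVICE_ID",
    "PVN_APPLICATION",
    "PVN_APPLICATION_ID",
    "PVN_RELEASE_CHANNEL",
    "PVN_RELEASE_CHANNEL_ID",
    "PVN_SERVICE_VERSION",
)


def service_context(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the injected variables, failing fast if any are missing."""
    missing = [name for name in PVN_VARS if name not in environ]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in PVN_VARS}
```

Passing the environment in as a parameter keeps the script testable outside of a Prodvana-launched job.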
First-Class Custom Runtime Implementations
We have built the following Custom Runtimes as first-class in Prodvana.
Terraform and Pulumi Runners
Terraform Runner is a Custom Runtime that executes Terraform modules.
simple fetch - Run terraform plan, return drifted if plan indicates there is work to be done.
apply - Run terraform apply.
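One plausible way to map terraform plan onto the simple fetch contract is Terraform's -detailed-exitcode flag, which makes plan exit 0 when there are no changes, 1 on error, and 2 when changes are present. The post does not say which flags the Terraform Runner actually uses, so treat the following Python as a sketch; terraform_fetch and its injectable run parameter are hypothetical.

```python
import subprocess
from typing import Callable, List

# -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
# -input=false: never block waiting for interactive input.
PLAN_CMD = ["terraform", "plan", "-detailed-exitcode", "-input=false"]


def _run(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def terraform_fetch(run: Callable[[List[str]], int] = _run) -> int:
    """Return a simple fetch exit code: 0 for converged, 2 for drifted.

    Any other plan exit code is collapsed to 1 to signal an error.
    """
    code = run(PLAN_CMD)
    if code in (0, 2):
        return code
    return 1
```

A fetch entrypoint could then call sys.exit(terraform_fetch()) from the directory containing the Terraform module, leaving apply to run terraform apply only when drift was reported.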
The Pulumi Runner is implemented similarly to the Terraform Runner.
Source Code: Terraform, Pulumi
ECS
The ECS Runtime allows users to use Prodvana to manage services on ECS.
structured fetch output - Use AWS CLI to determine the number of running replicas and their versions based on AWS tags. Return a single Runtime object of type ECSService.
apply - Use AWS CLI to create/reuse task definition with tags for Prodvana Service ID and version, create the ECS service if it does not exist, and update its task definition.
Source Code: ECS Runtime
Google Cloud Run
The Google Cloud Run Runtime allows users to use Prodvana to manage services on Google Cloud Run.
structured fetch output - Use gcloud CLI to determine the number of running replicas and their versions based on annotations. Return a single Runtime object of type CloudRun.
apply - Use gcloud CLI to apply the Cloud Run config with added annotations for Prodvana Service ID and version.
Source Code: Google Cloud Run
Learnings
The initial Kubernetes Runtime implementation required users to create and store credentials on Prodvana. This did not meet our bar for security, as it relied on static credentials (or we would have to implement cloud-provider-specific credentials rotation) and would not support clusters with private IPs. As a result, we rewrote the implementation to be agent-based. Additionally, requiring credentials was an unintuitive operation that often failed during onboarding. Agent-based connection meant that users only had to ensure they had kubectl authenticated with permission to create resources, which most already do.
Engineering organizations always have more than one runtime and usually have multiple runtime types.
Results
Our Runtime implementation has allowed users to manage cloud-native and legacy compute workloads in one system.
Teams have used Custom Runtimes to orchestrate non-compute workloads, such as build systems and static content pushing.
For ourselves, we have been able to implement new types of Runtimes, such as ECS, in hours, not days or weeks.
Overall Conclusion
Prodvana’s Dynamic Delivery addresses the challenge of coordinating applications and infrastructure for platform teams looking to unify workflows and support sophisticated architectures without needing large migrations.
Prodvana offers a powerful solution for a wide range of architectures by embracing intent-based requirements, adaptability, and real-world changes. Users have seen 50% greater deployment frequency, increases of >20% in discovering issues before production, and increased user satisfaction.
If you’ve found this deep dive interesting and see similar challenges in your organization, please contact me! We love feedback and learning about other ways platform teams build abstractions.