CD’s complexities: A Product Engineer’s perspective
Rohit Khanwani
As a Product Engineer, I’ve seen firsthand how CD systems can slow down product development in growing engineering teams.
Operator complexity is one of CD’s fatal flaws.
In a growing org, the complexity of your delivery system will expand as the team grows, pulling resources away from product development to address the problems you’ll inevitably face. From what the team at Prodvana has seen, this experience is common in growing companies. Here are the three stages of deployment evolution in a growing engineering org:
All is merry and bright in a still world.
When I joined a small startup (fewer than 10 engineers) as a Product Engineer, I intended to build full-stack product features. In my first couple of months on the job, I did a lot of classic product-feature development work: writing APIs, building UI components, and so on. Once my code was committed, it magically ended up in production! As an engineer, I was naturally curious and wanted to know what this magical process was.
What I found out was that the delivery workflow was run by one person (the person who built it). Every few days or once a week, this person would handle deployments. Delivery was bottlenecked on this single person, but the team was small enough that it didn’t impact velocity too much. Development velocity was good, and product engineers could focus on feature work.
One step forward, half a step back
When the organization scaled and added a few more product engineers, we decided it was worth the investment to remove the delivery bottleneck. We set a goal to enable product engineers to own and deliver features independently. Engineers didn’t necessarily want to deploy their own code, but they didn’t want to be bottlenecked.
However, as a small startup, we were resource-constrained and couldn’t invest too much (i.e., we couldn’t build a full platform and designate full-time engineers to operate it). The idea was to pull a few people off (~15% of the engineering team) to build something in a week or two, then have them go back to building product features. I had some infra experience from a previous role, so I helped with the project.
Given the constraints, we decided to go with the most obvious choice: build CD pipelines into our CI system. However, the generic nature of CI systems resulted in a verbose system that was complex to use. Some real problems we faced were:
Promotions were branch-based (main, staging, production, etc.), which forced engineers to deal with Git complexities to manage the environments. For example, if they wanted a change to go from dev to staging, they’d have to rebase main onto staging.
Workflows were not incremental, so the time taken to promote dev to staging was the same as deploying to dev because the CI system would rebuild with every step.
Product engineers had to learn how to use this system, which wasn’t easy because of the complexity. For example, to add a service, an engineer would have to copy and paste configs to get the new service into the CD pipeline. Usually they didn’t understand the configs they were copying, or they would have to spend a lot of time learning CircleCI (our CI system), which was not relevant to their actual job.
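To make the second problem concrete, here is a minimal Python sketch of the difference between rebuilding at every stage and building once and reusing the artifact. It is a toy model, not the pipeline described in this post: the `Pipeline` class, its method names, and the commit ID are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Toy model of a delivery pipeline; counts how many builds it runs."""
    builds: int = 0
    artifacts: dict = field(default_factory=dict)  # commit -> artifact id
    deployed: dict = field(default_factory=dict)   # env -> artifact id

    def build(self, commit: str) -> str:
        # Stand-in for an expensive CI build (compile, test, package).
        self.builds += 1
        return f"artifact-{commit}"

    def deploy_rebuilding(self, commit: str, env: str) -> None:
        # Non-incremental: every environment triggers a fresh build.
        self.deployed[env] = self.build(commit)

    def deploy_reusing(self, commit: str, env: str) -> None:
        # Incremental: build once per commit, then reuse the stored artifact.
        if commit not in self.artifacts:
            self.artifacts[commit] = self.build(commit)
        self.deployed[env] = self.artifacts[commit]


ENVS = ["dev", "staging", "production"]

rebuilding = Pipeline()
for env in ENVS:
    rebuilding.deploy_rebuilding("abc123", env)

reusing = Pipeline()
for env in ENVS:
    reusing.deploy_reusing("abc123", env)

print(rebuilding.builds, reusing.builds)  # prints "3 1"
```

In the rebuilding version, promoting to staging costs as much as the original deploy to dev. In the reusing version, a promotion is only a deploy of something that already exists.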
It wasn’t great, but we removed the delivery bottleneck and, most importantly, we time-boxed the project so that everyone involved could get back to building product features. Unfortunately, we didn’t foresee the maintenance cost going forward. And since we built the CD pipelines… we now owned them.
Endless pit of improvements
The complexity I described above only grew as the engineering team continued to scale. As more engineers joined the team and the product grew, the complexity of our CD system scaled in lockstep. Our once “time-boxed” project was now an endless pit of improvements, needing significant investment to keep up with the complexity. The “you-build-it-you-own-it” nature of engineering orgs kicked in, and I was asked to join the expanding effort to improve our delivery system.
We kept needing more pieces of pipeline. New services needed to be spun up. We needed to add continuous testing with Cypress. We decided to build a lightweight wrapper around Git commands to alleviate the pain point mentioned earlier around Git complexity. Immediately afterward, we hit CircleCI’s size limit for YAML config and had to spend time trying various things to trim it. Then we had to invest in building a config generator using Starlark because we needed an easy way to manage all the copy/pasted config bits. And even after all this effort, rollbacks were hard, coordinating delivery between dependent services was hard, and managing migrations was hard.
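For readers who haven’t seen a config generator, the idea can be sketched in a few lines. The example below uses Python rather than Starlark (the two share most of their syntax), and everything in it is an assumption for illustration: the service names, the job fields, and the output shape are invented and are not CircleCI’s schema or the generator we built.

```python
import json

SERVICES = ["api", "web", "worker"]  # hypothetical service names
ENVS = ["dev", "staging", "production"]


def deploy_job(service: str, env: str) -> dict:
    """One deploy job; the fields are illustrative, not a real CI schema."""
    return {
        "name": f"deploy-{service}-{env}",
        "steps": [
            f"build {service}",
            f"deploy {service} --env {env}",
        ],
    }


def generate_config(services: list[str], envs: list[str]) -> dict:
    """Expand a short service list into the full set of deploy jobs."""
    jobs = [deploy_job(s, e) for s in services for e in envs]
    return {"jobs": {job["name"]: job["steps"] for job in jobs}}


config = generate_config(SERVICES, ENVS)
print(json.dumps(config, indent=2))
```

With a generator like this, adding a service means adding one entry to `SERVICES` instead of copying and pasting a block of config per environment. The trade-off is that the generator itself becomes one more piece of the delivery system to maintain.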
Because I had spent so much time building delivery systems, when the org expanded and teams were split up by purpose, it made sense for me to move to the Platform Engineering team. I learned a lot, got to build cool stuff, and didn’t mind Platform work at all.
But the currents introduced by the operator complexity of CD systems moved me in a certain direction, and I ended up far away from what I had set out to do when I first joined the company as a Product Engineer.
This CD journey is common in startups. As your engineering team grows, the complexity of your delivery system grows. Companies must make large investments and pull resources off feature work to keep up. When I left the company, an effort was kicking off to build an entirely new CD system internally, one that would address the flaws mentioned earlier around rollbacks and coordination.
Now I’m working at Prodvana, which solves the operator complexity problem so that companies don’t have to keep chasing it. Prodvana’s Dynamic Delivery Platform uses a declarative desired state for your application and then intelligently finds the best delivery path, so that you don’t have to build all the complex logic directly into pipelines.
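As a rough illustration of what “declarative desired state” means in general, here is a toy Python sketch: you describe what should be running, and a function works out the steps to get there. This is not Prodvana’s API or algorithm. The `plan` function, the service names, and the version strings are all hypothetical.

```python
def plan(current: dict[str, str], desired: dict[str, str]) -> list[str]:
    """Return the actions needed to move `current` to `desired`."""
    actions = []
    for service, version in sorted(desired.items()):
        have = current.get(service)
        if have is None:
            actions.append(f"create {service} at {version}")
        elif have != version:
            actions.append(f"update {service} from {have} to {version}")
    # Anything running that is no longer described should be removed.
    for service in sorted(current.keys() - desired.keys()):
        actions.append(f"remove {service}")
    return actions


current = {"api": "v1", "web": "v3", "legacy": "v9"}
desired = {"api": "v2", "web": "v3", "worker": "v1"}

for action in plan(current, desired):
    print(action)
```

In this model, a rollback is not a special pipeline: it is the same `plan` call with an older version in `desired`. A real system also has to handle ordering between dependent services, health checks, and migrations, which this sketch leaves out.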
If you want to learn more, book a demo with Prodvana to see how we can help you stay focused on product development and ship code faster.