The Future of Developer Environments at Shopify
In the previous posts we talked about our past experiences with Artisanal Systems, Boxen, and Vagrant, as well as our current system using Railgun and Dev. In this post, we're going to talk about some future plans and share a few forward-looking thoughts.
Railgun and Dev accomplish most goals most of the time, but they do not leave enough room for innovation or differing tech stacks. How can we allow more teams to innovate while keeping the normal tech stack glaringly easy to use? We have a few ideas, which we’ll discuss below.
Minikube
Minikube is a tool that runs a local Kubernetes cluster; Kubernetes itself is a container scheduling system. With our current operations being built on top of Kubernetes, this seemed like a good opportunity to consolidate developer and production environments, educate our engineering department about Kubernetes, and make environments customizable.
It gave us a chance to alleviate some of our previous issues while also giving people a place to learn about Kubernetes. We rolled out a beta version of this system to a number of users to collect data, learn about weak points, and build more overall context about the system.
Unfortunately, the beta performed poorly in several areas, which are listed below:
- Previously, we mentioned that the transition from the “Broken” state to the “Stopped” state needs to be fast. With Minikube we could not accomplish this: it would sometimes take far too long to return to a default state, and this happened too often. One reason is that Minikube used Kubernetes' default behaviour of trying to “self heal”, whereas we have no issue with simply killing the system and rebooting from scratch. There were also far more failure scenarios.
- The first boot was exceptionally painful. Minikube relied heavily on external factors that we could not control and for which Mac provided little guidance. This often caused the initial install to fail.
- Dynamic Host Configuration Protocol (DHCP) often failed and caused conflicts with other resources trying to use the same system (Railgun in particular conflicted a lot).
- Pulling containers took a lot of time, as did downloading Minikube ISO images. This caused a lot of contention and would not work well for our remote developers.
In the end, we decided to acknowledge our own sunk cost fallacy and abandoned the effort. However, the system was left in place for us to use in infrastructure-related experiments.
Online Developer Environment
Minikube provided us with a good entry point into Kubernetes, but failed on setup due to time constraints and flaky scenarios. Using this experience, we prototyped a remote developer environment where services such as MySQL and Redis would run in the cloud. This experiment proved an overall success, but it raised a lot of uncertainty. What is the impact of the network round trip? What happens to developers on a poor internet connection? We would also take away offline capabilities, though we deemed this acceptable. In the end, we kept the work as an experiment and decided to work on Railgun V2 instead.
Railgun V2
Railgun provides a lot of benefit with little downside. If we look at our goals, production equivalence is a nice-to-have that comes after the other goals, so we decided to work further down this path. Railgun V2 is being designed around Docker instead of a custom virtual machine. Docker will give us an easy way to use any containers that we build and push to our Google Cloud Container Registry. This means that the default environment and services will be available out of the box, and other environments can be used provided they are uploaded to our container registry. So far, we have even gained a few seconds of boot performance.
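To make the idea of "default services plus anything uploaded to the registry" more concrete, here is a minimal sketch in Python. It is not how Railgun V2 is implemented. The project name, service versions, and function names are all hypothetical and chosen only for illustration. The only things taken as given are the `host/project/image:tag` shape of Google Container Registry references and the `docker pull` command.

```python
from typing import Dict, List, Optional

# Hypothetical values for illustration only; not real project or image names.
REGISTRY_HOST = "gcr.io"
PROJECT = "example-project"
DEFAULT_SERVICES = {"mysql": "5.7", "redis": "3.2"}


def image_ref(name: str, tag: str) -> str:
    """Build a registry reference of the form host/project/name:tag."""
    return f"{REGISTRY_HOST}/{PROJECT}/{name}:{tag}"


def resolve_images(extra_services: Optional[Dict[str, str]] = None) -> List[str]:
    """Merge the default services with project-specific ones.

    A project-specific entry overrides a default with the same name.
    The result is sorted by service name so the output is stable.
    """
    services = {**DEFAULT_SERVICES, **(extra_services or {})}
    return [image_ref(name, tag) for name, tag in sorted(services.items())]


def pull_commands(images: List[str]) -> List[List[str]]:
    """Return `docker pull` argument lists; nothing is executed here."""
    return [["docker", "pull", image] for image in images]


if __name__ == "__main__":
    # A project that needs an extra service and a different Redis version.
    for command in pull_commands(resolve_images({"elasticsearch": "6.8", "redis": "5.0"})):
        print(" ".join(command))
```

In this sketch a project that declares nothing still gets the default services, while a project with different needs only has to name images that already exist in the registry.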
Conclusion
In the end, it becomes apparent that scaling a developer environment is not a trivial task. Each system is uncontrolled and brings with it a varied setup. Going into a solution, we need to make the experience quick and easy for the everyday developer while keeping in mind that our own maintenance burden cannot grow either. While our current solution using Dev and Railgun is good, we can aim to be better and will be working towards a more customizable, quick, and well-communicated system.