Everyone who has been around operations for a while has one of these systems. It was hand-built by someone else many months or years ago, and you’re afraid to touch it. It has been snowflaked so badly that if you run apt-get upgrade, it will probably explode.
I had one of these, and it ran our Puppet master and Foreman instance. Not an insignificant system in our infrastructure.
Then one day, it exploded.
Given the opportunity to rebuild a system, I like to consider the right way to do things. When this particular system needed a rebuild, my team came together and talked about strategies going forward. We had several options:
- Build it the way it was. Re-create the hand-configured massive monolith
- Break this thing down into several smaller manually configured servers
- Find a way to run the server using automation, in an architecture that allowed for easy support
Though the third option sounded ideal, it also required the most time and effort. The second option would have been better than what we had before, but the servers would still be hand-built with almost no automation. The first option would get us running fastest, but it would leave us in the same problematic situation we had been in for several years. Classic tradeoffs. Luckily, the people I work with like to think about the future, and we decided to go with the third option.
Our next task was to decide how to turn a Puppet master, a Foreman server, a PXE server, a DHCP server, and PuppetDB into a maintainable architecture.
Thinking in Services
Everything running on this server was already a service; we just needed to break the services apart. That sounds obvious, but it’s an important thing to think about. Each service should be maintained in its own code base, with automation to build and deploy it. Then you need a system to glue them all together. In the end, we decided to try Docker. None of us were experts in containers, and none of us had ever built something this large, but we felt it would give us the benefits we wanted from rebuilding this monster.
The first layer to consider was the operating system. The three Puppet masters that we run rarely get updated, and when they do, it’s a nightmare. That causes a lot of problems, especially for security. In addition, running Docker on Debian can be tricky (or it was at the time). We chose to look in a different direction, at an OS designed specifically to run containers. Today it’s called Container Linux; back then it was just called CoreOS.
CoreOS was appealing for a variety of reasons, but the first and foremost was updates. If run correctly, Container Linux will update itself based on a channel. So if you pick the stable channel, you get a tested operating system that receives an automatic update every time there’s a new stable release. To a system administrator, that’s a pipe dream.
I won’t go into the other details of why we chose Container Linux, but you should check it out when you have a chance.
Building the Containers
This is where putting a monolith into a container gets a little nasty. You basically take the whole manual process of building a service on that monolith and write it down in a Dockerfile. This produces containers that are generally very large and contain quite a few hacks. Things like hosts files, systemd, and cron get very ugly.
This may sound like a big problem, and it may sound contradictory to the goal at hand. I assure you, it is not. While we had to do a few hacks and wrote some ugly code, it’s still code. Every hack we did was written down, checked in, and can be re-created by running a Jenkins job. Remember how that co-worker built the system and now nobody knows how it works? We just eliminated that problem. Now the next person who comes along can read the code (which is commented and documented) and understand what we did here.
Orchestration and Scheduling
This is the next big issue to tackle when running a bunch of containers. How do we run them? How do we know that they are still running?
At the time, nobody on my team knew much about Kubernetes; I knew nothing. So when we fired up a CoreOS cluster and found fleet already running, letting us launch containers into the cluster that would stay alive and move between hosts, I was ecstatic. It was the best possible way to learn how scheduling and orchestration work, and why they are important. I would later come to realize that fleet is not the best possible way to schedule containers, and I will write about this in a later article.
For the time being, we use fleet, and it serves its purpose well.
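To make the scheduling idea concrete, here is a toy sketch of the two jobs a scheduler like fleet does for us: placing units on machines and moving them when a machine disappears. This is not fleet's code or algorithm; the class, the least-loaded placement rule, and the machine and unit names are all hypothetical, and nothing here starts containers or talks to etcd.

```python
class ToyScheduler:
    """Hypothetical, simplified scheduler in the spirit of fleet.

    It only tracks which unit runs on which machine; it does not start
    containers or talk to fleet/etcd.
    """

    def __init__(self, machines):
        self.machines = set(machines)
        self.placement = {}  # unit name -> machine name

    def _load(self):
        # Count how many units each live machine currently runs.
        load = {m: 0 for m in self.machines}
        for machine in self.placement.values():
            load[machine] += 1
        return load

    def schedule(self, unit):
        # Assumed rule: place the unit on the least-loaded machine,
        # breaking ties alphabetically by machine name.
        if not self.machines:
            raise RuntimeError("no machines available")
        load = self._load()
        target = min(sorted(load), key=lambda m: load[m])
        self.placement[unit] = target
        return target

    def machine_lost(self, machine):
        # Forget the dead machine, then reschedule every unit it was running.
        self.machines.discard(machine)
        orphans = sorted(u for u, m in self.placement.items() if m == machine)
        for unit in orphans:
            del self.placement[unit]
        return {unit: self.schedule(unit) for unit in orphans}


# Hypothetical machine and unit names.
sched = ToyScheduler(["core-01", "core-02", "core-03"])
for unit in ["puppetmaster-1", "puppetmaster-2", "puppetdb", "foreman"]:
    sched.schedule(unit)
print(sched.machine_lost("core-01"))  # {'foreman': 'core-02', 'puppetmaster-1': 'core-03'}
```

Answering "how do we know they are still running?" is the part that matters here: when a node goes away, its units land on the survivors instead of waiting for someone to notice.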
Anyone who has ever used a compose file knows that Docker networking has its issues. I generally feel that Docker bit off more than it could chew when it took on networking. This is where flannel comes in.
If you already have a Container Linux cluster running, adding flannel is a trivial task. Flannel gives the whole cluster one private network (a /16, for example 10.1.0.0/16), leases each node a smaller subnet of it, and containers get addresses from their node’s subnet. The overlay then lets containers on different hosts talk to each other over that private network. This eliminates many of the problems encountered when trying to use Docker networking in compose.
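The address layout can be sketched with Python's standard `ipaddress` module. This is an illustration of the idea, not flannel itself: it assumes a 10.1.0.0/16 cluster network, /24 subnets per node, hypothetical node names, and that the first address in each subnet is reserved for the node's Docker bridge. Real flannel stores its configuration and leases in etcd and does not necessarily hand subnets out in order.

```python
import ipaddress
import itertools


def lease_subnets(network, nodes, subnet_prefix=24):
    """Carve one subnet per node out of the cluster-wide network.

    Mimics the idea behind flannel's subnet leases; it does not read or
    write etcd, and it allocates subnets sequentially for simplicity.
    """
    pool = ipaddress.ip_network(network).subnets(new_prefix=subnet_prefix)
    leases = {}
    for node in nodes:
        subnet = next(pool, None)
        if subnet is None:
            raise ValueError("cluster network has no free subnets left")
        leases[node] = subnet
    return leases


def container_addresses(subnet, count):
    """Hand out `count` container addresses from a node's subnet.

    Assumption: the first host address (.1) belongs to the node's bridge,
    so containers start at the second host address.
    """
    hosts = list(itertools.islice(subnet.hosts(), 1, count + 1))
    if len(hosts) < count:
        raise ValueError("subnet is too small")
    return hosts


leases = lease_subnets("10.1.0.0/16", ["core-01", "core-02", "core-03"])
print(leases["core-02"])  # 10.1.1.0/24
print([str(a) for a in container_addresses(leases["core-02"], 2)])  # ['10.1.1.2', '10.1.1.3']
```

Because every container address comes from the same /16, any container can reach any other by IP, regardless of which host it landed on.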
Now that we have containers running and able to talk to each other, they need a way to find the services they depend on. Puppet has no clue that its database is on a different machine, or how to find that machine. In the past, we would create a DNS entry pointing at a static host and call it good. If we were feeling extra masochistic, we would go through the process of setting up a VIP with Corosync and Pacemaker.
Now that we were thinking in services, we approached this differently. Another service, Registrator, watches containers as they start and stop and registers each one as a service in Consul. Consul serves those registrations over DNS, so the services can find each other by name. Excellent.
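On the consuming side, a client can find a service either through Consul's DNS interface (names of the form `<service>.service.consul`) or through the HTTP catalog endpoint `/v1/catalog/service/<service>`. The sketch below only builds the DNS name and parses a catalog response; it makes no network calls, and the sample response, service name, and addresses are made up for illustration. When a service is registered without its own address, Consul leaves `ServiceAddress` empty and the node's `Address` should be used instead.

```python
import json


def consul_dns_name(service, domain="consul"):
    """Name a service is reachable under via Consul's DNS interface."""
    return f"{service}.service.{domain}"


def endpoints_from_catalog(body):
    """Turn a /v1/catalog/service/<name> JSON response into (host, port) pairs.

    Falls back to the node's Address when ServiceAddress is empty.
    """
    endpoints = []
    for entry in json.loads(body):
        host = entry.get("ServiceAddress") or entry["Address"]
        endpoints.append((host, entry["ServicePort"]))
    return endpoints


# Made-up response for a hypothetical "puppetdb" service on two nodes.
sample = json.dumps([
    {"Node": "core-01", "Address": "192.168.10.11",
     "ServiceName": "puppetdb", "ServiceAddress": "10.1.0.5", "ServicePort": 8081},
    {"Node": "core-02", "Address": "192.168.10.12",
     "ServiceName": "puppetdb", "ServiceAddress": "", "ServicePort": 8081},
])
print(consul_dns_name("puppetdb"))     # puppetdb.service.consul
print(endpoints_from_catalog(sample))  # [('10.1.0.5', 8081), ('192.168.10.12', 8081)]
```

Either way, Puppet only needs to know the name `puppetdb`; which machine answers is Consul's problem.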
As it stands, this is a very stable system. With the container builds, our configurations are all in code and can be maintained easily. Fleet lets us take down a node in the cluster without major interruptions in service. This system has been stable for several months and is much easier to work with than it was before.
As we built this, we learned a few things.
Fleet is great. Use Kubernetes
Fleet was a great way to learn and understand basic scheduling, but now that we know these things, we understand the value that a more advanced scheduler can provide.
Moving forward, we will be building our services in Kubernetes on top of CoreOS.
Shared Storage is Key
It’s the same lesson we learned in virtualization. If you want highly available systems, some form of highly available storage is absolutely necessary.
Service Discovery is HARD
The next most important step toward highly available systems and less manual work is service discovery. There are a thousand ways people have solved this problem, but it always seems to get solved in a different, more complicated way.
It is a difficult issue that is easy to punt down the road. Don’t do that. It will bite you later, I promise.
It may sound crazy, but putting your monolith into containers, even huge ones, has great benefits. It makes your system more manageable, provides paths to high availability, and can teach you things about the application that you never understood before.