Sismics is by any metric a small company, but from the beginning we have focused on making our infrastructure robust and scalable. With only 20 working days a month, you cannot afford to spend 5 of them managing your servers, installing new software, checking that everything is working, editing your reverse proxy configuration, and so on.
Like large companies before us (Spotify, Expedia, PayPal), we are using Docker Swarm to save time and keep applications isolated.
In Swarm, your Docker servers (called nodes) are connected together to work as a single bucket for all your containers.
Each application in your Swarm is called a service, and each service runs as one or more replicas. Each replica (called a task) is scheduled on a node that satisfies the service's constraints, and a container is started there.
At Sismics, our Swarm consists of 5 servers: 4 in western Europe and, since recently, one in southeast Asia. We use OVH dedicated servers, mostly with 32 GB of RAM and fast SSD drives. Around 100 services are running smoothly: internal applications (GitLab, LDAP, Harbor, …), customer applications, and some of our own services like Sismics Docs Cloud.
However, Docker Swarm itself is not enough, and since we started using it, we have had to develop several utilities on top:
- A web interface called SCP that replicates Universal Control Plane, the paid product from Docker (the company). It allows us to create new projects in a few clicks, manage our Let’s Encrypt certificates, check the status of our Swarm services, monitor our physical servers, and more.
- A proxy on top of the Docker API, used to limit the creation of new services, handle NFS volume creation, authenticate against our LDAP, and restrict Docker usage. For example, we can give a customer access to the Docker CLI, and they will only see the services related to their activity.
- A reverse proxy based on HAProxy. Most services are accessible through an external URL, so the HAProxy configuration is automatically generated from labels on the services, and incoming traffic is automatically routed to the right container through the ingress network. Other solutions, like Traefik, now exist.
- A backup solution based on Borg Backup and Backup Ninja that automatically backs up volumes tagged with a specific label.
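The per-customer filtering done by the Docker API proxy can be illustrated with a minimal sketch. This is not our actual proxy code: the `Service` class and the `visible_services` function are hypothetical, and the sketch assumes that each service carries a `com.sismics.docker.auth` label (the label name used in the stack file example in this article) whose value is compared with the groups of the authenticated user.

```python
from dataclasses import dataclass, field

# Label name borrowed from the stack file example; how the real proxy
# interprets it is an assumption made for this sketch.
AUTH_LABEL = "com.sismics.docker.auth"


@dataclass
class Service:
    """Hypothetical, simplified view of a Swarm service."""
    name: str
    labels: dict = field(default_factory=dict)


def visible_services(services, user_groups):
    """Keep only the services whose auth label matches one of the user's groups.

    Services without the label are hidden from everyone in this sketch.
    """
    return [s for s in services if s.labels.get(AUTH_LABEL) in user_groups]
```

A proxy built this way would apply the filter to the response of a "list services" call before returning it to the Docker CLI.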
For example, the following configuration in a Docker stack file allows us to:
- Reverse proxy www.sismicsdocs.com and sismicsdocs.com to the right container
- Allow all “sismics” users to access this service
- Place the container on a server in western Europe
- Enable the automatic backup

    deploy:
      labels:
        haproxy.virtual_host_secure: "www.sismicsdocs.com"
        haproxy.http_redirect_from: "sismicsdocs.com"
        haproxy.virtual_host_port: "80"
        com.sismics.docker.auth: sismics
        backup.enable: true
      placement:
        constraints:
          - node.labels.region==euw

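To give an idea of how labels like these can drive a reverse proxy, here is a minimal sketch of a generator that turns the labels of one service into HAProxy directives. It is not our actual generator. The function name `haproxy_fragment` and the `bk_` backend prefix are hypothetical, and the meaning given to each label (secure host, host to redirect from, container port) is an assumption based on the label names. Reaching the container through the service name is also a simplification.

```python
def haproxy_fragment(service_name, labels):
    """Build HAProxy directives for one service from its labels.

    The first lines are meant for a frontend section, the last two
    define the backend. Returns an empty string when the service
    does not ask to be exposed.
    """
    host = labels.get("haproxy.virtual_host_secure")
    if host is None:
        return ""
    port = labels.get("haproxy.virtual_host_port", "80")
    backend = f"bk_{service_name}"  # hypothetical naming scheme

    lines = []
    redirect_from = labels.get("haproxy.http_redirect_from")
    if redirect_from:
        # Send the secondary host name to the canonical one.
        lines.append(
            f"http-request redirect prefix https://{host} code 301 "
            f"if {{ hdr(host) -i {redirect_from} }}"
        )
    # Route requests for the canonical host to the service backend.
    lines.append(f"use_backend {backend} if {{ hdr(host) -i {host} }}")
    lines.append("")
    lines.append(f"backend {backend}")
    lines.append(f"    server {service_name} {service_name}:{port}")
    return "\n".join(lines)
```

Running such a function over every service and concatenating the results would give the dynamic part of the HAProxy configuration, which can then be reloaded.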
To solve the data storage issue, we are currently using NFS volumes. The performance loss is quite high compared to local volumes, so we still use local volumes and node constraints for high-performance applications like databases. These NFS volumes are also managed by our Docker Proxy layer.

    volumes:
      data:
        labels:
          com.sismics.docker.auth: sismics
        driver: local
        driver_opts:
          type: nfs
          o: addr=storage.sismics.com,rw,local_lock=all
          device: :/nfs/storage/myapp_data

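The same volume can also be created outside of a stack file. The sketch below builds the equivalent `docker volume create` command with the options from the example above. The helper name `nfs_volume_command` is hypothetical, and this is only an illustration of the options involved, not the way our proxy layer is implemented.

```python
def nfs_volume_command(name, server, export_path, auth_group):
    """Return the `docker volume create` arguments for an NFS-backed volume.

    Mirrors the stack file example: local driver with NFS options,
    plus the auth label used to restrict access.
    """
    return [
        "docker", "volume", "create",
        "--driver", "local",
        "--opt", "type=nfs",
        "--opt", f"o=addr={server},rw,local_lock=all",
        "--opt", f"device=:{export_path}",
        "--label", f"com.sismics.docker.auth={auth_group}",
        name,
    ]
```

The resulting list can be passed as is to `subprocess.run`, which avoids any shell quoting issue with the comma-separated mount options.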
A classic workflow between code and production looks like this:
- Code is committed to a “prod” branch in GitLab
- A GitLab CI job is started:
  - Our code is compiled and automatically tested
  - A Docker image is generated
  - The Docker image is uploaded to Harbor
  - The Docker Swarm service is updated with the new image
As you can see, the path to production is very short, and no manual action is needed between a developer's commit and the update of our production application.
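The image-related steps of such a CI job boil down to three Docker commands. The sketch below is an illustration, not our actual CI configuration: the function names are hypothetical, as are any registry host, image name or service name passed to them. The compile and test step is left out because it depends on the project.

```python
import subprocess


def deploy_commands(registry, image, tag, service):
    """Return the build, push and update commands for one release."""
    ref = f"{registry}/{image}:{tag}"
    return [
        # Build the image from the Dockerfile in the current directory.
        ["docker", "build", "-t", ref, "."],
        # Upload it to the registry (Harbor in our case).
        ["docker", "push", ref],
        # Ask Swarm to roll the service over to the new image.
        ["docker", "service", "update", "--image", ref,
         "--with-registry-auth", service],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)
```

Because `check=True` raises on a non-zero exit code, a failed build or push prevents the production service from being updated.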
To take Sismics Docs Cloud as an example, each time a user signs up for a new account, a new Swarm service is started and a new application instance is scheduled somewhere on our servers.
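As a rough sketch of what a sign-up could trigger, the code below builds a `docker service create` command for a new user. Everything specific here is an assumption made for illustration: the `docs-` naming scheme, the one-subdomain-per-user host name, the port and the function names. Only the label and constraint names come from the stack file example above.

```python
import re


def service_name_for(username):
    """Derive a service name from a user name (hypothetical scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", username.lower()).strip("-")
    if not slug:
        raise ValueError("username contains no usable character")
    return f"docs-{slug}"


def create_service_command(username, image, domain):
    """Return the `docker service create` arguments for a new account."""
    name = service_name_for(username)
    return [
        "docker", "service", "create",
        "--name", name,
        # Labels reused from the stack file example.
        "--label", f"haproxy.virtual_host_secure={name}.{domain}",
        "--label", "haproxy.virtual_host_port=80",
        "--label", "backup.enable=true",
        # Keep the instance on a western Europe node.
        "--constraint", "node.labels.region==euw",
        image,
    ]
```

With labels set at creation time, the reverse proxy and the backup tooling can pick up the new service without any extra configuration step.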
In the future, we would like to:
- Use object storage (like MinIO) instead of NFS volumes
- Enhance our SCP so that ordering a new server from OVH, configuring it and joining it to the Swarm can all be done in one click
- Set up ChatOps to automate even more actions
- Use the Docker healthcheck system
- Open-source most of the apps in this article
If your company is interested in a similar infrastructure, do not hesitate to contact us. We would be happy to share our experience with you.