Docker Swarm

Docker Swarm is a container orchestration tool integrated with the Docker platform that allows for the management, scaling, and deployment of multiple containers across multiple host machines in a cluster.

You use Docker Swarm to create and manage a group of Docker nodes, turning them into a single, virtual Docker engine. Docker Swarm uses the standard Docker API, meaning that any tool communicating with a Docker daemon can use Swarm to scale to multiple hosts transparently.

A Docker node can be a manager, a worker, or both. Manager nodes control the work tasks, and workers execute them.
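A minimal sketch of forming such a cluster (the advertised IP address and tokens below are placeholders):

    # On the first node: initialise the Swarm; this node becomes a manager
    docker swarm init --advertise-addr 192.168.1.10

    # Print the join commands (including tokens) for additional workers or managers
    docker swarm join-token worker
    docker swarm join-token manager

    # On each additional node: join using the printed command, for example
    docker swarm join --token <worker-token> 192.168.1.10:2377

    # Back on a manager: list the nodes and their roles
    docker node ls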

Swarm configuration and state are held in a distributed store based on the Raft consensus algorithm; it is replicated to all manager nodes and always kept up to date.

Encryption, authentication of nodes, and key rotation are built into Swarm’s mutual TLS layer and are required to run a Swarm.

Managers

Managers are highly available through an active-passive multi-manager model, meaning multiple managers can fail and the Swarm will continue to function. Only one manager is the leader at any given time, and it is the only manager that issues commands to the Swarm. If a manager that is not the leader receives a request, it passes the request to the leader to execute. The cluster state is synchronized between all managers and kept up to date so that any of them can take over as leader when required.

It’s best practice to have an odd number of managers, and not too many of them; three to five is standard. An odd number reduces the risk of a split-brain scenario, because one side of a network partition can still hold a majority (quorum) of the votes.
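Manager membership can be adjusted by promoting or demoting nodes; the node name below is a placeholder:

    # Promote an existing worker so the Swarm reaches an odd number of managers
    docker node promote worker-node-2

    # Demote a manager back to a worker if there are too many managers
    docker node demote worker-node-2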

Workers

Worker nodes in a Docker Swarm are the main executors of tasks, acting as the powerhouse of the Swarm. They receive and carry out tasks issued by manager nodes, such as running container instances. Worker nodes do not participate in the Swarm’s internal decision-making processes and are not privy to the Swarm’s secure internal information, making them the safer choice when dynamically adding nodes to or removing nodes from the Swarm. They report back the status of their tasks, ensuring that the state of the Swarm is continually updated and maintained.

In contrast to the manager nodes, there isn’t a specific recommended number of worker nodes - it’s largely dependent on the scale of your applications and the load they need to handle. Adding or removing worker nodes is also easier as they don’t participate in the consensus and leadership election process, making them more flexible for handling varying workload demands. Moreover, having more worker nodes can potentially provide greater fault tolerance, as failing tasks can be rescheduled on other available workers.
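A few node-management commands illustrate this flexibility (node names are placeholders):

    # Print the join command for new workers
    docker swarm join-token worker

    # Stop scheduling new tasks on a worker; its existing tasks are rescheduled elsewhere
    docker node update --availability drain worker-node-3

    # Remove a node that has already left the Swarm
    docker node rm worker-node-3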

Services

Instead of running Docker containers directly, you wrap them in a service. A service specifies the tasks to be executed, analogous to running containers on a single Docker instance. A Docker service defines not only the image to be used for the task but also operational parameters such as the number of replicas, network options, and update policies, allowing for efficient management and scaling across a swarm of Docker nodes.
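A minimal sketch of creating a replicated service; the service name, image, and port mapping are arbitrary examples:

    # Create a service running three replicas of an nginx container, published on port 8080
    docker service create --name web-fe --replicas 3 -p 8080:80 nginx

    # List services and the tasks scheduled for this one
    docker service ls
    docker service ps web-fe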

Replicated or Global modes

Docker services can run in either replicated or global mode. Replicated means the desired replica count is used to deploy the correct number of containers. When using Docker services in global mode, you don’t need to set the replica count. In global mode, Docker Swarm automatically schedules one task for the service on each available node in the swarm. This ensures that an instance of the service is running on every node, making it useful for tasks like monitoring or logging activities on every node in your swarm.
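For example, the following runs exactly one task per node; the service name and image are illustrative:

    # Global mode: one task of the service on every node in the Swarm
    docker service create --mode global --name monitor my-monitoring-agent:latest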

Scale a Service

Based on the desired state, you can easily scale a service up or down at will. If you deployed a service with 3 replicas and find you need more instances to cope with the load, you can simply run the following command, which changes the desired state of the deployed service named web-fe to 5 instances.
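    # Change the desired state of the web-fe service to 5 replicas
    docker service scale web-fe=5

    # Equivalent form using service update
    docker service update --replicas 5 web-fe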

The same is true for scaling down: you simply change the desired state to a smaller number of instances.

Networking

When multiple Docker nodes run the same containers (services), the applications need to function as if they were on the same network. In a large setup your Docker nodes will probably be split across different networks, so you can create an overlay network that allows the nodes in the same Swarm to communicate with each other in a single network address space, regardless of how each Docker host is configured.
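A minimal sketch of creating such a network (the network name is a placeholder):

    # Create an overlay network spanning the Swarm nodes
    docker network create --driver overlay my-overlay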

When you create the service, you attach it to the overlay network using the following command. Publishing a port this way also means that every node in the Swarm listens on the specified port, even if it is not hosting an instance of the service. This is called ingress mode, and nodes that are not hosting the service simply proxy the traffic on to a node that does. The alternative, where a node listens on the port only if it hosts a service instance, is called host mode.
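A sketch of both options, reusing the example names from above; the image and ports are illustrative:

    # Ingress mode (default): every Swarm node listens on port 8080 and routes traffic to a task
    docker service create --name web-fe --network my-overlay --replicas 3 -p 8080:80 nginx

    # Host mode: only nodes that run a task listen on port 8080
    docker service create --name web-fe-host --network my-overlay --replicas 3 \
        --publish published=8080,target=80,mode=host nginx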

Zero-downtime updates

When you need to update a Docker service, you can take advantage of the multiple replicas and issue a command that performs a rolling update across the Swarm. This means you maintain 100% uptime, as not all instances are affected at the same time. It also affords the opportunity to pause and roll back midway through the update if you encounter errors once some of the newly deployed instances are in use.
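A sketch of such an update, reusing the web-fe example; the image tag and timings are illustrative:

    # Roll the new image out two tasks at a time, waiting 10 seconds between batches
    docker service update --image nginx:1.27 --update-parallelism 2 --update-delay 10s web-fe

    # If the new version misbehaves, revert to the previously deployed specification
    docker service rollback web-fe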
