Docker Storage

Storage in Docker refers to the mechanisms and options for persisting and managing data generated and used by Docker containers, so that the data can survive container lifecycle events or be shared among multiple containers.

Storage in Docker is essential to managing and persisting data created by and used within Docker containers. Unlike a traditional virtual machine, a Docker container does not keep any form of persistent storage by default: when a container is deleted, all changes to the filesystem within the container are lost. This is where Docker’s storage options come in. Docker provides several methods for storing data, including volumes, bind mounts, and tmpfs mounts, each with its own use cases and trade-offs.

These are the four methods of using storage in Docker, ranging from a simple stateless container up to full persistent storage.

Container layer: A thin, writable layer that Docker adds to the read-only image layers when a container is launched, used for storing temporary data and changes during the container’s lifecycle.

tmpfs mounts: A temporary filesystem storage held in memory and never written to the host system’s filesystem, useful for sensitive data and high-speed, temporary storage.

Bind mounts: A simple storage option that involves mounting directories or files from the host’s filesystem directly into a container, allowing data sharing and persistence beyond the container’s lifecycle.

Volumes: Docker-managed storage areas, independent of the container’s lifecycle, that can be used to persist data and share it among multiple containers, offering a flexible and robust solution for data storage needs.

Container with no persistent storage (Container Layer)

When you launch a Docker container from an image, Docker creates a thin writable layer on top of the image, often called the “container layer”. All changes made to the running container - such as writing new files, modifying existing files, and deleting files - are written to this thin writable container layer. This layer and the changes it contains exist as long as the container exists.

The container interacts with its writable layer like a regular Linux file system. Processes can create, modify, and delete files and directories. However, these changes are not persistent: if the container is stopped and removed, all changes to the container layer are lost. If a new container is launched from the same image, it starts with a fresh container layer - it does not inherit any changes made in the container layer of a previous container.

This setup works well for stateless applications, which do not need to save data between sessions. Examples include a web server serving static pages or a computation done on-the-fly that does not need to be saved.
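The ephemerality of the container layer is easy to observe from the command line. The sketch below (assuming a local Docker daemon and the alpine image) writes a file into one container’s writable layer, removes that container, and shows that a fresh container from the same image starts with a clean layer:

```shell
# Write a file into a container's writable layer, then remove the container.
docker run --name scratchpad alpine sh -c 'echo "hello" > /tmp/note.txt && cat /tmp/note.txt'
docker rm scratchpad

# A new container from the same image starts with a fresh container layer:
# /tmp/note.txt does not exist here.
docker run --rm alpine ls /tmp
```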

Container with no persistent storage (tmpfs mounts)

tmpfs mounts are a Docker storage option stored in the Docker host system’s memory only and are never written to the host system’s filesystem. This can be particularly useful for sensitive information you want to avoid persisting after the container is removed or for temporary data you don’t need to keep for a long time.

The data in a tmpfs mount is temporary and will be deleted when the container is removed, or the host is rebooted. This differs from volumes and bind mounts, where the data persists after removing the container.

tmpfs mounts are very fast because they reside in memory. This can be useful for applications requiring high-speed data access or when you want to reduce disk I/O.

However, because tmpfs mounts are stored in memory, they consume a part of the available RAM, which can be a limited resource, especially on systems with many containers or limited memory.

tmpfs is similar to the container’s writable layer in that it does not persist data beyond the container’s lifecycle. However, as tmpfs mounts reside in memory rather than on disk, they offer faster read and write operations.

Here are a couple of scenarios where tmpfs could be beneficial over the container layer storage:

Sensitive data: If your application handles sensitive data like passwords or API keys, you might not want these written to disk, where they could potentially be accessed later. Data stored in a tmpfs mount is discarded as soon as the container stops, helping to maintain the security of sensitive information.

High-speed intermediate data: Suppose you have an application that processes a lot of data, but this data is transient and does not need to be persisted. For example, a data processing application might read a large dataset from a database or data lake, perform calculations or transformations, and then write the results back to the database. The intermediate data used during processing doesn’t need to be kept once the processing is complete. Storing this intermediate data in a tmpfs mount could significantly speed up the processing time because reads and writes to memory are much faster than reads and writes to disk.

Please note that tmpfs mounts are available only on Linux-based Docker hosts. They are not available on Windows or macOS Docker hosts.
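On a Linux Docker host, a tmpfs mount can be attached at run time with either the short --tmpfs flag or the more explicit --mount syntax. A minimal sketch (my-image, the /app/cache path, and the size limit are illustrative):

```shell
# Short form: mount an in-memory filesystem at /app/cache.
docker run --tmpfs /app/cache -d my-image

# Explicit form, with an optional size limit (64 MB here) and permissions:
docker run --mount type=tmpfs,destination=/app/cache,tmpfs-size=67108864,tmpfs-mode=1770 -d my-image
```

Because nothing under /app/cache ever touches the host’s disk, both variants suit the sensitive-data and intermediate-data scenarios described above.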

Container that stores data on its Docker host (Bind Mounts)

Bind mounts are a simple and powerful method for handling Docker storage, but they can also be somewhat risky due to their level of system access. A bind mount maps a host file or directory onto a container file or directory. In other words, it allows a directory or file from the host system’s filesystem to be ‘mounted’ into a container.

The file or directory is referenced by its full or relative path on the host machine. With the -v flag, the file or directory does not need to exist on the Docker host in advance: Docker creates it on demand. (The --mount flag is stricter and generates an error if the host path does not exist.)

Unlike volumes, which Docker manages, bind mounts depend on the directory structure of the host machine. This makes them less portable because moving them between different host systems can result in errors if the directory structure differs.

A key aspect of bind mounts is allowing both read and write operations. If the container modifies the content in the mounted directory, those changes will be reflected on the host filesystem and vice versa.

Bind mounts are not configured as part of the image in the Dockerfile. Instead, they are configured at runtime. You specify bind mounts when you run a Docker container:

docker run -v /host/path:/container/path -d my-image
  • /host/path: The directory on your host machine that you want to share with the Docker container.

  • /container/path: The directory inside the Docker container where you want to make the host’s directory available.

One potential downside of bind mounts is their level of access: a container with a bind mount has direct access to important system files and directories on the host. This could lead to security vulnerabilities if the container is compromised, as it could then change the host system.

In general, it’s recommended to use Docker volumes for most use cases, as they are more portable and offer better encapsulation and control of data. However, bind mounts can be helpful in specific instances where you need to share files between the host and the container or for development purposes where you want to edit code on the host and see the changes live in the container.
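For the development workflow just described, a bind mount might look like the sketch below (the src directory and my-image are illustrative); adding readonly is one way to limit the access concern mentioned above, since the container can then read host files but not modify them:

```shell
# Live-edit code on the host while the container sees the changes.
docker run -v "$(pwd)/src:/app/src" -d my-image

# The same bind mount with the explicit --mount syntax, made read-only
# so the container cannot modify the host's files.
docker run --mount type=bind,source="$(pwd)/src",target=/app/src,readonly -d my-image
```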

Containers that need persistent data (Volumes)

Docker volumes offer a mechanism for persistent data storage, separate from the container lifecycle. They survive containers being stopped, deleted, or replaced, which makes them ideal for databases, application logs, and other data that must be preserved.

Docker manages these volumes, hiding underlying complexities and presenting a simplified interface. This abstracted management is advantageous over bind mounts.

When you specify a volume in Docker, you’re essentially creating a persistent storage space that your application can use. The application itself determines the file structure within that volume based on its needs. When you bind the volume to a container in the Docker Compose file (or via the docker run command), you specify a path in the container where the volume will be mounted. Any data the application writes to that path will be stored in the volume.

For example, if your Docker Compose file looks like this:

version: '3'
services:
  app:
    image: app-image
    volumes:
      - app-data:/data

volumes:
  app-data:

In this case, the application inside the app container can read from and write to the /data directory. All changes made to the /data directory will be stored in the app-data volume and will persist across container restarts and even if the container is completely removed. But the specific files and directories within /data (and therefore within the app-data volume) are entirely up to the application. Docker does not impose any specific file structure within the volume; it simply provides the storage space and ensures the data is preserved.

Containers can share Docker volumes, promoting data exchange between them. Attach the same volume to different containers during the container creation to enable this feature. This capability is valuable when multiple containers must access or contribute to the same dataset.
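Sharing a volume is simply a matter of attaching the same named volume to more than one container. A sketch, assuming a volume named shared-data and the alpine image:

```shell
docker volume create shared-data

# One container writes into the shared volume...
docker run --rm -v shared-data:/data alpine sh -c 'echo "produced" > /data/out.txt'

# ...and another container can read the same file.
docker run --rm -v shared-data:/data alpine cat /data/out.txt
```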

Backup and migration of data also become simpler tasks with Docker volumes: a volume can be backed up or moved to another machine using a short-lived helper container, a clear advantage over handling files or directories scattered across the host system.
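A common backup pattern, described in Docker’s documentation, is to mount the volume into a temporary container together with a host directory and archive its contents with tar. Volume and path names below are illustrative:

```shell
# Back up the contents of the app-data volume into ./backup/app-data.tar.gz
docker run --rm \
  -v app-data:/data \
  -v "$(pwd)/backup:/backup" \
  alpine tar czf /backup/app-data.tar.gz -C /data .

# Restore the archive into a (possibly new or empty) volume.
docker run --rm \
  -v app-data:/data \
  -v "$(pwd)/backup:/backup" \
  alpine tar xzf /backup/app-data.tar.gz -C /data
```

Copying the archive to another machine and running the restore step there migrates the volume.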

Performance-wise, Docker volumes generally match the performance of bind mounts, making them a suitable choice in most use cases.

Security is another significant advantage of Docker volumes. Because Docker manages these volumes, you can leverage various Docker security mechanisms like AppArmor, SELinux, GRSEC, etc., which provide an additional layer of safety to your data.

To create a Docker volume, you can use the docker volume create command:

docker volume create my-volume

In this example, my-volume is the name of the volume.

To run a Docker container with the volume attached, use the docker run command with the -v or --mount option:

docker run -v my-volume:/container/path -d my-image

In this example:

  • my-volume:/container/path attaches my-volume to the container. The data in the volume will be available in the /container/path directory inside the container.
  • my-image is the name of the Docker image you’re running.

Another more explicit way to attach a volume when running a Docker container is with the --mount flag:

docker run --mount type=volume,source=my-volume,target=/container/path -d my-image

This command has the same effect as the previous docker run command but uses the more explicit --mount syntax.

Remember that the container will have read and write access to the volume, and the data in the volume will persist even after the container is removed.

When you create a Docker volume using the docker volume create command, Docker manages the storage of this volume. The actual data of the volume is stored in a directory on the Docker host, but the exact location depends on your Docker configuration and the platform you are using.

On a Linux system, Docker stores the data in a directory under /var/lib/docker/volumes/. For a volume named my-volume, the data would typically be in /var/lib/docker/volumes/my-volume/_data.

However, as a user, you generally don’t need to know or care about this location, because Docker manages it for you. When you use the -v or --mount option with the docker run command to mount the volume into a container, Docker takes care of making the data in the volume available at the specified mount point in the container.

Remember, it’s generally best practice not to directly manipulate or interact with the data in the /var/lib/docker/volumes/ directory. Instead, use Docker commands to interact with your volumes to avoid data corruption or loss.
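Rather than poking around in /var/lib/docker/volumes/ directly, you can ask Docker where a volume lives and manage it with the volume subcommands:

```shell
# List all volumes known to this Docker host.
docker volume ls

# Show a volume's driver, creation time, and host mount point.
docker volume inspect my-volume

# Remove a volume that is no longer attached to any container.
docker volume rm my-volume
```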

When using Docker Swarm

When you use Docker Swarm, you still use volumes, but their behavior changes if service containers are distributed across Swarm nodes.

If you use local volumes, each volume is node-specific. For services with multiple replicas, each replica writes to a separate volume on its own node. This means each service replica sees different data, which usually isn’t desirable for shared data.

To share a volume between containers possibly running on different Docker hosts, you must use a volume driver that supports shared storage. For example, you might use docker volume create -d flocker or a plugin from a cloud provider. Any node in the swarm can access this type of volume.

Bind mounts require the specified paths to exist on every node in the swarm. However, similar to local volumes, each bind mount on each node is separate. So, a container writing data to a bind mount on one node won’t cause that data to appear for a container using the same bind mount on another node.

In a Docker Compose file that deploys a stack to a swarm, you define volumes as in a single-host setup. But for sharing data across nodes, you should use a volume driver supporting shared storage.
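The docker service create command accepts the same --mount syntax shown earlier. A sketch (service name, volume name, and my-image are illustrative): unless the named volume uses a shared-storage driver, each of the three replicas below gets its own node-local copy of web-data.

```shell
# Each replica mounts a volume named web-data at /data; with the default
# local driver this volume is created separately on every node that runs
# a replica.
docker service create \
  --name web \
  --replicas 3 \
  --mount type=volume,source=web-data,target=/data \
  my-image
```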

Docker volume drivers

Docker supports a variety of volume drivers, which are plugins that provide storage capabilities to your containers.

  • Local: The default driver. Stores volumes on the Docker host’s filesystem.

  • AWS EBS: A Docker certified plugin for using AWS Elastic Block Store (EBS) volumes with Docker. It allows you to create, remove, attach, and detach EBS volumes to instances.

  • Azure Disk: A plugin for using Azure Disk Storage as a Docker volume. It allows the use of existing Azure disks as well as the creation of new ones.

  • GCE Persistent Disk: A Docker certified plugin for using Google Compute Engine persistent disks with Docker.

  • Flocker: An open-source, network-based volume plugin, mostly used in Docker Swarm setups.

  • Rex-Ray: An open-source, storage-agnostic volume plugin that supports a variety of storage systems.

  • NFS: Network File System (NFS) volumes allow data to be shared across a network, and can be useful in Swarm environments.

  • GlusterFS: An open-source, network-based volume plugin for creating distributed storage solutions.
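Even the built-in local driver can mount network storage: an NFS export can be exposed as a named volume by passing mount options at creation time. A sketch in which the server address, export path, and my-image are placeholders:

```shell
# Create a volume backed by an NFS export.
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.10,rw \
  --opt device=:/exports/data \
  nfs-data

# Any container (on any node that can reach the NFS server) can mount it.
docker run -v nfs-data:/data -d my-image
```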
Last modified July 21, 2024: update (e2ae86c)