Production Grade Checklist

This pattern lists the key items to consider when building a production-grade system.
| Stage | Description | Example tools |
| --- | --- | --- |
| Provision | Create the infrastructure | Terraform, CloudFormation, ARM |
| Install | Install the software and required binaries | Bash, Ansible, Docker, Packer |
| Configure | Configure the software runtime, such as certificates and ports | Chef, Ansible |
| Deploy | Deploy the service and its updates | ASG, Kubernetes, ECS |
| High-availability | Capability to withstand service disruption | Multi-datacenter, multi-region |
| Scalable | Scale up/down on demand | Auto scaling, replication |
| Performance | Optimize compute, storage, and networking based on benchmarks, load testing, and profiling | Dynatrace, Valgrind, VisualVM |
| Networking | IP allocation, firewalls, DNS | VPC, Virtual Network, NSG, security groups |
| Security | Encryption, authorization, authentication, secrets management, server hardening | ACM, KMS, Vaults |
| Metrics | Availability, performance, app, server, events, alerting | CloudWatch, Azure Monitor, Datadog, BigPanda |
| Logs | Rotation, aggregation to a centralized store, long-term availability | Elastic Stack, Sumo Logic |
| Data backup | Database, cached data, replication, RTO/RPO | AWS Backup, Azure Backup, Snapshots |
| Cost optimization | Appropriate SKU choice, spot/reserved instances, auto-scaling, cleaning up unused resources | Auto scaling, Infracost |
| Documentation | Code, IaC, peripheral services such as IdM, incident response playbooks | README, SharePoint, Slack, Wiki |
| Tests | Automated tests for IaC | Terratest, tflint, Open Policy Agent, InSpec |
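
One way to put the checklist to work is to track, per service, which stages have been addressed and which are still open. The following is a minimal sketch, not a prescribed implementation: the `ServiceReview` class, its method names, and the `billing-api` service are hypothetical and introduced only for illustration. The stage names are the ones listed in the checklist above.

```python
from dataclasses import dataclass, field

# Stage names taken from the checklist above.
STAGES = [
    "Provision", "Install", "Configure", "Deploy", "High-availability",
    "Scalable", "Performance", "Networking", "Security", "Metrics",
    "Logs", "Data backup", "Cost optimization", "Documentation", "Tests",
]


@dataclass
class ServiceReview:
    """Hypothetical record of which checklist stages a service has covered."""

    name: str
    # Maps a stage name to the tool or note that covers it.
    covered: dict[str, str] = field(default_factory=dict)

    def mark(self, stage: str, tool: str) -> None:
        """Record that a stage is covered; reject names not in the checklist."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.covered[stage] = tool

    def missing(self) -> list[str]:
        """Return the uncovered stages, in checklist order."""
        return [stage for stage in STAGES if stage not in self.covered]

    def ready(self) -> bool:
        """A service counts as ready only when every stage is covered."""
        return not self.missing()


# Example usage with an illustrative service name.
review = ServiceReview("billing-api")
review.mark("Provision", "Terraform")
review.mark("Deploy", "Kubernetes")
print(f"{review.name}: {len(review.missing())} stages still open")
for stage in review.missing():
    print(f"  - {stage}")
```

Keeping the stage list in one place means a typo in a stage name fails loudly instead of silently creating a new, unchecked category.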
Last modified July 21, 2024: update (e2ae86c)