Docker Configuration Evolution: From Scattered Containers to Managed Stacks

Previously, each service lived in a separate repository with a minimal docker-compose.yml:

# Old approach: garrysmod-server/
services:
  garrysmod-server:
    image: ceifa/garrysmod:latest
    ports:
      - "27015:27015"
    volumes:
      - ./garrysmod:/data
    restart: unless-stopped

Problems:

  • ❌ No resource control - one service could consume all memory
  • ❌ No health checks - crashed services went undetected
  • ❌ No log rotation - logs grew until disk full
  • ❌ No network isolation - all services in default network
  • ❌ Manual per-service updates - high risk of human error

New Approach: Stacks with Explicit Contracts

Now related services are grouped into a single repository with unified configuration:

grafana-stack/
├── compose.yaml          # Unified stack: grafana, loki, oncall-*
├── stack.env             # Shared variables (extracted from code)
├── loki/
│   ├── Dockerfile        # Custom build if needed
│   └── loki.yml          # Application config
└── README.md             # Docs: ports, dependencies, deployment
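
As a rough sketch of how this layout could map onto a single file, the compose.yaml might look like the following. The image tag, the build context for loki, and the config path inside the container are assumptions for illustration, and the oncall-* services are left out.

```yaml
# compose.yaml (sketch; image names and paths are assumptions)
name: grafana-stack

services:
  grafana:
    image: grafana/grafana:latest   # assumed image; pin a version in practice
    env_file: [stack.env]           # shared variables for the whole stack
    restart: unless-stopped

  loki:
    build: ./loki                   # builds from loki/Dockerfile in the tree above
    volumes:
      # assumed target path; adjust to wherever the image reads its config
      - ./loki/loki.yml:/etc/loki/local-config.yaml:ro
    restart: unless-stopped
```

With this shape, one docker compose up -d in the repository root starts every service in the stack.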

Key Changes

1. Resource Limits (Preventing “Starvation”)

services:
  minecraft-server:
    cpus: "2.0" # Hard CPU limit
    mem_limit: 3g # Hard memory limit
    pids_limit: 300 # Protection against fork bombs

Why:

  • Prevents one service from blocking others
  • Enables precise load planning on the host
  • Simplifies diagnostics: a service that hits its memory limit is OOM-killed on its own, which docker inspect reports as OOMKilled
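
One detail worth checking when planning memory: on hosts with swap enabled, Docker by default lets a container use swap on top of mem_limit. A possible variation, assuming the 3g figure is meant to be a true ceiling, sets memswap_limit to the same value and adds a soft reservation. The 2g reservation is an illustrative value, not taken from the original setup.

```yaml
services:
  minecraft-server:
    cpus: "2.0"
    mem_limit: 3g
    memswap_limit: 3g     # memory + swap combined; equal to mem_limit means no swap
    mem_reservation: 2g   # soft limit (assumed value), enforced under host memory pressure
    pids_limit: 300
```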

2. Security by Default

services:
  grafana:
    security_opt:
      - no-new-privileges:true # Prevent privilege escalation
      # - seccomp:unconfined # Disables the seccomp filter; add only where truly needed (e.g., systemd inside)

Why no-new-privileges:

  • Container cannot gain more privileges than it started with
  • Protects against vulnerabilities exploiting setuid binaries
  • Near-zero overhead
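
To apply the option to every service without repeating it, one common pattern is a YAML anchor on an extension field. This is a sketch only: the x-hardening name and the image tags are assumptions.

```yaml
# Top-level keys starting with x- are extension fields that Compose ignores,
# which makes them a convenient place for shared fragments.
x-hardening: &hardening
  security_opt:
    - no-new-privileges:true

services:
  grafana:
    <<: *hardening                  # merge the shared keys into this service
    image: grafana/grafana:latest   # assumed image
  loki:
    <<: *hardening
    image: grafana/loki:latest      # assumed image
```

The merge is shallow: a service that declares its own security_opt replaces the shared list instead of extending it, so exceptions have to repeat no-new-privileges:true explicitly.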

3. Observability: Logging and Health Checks

services:
  grafana:
    logging:
      driver: json-file
      options:
        max-file: "3" # Keep only 3 rotated files
        max-size: 10m # Each file max 10 MB
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
      interval: 15s # Check every 15 seconds
      timeout: 5s # Response timeout
      retries: 3 # 3 failures = unhealthy
      start_period: 30s # Ignore failures during startup

Result:

  • Logs cannot fill the disk: each container keeps at most 3 files of 10 MB
  • Docker tracks service state: docker ps shows (healthy) or (unhealthy)
  • Alerts can be raised on unhealthy status
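
Health status can also gate startup order. In the sketch below, the dashboard-sync service and its image are hypothetical placeholders; the grafana health check is the one shown above.

```yaml
services:
  grafana:
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 30s

  dashboard-sync:                        # hypothetical dependent service
    image: example/dashboard-sync:1.0    # placeholder image
    depends_on:
      grafana:
        condition: service_healthy       # start only once grafana reports healthy
```

Note that a restart policy reacts to a container exiting, not to an unhealthy status, so an unhealthy but still running container needs alerting or an external watcher to be acted on.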

4. Network Isolation and Service Discovery

networks:
  prometheus:
    external: true # Shared network for all metric-exporting services
    name: prometheus
  traefik:
    external: true # Shared network for public services behind proxy
    name: traefik

services:
  grafana:
    networks: [prometheus, traefik] # Sees both metrics and web traffic
  loki:
    networks: [prometheus] # Metrics only, no public access

Benefits:

  • Services discover each other by name (http://loki:3100)
  • Public access only through services explicitly connected to traefik
  • Adding a service to monitoring is easy: attach it to the prometheus network and Prometheus can reach it by name
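
Because both networks are declared external, Compose expects them to exist already and fails at startup otherwise. One way to handle that, sketched here under the assumption that a separate stack owns the network (the prometheus-stack name and image are hypothetical), is to declare it there without external so that stack creates it:

```yaml
# prometheus-stack/compose.yaml (hypothetical owning stack)
networks:
  prometheus:
    name: prometheus     # fixed name instead of the default <project>_prometheus

services:
  prometheus:
    image: prom/prometheus:latest   # assumed image
    networks: [prometheus]
```

The alternative is to create each network once per host with docker network create prometheus and keep every stack on external: true.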

5. Graceful Shutdown and Signals

services:
  minecraft-server:
    stop_signal: SIGTERM # First, polite request to stop
    stop_grace_period: 60s # Wait up to 60s before SIGKILL
    stdin_open: true # For interactive consoles
    tty: true

Why:

  • Prevents data loss on restart (server saves world)
  • Allows apps to close connections, flush caches properly
  • SIGTERM followed by a grace period is the standard pattern for production deployments
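
A grace period only helps if the signal reaches the process that has to save state. A process running as PID 1 inside a container ignores signals it has not installed a handler for, so some images benefit from init: true, which makes Docker run a small init process as PID 1 that forwards signals and reaps zombie processes. Whether a given image needs this depends on its entrypoint; treat the sketch below as an option to test, not a requirement.

```yaml
services:
  minecraft-server:
    init: true               # assumption: the image does not ship its own init
    stop_signal: SIGTERM
    stop_grace_period: 60s
```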

6. Configuration via .env, Not Hardcoding

# compose.yaml
services:
  grafana:
    env_file: [stack.env]  # Extract sensitive data

# stack.env (in .gitignore)
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASS}
DOMAIN=potatoenergy.ru

Benefits:

  • One secrets file - easier to rotate, easier to audit
  • No passwords in repository
  • Easy to deploy to different environments (dev/stage/prod), each with its own env file
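
Inside compose.yaml itself, Compose interpolates ${VAR} from the shell environment or the project's .env file. A possible safeguard, sketched with the standard interpolation modifiers, is to mark a secret as mandatory so a missing value stops the deployment instead of producing an empty password. The GF_SERVER_DOMAIN mapping and the localhost fallback are illustrative assumptions.

```yaml
services:
  grafana:
    env_file: [stack.env]
    environment:
      # ${VAR:?message} aborts with the message when VAR is unset or empty
      GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_PASS:?GRAFANA_PASS must be set}"
      # ${VAR:-default} falls back to the default when VAR is unset or empty
      GF_SERVER_DOMAIN: "${DOMAIN:-localhost}"
```

Values under environment take precedence over the same keys loaded through env_file.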

Comparison Table

| Criterion | Old Approach | New Approach |
| --- | --- | --- |
| Grouping | 1 service = 1 repo | Related services = 1 stack |
| Resources | No limits | cpus, mem_limit, pids_limit |
| Security | Docker defaults | no-new-privileges, explicit seccomp |
| Logging | Grow until disk full | Rotation: max-file, max-size |
| Health | None | Health check with interval/timeout |
| Networks | All in default | Explicit external networks (prometheus, traefik) |
| Shutdown | Docker default: SIGTERM, then SIGKILL after 10 s | SIGTERM + stop_grace_period sized per service |
| Config | Hardcoded in compose | .env files, excluded from repo |

Evolution in Numbers

| Metric | Before | After |
| --- | --- | --- |
| Average stack deployment time | ~15 min (manual update of 5 services) | ~3 min (single docker compose up -d) |
| Host memory consumption | Unpredictable, frequent OOM | Stable, limits guarantee isolation |
| Recovery time after failure | Depends on manual intervention | Auto-restart + health check detects issue |
| Configuration audit | Need to check 10+ repos | One compose.yaml per stack |

Practical Recommendations

When Migrating from Old Approach

  1. Start with one stack (e.g., monitoring: grafana + loki + prometheus)
  2. Add limits gradually: first mem_limit, then cpus, then pids_limit
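  3. Test health checks locally before deploying: docker compose up --wait blocks until every service is running or healthy and exits with an error otherwise (--abort-on-container-exit only catches containers that exit)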
  4. Extract secrets to .env and add it to .gitignore before first commit
  5. Document external networks: what each network is for, which services connect

Checklist for New Service

  • Resource limits: cpus, mem_limit, pids_limit
  • Security: no-new-privileges:true, explicit seccomp where needed
  • Logging: json-file driver with max-file/max-size
  • Health check: meaningful test, reasonable interval/timeout
  • Networks: explicit connection to external networks (prometheus, traefik)
  • Shutdown: stop_signal: SIGTERM, stop_grace_period for long-running services
  • Config: sensitive data in env_file, not in code
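
Put together, the checklist could translate into a starting template like the one below. The service name, image, port, health endpoint, and limit values are all placeholders to adapt, and the health check assumes curl is available inside the image.

```yaml
services:
  example-service:                 # hypothetical name
    image: example/app:1.0         # placeholder image
    restart: unless-stopped
    # Resource limits (placeholder values)
    cpus: "1.0"
    mem_limit: 512m
    pids_limit: 200
    # Security
    security_opt:
      - no-new-privileges:true
    # Logging
    logging:
      driver: json-file
      options:
        max-file: "3"
        max-size: 10m
    # Health check (assumes curl in the image and a /health endpoint on port 8080)
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 30s
    # Networks
    networks: [prometheus, traefik]
    # Shutdown
    stop_signal: SIGTERM
    stop_grace_period: 30s
    # Config
    env_file: [stack.env]

networks:
  prometheus:
    external: true
    name: prometheus
  traefik:
    external: true
    name: traefik
```

Running docker compose config against the file prints the fully resolved configuration and reports syntax or interpolation errors before anything is started.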
