Docker Swarm: scaling down a service and rolling out container updates do not achieve zero downtime


Docker deployments with zero downtime
https://lostintimedev.com/2017 ... .html

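The article gives a very clear introduction to zero-downtime deployment with Docker Swarm.
I tried it myself and ran some experiments, but my results differ from the article's:
problems appear when scaling the service down and when updating the container version.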

The application is deployed with:
docker service create \
--env "DEBUG=docker-deploy-test:*" \
--name "deployme" \
--endpoint-mode "vip" \
--mode "replicated" \
--replicas 6 \
--update-parallelism 1 \
--update-order start-first \
--update-delay 10s \
--stop-grace-period 5s \
--restart-condition "any" \
--restart-max-attempts 10 \
--network "deploy" \
--publish "3000:3000" \
--health-cmd "curl --fail http://localhost:3000" \
--health-interval 3s \
--health-retries 5 \
--health-timeout 2s \
localhost:5000/lostintime/docker-deploy-test:v1
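A point worth checking with these settings, offered as an assumption rather than a diagnosis: with `--stop-grace-period 5s`, Swarm sends the stop signal (SIGTERM by default) when it stops a task and force-kills the container if it is still running 5 seconds later. If the app exits as soon as it receives SIGTERM, connections that still reach that task could be refused. The sketch below is a hypothetical POSIX sh entrypoint wrapper, not part of the docker-deploy-test image: `run_with_drain` keeps the app running for `DRAIN_SECONDS` (a made-up setting) after SIGTERM and only then forwards the signal.

```sh
#!/bin/sh
# Hypothetical entrypoint wrapper; not part of the original image.
# DRAIN_SECONDS is an assumed setting: drain time plus the app's own
# shutdown time should stay below --stop-grace-period (5s above).
DRAIN_SECONDS="${DRAIN_SECONDS:-3}"

# Signal handler: keep the app serving for DRAIN_SECONDS, then stop it.
on_term() {
  got_signal=1
  sleep "$DRAIN_SECONDS"
  kill -TERM "$child_pid" 2>/dev/null
}

# Run "$@" in the background and handle SIGTERM/SIGINT with on_term.
run_with_drain() {
  got_signal=
  "$@" &
  child_pid=$!
  trap on_term TERM INT
  wait "$child_pid"
  status=$?
  if [ -n "$got_signal" ]; then
    # The first wait was interrupted by the signal; wait again
    # to collect the app's real exit status.
    wait "$child_pid"
    status=$?
  fi
  trap - TERM INT
  return "$status"
}
```

It would be called at the end of the entrypoint script, e.g. `run_with_drain node server.js`, where `node server.js` is a placeholder for the image's real start command. The script must be the container's main process (exec-form ENTRYPOINT) so that it, and not a shell in front of it, receives the signal.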

Scale down:
docker service scale --detach=false deployme=3

Update to a new version:
docker service update --detach=false --image "localhost:5000/lostintime/docker-deploy-test:v2" deployme

Switch versions (back to v1):
docker service update --detach=false --image "localhost:5000/lostintime/docker-deploy-test:v1" deployme

Roll back:
docker service update --detach=false --rollback deployme

Load test with ab (ApacheBench):
ab -c 2 -n 10000 -l -k "http://192.168.99.101:3000/"

ab reports errors such as:
apr_socket_recv: Connection refused (61)
or
apr_socket_recv: Connection reset by peer (54)
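One possible factor, not confirmed here: `ab -k` reuses keep-alive connections, and a connection still open to a container when that container is stopped gets closed, which can surface as "Connection reset by peer". A hedged way to separate this from new connections being refused is to probe with a fresh connection per request. The helper `count_failures` below is hypothetical; it runs curl once per request (so each request opens its own connection), sends N sequential requests, and prints how many failed (refused, reset, timed out after 2 s, or HTTP status of 400 or higher).

```sh
#!/bin/sh
# Hypothetical probe helper: send N sequential requests to URL,
# one curl process (and one TCP connection) per request, and print
# the number of failed requests.
count_failures() {
  url=$1
  n=$2
  failures=0
  i=0
  while [ "$i" -lt "$n" ]; do
    # --fail: treat HTTP >= 400 as failure; --max-time: give up after 2s.
    if ! curl --fail --silent --output /dev/null --max-time 2 "$url"; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}
```

For example, run `count_failures http://192.168.99.101:3000/ 1000` in one terminal while the `docker service scale` or `docker service update` command runs in another; a non-zero count means some requests on new connections failed during the operation.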

The container update process is not smooth. What is causing this?

The community describes a rolling update as follows:
1. Start a rolling update for a service that is replicated with N containers.
2. For those N containers, do the following in batches of update_parallelism:
   - Remove the container from the ingress load-balancing pool for the service, so it no longer receives incoming traffic.
   - Stop that container.
   - Wait for the container to exit (i.e. give it time to finish any running requests), and send a SIGKILL after 10 seconds.
   - Start a new container.
   - If the container has a healthcheck configured, wait for the new container to become healthy.
   - Add the container to the ingress load-balancing pool.
3. The rolling update is complete.
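The steps above can be sketched as a small simulation. It assumes the stop-first order they describe, which is Docker's default; note that the service above was created with `--update-order start-first`, under which Swarm starts the new task before stopping the old one, so the real sequence differs. The quoted "SIGKILL after 10 seconds" matches Docker's default stop grace period, while the service above sets `--stop-grace-period 5s`. `simulate_rolling_update` and `print_step` are hypothetical names; the script only prints the steps and does not call Docker.

```sh
#!/bin/sh
# Illustration only: prints the stop-first rolling-update steps for a
# service with N replicas updated P at a time. It does not call Docker.

# Print one step for the current batch (uses the globals batch and tasks).
print_step() {
  printf 'batch %d [%s]: %s\n' "$batch" "$tasks" "$1"
}

simulate_rolling_update() {
  n=$1                   # number of replicas
  p=$2                   # --update-parallelism
  healthcheck=${3:-yes}  # "yes" if the service defines a healthcheck
  [ "$p" -ge 1 ] || return 1
  batch=0
  first=1
  while [ "$first" -le "$n" ]; do
    batch=$((batch + 1))
    last=$((first + p - 1))
    if [ "$last" -gt "$n" ]; then last=$n; fi
    # Space-separated list of the task numbers in this batch.
    tasks=$first
    k=$((first + 1))
    while [ "$k" -le "$last" ]; do
      tasks="$tasks $k"
      k=$((k + 1))
    done
    print_step "remove from ingress load balancing"
    print_step "stop (SIGTERM, then SIGKILL after the grace period)"
    print_step "start new container"
    if [ "$healthcheck" = yes ]; then
      print_step "wait until healthy"
    fi
    print_step "add to ingress load balancing"
    first=$((last + 1))
  done
}
```

For the service above, `simulate_rolling_update 6 1` prints six batches of five steps each.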

Related issue in the official GitHub repository:
Zero-downtime deployments with rolling upgrades #30321
https://github.com/moby/moby/issues/30321