Docker Swarm cannot connect to the specified port when configuring Consul service discovery


Part 1: I set up my swarm by following https://docs.docker.com/swarm/install-manual/.

The cluster layout is as follows:

IP              Role
192.168.16.219  swarm manager, swarm node
192.168.16.217  swarm node
192.168.16.218  swarm node

Once configured, the swarm reports the following:

docker -H tcp://192.168.16.219:2376 info
Containers: 109
Images: 11
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 3
hz219: 192.168.16.219:2375
└ Status: Healthy
└ Containers: 108
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.005 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.10.0-229.el7.x86_64, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper
localhost.localdomain: 192.168.16.218:2375
└ Status: Healthy
└ Containers: 1
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.003 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.10.0-327.el7.x86_64, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper
localhost.localdomain: 192.168.16.217:2375
└ Status: Healthy
└ Containers: 0
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.003 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.10.0-327.el7.x86_64, operatingsystem=CentOS Linux 7 (Core), storagedriver=devicemapper
CPUs: 3
Total Memory: 3.01 GiB
Name: bc610a9676ff

Part 2: My Consul configuration is as follows:

1: Install Consul

1) Install Consul on all machines (hz219, hz218, hz217)
yum -y install wget unzip
cd /usr/local/bin
wget https://dl.bintray.com/mitchel ... 4.zip && unzip *.zip && rm -f *.zip

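2) Start hz219 in bootstrap mode
To begin with, we need to get Consul running. A Consul cluster is recommended to have 3 to 5 servers. The recommended approach is to start one server first in bootstrap mode; in this mode the node can make itself leader without needing votes from any other server. Then start the remaining servers in non-bootstrap mode. Finally, stop the first server and restart it in non-bootstrap mode, so that the servers elect a leader among themselves.
Using the table above, we designate server 1 (hz219) as the bootstrap server by running the command below. The startup output follows; it shows the node making itself leader in bootstrap mode.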

[root@hz219 bin]# mkdir -p /data/consul
[root@hz219 bin]# consul agent -server -bootstrap -data-dir /data/consul/
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'hz219'
Datacenter: 'dc1'
Server: true (bootstrap: true)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 192.168.16.219 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>

==> Log data will now stream in as it occurs:

2016/01/07 18:10:37 [INFO] serf: EventMemberJoin: hz219 192.168.16.219
2016/01/07 18:10:37 [INFO] serf: EventMemberJoin: hz219.dc1 192.168.16.219
2016/01/07 18:10:37 [INFO] raft: Node at 192.168.16.219:8300 [Follower] entering Follower state
2016/01/07 18:10:37 [INFO] consul: adding server hz219 (Addr: 192.168.16.219:8300) (DC: dc1)
2016/01/07 18:10:37 [INFO] consul: adding server hz219.dc1 (Addr: 192.168.16.219:8300) (DC: dc1)
2016/01/07 18:10:37 [ERR] agent: failed to sync remote state: No cluster leader
2016/01/07 18:10:38 [WARN] raft: Heartbeat timeout reached, starting election
2016/01/07 18:10:38 [INFO] raft: Node at 192.168.16.219:8300 [Candidate] entering Candidate state
2016/01/07 18:10:38 [INFO] raft: Election won. Tally: 1
2016/01/07 18:10:38 [INFO] raft: Node at 192.168.16.219:8300 [Leader] entering Leader state
2016/01/07 18:10:38 [INFO] consul: cluster leadership acquired
2016/01/07 18:10:38 [INFO] consul: New leader elected: hz219
2016/01/07 18:10:38 [INFO] raft: Disabling EnableSingleNode (bootstrap)
2016/01/07 18:10:38 [INFO] consul: member 'hz219' joined, marking health alive
2016/01/07 18:10:39 [INFO] agent: Synced service 'consul'
==> Newer Consul version available: 0.6.0

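The output above shows that hz219 became leader right away: it won a single-node election (Tally: 1) without needing votes from any other server.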

3) Start hz217 and hz218 in non-bootstrap mode

Startup output on hz218:

[root@hz218 bin]# cd ~
[root@hz218 ~]# mkdir -p /data/consul
[root@hz218 ~]# consul agent -server -data-dir /data/consul/
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'hz218'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 192.168.16.218 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>

==> Log data will now stream in as it occurs:

2016/01/08 10:21:30 [INFO] serf: EventMemberJoin: hz218 192.168.16.218
2016/01/08 10:21:30 [INFO] serf: EventMemberJoin: hz218.dc1 192.168.16.218
2016/01/08 10:21:30 [INFO] raft: Node at 192.168.16.218:8300 [Follower] entering Follower state
2016/01/08 10:21:30 [INFO] consul: adding server hz218 (Addr: 192.168.16.218:8300) (DC: dc1)
2016/01/08 10:21:30 [INFO] consul: adding server hz218.dc1 (Addr: 192.168.16.218:8300) (DC: dc1)
2016/01/08 10:21:30 [ERR] agent: failed to sync remote state: No cluster leader
2016/01/08 10:21:31 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
2016/01/08 10:21:53 [ERR] agent: failed to sync remote state: No cluster leader

Startup output on hz217:

[root@hz217 bin]# cd ~
[root@hz217 ~]# mkdir -p /data/consul
[root@hz217 ~]# consul agent -server -data-dir /data/consul/
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'hz217'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 192.168.16.217 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>

==> Log data will now stream in as it occurs:

2016/01/07 05:24:30 [INFO] serf: EventMemberJoin: hz217 192.168.16.217
2016/01/07 05:24:30 [INFO] serf: EventMemberJoin: hz217.dc1 192.168.16.217
2016/01/07 05:24:30 [INFO] raft: Node at 192.168.16.217:8300 [Follower] entering Follower state
2016/01/07 05:24:30 [INFO] consul: adding server hz217 (Addr: 192.168.16.217:8300) (DC: dc1)
2016/01/07 05:24:30 [INFO] consul: adding server hz217.dc1 (Addr: 192.168.16.217:8300) (DC: dc1)
2016/01/07 05:24:30 [ERR] agent: failed to sync remote state: No cluster leader
2016/01/07 05:24:32 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.

The output shows that hz218 and hz217 are both looking for a leader and have not found one yet.

4) Join hz217 and hz218 to the cluster that hz219 belongs to (run on hz219)
[root@hz219 ~]# consul join 192.168.16.217 192.168.16.218
Successfully joined cluster by contacting 2 nodes.

List the machines currently in the cluster with the following command:

[root@hz219 ~]# consul members
Node Address Status Type Build Protocol DC
hz219 192.168.16.219:8301 alive server 0.5.2 2 dc1
hz217 192.168.16.217:8301 alive server 0.5.2 2 dc1
hz218 192.168.16.218:8301 alive server 0.5.2 2 dc1

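5) Run the following on hz219
Once all three machines have joined the cluster, we need to make them equal servers and let them elect a leader themselves. To do this, stop Consul on the first machine and then start it again with the command below.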

Stop Consul:

[root@hz219 ~]# kill -9 $(ps -ef | grep consul | grep -v grep | awk '{print $2}')

Start Consul in non-bootstrap mode:

[root@hz219 ~]# consul agent -server -data-dir /data/consul/
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'hz219'
Datacenter: 'dc1'
Server: true (bootstrap: false)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 192.168.16.219 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas: <disabled>

==> Log data will now stream in as it occurs:

2016/01/07 18:41:05 [INFO] serf: EventMemberJoin: hz219 192.168.16.219
2016/01/07 18:41:05 [INFO] serf: EventMemberJoin: hz219.dc1 192.168.16.219
2016/01/07 18:41:05 [INFO] raft: Node at 192.168.16.219:8300 [Follower] entering Follower state
2016/01/07 18:41:05 [INFO] serf: Attempting re-join to previously known node: hz217: 192.168.16.217:8301
2016/01/07 18:41:05 [WARN] serf: Failed to re-join any previously known node
2016/01/07 18:41:05 [INFO] consul: adding server hz219 (Addr: 192.168.16.219:8300) (DC: dc1)
2016/01/07 18:41:05 [INFO] consul: adding server hz219.dc1 (Addr: 192.168.16.219:8300) (DC: dc1)
2016/01/07 18:41:05 [ERR] agent: failed to sync remote state: No cluster leader
2016/01/07 18:41:05 [INFO] serf: EventMemberJoin: hz217 192.168.16.217
2016/01/07 18:41:05 [INFO] serf: EventMemberJoin: hz218 192.168.16.218
2016/01/07 18:41:05 [INFO] serf: Re-joined to previously known node: hz217: 192.168.16.217:8301
2016/01/07 18:41:05 [INFO] consul: adding server hz217 (Addr: 192.168.16.217:8300) (DC: dc1)
2016/01/07 18:41:05 [INFO] consul: adding server hz218 (Addr: 192.168.16.218:8300) (DC: dc1)
2016/01/07 18:41:05 [INFO] consul: New leader elected: hz217
2016/01/07 18:41:07 [WARN] raft: Heartbeat timeout reached, starting election
2016/01/07 18:41:07 [INFO] raft: Node at 192.168.16.219:8300 [Candidate] entering Candidate state
2016/01/07 18:41:07 [INFO] raft: Node at 192.168.16.219:8300 [Follower] entering Follower state
2016/01/07 18:41:08 [ERR] agent: failed to sync remote state: No cluster leader
2016/01/07 18:41:09 [WARN] raft: Heartbeat timeout reached, starting election
2016/01/07 18:41:09 [INFO] raft: Node at 192.168.16.219:8300 [Candidate] entering Candidate state
2016/01/07 18:41:10 [WARN] raft: Election timeout reached, restarting election
2016/01/07 18:41:10 [INFO] raft: Node at 192.168.16.219:8300 [Candidate] entering Candidate state
2016/01/07 18:41:11 [WARN] raft: Election timeout reached, restarting election
2016/01/07 18:41:11 [INFO] raft: Node at 192.168.16.219:8300 [Candidate] entering Candidate state
2016/01/07 18:41:13 [INFO] raft: Node at 192.168.16.219:8300 [Follower] entering Follower state
2016/01/07 18:41:13 [WARN] raft: Failed to get previous log: 45 log not found (last: 40)
2016/01/07 18:41:13 [INFO] consul: New leader elected: hz217
==> Newer Consul version available: 0.6.0
2016/01/07 18:41:15 [INFO] agent: Synced service 'consul'

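The output above shows that when hz219 rejoined, it initially saw no leader in the cluster. After an election, hz217 became leader and the other machines became followers. While an election is in progress, a node taking part in it enters the Candidate state, as hz219 does in the log.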

List the nodes currently in the cluster with the following command:

[root@hz219 bin]# consul members
Node Address Status Type Build Protocol DC
hz219 192.168.16.219:8301 alive server 0.5.2 2 dc1
hz217 192.168.16.217:8301 alive server 0.5.2 2 dc1
hz218 192.168.16.218:8301 alive server 0.5.2 2 dc1
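
The membership check above can also be scripted. The following is a minimal sketch, assuming the column layout shown in the `consul members` output in this post (Node, Address, Status, Type, ...); the function name `count_alive_servers` is made up for illustration.

```shell
# Hypothetical helper: count members whose Status is "alive" and Type is
# "server". Reads `consul members` output on stdin and skips the header row.
count_alive_servers() {
  awk 'NR > 1 && $3 == "alive" && $4 == "server" { n++ } END { print n + 0 }'
}

# Usage (not executed here):
# consul members | count_alive_servers
```

For the three-server cluster in this post, the count should be 3 once every agent has joined.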

Part 3: Configure Consul service discovery for swarm following the official documentation

Reference: https://docs.docker.com/swarm/discovery/

Run the following command:
[root@hz217 ~]# docker run swarm join --advertise=127.0.0.1:2375 consul://localhost/v1/kv/web
time="2016-01-07T17:39:25Z" level=info msg="Registering on the discovery service every 1m0s..." addr="127.0.0.1:2375" discovery="consul://localhost/v1/kv/web"
time="2016-01-07T17:39:25Z" level=error msg="Get http://localhost/v1/kv/v1/kv/w ... 2375: dial tcp [::1]:80: getsockopt: connection refused"

This fails with an error. How can I fix it?

Using port 8500 gives the same error:
[root@hz217 ~]# docker run swarm join --advertise=127.0.0.1:2375 consul://localhost:8500/v1/kv/web
time="2016-01-07T17:33:19Z" level=info msg="Registering on the discovery service every 1m0s..." addr="127.0.0.1:2375" discovery="consul://localhost:8500/v1/kv/web"
time="2016-01-07T17:33:19Z" level=error msg="Get http://localhost:8500/v1/kv/v1 ... 2375: dial tcp [::1]:8500: getsockopt: connection refused"
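
Two details in the logs above may be worth noting, offered as observations rather than a confirmed fix. The failed request path begins with /v1/kv/v1/kv/, which suggests that swarm adds the /v1/kv/ prefix itself. The address [::1] is presumably the loopback of the container started by `docker run`, not of the host. The sketch below builds a discovery URL in the form given on the Swarm discovery page, consul://<consul_ip>:<port>/<path>. The function name `discovery_url` is hypothetical; the IP and the path `web` come from this post, and 8500 is the HTTP port shown in the agent output.

```shell
# Hypothetical helper: build a Consul discovery URL for swarm.
# Assumes swarm prepends /v1/kv/ itself, so only the key prefix is passed.
discovery_url() {
  consul_ip=$1
  key_prefix=${2#/}           # drop one leading slash, if any
  consul_port=${3:-8500}      # Consul HTTP port from the agent output
  printf 'consul://%s:%s/%s\n' "$consul_ip" "$consul_port" "$key_prefix"
}

# Usage (not executed here): advertise the node's LAN address, not 127.0.0.1.
# docker run swarm join --advertise=192.168.16.217:2375 "$(discovery_url 192.168.16.217 web)"
```

Note that the agent output in this post shows Client Addr: 127.0.0.1, so the HTTP API would probably also have to listen on an address the container can reach (for example via the agent's `-client` option) before such a URL could work.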

FanLin - Docker & CoreOS enthusiast

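It looks like the Consul service on port 8500 has a problem. Check whether a direct connection to that port is refused as well:
ncat -v localhost 8500
or
nc -v localhost 8500
If this also reports "connection refused", the problem is with the Consul service.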
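
If neither `ncat` nor `nc` is installed, the same check can be sketched with bash alone. This is a minimal sketch, assuming bash with /dev/tcp support and the coreutils `timeout` command; the function name `port_open` is made up for illustration.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp pseudo-device, so it needs bash but not nc/ncat.
port_open() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage (not executed here):
# port_open 127.0.0.1 8500 && echo "reachable" || echo "refused or timed out"
```

Running the check both on the host and from inside a container may help tell apart "Consul is not listening" from "localhost inside the container is not the host".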
