Calico BGP Full Mesh: Cross-Node Communication

whale_life
Published 2022-9-27 11:23

Introduction

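Full Mesh (fully interconnected mode): once BGP is enabled, Calico's default behavior is to create a full mesh of internal BGP (iBGP) connections in which every node peers with every other node. This lets Calico run over any L2 network, whether public or private cloud, or, if IPIP is configured, run as an overlay over any network that does not block IPIP traffic. For VXLAN overlays, Calico does not use BGP.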

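Full-mesh mode works well for small and medium deployments of roughly 100 worker nodes or fewer. At significantly larger scale it becomes less efficient, and the Calico documentation recommends using route reflectors instead.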
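To make the scaling concern concrete, here is a small sketch that counts the sessions a full mesh needs. The function name is made up for illustration; the only fact used is that every pair of nodes holds one BGP session.

```python
def full_mesh_sessions(nodes: int) -> int:
    """Number of BGP sessions when every node peers with every other node."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # One session per unordered pair of nodes.
    return nodes * (nodes - 1) // 2


for n in (3, 100, 1000):
    print(n, "nodes:", full_mesh_sessions(n), "sessions,", max(n - 1, 0), "peers per node")
```

The three-node cluster in this post needs only 3 sessions, while the count grows quadratically with the node count, which is the motivation for route reflectors.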

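BGP sends incremental updates rather than re-advertising the full routing table.
BGP is an application-layer protocol; it runs over TCP (port 179).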
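The incremental behavior can be pictured with a toy model. This is only a sketch under simplifying assumptions: the routing table is a plain dict, and `apply_update` is a hypothetical helper, not BIRD's or Calico's code. Real UPDATE messages also carry path attributes, which are left out here.

```python
def apply_update(rib: dict, withdrawn: list, announced: dict) -> dict:
    """Apply one incremental update to a table of prefix -> next hop."""
    for prefix in withdrawn:
        rib.pop(prefix, None)      # remove routes the peer no longer offers
    rib.update(announced)          # add or replace newly announced routes
    return rib


rib = {}
apply_update(rib, [], {"10.244.103.64/26": "192.168.0.82"})
apply_update(rib, [], {"10.244.152.128/26": "192.168.0.80"})
# A later update touches only the prefix that changed.
apply_update(rib, ["10.244.152.128/26"], {})
print(rib)
```

Each call changes only the prefixes it names; the rest of the table is never resent.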
[Figure: calico-bgp-full-mesh]

Installation and Deployment

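BGP Full Mesh builds on the calico-ipip deployment with the following parameter changed.
In effect, this turns off IPIP encapsulation; with it off, Calico uses BGP Full Mesh by default.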

# Change Always to Never
            - name: CALICO_IPV4POOL_IPIP
              value: "Always"


You can inspect the IP pool with calicoctl:

[root@master ~]# calicoctl get ippool -o wide 
NAME                  CIDR            NAT    IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR   
default-ipv4-ippool   10.244.0.0/16   true   Never      Never       false      false              all()   
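As a quick worked example of what this pool implies, the sketch below splits the 10.244.0.0/16 pool into per-node blocks. The /26 block size is taken from the routes that appear in the routing tables in this post (for example 10.244.42.64/26); `block_of` is a hypothetical helper name.

```python
import ipaddress

pool = ipaddress.ip_network("10.244.0.0/16")
block_prefix = 26  # block size seen in the routing tables in this post

blocks = 2 ** (block_prefix - pool.prefixlen)   # blocks available in the pool
addrs_per_block = 2 ** (32 - block_prefix)      # addresses in each block
print(blocks, "blocks of", addrs_per_block, "addresses")


def block_of(ip: str) -> ipaddress.IPv4Network:
    """Return the /26 block that contains the given pod IP."""
    return ipaddress.ip_network(f"{ip}/{block_prefix}", strict=False)


for pod_ip in ("10.244.42.65", "10.244.103.65"):
    assert ipaddress.ip_address(pod_ip) in pool
    print(pod_ip, "is in block", block_of(pod_ip))
```

The two test pods used in this post fall into different blocks, which is why each node advertises a different /26 to its peers.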

Checking the Calico BGP Full Mesh Status

After deployment, we can check the BGP peering state with calicoctl node status.
node1 192.168.0.81
node2 192.168.0.82
The full mesh is up, and both peers report Established.

[root@master ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.0.81 | node-to-node mesh | up    | 13:17:26 | Established |
| 192.168.0.82 | node-to-node mesh | up    | 13:17:26 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
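If you want to check this state from a script rather than by eye, a minimal sketch could parse the table above. The text is pasted in as a string (border lines omitted); `parse_peers` is a hypothetical helper, and the column names are the ones printed in the output above.

```python
STATUS = """\
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
| 192.168.0.81 | node-to-node mesh | up    | 13:17:26 | Established |
| 192.168.0.82 | node-to-node mesh | up    | 13:17:26 | Established |
"""


def parse_peers(text: str) -> list:
    """Turn the pipe-delimited status table into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders and unrelated lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


peers = parse_peers(STATUS)
not_ready = [p["PEER ADDRESS"] for p in peers if p["INFO"] != "Established"]
print(len(peers), "peers; not established:", not_ready)
```

An empty `not_ready` list corresponds to the healthy mesh shown here.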

Checking the Calico BGP AS Number

[Figure: calico-bgp-as-number]
By default, all Calico nodes use 64512 as their autonomous system number, unless a per-node AS has been specified for a node.

# The AS number shown is 64512
[root@master ~]# calicoctl get node -o wide 
NAME               ASN       IPV4              IPV6   
master.whale.com   (64512)   192.168.0.80/24          
node1.whale.com    (64512)   192.168.0.81/24          
node2.whale.com    (64512)   192.168.0.82/24   
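Two small facts follow from this output, sketched below with hypothetical helper names: 64512 is the first number of the 16-bit private AS range (64512 to 65534, per RFC 6996), and because every node shares the same AS, the sessions between them are iBGP rather than eBGP.

```python
PRIVATE_ASN_16BIT = range(64512, 65535)  # 64512..65534 inclusive (RFC 6996)


def is_private_asn(asn: int) -> bool:
    """True if the AS number is in the 16-bit private-use range."""
    return asn in PRIVATE_ASN_16BIT


def session_type(local_as: int, peer_as: int) -> str:
    """A session inside one AS is iBGP; across two ASes it is eBGP."""
    return "iBGP" if local_as == peer_as else "eBGP"


node_asns = {
    "master.whale.com": 64512,
    "node1.whale.com": 64512,
    "node2.whale.com": 64512,
}
for name, asn in node_asns.items():
    print(name, asn, "private" if is_private_asn(asn) else "public")
print(session_type(node_asns["node1.whale.com"], node_asns["node2.whale.com"]))
```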

Packet Capture Test in BGP Mode

pod1 10.244.42.65
node1 192.168.0.81
pod2 10.244.103.65
node2 192.168.0.82

[root@master ~]# kubectl create deploy cni-test --image=burlyluo/nettoolbox --replicas=2

[root@master ~]# kubectl get pod -o wide 
NAME                        READY   STATUS    RESTARTS   AGE   IP              NODE              NOMINATED NODE   READINESS GATES
cni-test-777bbd57c8-ggfsp   1/1     Running   0          26s   10.244.42.65    node1.whale.com   <none>           <none>
cni-test-777bbd57c8-gxv5q   1/1     Running   0          22s   10.244.103.65   node2.whale.com   <none>           <none>
[root@master ~]# kubectl get pod -o wide -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS      AGE    IP               NODE               NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-846cc9f754-zqz8h   1/1     Running   0             59m    10.244.152.128   master.whale.com   <none>           <none>
kube-system   calico-node-5xplj                          1/1     Running   0             59m    192.168.0.82     node2.whale.com    <none>           <none>
kube-system   calico-node-bntq7                          1/1     Running   0             59m    192.168.0.81     node1.whale.com    <none>           <none>
kube-system   calico-node-cqc5t                          1/1     Running   0             59m    192.168.0.80     master.whale.com   <none>           <none>


[root@master ~]# kubectl exec -it cni-test-777bbd57c8-ggfsp -- ping -c 1 10.244.103.65

pod1.cap

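Following the interface pairing between the pod and the node, we find no encapsulation devices such as vxlan or ipip on the node, which indicates that traffic is forwarded using the node's routing table.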

[root@master ~]#  kubectl exec -it cni-test-777bbd57c8-ggfsp -- ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 4E:DB:29:E6:AC:A8  
          inet addr:10.244.42.65  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:810 (810.0 B)  TX bytes:364 (364.0 B)

[root@master ~]# kubectl exec -it cni-test-777bbd57c8-ggfsp -- ethtool -S eth0
NIC statistics:
     peer_ifindex: 7
     
[root@node1 ~]# ip link show 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:d8:6c:fb brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:ef:24:ce:c2 brd ff:ff:ff:ff:ff:ff
7: cali2009c1121bd@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
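The pairing step above can be expressed as a short sketch: take the peer_ifindex reported inside the pod (7) and look up the interface with that index in the node's ip link output. The helper name is hypothetical, and the check for tunnel devices assumes that IPIP and VXLAN devices would have names starting with "tunl" or "vxlan", which is an assumption rather than something shown in this post.

```python
import re

IP_LINK = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
7: cali2009c1121bd@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
"""


def parse_links(ip_link_output: str) -> dict:
    """Map interface index -> interface name (without the @peer suffix)."""
    links = {}
    for line in ip_link_output.splitlines():
        m = re.match(r"^(\d+):\s+([^:@]+)(?:@\S+)?:", line)
        if m:
            links[int(m.group(1))] = m.group(2)
    return links


links = parse_links(IP_LINK)
host_veth = links.get(7)  # 7 is the peer_ifindex reported by ethtool in the pod
# Assumed naming convention for encapsulation devices (not from this post).
tunnel_devs = [n for n in links.values() if n.startswith(("tunl", "vxlan"))]
print("host side of pod eth0:", host_veth)
print("encapsulation devices:", tunnel_devs)
```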


pod1-node.cap

[root@master ~]# kubectl -n kube-system exec -it calico-node-bntq7 -- bash
Defaulted container "calico-node" out of: calico-node, upgrade-ipam (init), install-cni (init)
[root@node1 /]# birdcl 
BIRD v0.3.3+birdv1.6.8 ready.
bird> show route 
0.0.0.0/0          via 192.168.0.1 on ens33 [kernel1 13:17:25] * (10)
10.244.152.128/26  via 192.168.0.80 on ens33 [Mesh_192_168_0_80 13:17:26] * (100/0) [i]
192.168.0.0/24     dev ens33 [direct1 13:17:24] * (240)
10.244.103.64/26   via 192.168.0.82 on ens33 [Mesh_192_168_0_82 13:34:33] * (100/0) [i]
172.17.0.0/16      dev docker0 [direct1 13:17:24] * (240)
10.244.42.64/26    blackhole [static1 13:34:33] * (200)
10.244.42.65/32    dev cali2009c1121bd [kernel1 13:42:00] * (10)
bird> show route for 10.244.103.64/26 all
10.244.103.64/26   via 192.168.0.82 on ens33 [Mesh_192_168_0_82 13:34:33] * (100/0) [i]
	Type: BGP unicast univ
	BGP.origin: IGP
	BGP.as_path: 
	BGP.next_hop: 192.168.0.82
	BGP.local_pref: 100
[root@node1 /]# cat /etc/calico/confd/config/bird.cfg
function apply_communities ()
{
}

# Generated by confd
include "bird_aggr.cfg";
include "bird_ipam.cfg";

router id 192.168.0.81;

# Configure synchronization between routing tables and kernel.
protocol kernel {
  learn;             # Learn all alien routes from the kernel
  persist;           # Don't remove routes on bird shutdown
  scan time 2;       # Scan kernel routing table every 2 seconds
  import all;
  export filter calico_kernel_programming; # Default is export none
  graceful restart;  # Turn on graceful restart to reduce potential flaps in
                     # routes when reloading BIRD configuration.  With a full
                     # automatic mesh, there is no way to prevent BGP from
                     # flapping since multiple nodes update their BGP
                     # configuration at the same time, GR is not guaranteed to
                     # work correctly in this scenario.
  merge paths on;    # Allow export multipath routes (ECMP)
}

# Watch interface up/down events.
protocol device {
  debug { states };
  scan time 2;    # Scan interfaces every 2 seconds
}

protocol direct {
  debug { states };
  interface -"cali*", -"kube-ipvs*", "*"; # Exclude cali* and kube-ipvs* but
                                          # include everything else.  In
                                          # IPVS-mode, kube-proxy creates a
                                          # kube-ipvs0 interface. We exclude
                                          # kube-ipvs0 because this interface
                                          # gets an address for every in use
                                          # cluster IP. We use static routes
                                          # for when we legitimately want to
                                          # export cluster IPs.
}


# Template for all BGP clients
template bgp bgp_template {
  debug { states };
  description "Connection to BGP peer";
  local as 64512;
  multihop;
  gateway recursive; # This should be the default, but just in case.
  import all;        # Import all routes, since we don't know what the upstream
                     # topology is and therefore have to trust the ToR/RR.
  export filter calico_export_to_bgp_peers;  # Only want to export routes for workloads.
  add paths on;
  graceful restart;  # See comment in kernel section about graceful restart.
  connect delay time 2;
  connect retry time 5;
  error wait time 5,30;
}

# ------------- Node-to-node mesh -------------





# For peer /host/master.whale.com/ip_addr_v4
protocol bgp Mesh_192_168_0_80 from bgp_template {
  neighbor 192.168.0.80 as 64512;
  source address 192.168.0.81;  # The local address we use for the TCP connection
}



# For peer /host/node1.whale.com/ip_addr_v4
# Skipping ourselves (192.168.0.81)



# For peer /host/node2.whale.com/ip_addr_v4
protocol bgp Mesh_192_168_0_82 from bgp_template {
  neighbor 192.168.0.82 as 64512;
  source address 192.168.0.81;  # The local address we use for the TCP connection
  passive on; # Mesh is unidirectional, peer will connect to us.
}



# ------------- Global peers -------------
# No global peers configured.


# ------------- Node-specific peers -------------

# No node-specific peers configured.




[root@node1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    100    0        0 ens33
10.244.42.64    0.0.0.0         255.255.255.192 U     0      0        0 *  # blackhole route
10.244.42.65    0.0.0.0         255.255.255.255 UH    0      0        0 cali2009c1121bd
10.244.103.64   192.168.0.82    255.255.255.192 UG    0      0        0 ens33 # route to the destination address
10.244.152.128  192.168.0.80    255.255.255.192 UG    0      0        0 ens33
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 ens33

Viewing the calico-node configuration on the node that hosts pod1
Viewing the iBGP route configuration
Viewing the contents of the corresponding configuration file

pod2.cap

[root@master ~]# kubectl exec -it cni-test-777bbd57c8-gxv5q -- ethtool -S eth0
NIC statistics:
     peer_ifindex: 7

[root@node2 ~]# ip link show 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:9f:1b:88 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:ff:bb:4a:42 brd ff:ff:ff:ff:ff:ff
7: cali9a5d1678aea@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0


pod2-node.cap

The configuration on node2 mirrors that of node1, so refer to the node1 section for details.

[root@node2 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    100    0        0 ens33
10.244.42.64    192.168.0.81    255.255.255.192 UG    0      0        0 ens33
10.244.103.64   0.0.0.0         255.255.255.192 U     0      0        0 *
10.244.103.65   0.0.0.0         255.255.255.255 UH    0      0        0 cali9a5d1678aea
10.244.152.128  192.168.0.80    255.255.255.192 UG    0      0        0 ens33
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 ens33


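Because the capture window was too short, only a single BGP KEEPALIVE message from node2 to master was captured. Even so, it shows that this setup behaves like BGP in a real-world environment.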
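For reference when reading such a capture, a KEEPALIVE is the smallest BGP message: just the 19-byte header defined in RFC 4271 (a 16-byte marker of all ones, a 2-byte length, and a 1-byte type, where type 4 means KEEPALIVE). The sketch below builds and parses that header; the function names are hypothetical and it does not decode the bodies of other message types.

```python
import struct

MARKER = b"\xff" * 16
TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_keepalive() -> bytes:
    """A KEEPALIVE is only the header: marker, length 19, type 4."""
    return MARKER + struct.pack("!HB", 19, 4)


def parse_header(data: bytes):
    """Return (message type name, total length) from a BGP message header."""
    if len(data) < 19 or data[:16] != MARKER:
        raise ValueError("not a BGP message header")
    length, msg_type = struct.unpack("!HB", data[16:19])
    return TYPES.get(msg_type, "UNKNOWN"), length


msg = build_keepalive()
print(msg.hex())
print(parse_header(msg))
```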
