[My openGauss Story] Editing pg_hba in openGauss left a node unable to start, plus CM primary/standby switching

老老老JR老北
Published 2023-8-29 15:40

1. Normal state

[omm@Euler1 ~]$ gs_om -t status --detail
[  CMServer State   ]

node                node_ip         instance                             state
--------------------------------------------------------------------------------
1  Euler1  172.16.220.45   1    /database/opengauss/cm/cm_server Primary
2  Euler2 172.16.220.201  2    /database/opengauss/cm/cm_server Standby
3  Euler3 172.16.220.221  3    /database/opengauss/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node                node_ip         instance                     state            
----------------------------------------------------------------------------------
1  Euler1  172.16.220.45   6001 /database/opengauss/data P Primary Normal
2  Euler2 172.16.220.201  6002 /database/opengauss/data S Standby Normal
3  Euler3 172.16.220.221  6003 /database/opengauss/data S Standby Normal

2. Modify pg_hba

[omm@Euler1 data]$ vi pg_hba.conf
host    all    all    172.16.221.6       sha256
host    all    all    172.16.221.118     sha256

3. Stop and restart

[omm@Euler1 data]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.
[omm@Euler1 data]$ gs_om -t start
Starting cluster.
======================================================================

^CTraceback (most recent call last):
  File "/database/opengauss/tool/script/gs_om", line 837, in <module>
    main()
  File "/database/opengauss/tool/script/gs_om", line 806, in main
    impl.doStart()
  File "/database/opengauss/tool/script/impl/om/OmImpl.py", line 88, in doStart
    self.doStartCluster()
  File "/database/opengauss/tool/script/impl/om/OLAP/OmImplOLAP.py", line 183, in doStartCluster
    self.doStartClusterByCm()
  File "/database/opengauss/tool/script/impl/om/OLAP/OmImplOLAP.py", line 169, in doStartClusterByCm
    self.dataDir)
  File "/database/opengauss/tool/script/gspylib/component/CM/CM_OLAP/CM_OLAP.py", line 279, in startCluster
    result_set = CmdUtil.retryGetstatusoutput(cmd, retry_time=retry_times)
  File "/database/opengauss/tool/script/base_utils/os/cmd_util.py", line 566, in retryGetstatusoutput
    (status, output) = subprocess.getstatusoutput(cmd)
  File "/usr/lib64/python3.7/subprocess.py", line 611, in getstatusoutput
    data = check_output(cmd, shell=True, text=True, stderr=STDOUT)
  File "/usr/lib64/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/usr/lib64/python3.7/subprocess.py", line 490, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib64/python3.7/subprocess.py", line 951, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt

The restart hung and had to be interrupted with Ctrl+C.
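The traceback shows gs_om blocked inside a subprocess call with no time limit. When scripting a restart, one option is to wrap the command so that a hang turns into an explicit timeout. The following is a minimal sketch, not part of the openGauss tooling: `run_with_timeout` is a hypothetical helper name, and the timeout value is whatever the caller chooses.

```python
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd (a list of arguments) with a time limit.

    Returns (exit_code, output). On timeout the child is killed and
    (None, "") is returned so the caller can decide what to do next.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return done.returncode, done.stdout
```

A call such as `run_with_timeout(["gs_om", "-t", "start"], 300)` would then return `(None, "")` instead of blocking indefinitely; the 300 seconds here is an arbitrary illustration, not a recommended value.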

[omm@Euler1 data]$ gs_om -t status --detail
[  CMServer State   ]

node                node_ip         instance                             state
--------------------------------------------------------------------------------
1  Euler1  172.16.220.45   1    /database/opengauss/cm/cm_server Primary
2  Euler2 172.16.220.201  2    /database/opengauss/cm/cm_server Standby
3  Euler3 172.16.220.221  3    /database/opengauss/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Degraded
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node                node_ip         instance                     state            
----------------------------------------------------------------------------------
1  Euler1  172.16.220.45   6001 /database/opengauss/data P Pending Starting
2  Euler2 172.16.220.201  6002 /database/opengauss/data S Primary Normal
3  Euler3 172.16.220.221  6003 /database/opengauss/data S Standby Normal

The other nodes have started, but this node stays in Pending Starting.
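The Datanode section of the status output above can also be checked by a script rather than by eye. This is a minimal sketch that assumes the column layout of `gs_om -t status --detail` exactly as printed in this post; `parse_datanodes` and `not_normal` are hypothetical helper names, not openGauss APIs.

```python
import re

SAMPLE = """[  Datanode State   ]

node                node_ip         instance                     state
----------------------------------------------------------------------------------
1  Euler1  172.16.220.45   6001 /database/opengauss/data P Pending Starting
2  Euler2 172.16.220.201  6002 /database/opengauss/data S Primary Normal
3  Euler3 172.16.220.221  6003 /database/opengauss/data S Standby Normal
"""


def parse_datanodes(status_text):
    """Extract one dict per row of the Datanode State section."""
    rows = []
    in_section = False
    for line in status_text.splitlines():
        header = re.match(r"\s*\[\s*(.+?)\s*\]\s*$", line)
        if header:
            in_section = header.group(1) == "Datanode State"
            continue
        fields = line.split()
        # Data rows start with the numeric node id and have at least 8 columns.
        if in_section and len(fields) >= 8 and fields[0].isdigit():
            rows.append({
                "node": fields[1],
                "ip": fields[2],
                "instance": fields[3],
                "role": fields[6],
                "state": " ".join(fields[7:]),
            })
    return rows


def not_normal(rows):
    """Return the rows whose state column is anything other than Normal."""
    return [row for row in rows if row["state"] != "Normal"]


print([row["node"] for row in not_normal(parse_datanodes(SAMPLE))])  # ['Euler1']
```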

[omm@Euler1 data]$ gs_om -t stop -h Euler1
Stopping node.
=========================================
Successfully stopped node.
=========================================
End stop node.
[omm@Euler1 data]$ gs_om -t start -h Euler1
Starting node.
======================================================================
Successfully started node.
======================================================================
End start node.
Successfully started node.
[omm@Euler1 data]$ gs_om -t status --detail
[  CMServer State   ]

node                node_ip         instance                             state
--------------------------------------------------------------------------------
1  Euler1  172.16.220.45   1    /database/opengauss/cm/cm_server Standby
2  Euler2 172.16.220.201  2    /database/opengauss/cm/cm_server Primary
3  Euler3 172.16.220.221  3    /database/opengauss/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Degraded
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node                node_ip         instance                     state            
----------------------------------------------------------------------------------
1  Euler1  172.16.220.45   6001 /database/opengauss/data P Pending Starting
2  Euler2 172.16.220.201  6002 /database/opengauss/data S Primary Normal
3  Euler3 172.16.220.221  6003 /database/opengauss/data S Standby Normal

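After stopping and restarting this node alone, its state is still Pending Starting, and because of the failure the primary role has failed over to another node.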

4. Fix

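A closer look showed that the IP addresses added to pg_hba.conf were in the wrong format: no mask was written after the IP address. I corrected the entries.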
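A check along the following lines could catch this kind of mistake before a restart. It is a minimal sketch, not an openGauss tool: `find_unmasked_hosts` is a hypothetical name, and the check only covers `host*` lines whose address column is a bare IP, accepting either a CIDR suffix or a separate netmask column. Quoted fields and other pg_hba.conf features are not handled.

```python
import ipaddress

SAMPLE = """host    all    all    172.16.221.6       sha256
host    all    all    172.16.221.118     sha256
"""


def _is_ip(text):
    """True if text parses as a bare IPv4 or IPv6 address (no /prefix)."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def find_unmasked_hosts(pg_hba_text):
    """Return (line_number, address) pairs for host entries lacking a mask."""
    problems = []
    for number, raw in enumerate(pg_hba_text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) < 4 or not fields[0].startswith("host"):
            continue  # blank lines, comments, "local" entries
        address = fields[3]
        if not _is_ip(address):
            continue  # CIDR form, hostname, or keyword: not checked here
        # A bare IP must be followed by a separate netmask column.
        if len(fields) < 5 or not _is_ip(fields[4]):
            problems.append((number, address))
    return problems


print(find_unmasked_hosts(SAMPLE))  # [(1, '172.16.221.6'), (2, '172.16.221.118')]
```

The post does not show the corrected lines. For a single client address the CIDR form would typically carry a /32 suffix, for example `172.16.221.6/32`, but that exact value is an assumption.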

[omm@Euler1 data]$ gs_om -t stop -h Euler1
Stopping node.
=========================================
Successfully stopped node.
=========================================
End stop node.
[omm@Euler1 data]$ gs_om -t start -h Euler1
Starting node.
======================================================================
Successfully started node.
======================================================================
End start node.
Successfully started node.
[omm@Euler1 data]$ gs_om -t status --detail
[  CMServer State   ]

node                node_ip         instance                             state
--------------------------------------------------------------------------------
1  Euler1  172.16.220.45   1    /database/opengauss/cm/cm_server Standby
2  Euler2 172.16.220.201  2    /database/opengauss/cm/cm_server Primary
3  Euler3 172.16.220.221  3    /database/opengauss/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node                node_ip         instance                     state            
----------------------------------------------------------------------------------
1  Euler1  172.16.220.45   6001 /database/opengauss/data P Standby Normal
2  Euler2 172.16.220.201  6002 /database/opengauss/data S Primary Normal
3  Euler3 172.16.220.221  6003 /database/opengauss/data S Standby Normal

After the fix the node starts normally. Next, switch the primary role back.

[omm@Euler1 data]$ gs_ctl switchover -D /database/opengauss/data
[2023-07-13 14:05:22.942][2167163][][gs_ctl]: gs_ctl switchover ,datadir is /database/opengauss/data 
[2023-07-13 14:05:22.942][2167163][][gs_ctl]: switchover term (1)
[2023-07-13 14:05:22.947][2167163][][gs_ctl]: waiting for server to switchover........
[2023-07-13 14:05:27.975][2167163][][gs_ctl]: done
[2023-07-13 14:05:27.975][2167163][][gs_ctl]: switchover completed (/database/opengauss/data)
[omm@Euler1 data]$ gs_om -t status -detail
[GAUSS-50000] : Unrecognized parameter: -d.
[omm@Euler1 data]$ gs_om -t status --detail
[  CMServer State   ]

node                node_ip         instance                             state
--------------------------------------------------------------------------------
1  Euler1  172.16.220.45   1    /database/opengauss/cm/cm_server Standby
2  Euler2 172.16.220.201  2    /database/opengauss/cm/cm_server Primary
3  Euler3 172.16.220.221  3    /database/opengauss/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node                node_ip         instance                     state            
----------------------------------------------------------------------------------
1  Euler1  172.16.220.45   6001 /database/opengauss/data P Primary Normal
2  Euler2 172.16.220.201  6002 /database/opengauss/data S Standby Normal
3  Euler3 172.16.220.221  6003 /database/opengauss/data S Standby Normal
[omm@Euler1 data]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.

One issue remains: the Datanode switchover succeeded, but the CMServer Primary is still on node 2.
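This mismatch between the two sections can be detected by a script. The sketch below assumes the `gs_om -t status --detail` layout printed in this post; `primary_nodes` is a hypothetical helper, not an openGauss API, and the sample is abbreviated from the output above.

```python
import re

SAMPLE = """[  CMServer State   ]

1  Euler1  172.16.220.45   1    /database/opengauss/cm/cm_server Standby
2  Euler2 172.16.220.201  2    /database/opengauss/cm/cm_server Primary
3  Euler3 172.16.220.221  3    /database/opengauss/cm/cm_server Standby

[  Datanode State   ]

1  Euler1  172.16.220.45   6001 /database/opengauss/data P Primary Normal
2  Euler2 172.16.220.201  6002 /database/opengauss/data S Standby Normal
3  Euler3 172.16.220.221  6003 /database/opengauss/data S Standby Normal
"""


def primary_nodes(status_text):
    """Map each section name to the node whose row reports Primary."""
    result = {}
    section = None
    for line in status_text.splitlines():
        header = re.match(r"\s*\[\s*(.+?)\s*\]\s*$", line)
        if header:
            section = header.group(1)
            continue
        fields = line.split()
        # Data rows start with a numeric node id; the role follows the path.
        if section and len(fields) >= 6 and fields[0].isdigit():
            if "Primary" in fields[5:]:
                result[section] = fields[1]
    return result


found = primary_nodes(SAMPLE)
print(found["CMServer State"] == found["Datanode State"])  # False
```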

5. Switching CM

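So far I have not found a command that switches the CM primary by hand; cm_ctl is still a command for switching the database primary and standby. CM is only the failover component, so the mismatch has no functional impact. For perfectionists, there are two options: move all primary roles onto the same node, or reproduce the error above on node 2 and let CM switch automatically.
Since triggering a failover produces the desired switch, killing a process will trigger it just as well. Here that process is cm_server on node 2.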

[omm@Euler2 ~]$ ps ux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
omm         1627  0.0  0.0  20056  9968 ?        Ss   Jun14   3:42 /usr/lib/systemd/systemd --user
omm         1629  0.0  0.0  24124  2812 ?        S    Jun14   0:00 (sd-pam)
omm         2422  0.4  0.0  18564 13608 ?        S    Jun14 172:21 /database/opengauss/app/bin/om_monitor -L /database/opengauss/log/omm/cm/om_monitor
omm      2396593  0.0  0.0 214116  3676 pts/0    S+   11:53   0:00 -bash
omm      2611562  2.0  0.0 799340 24364 ?        Sl   14:18   4:04 /database/opengauss/app/bin/cm_agent
omm      2611575  4.7  0.3 6505600 400692 ?      Sl   14:18   9:23 /database/opengauss/app/bin/cm_server
omm      2611586  3.0  2.8 47986344 3774748 ?    Sl   14:18   6:02 /database/opengauss/app/bin/gaussdb -D /database/opengauss/data -M pending
omm      2611593  0.0  0.0 1401088 75952 ?       Sl   14:18   0:00 gaussdb fenced UDF master process
omm      2908296  0.0  0.0  15020  4772 ?        S    17:37   0:00 sshd: omm@pts/1
omm      2908297  0.0  0.0 214088  3752 pts/1    Ss   17:37   0:00 -bash
omm      2908380  0.0  0.0 215868  3228 pts/1    R+   17:37   0:00 ps ux
[omm@Euler2 ~]$ kill -9 2611575

On node 2, find the cm_server process and kill it. After the kill, om brings it back up automatically.
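Picking the PID out of the `ps ux` listing can also be scripted. This is a minimal sketch that assumes the 11-column `ps ux` layout shown above; `find_pids` is a hypothetical helper. It only locates the PID and does not send any signal.

```python
SAMPLE = """USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
omm         2422  0.4  0.0  18564 13608 ?        S    Jun14 172:21 /database/opengauss/app/bin/om_monitor -L /database/opengauss/log/omm/cm/om_monitor
omm      2611562  2.0  0.0 799340 24364 ?        Sl   14:18   4:04 /database/opengauss/app/bin/cm_agent
omm      2611575  4.7  0.3 6505600 400692 ?      Sl   14:18   9:23 /database/opengauss/app/bin/cm_server
omm      2611586  3.0  2.8 47986344 3774748 ?    Sl   14:18   6:02 /database/opengauss/app/bin/gaussdb -D /database/opengauss/data -M pending
"""


def find_pids(ps_output, binary_name):
    """Return PIDs whose executable basename equals binary_name."""
    pids = []
    for line in ps_output.splitlines():
        # Ten fixed columns, then COMMAND (which may itself contain spaces).
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        executable = fields[10].split()[0]
        if executable.rsplit("/", 1)[-1] == binary_name:
            pids.append(int(fields[1]))
    return pids


print(find_pids(SAMPLE, "cm_server"))  # [2611575]
```

Matching on the executable's basename rather than a substring avoids picking up unrelated rows, such as om_monitor, whose log path contains `cm`.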

[omm@Euler2 ~]$ gs_om -t status --detail
[  CMServer State   ]

node                node_ip         instance                             state
--------------------------------------------------------------------------------
1  Euler1  172.16.220.45   1    /database/opengauss/cm/cm_server Primary
2  Euler2 172.16.220.201  2    /database/opengauss/cm/cm_server Standby
3  Euler3 172.16.220.221  3    /database/opengauss/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node                node_ip         instance                     state            
----------------------------------------------------------------------------------
1  Euler1  172.16.220.45   6001 /database/opengauss/data P Primary Normal
2  Euler2 172.16.220.201  6002 /database/opengauss/data S Standby Normal
3  Euler3 172.16.220.221  6003 /database/opengauss/data S Standby Normal

The CM primary and standby have now changed: the CMServer Primary is back on node 1.




Reposted from the official account: openGauss

Last modified 2023-8-29 15:40:25