[My openGauss Story] openGauss GAUSS-51400/53600: Handling Other Nodes Stuck in the Unknown State

老老老JR老北
Published on 2023-8-29 15:36

1. Check the status

[omm@Euler1 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Unavailable
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node  node_ip         port      instance                state
---------------------------------------------------------------------------------
1  Euler1 172.16.220.151  26000      6001 /gauss/data/db1   P Down    Manually stopped
2  Euler2 172.16.220.152  26000      6002 /gauss/data/db1   S Unknown Unknown
3  Euler3 172.16.220.153  26000      6003 /gauss/data/db1   C Unknown Unknown
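
When the cluster has more than a handful of nodes, it can help to pull the unhealthy instances out of this table automatically. The following is a minimal sketch that parses the datanode rows exactly as they are printed above. The regular expression and the function names are my own assumptions, not part of the openGauss tooling, and other versions may lay the columns out differently.

```python
import re

# One row of the "Datanode State" table as printed in this article.
# Assumption: the column layout matches the output shown above.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\S+)\s+([PSC])\s+(\S+)\s+(.*\S)\s*$"
)

def parse_datanodes(status_text):
    """Return one dict per datanode row found in the status text."""
    rows = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "node": m.group(2),
                "ip": m.group(3),
                "instance": m.group(5),
                "configured": m.group(7),  # the P / S / C letter
                "role": m.group(8),        # e.g. Primary, Down, Unknown
                "state": m.group(9),       # e.g. Normal, Manually stopped
            })
    return rows

def unhealthy(rows):
    """Return the names of nodes whose state is anything other than Normal."""
    return [r["node"] for r in rows if r["state"] != "Normal"]

SAMPLE = """\
    node  node_ip         port      instance                state
---------------------------------------------------------------------------------
1  Euler1 172.16.220.151  26000      6001 /gauss/data/db1   P Down    Manually stopped
2  Euler2 172.16.220.152  26000      6002 /gauss/data/db1   S Unknown Unknown
3  Euler3 172.16.220.153  26000      6003 /gauss/data/db1   C Unknown Unknown
"""

if __name__ == "__main__":
    print(unhealthy(parse_datanodes(SAMPLE)))
```

For the output above, all three nodes are reported, because none of them is in the Normal state.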

2. GAUSS-51400

[omm@Euler1 ~]$ gs_om -t start 
Starting cluster.
=========================================
omm@euler2's password: 
[GAUSS-51400] : Failed to execute the command: scp Euler3:/gauss/app_5b3e5810/bin/cluster_dynamic_config /gauss/app_5b3e5810/bin/cluster_dynamic_config_Euler3. Error:
ssh: connect to host euler3 port 22: No route to host

The Euler3 host is the problem: a check showed that the machine had not booted properly, so I rebooted it.
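
The "No route to host" error on port 22 can be confirmed before running gs_om again. Here is a small sketch of a TCP reachability probe using only the Python standard library. The function name and the default timeout are illustrative choices of mine, and the check only shows that something accepts connections on the port, not that sshd is healthy.

```python
import socket

def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and connects; any failure
        # (no route, refused, timeout, unknown host) is raised as an OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (host names taken from this article):
#   for host in ("Euler1", "Euler2", "Euler3"):
#       print(host, port_open(host))
```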

3. GAUSS-53600/51400

Starting the cluster again now fails with GAUSS-53600/51400:

[omm@Euler1 ~]$ gs_om -t start 
Starting cluster.
=========================================
omm@euler2's password: 
omm@euler3's password: 
[SUCCESS] Euler1
2023-07-11 16:33:56.783 64ad13f4.1 [unknown] 140702557879360 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2023-07-11 16:33:56.785 64ad13f4.1 [unknown] 140702557879360 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (16 Mbytes) or shared memory (1000 Mbytes) is larger.
=========================================
[GAUSS-53600]: Can not start the database, the cmd is source /home/omm/.bashrc; python3 '/gauss/om/script/local/StartInstance.py' -U omm -R /gauss/app -t 300 --security-mode=off,  Error:
[GAUSS-51400] : Failed to execute the command: source /home/omm/.bashrc; python3 '/gauss/om/script/local/StartInstance.py' -U omm -R /gauss/app -t 300 --security-mode=off. Error:
[FAILURE] Euler2:
.[GAUSS-51400] : Failed to execute the command: source /home/omm/.bashrc; python3 '/gauss/om/script/local/StartInstance.py' -U omm -R /gauss/app -t 300 --security-mode=off. Error:
[FAILURE] Euler3:

There is a problem running the script, and Python is proving unreliable. Shut the node down and investigate.

[omm@Euler1 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Degraded
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node  node_ip         port      instance                state
---------------------------------------------------------------------------------
1  Euler1 172.16.220.151  26000      6001 /gauss/data/db1   P Primary Normal
2  Euler2 172.16.220.152  26000      6002 /gauss/data/db1   S Unknown Unknown
3  Euler3 172.16.220.153  26000      6003 /gauss/data/db1   C Unknown Unknown
[omm@Euler1 ~]$ gs_ctl stop -D /gauss/data/db1
[2023-07-11 16:35:58.075][39021][][gs_ctl]: gs_ctl stopped ,datadir is /gauss/data/db1 
waiting for server to shut down......... done

[omm@Euler1 ~]$ python
Python 3.7.4 (default, Mar  3 2022, 14:19:16) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
[omm@Euler1 ~]$ 
[omm@Euler1 ~]$ 
[omm@Euler1 ~]$ which python
/usr/bin/python
[omm@Euler1 ~]$ cd /usr/bin/
[root@Euler1 bin]# ls -lsa python
0 lrwxrwxrwx 1 root root 7 Jul  4 16:33 python -> python3
[root@Euler1 bin]# rm python
rm: remove symbolic link 'python'? y

[root@Euler1 bin]# ln -s python2.7 python

Remove the symlink and point python at python2 instead.
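
The commands that gs_om prints in this article call `python3` by name, so it may be more useful to see what each interpreter name resolves to than to repoint the `python` symlink. A minimal sketch using only the standard library follows. The helper name and the list of candidate names are my own, chosen for illustration.

```python
import os
import shutil

def resolve_command(name, path=None):
    """Return (path found on PATH, real path after following symlinks), or None.

    `path` is an optional search path in the same format as $PATH;
    when it is None, the current environment's PATH is used.
    """
    found = shutil.which(name, path=path)
    if found is None:
        return None
    return found, os.path.realpath(found)

if __name__ == "__main__":
    for candidate in ("python", "python2", "python3"):
        print(candidate, "->", resolve_command(candidate))
```

Run this as the same user that runs gs_om (omm in this article), since PATH may differ between users.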

[omm@Euler1 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Degraded
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node  node_ip         port      instance                state
---------------------------------------------------------------------------------
1  Euler1 172.16.220.151  26000      6001 /gauss/data/db1   P Primary Normal
2  Euler2 172.16.220.152  26000      6002 /gauss/data/db1   S Unknown Unknown
3  Euler3 172.16.220.153  26000      6003 /gauss/data/db1   C Unknown Unknown
[omm@Euler1 ~]$ gs_om -t stop
Stopping cluster.
=========================================
[GAUSS-53606]: Can not stop the database, the cmd is source /home/omm/.bashrc; python3 '/gauss/om/script/local/StopInstance.py' -U omm -R /gauss/app -t 300 -m fast,  Error:
[GAUSS-51400] : Failed to execute the command: source /home/omm/.bashrc; python3 '/gauss/om/script/local/StopInstance.py' -U omm -R /gauss/app -t 300 -m fast. Error:
[FAILURE] Euler1:
[FAILURE] Euler2:
[FAILURE] Euler3:
..
[omm@Euler1 ~]$ ls -lsa /gauss/om/script/local/StopInstance.py
8 -rwx------ 1 omm dbgrp 4719 Nov 12  2022 /gauss/om/script/local/StopInstance.py
[omm@Euler1 ~]$ chmod 777 /gauss/om/script/local/StopInstance.py
[omm@Euler1 ~]$ gs_om -t stop
Stopping cluster.
=========================================
[GAUSS-53606]: Can not stop the database, the cmd is source /home/omm/.bashrc; python3 '/gauss/om/script/local/StopInstance.py' -U omm -R /gauss/app -t 300 -m fast,  Error:
[GAUSS-51400] : Failed to execute the command: source /home/omm/.bashrc; python3 '/gauss/om/script/local/StopInstance.py' -U omm -R /gauss/app -t 300 -m fast. Error:
[FAILURE] Euler1:
[FAILURE] Euler2:
[FAILURE] Euler3:

Stopping the cluster again still fails, and it keeps failing after the file permissions are changed.

[omm@Euler1 ~]$ ls -lsa /gauss/om/script/local/StopInstance.py
8 -rwxrwxrwx 1 omm dbgrp 4719 Nov 12  2022 /gauss/om/script/local/StopInstance.py
[omm@Euler1 ~]$ python3 /gauss/om/script/local/StopInstance.py -U omm -R /gauss/app -t 300 -m fast

[omm@Euler1 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Unavailable
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node  node_ip         port      instance                state
---------------------------------------------------------------------------------
1  Euler1 172.16.220.151  26000      6001 /gauss/data/db1   P Down    Manually stopped
2  Euler2 172.16.220.152  26000      6002 /gauss/data/db1   S Unknown Unknown
3  Euler3 172.16.220.153  26000      6003 /gauss/data/db1   C Unknown Unknown

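Running the script by hand works, so the script itself is fine. My first thought was that python3 had to be swapped for python2, but the command that just succeeded was run with python3, so the interpreter is probably not the cause. The cluster can be started, but only by typing in the omm password for the other nodes, which suggests that SSH mutual trust has broken; the state of the other nodes is also Unknown across the board.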
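
A password prompt during gs_om is a strong hint that key-based login is no longer working. One way to test this without being prompted is to run ssh in batch mode, where it fails instead of asking for a password. The sketch below wraps that idea. The function names and the `runner` parameter are my own additions for illustration, and the check can also fail for other reasons, for example when the remote host key is not yet in known_hosts.

```python
import subprocess

def ssh_check_command(host, user="omm", timeout=5):
    """Build an ssh command line that fails rather than prompting."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never ask for a password or passphrase
        "-o", f"ConnectTimeout={timeout}",  # give up quickly on unreachable hosts
        f"{user}@{host}",
        "date",
    ]

def trust_ok(host, user="omm", timeout=5, runner=subprocess.run):
    """Return True if non-interactive ssh to user@host exits with status 0."""
    try:
        result = runner(
            ssh_check_command(host, user, timeout),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh binary missing, or the command hung past the deadline.
        return False
    return result.returncode == 0

# Example, run as omm on each node in turn (host names from this article):
#   for host in ("Euler1", "Euler2", "Euler3"):
#       print(host, trust_ok(host))
```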

4. Fix SSH mutual trust

Re-establish mutual trust. Here I use the sshUserSetup.sh script that ships with Oracle to set it up.

[root@Euler1 ~]# ./sshUserSetup.sh -user omm -hosts "Euler1 Euler2 Euler3" -advanced -exverify -confirm
The output of this script is also logged into /tmp/sshUserSetup_2023-07-11-16-51-24.log
Hosts are Euler1 Euler2 Euler3
user is omm
Platform:- Linux 
Checking if the remote hosts are reachable
PING Euler1 (172.16.220.151) 56(84) bytes of data.
64 bytes from Euler1 (172.16.220.151): icmp_seq=1 ttl=64 time=0.244 ms
64 bytes from Euler1 (172.16.220.151): icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from Euler1 (172.16.220.151): icmp_seq=3 ttl=64 time=0.043 ms
64 bytes from Euler1 (172.16.220.151): icmp_seq=4 ttl=64 time=0.024 ms
64 bytes from Euler1 (172.16.220.151): icmp_seq=5 ttl=64 time=0.036 ms

--- Euler1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4126ms
rtt min/avg/max/mdev = 0.024/0.075/0.244/0.084 ms
PING Euler2 (172.16.220.152) 56(84) bytes of data.
64 bytes from Euler2 (172.16.220.152): icmp_seq=1 ttl=64 time=0.369 ms
64 bytes from Euler2 (172.16.220.152): icmp_seq=2 ttl=64 time=0.313 ms
64 bytes from Euler2 (172.16.220.152): icmp_seq=3 ttl=64 time=0.210 ms
64 bytes from Euler2 (172.16.220.152): icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from Euler2 (172.16.220.152): icmp_seq=5 ttl=64 time=0.356 ms

--- Euler2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4093ms
rtt min/avg/max/mdev = 0.143/0.278/0.369/0.087 ms
PING Euler3 (172.16.220.153) 56(84) bytes of data.
64 bytes from Euler3 (172.16.220.153): icmp_seq=1 ttl=64 time=0.309 ms
64 bytes from Euler3 (172.16.220.153): icmp_seq=2 ttl=64 time=0.180 ms
64 bytes from Euler3 (172.16.220.153): icmp_seq=3 ttl=64 time=0.258 ms
64 bytes from Euler3 (172.16.220.153): icmp_seq=4 ttl=64 time=0.184 ms
64 bytes from Euler3 (172.16.220.153): icmp_seq=5 ttl=64 time=0.237 ms

--- Euler3 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4094ms
rtt min/avg/max/mdev = 0.180/0.233/0.309/0.048 ms
Remote host reachability check succeeded.
The following hosts are reachable: Euler1 Euler2 Euler3.
The following hosts are not reachable: .
All hosts are reachable. Proceeding further...
firsthost Euler1
numhosts 3
The script will setup SSH connectivity from the host Euler1 to all
the remote hosts. After the script is executed, the user can use SSH to run
commands on the remote hosts or copy files between this host Euler1
and the remote hosts without being prompted for passwords or confirmations.

NOTE 1:
As part of the setup procedure, this script will use ssh and scp to copy
files between the local host and the remote hosts. Since the script does not
store passwords, you may be prompted for the passwords during the execution of
the script whenever ssh or scp is invoked.

NOTE 2:
AS PER SSH REQUIREMENTS, THIS SCRIPT WILL SECURE THE USER HOME DIRECTORY
AND THE .ssh DIRECTORY BY REVOKING GROUP AND WORLD WRITE PRIVILEDGES TO THESE
directories.

Do you want to continue and let the script make the above mentioned changes (yes/no)?
yes

The user chose yes
Please specify if you want to specify a passphrase for the private key this script will create for the local host. Passphrase is used to encrypt the private key and makes SSH much more secure. Type 'yes' or 'no' and then press enter. In case you press 'yes', you would need to enter the passphrase whenever the script executes ssh or scp. 
The estimated number of times the user would be prompted for a passphrase is 6. In addition, if the private-public files are also newly created, the user would have to specify the passphrase on one additional occasion. 
Enter 'yes' or 'no'.
yes

The user chose yes
The files containing the client public and private keys already exist on the local host. The current private key may or may not have a passphrase associated with it. In case you remember the passphrase and do not want to re-run ssh-keygen, press 'no' and enter. If you press 'no', the script will not attempt to create any new public/private key pairs. If you press 'yes', the script will remove the old private/public key files existing and create new ones prompting the user to enter the passphrase. If you enter 'yes', any previous SSH user setups would be reset. If you press 'change', the script will associate a new passphrase with the old keys.
Press 'yes', 'no' or 'change'
yes
The user chose yes
Creating .ssh directory on local host, if not present already
Creating authorized_keys file on local host
Changing permissions on authorized_keys to 644 on local host
Creating known_hosts file on local host
Changing permissions on known_hosts to 644 on local host
Creating config file on local host
If a config file exists already at /root/.ssh/config, it would be backed up to /root/.ssh/config.backup.
Removing old private/public keys on local host
Running SSH keygen on local host
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:IkRGcQs/1NPx/5lOqkpvd5rMgqBq9sWYO11HCpu2dGQ root@Euler1
The key's randomart image is:
+---[RSA 1024]----+
|   .*.o. ...     |
|   o = .o ..     |
|    . +  .  .    |
|   .   o E . .   |
|    . . S o   .  |
|     .+B + .   .o|
|     o=o=.o    +.|
|   o ooo...oo.=. |
|  o.oo.  .oo+*o. |
+----[SHA256]-----+
Creating .ssh directory and setting permissions on remote host Euler1
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR omm. THIS IS AN SSH REQUIREMENT.
The script would create ~omm/.ssh/config file on remote host Euler1. If a config file exists already at ~omm/.ssh/config, it would be backed up to ~omm/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host Euler1.
Warning: Permanently added 'euler1,172.16.220.151' (ECDSA) to the list of known hosts.
omm@euler1's password: 
Done with creating .ssh directory and setting permissions on remote host Euler1.
Creating .ssh directory and setting permissions on remote host Euler2
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR omm. THIS IS AN SSH REQUIREMENT.
The script would create ~omm/.ssh/config file on remote host Euler2. If a config file exists already at ~omm/.ssh/config, it would be backed up to ~omm/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host Euler2.
Warning: Permanently added 'euler2,172.16.220.152' (ECDSA) to the list of known hosts.
omm@euler2's password: 
Done with creating .ssh directory and setting permissions on remote host Euler2.
Creating .ssh directory and setting permissions on remote host Euler3
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR omm. THIS IS AN SSH REQUIREMENT.
The script would create ~omm/.ssh/config file on remote host Euler3. If a config file exists already at ~omm/.ssh/config, it would be backed up to ~omm/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host Euler3.
Warning: Permanently added 'euler3,172.16.220.153' (ECDSA) to the list of known hosts.
omm@euler3's password: 
Done with creating .ssh directory and setting permissions on remote host Euler3.
Copying local host public key to the remote host Euler1
The user may be prompted for a password or passphrase here since the script would be using SCP for host Euler1.
omm@euler1's password: 
Done copying local host public key to the remote host Euler1
Copying local host public key to the remote host Euler2
The user may be prompted for a password or passphrase here since the script would be using SCP for host Euler2.
omm@euler2's password: 
Done copying local host public key to the remote host Euler2
Copying local host public key to the remote host Euler3
The user may be prompted for a password or passphrase here since the script would be using SCP for host Euler3.
omm@euler3's password: 
Done copying local host public key to the remote host Euler3
Creating keys on remote host Euler1 if they do not exist already. This is required to setup SSH on host Euler1.
Generating public/private rsa key pair.
Your identification has been saved in .ssh/id_rsa
Your public key has been saved in .ssh/id_rsa.pub
The key fingerprint is:
SHA256:/dVBP3a7CaZGbr0BUJ2ImHi+f4TMRnwdM6vG4oZq+lo omm@Euler1
The key's randomart image is:
+---[RSA 1024]----+
|      . o ..o .. |
|     . + ... *. .|
|      o ..  . =+o|
|       . +.. o..=|
|        S =o.o...|
|       . *o*=.. o|
|     E  = +=.o o |
|    .. . +o.  o  |
|   o=o. . .  .   |
+----[SHA256]-----+
Creating keys on remote host Euler2 if they do not exist already. This is required to setup SSH on host Euler2.
Generating public/private rsa key pair.
Your identification has been saved in .ssh/id_rsa
Your public key has been saved in .ssh/id_rsa.pub
The key fingerprint is:
SHA256:cXWZSBiArd+L338IqBBH4Brj+gL+Vq0UZTefF3/j18M omm@Euler2
The key's randomart image is:
+---[RSA 1024]----+
|      .+...+o..o |
|     ..ooo...oo  |
|    o +oo + . o  |
|   . =o .o o . o.|
|    o o+S. .. o +|
|.  . o... o .  Eo|
|... o .. o . . .o|
| ..o .  o ..  . .|
|  oo.    .. .... |
+----[SHA256]-----+
Creating keys on remote host Euler3 if they do not exist already. This is required to setup SSH on host Euler3.
Generating public/private rsa key pair.
Your identification has been saved in .ssh/id_rsa
Your public key has been saved in .ssh/id_rsa.pub
The key fingerprint is:
SHA256:vxBPdYALuU88XOamxa/GrMaOd+oWgXo+PCEOMrWU3/U omm@Euler3
The key's randomart image is:
+---[RSA 1024]----+
|         . ..    |
|        o . o.   |
|     .   * *. .  |
|    +   o O.=.   |
|   o o oS+.B .   |
|  o o + +=+ E .  |
|   o o =.oo+ .   |
|      . =o=.*    |
|        .O*=     |
+----[SHA256]-----+
Updating authorized_keys file on remote host Euler1
Updating known_hosts file on remote host Euler1
The script will run SSH on the remote machine Euler1. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Updating authorized_keys file on remote host Euler2
Updating known_hosts file on remote host Euler2
The script will run SSH on the remote machine Euler2. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Updating authorized_keys file on remote host Euler3
Updating known_hosts file on remote host Euler3
The script will run SSH on the remote machine Euler3. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
SSH setup is complete.

------------------------------------------------------------------------
Verifying SSH setup
===================
The script will now run the date command on the remote nodes using ssh
to verify if ssh is setup correctly. IF THE SETUP IS CORRECTLY SETUP,
THERE SHOULD BE NO OUTPUT OTHER THAN THE DATE AND SSH SHOULD NOT ASK FOR
PASSWORDS. If you see any output other than date or are prompted for the
password, ssh is not setup correctly and you will need to resolve the
issue and set up ssh again.
The possible causes for failure could be:
1. The server settings in /etc/ssh/sshd_config file do not allow ssh
for user omm.
2. The server may have disabled public key based authentication.
3. The client public key on the server may be outdated.
4. ~omm or ~omm/.ssh on the remote host may not be owned by omm.
5. User may not have passed -shared option for shared remote users or
may be passing the -shared option for non-shared remote users.
6. If there is output in addition to the date, but no password is asked,
it may be a security alert shown as part of company policy. Append the
additional text to the <OMS HOME>/sysman/prov/resources/ignoreMessages.txt file.
------------------------------------------------------------------------
--Euler1:--
Running /usr/bin/ssh -x -l omm Euler1 date to verify SSH connectivity has been setup from local host to Euler1.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
The script will run SSH on the remote machine Euler1. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Tue Jul 11 16:51:58 CST 2023
------------------------------------------------------------------------
--Euler2:--
Running /usr/bin/ssh -x -l omm Euler2 date to verify SSH connectivity has been setup from local host to Euler2.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
The script will run SSH on the remote machine Euler2. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Tue Jul 11 16:51:58 CST 2023
------------------------------------------------------------------------
--Euler3:--
Running /usr/bin/ssh -x -l omm Euler3 date to verify SSH connectivity has been setup from local host to Euler3.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
The script will run SSH on the remote machine Euler3. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Tue Jul 11 16:51:58 CST 2023
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from Euler1 to Euler1
------------------------------------------------------------------------
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Tue Jul 11 16:51:59 CST 2023
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from Euler1 to Euler2
------------------------------------------------------------------------
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Tue Jul 11 16:51:59 CST 2023
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from Euler1 to Euler3
------------------------------------------------------------------------
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Tue Jul 11 16:52:00 CST 2023
------------------------------------------------------------------------
-Verification from Euler1 complete-
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from Euler2 to Euler1
------------------------------------------------------------------------
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Tue Jul 11 16:52:00 CST 2023
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from Euler2 to Euler2
------------------------------------------------------------------------
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Tue Jul 11 16:52:00 CST 2023
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from Euler2 to Euler3
------------------------------------------------------------------------
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Tue Jul 11 16:52:01 CST 2023
------------------------------------------------------------------------
-Verification from Euler2 complete-
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from Euler3 to Euler1
------------------------------------------------------------------------
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Tue Jul 11 16:52:02 CST 2023
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from Euler3 to Euler2
------------------------------------------------------------------------
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Tue Jul 11 16:52:02 CST 2023
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from Euler3 to Euler3
------------------------------------------------------------------------
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Tue Jul 11 16:52:02 CST 2023
------------------------------------------------------------------------
-Verification from Euler3 complete-
SSH verification complete.

5. Check the status

[omm@Euler1 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Unavailable
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node  node_ip         port      instance                state
---------------------------------------------------------------------------------
1  Euler1 172.16.220.151  26000      6001 /gauss/data/db1   P Down    Manually stopped
2  Euler2 172.16.220.152  26000      6002 /gauss/data/db1   S Standby Need repair(Connecting)
3  Euler3 172.16.220.153  26000      6003 /gauss/data/db1   C Standby Need repair(Connecting)

The standbys are now recovering; give them a little time.

[omm@Euler1 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node  node_ip         port      instance                state
---------------------------------------------------------------------------------
1  Euler1 172.16.220.151  26000      6001 /gauss/data/db1   P Primary Normal
2  Euler2 172.16.220.152  26000      6002 /gauss/data/db1   S Standby Normal
3  Euler3 172.16.220.153  26000      6003 /gauss/data/db1   C Standby Normal

Euler3 shows a state of C Standby Normal, but it should be Cascade: the role is wrong.
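
This kind of mismatch is easy to miss by eye, because the cluster state still reads Normal. As a rough sketch, the leading letter of each row can be compared with the role the instance reports. The mapping below is inferred from the outputs in this article (P with Primary, S with Standby, C with Cascade) and is an assumption on my part. After a deliberate switchover the letter and the role may differ legitimately.

```python
# Assumed mapping from the configured-role letter to the reported role name,
# inferred from the status tables shown in this article.
EXPECTED_ROLE = {"P": "Primary", "S": "Standby", "C": "Cascade"}

def role_mismatches(rows):
    """Return (node, expected_role, reported_role) for every row that disagrees.

    rows: iterable of (node, configured_letter, reported_role) tuples.
    """
    return [
        (node, EXPECTED_ROLE.get(letter), role)
        for node, letter, role in rows
        if EXPECTED_ROLE.get(letter) != role
    ]

if __name__ == "__main__":
    status = [
        ("Euler1", "P", "Primary"),
        ("Euler2", "S", "Standby"),
        ("Euler3", "C", "Standby"),
    ]
    print(role_mismatches(status))
```

With the values from the table above, only Euler3 is flagged: Cascade is expected, but Standby is reported.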

6. Fix the role

[omm@Euler3 ~]$ gs_ctl stop -D /gauss/data/db1
[2023-07-11 17:13:56.050][73024][][gs_ctl]: gs_ctl stopped ,datadir is /gauss/data/db1 
waiting for server to shut down.... done
server stopped

Log in to node 3 and stop the database on that node.

[omm@Euler3 ~]$ gs_ctl start -D /gauss/data/db1 -M cascade_standby
[2023-07-11 17:14:27.234][73207][][gs_ctl]: gs_ctl started,datadir is /gauss/data/db1 
[2023-07-11 17:14:27.257][73207][][gs_ctl]: waiting for server to start...
.0 LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

0 LOG:  [Alarm Module]Host Name: Euler3 

0 LOG:  [Alarm Module]Host IP: 172.16.220.153 

0 LOG:  [Alarm Module]Cluster Name: gscluster 

0 LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57

0 WARNING:  failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING:  failed to parse feature control file: gaussdb.version.
0 WARNING:  Failed to load the product control file, so gaussdb cannot distinguish product version.
The core dump path is an invalid directory
2023-07-11 17:14:27.310 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 DB010  0 [REDO] LOG:  Recovery parallelism, cpu count = 1, max = 4, actual = 1
2023-07-11 17:14:27.310 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 DB010  0 [REDO] LOG:  ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2023-07-11 17:14:27.314 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

2023-07-11 17:14:27.314 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Host Name: Euler3 

2023-07-11 17:14:27.314 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Host IP: 172.16.220.153 

2023-07-11 17:14:27.314 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Cluster Name: gscluster 

2023-07-11 17:14:27.314 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57

2023-07-11 17:14:27.316 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  loaded library "security_plugin"
2023-07-11 17:14:27.317 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2023-07-11 17:14:27.318 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2023-07-11 17:14:27.318 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (16 Mbytes) or shared memory (1000 Mbytes) is larger.
2023-07-11 17:14:27.329 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [CACHE] LOG:  set data cache  size(12582912)
2023-07-11 17:14:27.330 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [CACHE] LOG:  set metadata cache  size(4194304)
2023-07-11 17:14:27.415 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [SEGMENT_PAGE] LOG:  Segment-page constants: DF_MAP_SIZE: 8156, DF_MAP_BIT_CNT: 65248, DF_MAP_GROUP_EXTENTS: 4175872, IPBLOCK_SIZE: 8168, EXTENTS_PER_IPBLOCK: 1021, IPBLOCK_GROUP_SIZE: 4090, BMT_HEADER_LEVEL0_TOTAL_PAGES: 8323072, BktMapEntryNumberPerBlock: 2038, BktMapBlockNumber: 25, BktBitMaxMapCnt: 512
2023-07-11 17:14:27.447 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  gaussdb: fsync file "/gauss/data/db1/gaussdb.state.temp" success
2023-07-11 17:14:27.447 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  create gaussdb state file success: db state(STARTING_STATE), server mode(Cascade Standby), connection index(1)
2023-07-11 17:14:27.447 64ad1d73.1 [unknown] 139936913943616 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  max_safe_fds = 974, usable_fds = 1000, already_open = 16
The core dump path is an invalid directory
.
[2023-07-11 17:14:29.268][73207][][gs_ctl]:  done
[2023-07-11 17:14:29.268][73207][][gs_ctl]: server started (/gauss/data/db1)

Restart the instance in cascade_standby mode.

[omm@Euler3 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node  node_ip         port      instance                state
---------------------------------------------------------------------------------
1  Euler1 172.16.220.151  26000      6001 /gauss/data/db1   P Primary Normal
2  Euler2 172.16.220.152  26000      6002 /gauss/data/db1   S Standby Normal
3  Euler3 172.16.220.153  26000      6003 /gauss/data/db1   C Cascade Normal

The state is now correct.

[omm@Euler3 ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.

Save the state by regenerating the dynamic configuration file.

[omm@Euler1 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node  node_ip         port      instance                state
---------------------------------------------------------------------------------
1  Euler1 172.16.220.151  26000      6001 /gauss/data/db1   P Primary Normal
2  Euler2 172.16.220.152  26000      6002 /gauss/data/db1   S Standby Normal
3  Euler3 172.16.220.153  26000      6003 /gauss/data/db1   C Cascade Normal
[omm@Euler1 ~]$

Log in to node 1 and verify the cluster state: everything is normal.

7. Summary

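SSH mutual trust for the omm user is essential in an openGauss cluster; when it breaks, cluster operations fail with errors like the ones above. openGauss operations also depend heavily on Python. Because Python versions differ considerably and backward compatibility is poor, set up the Python environment on every host at installation time.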





This article is reposted from the WeChat official account: openGauss

Last modified on 2023-8-29 15:36:50