Contents

Preface
I. Cause of the DMASMSVR service startup failure
II. Cleaning up the failed DSC node
Summary

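Preface

After building a two-node DSC cluster (I'll call the two nodes A and B), I tried to add a third node, C, dynamically. Starting the dmcss service on node C ran into problems, so I reverted the VM snapshots of A and B to the freshly built two-node DSC environment. When I then restarted the dmasmsvr service, it failed with the error shown below: it could not find the ASM2 information that belongs to node C. Yet after the snapshot restore I checked every configuration file, and they contained only ASM0 and ASM1; there was no ASM2 anywhere. So where did ASM2 come from, and why would a restored snapshot read information that my files no longer contain? This note records how I tracked down the fault and got the original two-node DSC environment to start again, as a reference for anyone who hits the same problem.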

[root@czk1 bin]# ./DmASMSvrServicesvr start

Starting DmASMSvrServicesvr: Last login: Mon May 16 11:21:34 CST 2022

[ FAILED ]

instance(ASM2) mal config not found in /opt/dmdbms/data/DAMENG/dmasvrmal.ini

mal cfg sys init error, code:[-9501], desc:[MAL sys has not configured or server is not enterprise version].

Below is the information reported by the dmcssm monitor. What we need to do is remove the CSS2, ASM2, and DSC2 entries.

ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts

2022-05-16 13:02:21 CSS0 0 9341 Control Node OPEN WORKING OK TRUE 1054675793 1054676229

2022-05-16 13:02:21 CSS1 1 9343 Normal Node OPEN WORKING OK TRUE 1054684694 1054685105

2022-05-16 13:02:21 CSS2 2 9344 Normal Node SHUTDOWN UNKNOWN OK FALSE 0 0

=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================

n_ok_ep = 2

ok_ep_arr(index, seqno):

(0, 0)

(1, 1)

sta = OPEN, sub_sta = STARTUP

break ep = NULL

recover ep = NULL

crash process over flag is TRUE

ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts

2022-05-16 13:02:21 ASM0 0 9349 Control Node OPEN WORKING OK TRUE 1054791798 1054791865

2022-05-16 13:02:21 ASM1 1 9351 Normal Node OPEN WORKING OK TRUE 1054798516 1054798565

2022-05-16 13:02:21 ASM2 2 9352 Normal Node SHUTDOWN UNKNOWN ERROR FALSE 0 0

=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 255] ========================================

n_ok_ep = 3

ok_ep_arr(index, seqno):

(0, 0)

(1, 1)

(2, 2)

sta = OPEN, sub_sta = STARTUP

break ep = NULL

recover ep = NULL

crash process over flag is FALSE

ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts

2022-05-16 13:02:21 DSC0 0 5236 Normal Node OPEN WORKING OK FALSE 807630687 807634940

2022-05-16 13:02:21 DSC1 1 5237 Normal Node OPEN WORKING OK FALSE 807542710 807547246

2022-05-16 13:02:21 DSC2 2 5238 Normal Node SHUTDOWN UNKNOWN OK FALSE 0 0

==================================================================================================================
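As a rough aid for reading monitor output like the listing above, the following Python sketch picks out endpoint rows that report SHUTDOWN together with an UNKNOWN vote-disk status, which is how CSS2, ASM2, and DSC2 appear here. The function name `stale_endpoints` and the fixed 13-column row layout are assumptions taken from this particular output, not a documented dmcssm format.

```python
def stale_endpoints(monitor_text):
    """Return endpoint names whose row shows SHUTDOWN with vtd_status UNKNOWN."""
    stale = []
    for line in monitor_text.splitlines():
        parts = line.split()
        # Assumed row layout (13 tokens): date, time, inst_name, seqno, port,
        # mode (two words), inst_status, vtd_status, is_ok, active, guid, ts.
        if len(parts) != 13:
            continue
        name, inst_status, vtd_status = parts[2], parts[7], parts[8]
        if inst_status == "SHUTDOWN" and vtd_status == "UNKNOWN":
            stale.append(name)
    return stale
```

A row for an instance that is merely stopped but still known to the cluster (SHUTDOWN with vtd_status WORKING) is deliberately not reported.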

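I. Cause of the DMASMSVR service startup failure

The fault shows up after the snapshot restore because the dynamic node expansion I attempted earlier had already written the new node's information to the shared disk. A VM snapshot restore only reverts files at the operating-system level, so the local configuration files are all correct, but the shared disk still holds the leftover entries. When the service starts, it reads the cluster information from the shared disk, finds the stale entries, loads them, and then fails. The steps below remove the failed node's information from the DSC cluster.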

II. Cleaning up the failed DSC node

1. Start the dmasmcmd tool and export the DCR disk information of the current DSC cluster to dmdcr_cfg_bak.ini

[root@czk bin]# ./dmasmcmd

DMASMCMD V8

ASM>export dcrdisk '/dev/raw/raw1' to '/opt/dmdbms/data/DAMENG/dmdcr_cfg_bak.ini'

ASMCMD export DCRDISK success.

Used time: 6.069(ms).
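Once the DCR contents have been exported, the mismatch behind the startup error can be checked mechanically. The sketch below compares the ASM endpoint names in the export with the instance names in the local MAL file. It assumes that the MAL file lists instances with a `MAL_INST_NAME = ...` key and that ASM endpoints have names starting with `ASM`; both are assumptions for illustration, and the helper names are hypothetical.

```python
import re


def names_for_key(text, key):
    """Collect the value of every `key = value` line in an INI-like text."""
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=\s*(\S+)", re.MULTILINE)
    return pattern.findall(text)


def missing_mal_entries(dcr_export_text, mal_text):
    """ASM endpoints present in the DCR export but absent from the MAL file."""
    asm_names = [
        name
        for name in names_for_key(dcr_export_text, "DCR_EP_NAME")
        if name.startswith("ASM")  # assumption: ASM endpoints are named ASM<n>
    ]
    mal_names = set(names_for_key(mal_text, "MAL_INST_NAME"))  # assumed key name
    return [name for name in asm_names if name not in mal_names]
```

For the situation described here, the export would still contain ASM2 while the restored local file only knows ASM0 and ASM1, so the function would return the leftover name.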

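2. Start the dmasmtool tool and delete the log files that were added earlier. DSC2_log01.log and DSC2_log02.log are the logs I created while expanding the cluster, so they are deleted here. (If you also configured archiving, remember to remove that in the tool as well; I had not, so deleting the logs was enough.)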

[root@czk bin]# ./dmasmtool dcr_ini=/opt/dmdbms/data/DAMENG/dmdcr.ini

DMASMTOOL V8

ASM>ls

file : dsc0_log01.log

file : dsc0_log02.log

file : dsc1_log01.log

file : dsc1_log02.log

file : DSC2_log01.log

file : DSC2_log02.log

total count 6.

Used time: 5.116(ms).

ASM>rm -rf DSC2_log01.log

Used time: 4.959(ms).

ASM>rm -rf DSC2_log02.log

Used time: 5.512(ms).

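Note: dmasmtool can only connect while the dmcss and dmasmsvr services are running normally; otherwise it reports an ASM connection error (code -11041), as shown below. My dmasmsvr service was failing to start precisely because ASM2 could not be found, so I first added an ASM2 entry to dmasvrmal.ini, the file named in the error message, just to get the service up.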

[root@czk bin]# ./dmasmtool dcr_ini=/opt/dmdbms/data/DAMENG/dmdcr.ini

DMASMTOOL V8

[code : -11041] ASM连接异常
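To decide which ASM files belong to the removed node, the `ls` output from dmasmtool can be filtered by name prefix. The sketch below is a small, hypothetical helper that assumes the `file : <name>` line format seen in the listing in this step; it matches case-insensitively because that listing mixes `dsc0_...` and `DSC2_...`.

```python
def files_with_prefix(ls_output, prefix):
    """Return file names from `file : <name>` lines that start with prefix."""
    files = []
    for line in ls_output.splitlines():
        label, sep, name = line.partition(":")
        name = name.strip()
        # Only lines of the form "file : <name>" are considered.
        if sep and label.strip() == "file" and name.lower().startswith(prefix.lower()):
            files.append(name)
    return files
```

The function only reports candidates; deleting them is still done by hand in dmasmtool.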

3. Stop all services, including the database, css, and svr services. I simply killed the processes.

[root@czk bin]# netstat -ntulp

Active Internet connections (only servers)

Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name

tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1/systemd

tcp 0 0 192.168.122.1:53 0.0.0.0:* LISTEN 1897/dnsmasq

tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1396/sshd

tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1772/master

tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN 2138/sshd: root@pts

tcp6 0 0 :::4236 :::* LISTEN 1547/dmap

tcp6 0 0 :::7246 :::* LISTEN 2401/dmasmsvr

tcp6 0 0 :::111 :::* LISTEN 1/systemd

tcp6 0 0 :::80 :::* LISTEN 1405/httpd

tcp6 0 0 :::5236 :::* LISTEN 1542/dmserver

tcp6 0 0 :::22 :::* LISTEN 1396/sshd

tcp6 0 0 ::1:25 :::* LISTEN 1772/master

tcp6 0 0 ::1:6010 :::* LISTEN 2138/sshd: root@pts

tcp6 0 0 :::9341 :::* LISTEN 1554/dmcss

udp 0 0 0.0.0.0:5353 0.0.0.0:* 946/avahi-daemon: r

udp 0 0 0.0.0.0:1830 0.0.0.0:* 1217/dhclient

udp 0 0 0.0.0.0:53266 0.0.0.0:* 946/avahi-daemon: r

udp 0 0 192.168.122.1:53 0.0.0.0:* 1897/dnsmasq

udp 0 0 0.0.0.0:67 0.0.0.0:* 1897/dnsmasq

udp 0 0 0.0.0.0:68 0.0.0.0:* 1217/dhclient

udp 0 0 127.0.0.1:323 0.0.0.0:* 977/chronyd

udp6 0 0 :::25482 :::* 1217/dhclient

udp6 0 0 :::69 :::* 1/systemd

udp6 0 0 ::1:323 :::* 977/chronyd

[root@czk bin]# kill -9 1542 1554 2401
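Reading PIDs out of a long `netstat -ntulp` listing by eye is error-prone. The following sketch extracts the PIDs of a given set of program names from such output; the function name `dm_pids` is hypothetical, and the code only collects the numbers without stopping anything.

```python
import re


def dm_pids(netstat_output, programs):
    """Map program name -> PID for the named programs in `netstat -ntulp` output."""
    found = {}
    for line in netstat_output.splitlines():
        # The last column has the form "<pid>/<program name>".
        match = re.search(r"\s(\d+)/(\S+)", line)
        if match and match.group(2) in programs:
            found[match.group(2)] = int(match.group(1))
    return found
```

Applied to the listing above with the names `dmserver`, `dmcss`, and `dmasmsvr`, it yields the three PIDs passed to `kill`.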

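4. Edit the dmdcr_cfg_bak.ini file that was exported with dmasmcmd in step 1 (the listing below is my file after editing). The aim is to delete the expansion node's information and keep only the original two-node DSC configuration. The specific changes are:

Change every DCR_GRP_N_EP = 3 to DCR_GRP_N_EP = 2.

Change every DCR_GRP_EP_ARR = {0,1,2} to DCR_GRP_EP_ARR = {0,1}.

Delete every endpoint section that belongs to the expansion node, i.e. CSS2, ASM2, and DSC2.

Note! My environment was restored from a snapshot, so I did not touch the other configuration files. If yours was not restored from a snapshot, also check dmmal.ini, dmasvrmal.ini, and the other cluster configuration files on each machine for leftover entries of the extra node.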

[dmdba@czk ~]$ cat /opt/dmdbms/data/DAMENG/dmdcr_cfg_bak.ini

# the file is auto-created by system, self edit is invalid!

#DCR HDR

DCR_N_GRP = 3

DCR_VTD_PATH = /dev/raw/raw2

DCR_OGUID = 63635

[GRP]

DCR_GRP_TYPE = CSS

DCR_GRP_NAME = GRP_CSS

DCR_GRP_N_EP = 2

DCR_GRP_EP_ARR = {0,1}

DCR_GRP_N_ERR_EP = 0

DCR_GRP_ERR_EP_ARR = {}

DCR_GRP_DSKCHK_CNT = 60

[GRP]

DCR_GRP_TYPE = ASM

DCR_GRP_NAME = GRP_ASM

DCR_GRP_N_EP = 2

DCR_GRP_EP_ARR = {0,1}

DCR_GRP_N_ERR_EP = 0

DCR_GRP_ERR_EP_ARR = {}

DCR_GRP_DSKCHK_CNT = 60

[GRP]

DCR_GRP_TYPE = DB

DCR_GRP_NAME = GRP_DSC

DCR_GRP_N_EP = 2

DCR_GRP_EP_ARR = {0,1}

DCR_GRP_N_ERR_EP = 0

DCR_GRP_ERR_EP_ARR = {}

DCR_GRP_DSKCHK_CNT = 60

[GRP_CSS]

DCR_EP_NAME = CSS0

DCR_EP_HOST = 192.168.17.133

DCR_EP_PORT = 9341

[GRP_CSS]

DCR_EP_NAME = CSS1

DCR_EP_HOST = 192.168.17.132

DCR_EP_PORT = 9343

[GRP_ASM]

DCR_EP_NAME = ASM0

DCR_EP_SHM_KEY = 93360

DCR_EP_SHM_SIZE = 20

DCR_EP_HOST = 192.168.17.133

DCR_EP_PORT = 9349

DCR_EP_ASM_LOAD_PATH = /dev/raw

[GRP_ASM]

DCR_EP_NAME = ASM1

DCR_EP_SHM_KEY = 93361

DCR_EP_SHM_SIZE = 20

DCR_EP_HOST = 192.168.17.132

DCR_EP_PORT = 9351

DCR_EP_ASM_LOAD_PATH = /dev/raw

[GRP_DSC]

DCR_EP_NAME = DSC0

DCR_EP_SEQNO = 0

DCR_EP_PORT = 5236

DCR_CHECK_PORT = 9741

[GRP_DSC]

DCR_EP_NAME = DSC1

DCR_EP_SEQNO = 1

DCR_EP_PORT = 5237

DCR_CHECK_PORT = 9742
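The manual edit from step 4 can also be sketched in code. The Python function below drops every section whose `DCR_EP_NAME` is in a given set and rewrites `DCR_GRP_N_EP` and `DCR_GRP_EP_ARR` in every group to the remaining endpoint count. It is a minimal sketch under the assumption that all groups end up with the same number of endpoints, numbered from 0; the function name `prune_dcr_cfg` is hypothetical. Section names repeat in this file format, so the text is split by hand rather than read with an INI parser.

```python
import re


def prune_dcr_cfg(text, drop_names, keep_count):
    """Remove endpoint sections named in drop_names and reset group counts."""
    # Split the file into blocks: the header, then one block per "[...]" section.
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip().startswith("[") and current:
            blocks.append(current)
            current = []
        current.append(line)
    if current:
        blocks.append(current)

    ep_arr = "{" + ",".join(str(i) for i in range(keep_count)) + "}"
    kept = []
    for block in blocks:
        name = None
        for line in block:
            match = re.match(r"\s*DCR_EP_NAME\s*=\s*(\S+)", line)
            if match:
                name = match.group(1)
        if name in drop_names:
            continue  # skip the whole section of a removed endpoint
        for line in block:
            if re.match(r"\s*DCR_GRP_N_EP\s*=", line):
                line = "DCR_GRP_N_EP = " + str(keep_count)
            elif re.match(r"\s*DCR_GRP_EP_ARR\s*=", line):
                line = "DCR_GRP_EP_ARR = " + ep_arr
            kept.append(line)
    return "\n".join(kept) + "\n"
```

For the cluster in this article the call would be `prune_dcr_cfg(text, {"CSS2", "ASM2", "DSC2"}, 2)`. The result should still be reviewed by hand before it is used to initialize any disk.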

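5. Re-initialize the DCR and vote disks from the modified dmdcr_cfg_bak.ini. This step writes the information back to the shared disk. Make sure the services are still stopped; otherwise the command fails with the error below, which says that the disk /dev/raw/raw1 is in use.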

ASM>init dcrdisk '/dev/raw/raw1' from '/opt/dmdbms/data/DAMENG/dmdcr_cfg_bak.ini' identified by 'abcd'

[Trace]DG 126 alloc one extent for inodes, addr(disk_id, disk_auno, extent_no):(0,0,1).

[Trace]DG 126 allocate 4 extents for file 0xfe000002.

[Trace]DG 126 alloc 4 extents for 0xfe000002, addr(disk_id, disk_auno, extent_no):(0, 0, 2)->(0, 0, 5), need_init = 1.

init /dev/raw/raw1 from /opt/dmdbms/data/DAMENG/dmdcr_cfg_bak.ini failed!

[code: -11034], 磁盘[/dev/raw/raw1]正在使用中

With the services stopped, initialize the DCR and vote disks:

ASM>init dcrdisk '/dev/raw/raw1' from '/opt/dmdbms/data/DAMENG/dmdcr_cfg_bak.ini' identified by 'abcd'

[Trace]DG 126 alloc one extent for inodes, addr(disk_id, disk_auno, extent_no):(0,0,1).

[Trace]DG 126 allocate 4 extents for file 0xfe000002.

[Trace]DG 126 alloc 4 extents for 0xfe000002, addr(disk_id, disk_auno, extent_no):(0, 0, 2)->(0, 0, 5), need_init = 1.

Used time: 00:00:14.488.

ASM>int votedisk '/dev/raw/raw2' from '/opt/dmdbms/data/DAMENG/dmdcr_cfg_bak.ini'

syntax error

asmcmd parse failed!

ASM>init votedisk '/dev/raw/raw2' from '/opt/dmdbms/data/DAMENG/dmdcr_cfg_bak.ini'

[Trace]DG 125 alloc one extent for inodes, addr(disk_id, disk_auno, extent_no):(0,0,1).

[Trace]DG 125 allocate 4 extents for file 0xfd000002.

[Trace]DG 125 alloc 4 extents for 0xfd000002, addr(disk_id, disk_auno, extent_no):(0, 0, 2)->(0, 0, 5), need_init = 1.

Used time: 00:00:14.568.

6. Start the services again on both machines in turn: dmcss first, then dmasmsvr

[root@czk bin]# ./DmCSSServicecss start

Starting DmCSSServicecss: 上一次登录:一 5月 16 14:13:31 CST 2022

[ OK ]

[root@czk bin]# ./DmASMSvrServicesvr start

Starting DmASMSvrServicesvr: 上一次登录:一 5月 16 14:27:28 CST 2022

[ OK ]

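7. Check the DSC information with the dmcssm monitor. The failed node's entries have been cleaned up and the original DSC cluster environment is running again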

[root@czk bin]# ./dmcssm ini_path=/opt/dmdbms/data/DAMENG/dmcssm.ini

[monitor] 2022-05-16 14:31:25: CSS MONITOR V8

[monitor] 2022-05-16 14:31:25: CSS MONITOR SYSTEM IS READY.

[monitor] 2022-05-16 14:31:25: Wait CSS Control Node choosed...

[monitor] 2022-05-16 14:31:26: Wait CSS Control Node choosed succeed.

show

monitor current time:2022-05-16 14:31:35, n_group:3

=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 0] ========================================

[CSS0] auto check = TRUE, global info:

[ASM0] auto restart = FALSE

[DSC0] auto restart = FALSE

[CSS1] auto check = TRUE, global info:

[ASM1] auto restart = FALSE

[DSC1] auto restart = FALSE

ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts

2022-05-16 14:31:34 CSS0 0 9341 Control Node OPEN WORKING OK TRUE 1056417364 1056417608

2022-05-16 14:31:34 CSS1 1 9343 Normal Node OPEN WORKING OK TRUE 1056425684 1056425904

=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================

n_ok_ep = 2

ok_ep_arr(index, seqno):

(0, 0)

(1, 1)

sta = OPEN, sub_sta = STARTUP

break ep = NULL

recover ep = NULL

crash process over flag is TRUE

ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts

2022-05-16 14:31:34 ASM0 0 9349 Control Node OPEN WORKING OK TRUE 1056443218 1056443380

2022-05-16 14:31:34 ASM1 1 9351 Normal Node OPEN WORKING OK TRUE 1056449693 1056449837

=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 255] ========================================

n_ok_ep = 2

ok_ep_arr(index, seqno):

(0, 0)

(1, 1)

sta = OPEN, sub_sta = STARTUP

break ep = NULL

recover ep = NULL

crash process over flag is FALSE

ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts

2022-05-16 14:31:34 DSC0 0 5236 Normal Node SHUTDOWN WORKING OK FALSE 807630687 807634940

2022-05-16 14:31:34 DSC1 1 5237 Normal Node SHUTDOWN WORKING OK FALSE 807542710 807547246

==================================================================================================================

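8. Starting the DSC dmserver service again still fails. The startup log shows that the DSC2 log file does not exist, which means this part of the information has not been fully cleaned up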

[root@czk log]# cat DmServicedsc.log

file dm.key not found, use default license!

version info: develop

DM Database Server x64 V8 1-2-94-21.11.11-150650-10038-ENT startup...

Normal of FAST

Normal of DEFAULT

Normal of RECYCLE

Normal of KEEP

Normal of ROLL

Database mode = 0, oguid = 0

+DMLOG/log/DSC2_log01.log not exist, can not startup

9. Use the dmctlcvt tool to convert the dm.ctl file into a text file for editing

[root@czk bin]# ./dmctlcvt type=1 src=+DMDATA/data/dsc/dm.ctl dest=/opt/dmdbms/data/DAMENG/dmctrl.txt dcr_ini=/opt/dmdbms/data/DAMENG/dmdcr.ini

DMCTLCVT V8

convert ctl to txt success!
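Before editing the converted text file, it can help to list every line that still mentions the removed node's log files. The sketch below makes no assumption about the layout of the converted file; it simply reports line numbers, and the function name `lines_mentioning` is hypothetical. Deciding how much surrounding text to delete remains a manual step.

```python
def lines_mentioning(text, needles):
    """Return (line_number, stripped_line) for lines containing any needle."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(needle in line for needle in needles):
            hits.append((number, line.strip()))
    return hits
```

It would be called with the file's contents and the two names `DSC2_log01.log` and `DSC2_log02.log`.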

Then open /opt/dmdbms/data/DAMENG/dmctrl.txt with vim, find the entries for DSC2_log01.log and DSC2_log02.log, and delete them.

10. Use the dmctlcvt tool to convert the text file back into the dm.ctl control file

[root@czk bin]# ./dmctlcvt type=2 src=/opt/dmdbms/data/DAMENG/dmctrl.txt dest=+DMDATA/data/dsc/dm.ctl dcr_ini=/opt/dmdbms/data/DAMENG/dmdcr.ini

DMCTLCVT V8

convert txt to ctl success!

11. Restart the dmserver service. It starts successfully!

[root@czk bin]# systemctl status DmServicedsc

● DmServicedsc.service - DM Instance Service(DmServicedsc).

Loaded: loaded (/usr/lib/systemd/system/DmServicedsc.service; enabled; vendor preset: disabled)

Active: active (running) since 一 2022-05-16 16:05:29 CST; 9min ago

Process: 4202 ExecStart=/opt/dmdbms/bin/DmServicedsc start (code=exited, status=0/SUCCESS)

Main PID: 4229 (dmserver)

CGroup: /system.slice/DmServicedsc.service

└─4229 /opt/dmdbms/bin/dmserver path=/opt/dmdbms/data/DAMENG/dsc0_config/dm.ini dcr_ini=/opt/dmdbms/data/DAM...

5月 16 16:05:13 czk systemd[1]: Starting DM Instance Service(DmServicedsc)....

5月 16 16:05:14 czk DmServicedsc[4202]: Starting DmServicedsc: connnect dmasmtool successfully.

5月 16 16:05:29 czk DmServicedsc[4202]: [11B blob data]

5月 16 16:05:29 czk systemd[1]: Started DM Instance Service(DmServicedsc)..

Summary

If you have other questions, feel free to ask in the Dameng community: https://eco.dameng.com
