故障现象
glusterfs客户端无法连接某个节点的brick,一直出现rpc无法连接
kernel 出现hung_task_timeout_secs超时问题
故障分析 kernel层面分析 kernel出现sync flood,原因是客户端断掉后不断重新尝试导致,后面CallTrace出现 hung_task_timeout_secs关键字,该关键字含义是,异步刷新IO到磁盘,如果在hung_task_timeout_secs时间内IO没有刷完,就会报出这个错误,解决方法需要调整vm.dirty_ratio和vm.dirty_backgroud_ratio参数,两个参数分别控制脏页刷磁盘的比例,如果后端磁盘非常慢,超过操作系统默认的timeout会出现这样的calltrace.建议设置vm.dirty_ratio=5和vm.dirty_backgroud_ratio=5
客户端日志分析 日志中大量报出0-speech_vol_v2-client-55对应的brick无法建立rpc连接(通过网络telnet 主机和端口无异常),从日志来看可能是sync flood导致,客户端无法连接到这个glusterfsd
服务端日志 服务端的日志也很明显出现read/write出现告警
解决办法
调整kernel中 vm.dirty_ratio=5和 vm.dirty_backgroud_ratio=10(该问题仅仅关闭kernel报错)
重启节点的glusterd的进程或者重启节点(sync flood的问题)
glusterfs version
1 2 3 [root@ai-storage-prd-141-165-65-141.v-bj-4.vivo.lan:/root] # gluster --version glusterfs 7.2
1 2 3 4 5 [root@storage-center-prd-10-194-8-133.v-bj-4.vivo.lan:/var/log/glusterfs] # rpm -qa|grep gluster glusterfs-7.2-1.el7.x86_64 glusterfs-fuse-7.2-1.el7.x86_64 glusterfs-libs-7.2-1.el7.x86_64
cluster volume
volume info
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 # gluster volume info speech_vol_v2 Volume Name: speech_vol_v2 Type: Distributed-Replicate Volume ID: e1b28e82-0d0b-4c87-bb6a-45306ec4a568 Status: Started Snapshot Count: 0 Number of Bricks: 24 x 3 = 72 Transport-type: tcp Bricks: Brick1: 141.165.65.141:/data1/brick Brick2: 141.165.65.142:/data1/brick Brick3: 141.165.65.144:/data1/brick Brick4: 141.165.65.141:/data2/brick Brick5: 141.165.65.142:/data2/brick Brick6: 141.165.65.144:/data2/brick Brick7: 141.165.65.141:/data3/brick Brick8: 141.165.65.142:/data3/brick Brick9: 141.165.65.144:/data3/brick Brick10: 141.165.65.141:/data4/brick Brick11: 141.165.65.142:/data4/brick Brick12: 141.165.65.144:/data4/brick Brick13: 141.165.65.141:/data5/brick Brick14: 141.165.65.142:/data5/brick Brick15: 141.165.65.144:/data5/brick Brick16: 141.165.65.141:/data6/brick Brick17: 141.165.65.142:/data6/brick Brick18: 141.165.65.144:/data6/brick Brick19: 141.165.65.141:/data7/brick Brick20: 141.165.65.142:/data7/brick Brick21: 141.165.65.144:/data7/brick Brick22: 141.165.65.141:/data8/brick Brick23: 141.165.65.142:/data8/brick Brick24: 141.165.65.144:/data8/brick Brick25: 141.165.65.141:/data9/brick Brick26: 141.165.65.142:/data9/brick Brick27: 141.165.65.144:/data9/brick Brick28: 141.165.65.141:/data10/brick Brick29: 141.165.65.142:/data10/brick Brick30: 141.165.65.144:/data10/brick Brick31: 141.165.65.141:/data11/brick Brick32: 141.165.65.142:/data11/brick Brick33: 141.165.65.144:/data11/brick Brick34: 141.165.65.141:/data12/brick Brick35: 141.165.65.142:/data12/brick Brick36: 141.165.65.144:/data12/brick Brick37: 141.165.181.146:/data1/brick Brick38: 141.165.181.147:/data1/brick Brick39: 141.165.181.148:/data1/brick Brick40: 141.165.181.146:/data2/brick Brick41: 141.165.181.147:/data2/brick Brick42: 141.165.181.148:/data2/brick Brick43: 141.165.181.146:/data3/brick Brick44: 141.165.181.147:/data3/brick Brick45: 141.165.181.148:/data3/brick Brick46: 141.165.181.146:/data4/brick Brick47: 141.165.181.147:/data4/brick Brick48: 141.165.181.148:/data4/brick Brick49: 141.165.181.146:/data5/brick Brick50: 141.165.181.147:/data5/brick Brick51: 141.165.181.148:/data5/brick Brick52: 141.165.181.146:/data6/brick Brick53: 141.165.181.147:/data6/brick Brick54: 141.165.181.148:/data6/brick Brick55: 141.165.181.146:/data7/brick Brick56: 141.165.181.147:/data7/brick Brick57: 141.165.181.148:/data7/brick Brick58: 141.165.181.146:/data8/brick Brick59: 141.165.181.147:/data8/brick Brick60: 141.165.181.148:/data8/brick Brick61: 141.165.181.146:/data9/brick Brick62: 141.165.181.147:/data9/brick Brick63: 141.165.181.148:/data9/brick Brick64: 141.165.181.146:/data10/brick Brick65: 141.165.181.147:/data10/brick Brick66: 141.165.181.148:/data10/brick Brick67: 141.165.181.146:/data11/brick Brick68: 141.165.181.147:/data11/brick Brick69: 141.165.181.148:/data11/brick Brick70: 141.165.181.146:/data12/brick Brick71: 141.165.181.147:/data12/brick Brick72: 141.165.181.148:/data12/brick Options Reconfigured: network.ping-timeout: 120 cluster.read-hash-mode: 3 storage.fips-mode-rchecksum: on nfs.disable: on
volume status
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 # gluster volume status speech_vol_v2 Status of volume: speech_vol_v2 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 141.165.65.141:/data1/brick 49152 0 Y 16432 Brick 141.165.65.142:/data1/brick 49152 0 Y 16355 Brick 141.165.65.144:/data1/brick 49152 0 Y 32212 Brick 141.165.65.141:/data2/brick 49153 0 Y 20613 Brick 141.165.65.142:/data2/brick 49153 0 Y 16376 Brick 141.165.65.144:/data2/brick 49153 0 Y 32232 Brick 141.165.65.141:/data3/brick 49154 0 Y 20627 Brick 141.165.65.142:/data3/brick 49154 0 Y 16402 Brick 141.165.65.144:/data3/brick 49154 0 Y 32252 Brick 141.165.65.141:/data4/brick 49155 0 Y 10020 Brick 141.165.65.142:/data4/brick 49155 0 Y 16433 Brick 141.165.65.144:/data4/brick 49155 0 Y 32272 Brick 141.165.65.141:/data5/brick 49156 0 Y 10026 Brick 141.165.65.142:/data5/brick 49156 0 Y 16457 Brick 141.165.65.144:/data5/brick 49156 0 Y 32292 Brick 141.165.65.141:/data6/brick 49157 0 Y 20620 Brick 141.165.65.142:/data6/brick 49157 0 Y 16485 Brick 141.165.65.144:/data6/brick 49157 0 Y 32312 Brick 141.165.65.141:/data7/brick 49158 0 Y 10055 Brick 141.165.65.142:/data7/brick 49158 0 Y 16505 Brick 141.165.65.144:/data7/brick 49158 0 Y 32332 Brick 141.165.65.141:/data8/brick 49159 0 Y 15551 Brick 141.165.65.142:/data8/brick 49159 0 Y 16530 Brick 141.165.65.144:/data8/brick 49159 0 Y 32352 Brick 141.165.65.141:/data9/brick 49160 0 Y 10075 Brick 141.165.65.142:/data9/brick 49160 0 Y 16552 Brick 141.165.65.144:/data9/brick 49160 0 Y 32372 Brick 141.165.65.141:/data10/brick 49161 0 Y 10086 Brick 141.165.65.142:/data10/brick 49161 0 Y 16573 Brick 141.165.65.144:/data10/brick 49161 0 Y 32394 Brick 141.165.65.141:/data11/brick 49162 0 Y 10099 Brick 141.165.65.142:/data11/brick 49162 0 Y 16595 Brick 141.165.65.144:/data11/brick 49162 0 Y 32415 Brick 141.165.65.141:/data12/brick 49163 0 Y 10106 Brick 141.165.65.142:/data12/brick 49163 0 Y 16616 Brick 141.165.65.144:/data12/brick 49163 0 Y 32435 Brick 141.165.181.146:/data1/brick 49152 0 Y 177222 Brick 141.165.181.147:/data1/brick 49164 0 Y 215532 Brick 141.165.181.148:/data1/brick 49152 0 Y 182183 Brick 141.165.181.146:/data2/brick 49153 0 Y 177242 Brick 141.165.181.147:/data2/brick 49165 0 Y 215541 Brick 141.165.181.148:/data2/brick 49153 0 Y 182203 Brick 141.165.181.146:/data3/brick 49154 0 Y 177263 Brick 141.165.181.147:/data3/brick 49166 0 Y 215552 Brick 141.165.181.148:/data3/brick 49154 0 Y 182224 Brick 141.165.181.146:/data4/brick 49155 0 Y 177283 Brick 141.165.181.147:/data4/brick 49167 0 Y 215563 Brick 141.165.181.148:/data4/brick 49155 0 Y 182244 Brick 141.165.181.146:/data5/brick 49156 0 Y 177303 Brick 141.165.181.147:/data5/brick 49168 0 Y 215574 Brick 141.165.181.148:/data5/brick 49156 0 Y 182264 Brick 141.165.181.146:/data6/brick 49157 0 Y 177323 Brick 141.165.181.147:/data6/brick 49169 0 Y 215585 Brick 141.165.181.148:/data6/brick 49157 0 Y 182284 Brick 141.165.181.146:/data7/brick 49158 0 Y 177343 Brick 141.165.181.147:/data7/brick 49158 0 Y 176773 Brick 141.165.181.148:/data7/brick 49158 0 Y 182304 Brick 141.165.181.146:/data8/brick 49159 0 Y 177363 Brick 141.165.181.147:/data8/brick 49170 0 Y 215596 Brick 141.165.181.148:/data8/brick 49159 0 Y 182324 Brick 141.165.181.146:/data9/brick 49160 0 Y 177383 Brick 141.165.181.147:/data9/brick 49171 0 Y 215607 Brick 141.165.181.148:/data9/brick 49160 0 Y 182346 Brick 141.165.181.146:/data10/brick 49161 0 Y 177403 Brick 141.165.181.147:/data10/brick 49172 0 Y 215618 Brick 141.165.181.148:/data10/brick 49161 0 Y 182366 Brick 141.165.181.146:/data11/brick 49162 0 Y 177423 Brick 141.165.181.147:/data11/brick 49173 0 Y 215629 Brick 141.165.181.148:/data11/brick 49162 0 Y 182386 Brick 141.165.181.146:/data12/brick 49163 0 Y 177443 Brick 141.165.181.147:/data12/brick 49174 0 Y 215639 Brick 141.165.181.148:/data12/brick 49163 0 Y 182406
客户端无法连接的Brick的信息 1 2 3 4 5 6 7 8 9 10 11 12 13 14 //这些信息存在于/var/log/glusterfs/mnt-{挂载名称}.log中的Final graph: volume speech_vol_v2-client-55 type protocol/client option ping-timeout 120 option remote-host 141.165.181.147 option remote-subvolume /data7/brick option transport-type socket option transport.socket.ssl-enabled off option transport.tcp-user-timeout 0 option transport.socket.keepalive-time 20 option transport.socket.keepalive-interval 2 option transport.socket.keepalive-count 9 option send-gids true end-volume
客户端挂载Debug日志 操作系统kernel日志(/var/log/message) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 May 26 12:20:18 storage-center-prd-141-165-181-147 kernel: [4036004.156726] TCP: request_sock_TCP: Possible SYN flooding on port 24007. Sending cookies. Check SNMP counters. May 26 16:52:13 storage-center-prd-141-165-181-147 systemd[1]: Starting Cleanup of Temporary Directories... May 26 16:52:13 storage-center-prd-141-165-181-147 systemd[1]: Started Cleanup of Temporary Directories. May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.524551] INFO: task kswapd0:266 blocked for more than 120 seconds. May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.531180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539187] kswapd0 D 0000000007496801 0 266 2 0x00000000 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539195] ffff881ffcd3b8f0 0000000000000046 ffff881ffeab9fa0 ffff881ffcd3bfd8 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539198] ffff881ffcd3bfd8 ffff881ffcd3bfd8 ffff881ffeab9fa0 ffff8804e82b4860 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539200] 7fffffffffffffff ffff8804e82b4858 ffff881ffeab9fa0 0000000007496801 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539202] Call Trace: May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539216] [<ffffffff816a94c9>] schedule+0x29/0x70 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539228] [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539235] [<ffffffff812fab24>] ? blk_finish_plug+0x14/0x40 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539271] [<ffffffffc01f9354>] ? _xfs_buf_ioapply+0x334/0x460 [xfs] May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539273] [<ffffffff816a987d>] wait_for_completion+0xfd/0x140 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539277] [<ffffffff810c4810>] ? wake_up_state+0x20/0x20 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539292] [<ffffffffc01fb434>] ? xfs_bwrite+0x24/0x60 [xfs] May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539305] [<ffffffffc01fb026>] xfs_buf_submit_wait+0xe6/0x1d0 [xfs] May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539316] [<ffffffffc01fb434>] xfs_bwrite+0x24/0x60 [xfs] May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539330] [<ffffffffc0203b71>] xfs_reclaim_inode+0x331/0x360 [xfs] May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539346] [<ffffffffc0203e07>] xfs_reclaim_inodes_ag+0x267/0x390 [xfs] May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539348] [<ffffffff810c4593>] ? try_to_wake_up+0x183/0x340 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539351] [<ffffffff8121d67a>] ? evict+0x10a/0x180 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539353] [<ffffffff810c4765>] ? wake_up_process+0x15/0x20 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539366] [<ffffffffc0204df3>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs] May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539380] [<ffffffffc02146f5>] xfs_fs_free_cached_objects+0x15/0x20 [xfs] May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539383] [<ffffffff81203888>] prune_super+0xe8/0x170 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539387] [<ffffffff81195413>] shrink_slab+0x163/0x330 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539390] [<ffffffff811f7537>] ? vmpressure+0x87/0x90 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539392] [<ffffffff81199081>] balance_pgdat+0x4b1/0x5e0 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539394] [<ffffffff81199323>] kswapd+0x173/0x440 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539398] [<ffffffff810b1910>] ? wake_up_atomic_t+0x30/0x30 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539401] [<ffffffff811991b0>] ? balance_pgdat+0x5e0/0x5e0 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539402] [<ffffffff810b098f>] kthread+0xcf/0xe0 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539404] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539408] [<ffffffff816b4f18>] ret_from_fork+0x58/0x90 May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539409] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 May 27 00:32:14 storage-center-prd-141-165-181-147 kernel: [4079817.259262] INFO: task kswapd0:266 blocked for more than 120 seconds.
客户端Mount Debug日志详情 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 [2020-05-27 03:47:34.303569] D [logging.c:1718:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5 [2020-05-27 03:47:52.040397] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:7) (non-SSL) [2020-05-27 03:47:52.040442] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT [2020-05-27 03:47:52.040639] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x92)[0x7fb222fc5922] (--> /lib64/libgfrpc.so.0(+0xf4e8)[0x7fb222fc64e8] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb222fc2a93] ))))) 0-: 141.165.181.147:24007: ping timer event already removed [2020-05-27 03:47:55.040804] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147) [2020-05-27 03:47:55.043219] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007 [2020-05-27 03:47:55.043244] D [socket.c:3384:socket_fix_ssl_opts] 0-speech_vol_v2-client-55: disabling SSL for portmapper connection [2020-05-27 03:47:55.043408] D [MSGID: 0] [client.c:2323:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_CONNECT [2020-05-27 03:47:55.043611] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(+0x13691)[0x7fb222fca691] (--> /lib64/libgfrpc.so.0(rpc_clnt_submit+0x3b4)[0x7fb222fc6914] (--> /usr/lib64/glusterfs/7.2/xlator/protocol/client.so(+0x135b2)[0x7fb2146675b2] ))))) 0-: 141.165.181.147:24007: ping timer event already removed [2020-05-27 03:47:55.043662] D [MSGID: 0] [client-handshake.c:1399:server_has_portmap] 0-speech_vol_v2-client-55: detected portmapper on server [2020-05-27 03:47:55.043783] D [rpc-clnt-ping.c:195:rpc_clnt_ping_cbk] 0-speech_vol_v2-client-55: Ping latency is 0ms [2020-05-27 03:47:55.043812] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-speech_vol_v2-client-55: changing port to 49158 (from 0) [2020-05-27 03:47:55.043844] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:6) (non-SSL) [2020-05-27 03:47:55.043855] D [MSGID: 0] [client.c:2394:client_rpc_notify] 0-speech_vol_v2-client-55: disconnected (skipped notify) [2020-05-27 03:47:55.043876] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147) [2020-05-27 03:49:34.303646] D [logging.c:1756:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present [2020-05-27 03:47:55.043854] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT [2020-05-27 03:47:55.046202] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007 [2020-05-27 03:49:34.303898] D [logging.c:1718:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5 [2020-05-27 03:50:02.344385] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:7) (non-SSL) [2020-05-27 03:50:02.344418] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT [2020-05-27 03:50:02.344582] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x92)[0x7fb222fc5922] (--> /lib64/libgfrpc.so.0(+0xf4e8)[0x7fb222fc64e8] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb222fc2a93] ))))) 0-: 141.165.181.147:24007: ping timer event already removed [2020-05-27 03:50:05.344677] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147) [2020-05-27 03:50:05.347099] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007 [2020-05-27 03:50:05.347126] D [socket.c:3384:socket_fix_ssl_opts] 0-speech_vol_v2-client-55: disabling SSL for portmapper connection [2020-05-27 03:50:05.347336] D [MSGID: 0] [client.c:2323:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_CONNECT [2020-05-27 03:50:05.347542] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(+0x13691)[0x7fb222fca691] (--> /lib64/libgfrpc.so.0(rpc_clnt_submit+0x3b4)[0x7fb222fc6914] (--> /usr/lib64/glusterfs/7.2/xlator/protocol/client.so(+0x135b2)[0x7fb2146675b2] ))))) 0-: 141.165.181.147:24007: ping timer event already removed [2020-05-27 03:50:05.347591] D [MSGID: 0] [client-handshake.c:1399:server_has_portmap] 0-speech_vol_v2-client-55: detected portmapper on server [2020-05-27 03:50:05.347633] D [rpc-clnt-ping.c:195:rpc_clnt_ping_cbk] 0-speech_vol_v2-client-55: Ping latency is 0ms [2020-05-27 03:50:05.347701] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-speech_vol_v2-client-55: changing port to 49158 (from 0) [2020-05-27 03:50:05.347735] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:6) (non-SSL) [2020-05-27 03:50:05.347749] D [MSGID: 0] [client.c:2394:client_rpc_notify] 0-speech_vol_v2-client-55: disconnected (skipped notify) [2020-05-27 03:50:05.347765] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147) /0-speech_vol_v2-client-55 [2020-05-27 03:51:34.303977] D [logging.c:1756:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present [2020-05-27 03:50:05.347748] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT [2020-05-27 03:50:05.350156] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007 [2020-05-27 03:51:34.304035] D [logging.c:1718:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5 [2020-05-27 03:52:12.648388] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:7) (non-SSL) [2020-05-27 03:52:12.648424] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT [2020-05-27 03:52:12.648591] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x92)[0x7fb222fc5922] (--> /lib64/libgfrpc.so.0(+0xf4e8)[0x7fb222fc64e8] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb222fc2a93] ))))) 0-: 141.165.181.147:24007: ping timer event already removed [2020-05-27 03:52:15.648683] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147) [2020-05-27 03:52:15.651027] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007 [2020-05-27 03:52:15.651052] D [socket.c:3384:socket_fix_ssl_opts] 0-speech_vol_v2-client-55: disabling SSL for portmapper connection [2020-05-27 03:52:15.651271] D [MSGID: 0] [client.c:2323:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_CONNECT [2020-05-27 03:52:15.651501] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(+0x13691)[0x7fb222fca691] (--> /lib64/libgfrpc.so.0(rpc_clnt_submit+0x3b4)[0x7fb222fc6914] (--> /usr/lib64/glusterfs/7.2/xlator/protocol/client.so(+0x135b2)[0x7fb2146675b2] ))))) 0-: 141.165.181.147:24007: ping timer event already removed [2020-05-27 03:52:15.651557] D [MSGID: 0] [client-handshake.c:1399:server_has_portmap] 0-speech_vol_v2-client-55: detected portmapper on server [2020-05-27 03:52:15.651693] D [rpc-clnt-ping.c:195:rpc_clnt_ping_cbk] 0-speech_vol_v2-client-55: Ping latency is 0ms [2020-05-27 03:52:15.651734] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-speech_vol_v2-client-55: changing port to 49158 (from 0) [2020-05-27 03:52:15.651758] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:6) (non-SSL) [2020-05-27 03:52:15.651770] D [MSGID: 0] [client.c:2394:client_rpc_notify] 0-speech_vol_v2-client-55: disconnected (skipped notify) [2020-05-27 03:52:15.651785] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147) [2020-05-27 03:53:34.304123] D [logging.c:1756:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present [2020-05-27 03:52:15.651768] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT [2020-05-27 03:52:15.654154] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007 [2020-05-27 03:53:34.304179] D [logging.c:1718:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
0-speech_vol_v2-client-55对应服务端节点(147下的/data7/brick)日志 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 [2020-05-27 01:02:22.421765] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.32.25:47645 failed (No data available) [2020-05-27 01:02:22.421795] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:de438bc8-1b52-446c-817f-60d646af0ba6-GRAPH_ID:2-PID:84971-HOST:ai-vtraining-prd-141-165-32-25.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0 [2020-05-27 01:02:26.918776] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.32.25:47562 failed (No data available) [2020-05-27 01:02:26.918807] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:36965041-2046-4413-88a4-08bc06fc1de4-GRAPH_ID:2-PID:87591-HOST:ai-vtraining-prd-141-165-32-25.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0 [2020-05-27 01:03:13.270262] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.86.129:48212 failed (No data available) [2020-05-27 01:03:13.270295] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:2dafdf38-ff96-45cc-99c2-1238764b59ae-GRAPH_ID:0-PID:75668-HOST:ai-vtraining-prd-141-165-86-129.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0 [2020-05-27 01:03:13.285400] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.86.129:48162 failed (No data available) [2020-05-27 01:03:13.285422] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:4251ffc2-e1c5-4357-a5d2-7c92e8221fad-GRAPH_ID:0-PID:75625-HOST:ai-vtraining-prd-141-165-86-129.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0 [2020-05-27 01:36:05.410946] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 10.196.20.133:47621 failed (No data available) [2020-05-27 01:36:05.410974] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:57fbf79a-bae3-4bcd-8670-70d559956cac-GRAPH_ID:0-PID:15841-HOST:ai-vtraining-gpu-prd-10-196-20-133.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0 [2020-05-27 01:36:06.056913] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 10.196.20.133:47331 failed (No data available) [2020-05-27 01:36:06.056941] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:b68c4c8e-e8d8-4d44-97a3-a503791d6177-GRAPH_ID:0-PID:16094-HOST:ai-vtraining-gpu-prd-10-196-20-133.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0 [2020-05-27 02:10:34.719707] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.30.45:48603 failed (No data available) [2020-05-27 02:10:34.719738] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:0ed18a02-3363-4cba-9313-93f7b221ede4-GRAPH_ID:10-PID:6869-HOST:ai-vtraining-prd-141-165-30-45.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0 [2020-05-27 02:13:50.989206] I [addr.c:54:compare_addr_and_update] 0-/data7/brick: allowed = "*", received addr = "141.165.30.45" [2020-05-27 02:13:50.989234] I [MSGID: 115029] [server-handshake.c:549:server_setvolume] 0-speech_vol_v2-server: accepted client from CTX_ID:a327610d-0acf-4785-a912-17e87c126ba9-GRAPH_ID:0-PID:67045-HOST:ai-vtraining-prd-141-165-30-45.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0 (version: 6.5) with subvol /data7/brick [2020-05-27 02:17:13.967294] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.30.45:48606 failed (No data available) [2020-05-27 02:39:55.104419] W [glusterfsd.c:1596:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f9d087f1dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x55f6769bb625] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55f6769bb48b] ) 0-: received signum (15), shutting down [2020-05-27 02:39:55.104502] W [socket.c:774:__socket_rwv] 0-glusterfs: writev on 141.165.181.147:24007 failed (Broken pipe)