perrynzhou

专注于系统组件研发

0%

glusterfs client客戶端出现RPC_CLNT_DISCONNECT问题和kernel出现task xxx blocked for more than 120 seconds

故障现象

  • glusterfs客户端无法连接某个节点的brick,一直出现rpc无法连接
  • kernel 出现hung_task_timeout_secs超时问题

故障分析

kernel层面分析
kernel出现sync flood,原因是客户端断掉后不断重新尝试导致,后面CallTrace出现 hung_task_timeout_secs关键字,该关键字含义是,异步刷新IO到磁盘,如果在hung_task_timeout_secs时间内IO没有刷完,就会报出这个错误,解决方法需要调整vm.dirty_ratio和vm.dirty_backgroud_ratio参数,两个参数分别控制脏页刷磁盘的比例,如果后端磁盘非常慢,超过操作系统默认的timeout会出现这样的calltrace.建议设置vm.dirty_ratio=5和vm.dirty_backgroud_ratio=5

客户端日志分析
日志中大量报出0-speech_vol_v2-client-55对应的brick无法建立rpc连接(通过网络telnet 主机和端口无异常),从日志来看可能是sync flood导致,客户端无法连接到这个glusterfsd

服务端日志
服务端的日志也很明显出现read/write出现告警

解决办法

  • 调整kernel中 vm.dirty_ratio=5和 vm.dirty_backgroud_ratio=10(该问题仅仅关闭kernel报错)
  • 重启节点的glusterd的进程或者重启节点(sync flood的问题)

glusterfs version

  • 服务端版本
1
2
3
[root@ai-storage-prd-141-165-65-141.v-bj-4.vivo.lan:/root]
# gluster --version
glusterfs 7.2
  • 客户端版本
1
2
3
4
5
[root@storage-center-prd-10-194-8-133.v-bj-4.vivo.lan:/var/log/glusterfs]
# rpm -qa|grep gluster
glusterfs-7.2-1.el7.x86_64
glusterfs-fuse-7.2-1.el7.x86_64
glusterfs-libs-7.2-1.el7.x86_64

cluster volume

  • volume info

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    72
    73
    74
    75
    76
    77
    78
    79
    80
    81
    82
    83
    84
    85
    86
    87
    # gluster volume info  speech_vol_v2     

    Volume Name: speech_vol_v2
    Type: Distributed-Replicate
    Volume ID: e1b28e82-0d0b-4c87-bb6a-45306ec4a568
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 24 x 3 = 72
    Transport-type: tcp
    Bricks:
    Brick1: 141.165.65.141:/data1/brick
    Brick2: 141.165.65.142:/data1/brick
    Brick3: 141.165.65.144:/data1/brick
    Brick4: 141.165.65.141:/data2/brick
    Brick5: 141.165.65.142:/data2/brick
    Brick6: 141.165.65.144:/data2/brick
    Brick7: 141.165.65.141:/data3/brick
    Brick8: 141.165.65.142:/data3/brick
    Brick9: 141.165.65.144:/data3/brick
    Brick10: 141.165.65.141:/data4/brick
    Brick11: 141.165.65.142:/data4/brick
    Brick12: 141.165.65.144:/data4/brick
    Brick13: 141.165.65.141:/data5/brick
    Brick14: 141.165.65.142:/data5/brick
    Brick15: 141.165.65.144:/data5/brick
    Brick16: 141.165.65.141:/data6/brick
    Brick17: 141.165.65.142:/data6/brick
    Brick18: 141.165.65.144:/data6/brick
    Brick19: 141.165.65.141:/data7/brick
    Brick20: 141.165.65.142:/data7/brick
    Brick21: 141.165.65.144:/data7/brick
    Brick22: 141.165.65.141:/data8/brick
    Brick23: 141.165.65.142:/data8/brick
    Brick24: 141.165.65.144:/data8/brick
    Brick25: 141.165.65.141:/data9/brick
    Brick26: 141.165.65.142:/data9/brick
    Brick27: 141.165.65.144:/data9/brick
    Brick28: 141.165.65.141:/data10/brick
    Brick29: 141.165.65.142:/data10/brick
    Brick30: 141.165.65.144:/data10/brick
    Brick31: 141.165.65.141:/data11/brick
    Brick32: 141.165.65.142:/data11/brick
    Brick33: 141.165.65.144:/data11/brick
    Brick34: 141.165.65.141:/data12/brick
    Brick35: 141.165.65.142:/data12/brick
    Brick36: 141.165.65.144:/data12/brick
    Brick37: 141.165.181.146:/data1/brick
    Brick38: 141.165.181.147:/data1/brick
    Brick39: 141.165.181.148:/data1/brick
    Brick40: 141.165.181.146:/data2/brick
    Brick41: 141.165.181.147:/data2/brick
    Brick42: 141.165.181.148:/data2/brick
    Brick43: 141.165.181.146:/data3/brick
    Brick44: 141.165.181.147:/data3/brick
    Brick45: 141.165.181.148:/data3/brick
    Brick46: 141.165.181.146:/data4/brick
    Brick47: 141.165.181.147:/data4/brick
    Brick48: 141.165.181.148:/data4/brick
    Brick49: 141.165.181.146:/data5/brick
    Brick50: 141.165.181.147:/data5/brick
    Brick51: 141.165.181.148:/data5/brick
    Brick52: 141.165.181.146:/data6/brick
    Brick53: 141.165.181.147:/data6/brick
    Brick54: 141.165.181.148:/data6/brick
    Brick55: 141.165.181.146:/data7/brick
    Brick56: 141.165.181.147:/data7/brick
    Brick57: 141.165.181.148:/data7/brick
    Brick58: 141.165.181.146:/data8/brick
    Brick59: 141.165.181.147:/data8/brick
    Brick60: 141.165.181.148:/data8/brick
    Brick61: 141.165.181.146:/data9/brick
    Brick62: 141.165.181.147:/data9/brick
    Brick63: 141.165.181.148:/data9/brick
    Brick64: 141.165.181.146:/data10/brick
    Brick65: 141.165.181.147:/data10/brick
    Brick66: 141.165.181.148:/data10/brick
    Brick67: 141.165.181.146:/data11/brick
    Brick68: 141.165.181.147:/data11/brick
    Brick69: 141.165.181.148:/data11/brick
    Brick70: 141.165.181.146:/data12/brick
    Brick71: 141.165.181.147:/data12/brick
    Brick72: 141.165.181.148:/data12/brick
    Options Reconfigured:
    network.ping-timeout: 120
    cluster.read-hash-mode: 3
    storage.fips-mode-rchecksum: on
    nfs.disable: on
  • volume status

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
# gluster volume status   speech_vol_v2     
Status of volume: speech_vol_v2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 141.165.65.141:/data1/brick 49152 0 Y 16432
Brick 141.165.65.142:/data1/brick 49152 0 Y 16355
Brick 141.165.65.144:/data1/brick 49152 0 Y 32212
Brick 141.165.65.141:/data2/brick 49153 0 Y 20613
Brick 141.165.65.142:/data2/brick 49153 0 Y 16376
Brick 141.165.65.144:/data2/brick 49153 0 Y 32232
Brick 141.165.65.141:/data3/brick 49154 0 Y 20627
Brick 141.165.65.142:/data3/brick 49154 0 Y 16402
Brick 141.165.65.144:/data3/brick 49154 0 Y 32252
Brick 141.165.65.141:/data4/brick 49155 0 Y 10020
Brick 141.165.65.142:/data4/brick 49155 0 Y 16433
Brick 141.165.65.144:/data4/brick 49155 0 Y 32272
Brick 141.165.65.141:/data5/brick 49156 0 Y 10026
Brick 141.165.65.142:/data5/brick 49156 0 Y 16457
Brick 141.165.65.144:/data5/brick 49156 0 Y 32292
Brick 141.165.65.141:/data6/brick 49157 0 Y 20620
Brick 141.165.65.142:/data6/brick 49157 0 Y 16485
Brick 141.165.65.144:/data6/brick 49157 0 Y 32312
Brick 141.165.65.141:/data7/brick 49158 0 Y 10055
Brick 141.165.65.142:/data7/brick 49158 0 Y 16505
Brick 141.165.65.144:/data7/brick 49158 0 Y 32332
Brick 141.165.65.141:/data8/brick 49159 0 Y 15551
Brick 141.165.65.142:/data8/brick 49159 0 Y 16530
Brick 141.165.65.144:/data8/brick 49159 0 Y 32352
Brick 141.165.65.141:/data9/brick 49160 0 Y 10075
Brick 141.165.65.142:/data9/brick 49160 0 Y 16552
Brick 141.165.65.144:/data9/brick 49160 0 Y 32372
Brick 141.165.65.141:/data10/brick 49161 0 Y 10086
Brick 141.165.65.142:/data10/brick 49161 0 Y 16573
Brick 141.165.65.144:/data10/brick 49161 0 Y 32394
Brick 141.165.65.141:/data11/brick 49162 0 Y 10099
Brick 141.165.65.142:/data11/brick 49162 0 Y 16595
Brick 141.165.65.144:/data11/brick 49162 0 Y 32415
Brick 141.165.65.141:/data12/brick 49163 0 Y 10106
Brick 141.165.65.142:/data12/brick 49163 0 Y 16616
Brick 141.165.65.144:/data12/brick 49163 0 Y 32435
Brick 141.165.181.146:/data1/brick 49152 0 Y 177222
Brick 141.165.181.147:/data1/brick 49164 0 Y 215532
Brick 141.165.181.148:/data1/brick 49152 0 Y 182183
Brick 141.165.181.146:/data2/brick 49153 0 Y 177242
Brick 141.165.181.147:/data2/brick 49165 0 Y 215541
Brick 141.165.181.148:/data2/brick 49153 0 Y 182203
Brick 141.165.181.146:/data3/brick 49154 0 Y 177263
Brick 141.165.181.147:/data3/brick 49166 0 Y 215552
Brick 141.165.181.148:/data3/brick 49154 0 Y 182224
Brick 141.165.181.146:/data4/brick 49155 0 Y 177283
Brick 141.165.181.147:/data4/brick 49167 0 Y 215563
Brick 141.165.181.148:/data4/brick 49155 0 Y 182244
Brick 141.165.181.146:/data5/brick 49156 0 Y 177303
Brick 141.165.181.147:/data5/brick 49168 0 Y 215574
Brick 141.165.181.148:/data5/brick 49156 0 Y 182264
Brick 141.165.181.146:/data6/brick 49157 0 Y 177323
Brick 141.165.181.147:/data6/brick 49169 0 Y 215585
Brick 141.165.181.148:/data6/brick 49157 0 Y 182284
Brick 141.165.181.146:/data7/brick 49158 0 Y 177343
Brick 141.165.181.147:/data7/brick 49158 0 Y 176773
Brick 141.165.181.148:/data7/brick 49158 0 Y 182304
Brick 141.165.181.146:/data8/brick 49159 0 Y 177363
Brick 141.165.181.147:/data8/brick 49170 0 Y 215596
Brick 141.165.181.148:/data8/brick 49159 0 Y 182324
Brick 141.165.181.146:/data9/brick 49160 0 Y 177383
Brick 141.165.181.147:/data9/brick 49171 0 Y 215607
Brick 141.165.181.148:/data9/brick 49160 0 Y 182346
Brick 141.165.181.146:/data10/brick 49161 0 Y 177403
Brick 141.165.181.147:/data10/brick 49172 0 Y 215618
Brick 141.165.181.148:/data10/brick 49161 0 Y 182366
Brick 141.165.181.146:/data11/brick 49162 0 Y 177423
Brick 141.165.181.147:/data11/brick 49173 0 Y 215629
Brick 141.165.181.148:/data11/brick 49162 0 Y 182386
Brick 141.165.181.146:/data12/brick 49163 0 Y 177443
Brick 141.165.181.147:/data12/brick 49174 0 Y 215639
Brick 141.165.181.148:/data12/brick 49163 0 Y 182406

客户端无法连接的Brick的信息

1
2
3
4
5
6
7
8
9
10
11
12
13
14
//这些信息存在于/var/log/glusterfs/mnt-{挂载名称}.log中的Final graph:
volume speech_vol_v2-client-55
type protocol/client
option ping-timeout 120
option remote-host 141.165.181.147
option remote-subvolume /data7/brick
option transport-type socket
option transport.socket.ssl-enabled off
option transport.tcp-user-timeout 0
option transport.socket.keepalive-time 20
option transport.socket.keepalive-interval 2
option transport.socket.keepalive-count 9
option send-gids true
end-volume

客户端挂载Debug日志

操作系统kernel日志(/var/log/message)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
May 26 12:20:18 storage-center-prd-141-165-181-147 kernel: [4036004.156726] TCP: request_sock_TCP: Possible SYN flooding on port 24007. Sending cookies.  Check SNMP counters.
May 26 16:52:13 storage-center-prd-141-165-181-147 systemd[1]: Starting Cleanup of Temporary Directories...
May 26 16:52:13 storage-center-prd-141-165-181-147 systemd[1]: Started Cleanup of Temporary Directories.
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.524551] INFO: task kswapd0:266 blocked for more than 120 seconds.
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.531180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539187] kswapd0 D 0000000007496801 0 266 2 0x00000000
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539195] ffff881ffcd3b8f0 0000000000000046 ffff881ffeab9fa0 ffff881ffcd3bfd8
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539198] ffff881ffcd3bfd8 ffff881ffcd3bfd8 ffff881ffeab9fa0 ffff8804e82b4860
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539200] 7fffffffffffffff ffff8804e82b4858 ffff881ffeab9fa0 0000000007496801
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539202] Call Trace:
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539216] [<ffffffff816a94c9>] schedule+0x29/0x70
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539228] [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539235] [<ffffffff812fab24>] ? blk_finish_plug+0x14/0x40
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539271] [<ffffffffc01f9354>] ? _xfs_buf_ioapply+0x334/0x460 [xfs]
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539273] [<ffffffff816a987d>] wait_for_completion+0xfd/0x140
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539277] [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539292] [<ffffffffc01fb434>] ? xfs_bwrite+0x24/0x60 [xfs]
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539305] [<ffffffffc01fb026>] xfs_buf_submit_wait+0xe6/0x1d0 [xfs]
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539316] [<ffffffffc01fb434>] xfs_bwrite+0x24/0x60 [xfs]
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539330] [<ffffffffc0203b71>] xfs_reclaim_inode+0x331/0x360 [xfs]
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539346] [<ffffffffc0203e07>] xfs_reclaim_inodes_ag+0x267/0x390 [xfs]
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539348] [<ffffffff810c4593>] ? try_to_wake_up+0x183/0x340
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539351] [<ffffffff8121d67a>] ? evict+0x10a/0x180
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539353] [<ffffffff810c4765>] ? wake_up_process+0x15/0x20
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539366] [<ffffffffc0204df3>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539380] [<ffffffffc02146f5>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539383] [<ffffffff81203888>] prune_super+0xe8/0x170
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539387] [<ffffffff81195413>] shrink_slab+0x163/0x330
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539390] [<ffffffff811f7537>] ? vmpressure+0x87/0x90
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539392] [<ffffffff81199081>] balance_pgdat+0x4b1/0x5e0
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539394] [<ffffffff81199323>] kswapd+0x173/0x440
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539398] [<ffffffff810b1910>] ? wake_up_atomic_t+0x30/0x30
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539401] [<ffffffff811991b0>] ? balance_pgdat+0x5e0/0x5e0
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539402] [<ffffffff810b098f>] kthread+0xcf/0xe0
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539404] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539408] [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
May 27 00:30:14 storage-center-prd-141-165-181-147 kernel: [4079697.539409] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
May 27 00:32:14 storage-center-prd-141-165-181-147 kernel: [4079817.259262] INFO: task kswapd0:266 blocked for more than 120 seconds.
客户端Mount Debug日志详情
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
[2020-05-27 03:47:34.303569] D [logging.c:1718:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2020-05-27 03:47:52.040397] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:7) (non-SSL)
[2020-05-27 03:47:52.040442] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT
[2020-05-27 03:47:52.040639] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x92)[0x7fb222fc5922] (--> /lib64/libgfrpc.so.0(+0xf4e8)[0x7fb222fc64e8] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb222fc2a93] ))))) 0-: 141.165.181.147:24007: ping timer event already removed
[2020-05-27 03:47:55.040804] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147)
[2020-05-27 03:47:55.043219] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007
[2020-05-27 03:47:55.043244] D [socket.c:3384:socket_fix_ssl_opts] 0-speech_vol_v2-client-55: disabling SSL for portmapper connection
[2020-05-27 03:47:55.043408] D [MSGID: 0] [client.c:2323:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_CONNECT
[2020-05-27 03:47:55.043611] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(+0x13691)[0x7fb222fca691] (--> /lib64/libgfrpc.so.0(rpc_clnt_submit+0x3b4)[0x7fb222fc6914] (--> /usr/lib64/glusterfs/7.2/xlator/protocol/client.so(+0x135b2)[0x7fb2146675b2] ))))) 0-: 141.165.181.147:24007: ping timer event already removed
[2020-05-27 03:47:55.043662] D [MSGID: 0] [client-handshake.c:1399:server_has_portmap] 0-speech_vol_v2-client-55: detected portmapper on server
[2020-05-27 03:47:55.043783] D [rpc-clnt-ping.c:195:rpc_clnt_ping_cbk] 0-speech_vol_v2-client-55: Ping latency is 0ms
[2020-05-27 03:47:55.043812] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-speech_vol_v2-client-55: changing port to 49158 (from 0)
[2020-05-27 03:47:55.043844] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:6) (non-SSL)
[2020-05-27 03:47:55.043855] D [MSGID: 0] [client.c:2394:client_rpc_notify] 0-speech_vol_v2-client-55: disconnected (skipped notify)
[2020-05-27 03:47:55.043876] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147)
[2020-05-27 03:49:34.303646] D [logging.c:1756:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present
[2020-05-27 03:47:55.043854] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT
[2020-05-27 03:47:55.046202] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007
[2020-05-27 03:49:34.303898] D [logging.c:1718:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2020-05-27 03:50:02.344385] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:7) (non-SSL)
[2020-05-27 03:50:02.344418] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT
[2020-05-27 03:50:02.344582] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x92)[0x7fb222fc5922] (--> /lib64/libgfrpc.so.0(+0xf4e8)[0x7fb222fc64e8] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb222fc2a93] ))))) 0-: 141.165.181.147:24007: ping timer event already removed
[2020-05-27 03:50:05.344677] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147)
[2020-05-27 03:50:05.347099] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007
[2020-05-27 03:50:05.347126] D [socket.c:3384:socket_fix_ssl_opts] 0-speech_vol_v2-client-55: disabling SSL for portmapper connection
[2020-05-27 03:50:05.347336] D [MSGID: 0] [client.c:2323:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_CONNECT
[2020-05-27 03:50:05.347542] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(+0x13691)[0x7fb222fca691] (--> /lib64/libgfrpc.so.0(rpc_clnt_submit+0x3b4)[0x7fb222fc6914] (--> /usr/lib64/glusterfs/7.2/xlator/protocol/client.so(+0x135b2)[0x7fb2146675b2] ))))) 0-: 141.165.181.147:24007: ping timer event already removed
[2020-05-27 03:50:05.347591] D [MSGID: 0] [client-handshake.c:1399:server_has_portmap] 0-speech_vol_v2-client-55: detected portmapper on server
[2020-05-27 03:50:05.347633] D [rpc-clnt-ping.c:195:rpc_clnt_ping_cbk] 0-speech_vol_v2-client-55: Ping latency is 0ms
[2020-05-27 03:50:05.347701] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-speech_vol_v2-client-55: changing port to 49158 (from 0)
[2020-05-27 03:50:05.347735] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:6) (non-SSL)
[2020-05-27 03:50:05.347749] D [MSGID: 0] [client.c:2394:client_rpc_notify] 0-speech_vol_v2-client-55: disconnected (skipped notify)
[2020-05-27 03:50:05.347765] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147)
/0-speech_vol_v2-client-55
[2020-05-27 03:51:34.303977] D [logging.c:1756:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present
[2020-05-27 03:50:05.347748] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT
[2020-05-27 03:50:05.350156] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007
[2020-05-27 03:51:34.304035] D [logging.c:1718:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2020-05-27 03:52:12.648388] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:7) (non-SSL)
[2020-05-27 03:52:12.648424] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT
[2020-05-27 03:52:12.648591] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x92)[0x7fb222fc5922] (--> /lib64/libgfrpc.so.0(+0xf4e8)[0x7fb222fc64e8] (--> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb222fc2a93] ))))) 0-: 141.165.181.147:24007: ping timer event already removed
[2020-05-27 03:52:15.648683] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147)
[2020-05-27 03:52:15.651027] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007
[2020-05-27 03:52:15.651052] D [socket.c:3384:socket_fix_ssl_opts] 0-speech_vol_v2-client-55: disabling SSL for portmapper connection
[2020-05-27 03:52:15.651271] D [MSGID: 0] [client.c:2323:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_CONNECT
[2020-05-27 03:52:15.651501] D [rpc-clnt-ping.c:96:rpc_clnt_remove_ping_timer_locked] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13a)[0x7fb22321e8ea] (--> /lib64/libgfrpc.so.0(+0x12eeb)[0x7fb222fc9eeb] (--> /lib64/libgfrpc.so.0(+0x13691)[0x7fb222fca691] (--> /lib64/libgfrpc.so.0(rpc_clnt_submit+0x3b4)[0x7fb222fc6914] (--> /usr/lib64/glusterfs/7.2/xlator/protocol/client.so(+0x135b2)[0x7fb2146675b2] ))))) 0-: 141.165.181.147:24007: ping timer event already removed
[2020-05-27 03:52:15.651557] D [MSGID: 0] [client-handshake.c:1399:server_has_portmap] 0-speech_vol_v2-client-55: detected portmapper on server
[2020-05-27 03:52:15.651693] D [rpc-clnt-ping.c:195:rpc_clnt_ping_cbk] 0-speech_vol_v2-client-55: Ping latency is 0ms
[2020-05-27 03:52:15.651734] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-speech_vol_v2-client-55: changing port to 49158 (from 0)
[2020-05-27 03:52:15.651758] D [socket.c:3053:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:6) (non-SSL)
[2020-05-27 03:52:15.651770] D [MSGID: 0] [client.c:2394:client_rpc_notify] 0-speech_vol_v2-client-55: disconnected (skipped notify)
[2020-05-27 03:52:15.651785] D [name.c:171:client_fill_address_family] 0-speech_vol_v2-client-55: address-family not specified, marking it as unspec for getaddrinfo to resolve from (remote-host: 141.165.181.147)
[2020-05-27 03:53:34.304123] D [logging.c:1756:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. About to flush outstanding messages if present
[2020-05-27 03:52:15.651768] D [MSGID: 0] [client.c:2334:client_rpc_notify] 0-speech_vol_v2-client-55: got RPC_CLNT_DISCONNECT
[2020-05-27 03:52:15.654154] D [MSGID: 0] [common-utils.c:532:gf_resolve_ip6] 0-resolver: returning ip-141.165.181.147 (port-24007) for hostname: 141.165.181.147 and port: 24007
[2020-05-27 03:53:34.304179] D [logging.c:1718:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
0-speech_vol_v2-client-55对应服务端节点(147下的/data7/brick)日志
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
[2020-05-27 01:02:22.421765] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.32.25:47645 failed (No data available)
[2020-05-27 01:02:22.421795] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:de438bc8-1b52-446c-817f-60d646af0ba6-GRAPH_ID:2-PID:84971-HOST:ai-vtraining-prd-141-165-32-25.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0
[2020-05-27 01:02:26.918776] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.32.25:47562 failed (No data available)
[2020-05-27 01:02:26.918807] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:36965041-2046-4413-88a4-08bc06fc1de4-GRAPH_ID:2-PID:87591-HOST:ai-vtraining-prd-141-165-32-25.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0
[2020-05-27 01:03:13.270262] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.86.129:48212 failed (No data available)
[2020-05-27 01:03:13.270295] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:2dafdf38-ff96-45cc-99c2-1238764b59ae-GRAPH_ID:0-PID:75668-HOST:ai-vtraining-prd-141-165-86-129.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0
[2020-05-27 01:03:13.285400] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.86.129:48162 failed (No data available)
[2020-05-27 01:03:13.285422] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:4251ffc2-e1c5-4357-a5d2-7c92e8221fad-GRAPH_ID:0-PID:75625-HOST:ai-vtraining-prd-141-165-86-129.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0
[2020-05-27 01:36:05.410946] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 10.196.20.133:47621 failed (No data available)
[2020-05-27 01:36:05.410974] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:57fbf79a-bae3-4bcd-8670-70d559956cac-GRAPH_ID:0-PID:15841-HOST:ai-vtraining-gpu-prd-10-196-20-133.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0
[2020-05-27 01:36:06.056913] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 10.196.20.133:47331 failed (No data available)
[2020-05-27 01:36:06.056941] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:b68c4c8e-e8d8-4d44-97a3-a503791d6177-GRAPH_ID:0-PID:16094-HOST:ai-vtraining-gpu-prd-10-196-20-133.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0
[2020-05-27 02:10:34.719707] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.30.45:48603 failed (No data available)
[2020-05-27 02:10:34.719738] I [MSGID: 115036] [server.c:501:server_rpc_notify] 0-speech_vol_v2-server: disconnecting connection from CTX_ID:0ed18a02-3363-4cba-9313-93f7b221ede4-GRAPH_ID:10-PID:6869-HOST:ai-vtraining-prd-141-165-30-45.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0
[2020-05-27 02:13:50.989206] I [addr.c:54:compare_addr_and_update] 0-/data7/brick: allowed = "*", received addr = "141.165.30.45"
[2020-05-27 02:13:50.989234] I [MSGID: 115029] [server-handshake.c:549:server_setvolume] 0-speech_vol_v2-server: accepted client from CTX_ID:a327610d-0acf-4785-a912-17e87c126ba9-GRAPH_ID:0-PID:67045-HOST:ai-vtraining-prd-141-165-30-45.v-bj-4.vivo.lan-PC_NAME:speech_vol_v2-client-55-RECON_NO:-0 (version: 6.5) with subvol /data7/brick
[2020-05-27 02:17:13.967294] W [socket.c:774:__socket_rwv] 0-tcp.speech_vol_v2-server: readv on 141.165.30.45:48606 failed (No data available)
[2020-05-27 02:39:55.104419] W [glusterfsd.c:1596:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f9d087f1dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x55f6769bb625] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x55f6769bb48b] ) 0-: received signum (15), shutting down
[2020-05-27 02:39:55.104502] W [socket.c:774:__socket_rwv] 0-glusterfs: writev on 141.165.181.147:24007 failed (Broken pipe)