mount目录挂载导致目标存储服务器宕机

2016/02/19 linux

mount nfs目录挂载导致目标服务器宕机

宕机背景

由于业务规模一般,现业务由两台互相备大容量存储服务器BIG42、BIG43 向 8台WEB服务器提供文件存储服务。

BIG42、BIG43部署情况

BIG42、BIG43采用sersync实现互备(基于rsync互相同步数据,BIG42新增数据会向BIG43同步,BIG43新增数据会向BIG42同步)

WEB服务器部署情况

WEB服务器通过mount -t nfs方式向BIG42、BIG43挂载远程目录使用

mount -t nfs xx.xxx.xxx.42:/data/uploads/www_xxxx_com/data /home/wwwroot/xxx.xxxxx.com/data
...

宕机现象

  • WEB服务器 : load average达到15~18
  • WEB服务器 : df -h 无法查看到挂载目录
  • WEB服务器 : nginx服务停止,站点无法访问(挂载BIG42的WEB均出现故障)
  • DATA服务器 : 晚上20点BIG43宕机ssh无法连接、BIG42正常load average在0.5以下
  • DATA服务器 : 紧急将BIG43的挂载切换至BIG42,业务暂时恢复,次日凌晨02点BIG42出现宕机

BIG42日志

Jun 21 02:43:47 szwg-m91-store-webmedia02 xinetd[9600]: Deactivating service rsync due to excessive incoming connections.  Restarting in 10 seconds.
Jun 21 02:43:47 szwg-m91-store-webmedia02 xinetd[9600]: FAIL: rsync connections per second from=10.199.146.43
Jun 21 02:43:47 szwg-m91-store-webmedia02 xinetd[9600]: EXIT: rsync status=0 pid=1317 duration=0(sec)
Jun 21 02:43:57 szwg-m91-store-webmedia02 xinetd[9600]: Activating service rsync
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: select reported EBADF but no bad file descriptors were found
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: START: rsync pid=30137 from=10.199.146.43
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: select reported EBADF but no bad file descriptors were found
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: START: rsync pid=30138 from=10.199.146.43
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: select reported EBADF but no bad file descriptors were found
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: START: rsync pid=30162 from=10.199.146.43
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: select reported EBADF but no bad file descriptors were found
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: START: rsync pid=30175 from=10.199.146.43
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: select reported EBADF but no bad file descriptors were found
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: START: rsync pid=30176 from=10.199.146.43
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: 1 descriptors still set
Jun 21 02:44:00 szwg-m91-store-webmedia02 xinetd[9600]: 1 descriptors still set
Jun 21 02:44:00 szwg-m91-store-webmedia02 rsyslogd-2177: imuxsock begins to drop messages from pid 9600 due to rate-limiting
...
Jun 21 02:44:06 szwg-m91-store-webmedia02 rsyslogd-2177: imuxsock lost 423120 messages from pid 9600 due to rate-limiting
...

故障分析

原因:

xinetd当最大连接数到达xx数据量后,xinetd会将配置的服务停止yy秒,用于防止DDOS攻击.

“Deactivating service rsync due to excessive incoming connections.  Restarting in 10 seconds.”

相关配置:

/etc/xinetd.d/rsync配置中没有定义per_source、instances参数,此时将继承defaults部分参数设置。

cat /etc/xinetd.conf
defaults
{
...
cps		    = 50 10
instances	= 50
per_source	= 10
...


cat /etc/xinetd.d/rsync
service rsync
{
        disable = yes
        flags           = IPv6
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/bin/rsync
        server_args     = --daemon
        log_on_failure  += USERID
}

参数说明:

cps       :用来设定连接速率。它需要两个参数,第一个参数表示每秒可以处理的连接数,如果超过了这个连接数时,之后进入的连接将被暂时停止处理;第二个参数表示停止处理多少秒后,继续处理先前暂停处理的连接
per_source:参数值可以为整数或者UNLIMITED关键字。它表示每一个IP地址上最多可以建立的实例数目。本属性也可以定义在defaults部分
instances :接受一个大于或等于1的整数或UNLIMITED。设置可同时运行的最大进程数。UNLIMITED意味着xinetd对该数没有限制

故障修复

不限制per_source和instances

调整配置:

cat /etc/xinetd.d/rsync
service rsync
{
	per_source      = UNLIMITED
    instances       = UNLIMITED
	disable	        = no
	flags		    = IPv6
	socket_type     = stream
	wait            = no
	user            = root
	server          = /usr/bin/rsync
	server_args     = --daemon
	log_on_failure  += USERID
}

参考资料

http://server.it168.com/a2009/0624/594/000000594998_2.shtml
http://blog.sina.com.cn/s/blog_5d3da3280100btjh.html
http://blog.chinaunix.net/uid-20269419-id-4192611.html

Search

    Table of Contents