上班坐下来没多久,接同事电话说有两台mysql服务器无法访问,其中这两台服务器是mycat服务器+MySQL服务器,具体处理过程如下:
一、错误信息
错误信息01:
- INFO | jvm 3 | 2018/05/10 10:00:54 | 2018-05-10 10:00:54,406 [INFO ][$_NIOREACTOR-3-RW] close connection,reason:Bad:
- java.lang.OutOfMemoryError: Unable to acquire 131072 bytes of memory, got 0 ,MySQLConnection [id=6173, lastTime=1525917654
- 250, user=username, schema=database10, old shema=database10, borrowed=true, fromSlaveDB=true, threadId=42067, charset=utf8, txIsolation=3,
- autocommit=true, attachment=dn10{SELECT *
- INFO | jvm 3 | 2018/05/10 10:00:54 | FROM cu_link
- INFO | jvm 3 | 2018/05/10 10:00:54 | WHERE (customer_id = 1964
- INFO | jvm 3 | 2018/05/10 10:00:54 | OR customer_link_id = 1964)
- INFO | jvm 3 | 2018/05/10 10:00:54 | AND state = 1
- INFO | jvm 3 | 2018/05/10 10:00:54 | AND company_id = '3BD'
- INFO | jvm 3 | 2018/05/10 10:00:54 | LIMIT 100000000}, respHandler=io.mycat.backend.mysql.nio.handler.MultiNodeQueryH
- andler@530d01dc, host=xx.xx.xx.xx, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false] (io.mycat.net.
- AbstractConnection:AbstractConnection.java:508)
- INFO | jvm 3 | 2018/05/10 10:00:54 | 2018-05-10 10:00:54,407 [ERROR][$_NIOREACTOR-3-RW] caught err: java.lang.OutOfM
- emoryError: Unable to acquire 131072 bytes of memory, got 0
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.memory.unsafe.memory.mm.MemoryConsumer.allocateLongArray(Memor
- yConsumer.java:92)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.memory.unsafe.utils.sort.UnsafeInMemorySorter.<init>(UnsafeInM
- emorySorter.java:114)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.memory.unsafe.utils.sort.UnsafeExternalSorter.<init>(UnsafeExt
- ernalSorter.java:131)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.memory.unsafe.utils.sort.UnsafeExternalSorter.create(UnsafeExt
- ernalSorter.java:98)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.memory.unsafe.utils.sort.UnsafeExternalRowSorter.<init>(Unsafe
- ExternalRowSorter.java:64)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.sqlengine.mpp.DataNodeMergeManager.onRowMetaData(DataNodeMerge
- Manager.java:249)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.backend.mysql.nio.handler.MultiNodeQueryHandler.fieldEofRespon
- se(MultiNodeQueryHandler.java:643)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.backend.mysql.nio.MySQLConnectionHandler.handleFieldEofPacket(
- MySQLConnectionHandler.java:192)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.backend.mysql.nio.MySQLConnectionHandler.handleData(MySQLConne
- ctionHandler.java:118)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.handler.BackendAsyncHandler.offerData(BackendAsyncHandler.
- java:36)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.backend.mysql.nio.MySQLConnectionHandler.handle(MySQLConnectio
- nHandler.java:80)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.AbstractConnection.handle(AbstractConnection.java:263)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.AbstractConnection.onReadData(AbstractConnection.java:321)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.NIOSocketWR.asynRead(NIOSocketWR.java:190)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.AbstractConnection.asynRead(AbstractConnection.java:273)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at io.mycat.net.NIOReactor$RW.run(NIOReactor.java:102)
- INFO | jvm 3 | 2018/05/10 10:00:54 | at java.lang.Thread.run(Thread.java:745)
- INFO | jvm 3 | 2018/05/10 10:00:54 | (io.mycat.net.NIOReactor$RW:NIOReactor.java:132)
复制代码
错误信息02:
- LIMIT 100000000}, packetId=1], host=xx.xx.xx.xx, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false] frontend host:x.x.x.x/49657/username
- 2018-05-10 10:25:39.883 WARN [$_NIOREACTOR-2-RW] (io.mycat.backend.mysql.nio.handler.SingleNodeHandler.backConnectionErr(SingleNodeHandler.java:249)) - execute sql err : errno:1146 Table 'cr.vacode' doesn't exist con:MySQLConnection [id=89, lastTime=1525919139870, user=username, schema=cr, old shema=cr, borrowed=true, fromSlaveDB=true, threadId=553442, charset=utf8, txIsolation=3, autocommit=true, attachment=dn00{SELECT *
- FROM vaCode
- ORDER BY id DESC
复制代码
错误信息03:
- java.sql.SQLNonTransientException: find no Route:SELECT * FROM `database03`.`cain` LIMIT 0
复制代码
二、错误原因
1、针对错误信息01,基本可以断定是由于服务器jvm设置不当,导致内存溢出。
2、针对错误信息02,基本可以断定是由于在数据库服务器新增表,没有配置在mycat中或由于MyCAT大小写配置和数据库服务器表名大小写配置有关
3、针对错误信息03,基本可以断定路由问题
三、解决方案
1、查看系统内存
- [root@localhost ~]# free -g
- total used free shared buffers cached
- Mem: 15 15 0 5 0 6
- -/+ buffers/cache: 8 6
- Swap: 7 5 2
复制代码
2、查看系统日志
[root@localhost logs]#vi /var/log/message
如果是OOM的话,系统日志会记录的。
3、查看数据库日志和mycat日志
[root@localhost logs]#vi /home/mysql/err.log
[root@localhost logs]# vi /usr/local/mycat/logs/mycat.log [root@localhost logs]# [root@localhost logs]# vi /usr/local/mycat/logs/wrapper.log
4、查看监控情况
通过zabbix查看系统CPU、内存、SWAP使用情况
5、查看具体进程对应资源消耗
[root@localhost ~]# top -p 31665

[root@localhost ~]# top -p 30388

通过ps 查看系统进程的时候和top查看资源消耗的时候,发现居然还跑着oracle进程,而且其sga设置为8G,PGA设置的是2G

这里看到实际设置的最大应用内存(21G+)已经超过了物理内存的实际最大值(15G),而写看到系统已经使用了swap分区(虽然现在看着还有7G左右的cache),说明
6、查看mycat的进程运行内存
下面的最大堆内存4G(具体使用情况可以通过jmap查看),单线程内存最大64M,最大的nio内存2G(DirectMemory是java nio引入的,直接以native的方式分配内存,不受jvm管理)

查看其具体配置文件
[root@localhost mycat]# more /usr/local/mycat/conf/wrapper.conf
- # Java Additional Parameters
- #wrapper.java.additional.1=
- wrapper.java.additional.1=-DMYCAT_HOME=.
- wrapper.java.additional.2=-server
- wrapper.java.additional.3=-XX:MaxPermSize=64M
- wrapper.java.additional.4=-XX:+AggressiveOpts
- wrapper.java.additional.5=-XX:MaxDirectMemorySize=2G
- wrapper.java.additional.6=-Dcom.sun.management.jmxremote
- wrapper.java.additional.7=-Dcom.sun.management.jmxremote.port=1984
- wrapper.java.additional.8=-Dcom.sun.management.jmxremote.authenticate=false
- wrapper.java.additional.9=-Dcom.sun.management.jmxremote.ssl=false
- wrapper.java.additional.10=-Xmx4G
- wrapper.java.additional.11=-Xms1G
- # Initial Java Heap Size (in MB)
- #wrapper.java.initmemory=3
- # Maximum Java Heap Size (in MB)
- #wrapper.java.maxmemory=64
- # Application parameters. Add parameters as needed starting from 1
- wrapper.app.parameter.1=io.mycat.MycatStartup
- wrapper.app.parameter.2=start
复制代码
这里看内存设置还是正常范围内
7、最终解决方案
(由于开发联系我时,我测试数据库服务、MyCAT服务均正常,所以具体排查过程大致如上,后经与开发沟通,他们说是磁盘空间不足导致的,清理磁盘重启mycat问题解决)
8、资源统计
[root@localhost logs]# dstat -lrmc --top-io --top-mem --top-cpu Terminal width too small, trimming output. ---load-avg--- --io/total- ------memory-usage----- ----total-cpu-usage---- ----most-expensive---- --most-expensive-> 1m 5m 15m | read writ| used buff cach free|usr sys idl wai hiq siq| i/o process | memory process > 0.86 0.56 0.47|76.7 105 |8825M 7816k 6938M 181M| 4 1 92 3 0 0|init 16M 477k|mysqld 5789M> 0.86 0.56 0.47|1083 177 |8835M 7828k 6928M 181M| 30 4 47 19 0 1|mysqld 416k 4354k|mysqld 5789M>
|