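HRegionServer on some HBase cluster nodes shuts down automatically after startup

My cluster has 4 HRegionServer nodes and 1 master. Three of the machines run Ubuntu and two run CentOS 6.5.

Startup looks completely normal, but a short while later the HRegionServer on Slave3 shuts itself down.

The output of tail -n100 hbase-hadoop-regionserver-Slave3.log is as follows: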

2015-07-04 16:18:52,761 WARN  [regionserver/Slave3/192.168.2.38:16020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=Master:2181,Slave1:2181,Slave2:2181,Slave3:2181,Slavrg.apache.zookeeper.KeeperException$OperationTimeoutException: KeeperErrorCode = OperationTimeout
2015-07-04 16:18:52,761 ERROR [regionserver/Slave3/192.168.2.38:16020] zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 4 attempts
2015-07-04 16:18:52,762 WARN  [regionserver/Slave3/192.168.2.38:16020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$OperationTimeoutException: KeeperErrorCode = OperationTimeout
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:145)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:179)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1347)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1336)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1391)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1074)
        at java.lang.Thread.run(Thread.java:745)
2015-07-04 16:18:52,767 INFO  [regionserver/Slave3/192.168.2.38:16020] regionserver.HRegionServer: stopping server Slave3,16020,1435997816385; zookeeper connection close

 

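Adjusting the system time solved the problem. The reference material I followed is copied below:

2. The problem is caused by the clocks on the nodes being out of sync. There are two ways to fix it:
1) In hbase-site.xml, add or modify the following property to raise the allowed clock skew (the value is in milliseconds):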
<property>
<name>hbase.master.maxclockskew</name>
<value>150000</value>
</property>
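2) Change the system time so that all nodes agree (this is the recommended approach):
Set the date
date -s 11/23/2013
Set the time
date -s 15:14:00
Check the hardware (CMOS) clock (on newer systems the command is hwclock rather than clock)
clock -r
Write the system time to the CMOS clock
clock -w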

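3. After making the change, start the HRegionServer on its own:
Start all regionservers in the cluster
./hbase-daemons.sh start regionserver
Start a single regionserver (run on that node)
./hbase-daemon.sh start regionserver

In practice it was better to shut down both HBase and Hadoop and restart them; only then did the result show up in the browser at http://192.168.2.35:16010/.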
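Before restarting, it can help to confirm that the clocks really agree. The following is a minimal sketch, not part of the original fix: the function names skew_ms and within_skew are hypothetical, and the 30000 ms default limit is an assumption based on what I believe is the HBase default for hbase.master.maxclockskew. Pass your own value if you changed it in hbase-site.xml.

```shell
#!/bin/sh
# Hypothetical helper: absolute difference between two epoch-second
# timestamps, printed in milliseconds.
skew_ms() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo $(( diff * 1000 ))
}

# Hypothetical helper: succeeds (exit 0) when the skew between the two
# timestamps is within the limit. The third argument is the limit in ms;
# 30000 is assumed here as the default.
within_skew() {
    limit=${3:-30000}
    [ "$(skew_ms "$1" "$2")" -le "$limit" ]
}

# Example usage (needs SSH access; the host name Slave3 comes from the log above):
#   local_ts=$(date +%s)
#   remote_ts=$(ssh Slave3 date +%s)
#   within_skew "$local_ts" "$remote_ts" \
#       || echo "Slave3 is off by $(skew_ms "$local_ts" "$remote_ts") ms"
```

The SSH round trip itself adds a little delay, so treat the result as a rough check rather than an exact measurement.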

Other common errors

After entering the hbase shell, the following messages appear:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

The cause is a jar conflict. Deleting the copy that ships with HBase fixes it (run from the HBase installation directory):

 rm lib/slf4j-log4j12-1.6.4.jar
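To see which binding jars are present before deleting anything, a small listing helper can be used. This is only a sketch: the function name find_slf4j_bindings is hypothetical, and the directories in the usage comment are the ones that appear in the SLF4J messages above, so adjust them to your installation.

```shell
#!/bin/sh
# Hypothetical helper: print every slf4j-log4j12 binding jar found
# directly inside the given directories. Missing directories are skipped.
find_slf4j_bindings() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for jar in "$dir"/slf4j-log4j12-*.jar; do
            # If nothing matches, the glob stays literal; skip that case.
            if [ -e "$jar" ]; then
                echo "$jar"
            fi
        done
    done
}

# Example usage (paths taken from the SLF4J output above):
#   find_slf4j_bindings /usr/hbase/lib /usr/hadoop/share/hadoop/common/lib
```

If more than one path is printed, the classpath contains multiple bindings and one of them can be removed as described above.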

HBase shell list command fails

Running the list command in the HBase shell reports an error:

2014-07-17 14:01:32,384 ERROR [main] zookeeper.ZooKeeperWatcher: hconnection-0x4b936059, quorum=slave2:2181,slave1:2181,master:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:479)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.checkIfBaseNodeAvailable(HConnectionManager.java:822)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.access$200(HConnectionManager.java:544)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(HConnectionManager.java:1517)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$StubMaker.makeStub(HConnectionManager.java:1563)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(HConnectionManager.java:1618)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1826)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTableNames(HConnectionManager.java:2542)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getTableNames(HConnectionManager.java:2532)
        at org.apache.hadoop.hbase.client.HBaseAdmin.getTableNames(HBaseAdmin.java:352)
        at org.apache.hadoop.hbase.client.HBaseAdmin.getTableNames(HBaseAdmin.java:368)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.j
2014-07-17 14:01:32,387 ERROR [main] client.HConnectionManager$HConnectionImplementation: Can't get connection to ZooKeeper: KeeperErrorCode = ConnectionLoss for /hbase

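The key error message is: client.HConnectionManager$HConnectionImplementation: Can't get connection to ZooKeeper: KeeperErrorCode = ConnectionLoss for /hbase. This tells us that ZooKeeper cannot be reached. Running jps shows that the ZooKeeper processes are all up, and the ZooKeeper quorum settings in hbase-site.xml look correct. From experience, the likely cause is that the firewall was never turned off, so port 2181 is not accessible. I ran service iptables stop to turn off the firewall, restarted HBase, entered the hbase shell and ran list: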

hbase(main):001:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/hadoop/hbase/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2014-07-17 14:06:26,013 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
0 row(s) in 1.0070 seconds

=> []

Everything works now and the problem is solved.
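A quick way to confirm the firewall diagnosis is to probe the ZooKeeper client port on every quorum member from the machine running the shell. The sketch below is an illustration only: split_quorum and probe_quorum are hypothetical names, the probe assumes an nc (netcat) build that supports the -z and -w options, and 2181 is used as the fallback because it is ZooKeeper's default client port.

```shell
#!/bin/sh
# Hypothetical helper: turn a quorum string such as
# "slave2:2181,slave1:2181,master:2181" into one "host port" pair per line.
# Entries without a port fall back to 2181.
split_quorum() {
    echo "$1" | tr ',' '\n' | while IFS=: read -r host port; do
        [ -n "$host" ] || continue
        echo "$host ${port:-2181}"
    done
}

# Hypothetical helper: try a TCP connection to each quorum member.
# Assumes nc supports -z (connect only) and -w (timeout in seconds).
probe_quorum() {
    split_quorum "$1" | while read -r host port; do
        # </dev/null keeps nc from consuming the loop's input.
        if nc -z -w 2 "$host" "$port" </dev/null 2>/dev/null; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
        fi
    done
}

# Example usage (quorum string copied from the error log above):
#   probe_quorum "slave2:2181,slave1:2181,master:2181"
```

If a host shows as not reachable while jps on that host lists the ZooKeeper process, a firewall rule is a plausible culprit.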

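HBase shell insert, delete and update errors. Any insert, delete or update issued in the hbase shell throws an exception:

zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
The cause turned out to be an incompatibility between the jars in this HBase version and the jars in the installed Hadoop. Fix: copy the hadoop-2.2.0 jars from the Hadoop installation into ${HBASE_HOME}/lib to replace the ones there.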
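Unless otherwise noted, all articles on this blog are original. Author: 数据为王. When copying or reposting, please credit 数据为王 with a hyperlink.
Original article: HRegionServer on some HBase cluster nodes shuts down automatically after startup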
