Discussion:
Region Server lost response when doing BatchUpdate
11 Nov.
2009-04-13 12:28:35 UTC
hi colleagues,
We have recently been inserting data into a 32-node HBase cluster through the
MapReduce framework, but the operation always fails with region server
exceptions. We run 4 map tasks simultaneously on each node and use the
BatchUpdate() API to do the inserts.
We have been suffering from this problem since last month; it only occurs on
relatively large clusters at high concurrent insert rates. We are running
hadoop-0.19.2 built from current svn (the head revision as of last week) and
HBase 0.19.0.
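For reference, the write path in our map tasks looks roughly like the sketch below, against the HBase 0.19 client API; the table and column names are placeholders, not our real schema, and running it requires the HBase 0.19 jars and a live cluster:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

// Minimal sketch of the HBase 0.19-era insert path described above.
// "mytable" and "data:value" are hypothetical names.
public class BatchInsertSketch {
    public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        for (int i = 0; i < 1000; i++) {
            // One BatchUpdate per row; each of the 4 concurrent map
            // tasks per node issues commits like this against the
            // region server holding the row.
            BatchUpdate bu = new BatchUpdate("row-" + i);
            bu.put("data:value", ("value-" + i).getBytes());
            table.commit(bu);
        }
    }
}
```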

Here is the hadoop-site.xml configuration file:

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.33.204:11004/</value>
</property>

<property>
<name>mapred.job.tracker</name>
<value>192.168.33.204:11005</value>
</property>

<property>
<name>dfs.secondary.http.address</name>
<value>0.0.0.0:51100</value>
<description>
The secondary namenode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>

<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:51110</value>
<description>
The address where the datanode server will listen to.
If the port is 0 then the server will start on a free port.
</description>
</property>

<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:51175</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>

<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:11010</value>
<description>
The datanode ipc server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>

<property>
<name>dfs.datanode.handler.count</name>
<value>30</value>
<description>The number of server threads for the datanode.</description>
</property>

<property>
<name>dfs.namenode.handler.count</name>
<value>30</value>
<description>The number of server threads for the namenode.</description>
</property>

<property>
<name>mapred.job.tracker.handler.count</name>
<value>30</value>
</property>

<property>
<name>mapred.reduce.parallel.copies</name>
<value>30</value>
</property>

<property>
<name>dfs.http.address</name>
<value>0.0.0.0:51170</value>
<description>
The address and the base port where the dfs namenode web ui will listen
on.
If the port is 0 then the server will start on a free port.
</description>
</property>

<property>
<name>dfs.datanode.max.xcievers</name>
<value>8192</value>
<description>
</description>
</property>

<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>0</value>
<description>
</description>
</property>


<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50477</value>
</property>

<property>
<name>dfs.https.address</name>
<value>0.0.0.0:50472</value>
</property>

<property>
<name>mapred.job.tracker.http.address</name>
<value>0.0.0.0:51130</value>
<description>
The job tracker http server address and port the server will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>

<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:51160</value>
<description>
The task tracker http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>


<property>
<name>mapred.map.tasks</name>
<value>3</value>
</property>

<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
</property>

<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
<description>
The maximum number of map tasks that will be run simultaneously by a
task tracker.
</description>
</property>

<property>
<name>dfs.name.dir</name>

<value>/data0/hbase/filesystem/dfs/name,/data1/hbase/filesystem/dfs/name,/data2/hbase/filesystem/dfs/name,/data3/hbase/filesystem/dfs/name</value>
</property>

<property>
<name>dfs.data.dir</name>

<value>/data0/hbase/filesystem/dfs/data,/data1/hbase/filesystem/dfs/data,/data2/hbase/filesystem/dfs/data,/data3/hbase/filesystem/dfs/data</value>
</property>

<property>
<name>fs.checkpoint.dir</name>

<value>/data0/hbase/filesystem/dfs/namesecondary,/data1/hbase/filesystem/dfs/namesecondary,/data2/hbase/filesystem/dfs/namesecondary,/data3/hbase/filesystem/dfs/namesecondary</value>
</property>

<property>
<name>mapred.system.dir</name>
<value>/data1/hbase/filesystem/mapred/system</value>
</property>

<property>
<name>mapred.local.dir</name>

<value>/data0/hbase/filesystem/mapred/local,/data1/hbase/filesystem/mapred/local,/data2/hbase/filesystem/mapred/local,/data3/hbase/filesystem/mapred/local</value>
</property>

<property>
<name>dfs.replication</name>
<value>3</value>
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/data1/hbase/filesystem/tmp</value>
</property>

<property>
<name>mapred.task.timeout</name>
<value>3600000</value>
<description>The number of milliseconds before a task will be
terminated if it neither reads an input, writes an output, nor
updates its status string.
</description>
</property>

<property>
<name>ipc.client.idlethreshold</name>
<value>4000</value>
<description>Defines the threshold number of connections after which
connections will be inspected for idleness.
</description>
</property>


<property>
<name>ipc.client.connection.maxidletime</name>
<value>120000</value>
<description>The maximum time in msec after which a client will bring down
the
connection to the server.
</description>
</property>

<property>
<value>-Xmx256m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode</value>
</property>

</configuration>

And here is the hbase-site.xml config file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>hbase.master</name>
<value>192.168.33.204:62000</value>
<description>The host and port that the HBase master runs at.
A value of 'local' runs the master and a regionserver in
a single process.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.33.204:11004/hbase</value>
<description>The directory shared by region servers.
Should be fully-qualified to include the filesystem to use.
E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
</description>
</property>

<property>
<name>hbase.master.info.port</name>
<value>62010</value>
<description>The port for the hbase master web UI
Set to -1 if you do not want the info server to run.
</description>
</property>
<property>
<name>hbase.master.info.bindAddress</name>
<value>0.0.0.0</value>
<description>The address for the hbase master web UI
</description>
</property>
<property>
<name>hbase.regionserver</name>
<value>0.0.0.0:62020</value>
<description>The host and port a HBase region server runs at.
</description>
</property>

<property>
<name>hbase.regionserver.info.port</name>
<value>62030</value>
<description>The port for the hbase regionserver web UI
Set to -1 if you do not want the info server to run.
</description>
</property>
<property>
<name>hbase.regionserver.info.bindAddress</name>
<value>0.0.0.0</value>
<description>The address for the hbase regionserver web UI
</description>
</property>

<property>
<name>hbase.regionserver.handler.count</name>
<value>20</value>
</property>

<property>
<name>hbase.master.lease.period</name>
<value>180000</value>
</property>

</configuration>


Here is a slice of the error log from one of the failed region servers,
which stopped responding after the OutOfMemoryError:

2009-04-13 15:20:26,077 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError,
aborting.
java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:48,062 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195,
memcacheSize=214, usedHeap=4991, maxHeap=4991
2009-04-13 15:20:48,062 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 62020
2009-04-13 15:20:48,063 INFO
org.apache.hadoop.hbase.regionserver.LogFlusher:
regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
2009-04-13 15:20:48,201 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-04-13 15:20:48,228 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@74f0bb4e,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@689939dc) from
192.168.33.206:47754: output error
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 62020: exiting
2009-04-13 15:20:48,297 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
2009-04-13 15:20:48,552 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server /192.168.33.204:2181
2009-04-13 15:20:48,552 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x0 to ***@480edf31
java.io.IOException: TIMED OUT
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
2009-04-13 15:20:48,555 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 62020, call batchUpdates([***@3509aa7f,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@d98930d) from 192.168.33.234:44367:
error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@525a19ce,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@19544d9f) from
192.168.33.208:47852: output error
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@483206fe,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@4c6932b9) from
192.168.33.221:37020: output error
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 62020: exiting
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

2009-04-13 15:20:48,655 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 62020: exiting
2009-04-13 15:20:48,692 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@61af3c0e,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@378fed3c) from 192.168.34.1:35923:
output error
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@2c4ff8dd,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@365b8be5) from 192.168.34.3:39443:
output error
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 16 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 16 on 62020: exiting
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@343d8344,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@32750027) from
192.168.33.236:45479: output error
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 17 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 17 on 62020: exiting
2009-04-13 15:20:48,654 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@3ff34fed,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@7f047167) from
192.168.33.219:40059: output error
2009-04-13 15:20:48,654 ERROR com.cmri.hugetable.zookeeper.ZNodeWatcher:
processNode /hugetable09/hugetable/acl.lock error!KeeperErrorCode =
ConnectionLoss
2009-04-13 15:20:48,649 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@721d9b81,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@75cc6cae) from
192.168.33.254:51617: output error
2009-04-13 15:20:48,649 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 12 on 62020, call batchUpdates([***@655edc27,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@36c7b86f) from
192.168.33.238:51231: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@3c853cce,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@4f5b176c) from
192.168.33.209:43520: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

2009-04-13 15:20:49,226 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 62020: exiting
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@3509aa7f,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@d98930d) from 192.168.33.234:44367:
output error
2009-04-13 15:20:48,647 INFO org.mortbay.util.ThreadedServer: Stopping
Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 62020: exiting
2009-04-13 15:20:48,646 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 62020, call batchUpdates([***@2cc91b6,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@44724529) from
192.168.33.210:44154: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,572 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@e8136e0,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@4539b390) from
192.168.33.217:60476: output error
2009-04-13 15:20:49,272 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@2cc91b6,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@44724529) from
192.168.33.210:44154: output error
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 62020: exiting
2009-04-13 15:20:49,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@655edc27,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@36c7b86f) from
192.168.33.238:51231: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

2009-04-13 15:20:49,068 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

2009-04-13 15:20:49,345 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020: exiting
2009-04-13 15:20:49,048 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer:
java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:49,484 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError,
aborting.
java.lang.OutOfMemoryError: Java heap space
at
java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
at
org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:49,488 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195,
memcacheSize=214, usedHeap=4985, maxHeap=4991
2009-04-13 15:20:49,489 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 15 on 62020, call batchUpdates([***@302bb17f,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@492218e) from 192.168.33.235:35276:
error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at
org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1334)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1324)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2320)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
Caused by: java.lang.OutOfMemoryError: Java heap space
at
java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
at
org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
... 5 more
2009-04-13 15:20:49,490 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call batchUpdates([***@302bb17f,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@492218e) from 192.168.33.235:35276:
output error
2009-04-13 15:20:49,047 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server listener on 62020
2009-04-13 15:20:49,493 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 15 on 62020 caught: java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

Any suggestion is welcome! Thanks a lot!
Jean-Daniel Cryans
2009-04-13 12:40:57 UTC
I see that your region server had 5188 store files across 121 stores; I'm
99% sure that is the cause of your OOME. Luckily for you, we've been
working on this issue since last week. What you should do:

- Upgrade to HBase 0.19.1

- Apply the latest patch in
https://issues.apache.org/jira/browse/HBASE-1058 (the v3)

Then you should be good. As for what caused this huge number of store
files, I wouldn't be surprised if your data was uploaded sequentially,
which would mean that whatever the number of regions (and hence the
level of distribution) in your table, only one region gets the load.
This implies that another workaround for your problem would be to
insert with a more randomized key pattern.
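One common way to randomize a sequential key pattern is to prefix each row key with a short hash-derived salt, so consecutive keys scatter across regions instead of hammering one. This is a minimal, self-contained sketch; the two-hex-character bucket width and the key names are illustrative choices, not something from this thread:

```java
import java.security.MessageDigest;

// Hypothetical helper: prefixes a sequential row key with a salt derived
// from the key's MD5 hash, so consecutive keys land in different regions.
public class SaltedKey {
    public static String salt(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes("UTF-8"));
            // Two hex chars give 256 buckets; tune to the region count.
            return String.format("%02x-%s", digest[0] & 0xff, rowKey);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Sequential keys get spread across salt buckets.
        System.out.println(salt("row-0000001"));
        System.out.println(salt("row-0000002"));
    }
}
```

The salt must also be prepended when reading the row back, so range scans over the original key order are sacrificed for write distribution.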

Thx for trying either solution,

J-D
Post by 11 Nov.
hi coleagues,
   We are doing data inserting on 32 nodes hbase cluster using mapreduce
framework recently, but the operation always gets failed because of
regionserver exceptions. We issued 4 map task on the same node
simultaneously, and exploit the BatchUpdate() function to handle work of
inserting data.
   We had been suffered from such problem since last month, which only took
place on relatively large clusters at high concurrent inserting rate. We are
using hadoop-0.19.2 on current svn, and it's the head revision on svn last
week. We are using hbase 0.19.0.
<configuration>
<property>
 <name>fs.default.name</name>
 <value>hdfs://192.168.33.204:11004/</value>
</property>
<property>
 <name>mapred.job.tracker</name>
 <value>192.168.33.204:11005</value>
</property>
<property>
 <name>dfs.secondary.http.address</name>
 <value>0.0.0.0:51100</value>
 <description>
   The secondary namenode http server address and port.
   If the port is 0 then the server will start on a free port.
 </description>
</property>
<property>
 <name>dfs.datanode.address</name>
 <value>0.0.0.0:51110</value>
 <description>
   The address where the datanode server will listen to.
   If the port is 0 then the server will start on a free port.
 </description>
</property>
<property>
 <name>dfs.datanode.http.address</name>
 <value>0.0.0.0:51175</value>
 <description>
   The datanode http server address and port.
   If the port is 0 then the server will start on a free port.
 </description>
</property>
<property>
 <name>dfs.datanode.ipc.address</name>
 <value>0.0.0.0:11010</value>
 <description>
   The datanode ipc server address and port.
   If the port is 0 then the server will start on a free port.
 </description>
</property>
<property>
 <name>dfs.datanode.handler.count</name>
 <value>30</value>
 <description>The number of server threads for the datanode.</description>
</property>
<property>
 <name>dfs.namenode.handler.count</name>
 <value>30</value>
 <description>The number of server threads for the namenode.</description>
</property>
<property>
 <name>mapred.job.tracker.handler.count</name>
 <value>30</value>
</property>
<property>
 <name>mapred.reduce.parallel.copies</name>
 <value>30</value>
</property>
<property>
 <name>dfs.http.address</name>
 <value>0.0.0.0:51170</value>
 <description>
   The address and the base port where the dfs namenode web ui will listen
on.
   If the port is 0 then the server will start on a free port.
 </description>
</property>
<property>
 <name>dfs.datanode.max.xcievers</name>
 <value>8192</value>
 <description>
 </description>
</property>
<property>
 <name>dfs.datanode.socket.write.timeout</name>
 <value>0</value>
 <description>
 </description>
</property>
<property>
 <name>dfs.datanode.https.address</name>
 <value>0.0.0.0:50477</value>
</property>
<property>
 <name>dfs.https.address</name>
 <value>0.0.0.0:50472</value>
</property>
<property>
 <name>mapred.job.tracker.http.address</name>
 <value>0.0.0.0:51130</value>
 <description>
   The job tracker http server address and port the server will listen on.
   If the port is 0 then the server will start on a free port.
 </description>
</property>
<property>
 <name>mapred.task.tracker.http.address</name>
 <value>0.0.0.0:51160</value>
 <description>
   The task tracker http server address and port.
   If the port is 0 then the server will start on a free port.
 </description>
</property>
<property>
 <name>mapred.map.tasks</name>
 <value>3</value>
</property>
<property>
 <name>mapred.reduce.tasks</name>
 <value>2</value>
</property>
<property>
 <name>mapred.tasktracker.map.tasks.maximum</name>
 <value>4</value>
 <description>
       The maximum number of map tasks that will be run simultaneously by a
task tracker.
 </description>
</property>
<property>
 <name>dfs.name.dir</name>
<value>/data0/hbase/filesystem/dfs/name,/data1/hbase/filesystem/dfs/name,/data2/hbase/filesystem/dfs/name,/data3/hbase/filesystem/dfs/name</value>
</property>
<property>
 <name>dfs.data.dir</name>
<value>/data0/hbase/filesystem/dfs/data,/data1/hbase/filesystem/dfs/data,/data2/hbase/filesystem/dfs/data,/data3/hbase/filesystem/dfs/data</value>
</property>
<property>
 <name>fs.checkpoint.dir</name>
<value>/data0/hbase/filesystem/dfs/namesecondary,/data1/hbase/filesystem/dfs/namesecondary,/data2/hbase/filesystem/dfs/namesecondary,/data3/hbase/filesystem/dfs/namesecondary</value>
</property>
<property>
 <name>mapred.system.dir</name>
 <value>/data1/hbase/filesystem/mapred/system</value>
</property>
<property>
 <name>mapred.local.dir</name>
<value>/data0/hbase/filesystem/mapred/local,/data1/hbase/filesystem/mapred/local,/data2/hbase/filesystem/mapred/local,/data3/hbase/filesystem/mapred/local</value>
</property>
<property>
 <name>dfs.replication</name>
 <value>3</value>
</property>
<property>
 <name>hadoop.tmp.dir</name>
 <value>/data1/hbase/filesystem/tmp</value>
</property>
<property>
 <name>mapred.task.timeout</name>
 <value>3600000</value>
 <description>The number of milliseconds before a task will be
 terminated if it neither reads an input, writes an output, nor
 updates its status string.
 </description>
</property>
<property>
 <name>ipc.client.idlethreshold</name>
 <value>4000</value>
 <description>Defines the threshold number of connections after which
              connections will be inspected for idleness.
 </description>
</property>
<property>
 <name>ipc.client.connection.maxidletime</name>
 <value>120000</value>
 <description>The maximum time in msec after which a client will bring down
the
              connection to the server.
 </description>
</property>
<property>
 <value>-Xmx256m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode</value>
</property>
</configuration>

Here is the configure file of hbase-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
   <name>hbase.master</name>
   <value>192.168.33.204:62000</value>
   <description>The host and port that the HBase master runs at.
   A value of 'local' runs the master and a regionserver in
   a single process.
   </description>
 </property>
 <property>
   <name>hbase.rootdir</name>
   <value>hdfs://192.168.33.204:11004/hbase</value>
   <description>The directory shared by region servers.
   Should be fully-qualified to include the filesystem to use.
   E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
   </description>
 </property>
 <property>
   <name>hbase.master.info.port</name>
   <value>62010</value>
   <description>The port for the hbase master web UI
   Set to -1 if you do not want the info server to run.
   </description>
 </property>
 <property>
   <name>hbase.master.info.bindAddress</name>
   <value>0.0.0.0</value>
   <description>The address for the hbase master web UI
   </description>
 </property>
 <property>
   <name>hbase.regionserver</name>
   <value>0.0.0.0:62020</value>
   <description>The host and port a HBase region server runs at.
   </description>
 </property>
 <property>
   <name>hbase.regionserver.info.port</name>
   <value>62030</value>
   <description>The port for the hbase regionserver web UI
   Set to -1 if you do not want the info server to run.
   </description>
 </property>
 <property>
   <name>hbase.regionserver.info.bindAddress</name>
   <value>0.0.0.0</value>
   <description>The address for the hbase regionserver web UI
   </description>
 </property>
 <property>
   <name>hbase.regionserver.handler.count</name>
   <value>20</value>
 </property>
 <property>
   <name>hbase.master.lease.period</name>
   <value>180000</value>
 </property>
</configuration>
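The log below shows the region server heap completely exhausted (usedHeap=4991, maxHeap=4991, i.e. roughly 5 GB). Heap for the HBase daemons is set in hbase-env.sh rather than in the file above; a minimal sketch, assuming the nodes have memory to spare (the 6000 MB figure is illustrative, not from the thread):

```shell
# hbase-env.sh -- maximum heap, in MB, for HBase daemons (illustrative value)
export HBASE_HEAPSIZE=6000
```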
   Here is a slice of the error log file on one of the failed region servers:
2009-04-13 15:20:26,077 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError,
aborting.
java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:48,062 INFO
request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195,
memcacheSize=214, usedHeap=4991, maxHeap=4991
2009-04-13 15:20:48,062 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 62020
2009-04-13 15:20:48,063 INFO
regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
2009-04-13 15:20:48,201 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-04-13 15:20:48,228 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.206:47754: output error
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 62020 caught: java.nio.channels.ClosedByInterruptException
   at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 62020: exiting
2009-04-13 15:20:48,297 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
2009-04-13 15:20:48,552 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server /192.168.33.204:2181
2009-04-13 15:20:48,552 WARN org.apache.zookeeper.ClientCnxn: Exception
java.io.IOException: TIMED OUT
   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
2009-04-13 15:20:48,555 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.208:47852: output error
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.221:37020: output error
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 62020 caught: java.nio.channels.ClosedByInterruptException
   at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 62020: exiting
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 62020 caught: java.nio.channels.ClosedByInterruptException
   at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,655 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 62020: exiting
2009-04-13 15:20:48,692 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
output error
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
output error
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 16 on 62020 caught: java.nio.channels.ClosedByInterruptException
   at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 16 on 62020: exiting
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.236:45479: output error
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 17 on 62020 caught: java.nio.channels.ClosedByInterruptException
   at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 17 on 62020: exiting
2009-04-13 15:20:48,654 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.219:40059: output error
processNode /hugetable09/hugetable/acl.lock error!KeeperErrorCode =
ConnectionLoss
2009-04-13 15:20:48,649 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.254:51617: output error
2009-04-13 15:20:48,649 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.238:51231: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.209:43520: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 62020 caught: java.nio.channels.ClosedByInterruptException
   at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,226 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 62020: exiting
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
output error
2009-04-13 15:20:48,647 INFO org.mortbay.util.ThreadedServer: Stopping
Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 62020 caught: java.nio.channels.ClosedByInterruptException
   at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 62020: exiting
2009-04-13 15:20:48,646 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.210:44154: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,572 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.217:60476: output error
2009-04-13 15:20:49,272 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.210:44154: output error
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 62020 caught: java.nio.channels.ClosedByInterruptException
   at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 62020: exiting
2009-04-13 15:20:49,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.238:51231: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 62020 caught: java.nio.channels.ClosedByInterruptException
   at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,068 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020 caught: java.nio.channels.ClosedByInterruptException
   at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,345 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020: exiting
2009-04-13 15:20:49,048 ERROR
java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:49,484 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError,
aborting.
java.lang.OutOfMemoryError: Java heap space
   at
java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
   at
org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
   at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:49,488 INFO
request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195,
memcacheSize=214, usedHeap=4985, maxHeap=4991
2009-04-13 15:20:49,489 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1334)
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1324)
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2320)
   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
   at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
Caused by: java.lang.OutOfMemoryError: Java heap space
   at
java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
   at
org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
   at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
   ... 5 more
2009-04-13 15:20:49,490 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
output error
2009-04-13 15:20:49,047 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server listener on 62020
2009-04-13 15:20:49,493 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 15 on 62020 caught: java.nio.channels.ClosedChannelException
   at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
   at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
   at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
   Any suggestion is welcome! Thanks a lot!
11 Nov.
2009-04-14 02:05:18 UTC
Permalink
hi Jean-Daniel,
As you said, we were inserting data in a sequential pattern; with a
random pattern there would be no such problem.
I'm trying hbase 0.19.1 and the patch now.
Thanks!
Post by Jean-Daniel Cryans
I see that your region server had 5188 store files in 121 stores; I'm
99% sure that this is the cause of your OOME. Luckily for you, we've been
working on exactly this:
- Upgrade to HBase 0.19.1
- Apply the latest patch in
https://issues.apache.org/jira/browse/HBASE-1058 (the v3)
Then you should be good. As to what caused this huge number of store
files, I wouldn't be surprised if your data was uploaded sequentially,
which would mean that whatever the number of regions (hence the
level of distribution) in your table, only one region gets the load.
This implies that another workaround for your problem would be to
insert with a more randomized pattern.
Thx for trying either solution,
J-D
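A common way to get the "more randomized pattern" J-D suggests, without changing the data itself, is to salt each sequential row key with a stable hash-derived bucket prefix so consecutive writes scatter across regions. A minimal sketch of the idea (bucket count and key format are illustrative, not from the thread; note that reads then have to fan out across all buckets):

```python
import hashlib

NUM_BUCKETS = 32  # illustrative: e.g. roughly one bucket per node

def salted_key(row_key: str) -> str:
    """Prefix a row key with a deterministic hash bucket so that
    consecutive sequential keys land on different regions."""
    bucket = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket:02d}-{row_key}"

# Consecutive keys now spread across buckets instead of hitting one region:
for k in ("row-000001", "row-000002", "row-000003"):
    print(salted_key(k))
```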
Post by 11 Nov.
hi coleagues,
We are doing data inserting on 32 nodes hbase cluster using mapreduce
framework recently, but the operation always gets failed because of
regionserver exceptions. We issued 4 map task on the same node
simultaneously, and exploit the BatchUpdate() function to handle work of
inserting data.
We had been suffered from such problem since last month, which only
took
Post by 11 Nov.
place on relatively large clusters at high concurrent inserting rate. We
are
Post by 11 Nov.
using hadoop-0.19.2 on current svn, and it's the head revision on svn
last
Post by 11 Nov.
week. We are using hbase 0.19.0.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.33.204:11004/</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>192.168.33.204:11005</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>0.0.0.0:51100</value>
<description>
The secondary namenode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:51110</value>
<description>
The address where the datanode server will listen to.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:51175</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:11010</value>
<description>
The datanode ipc server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>30</value>
<description>The number of server threads for the
datanode.</description>
Post by 11 Nov.
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>30</value>
<description>The number of server threads for the
namenode.</description>
Post by 11 Nov.
</property>
<property>
<name>mapred.job.tracker.handler.count</name>
<value>30</value>
</property>
<property>
<name>mapred.reduce.parallel.copies</name>
<value>30</value>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:51170</value>
<description>
The address and the base port where the dfs namenode web ui will
listen
Post by 11 Nov.
on.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>8192</value>
<description>
</description>
</property>
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>0</value>
<description>
</description>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50477</value>
</property>
<property>
<name>dfs.https.address</name>
<value>0.0.0.0:50472</value>
</property>
<property>
<name>mapred.job.tracker.http.address</name>
<value>0.0.0.0:51130</value>
<description>
The job tracker http server address and port the server will listen
on.
Post by 11 Nov.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:51160</value>
<description>
The task tracker http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>3</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
<description>
The maximum number of map tasks that will be run simultaneously by
a
Post by 11 Nov.
task tracker.
</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data0/hbase/filesystem/dfs/name,/data1/hbase/filesystem/dfs/name,/data2/hbase/filesystem/dfs/name,/data3/hbase/filesystem/dfs/name</value>
Post by 11 Nov.
</property>
<property>
<name>dfs.data.dir</name>
<value>/data0/hbase/filesystem/dfs/data,/data1/hbase/filesystem/dfs/data,/data2/hbase/filesystem/dfs/data,/data3/hbase/filesystem/dfs/data</value>
Post by 11 Nov.
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/data0/hbase/filesystem/dfs/namesecondary,/data1/hbase/filesystem/dfs/namesecondary,/data2/hbase/filesystem/dfs/namesecondary,/data3/hbase/filesystem/dfs/namesecondary</value>
Post by 11 Nov.
</property>
<property>
<name>mapred.system.dir</name>
<value>/data1/hbase/filesystem/mapred/system</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/data0/hbase/filesystem/mapred/local,/data1/hbase/filesystem/mapred/local,/data2/hbase/filesystem/mapred/local,/data3/hbase/filesystem/mapred/local</value>
Post by 11 Nov.
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data1/hbase/filesystem/tmp</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>3600000</value>
<description>The number of milliseconds before a task will be
terminated if it neither reads an input, writes an output, nor
updates its status string.
</description>
</property>
<property>
<name>ipc.client.idlethreshold</name>
<value>4000</value>
<description>Defines the threshold number of connections after which
connections will be inspected for idleness.
</description>
</property>
<property>
<name>ipc.client.connection.maxidletime</name>
<value>120000</value>
<description>The maximum time in msec after which a client will bring
down
Post by 11 Nov.
the
connection to the server.
</description>
</property>
<property>
<value>-Xmx256m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode</value>
</property>
</configuration>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.master</name>
<value>192.168.33.204:62000</value>
<description>The host and port that the HBase master runs at.
A value of 'local' runs the master and a regionserver in
a single process.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.33.204:11004/hbase</value>
<description>The directory shared by region servers.
Should be fully-qualified to include the filesystem to use.
E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
</description>
</property>
<property>
<name>hbase.master.info.port</name>
<value>62010</value>
<description>The port for the hbase master web UI
Set to -1 if you do not want the info server to run.
</description>
</property>
<property>
<name>hbase.master.info.bindAddress</name>
<value>0.0.0.0</value>
<description>The address for the hbase master web UI
</description>
</property>
<property>
<name>hbase.regionserver</name>
<value>0.0.0.0:62020</value>
<description>The host and port a HBase region server runs at.
</description>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>62030</value>
<description>The port for the hbase regionserver web UI
Set to -1 if you do not want the info server to run.
</description>
</property>
<property>
<name>hbase.regionserver.info.bindAddress</name>
<value>0.0.0.0</value>
<description>The address for the hbase regionserver web UI
</description>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>20</value>
</property>
<property>
<name>hbase.master.lease.period</name>
<value>180000</value>
</property>
</configuration>
Here is a slice of the error log file on one of the failed
2009-04-13 15:20:26,077 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError,
aborting.
java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:48,062 INFO
request=0, regions=121, stores=121, storefiles=5188,
storefileIndexSize=195,
Post by 11 Nov.
memcacheSize=214, usedHeap=4991, maxHeap=4991
2009-04-13 15:20:48,062 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 62020
2009-04-13 15:20:48,063 INFO
regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
2009-04-13 15:20:48,201 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-04-13 15:20:48,228 WARN org.apache.hadoop.ipc.HBaseServer: IPC
Server
Post by 11 Nov.
192.168.33.206:47754: output error
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server
Post by 11 Nov.
handler 5 on 62020 caught: java.nio.channels.ClosedByInterruptException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
Post by 11 Nov.
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
Post by 11 Nov.
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
Post by 11 Nov.
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
Post by 11 Nov.
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server
Post by 11 Nov.
handler 5 on 62020: exiting
2009-04-13 15:20:48,297 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
IPC
Post by 11 Nov.
Server Responder
2009-04-13 15:20:48,552 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server /192.168.33.204:2181
2009-04-13 15:20:48,552 WARN org.apache.zookeeper.ClientCnxn: Exception
java.io.IOException: TIMED OUT
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
2009-04-13 15:20:48,555 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.208:47852: output error
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.221:37020: output error
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020: exiting
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,655 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020: exiting
2009-04-13 15:20:48,692 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 62020: exiting
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.236:45479: output error
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 62020: exiting
2009-04-13 15:20:48,654 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.219:40059: output error
processNode /hugetable09/hugetable/acl.lock error!KeeperErrorCode = ConnectionLoss
2009-04-13 15:20:48,649 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.254:51617: output error
2009-04-13 15:20:48,649 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.238:51231: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.209:43520: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,226 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020: exiting
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,647 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020: exiting
2009-04-13 15:20:48,646 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.210:44154: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,572 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.217:60476: output error
2009-04-13 15:20:49,272 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.210:44154: output error
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020: exiting
2009-04-13 15:20:49,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.238:51231: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,068 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,345 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 62020: exiting
2009-04-13 15:20:49,048 ERROR java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:49,484 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:49,488 INFO request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195, memcacheSize=214, usedHeap=4985, maxHeap=4991
2009-04-13 15:20:49,489 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1334)
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1324)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2320)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
... 5 more
2009-04-13 15:20:49,490 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:49,047 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 62020
2009-04-13 15:20:49,493 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 62020 caught: java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
Any suggestion is welcome! Thanks a lot!
11 Nov.
2009-04-14 08:41:57 UTC
Permalink
hi JD,
I tried your solution, upgrading hbase to 0.19.1 and applying the
patch. The inserting mapreduce application had been running for more than
half an hour when we lost one region server. Here is the log on the lost
region server:
2009-04-14 16:08:11,483 FATAL
org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog
required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region:
CDR,000220285104,1239696381168
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:897)
at
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:790)
at
org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:228)
at
org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:138)
Caused by: java.lang.ClassCastException: [B cannot be cast to
org.apache.hadoop.hbase.HStoreKey
at
org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:679)
at
org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:636)
at
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:882)
... 3 more
2009-04-14 16:08:11,553 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=13, stores=13, storefiles=63, storefileIndexSize=6,
memcacheSize=206, usedHeap=631, maxHeap=4991
2009-04-14 16:08:11,553 INFO
org.apache.hadoop.hbase.regionserver.MemcacheFlusher:
regionserver/0:0:0:0:0:0:0:0:62020.cacheFlusher exiting
2009-04-14 16:08:12,502 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020, call batchUpdates([***@7075ae,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@573df2bb) from
192.168.33.211:33093: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,502 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020, call batchUpdates([***@240affbc,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@4e1ba220) from
192.168.33.212:48018: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,502 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020, call batchUpdates([***@78310aef,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@5bc50e8e) from
192.168.33.253:48798: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020, call batchUpdates([***@663ebbb3,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@20951936) from 192.168.34.2:52907:
error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020, call batchUpdates([***@1caa38f0,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@6b802343) from
192.168.33.238:34167: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020, call batchUpdates([***@298b3ad8,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@73c45036) from
192.168.33.236:45877: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020, call batchUpdates([***@5d6e449a,
[Lorg.apache.hadoop.hbase.io.BatchUpdate;@725a0a61) from
192.168.33.254:35363: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 62020
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 62020: exiting
2009-04-14 16:08:13,370 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 16 on 62020: exiting
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 62020: exiting
2009-04-14 16:08:13,370 INFO org.mortbay.util.ThreadedServer: Stopping
Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server listener on 62020
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 10 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 62020: exiting
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 11 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 13 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 14 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 15 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 17 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 19 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 18 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 12 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 62020: exiting
2009-04-14 16:08:13,464 INFO org.mortbay.http.SocketListener: Stopped
SocketListener on 0.0.0.0:62030
2009-04-14 16:08:13,471 INFO org.mortbay.util.Container: Stopped
HttpContext[/logs,/logs]
2009-04-14 16:08:13,471 INFO org.mortbay.util.Container: Stopped
***@460c5e9c
2009-04-14 16:08:14,887 INFO
org.apache.hadoop.hbase.regionserver.LogFlusher:
regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
2009-04-14 16:08:14,890 INFO org.apache.hadoop.hbase.Leases:
regionserver/0:0:0:0:0:0:0:0:62020.leaseChecker closing leases
2009-04-14 16:08:14,890 INFO org.mortbay.util.Container: Stopped
WebApplicationContext[/static,/static]
2009-04-14 16:08:14,890 INFO org.apache.hadoop.hbase.Leases:
regionserver/0:0:0:0:0:0:0:0:62020.leaseChecker closed leases
2009-04-14 16:08:14,890 INFO org.mortbay.util.Container: Stopped
***@62c2ee15
2009-04-14 16:08:14,896 INFO org.mortbay.util.Container: Stopped
WebApplicationContext[/,/]
2009-04-14 16:08:14,896 INFO org.mortbay.util.Container: Stopped
***@3f829e6f
2009-04-14 16:08:14,896 INFO org.apache.hadoop.hbase.regionserver.LogRoller:
LogRoller exiting.
2009-04-14 16:08:14,896 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
regionserver/0:0:0:0:0:0:0:0:62020.majorCompactionChecker exiting
2009-04-14 16:08:14,969 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
2009-04-14 16:08:14,969 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000145028698,1239695232467
2009-04-14 16:08:14,970 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000485488629,1239696366886
2009-04-14 16:08:14,970 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000030226388,1239695919978
2009-04-14 16:08:14,971 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000045007972,1239696394474
2009-04-14 16:08:14,971 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000370014326,1239695407460
2009-04-14 16:08:17,790 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
2009-04-14 16:08:46,566 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region CDR,000315256623,1239695638429 in 1mins, 3sec
2009-04-14 16:08:46,566 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/0:0:0:0:0:0:0:0:62020.compactor exiting
2009-04-14 16:08:46,567 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000315256623,1239695638429
2009-04-14 16:08:46,568 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000555259592,1239696091451
2009-04-14 16:08:46,569 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000575345572,1239696111244
2009-04-14 16:08:46,570 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000515619625,1239696375751
2009-04-14 16:08:46,570 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000525154897,1239695988209
2009-04-14 16:08:46,570 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000220285104,1239696381168
2009-04-14 16:08:46,571 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000045190615,1239696394474
2009-04-14 16:08:46,572 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed CDR,000555161660,1239696091451
2009-04-14 16:08:46,572 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
192.168.33.215:62020
2009-04-14 16:08:46,684 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer:
regionserver/0:0:0:0:0:0:0:0:62020 exiting
2009-04-14 16:08:46,713 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
thread.
2009-04-14 16:08:46,714 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
I restarted this region server and now it seems to work just fine.
Post by 11 Nov.
hi Jean-Daniel,
As you said, we were inserting data using a sequential pattern, and if we
used a random pattern there would not be such a problem.
I'm trying hbase 0.19.1 and the patch now.
Thanks!
Post by Jean-Daniel Cryans
I see that your region server had 5188 store files in 121 stores, and I'm
99% sure that it's the cause of your OOME. Luckily for you, we've been
working on this:
- Upgrade to HBase 0.19.1
- Apply the latest patch in
https://issues.apache.org/jira/browse/HBASE-1058 (the v3)
Then you should be good. As to what caused this huge number of store
files, I wouldn't be surprised if your data was uploaded sequentially;
that would mean that whatever the number of regions (hence the
level of distribution) in your table, only one region gets the load.
This implies that another workaround for your problem would be to
insert with a more randomized pattern.
Thx for trying either solution,
J-D
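A common way to get the "more randomized pattern" J-D suggests, without giving up deterministic keys, is to salt the row key: prefix each sequential key with a bucket derived from its hash, so concurrent inserts spread across many regions instead of all landing on the tail region. The helper class and bucket count below are hypothetical illustrations (not part of the HBase API), sketched for the kind of fixed-width ids seen in the CDR table above:

```java
// Hypothetical helper: spread sequential row keys (e.g. ids like
// "000220285104") across a fixed number of buckets so that bulk
// inserts hit many regions at once instead of one hot region.
public class SaltedKey {
    // Assumption: roughly one bucket per node in the 32-node cluster.
    static final int BUCKETS = 32;

    // Prefix the key with a stable two-digit bucket computed from its
    // hash; the same input always yields the same salted key, so reads
    // can recompute the prefix. Uses a non-negative modulus on purpose,
    // since hashCode() may be negative.
    static String salt(String sequentialKey) {
        int bucket = ((sequentialKey.hashCode() % BUCKETS) + BUCKETS) % BUCKETS;
        return String.format("%02d-%s", bucket, sequentialKey);
    }
}
```

A map task would then build its update around the salted key (e.g. constructing the BatchUpdate with `SaltedKey.salt(id)` rather than `id`), and readers must apply the same salting. The trade-off is that sequential scans over the original key order are no longer contiguous.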
Post by 11 Nov.
hi colleagues,
We are doing data inserting on a 32-node hbase cluster using the mapreduce
framework recently, but the operation always fails because of
regionserver exceptions. We issued 4 map tasks on the same node
simultaneously, and use the BatchUpdate() function to handle the work of
inserting data.
We have been suffering from this problem since last month; it only took
place on relatively large clusters at a high concurrent inserting rate. We are
using hadoop-0.19.2 from current svn, and it's the head revision on svn last
week. We are using hbase 0.19.0.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.33.204:11004/</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>192.168.33.204:11005</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>0.0.0.0:51100</value>
<description>
The secondary namenode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:51110</value>
<description>
The address where the datanode server will listen to.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:51175</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:11010</value>
<description>
The datanode ipc server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>30</value>
<description>The number of server threads for the datanode.</description>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>30</value>
<description>The number of server threads for the namenode.</description>
</property>
<property>
<name>mapred.job.tracker.handler.count</name>
<value>30</value>
</property>
<property>
<name>mapred.reduce.parallel.copies</name>
<value>30</value>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:51170</value>
<description>
The address and the base port where the dfs namenode web ui will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>8192</value>
<description>
</description>
</property>
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>0</value>
<description>
</description>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50477</value>
</property>
<property>
<name>dfs.https.address</name>
<value>0.0.0.0:50472</value>
</property>
<property>
<name>mapred.job.tracker.http.address</name>
<value>0.0.0.0:51130</value>
<description>
The job tracker http server address and port the server will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:51160</value>
<description>
The task tracker http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>3</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
<description>
The maximum number of map tasks that will be run simultaneously by a
task tracker.
</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data0/hbase/filesystem/dfs/name,/data1/hbase/filesystem/dfs/name,/data2/hbase/filesystem/dfs/name,/data3/hbase/filesystem/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data0/hbase/filesystem/dfs/data,/data1/hbase/filesystem/dfs/data,/data2/hbase/filesystem/dfs/data,/data3/hbase/filesystem/dfs/data</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/data0/hbase/filesystem/dfs/namesecondary,/data1/hbase/filesystem/dfs/namesecondary,/data2/hbase/filesystem/dfs/namesecondary,/data3/hbase/filesystem/dfs/namesecondary</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/data1/hbase/filesystem/mapred/system</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/data0/hbase/filesystem/mapred/local,/data1/hbase/filesystem/mapred/local,/data2/hbase/filesystem/mapred/local,/data3/hbase/filesystem/mapred/local</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data1/hbase/filesystem/tmp</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>3600000</value>
<description>The number of milliseconds before a task will be
terminated if it neither reads an input, writes an output, nor
updates its status string.
</description>
</property>
<property>
<name>ipc.client.idlethreshold</name>
<value>4000</value>
<description>Defines the threshold number of connections after which
connections will be inspected for idleness.
</description>
</property>
<property>
<name>ipc.client.connection.maxidletime</name>
<value>120000</value>
<description>The maximum time in msec after which a client will bring down
the connection to the server.
</description>
</property>
<property>
<value>-Xmx256m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode</value>
</property>
</configuration>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.master</name>
<value>192.168.33.204:62000</value>
<description>The host and port that the HBase master runs at.
A value of 'local' runs the master and a regionserver in
a single process.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.33.204:11004/hbase</value>
<description>The directory shared by region servers.
Should be fully-qualified to include the filesystem to use.
E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
</description>
</property>
<property>
<name>hbase.master.info.port</name>
<value>62010</value>
<description>The port for the hbase master web UI
Set to -1 if you do not want the info server to run.
</description>
</property>
<property>
<name>hbase.master.info.bindAddress</name>
<value>0.0.0.0</value>
<description>The address for the hbase master web UI
</description>
</property>
<property>
<name>hbase.regionserver</name>
<value>0.0.0.0:62020</value>
<description>The host and port a HBase region server runs at.
</description>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>62030</value>
<description>The port for the hbase regionserver web UI
Set to -1 if you do not want the info server to run.
</description>
</property>
<property>
<name>hbase.regionserver.info.bindAddress</name>
<value>0.0.0.0</value>
<description>The address for the hbase regionserver web UI
</description>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>20</value>
</property>
<property>
<name>hbase.master.lease.period</name>
<value>180000</value>
</property>
</configuration>
Here is a slice of the error log file on one of the failed region servers:
2009-04-13 15:20:26,077 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:48,062 INFO request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195, memcacheSize=214, usedHeap=4991, maxHeap=4991
2009-04-13 15:20:48,062 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 62020
2009-04-13 15:20:48,063 INFO regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
2009-04-13 15:20:48,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-04-13 15:20:48,228 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.206:47754: output error
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 62020 caught: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 62020: exiting
2009-04-13 15:20:48,297 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2009-04-13 15:20:48,552 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server /192.168.33.204:2181
2009-04-13 15:20:48,552 WARN org.apache.zookeeper.ClientCnxn: Exception
java.io.IOException: TIMED OUT
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
2009-04-13 15:20:48,555 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.208:47852: output error
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.221:37020: output error
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020 caught: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020: exiting
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020 caught: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,655 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020: exiting
2009-04-13 15:20:48,692 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 62020 caught: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 62020: exiting
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.236:45479: output error
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 62020 caught: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 62020: exiting
2009-04-13 15:20:48,654 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.219:40059: output error
processNode /hugetable09/hugetable/acl.lock error!KeeperErrorCode = ConnectionLoss
2009-04-13 15:20:48,649 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.254:51617: output error
2009-04-13 15:20:48,649 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.238:51231: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.209:43520: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020 caught: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,226 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020: exiting
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,647 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020 caught: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020: exiting
2009-04-13 15:20:48,646 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.210:44154: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,572 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.217:60476: output error
2009-04-13 15:20:49,272 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.210:44154: output error
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020 caught: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020: exiting
2009-04-13 15:20:49,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.238:51231: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 62020 caught: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,068 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 62020 caught: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,345 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 62020: exiting
2009-04-13 15:20:49,048 ERROR java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:49,484 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
    at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
    at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:49,488 INFO request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195, memcacheSize=214, usedHeap=4985, maxHeap=4991
2009-04-13 15:20:49,489 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1334)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1324)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2320)
    at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
    at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
    ... 5 more
2009-04-13 15:20:49,490 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:49,047 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 62020
2009-04-13 15:20:49,493 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 62020 caught: java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
Any suggestion is welcomed! Thanks a lot!
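For reference, a back-of-envelope check of the figures reported in the log above (regions=121, storefiles=5188, usedHeap=maxHeap=4991 MB) suggests the heap is simply undersized for this write pattern. A minimal sketch in plain Java; the 64 MB memcache flush threshold is an assumption taken from the HBase 0.19 default (hbase.hregion.memcache.flush.size), not from this thread:

```java
// Back-of-envelope heap arithmetic from the region server metrics above.
public class HeapCheck {
    public static void main(String[] args) {
        int regions = 121;          // from the log: regions=121
        int storefiles = 5188;      // from the log: storefiles=5188
        int maxHeapMb = 4991;       // from the log: maxHeap=4991
        int flushSizeMb = 64;       // assumed 0.19 default flush threshold

        // ~43 storefiles per region: compaction is not keeping up
        // with the flush rate of the bulk insert.
        double filesPerRegion = (double) storefiles / regions;

        // Worst case: every region's memcache fills to the flush
        // threshold before the flusher catches up.
        int worstCaseMemcacheMb = regions * flushSizeMb;

        System.out.printf("%.1f storefiles per region%n", filesPerRegion);
        System.out.println(worstCaseMemcacheMb + " MB worst-case memcache vs "
                + maxHeapMb + " MB heap");
    }
}
```

Under those assumptions the memcaches alone can demand 7744 MB against a 4991 MB heap, which is consistent with the OutOfMemoryError inside reclaimMemcacheMemory.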
11 Nov.
2009-04-14 10:03:55 UTC
Permalink
Hi all,
The insert operation is still executing, but region servers keep going
down now and then. The log info shows that they are shut down for
different reasons. Here is another failed region server's log:
2009-04-14 16:17:08,718 INFO org.apache.hadoop.hbase.regionserver.HLog: removing old log file /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239696959813 whose highest sequence/edit id is 122635282
2009-04-14 16:17:14,932 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697028652
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

    at org.apache.hadoop.ipc.Client.call(Client.java:697)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2823)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2705)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)

2009-04-14 16:17:14,932 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697028652 retries left 4
2009-04-14 16:17:15,499 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://compute-11-5.local:11004/hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697021646, entries=100003. New log writer: /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697035433

.................................
2009-04-14 17:18:44,259 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239700723643 retries left 4
2009-04-14 17:18:44,663 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239700723643
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

    at org.apache.hadoop.ipc.Client.call(Client.java:697)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2823)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2705)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)

2009-04-14 17:18:44,663 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping
There are 8 cores on each node, and we configured 4 map tasks to run
simultaneously. Are we running at too high a concurrent rate?
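To put numbers on the concurrency question: with 32 nodes and 4 concurrent map tasks per node, there are 128 simultaneous writers, while the configs earlier in the thread set hbase.regionserver.handler.count to 20. A quick sketch (the even-load figure is illustrative; with a skewed key distribution all writers can land on a single region server):

```java
// Rough load arithmetic from the cluster description above.
public class ConcurrencyCheck {
    public static void main(String[] args) {
        int nodes = 32;                    // cluster size from the thread
        int mapsPerNode = 4;               // concurrent map tasks per node
        int handlersPerRegionServer = 20;  // hbase.regionserver.handler.count

        int concurrentWriters = nodes * mapsPerNode; // 128 client tasks
        // If row keys are badly distributed, all of them can hit one
        // region server, queueing several writers per IPC handler.
        double writersPerHandler =
                (double) concurrentWriters / handlersPerRegionServer;

        System.out.println(concurrentWriters + " concurrent writers, "
                + writersPerHandler + " per handler in the worst case");
    }
}
```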
hi JD,
I tried your solution by upgrading hbase to 0.19.1 and applying the
patch. The inserting mapreduce application has been running for more than
half an hour; we lost one region server, and here is the log on the lost one:
2009-04-14 16:08:11,483 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown
CDR,000220285104,1239696381168
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:897)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:790)
    at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:228)
    at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:138)
Caused by: java.lang.ClassCastException: [B cannot be cast to org.apache.hadoop.hbase.HStoreKey
    at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:679)
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:636)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:882)
    ... 3 more
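The "[B cannot be cast to org.apache.hadoop.hbase.HStoreKey" message looks cryptic, but "[B" is simply the JVM's runtime name for byte[], which suggests the flusher pulled a raw byte-array key out of the memcache where it expected an HStoreKey. A plain-Java illustration of the naming (no HBase required; the String cast is only a stand-in for HStoreKey):

```java
// Demonstrates why a bad cast of a byte[] reports the class name "[B".
public class ArrayClassName {
    public static void main(String[] args) {
        byte[] rowKey = {1, 2, 3};
        // Array classes use bracket notation: byte[] is named "[B".
        System.out.println(rowKey.getClass().getName());

        Object o = rowKey;
        try {
            String s = (String) o; // fails the same way the flusher did
            System.out.println(s);
        } catch (ClassCastException e) {
            // The message names the offending class as "[B".
            System.out.println(e.getMessage());
        }
    }
}
```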
2009-04-14 16:08:11,553 INFO request=0.0, regions=13, stores=13, storefiles=63, storefileIndexSize=6, memcacheSize=206, usedHeap=631, maxHeap=4991
2009-04-14 16:08:11,553 INFO regionserver/0:0:0:0:0:0:0:0:62020.cacheFlusher exiting
2009-04-14 16:08:12,502 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.211:33093: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,502 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
192.168.33.212:48018: error: java.io.IOException: Server not running,
aborting
java.io.IOException: Server not running, aborting
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,502 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.253:48798: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.34.2:52907: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.238:34167: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.236:45877: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.254:35363: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 62020
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 62020: exiting
2009-04-14 16:08:13,370 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 62020: exiting
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 62020: exiting
2009-04-14 16:08:13,370 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 62020
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 10 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020: exiting
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 11 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 13 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 19 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 18 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 12 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020: exiting
2009-04-14 16:08:13,464 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:62030
2009-04-14 16:08:13,471 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
2009-04-14 16:08:13,471 INFO org.mortbay.util.Container: Stopped
2009-04-14 16:08:14,887 INFO regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
regionserver/0:0:0:0:0:0:0:0:62020.leaseChecker closing leases
2009-04-14 16:08:14,890 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/static,/static]
regionserver/0:0:0:0:0:0:0:0:62020.leaseChecker closed leases
2009-04-14 16:08:14,890 INFO org.mortbay.util.Container: Stopped
2009-04-14 16:08:14,896 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
2009-04-14 16:08:14,896 INFO org.mortbay.util.Container: Stopped
2009-04-14 16:08:14,896 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2009-04-14 16:08:14,896 INFO regionserver/0:0:0:0:0:0:0:0:62020.majorCompactionChecker exiting
2009-04-14 16:08:14,969 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
Closed CDR,000145028698,1239695232467
Closed CDR,000485488629,1239696366886
Closed CDR,000030226388,1239695919978
Closed CDR,000045007972,1239696394474
Closed CDR,000370014326,1239695407460
2009-04-14 16:08:17,790 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
compaction completed on region CDR,000315256623,1239695638429 in 1mins, 3sec
2009-04-14 16:08:46,566 INFO regionserver/0:0:0:0:0:0:0:0:62020.compactor exiting
Closed CDR,000315256623,1239695638429
Closed CDR,000555259592,1239696091451
Closed CDR,000575345572,1239696111244
Closed CDR,000515619625,1239696375751
Closed CDR,000525154897,1239695988209
Closed CDR,000220285104,1239696381168
Closed CDR,000045190615,1239696394474
Closed CDR,000555161660,1239696091451
2009-04-14 16:08:46,572 INFO 192.168.33.215:62020
2009-04-14 16:08:46,684 INFO regionserver/0:0:0:0:0:0:0:0:62020 exiting
2009-04-14 16:08:46,713 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
2009-04-14 16:08:46,714 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
I restarted this region server and now it seems that it works just fine.
Hi Jean-Daniel,
Post by 11 Nov.
As you said, we were inserting data in a sequential pattern; with a
random pattern there is no such problem.
I'm trying HBase 0.19.1 and the patch now.
Thanks!
Post by Jean-Daniel Cryans
I see that your region server had 5188 store files in 121 stores; I'm
99% sure that's the cause of your OOME. Luckily for you, we've been
working on this. To fix it:
- Upgrade to HBase 0.19.1
- Apply the latest patch in
https://issues.apache.org/jira/browse/HBASE-1058 (the v3)
Then you should be good. As to what caused this huge number of store
files, I wouldn't be surprised if your data was uploaded sequentially,
which would mean that whatever the number of regions (hence the
level of distribution) in your table, only 1 region gets the load.
This implies that another workaround for your problem would be to
insert with a more randomized pattern.
Thx for trying either solution,
J-D
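[Editor's note] The randomized insert pattern J-D suggests is commonly achieved by salting the row key. The sketch below is only an illustration of the idea using plain JDK classes; the `KeySalter` class, the bucket count, and the `NN-key` format are hypothetical and not part of the HBase API:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Illustrative key-salting helper: turns a monotonically increasing key
 * such as "000145028698" into "NN-000145028698", where NN is a stable
 * hash bucket. Consecutive source keys then sort far apart in the table,
 * so a bulk load no longer funnels all writes into a single region.
 */
public class KeySalter {
    public static String saltKey(String sequentialKey, int buckets) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(sequentialKey.getBytes(StandardCharsets.UTF_8));
            // Stable bucket in [0, buckets): the same input always maps to the
            // same prefix, so a reader can recompute the salted key later.
            int bucket = (digest[0] & 0xFF) % buckets;
            return String.format("%02d-%s", bucket, sequentialKey);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is guaranteed by the JDK", e);
        }
    }
}
```

The trade-off is that range scans in the original key order now require one scan per bucket prefix, which is usually acceptable for write-heavy workloads like the CDR table above.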
Post by 11 Nov.
Hi colleagues,
We have recently been inserting data into a 32-node HBase cluster using
the MapReduce framework, but the operation always fails because of
region server exceptions. We issue 4 map tasks on the same node
simultaneously and use the BatchUpdate() function to do the inserting.
We have been suffering from this problem since last month; it only takes
place on relatively large clusters at high concurrent insert rates. We
are using hadoop-0.19.2 from current svn (the head revision as of last
week), and HBase 0.19.0.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.33.204:11004/</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>192.168.33.204:11005</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>0.0.0.0:51100</value>
<description>
The secondary namenode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:51110</value>
<description>
The address where the datanode server will listen to.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:51175</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:11010</value>
<description>
The datanode ipc server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>30</value>
<description>The number of server threads for the datanode.</description>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>30</value>
<description>The number of server threads for the namenode.</description>
</property>
<property>
<name>mapred.job.tracker.handler.count</name>
<value>30</value>
</property>
<property>
<name>mapred.reduce.parallel.copies</name>
<value>30</value>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:51170</value>
<description>
The address and the base port where the dfs namenode web ui will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>8192</value>
<description>
</description>
</property>
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>0</value>
<description>
</description>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50477</value>
</property>
<property>
<name>dfs.https.address</name>
<value>0.0.0.0:50472</value>
</property>
<property>
<name>mapred.job.tracker.http.address</name>
<value>0.0.0.0:51130</value>
<description>
The job tracker http server address and port the server will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:51160</value>
<description>
The task tracker http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>3</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
<description>
The maximum number of map tasks that will be run simultaneously by a
task tracker.
</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data0/hbase/filesystem/dfs/name,/data1/hbase/filesystem/dfs/name,/data2/hbase/filesystem/dfs/name,/data3/hbase/filesystem/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data0/hbase/filesystem/dfs/data,/data1/hbase/filesystem/dfs/data,/data2/hbase/filesystem/dfs/data,/data3/hbase/filesystem/dfs/data</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/data0/hbase/filesystem/dfs/namesecondary,/data1/hbase/filesystem/dfs/namesecondary,/data2/hbase/filesystem/dfs/namesecondary,/data3/hbase/filesystem/dfs/namesecondary</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/data1/hbase/filesystem/mapred/system</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/data0/hbase/filesystem/mapred/local,/data1/hbase/filesystem/mapred/local,/data2/hbase/filesystem/mapred/local,/data3/hbase/filesystem/mapred/local</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data1/hbase/filesystem/tmp</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>3600000</value>
<description>The number of milliseconds before a task will be
terminated if it neither reads an input, writes an output, nor
updates its status string.
</description>
</property>
<property>
<name>ipc.client.idlethreshold</name>
<value>4000</value>
<description>Defines the threshold number of connections after which
connections will be inspected for idleness.
</description>
</property>
<property>
<name>ipc.client.connection.maxidletime</name>
<value>120000</value>
<description>The maximum time in msec after which a client will bring down the
connection to the server.
</description>
</property>
<property>
<value>-Xmx256m -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode</value>
</property>
</configuration>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.master</name>
<value>192.168.33.204:62000</value>
<description>The host and port that the HBase master runs at.
A value of 'local' runs the master and a regionserver in
a single process.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.33.204:11004/hbase</value>
<description>The directory shared by region servers.
Should be fully-qualified to include the filesystem to use.
E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
</description>
</property>
<property>
<name>hbase.master.info.port</name>
<value>62010</value>
<description>The port for the hbase master web UI
Set to -1 if you do not want the info server to run.
</description>
</property>
<property>
<name>hbase.master.info.bindAddress</name>
<value>0.0.0.0</value>
<description>The address for the hbase master web UI
</description>
</property>
<property>
<name>hbase.regionserver</name>
<value>0.0.0.0:62020</value>
<description>The host and port a HBase region server runs at.
</description>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>62030</value>
<description>The port for the hbase regionserver web UI
Set to -1 if you do not want the info server to run.
</description>
</property>
<property>
<name>hbase.regionserver.info.bindAddress</name>
<value>0.0.0.0</value>
<description>The address for the hbase regionserver web UI
</description>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>20</value>
</property>
<property>
<name>hbase.master.lease.period</name>
<value>180000</value>
</property>
</configuration>
Here is a slice of the error log file from one of the failed region servers:
2009-04-13 15:20:26,077 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:48,062 INFO request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195, memcacheSize=214, usedHeap=4991, maxHeap=4991
Stopping server on 62020
2009-04-13 15:20:48,063 INFO regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
2009-04-13 15:20:48,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-04-13 15:20:48,228 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.206:47754: output error
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 62020: exiting
Stopping IPC Server Responder
Attempting connection to server /192.168.33.204:2181
2009-04-13 15:20:48,552 WARN org.apache.zookeeper.ClientCnxn: Exception
java.io.IOException: TIMED OUT
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
2009-04-13 15:20:48,555 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.208:47852: output error
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.221:37020: output error
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020: exiting
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,655 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020: exiting
2009-04-13 15:20:48,692 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 62020: exiting
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.236:45479: output error
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 62020: exiting
2009-04-13 15:20:48,654 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.219:40059: output error
2009-04-13 15:20:48,654 ERROR processNode /hugetable09/hugetable/acl.lock error! KeeperErrorCode = ConnectionLoss
2009-04-13 15:20:48,649 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.254:51617: output error
2009-04-13 15:20:48,649 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.238:51231: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.209:43520: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,226 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020: exiting
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,647 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020: exiting
2009-04-13 15:20:48,646 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.210:44154: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,572 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.217:60476: output error
2009-04-13 15:20:49,272 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.210:44154: output error
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020: exiting
2009-04-13 15:20:49,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.238:51231: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 62020 caught: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,068 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,345 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 62020: exiting
2009-04-13 15:20:49,048 ERROR
java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:49,484 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:49,488 INFO request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195, memcacheSize=214, usedHeap=4985, maxHeap=4991
2009-04-13 15:20:49,489 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1334)
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1324)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2320)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
... 5 more
2009-04-13 15:20:49,490 WARN org.apache.hadoop.ipc.HBaseServer: IPC
Server
Post by 11 Nov.
output error
Stopping IPC
Post by 11 Nov.
Server listener on 62020
2009-04-13 15:20:49,493 INFO org.apache.hadoop.ipc.HBaseServer: IPC
Server
Post by 11 Nov.
handler 15 on 62020 caught: java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
Post by 11 Nov.
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
Post by 11 Nov.
at
org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
Post by 11 Nov.
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
Post by 11 Nov.
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
Post by 11 Nov.
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
Post by 11 Nov.
Any suggenstion is welcomed! Thanks a lot!
Andrew Purtell
2009-04-16 05:01:44 UTC
Hi,

DFS trouble. Have you taken the recommended steps according
to this Wiki page?

http://wiki.apache.org/hadoop/Hbase/Troubleshooting

Try the steps for #5, #6, and #7.

And/or, try adding more data nodes to spread the load.

Hope that helps,

   - Andy
2009-04-14 16:17:08,718 INFO removing old log file /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239696959813 whose highest sequence/edit id is 122635282
2009-04-14 16:17:14,932 INFO org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697028652
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
        at org.apache.hadoop.ipc.Client.call(Client.java:697)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2823)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2705)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
2009-04-14 16:17:14,932 WARN NotReplicatedYetException sleeping /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697028652 retries left 4
2009-04-14 16:17:15,499 INFO Closed hdfs://compute-11-5.local:11004/hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697021646, /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239697035433
.................................
2009-04-14 17:18:44,259 WARN NotReplicatedYetException sleeping /hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239700723643 retries left 4
2009-04-14 17:18:44,663 INFO org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/hbase/log_192.168.33.213_1239694262099_62020/hlog.dat.1239700723643
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
        at org.apache.hadoop.ipc.Client.call(Client.java:697)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2823)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2705)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
2009-04-14 17:18:44,663 WARN NotReplicatedYetException sleeping
There are 8 cores on each node, and we configured 4 map tasks to run simultaneously. Are we running at too high a concurrent rate?
hi JD,
I tried your solution by upgrading hbase to 0.19.1 and applying the patch. The inserting mapreduce application had been running for more than half an hour when we lost one region server. Here is the log on the lost one:
2009-04-14 16:08:11,483 FATAL Replay of hlog required. Forcing server shutdown
CDR,000220285104,1239696381168
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:897)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:790)
        at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:228)
        at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.run(MemcacheFlusher.java:138)
Caused by: java.lang.ClassCastException: [B cannot be cast to org.apache.hadoop.hbase.HStoreKey
        at org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:679)
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:636)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:882)
        ... 3 more
2009-04-14 16:08:11,553 INFO request=0.0, regions=13, stores=13, storefiles=63, storefileIndexSize=6, memcacheSize=206, usedHeap=631, maxHeap=4991
2009-04-14 16:08:11,553 INFO regionserver/0:0:0:0:0:0:0:0:62020.cacheFlusher exiting
2009-04-14 16:08:12,502 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
        at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2109)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1618)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:912)
2009-04-14 16:08:12,502 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
        [same stack trace as above]
2009-04-14 16:08:12,502 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
        [same stack trace as above]
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server from 192.168.34.2:52907: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
        [same stack trace as above]
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
        [same stack trace as above]
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
        [same stack trace as above]
2009-04-14 16:08:12,503 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
        [same stack trace as above]
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 62020
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 62020: exiting
2009-04-14 16:08:13,370 INFO Stopping infoServer
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 62020: exiting
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 62020: exiting
2009-04-14 16:08:13,370 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 62020
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 10 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020: exiting
2009-04-14 16:08:13,370 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020: exiting
2009-04-14 16:08:13,371 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 11 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 13 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 19 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 18 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 12 on 62020: exiting
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2009-04-14 16:08:13,372 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020: exiting
2009-04-14 16:08:13,464 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:62030
2009-04-14 16:08:13,471 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
2009-04-14 16:08:13,471 INFO org.mortbay.util.Container: Stopped
2009-04-14 16:08:14,887 INFO regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
2009-04-14 16:08:14,890 INFO regionserver/0:0:0:0:0:0:0:0:62020.leaseChecker closing leases
2009-04-14 16:08:14,890 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/static,/static]
2009-04-14 16:08:14,890 INFO regionserver/0:0:0:0:0:0:0:0:62020.leaseChecker closed leases
2009-04-14 16:08:14,890 INFO org.mortbay.util.Container: Stopped
2009-04-14 16:08:14,896 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
2009-04-14 16:08:14,896 INFO org.mortbay.util.Container: Stopped
2009-04-14 16:08:14,896 INFO LogRoller exiting.
2009-04-14 16:08:14,896 INFO regionserver/0:0:0:0:0:0:0:0:62020.majorCompactionChecker exiting
2009-04-14 16:08:14,969 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
2009-04-14 16:08:14,969 INFO Closed CDR,000145028698,1239695232467
2009-04-14 16:08:14,970 INFO Closed CDR,000485488629,1239696366886
2009-04-14 16:08:14,970 INFO Closed CDR,000030226388,1239695919978
2009-04-14 16:08:14,971 INFO Closed CDR,000045007972,1239696394474
2009-04-14 16:08:14,971 INFO Closed CDR,000370014326,1239695407460
2009-04-14 16:08:17,790 INFO worker thread exiting
2009-04-14 16:08:46,566 INFO compaction completed on region CDR,000315256623,1239695638429 in 1mins, 3sec
2009-04-14 16:08:46,566 INFO regionserver/0:0:0:0:0:0:0:0:62020.compactor exiting
2009-04-14 16:08:46,567 INFO Closed CDR,000315256623,1239695638429
2009-04-14 16:08:46,568 INFO Closed CDR,000555259592,1239696091451
2009-04-14 16:08:46,569 INFO Closed CDR,000575345572,1239696111244
2009-04-14 16:08:46,570 INFO Closed CDR,000515619625,1239696375751
2009-04-14 16:08:46,570 INFO Closed CDR,000525154897,1239695988209
2009-04-14 16:08:46,570 INFO Closed CDR,000220285104,1239696381168
2009-04-14 16:08:46,571 INFO Closed CDR,000045190615,1239696394474
2009-04-14 16:08:46,572 INFO Closed CDR,000555161660,1239696091451
2009-04-14 16:08:46,572 INFO 192.168.33.215:62020
2009-04-14 16:08:46,684 INFO regionserver/0:0:0:0:0:0:0:0:62020 exiting
2009-04-14 16:08:46,713 INFO Starting shutdown thread.
2009-04-14 16:08:46,714 INFO Shutdown thread complete
I restarted this region server and now it seems that it works just fine.

hi Jean-Daniel,
As you said, we were inserting data using a sequential pattern, and if we used a random pattern there would not be such a problem.
I'm trying hbase 0.19.1 and the patch now.
Thanks!
2009/4/13 Jean-Daniel Cryans

I see that your region server had 5188 store files in 121 stores; I'm 99% sure that it's the cause of your OOME. Luckily for you, we've been working on this issue since last week. What to do:

- Upgrade to HBase 0.19.1
- Apply the latest patch in https://issues.apache.org/jira/browse/HBASE-1058 (the v3)

Then you should be good. As to what caused this huge number of store files, I wouldn't be surprised if your data was uploaded sequentially. That would mean that whatever the number of regions (hence the level of distribution) in your table, only 1 region gets the load. This implies that another workaround to your problem would be to insert with a more randomized pattern.

Thx for trying either solution,
J-D
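J-D's randomization advice is often done by salting the row key: prefixing each sequential CDR key with a deterministic bucket so consecutive keys land on different regions. A minimal sketch in plain Java, with no HBase dependency; the bucket count and key format here are illustrative assumptions, and in 0.19 you would put the salted key into a BatchUpdate and commit it via HTable:

```java
import java.util.ArrayList;
import java.util.List;

public class SaltedKeys {
    // Prefix a sequential key with a bucket derived from its hash, so that
    // consecutive keys spread across buckets instead of all hitting the
    // single region that hosts the tail of the keyspace.
    static String saltKey(String sequentialKey, int buckets) {
        int bucket = Math.abs(sequentialKey.hashCode() % buckets);
        return bucket + "_" + sequentialKey;  // e.g. "<bucket>_000145028698"
    }

    public static void main(String[] args) {
        List<String> salted = new ArrayList<>();
        for (String key : new String[] {"000145028698", "000145028699", "000145028700"}) {
            salted.add(saltKey(key, 8));  // 8 buckets is an arbitrary choice
        }
        System.out.println(salted);
    }
}
```

The trade-off is that scans over the original key order must then read all buckets, so this only suits write-heavy tables like this CDR load.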
On Mon, Apr 13, 2009 at 8:28 AM, 11 Nov. wrote:

hi colleagues,
We are doing data inserting on a 32-node hbase cluster using the mapreduce framework recently, but the operation always fails because of regionserver exceptions. We issued 4 map tasks on the same node simultaneously, and use the BatchUpdate() function to handle the work of inserting data.
We have been suffering from this problem since last month; it only takes place on relatively large clusters at a high concurrent inserting rate. We are using hadoop-0.19.2 from current svn (the head revision as of last week), and hbase 0.19.0.
Here is the configuration file, hadoop-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.33.204:11004/</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.33.204:11005</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>0.0.0.0:51100</value>
    <description>
      The secondary namenode http server address and port.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:51110</value>
    <description>
      The address where the datanode server will listen to.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:51175</value>
    <description>
      The datanode http server address and port.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:11010</value>
    <description>
      The datanode ipc server address and port.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>30</value>
    <description>The number of server threads for the datanode.</description>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>30</value>
    <description>The number of server threads for the namenode.</description>
  </property>
  <property>
    <name>mapred.job.tracker.handler.count</name>
    <value>30</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>30</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:51170</value>
    <description>
      The address and the base port where the dfs namenode web ui will listen on.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8192</value>
    <description>
    </description>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
    <description>
    </description>
  </property>
  <property>
    <name>dfs.datanode.https.address</name>
    <value>0.0.0.0:50477</value>
  </property>
  <property>
    <name>dfs.https.address</name>
    <value>0.0.0.0:50472</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:51130</value>
    <description>
      The job tracker http server address and port the server will listen on.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:51160</value>
    <description>
      The task tracker http server address and port.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
    <description>
      The maximum number of map tasks that will be run simultaneously by a task tracker.
    </description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/data0/hbase/filesystem/dfs/name,/data1/hbase/filesystem/dfs/name,/data2/hbase/filesystem/dfs/name,/data3/hbase/filesystem/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data0/hbase/filesystem/dfs/data,/data1/hbase/filesystem/dfs/data,/data2/hbase/filesystem/dfs/data,/data3/hbase/filesystem/dfs/data</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/data0/hbase/filesystem/dfs/namesecondary,/data1/hbase/filesystem/dfs/namesecondary,/data2/hbase/filesystem/dfs/namesecondary,/data3/hbase/filesystem/dfs/namesecondary</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/data1/hbase/filesystem/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data0/hbase/filesystem/mapred/local,/data1/hbase/filesystem/mapred/local,/data2/hbase/filesystem/mapred/local,/data3/hbase/filesystem/mapred/local</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data1/hbase/filesystem/tmp</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>3600000</value>
    <description>The number of milliseconds before a task will be
      terminated if it neither reads an input, writes an output, nor
      updates its status string.
    </description>
  </property>
  <property>
    <name>ipc.client.idlethreshold</name>
    <value>4000</value>
    <description>Defines the threshold number of connections after which
      connections will be inspected for idleness.
    </description>
  </property>
  <property>
    <name>ipc.client.connection.maxidletime</name>
    <value>120000</value>
    <description>The maximum time in msec after which a client will bring down
      the connection to the server.
    </description>
  </property>
  <property>
    <value>-Xmx256m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode</value>
  </property>
</configuration>
And here is the hbase-site.xml config:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.master</name>
<value>192.168.33.204:62000</value>
<description>The host and port that the HBase master runs at.
A value of 'local' runs the master and a regionserver in
a single process.
</description>
</property>

<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.33.204:11004/hbase</value>
<description>The directory shared by region servers.
Should be fully-qualified to include the filesystem to use.
hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
</description>
</property>

<property>
<name>hbase.master.info.port</name>
<value>62010</value>
<description>The port for the hbase master web UI.
Set to -1 if you do not want the info server to run.
</description>
</property>

<property>
<name>hbase.master.info.bindAddress</name>
<value>0.0.0.0</value>
<description>The address for the hbase master web UI.
</description>
</property>

<property>
<name>hbase.regionserver</name>
<value>0.0.0.0:62020</value>
<description>The host and port a HBase region server runs at.
</description>
</property>

<property>
<name>hbase.regionserver.info.port</name>
<value>62030</value>
<description>The port for the hbase regionserver web UI.
Set to -1 if you do not want the info server to run.
</description>
</property>

<property>
<name>hbase.regionserver.info.bindAddress</name>
<value>0.0.0.0</value>
<description>The address for the hbase regionserver web UI.
</description>
</property>

<property>
<name>hbase.regionserver.handler.count</name>
<value>20</value>
</property>

<property>
<name>hbase.master.lease.period</name>
<value>180000</value>
</property>
</configuration>
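[Editor's note, not part of the original message: the regionserver heap shown in the log (maxHeap=4991) is not set by the -Xmx256m in hadoop-site.xml above; in HBase 0.19 it comes from conf/hbase-env.sh. A minimal illustrative fragment, with values chosen only to match the observed ~5 GB heap:]

```sh
# conf/hbase-env.sh -- illustrative values, not taken from the thread.
# HBASE_HEAPSIZE (in MB) sets -Xmx for the HBase daemons, including
# the regionservers; ~5000 matches the maxHeap=4991 seen in the log.
export HBASE_HEAPSIZE=5000
# Optional GC flags, mirroring the CMS settings used in hadoop-site.xml:
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
```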
Here is a slice of the error log file on one of the failed regionservers, which lost response after:

2009-04-13 15:20:26,077 FATAL OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:48,062 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195, memcacheSize=214, usedHeap=4991, maxHeap=4991
2009-04-13 15:20:48,062 INFO Stopping server on 62020
2009-04-13 15:20:48,063 INFO regionserver/0:0:0:0:0:0:0:0:62020.logFlusher exiting
2009-04-13 15:20:48,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-04-13 15:20:48,228 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.206:47754: output error
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,229 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 62020: exiting
2009-04-13 15:20:48,297 INFO Stopping IPC Server Responder
2009-04-13 15:20:48,552 INFO Attempting connection to server /192.168.33.204:2181
2009-04-13 15:20:48,552 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to
java.io.IOException: TIMED OUT
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:837)
2009-04-13 15:20:48,555 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020, call error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.208:47852: output error
2009-04-13 15:20:48,561 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.221:37020: output error
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 62020: exiting
2009-04-13 15:20:48,561 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,655 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 62020: exiting
2009-04-13 15:20:48,692 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:48,877 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 62020: exiting
2009-04-13 15:20:48,877 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.236:45479: output error
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,008 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 62020: exiting
2009-04-13 15:20:48,654 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.219:40059: output error
2009-04-13 15:20:48,654 ERROR processNode /hugetable09/hugetable/acl.lock error! KeeperErrorCode = ConnectionLoss
2009-04-13 15:20:48,649 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.254:51617: output error
2009-04-13 15:20:48,649 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 12 on 62020, call java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.209:43520: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,226 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 62020: exiting
2009-04-13 15:20:48,648 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:48,647 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=62030]
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,266 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 62020: exiting
2009-04-13 15:20:48,646 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 62020, call java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2809)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:48,572 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.217:60476: output error
2009-04-13 15:20:49,272 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.210:44154: output error
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,272 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 62020: exiting
2009-04-13 15:20:49,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 192.168.33.238:51231: output error
2009-04-13 15:20:49,225 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,068 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)
2009-04-13 15:20:49,345 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 62020: exiting
2009-04-13 15:20:49,048 ERROR java.lang.OutOfMemoryError: Java heap space
2009-04-13 15:20:49,484 FATAL OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
    at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
    at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
2009-04-13 15:20:49,488 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of request=0, regions=121, stores=121, storefiles=5188, storefileIndexSize=195, memcacheSize=214, usedHeap=4985, maxHeap=4991
2009-04-13 15:20:49,489 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 62020, call java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1334)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1324)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2320)
    at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.ConcurrentHashMap$Values.iterator(ConcurrentHashMap.java:1187)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getGlobalMemcacheSize(HRegionServer.java:2863)
    at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:260)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:2307)
    ... 5 more
2009-04-13 15:20:49,490 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server output error
2009-04-13 15:20:49,047 INFO Stopping IPC Server listener on 62020
2009-04-13 15:20:49,493 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
    at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1085)
    at org.apache.hadoop.hbase.ipc.HBaseServer.access$1900(HBaseServer.java:70)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:593)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:657)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:923)

Any suggestion is welcome! Thanks a lot!
Andrew Purtell
2009-04-14 07:32:21 UTC
I put up a v4 on this issue. You should use that one
instead. Please be advised this is still experimental.

- Andy
From: Jean-Daniel Cryans
Subject: Re: Region Server lost response when doing BatchUpdate
Date: Monday, April 13, 2009, 5:40 AM
I see that your region server had 5188 store files in 121 stores; I'm 99% sure that's the cause of your OOME.
Luckily for you, we've been working on this issue. To fix:
- Upgrade to HBase 0.19.1
- Apply the latest patch in https://issues.apache.org/jira/browse/HBASE-1058 (the v3)
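[Editor's note, not part of the original reply: besides the upgrade and the HBASE-1058 patch, the map tasks themselves can be made more tolerant of a regionserver that briefly stops responding. A generic sketch of bounded retry with exponential backoff around each BatchUpdate commit; the helper name and structure are hypothetical, not an HBase API, and the actual HTable.commit() call would go inside the Callable:]

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryingCommit {
    /**
     * Runs op, retrying up to maxAttempts times with exponential backoff.
     * Hypothetical wrapper for a batch commit that may fail transiently
     * while a regionserver is flushing or restarting.
     */
    public static <T> T withRetries(Callable<T> op, int maxAttempts, long baseSleepMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;                             // remember failure, back off, retry
                Thread.sleep(baseSleepMs << attempt); // base, 2x base, 4x base, ...
            }
        }
        throw last;                                   // all attempts exhausted
    }

    // Self-check: a commit that fails twice, then succeeds on the third try.
    public static void main(String[] args) throws Exception {
        final AtomicInteger calls = new AtomicInteger();
        String result = withRetries(new Callable<String>() {
            public String call() throws Exception {
                if (calls.incrementAndGet() < 3) {
                    throw new java.io.IOException("Server not running, aborting");
                }
                return "committed";
            }
        }, 5, 1L);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

In a real job the retry count and base sleep should be tuned so the total backoff stays inside the task timeout.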