Warning ("TextImporter: Need to throttle, HBase isn't keeping up.") during large import may be permanent/fatal #173

Open
mbranden opened this issue Feb 26, 2013 · 8 comments

@mbranden

I'm doing some tests on scaling and data import, using a local (non-clustered) HBase with OpenTSDB. Two metrics are defined, maybe six tag names, and under 1000 total possible tag values. During an import of about 8e6 data points, 'tsdb' got past about 2e6 and then started throwing stack traces as fast as it could. The exceptions involved (captured below) suggest 'tsdb' might be throttling back on its HBase requests, but the rapid stream of warnings said otherwise. The workaround was to do these imports in chunks of 1e6 data points; those ran without any problems.

So this may be a works-as-intended feature. But if not, here are the details:

Caused by RPC: null
Caused by RPC: PutRequest(table="tsdb", key=[0, 0, 1, 81, 40, 6, -128, 0, 0, 1, 0, 1, 41, 0, 0, 2, 0, 0, 4, 0, 0, 3, 0, 0, 3, 0, 0, 4, 0, 0, 15, 0, 0, 5, 0, 0, 16], family="t", qualifiers=[[-67, -117]], values=["<\xDD/\x1B"], timestamp=9223372036854775807, lockid=-1, durable=false, bufferable=true, attempt=0, region=RegionInfo(table="tsdb", region_name="tsdb,,1361288133012.99acbf016aabf8f48d33904aa6be4052.", stop_key=""))
        at org.hbase.async.NotServingRegionException.make(NotServingRegionException.java:68) ~[asynchbase-1.4.0.jar:9b7f3f9]
        at org.hbase.async.NotServingRegionException.make(NotServingRegionException.java:33) ~[asynchbase-1.4.0.jar:9b7f3f9]
        at org.hbase.async.MultiAction.deserializeMultiResponse(MultiAction.java:546) ~[asynchbase-1.4.0.jar:9b7f3f9]
        at org.hbase.async.MultiAction.responseFromBuffer(MultiAction.java:493) ~[asynchbase-1.4.0.jar:9b7f3f9]
        at org.hbase.async.RegionClient.deserializeObject(RegionClient.java:1288) [asynchbase-1.4.0.jar:9b7f3f9]
        at org.hbase.async.RegionClient.deserialize(RegionClient.java:1235) [asynchbase-1.4.0.jar:9b7f3f9]
        at org.hbase.async.RegionClient.decode(RegionClient.java:1153) [asynchbase-1.4.0.jar:9b7f3f9]
        ... 19 common frames omitted
2013-02-25 17:02:08,011 WARN  [New I/O  worker #1] TextImporter: Need to throttle, HBase isn't keeping up.
org.hbase.async.PleaseThrottleException: 10000 RPCs waiting on "tsdb,,1361288133012.99acbf016aabf8f48d33904aa6be4052." to come back online
        at org.hbase.async.HBaseClient.handleNSRE(HBaseClient.java:2193) [asynchbase-1.4.0.jar:9b7f3f9]
        at org.hbase.async.RegionClient$1MultiActionCallback.call(RegionClient.java:715) [asynchbase-1.4.0.jar:9b7f3f9]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1262) [suasync-1.3.1.jar:5682660]
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1241) [suasync-1.3.1.jar:5682660]
        at com.stumbleupon.async.Deferred.callback(Deferred.java:989) [suasync-1.3.1.jar:5682660]
        at org.hbase.async.HBaseRpc.callback(HBaseRpc.java:450) [asynchbase-1.4.0.jar:9b7f3f9]
        at org.hbase.async.RegionClient.decode(RegionClient.java:1185) [asynchbase-1.4.0.jar:9b7f3f9]
        at org.hbase.async.RegionClient.decode(RegionClient.java:82) [asynchbase-1.4.0.jar:9b7f3f9]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:502) [netty-3.5.9.Final.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:487) [netty-3.5.9.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75) [netty-3.5.9.Final.jar:na]
        at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:1008) [asynchbase-1.4.0.jar:9b7f3f9]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565) [netty-3.5.9.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) [netty-3.5.9.Final.jar:na]
        at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:2430) [asynchbase-1.4.0.jar:9b7f3f9]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.5.9.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.5.9.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84) [netty-3.5.9.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:472) [netty-3.5.9.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:333) [netty-3.5.9.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35) [netty-3.5.9.Final.jar:na]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102) [netty-3.5.9.Final.jar:na]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.5.9.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_26]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_26]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]
Caused by: org.hbase.async.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: tsdb,,1361288133012.99acbf016aabf8f48d33904aa6be4052.
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3266)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3543)
        at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
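
For context, the PleaseThrottleException in the trace above is asynchbase's back-pressure signal: the client is expected to stop issuing new RPCs and retry the failed one once the region comes back online. Below is a minimal sketch of that pattern, assuming a single blocking importer thread; the ThrottledImport class and its fields are illustrative, not OpenTSDB's actual TextImporter code.

```java
import org.hbase.async.HBaseClient;
import org.hbase.async.PleaseThrottleException;
import org.hbase.async.PutRequest;

import com.stumbleupon.async.Callback;

/** Minimal back-pressure sketch: stop issuing new puts while HBase asks us to throttle. */
final class ThrottledImport {
  private final HBaseClient client;
  // Flipped by the errback when asynchbase tells us to slow down.
  private volatile boolean throttle = false;

  ThrottledImport(final HBaseClient client) {
    this.client = client;
  }

  void importPoint(final byte[] key, final byte[] qualifier, final byte[] value)
      throws InterruptedException {
    while (throttle) {
      Thread.sleep(1000);  // crude pause; waiting on the exception's Deferred would be nicer
    }
    final PutRequest put =
        new PutRequest("tsdb".getBytes(), key, "t".getBytes(), qualifier, value);
    client.put(put).addErrback(new Callback<Object, Exception>() {
      public Object call(final Exception e) {
        if (e instanceof PleaseThrottleException) {
          throttle = true;
          final PleaseThrottleException pte = (PleaseThrottleException) e;
          // Once the region is back, retry the failed put and resume normal imports.
          pte.getDeferred().addBoth(new Callback<Object, Object>() {
            public Object call(final Object arg) {
              client.put((PutRequest) pte.getFailedRpc());
              throttle = false;
              return arg;
            }
          });
          return null;
        }
        return e;  // anything else: let it propagate
      }
    });
  }
}
```
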
@tsuna
Member

tsuna commented Feb 27, 2013

The throttling logic in the batch importer isn't very good. This is not working as intended; it's definitely a bug / an annoyance. I have an uncommitted change somewhere to improve that. Let's see if I can dig it up.

If you want to do any sort of scale / performance testing with HBase, you need to pre-split your table. This is also true for OpenTSDB. Have you done that?
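
For anyone following along, here is a rough sketch of pre-splitting the tsdb data table with the plain HBase Java API (0.94-era classes). The split points below are placeholders; sensible ones depend on your actual metric UIDs and row-key layout.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

/** Creates a pre-split 'tsdb' table so writes spread across several regions from the start. */
public final class PreSplitTsdb {
  public static void main(final String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final HBaseAdmin admin = new HBaseAdmin(conf);

    final HTableDescriptor desc = new HTableDescriptor("tsdb");
    desc.addFamily(new HColumnDescriptor("t"));

    // OpenTSDB row keys begin with the 3-byte metric UID, so splitting on that prefix
    // spreads different metrics across regions. These split points are illustrative only.
    final byte[][] splits = new byte[][] {
      new byte[] { 0, 0, 1 },
      new byte[] { 0, 0, 2 },
    };
    admin.createTable(desc, splits);
    admin.close();
  }
}
```
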

@mbranden
Author

No, definitely did not pre-split in this case. I'll have to dig into that...

@elsbrock

@tsuna, you probably meant this one: #47

@tsuna
Member

tsuna commented Mar 22, 2013

I have a better one somewhere that uses a semaphore to better control the number of RPCs in flight.
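
Not that uncommitted change, but a minimal sketch of the semaphore idea, assuming a single blocking importer thread; the class name and the limit are placeholders.

```java
import java.util.concurrent.Semaphore;

import org.hbase.async.HBaseClient;
import org.hbase.async.PutRequest;

import com.stumbleupon.async.Callback;

/** Bounds outstanding puts with a semaphore: acquire before sending, release on completion. */
final class BoundedImporter {
  private static final int MAX_OUTSTANDING_RPCS = 10000;

  private final HBaseClient client;
  private final Semaphore inflight = new Semaphore(MAX_OUTSTANDING_RPCS);

  BoundedImporter(final HBaseClient client) {
    this.client = client;
  }

  void put(final PutRequest request) throws InterruptedException {
    inflight.acquire();  // blocks the importer once MAX_OUTSTANDING_RPCS puts are unacknowledged
    client.put(request).addBoth(new Callback<Object, Object>() {
      public Object call(final Object arg) {
        inflight.release();  // success or failure, the RPC is no longer outstanding
        return arg;
      }
    });
  }
}
```
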

@CamJN
Contributor

CamJN commented Dec 9, 2015

I'm hitting this really hard right now. Any advice?

@tailorck

Bump. Tsuna, have you been able to commit your semaphore solution?

@manolama
Member

If anyone has some time, it wouldn't be too difficult to throw a Guava rate limiter into the import path; that would allow for backoff and catch-up.
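
For illustration, a rough sketch of that approach; the class name and the rate are placeholders, not a tested patch.

```java
import com.google.common.util.concurrent.RateLimiter;

import org.hbase.async.HBaseClient;
import org.hbase.async.PutRequest;

/** Caps the import rate with a Guava RateLimiter so HBase gets a chance to catch up. */
final class RateLimitedImporter {
  private final HBaseClient client;
  // 50k points/sec is an arbitrary starting value; tune it to what the region servers sustain.
  private final RateLimiter limiter = RateLimiter.create(50000.0);

  RateLimitedImporter(final HBaseClient client) {
    this.client = client;
  }

  void put(final PutRequest request) {
    limiter.acquire();  // blocks just long enough to keep the average rate under the limit
    client.put(request);
  }
}
```
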

@johnwhumphreys

We're hitting this as well; is there a pending solution, or is this a dead end? I see comments from 2013, so I'm not expecting much.
