Improving on MTTR of cluster [Hbase - 1.1.13]

sahil aggarwal

2018-09-10 15:52:01 UTC

Hi,

My cluster has around 50k regions and 130 RS. After an unclean shutdown,
the cluster takes around 40-50 mins to come up (from what I observe, it is
mostly slow on region assignment). While trying to optimize this, I found
the following possible configs:

*hbase.assignment.usezk:* setting this to false enables ZK-less region
assignment, which co-hosts the meta table with the HMaster and avoids ZK
interaction for region assignment.
*hbase.master.distributed.log.replay:* replays the edit logs in a
distributed manner.
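For reference, here is a small Python sketch that renders the two properties above as an hbase-site.xml `<configuration>` block. The values are assumptions for illustration: `false` for hbase.assignment.usezk selects ZK-less assignment, and `true` turns on distributed log replay, which the HBase 2.0 upgrade notes linked later in this thread describe as removed in 2.0. The helper name `render_hbase_site` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Settings discussed in this thread. Values are ASSUMED for illustration:
# usezk=false selects ZK-less assignment; distributed.log.replay=true enables
# distributed log replay (an HBase 1.x option).
SETTINGS = {
    "hbase.assignment.usezk": "false",
    "hbase.master.distributed.log.replay": "true",
}


def render_hbase_site(settings):
    """Return an hbase-site.xml <configuration> document as a string (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        # Each setting becomes <property><name>..</name><value>..</value></property>
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hbase_site(SETTINGS))
```

The properties still have to be placed in hbase-site.xml on the master and RegionServers and the processes restarted; generating the XML is only a convenience.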

Testing *hbase.assignment.usezk* alone on a small cluster (2200 regions, 4 RS)
gave the following results:

hbase.assignment.usezk=true -> 12 mins
hbase.assignment.usezk=false -> 9 mins

Based on this blog post
<https://blogs.apache.org/hbase/entry/hbase_zk_less_region_assignment>, I
was expecting better results, so I am probably missing something. I would
appreciate any pointers.

Thanks,
Sahil

Ted Yu

2018-09-10 16:02:26 UTC

For the second config you mentioned, hbase.master.distributed.log.replay,
see http://hbase.apache.org/book.html#upgrade2.0.distributed.log.replay

FYI

sahil aggarwal

2018-09-11 12:03:08 UTC

Thanks Ted.

Regarding hbase.assignment.usezk, it seems like ZK-less assignment
(usezk=false) requires hbase:meta and the HMaster to be co-hosted, but
http://hbase.apache.org/2.0/book.html#upgrade2.0.regions.on.master
says that the "Master hosting regions" feature is broken and unsupported.

Is there anything else I can tap into to speed up region assignment?

sahil aggarwal

2018-09-28 03:43:10 UTC

Just FYI, increasing hbase.regionserver.executor.openregion.threads helped
significantly (from 20-25 mins to <2 mins for ~2200 regions on 4 RS).
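A back-of-envelope model of why this helps: if each RegionServer drains its region-open queue with a fixed pool of handler threads, the wall-clock time is roughly ceil(N / T) x (time per open) for N regions and T threads. The Python sketch below simulates such a pool; the 2-second per-region latency and the thread counts are assumed values, not measurements from this cluster, and `estimate_open_time` is a hypothetical helper.

```python
import heapq
import math


def estimate_open_time(latencies, threads):
    """Simulate a fixed-size pool of `threads` workers opening regions in FIFO order.

    Each region is handed to whichever worker becomes free first, which is how
    a fixed-size thread pool drains its queue. Returns the time at which the
    last region finishes opening (same unit as `latencies`).
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    free_at = [0.0] * threads  # time at which each worker is next free (already a valid heap)
    finish = 0.0
    for latency in latencies:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + latency
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


if __name__ == "__main__":
    regions_per_rs = math.ceil(2200 / 4)  # ~550 regions per RS in the small test
    latency_s = 2.0                       # ASSUMED seconds per region open
    for threads in (3, 10, 30):           # illustrative thread counts
        t = estimate_open_time([latency_s] * regions_per_rs, threads)
        print(f"{threads:>2} threads: ~{t / 60:.1f} min")
```

With ~550 regions per RS in the small test (or ~385 per RS on the 50k-region, 130-RS cluster), the modeled time falls roughly in proportion to the thread count. Real clusters eventually hit other limits (HDFS, the master, meta updates) that this simple model does not capture.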

I have created a patch to document this:
https://jira.apache.org/jira/projects/HBASE/issues/HBASE-21186?filter=myopenissues
