Discussion:
Improving on MTTR of cluster [Hbase - 1.1.13]
sahil aggarwal
2018-09-10 15:52:01 UTC
Hi,

My cluster has around 50k regions and 130 RS. After an unclean shutdown,
the cluster takes around 40-50 mins to come up (from observation, most of
that time goes to region assignment). While trying to optimize this, I
found the following possible configs:

*hbase.assignment.usezk:* which will co-host the meta table with the HMaster
and avoid ZK interaction for region assignment.
*hbase.master.distributed.log.replay:* to replay the edit logs in a
distributed manner.


Testing *hbase.assignment.usezk* alone on a small cluster (2200 regions, 4 RS)
gave the following results:

hbase.assignment.usezk=true -> 12 mins
hbase.assignment.usezk=false -> 9 mins
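
To put the two timings side by side, they can be turned into rough assignment rates. This is only a back-of-the-envelope sketch in Python: the function names are made up for illustration, and the inputs are the figures reported in this message, not new measurements.

```python
# Rough arithmetic on the figures reported above (illustrative helper names).

def regions_per_server(regions, servers):
    """Average number of regions each region server has to open."""
    return regions / servers


def assignment_rate(regions, minutes):
    """Cluster-wide regions brought online per minute."""
    return regions / minutes


# Small test cluster: 2200 regions on 4 RS.
small_load = regions_per_server(2200, 4)
rate_usezk_true = assignment_rate(2200, 12)
rate_usezk_false = assignment_rate(2200, 9)

# Production cluster: about 50k regions on 130 RS.
prod_load = regions_per_server(50_000, 130)

# Relative reduction in recovery time from switching usezk to false.
time_saved = (12 - 9) / 12

print(f"small cluster: {small_load:.0f} regions/RS")
print(f"usezk=true:  {rate_usezk_true:.0f} regions/min")
print(f"usezk=false: {rate_usezk_false:.0f} regions/min")
print(f"production:  {prod_load:.0f} regions/RS")
print(f"time saved:  {time_saved:.0%}")
```

On these numbers the small test cluster carries more regions per RS (550) than the production cluster (about 385), and turning usezk off shortens recovery by a quarter.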


From this blog
<https://blogs.apache.org/hbase/entry/hbase_zk_less_region_assignment>, I
was expecting better results, so I am probably missing something. I would
appreciate any pointers.

Thanks,
Sahil
Ted Yu
2018-09-10 16:02:26 UTC
For the second config you mentioned, hbase.master.distributed.log.replay,
see http://hbase.apache.org/book.html#upgrade2.0.distributed.log.replay

FYI
sahil aggarwal
2018-09-11 12:03:08 UTC
Thanks Ted.

Regarding hbase.assignment.usezk as well: ZK-less assignment (usezk=false)
seems to require hbase:meta and the HMaster to be co-hosted, but
http://hbase.apache.org/2.0/book.html#upgrade2.0.regions.on.master
says that the "Master hosting regions" feature is broken and unsupported.

Is there anything else I can tap into to speed up region assignment?
sahil aggarwal
2018-09-28 03:43:10 UTC
Just FYI, increasing hbase.regionserver.executor.openregion.threads helped
significantly (from 20-25 mins to under 2 mins for ~2200 regions on 4 RS).

I have created a patch to document this:
https://jira.apache.org/jira/projects/HBASE/issues/HBASE-21186?filter=myopenissues
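
For anyone wanting to try the same change, here is a minimal Python sketch that renders the hbase-site.xml property and estimates how many sequential "waves" of region opens each RS has to work through. Several things here are assumptions, not taken from this thread: the value 20 is a placeholder (the value actually used is not stated above), the baseline of 3 threads is what I believe the HBase 1.x default to be, and the wave model ignores everything except the size of the open-region thread pool.

```python
import math
import xml.etree.ElementTree as ET

PROPERTY_NAME = "hbase.regionserver.executor.openregion.threads"
ASSUMED_DEFAULT_THREADS = 3   # assumption: believed HBase 1.x default
EXAMPLE_THREADS = 20          # hypothetical value, not from this thread


def property_fragment(name, value):
    """Render one <property> element in hbase-site.xml form."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


def open_waves(regions, servers, threads):
    """Sequential batches of region opens per RS, assuming an even
    spread of regions and one region open per thread at a time."""
    per_server = math.ceil(regions / servers)
    return math.ceil(per_server / threads)


fragment = property_fragment(PROPERTY_NAME, EXAMPLE_THREADS)
waves_default = open_waves(2200, 4, ASSUMED_DEFAULT_THREADS)
waves_tuned = open_waves(2200, 4, EXAMPLE_THREADS)

print(fragment)
print(f"waves per RS with {ASSUMED_DEFAULT_THREADS} threads: {waves_default}")
print(f"waves per RS with {EXAMPLE_THREADS} threads: {waves_tuned}")
```

Under these assumptions, 550 regions per RS need 184 waves with 3 threads and 28 with 20. That is consistent in direction with the speedup reported above, but it is not a prediction of actual recovery time.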