Discussion:
question on column families
Antonio Si
2018-11-13 23:34:24 UTC
Hi,

I would like to confirm my understanding.

Let's say I have 13 column families in an HBase table. 11 of those column
families have no data, while 2 column families have a large amount of data.

My understanding is that the memstore size, which is 128M in my environment,
is shared across all column families, even though there is no data in
those column families. Is my understanding correct?

Thanks in advance.

Antonio.
Allan Yang
2018-11-14 03:09:33 UTC
No, every column family has its own memstore. Each one is 128MB in your
case. When flushing, the flusher chooses only the memstores that satisfy
certain conditions, so it is possible that not every column family (Store)
will flush its memstore.
Best Regards
Allan Yang
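To make the per-family behavior concrete, here is a minimal Python sketch (not HBase code) in which each column family (Store) owns its own memstore and a flush writes out only the stores that meet a condition. The condition used here ("memstore larger than a lower bound"), the fallback rule, and the names `Store`, `stores_to_flush`, `flush` and `lower_bound` are all hypothetical; the real selection rule and thresholds depend on the HBase version and configuration.

```python
from dataclasses import dataclass, field
from typing import List

MB = 1024 * 1024


@dataclass
class Store:
    """One column family in a region; each Store owns its own memstore."""
    family: str
    memstore_bytes: int = 0
    # Sizes of the files this store has flushed so far.
    flushed_files: List[int] = field(default_factory=list)


def stores_to_flush(stores: List[Store], lower_bound: int) -> List[Store]:
    """Choose the stores to flush: those whose memstore exceeds lower_bound.

    If none qualifies, fall back to every non-empty store (an assumed rule,
    so that a flush always frees some memory). Empty stores are never chosen.
    """
    large = [s for s in stores if s.memstore_bytes > lower_bound]
    if large:
        return large
    return [s for s in stores if s.memstore_bytes > 0]


def flush(stores: List[Store], lower_bound: int) -> List[str]:
    """Flush the chosen stores and return their family names."""
    chosen = stores_to_flush(stores, lower_bound)
    for s in chosen:
        s.flushed_files.append(s.memstore_bytes)  # one new file per flushed store
        s.memstore_bytes = 0
    return [s.family for s in chosen]


if __name__ == "__main__":
    # 13 families: two loaded, eleven empty (the scenario from the question).
    region = [Store(f"cf{i}") for i in range(13)]
    region[0].memstore_bytes = 70 * MB
    region[1].memstore_bytes = 58 * MB
    print(flush(region, lower_bound=16 * MB))  # ['cf0', 'cf1']
```

In this model the 11 empty families are never selected and write no files; only the two loaded stores are flushed. The 16 MB lower bound is an arbitrary illustrative value.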
Antonio Si
2018-11-14 03:19:51 UTC
Thanks Allan.

Then why is having too many column families a problem? If there are
column families with no data, would that cause any issues?

Thanks.

Antonio.
Stack
2018-11-14 16:40:27 UTC
We have this note in the refguide [1]. It is a bit stale, though.

Each CF consumes at least some resources. In addition, a flush can be dumb
and write out more than just the loaded memstores, producing many small
files in HDFS, which in turn need compacting. Do you see this phenomenon?

If you are careful when querying and writing, the CFs can work autonomously
and you can go beyond the limits the refguide suggests.

S

1. http://hbase.apache.org/book.html#number.of.cfs
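The point about dumb flushes producing small files can be illustrated with a small simulation. This is a hypothetical Python model, not HBase's actual flush code: it assumes a flush fires when any single memstore reaches 128 MB, and compares flushing every non-empty memstore against flushing only memstores above an arbitrary 16 MB lower bound, recording the size of each file every family writes. The function name `simulate` and its parameters are illustrative.

```python
from typing import Dict, List

MB = 1024 * 1024


def simulate(write_pattern: Dict[str, int], steps: int, flush_all: bool,
             flush_trigger: int = 128 * MB,
             lower_bound: int = 16 * MB) -> Dict[str, List[int]]:
    """Return the sizes of the files each family writes under a flush policy.

    write_pattern maps family -> bytes written per step (hypothetical workload).
    A flush fires when any single memstore reaches flush_trigger (assumption).
    flush_all=True:  write out every non-empty memstore (the "dumb" flush).
    flush_all=False: write out only memstores larger than lower_bound.
    """
    memstores = {f: 0 for f in write_pattern}
    files: Dict[str, List[int]] = {f: [] for f in write_pattern}
    for _ in range(steps):
        for family, nbytes in write_pattern.items():
            memstores[family] += nbytes
        if any(size >= flush_trigger for size in memstores.values()):
            if flush_all:
                chosen = [f for f, size in memstores.items() if size > 0]
            else:
                chosen = [f for f, size in memstores.items() if size > lower_bound]
            for family in chosen:
                files[family].append(memstores[family])  # one new HDFS file
                memstores[family] = 0
    return files


if __name__ == "__main__":
    # One heavily written family, one lightly written, one never written.
    workload = {"hot": 8 * MB, "cold": 256 * 1024, "empty": 0}
    for flush_all in (True, False):
        files = simulate(workload, steps=64, flush_all=flush_all)
        print(flush_all, {f: len(sizes) for f, sizes in files.items()})
```

Over 64 steps both policies write four 128 MB files for `hot`. The dumb flush also writes four 4 MB files for `cold`, the kind of small files that compaction later has to merge, while the selective policy keeps `cold` buffered. In this model `empty` never writes a file; that is a simplifying assumption, not a claim about HBase internals.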