Discussion:
# of dfs replications when using hbase
Rong-en Fan
2008-04-10 11:58:05 UTC
Hi,

I'm running hadoop hdfs 0.16.2 and hbase 0.1.0. I noticed that
even though I set dfs.replication to 1 in hadoop's config, the
avg. block replication reported by hadoop fsck is ~3. I thought
it would be 1, since my dfs configuration says so?
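(For reference, this is roughly how the override looks in my conf/hadoop-site.xml -- standard 0.16-style config, shown only to make clear what I changed:)

  <property>
    <name>dfs.replication</name>
    <!-- override the shipped default of 3 replicas -->
    <value>1</value>
  </property>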

So, does hbase set its own replication factor and ignore
what dfs's default is?

Thanks,
Rong-En Fan
stack
2008-04-10 16:43:16 UTC
Post by Rong-en Fan
Hi,
I'm running hadoop hdfs 0.16.2 and hbase 0.1.0. I noticed that
even though I set dfs.replication to 1 in hadoop's config, the
avg. block replication reported by hadoop fsck is ~3. I thought
it would be 1, since my dfs configuration says so?
You restarted after making the config. change?
Post by Rong-en Fan
So, does hbase set its own replication factor and ignore
what dfs's default is?
HBase does not (currently) do any configuration of HDFS; it runs with
whatever you point it at.

St.Ack
Rong-en Fan
2008-04-10 16:57:14 UTC
Post by stack
Post by Rong-en Fan
Hi,
I'm running hadoop hdfs 0.16.2 and hbase 0.1.0. I noticed that
even though I set dfs.replication to 1 in hadoop's config, the
avg. block replication reported by hadoop fsck is ~3. I thought
it would be 1, since my dfs configuration says so?
You restarted after making the config. change?
I did. I even ran rm -rf on dfs's dirs and did namenode -format
before starting my dfs. hadoop fsck reports that the default replication
is 1, but the avg. block replication is 2.9x after I wrote some data into
hbase. The underlying dfs is used only by hbase; no other apps are on
it.
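(Roughly the steps I took -- the paths below are placeholders for wherever dfs.name.dir and dfs.data.dir point in my config:)

  # wipe old dfs state, then reformat and restart
  rm -rf /path/to/dfs/name /path/to/dfs/data
  bin/hadoop namenode -format
  bin/start-dfs.sh
  # after writing data through hbase, check replication
  bin/hadoop fsck /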
Post by stack
Post by Rong-en Fan
So, does hbase set its own replication factor and ignore
what dfs's default is?
HBase does not (currently) do any configuration of HDFS; it runs with
whatever you point it at.
Hmm... as far as I understand the hadoop FileSystem API, you can
specify the number of replicas when creating a file. But I did not find hbase
using it, correct?
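(Something along these lines is what I mean -- a sketch only, going from memory of the FileSystem javadoc, so the exact overload available in 0.16 may differ:)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Configuration picks up the hadoop-*.xml files found on the classpath
  Configuration conf = new Configuration();
  FileSystem fs = FileSystem.get(conf);
  // hypothetical path, just for illustration
  Path p = new Path("/tmp/example");
  // per-file replication passed explicitly at create time
  FSDataOutputStream out = fs.create(p, (short) 1);
  out.close();

So if hbase never passes a replication itself, every file it writes falls back to whatever the Configuration on its classpath says.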

Regards,
Rong-En Fan
Post by stack
St.Ack
stack
2008-04-10 17:14:12 UTC
Post by Rong-en Fan
I did. I even ran rm -rf on dfs's dirs and did namenode -format
before starting my dfs. hadoop fsck reports that the default replication
is 1, but the avg. block replication is 2.9x after I wrote some data into
hbase. The underlying dfs is used only by hbase; no other apps are on
it.
What if you add a file using './bin/hadoop fs ....' -- i.e. don't have
hbase in the mix at all -- does the file show as replicated?
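(I.e. something like the following, then fsck just that one file -- the file names here are only an example:)

  echo hello > /tmp/probe.txt
  ./bin/hadoop fs -put /tmp/probe.txt /probe.txt
  ./bin/hadoop fsck /probe.txt -files -blocks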


If you copy your hadoop-site.xml to $HBASE_HOME/conf, does it then do
the right thing? Maybe what's happening is that when hbase writes files,
we're using hadoop defaults.
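(Concretely, something like this, assuming HADOOP_HOME and HBASE_HOME point at your installs; restart hbase afterwards so the copied file is actually read:)

  cp $HADOOP_HOME/conf/hadoop-site.xml $HBASE_HOME/conf/
  $HBASE_HOME/bin/stop-hbase.sh
  $HBASE_HOME/bin/start-hbase.sh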
Post by Rong-en Fan
Hmm... as far as I understand the hadoop FileSystem API, you can
specify the number of replicas when creating a file. But I did not find hbase
using it, correct?
We don't do it explicitly, but as I suggest above, we're probably using
defaults instead of your custom config.

St.Ack
Rong-en Fan
2008-04-11 01:32:35 UTC
Post by stack
Post by Rong-en Fan
I did. I even ran rm -rf on dfs's dirs and did namenode -format
before starting my dfs. hadoop fsck reports that the default replication
is 1, but the avg. block replication is 2.9x after I wrote some data into
hbase. The underlying dfs is used only by hbase; no other apps are on
it.
What if you add a file using './bin/hadoop fs ....' -- i.e. don't have
hbase in the mix at all -- does the file show as replicated?
It shows a replication of 1.
Post by stack
If you copy your hadoop-site.xml to $HBASE_HOME/conf, does it then do the
right thing? Maybe what's happening is that when hbase writes files, we're using
hadoop defaults.
Yes, I verified it: by doing so, HBase respects my customized config. Shall I file
a JIRA against HBase or against Hadoop itself?
Post by stack
Post by Rong-en Fan
Hmm... as far as I understand the hadoop FileSystem API, you can
specify the number of replicas when creating a file. But I did not find hbase
using it, correct?
We don't do it explicitly, but as I suggest above, we're probably using
defaults instead of your custom config.
St.Ack
Thanks,
Rong-En Fan
Rong-en Fan
2008-04-11 02:55:53 UTC
Post by Rong-en Fan
Post by stack
Post by Rong-en Fan
I did. I even ran rm -rf on dfs's dirs and did namenode -format
before starting my dfs. hadoop fsck reports that the default replication
is 1, but the avg. block replication is 2.9x after I wrote some data into
hbase. The underlying dfs is used only by hbase; no other apps are on
it.
What if you add a file using './bin/hadoop fs ....' -- i.e. don't have
hbase in the mix at all -- does the file show as replicated?
It shows a replication of 1.
Post by stack
If you copy your hadoop-site.xml to $HBASE_HOME/conf, does it then do the
right thing? Maybe what's happening is that when hbase writes files, we're using
hadoop defaults.
Yes, I verified it: by doing so, HBase respects my customized config. Shall I file
a JIRA against HBase or against Hadoop itself?
When HBase was in hadoop/contrib, the hbase script added both HADOOP_CONF_DIR
and HBASE_CONF_DIR to the CLASSPATH, so that dfs's configuration could be loaded
correctly. However, after moving out of hadoop/contrib, it only adds HBASE_CONF_DIR.

I can think of several possible solutions:

1) Set HADOOP_CONF_DIR in hbase-env.sh, then add HADOOP_CONF_DIR to the
CLASSPATH as before (a rough sketch follows this list).
2) Instruct users to create links to the hadoop-*.xml files if they want to
customize some dfs settings.
3) If only a small set of dfs settings matter to the dfs client, maybe they
can be set via hbase-site.xml, and hbase then applies them when creating a
FileSystem object.
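For option 1, a rough sketch of what I have in mind -- the exact place where bin/hbase assembles its CLASSPATH may differ, so treat this as illustrative:

  # hbase-env.sh
  export HADOOP_CONF_DIR=/path/to/hadoop/conf

  # bin/hbase, where the CLASSPATH is built up
  if [ -n "$HADOOP_CONF_DIR" ]; then
    CLASSPATH="${CLASSPATH}:${HADOOP_CONF_DIR}"
  fi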

Regards,
Rong-En Fan
Post by Rong-en Fan
Post by stack
Post by Rong-en Fan
Hmm... as far as I understand the hadoop FileSystem API, you can
specify the number of replicas when creating a file. But I did not find hbase
using it, correct?
We don't do it explicitly, but as I suggest above, we're probably using
defaults instead of your custom config.
St.Ack
Thanks,
Rong-En Fan
stack
2008-04-11 03:55:20 UTC
Post by Rong-en Fan
Post by Rong-en Fan
Post by stack
If you copy your hadoop-site.xml to $HBASE_HOME/conf, does it then do the
right thing? Maybe what's happening is that when hbase writes files, we're using
hadoop defaults.
Yes, I verified it: by doing so, HBase respects my customized config. Shall I file
a JIRA against HBase or against Hadoop itself?
When HBase was in hadoop/contrib, the hbase script added both HADOOP_CONF_DIR
and HBASE_CONF_DIR to the CLASSPATH, so that dfs's configuration could be loaded
correctly. However, after moving out of hadoop/contrib, it only adds HBASE_CONF_DIR.
1) Set HADOOP_CONF_DIR in hbase-env.sh, then add HADOOP_CONF_DIR to the
CLASSPATH as before.
2) Instruct users to create links to the hadoop-*.xml files if they want to
customize some dfs settings.
3) If only a small set of dfs settings matter to the dfs client, maybe they
can be set via hbase-site.xml, and hbase then applies them when creating a
FileSystem object.
Thanks for finding this oversight of ours, Rong-en. Please file a JIRA.
Make it a blocker for branch and trunk. At a minimum, we should improve
our doc. so it includes all the suggestions you make above.
Thank you,
St.Ack
