Discussion: A data loss scenario with a single region server going down
George P. Stathis
2010-09-19 13:57:49 UTC
Hi folks. I'd like to run the following data loss scenario by you to see if
we are doing something obviously wrong with our setup here.

Setup:

- Hadoop 0.20.1
- HBase 0.20.3
- 1 master node running NameNode, SecondaryNameNode, JobTracker,
HMaster and 1 ZooKeeper (no ZooKeeper quorum right now)
- 4 child nodes, each running a DataNode, TaskTracker and RegionServer
- dfs.replication is set to 2
- Host: Amazon EC2

Up until yesterday, we were frequently experiencing
HBASE-2077 <https://issues.apache.org/jira/browse/HBASE-2077>,
which kept bringing our RegionServers down. What we realized, though, is
that we were losing data (a few hours' worth) with just one out of four
regionservers going down. This is problematic since we replicate at 2x
across 4 nodes, so at least one other node should theoretically be able
to serve the data that the downed regionserver can't.
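For what it's worth, here is a rough sketch of the kind of check one could
run over the HBase root directory. This is just an illustration using the
plain FileSystem API; the /hbase path is the default hbase.rootdir and the
helper class is made up, not an HBase tool. It only reports the target
replication the namenode has recorded for each file; hadoop fsck is still
what reports actually under-replicated blocks.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationReport {
      public static void main(String[] args) throws Exception {
        // Plain client Configuration; picks up core-site.xml / hdfs-site.xml,
        // so dfs.replication=2 is what new files get by default.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        report(fs, new Path("/hbase"));   // default hbase.rootdir (assumption)
      }

      // Recursively print any file whose target replication is below 2.
      private static void report(FileSystem fs, Path dir) throws Exception {
        for (FileStatus status : fs.listStatus(dir)) {
          if (status.isDir()) {
            report(fs, status.getPath());
          } else if (status.getReplication() < 2) {
            System.out.println(status.getPath() + " replication="
                + status.getReplication());
          }
        }
      }
    }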

Questions:

- When a regionserver goes down unexpectedly, the only data that
should theoretically be lost is whatever didn't make it to the WAL,
right? Or wrong? E.g.
http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
- We ran a hadoop fsck on our cluster and verified the replication
factor, as well as that there were no under-replicated blocks. So why was
our data not available from another node?
- If the log gets rolled every 60 minutes by default (we haven't touched
the defaults), how can we lose data from up to 24 hours ago?
- When the downed regionserver comes back up, shouldn't that data be
available again? Ours wasn't.
- In such scenarios, is there a recommended approach for restoring the
regionserver that goes down? We initially brought them back up by logging
onto the node itself and manually restarting them. Now we have automated
cron jobs that watch their ports and restart them within two minutes if
they go down (see the watchdog sketch after this list).
- Is there a way to recover such lost data?
- Are versions 0.89 / 0.90 addressing any of these issues?
- Curiosity question: when a regionserver goes down, does the master try
to replicate that node's data on another node to satisfy the dfs.replication
ratio?
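
The watchdog mentioned above is nothing fancy; a minimal sketch of the
idea is below. The port is the default regionserver RPC port, the restart
path is hypothetical (adjust it to your install), and the intent is to run
it from cron on each node against localhost:

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class RegionServerWatchdog {
      public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = 60020;   // default regionserver RPC port
        Socket socket = new Socket();
        try {
          // If the connect succeeds within 5 seconds, assume the process is up.
          socket.connect(new InetSocketAddress(host, port), 5000);
          System.out.println("regionserver on " + host + " is up");
        } catch (Exception e) {
          // Hypothetical install path; the stock start script does the rest.
          System.out.println("regionserver on " + host + " is down, restarting");
          Runtime.getRuntime().exec(new String[] {
              "/usr/local/hbase/bin/hbase-daemon.sh", "start", "regionserver" });
        } finally {
          socket.close();
        }
      }
    }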

For now, we have upgraded our HBase to 0.20.6, which is supposed to contain
the HBASE-2077 <https://issues.apache.org/jira/browse/HBASE-2077> fix
(though no one has verified that yet). Lars' blog also suggests that Hadoop
0.21.0 is the way to go to avoid the file-append issues, but it's not
production-ready yet. Should we stick with 0.20.1? Upgrade to 0.20.2?

Any tips here are definitely appreciated. I'll be happy to provide more
information as well.

-GS
Todd Lipcon
2010-09-19 21:58:58 UTC
Hi George,

The data loss problems you describe are known issues when running on
stock Apache Hadoop 0.20.x.

You should consider upgrading to CDH3b2, which includes a number of HDFS
patches that allow HBase to durably store data. You'll also have to upgrade
to HBase 0.89 - we ship a version as part of CDH that will work well.

Thanks
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
George P. Stathis
2010-09-20 19:39:28 UTC
Thanks Todd. We are not quite ready to move to 0.89 yet. We have made custom
modifications to the transactional contrib sources, which have been taken
out of 0.89. We are planning on moving to 0.90 when it comes out and, at
that point, either migrating our customizations or moving back to the
out-of-the-box features (which will require a rewrite of our code).

We are well aware of the CDH distros, but at the time we started with
HBase, there was none that included it. I think CDH3 is the first one to
include HBase, correct? And is 0.89 the only version supported?

Moreover, are we saying that there is no way to prevent stock HBase 0.20.6
and Hadoop 0.20.2 from losing data when a single node goes down? It does
not matter if the data is replicated; it will still get lost?

-GS
Ryan Rawson
2010-09-20 19:52:33 UTC
Hey,

The problem is that stock 0.20 Hadoop won't let you read from a
non-closed file. It will report the length as 0. So if a
regionserver crashes, the last WAL that is still open becomes 0
length and the data within it is unreadable. That, specifically, is the
data loss problem. You could always make it so your regionservers
rarely crash - this is possible, btw, and I did it for over a year.
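
To make that concrete, here is a minimal sketch of what a reader sees,
using the plain FileSystem API; the WAL path is hypothetical and depends
on your layout. On a pre-append 0.20 HDFS, a file whose writer died
without closing it looks like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WalLengthCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical path to a crashed regionserver's last, never-closed WAL.
        Path wal = new Path("/hbase/.logs/regionserver-1/hlog.1284900000000");
        FileStatus status = fs.getFileStatus(wal);
        // On stock 0.20 the namenode reports the length without the
        // in-progress last block - typically 0 for a WAL that never filled
        // a block - so log splitting sees nothing to replay from that file.
        System.out.println(wal + " length: " + status.getLen());
      }
    }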

But you will want to run CDH3 or the append-branch releases to get the
series of patches that fix this hole. It also happens that only 0.89
runs on it. I would like to avoid the Hadoop "everyone uses 0.20
forever" problem and talk about what we could do to help you get on
0.89. Over here at SU we've made a commitment to the future of 0.89
and are running it in production. Let us know what else you'd need.

-ryan

George P. Stathis
2010-09-20 20:21:05 UTC
Thanks for the response, Ryan. I have no doubt that 0.89 can be used in
production and that it has strong support. I just wanted to avoid moving to
it now because we have limited resources and it would put a dent in our
roadmap if we were to fast-track the migration now. Specifically, we are
using HBASE-2438 and HBASE-2426 to support pagination across indexes. So we
either have to migrate those to 0.89 or somehow go stock and be able to
support pagination across region servers.

Of course, if the choice is between migrating and losing more data, data
safety comes first. But if we can buy two or three more months of time and
avoid regionserver crashes (like you did for a year), maybe we can go that
route for now. What do we need to do to achieve that?

-GS

PS: Out of curiosity, I understand the WAL log append issue for a single
regionserver when it comes to losing the data on a single node. But if that
data is also being replicated on another region server, why wouldn't it be
available there? Or is the WAL log shared across multiple region servers
(maybe that's what I'm missing)?
Ryan Rawson
2010-09-20 20:55:48 UTC
When you say replication, what exactly do you mean? In normal HDFS, as
you write, the data is sent to 3 nodes, yes, but with the flaw I
outlined it doesn't matter, because the datanodes and namenode will
pretend a data block just didn't exist if it wasn't closed properly.

So even with the most careful white-glove handling of HBase, you will
eventually have a crash and you will lose data without 0.89/CDH3 et al.
You can circumvent this by storing the data elsewhere and spooling it
into HBase, or perhaps just not minding if you lose data (yes, those
applications exist).
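
A rough sketch of the spooling idea is below. The HTable/Put calls are the
stock 0.20/0.89 client API, but everything else (the spool file format and
location, the table and column names) is made up for illustration: writes
land in a local append-only file first, and a separate process replays them
into HBase, so a regionserver crash only costs you whatever the replayer
hasn't caught up on.

    import java.io.BufferedReader;
    import java.io.FileReader;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SpoolReplayer {
      public static void main(String[] args) throws Exception {
        // Hypothetical spool format: one "rowkey<TAB>value" line per write.
        BufferedReader spool =
            new BufferedReader(new FileReader("/var/spool/hbase/pending.log"));
        HTable table = new HTable(new HBaseConfiguration(), "events");
        String line;
        while ((line = spool.readLine()) != null) {
          String[] parts = line.split("\t", 2);
          Put put = new Put(Bytes.toBytes(parts[0]));
          // Hypothetical column family "d", qualifier "value".
          put.add(Bytes.toBytes("d"), Bytes.toBytes("value"),
              Bytes.toBytes(parts[1]));
          table.put(put);
        }
        table.flushCommits();   // push any client-side buffered puts
        spool.close();
      }
    }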

Looking at the JIRAs in question, the first is already on trunk,
which is 0.89. The second isn't, alas. At this point the
transactional HBase code just isn't being actively maintained by any
committer, and we are reliant on kind people's contributions. So I
can't promise when it will hit 0.89/0.90.

-ryan
George P. Stathis
2010-09-21 00:19:28 UTC
Post by Ryan Rawson
When you say replication, what exactly do you mean? In normal HDFS, as
you write, the data is sent to 3 nodes, yes, but with the flaw I
outlined it doesn't matter, because the datanodes and namenode will
pretend a data block just didn't exist if it wasn't closed properly.
That's the part I was not understanding. I do now. Thanks.
Post by Ryan Rawson
So even with the most careful white-glove handling of HBase, you will
eventually have a crash and you will lose data without 0.89/CDH3 et al.
You can circumvent this by storing the data elsewhere and spooling it
into HBase, or perhaps just not minding if you lose data (yes, those
applications exist).
Looking at the JIRAs in question, the first is already on trunk,
which is 0.89. The second isn't, alas. At this point the
transactional HBase code just isn't being actively maintained by any
committer, and we are reliant on kind people's contributions. So I
can't promise when it will hit 0.89/0.90.
Are you aware of any indexing alternatives in 0.89?
Ryan Rawson
2010-09-21 00:43:55 UTC
Hi,

Sorry, I don't. I think the person currently working on the
transactional/indexed contrib is bringing it up to 0.89; perhaps they
would enjoy your help in testing or porting the code?

I'll poke a few people into replying.

-ryan
Stack
2010-09-23 03:59:46 UTC
Hey George:

James Kennedy is working on getting transactional HBase working with
HBase TRUNK. Watch HBASE-2641 for the drop of changes needed in core
so that his GitHub THBase can use HBase core.

St.Ack
George P. Stathis
2010-09-23 16:43:07 UTC
Permalink
I'm there. Thanks St.Ack.
Post by Stack
James Kennedy is working on getting transactional hbase working w/
hbase TRUNK. Watch HBASE-2641 for the drop of changes needed in core
to make it so his github THBase can use HBase core.
St.Ack
Post by Ryan Rawson
hi,
sorry i dont. i think the current transactional/indexed person is
working on bringing it up to 0.89, perhaps they would enjoy your help
in testing or porting the code?
I'll poke a few people into replying.
-ryan
Post by George P. Stathis
Post by Ryan Rawson
When you say replication, what exactly do you mean? In normal HDFS, as
you write, the data is sent to 3 nodes, yes, but with the flaw I
outlined it doesn't matter, because the datanodes and namenode will
pretend a data block just didn't exist if it wasn't closed properly.
That's the part I was not understanding. I do now. Thanks.
Post by Ryan Rawson
So even with the most careful white-glove handling of HBase, you will
eventually have a crash and you will lose data without 0.89/CDH3 et al.
You can circumvent this by storing the data elsewhere and spooling it
into HBase, or perhaps by just not minding if you lose data (yes, those
applications exist).
Looking at the JIRAs in question, the first is already on trunk, which
is 0.89. The second isn't, alas. At this point transactional HBase just
isn't being actively maintained by any committer and we are reliant on
kind people's contributions, so I can't promise when it will hit
0.89/0.90.
Are you aware of any indexing alternatives in 0.89?
Post by Ryan Rawson
-ryan
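As a rough illustration of the "spool elsewhere, then replay into HBase"
idea Ryan mentions above, here is a minimal sketch, assuming the
0.89/0.90-era client API; the SpoolingWriter class, table name and
on-disk record format are made up (a real spool would need
length-prefixed records so it could actually be replayed):

  // Minimal sketch of the spool-then-replay idea. The class name and the
  // record format are illustrative only; a real spool would need
  // length-prefixed records to be replayable.
  import java.io.File;
  import java.io.FileOutputStream;
  import java.io.IOException;

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SpoolingWriter {
    private final FileOutputStream spool;
    private final HTable table;

    public SpoolingWriter(String spoolPath, String tableName) throws IOException {
      this.spool = new FileOutputStream(new File(spoolPath), true); // append mode
      this.table = new HTable(HBaseConfiguration.create(), tableName);
    }

    public void write(byte[] row, byte[] family, byte[] qualifier, byte[] value)
        throws IOException {
      // Durably record the edit on local disk first...
      spool.write(Bytes.add(Bytes.add(row, family), Bytes.add(qualifier, value)));
      spool.getFD().sync();
      // ...then send it to HBase. If the cluster later loses the edit,
      // the spool file is the source for a replay job.
      Put put = new Put(row);
      put.add(family, qualifier, value);
      table.put(put);
    }
  }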
On Mon, Sep 20, 2010 at 1:21 PM, George P. Stathis
Post by George P. Stathis
Thanks for the response Ryan. I have no doubt that 0.89 can be used in
production and that it has strong support. I just wanted to avoid moving
to it now because we have limited resources and it would put a dent in
our roadmap if we were to fast-track the migration now. Specifically, we
are using HBASE-2438 and HBASE-2426 to support pagination across
indexes, so we either have to migrate those to 0.89 or somehow go stock
and be able to support pagination across region servers.
Of course, if the choice is between migrating or losing more data, data
safety comes first. But if we can buy two or three more months of time
and avoid region server crashes (like you did for a year), maybe we can
go that route for now. What do we need to do to achieve that?
-GS
PS: Out of curiosity, I understand the WAL append issue for a single
regionserver when it comes to losing the data on a single node. But if
that data is also being replicated on another region server, why
wouldn't it be available there? Or is the WAL shared across multiple
region servers (maybe that's what I'm missing)?
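For the "go stock" pagination option mentioned above, a minimal sketch
using the plain 0.20/0.89 client Scan API and a carried-over start row;
this is not HBASE-2438/2426, just ordinary scanning, and the helper
class and page-size handling are made up for illustration:

  // Page through a table with a plain Scan, resuming from the last row
  // of the previous page.
  import java.io.IOException;

  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ScanPager {
    // Returns the row key to resume from, or null when the table is exhausted.
    public static byte[] fetchPage(HTable table, byte[] startRow, int pageSize)
        throws IOException {
      Scan scan = (startRow == null) ? new Scan() : new Scan(startRow);
      scan.setCaching(pageSize);
      ResultScanner scanner = table.getScanner(scan);
      try {
        int seen = 0;
        byte[] lastRow = null;
        for (Result r : scanner) {
          // ... render or process the row here ...
          lastRow = r.getRow();
          if (++seen >= pageSize) {
            break;
          }
        }
        if (seen < pageSize) {
          return null; // no more rows after this page
        }
        // Appending a zero byte gives the next possible key in sort order.
        return Bytes.add(lastRow, new byte[] { 0 });
      } finally {
        scanner.close();
      }
    }
  }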
Post by Ryan Rawson
Hey,
The problem is that the stock 0.20 Hadoop won't let you read from a
non-closed file; it will report its length as 0. So if a regionserver
crashes, the last WAL log that is still open becomes 0 length and the
data within it is unreadable. That specifically is the data loss
problem. You could always make it so your regionservers rarely crash -
this is possible, btw, and I did it for over a year.
But you will want to run CDH3 or the append-branch releases to get the
series of patches that fix this hole. It also happens that only 0.89
runs on it. I would like to avoid the hadoop "everyone uses 0.20
forever" problem and talk about what we could do to help you get on
0.89. Over here at SU we've made a commitment to the future of 0.89
and are running it in production. Let us know what else you'd need.
-ryan
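To make the zero-length point concrete, a small hypothetical check (not
from the thread) that lists the WAL files a crashed region server left
behind; the log directory path is made up and depends on your
hbase.rootdir layout:

  // List the HLog files under the crashed region server's log directory.
  // On a non-append 0.20.x HDFS, the last (unclosed) log typically
  // reports a length of 0 here even though it contained edits.
  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class WalLengthCheck {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      // Hypothetical log directory of the crashed region server.
      Path logDir = new Path("/hbase/.logs/regionserver-1,60020,1284900000000");
      for (FileStatus status : fs.listStatus(logDir)) {
        System.out.println(status.getPath() + " length=" + status.getLen());
      }
    }
  }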
Andrew Purtell
2010-09-22 21:57:32 UTC
Permalink
While 0.89/0.90 is the way to go, there is also the 0.20-append branch of
Hadoop, in the hadoop-common repo, which is better than nothing if using
HBase 0.20:

http://github.com/apache/hadoop-common/tree/branch-0.20-append

There is also an amalgamation of 0.20-append and Yahoo Secure Hadoop
0.20.104:

http://github.com/apurtell/hadoop-common/tree/yahoo-hadoop-0.20.104-append

I'd recommend the former unless you also want strong authentication via
Kerberos.

Best regards,

- Andy

Why is this email five sentences or less?
http://five.sentenc.es/
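For reference, when HBase runs on one of these append-capable builds
(branch-0.20-append or CDH3), the append/sync code path also has to be
switched on; a minimal sketch of the relevant setting, assuming the
dfs.support.append property those builds use (it has no effect on stock
0.20.x):

  <!-- hdfs-site.xml (commonly mirrored in hbase-site.xml): let HBase
       sync and later recover edits from a WAL that was never cleanly
       closed. Only meaningful on an append-capable build. -->
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>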
George P. Stathis
2010-09-23 01:21:33 UTC
Permalink
Thanks Andy, it's good to know there is an alternative. We'll attempt to go
to 0.89 but if we can't get reliable indexing, we may have to go with this
hadoop-append branch.

-GS
Post by Andrew Purtell
While 0.89/0.90 is the way to go, there is also the 0.20-append branch of
Hadoop, in the hadoop-common repo, which is better than nothing if using
HBase 0.20:
http://github.com/apache/hadoop-common/tree/branch-0.20-append
There is also an amalgamation of 0.20-append and Yahoo Secure Hadoop
0.20.104:
http://github.com/apurtell/hadoop-common/tree/yahoo-hadoop-0.20.104-append
I'd recommend the former unless you also want strong authentication via
Kerberos.
Best regards,
- Andy
Why is this email five sentences or less?
http://five.sentenc.es/
Andrew Purtell
2010-09-23 15:17:46 UTC
Permalink
Something else to consider: after 0.90, or whenever the coprocessor
framework goes in, we will quite possibly build some kind of secondary
indexing capability as a coprocessor (see HBASE-2000 and sub-issues).
This stuff won't be backported, at least by us.

Best regards,

- Andy

Why is this email five sentences or less?
http://five.sentenc.es/
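As a rough sketch of what such a coprocessor-based secondary index could
look like, written against the RegionObserver API as it eventually
shipped in HBase 0.92 (it did not exist at the time of this thread); the
class, table, and column names are made up:

  // Hypothetical observer that mirrors one column into an index table on
  // every put. Uses the 0.92/0.94-era postPut signature.
  import java.io.IOException;
  import java.util.List;

  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.HTableInterface;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
  import org.apache.hadoop.hbase.coprocessor.ObserverContext;
  import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
  import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SecondaryIndexObserver extends BaseRegionObserver {
    private static final byte[] CF = Bytes.toBytes("d");
    private static final byte[] COL = Bytes.toBytes("email");

    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                        Put put, WALEdit edit, boolean writeToWAL)
        throws IOException {
      List<KeyValue> kvs = put.get(CF, COL);
      if (kvs == null || kvs.isEmpty()) {
        return; // nothing to index on this write
      }
      byte[] value = kvs.get(0).getValue();
      // Index row key = indexed value + source row key, so a lookup by
      // value becomes a prefix scan on the index table.
      Put indexPut = new Put(Bytes.add(value, put.getRow()));
      indexPut.add(CF, Bytes.toBytes("src"), put.getRow());
      HTableInterface indexTable =
          ctx.getEnvironment().getTable(Bytes.toBytes("my_index_table"));
      try {
        indexTable.put(indexPut);
      } finally {
        indexTable.close();
      }
    }
  }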
From: George P. Stathis
Subject: Re: A data loss scenario with a single region server going down
Date: Wednesday, September 22, 2010, 6:21 PM
Thanks Andy, it's good to know there is an alternative. We'll
attempt to go to 0.89 but if we can't get reliable indexing, we
may have to go with this hadoop-append branch.
-GS
Post by Andrew Purtell
While 0.89/0.90 is the way to go, there is also the
0.20-append branch of Hadoop, in the hadoop-common repo,
which is better than nothing if using HBase 0.20:
http://github.com/apache/hadoop-common/tree/branch-0.20-append
There is also an amalgamation of 0.20-append and Yahoo
Secure Hadoop 0.20.104:
http://github.com/apurtell/hadoop-common/tree/yahoo-hadoop-0.20.104-append
I'd recommend the former unless you also want strong
authentication via Kerberos.
George P. Stathis
2010-09-23 16:56:20 UTC
Permalink
Thanks Andy. Good to know that's coming up. I started following HBASE-2038.
It does seem that it's quite a bit out (I'm guessing well into 2011). I
think we will definitely be interested in migrating any indexes we have to
the coprocessor model. We definitely prefer to go with features supported
in core. Until then, though, hbase-transactional-tableindexed seems to be
our best bet unless there is something else folks here suggest we do.

-GS
Post by Andrew Purtell
Something else to consider is after 0.90, or whenever the coprocessor
framework goes in, we will quite possibly build some kind of secondary
indexing capability as a coprocessor (see HBASE-2000 and sub-issues). This
stuff won't be backported, at least by us.
Best regards,
- Andy
Why is this email five sentences or less?
http://five.sentenc.es/
From: George P. Stathis
Subject: Re: A data loss scenario with a single region server going down
Date: Wednesday, September 22, 2010, 6:21 PM
Thanks Andy, it's good to know there is an alternative. We'll
attempt to go to 0.89 but if we can't get reliable indexing, we
may have to go with this hadoop-append branch.
-GS
Post by Andrew Purtell
While 0.89/0.90 is the way to go, there is also the
0.20-append branch of Hadoop, in the hadoop-common repo,
which is better than nothing if using HBase 0.20:
http://github.com/apache/hadoop-common/tree/branch-0.20-append
There is also an amalgamation of 0.20-append and Yahoo
Secure Hadoop 0.20.104:
http://github.com/apurtell/hadoop-common/tree/yahoo-hadoop-0.20.104-append
I'd recommend the former unless you also want strong
authentication via Kerberos.