Discussion:
Delete reveals older version of a column even when VERSIONS=1
Mike Percy
2011-01-29 01:43:54 UTC
Hi folks,
I am seeing some unexpected behavior with HBase 0.20.6 when deleting columns. Our cluster has been running for some time; however, we recently upgraded from HBase 0.20.3. The family I am writing to shows VERSIONS => '1' in the output of describe, yet HBase appears to be maintaining several versions of the column.

Below is a shell session demonstrating the problem. Is this a configuration problem, as designed, or possibly a bug?

Thanks,
Mike

hbase(main):004:0> put 'table', 'row', 'family:qual', '1'
0 row(s) in 0.0110 seconds
hbase(main):007:0> get 'table', 'row'
COLUMN                       CELL
 family:qual                   timestamp=1296264772717, value=1
1 row(s) in 0.0080 seconds
hbase(main):008:0> put 'table', 'row', 'family:qual', '2'
0 row(s) in 0.0020 seconds
hbase(main):009:0> put 'table', 'row', 'family:qual', '3'
0 row(s) in 0.0020 seconds
hbase(main):010:0> get 'table', 'row'
COLUMN                       CELL
 family:qual                   timestamp=1296264797169, value=3
1 row(s) in 0.0030 seconds
hbase(main):011:0> delete 'table', 'row', 'family:qual'
0 row(s) in 0.0040 seconds
hbase(main):012:0> get 'table', 'row'
COLUMN                       CELL
 family:qual                   timestamp=1296264795365, value=2
1 row(s) in 0.0630 seconds
hbase(main):013:0> delete 'table', 'row', 'family:qual'
0 row(s) in 0.0360 seconds
hbase(main):014:0> get 'table', 'row'
COLUMN                       CELL
 family:qual                   timestamp=1296264772717, value=1
1 row(s) in 0.0030 seconds
hbase(main):013:0> delete 'table', 'row', 'family:qual'
0 row(s) in 0.0360 seconds
hbase(main):016:0> get 'table', 'row'
COLUMN                       CELL
0 row(s) in 0.0030 seconds
Ryan Rawson
2011-01-29 01:47:29 UTC
I would call it 'a surprising, perhaps unexpected consequence of our
storage model'.

There are two types of deletes in HBase. You are doing type (a), "delete
a single version", but you probably want type (b), "delete all versions
in this column".
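
The two delete types described above can be illustrated with a toy model in Python. This is only a sketch of the storage behavior, not the HBase client API: the `Column` class and its method names are hypothetical, and the `now` timestamp passed to `delete_column` is made up for illustration. The put timestamps are the ones from the shell session.

```python
class Column:
    """Toy model of one column's stored versions (not the HBase API)."""

    def __init__(self):
        self.cells = {}               # timestamp -> value, one entry per put
        self.version_deletes = set()  # tombstones that mask one exact timestamp
        self.column_delete_ts = -1    # tombstone that masks every timestamp <= this

    def put(self, ts, value):
        self.cells[ts] = value

    def visible(self):
        """Timestamps that no tombstone masks, newest first."""
        return sorted(
            (ts for ts in self.cells
             if ts not in self.version_deletes and ts > self.column_delete_ts),
            reverse=True,
        )

    def get(self):
        """Return the newest visible value, or None if nothing is visible."""
        live = self.visible()
        return self.cells[live[0]] if live else None

    def delete_version(self):
        """Type (a): mask only the newest visible version."""
        live = self.visible()
        if live:
            self.version_deletes.add(live[0])

    def delete_column(self, now):
        """Type (b): mask every version written at or before `now`."""
        self.column_delete_ts = max(self.column_delete_ts, now)


def load():
    col = Column()
    col.put(1296264772717, "1")
    col.put(1296264795365, "2")
    col.put(1296264797169, "3")
    return col


def demo_deletes():
    # Type (a), repeated: each delete uncovers the next older version.
    col = load()
    by_version = [col.get()]
    for _ in range(3):
        col.delete_version()
        by_version.append(col.get())

    # Type (b), once: every stored version is masked at the same time.
    col = load()
    col.delete_column(now=1296264800000)  # hypothetical "current time"
    return by_version, col.get()
```

In this model, `demo_deletes()` walks through the values "3", "2", "1" and then nothing for the single-version delete, which matches the shell session, while one column-wide delete hides all three versions at once.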
Mike Percy
2011-01-29 02:10:16 UTC
Hmm... how does this relate to setting VERSIONS => '1'? By setting the number of versions to 1, are we getting some space benefit over, say, VERSIONS => '10'?

Thanks,
Mike
Buttler, David
2011-02-01 00:35:02 UTC
The way I understand it, old versions do not actually disappear until a major compaction occurs. A major compaction should run about once per day unless you have changed the major compaction settings, or whenever a region splits.

Dave
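
The pruning behavior described above can be sketched with another toy model in Python. It assumes, as a simplification, that a major compaction drops tombstoned cells and then keeps only the newest `max_versions` of what remains. The function and parameter names are hypothetical, and nothing here is HBase code.

```python
def major_compact(cells, deleted, max_versions):
    """Toy major compaction (not HBase code).

    cells:        dict mapping timestamp -> value, one entry per stored put
    deleted:      set of timestamps masked by delete markers
    max_versions: the family's VERSIONS setting
    Returns the cells that are still stored after the compaction.
    """
    live = sorted((ts for ts in cells if ts not in deleted), reverse=True)
    return {ts: cells[ts] for ts in live[:max_versions]}


def demo_compaction():
    # The three puts from the shell session, all still stored before compaction.
    stored = {
        1296264772717: "1",
        1296264795365: "2",
        1296264797169: "3",
    }
    before = len(stored)
    after = major_compact(stored, deleted=set(), max_versions=1)
    return before, after
```

Under these assumptions, the space saving from VERSIONS => '1' only shows up once the compaction has run. Until then all three puts remain stored, which is why a single-version delete can uncover them.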



Ryan Rawson
2011-02-01 00:40:31 UTC
You are correct. Since we do not prune extra versions except during
these major compactions, which happen about once a day, deleting a
recent version can expose an older version, and that is what you are seeing.

I might consider this a mis-feature. I would encourage you to
consider using the Delete.deleteColumns() call found here:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html#deleteColumns(byte[], byte[])

and NOT to use:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html#deleteColumn(byte[], byte[])

Note that the only difference between these two names is the plural 'Columns'.

I hope this helps!
-ryan
Mike Percy
2011-02-02 02:40:58 UTC
Hi David and Ryan,
That is very interesting! This makes things much clearer.

Thanks for your help!
Mike