Discussion:
Scanning table using partial row key match returns unexpected results
William Shen
2018-10-16 20:15:01 UTC
Hi there,

I am trying to scan using a partial match on the row key (derived from the
Phoenix primary key), but the HBase shell is returning results that do not
look like matches. Can someone help me understand why the following row
keys are considered a match and returned?

In addition, I am not sure how to interpret values like \xF3^ and \x14',
which are supposed to be hex values...

hbase(main):001:0> import org.apache.hadoop.hbase.filter.CompareFilter
=> Java::OrgApacheHadoopHbaseFilter::CompareFilter
hbase(main):002:0> import org.apache.hadoop.hbase.filter.SubstringComparator
=> Java::OrgApacheHadoopHbaseFilter::SubstringComparator
hbase(main):003:0> import org.apache.hadoop.hbase.filter.RowFilter
=> Java::OrgApacheHadoopHbaseFilter::RowFilter
hbase(main):004:0> scan 'TEST_SCHEMA.TEST_TABLE', {COLUMNS => 'TG:_0',
LIMIT => 5, FILTER =>
RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),SubstringComparator.new("\x80\x00\x00\x00\x00\x00\x14\x27\x00\x07\x80\x00\x00\x00\x00\xC7\xE5\x87"))}
ROW    COLUMN+CELL
\x00\x80\x00\x00\x00\x00\x00\x14'\x00\x07\x80\x00\x00\x00\x00\xBC\xEA\x81 column=TG:_0, timestamp=1481844289334, value=
\x00\x80\x00\x00\x00\x00\x00\x14'\x00\x07\x80\x00\x00\x00\x00\xBC\xEA\xE5 column=TG:_0, timestamp=1481844289334, value=
\x00\x80\x00\x00\x00\x00\x00\x14'\x00\x07\x80\x00\x00\x00\x00\xBC\xF3^ column=TG:_0, timestamp=1481844289334, value=
\x00\x80\x00\x00\x00\x00\x00\x14'\x00\x07\x80\x00\x00\x00\x00\xBC\xF8\xA5 column=TG:_0, timestamp=1481844289334, value=
\x00\x80\x00\x00\x00\x00\x00\x14'\x00\x07\x80\x00\x00\x00\x00\xBC\xF9b column=TG:_0, timestamp=1481844289334, value=
5 row(s) in 0.9430 seconds

Thanks in advance!

- Will
William Shen
2018-10-17 00:04:40 UTC
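Actually, they are matched correctly. After further investigation, it turns
out that the row key is printed differently because the shell uses a binary
string representation
(https://stackoverflow.com/questions/42353013/what-are-the-non-hex-characters-in-hbase-shell-rowkey):
printable ASCII bytes are shown as the characters themselves, and only the
remaining bytes are escaped as \xNN. So \x14' is the byte 0x14 followed by
' (0x27), and \xF3^ is 0xF3 followed by ^ (0x5E).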
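The binary string representation is easy to mimic in plain Ruby, which is what the HBase shell runs on. The sketch below approximates `Bytes.toStringBinary` (used when printing row keys) and `Bytes.toBytesBinary` (used when parsing them): bytes from 0x20 to 0x7E other than the backslash are printed as themselves, and every other byte as `\xNN`. The method names `to_string_binary` and `to_bytes_binary` are hypothetical stand-ins rather than HBase APIs, and the real Java methods may behave differently on edge cases such as malformed escapes.

```ruby
# Hypothetical plain-Ruby stand-ins for HBase's Bytes.toStringBinary and
# Bytes.toBytesBinary (an approximation, not the HBase implementation).

# Render raw bytes the way the HBase shell prints row keys: printable ASCII
# (0x20..0x7E) except the backslash is shown as-is, everything else as \xNN.
def to_string_binary(bytes)
  bytes.map do |b|
    if b.between?(0x20, 0x7E) && b != 0x5C
      b.chr
    else
      format('\x%02X', b) # single quotes keep the backslash literal
    end
  end.join
end

# Parse that notation back into raw bytes: each "\xNN" becomes one byte,
# any other (ASCII) character becomes its own character code.
def to_bytes_binary(str)
  bytes = []
  i = 0
  while i < str.length
    if str[i, 4].match?(/\A\\x\h{2}\z/)
      bytes << str[i + 2, 2].to_i(16)
      i += 4
    else
      bytes << str[i].ord
      i += 1
    end
  end
  bytes
end

# The puzzling fragments from the scan output are ordinary bytes:
p to_bytes_binary('\x14\'')          # => [20, 39]   (0x14 0x27)
p to_bytes_binary('\xF3^')           # => [243, 94]  (0xF3 0x5E)
p to_bytes_binary('\xF9b')           # => [249, 98]  (0xF9 0x62)
puts to_string_binary([0x14, 0x27])  # prints \x14'
```

Read this way, `\x14'` in the printed row keys is exactly the `\x14\x27` that appears in the filter.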
After converting them back to hex, you can see that they actually match:

hbase(main):015:0> Bytes.toHex(Bytes.toBytesBinary("\x80\x00\x00\x00\x00\x00\x14\x27\x00\x07\x80\x00\x00\x00\x00\xC7\xE5\x87"))
=> "fd000000000014270007fd00000000fdfd"

is indeed contained in:

hbase(main):016:0> Bytes.toHex(Bytes.toBytesBinary("\x00\x80\x00\x00\x00\x00\x00\x14'\x00\x07\x80\x00\x00\x00\x00\xBC\xEA\x81"))
=> "00fd000000000014270007fd00000000fdfd"

and

hbase(main):018:0> Bytes.toHex(Bytes.toBytesBinary("\x00\x80\x00\x00\x00\x00\x00\x14'\x00\x07\x80\x00\x00\x00\x00\xBC\xF3^"))
=> "00fd000000000014270007fd00000000fdfd5e"
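One caveat about the hex values above: `fd` never appears in the filter or in the row keys as printed, yet it stands in for every byte of 0x80 or above. A plausible explanation, which is an assumption and was not verified in this thread, is that because the arguments are double-quoted, Ruby has already turned `\x80` and friends into raw bytes before `Bytes.toBytesBinary` sees them; JRuby then decodes that string as UTF-8 when passing it to Java, each invalid sequence becomes U+FFFD, and `Bytes.toBytesBinary` narrows each char to a single byte, giving 0xFD. `SubstringComparator` also compares text rather than bytes (it decodes the row key as UTF-8 and checks whether the lower-cased result contains the lower-cased substring), so tails such as `\xC7\xE5\x87` and `\xBC\xEA\x81` may compare equal simply because both decode to replacement characters. The plain-Ruby sketch below simulates that pipeline with `String#scrub`; the helper names are hypothetical, and Ruby's replacement rules are assumed, not guaranteed, to match JRuby's and Java's. If the assumptions hold, it should print the same hex strings as the shell session above.

```ruby
# Hypothetical simulation in plain Ruby (not HBase code): compare bytes the
# way SubstringComparator appears to, i.e. on UTF-8-decoded, lower-cased text,
# and contrast that with a byte-exact containment check.

REPLACEMENT = "\uFFFD"

# Decode raw bytes as UTF-8, replacing each invalid sequence with U+FFFD
# (assumed to approximate what JRuby/Java do with the same bytes).
def decode_utf8(bytes)
  bytes.pack('C*').force_encoding(Encoding::UTF_8).scrub(REPLACEMENT)
end

# Mimic the lossy round trip: narrow each decoded char to its low byte
# (U+FFFD -> 0xFD) and print the result as lower-case hex.
def lossy_hex(bytes)
  decode_utf8(bytes).each_char.map { |c| format('%02x', c.ord & 0xFF) }.join
end

# SubstringComparator-style test: does the decoded row contain the decoded substring?
def substring_match?(row, sub)
  decode_utf8(row).downcase.include?(decode_utf8(sub).downcase)
end

# Byte-exact containment, for comparison.
def bytes_contain?(row, sub)
  row.pack('C*').include?(sub.pack('C*'))
end

# Filter bytes and the first row key returned by the scan.
filter = [0x80, 0, 0, 0, 0, 0, 0x14, 0x27, 0, 0x07, 0x80, 0, 0, 0, 0, 0xC7, 0xE5, 0x87]
row    = [0x00, 0x80, 0, 0, 0, 0, 0, 0x14, 0x27, 0, 0x07, 0x80, 0, 0, 0, 0, 0xBC, 0xEA, 0x81]

puts lossy_hex(filter)             # compare with the first hex string above
puts lossy_hex(row)                # compare with the second hex string above
puts substring_match?(row, filter) # decoded (text) comparison
puts bytes_contain?(row, filter)   # byte-exact comparison
```

Under these assumptions the decoded comparison reports a match while the byte-exact one does not, so a byte-level check would be needed to confirm that the returned rows really contain `\xC7\xE5\x87`. Passing the escapes to `Bytes.toBytesBinary` in single quotes, which Ruby leaves as literal `\x..` text, should also show the actual bytes instead of `fd`.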