How to Filter HBase Scan Based on Column Value in Java
How can we filter a scan of an HBase table based on some column value in Java?
Suppose we have an HBase table with the column greeting (a column qualifier).
We want to filter the scan results to only greetings that contain the string "hello".
1. Filter cell value using SingleColumnValueFilter
We can use a SingleColumnValueFilter to filter rows based on the value of a single column.
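import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

byte[] CF = Bytes.toBytes("column_family");
byte[] CQ = Bytes.toBytes("greeting");
// comparator is defined in step 2 below
SingleColumnValueFilter filter = new SingleColumnValueFilter(
  CF, CQ,
  CompareOp.EQUAL,
  comparator
);
// By default, rows that lack the column pass the filter; exclude them
filter.setFilterIfMissing(true);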
The SingleColumnValueFilter takes a column family and a column qualifier as its first two arguments.
For the third and fourth arguments, we’ll use the EQUAL compare operator along with a comparator such as SubstringComparator or RegexStringComparator, which is where we define our filter condition. In HBase 2.x, CompareOp is deprecated in favor of CompareOperator, which provides the same EQUAL constant.
2. Set filter conditions with a comparator
The SubstringComparator matches when the supplied substring appears anywhere in the cell value of the column. The comparison is case-insensitive.
SubstringComparator comparator = new SubstringComparator("hello");
The RegexStringComparator matches when the supplied regular expression matches the cell value of the column.
Regular expressions can express more complex conditions than a simple substring comparator, but the filter will generally be slower.
RegexStringComparator comparator = new RegexStringComparator(".*hello.*");
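The two comparators do not necessarily agree on the same value, mainly because of case handling. The sketch below models both matching styles in plain Java, using only the standard library. It does not call HBase: the class and helper names are hypothetical. It assumes the substring test ignores case and that the regular expression is case-sensitive unless flagged otherwise and only needs to be found somewhere in the value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Pattern;

public class ComparatorSemantics {
    // Hypothetical helper: models a case-insensitive substring test on a cell value.
    static boolean substringMatches(String needle, byte[] cellValue) {
        String value = new String(cellValue, StandardCharsets.UTF_8);
        return value.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }

    // Hypothetical helper: models a regex test that succeeds if the pattern
    // is found anywhere in the cell value (assumed "find" semantics).
    static boolean regexMatches(String regex, byte[] cellValue) {
        String value = new String(cellValue, StandardCharsets.UTF_8);
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        byte[] cell = "Hello, world".getBytes(StandardCharsets.UTF_8);
        System.out.println(substringMatches("hello", cell));  // true: case is ignored
        System.out.println(regexMatches(".*hello.*", cell));  // false: regex is case-sensitive
        System.out.println(regexMatches("(?i)hello", cell));  // true: (?i) ignores case
    }
}
```

If the regular-expression version should behave like the substring version, an inline flag such as (?i) is one way to make it ignore case.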
3. Apply filter to the scan
After defining the comparator and creating the filter, we can apply the filter to a scan.
Scan scan = new Scan();
scan.setFilter(filter);
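Putting the three steps together, a minimal end-to-end sketch might look like the following. It assumes an HBase 2.x client on the classpath (hence CompareOperator rather than CompareOp), a reachable cluster configured through hbase-site.xml, and a hypothetical table named "greetings". The class name is also illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class GreetingScan {
    public static void main(String[] args) throws IOException {
        byte[] cf = Bytes.toBytes("column_family");
        byte[] cq = Bytes.toBytes("greeting");

        // Keep rows whose greeting column contains "hello".
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
            cf, cq, CompareOperator.EQUAL, new SubstringComparator("hello"));
        // Without this, rows that have no greeting column are also returned.
        filter.setFilterIfMissing(true);

        Scan scan = new Scan();
        scan.setFilter(filter);

        Configuration conf = HBaseConfiguration.create();
        // "greetings" is a hypothetical table name; replace it with your own.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("greetings"));
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
                String rowKey = Bytes.toString(result.getRow());
                String greeting = Bytes.toString(result.getValue(cf, cq));
                System.out.println(rowKey + " -> " + greeting);
            }
        }
    }
}
```

The try-with-resources block closes the scanner, table, and connection in reverse order. If the scan is narrowed with addColumn or addFamily, the greeting column must remain among the selected columns, since the filter can only test cells that the scan actually reads.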