How to Filter HBase Scan Based on Column Value in Java


How can we filter a scan of an HBase table based on some column value in Java?

Suppose we have an HBase table with the column greeting (a column qualifier).

We want the scan to return only rows whose greeting contains the string "hello".

1. Filter cell value using SingleColumnValueFilter

We can use a SingleColumnValueFilter to filter rows based on the value of a single column.

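import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.filter.ByteArrayComparable;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

byte[] CF = Bytes.toBytes("column_family"); // column family
byte[] CQ = Bytes.toBytes("greeting");      // column qualifier
ByteArrayComparable comparator = new SubstringComparator("hello"); // see step 2
SingleColumnValueFilter filter = new SingleColumnValueFilter(
  CF, CQ,
  CompareOperator.EQUAL, // HBase 1.x: CompareFilter.CompareOp.EQUAL
  comparator
);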

The SingleColumnValueFilter takes the column family and column qualifier as its first two arguments.

For the third and fourth arguments, we pass the EQUAL compare operator and a built-in comparator such as SubstringComparator or RegexStringComparator, which defines our filter condition. These two comparators only support the EQUAL and NOT_EQUAL operators.
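To make the row-level behavior concrete, here is a plain-Java sketch that models the keep-or-drop decision SingleColumnValueFilter makes for each row. It is a simplified model, not HBase code: RowDecisionModel and keepRow are hypothetical names, the matches predicate stands in for the comparator reporting a match, filterIfMissing mirrors the real setFilterIfMissing(boolean) setter, and multiple cell versions are ignored. One consequence worth knowing is that, by default, rows lacking the column entirely are kept, so we would call filter.setFilterIfMissing(true) to return only rows that actually have a matching greeting.

```java
import java.util.Optional;
import java.util.function.Predicate;

// Simplified model of SingleColumnValueFilter's per-row decision.
// RowDecisionModel and keepRow are hypothetical names, not HBase APIs.
final class RowDecisionModel {
  enum Op { EQUAL, NOT_EQUAL }

  // cellValue: the row's value for the filtered column, empty if the row lacks it.
  // filterIfMissing: mirrors SingleColumnValueFilter.setFilterIfMissing (default false).
  // matches: stands in for the comparator reporting a match on the value.
  static boolean keepRow(Optional<String> cellValue, Op op,
                         boolean filterIfMissing, Predicate<String> matches) {
    if (!cellValue.isPresent()) {
      // A row without the column is kept unless filterIfMissing is true.
      return !filterIfMissing;
    }
    boolean matched = matches.test(cellValue.get());
    // EQUAL keeps matching rows; NOT_EQUAL keeps non-matching rows.
    return op == Op.EQUAL ? matched : !matched;
  }

  public static void main(String[] args) {
    Predicate<String> hasHello = v -> v.contains("hello");
    System.out.println(keepRow(Optional.of("hello world"), Op.EQUAL, false, hasHello)); // true
    System.out.println(keepRow(Optional.empty(), Op.EQUAL, false, hasHello));           // true
    System.out.println(keepRow(Optional.empty(), Op.EQUAL, true, hasHello));            // false
  }
}
```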

2. Set filter conditions with a comparator

The SubstringComparator matches when the supplied substring appears anywhere in the column's cell value. The comparison is case-insensitive.

import org.apache.hadoop.hbase.filter.SubstringComparator;

SubstringComparator comparator = new SubstringComparator("hello");
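SubstringComparator's Javadoc describes the comparison as case-insensitive. The plain-Java sketch below models that rule (it is not HBase code, and SubstringModel is a hypothetical name), which shows that values such as "Hello, world" or "Say HELLO" would also pass our filter.

```java
import java.util.Locale;

// Model of SubstringComparator's rule: a case-insensitive "contains" check.
// SubstringModel is a hypothetical name; this is not HBase code.
final class SubstringModel {
  static boolean matches(String cellValue, String substr) {
    // Both sides are lowercased with Locale.ROOT before the contains check.
    return cellValue.toLowerCase(Locale.ROOT)
        .contains(substr.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(matches("Hello, world", "hello")); // true
    System.out.println(matches("Say HELLO", "hello"));    // true
    System.out.println(matches("Goodbye", "hello"));      // false
  }
}
```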

The RegexStringComparator matches when the supplied regular expression is found in the column's cell value.

Regular expressions let us express more complex conditions than a simple substring match, but evaluating a regex against every cell costs more, so this filter will generally be slower than one using SubstringComparator.

import org.apache.hadoop.hbase.filter.RegexStringComparator;

RegexStringComparator comparator = new RegexStringComparator(".*hello.*");
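Unlike SubstringComparator, RegexStringComparator is case-sensitive by default. With its default Java regex engine, HBase compiles the expression with Pattern.DOTALL and tests it with Matcher.find(), so the pattern only needs to occur somewhere in the value; "hello" behaves the same as ".*hello.*". The sketch below models that rule in plain Java (it is not HBase code, and RegexModel is a hypothetical name). For case-insensitive matching, we could prefix the pattern with (?i) or use the RegexStringComparator(String expr, int flags) constructor with Pattern.CASE_INSENSITIVE.

```java
import java.util.regex.Pattern;

// Model of RegexStringComparator with its default Java regex engine:
// the expression is compiled with Pattern.DOTALL and tested with find(),
// so it can match anywhere in the value. RegexModel is a hypothetical name.
final class RegexModel {
  static boolean matches(String cellValue, String expr) {
    return Pattern.compile(expr, Pattern.DOTALL).matcher(cellValue).find();
  }

  public static void main(String[] args) {
    System.out.println(matches("say hello", ".*hello.*")); // true
    System.out.println(matches("say hello", "hello"));     // true: find() needs no .* padding
    System.out.println(matches("say hello", "^hello"));    // false: anchored at the start
    System.out.println(matches("Hello", "hello"));         // false: case-sensitive by default
    System.out.println(matches("Hello", "(?i)hello"));     // true: inline case-insensitive flag
  }
}
```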

3. Apply filter to the scan

After defining the comparator and creating the filter, we can apply the filter to a scan.

import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
scan.setFilter(filter); // only rows that pass the filter are returned
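For completeness, here is a minimal end-to-end sketch that builds the filter, applies it to a scan, and prints the matching rows. It assumes the HBase 2.x client library on the classpath and an hbase-site.xml that points at the cluster; the table name "greetings" and the class name GreetingScan are hypothetical. It also calls setFilterIfMissing(true) so that rows without a greeting column are not returned.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

// GreetingScan is a hypothetical class name used for illustration.
final class GreetingScan {
  static final byte[] CF = Bytes.toBytes("column_family");
  static final byte[] CQ = Bytes.toBytes("greeting");

  // Builds a Scan that keeps only rows whose greeting contains "hello".
  static Scan buildScan() {
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        CF, CQ, CompareOperator.EQUAL, new SubstringComparator("hello"));
    // Drop rows that have no greeting column at all (kept by default).
    filter.setFilterIfMissing(true);
    Scan scan = new Scan();
    scan.setFilter(filter);
    return scan;
  }

  public static void main(String[] args) throws IOException {
    // Reads hbase-site.xml from the classpath; "greetings" is a hypothetical table.
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("greetings"));
         ResultScanner scanner = table.getScanner(buildScan())) {
      for (Result result : scanner) {
        String row = Bytes.toString(result.getRow());
        String greeting = Bytes.toString(result.getValue(CF, CQ));
        System.out.println(row + " -> " + greeting);
      }
    }
  }
}
```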