How to Lowercase All Column Names in Java Spark Dataset


How can we lowercase all column names, or column headers, in a Java Spark Dataset?

Suppose we’re working with a Dataset<Row> ds.

1. Using toDF()

A simple way to rename every column at once is toDF(), which returns a new Dataset with the given column names. It expects exactly one name per existing column, applied in the same order.

We can first build an array of lowercased names from ds.columns(), then pass that array to toDF().

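import java.util.Arrays;
import java.util.Locale;
// Lowercase every column name, keeping the original column order
String[] lowerCased = Arrays.stream(ds.columns())
  .map(c -> c.toLowerCase(Locale.ROOT)) // Locale.ROOT avoids locale-specific casing rules
  .toArray(String[]::new);
// toDF() needs exactly one new name per existing column
ds = ds.toDF(lowerCased);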
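Because toDF() applies names by position, two columns that differ only in case (say, Name and NAME) would both become name, leaving the Dataset with duplicate column names. The sketch below is plain Java with no Spark dependency. The class LowercaseColumns and its fail-fast collision check are our own hypothetical additions, not part of Spark. Its output array is what we would hand to ds.toDF(...).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a Spark API): computes the names to pass to ds.toDF(...)
public class LowercaseColumns {

    // Returns the lowercased names in the original column order.
    // Throws if two original names collapse to the same lowercase name,
    // e.g. "Name" and "NAME", because the result would contain duplicate columns.
    static String[] lowerCaseAll(String[] columns) {
        Map<String, String> seen = new HashMap<>(); // lowercase name -> first original name
        String[] lowered = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            // Locale.ROOT avoids locale-specific rules such as the Turkish dotless i
            String lower = columns[i].toLowerCase(Locale.ROOT);
            String previous = seen.putIfAbsent(lower, columns[i]);
            if (previous != null) {
                throw new IllegalArgumentException(
                    "Columns '" + previous + "' and '" + columns[i]
                        + "' both lowercase to '" + lower + "'");
            }
            lowered[i] = lower;
        }
        return lowered;
    }

    public static void main(String[] args) {
        // Stand-in for ds.columns(); with Spark: ds = ds.toDF(lowerCaseAll(ds.columns()));
        String[] columns = {"Id", "FirstName", "ZIP_Code"};
        System.out.println(Arrays.toString(lowerCaseAll(columns))); // [id, firstname, zip_code]
    }
}
```

Throwing on a collision is just the conservative choice for this sketch; renaming or dropping one of the clashing columns would be equally valid, depending on the data.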

2. Using withColumnRenamed()

Another way to lowercase all column names is to loop over them and call withColumnRenamed(), which returns a new Dataset with a single column renamed.

columns() returns a String[] containing every column name. For each name, we call withColumnRenamed() to replace it with its lowercased version.

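import java.util.Locale;
// ds.columns() is evaluated once, so reassigning ds inside the loop is safe
for (String col : ds.columns()) {
  ds = ds.withColumnRenamed(col, col.toLowerCase(Locale.ROOT));
}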
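The loop issues one withColumnRenamed() call per column, including columns that are already lowercase. As a hypothetical refinement, the plain-Java sketch below builds an ordered map containing only the names that actually change. The class LowercaseRenamePlan is our own illustrative name, not a Spark API. Each map entry would then drive one withColumnRenamed(oldName, newName) call. It runs without Spark.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a Spark API): plans the renames the loop above performs
public class LowercaseRenamePlan {

    // Maps each column whose name changes to its lowercase form, in column order.
    // Columns that are already lowercase are skipped, so no no-op renames are planned.
    static Map<String, String> renamePlan(String[] columns) {
        Map<String, String> plan = new LinkedHashMap<>(); // preserves column order
        for (String col : columns) {
            String lower = col.toLowerCase(Locale.ROOT);
            if (!lower.equals(col)) {
                plan.put(col, lower);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Stand-in for ds.columns()
        String[] columns = {"id", "FirstName", "ZIP_Code"};
        // With Spark, each entry would become ds = ds.withColumnRenamed(oldName, newName);
        renamePlan(columns).forEach((oldName, newName) ->
            System.out.println(oldName + " -> " + newName));
    }
}
```

Recent Spark releases (3.4 onward, as far as we know) also offer a Java-specific Dataset.withColumnsRenamed(java.util.Map<String, String>) that applies a whole map in one call. Check the API docs for your Spark version before relying on it.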