How to Lowercase All Column Names in Java Spark Dataset
How can we lowercase all column names, or column headers, in a Java Spark Dataset?
Suppose we’re working with a Dataset<Row> ds.
1. Using toDF()
A simple way to rename columns is to use toDF(), which returns a new Dataset with the specified column names.
We can first build an array of the lowercased column names, then pass that array to toDF().
// Requires: import java.util.Arrays;
String[] lowerCased = Arrays.stream(ds.columns())
    .map(String::toLowerCase)
    .toArray(String[]::new);
// toDF() returns a new Dataset, so reassign the result
ds = ds.toDF(lowerCased);
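Two details of the lowercasing step are worth checking before handing the array to toDF(). String::toLowerCase uses the JVM's default locale, so on a Turkish-locale machine "ID" does not become "id"; and two columns that differ only in case would end up sharing one name, which tends to cause ambiguous references later. The sketch below works on a plain String[] standing in for ds.columns(); ColumnNames and its methods are hypothetical helpers, not part of the Spark API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the Spark API.
final class ColumnNames {

    // Lowercases every name with Locale.ROOT so the result does not
    // depend on the JVM's default locale.
    static String[] toLowerCase(String[] columns) {
        return Arrays.stream(columns)
                .map(name -> name.toLowerCase(Locale.ROOT))
                .toArray(String[]::new);
    }

    // Returns true if two or more names are equal after lowercasing.
    static boolean hasCollisions(String[] columns) {
        Set<String> seen = new HashSet<>();
        for (String name : toLowerCase(columns)) {
            if (!seen.add(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for ds.columns()
        String[] columns = {"ID", "First_Name", "AGE"};
        System.out.println(Arrays.toString(toLowerCase(columns)));
        System.out.println(hasCollisions(new String[] {"Id", "ID"}));
    }
}
```

With a real Dataset, the result of ColumnNames.toLowerCase(ds.columns()) could be passed to toDF() once hasCollisions(ds.columns()) has returned false.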
2. Using withColumnRenamed()
Another way to lowercase all column names is to use a for loop and withColumnRenamed(), which returns a new Dataset with the renamed column.
columns() returns a String[] containing all the column names. We can then call withColumnRenamed() once per column to replace each name with its lowercased form.
for (String col : ds.columns()) {
    ds = ds.withColumnRenamed(col, col.toLowerCase());
}
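Since every withColumnRenamed() call produces a new Dataset, it can be worth renaming only the columns whose names actually change. The sketch below computes such a rename plan from a plain String[] standing in for ds.columns(); RenamePlan and lowercaseRenames are hypothetical names introduced for illustration, not Spark APIs.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the Spark API.
final class RenamePlan {

    // Maps each column that is not already lowercase to its lowercase
    // form, preserving the original column order.
    static Map<String, String> lowercaseRenames(String[] columns) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (String name : columns) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (!lower.equals(name)) {
                renames.put(name, lower);
            }
        }
        return renames;
    }

    public static void main(String[] args) {
        // Stand-in for ds.columns()
        String[] columns = {"ID", "name", "Age"};
        // With a real Dataset, each entry would become one call:
        //   ds = ds.withColumnRenamed(from, to);
        lowercaseRenames(columns)
                .forEach((from, to) -> System.out.println(from + " -> " + to));
    }
}
```

Here "name" is skipped because it is already lowercase, so only two renames would be issued.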