How to Lowercase All Column Names in a Java Spark Dataset
How can we lowercase all column names, or column headers, in a Java Spark Dataset?
Suppose we’re working with a Dataset<Row> ds.
1. Using toDF()
A simple way to rename columns is to use toDF(), which returns a new Dataset with the specified column names.
We first build an array of the lowercased column names, then pass that array to toDF(). The names are applied by position, so the array must contain exactly one name per column.
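import java.util.Arrays;

// Lowercase every column name, keeping the original column order.
String[] lowerCased = Arrays
    .stream(ds.columns())
    .map(String::toLowerCase)
    .toArray(String[]::new);
// toDF() returns a new Dataset with the columns renamed by position.
ds = ds.toDF(lowerCased);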
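Two details are worth checking before handing names to toDF(): toLowerCase() without an argument depends on the JVM's default locale, and two columns that differ only by case (say, "ID" and "id") end up with the same name after lowercasing. The following is a minimal sketch of a helper that handles both, using only the JDK. The class name ColumnNames, the method lowerCase(), and the choice of Locale.ROOT are assumptions made for illustration. They are not part of Spark or of the snippet above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Spark: builds the lowercase names to pass to toDF().
final class ColumnNames {

    // Lowercases every name with Locale.ROOT so the result does not depend on
    // the JVM's default locale (e.g. "I" becomes a dotless "ı" under Turkish rules).
    static String[] lowerCase(String[] columns) {
        String[] lowered = Arrays.stream(columns)
                .map(name -> name.toLowerCase(Locale.ROOT))
                .toArray(String[]::new);

        // Fail early if two columns differ only by case, e.g. "ID" and "id".
        Set<String> seen = new HashSet<>();
        for (String name : lowered) {
            if (!seen.add(name)) {
                throw new IllegalArgumentException(
                        "Duplicate column after lowercasing: " + name);
            }
        }
        return lowered;
    }

    public static void main(String[] args) {
        String[] lowered = lowerCase(new String[] {"ID", "First_Name", "age"});
        System.out.println(Arrays.toString(lowered)); // prints [id, first_name, age]
    }
}
```

With a helper like this, the rename becomes ds = ds.toDF(ColumnNames.lowerCase(ds.columns())). Whether to throw on a collision or to disambiguate the names is a design choice that depends on the data.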
2. Using withColumnRenamed()
Another way to lowercase all column names is to use a for loop and withColumnRenamed(), which returns a new Dataset with a single column renamed.
columns() returns a String[] containing all the column names. We can loop over it and call withColumnRenamed() to replace each column name with its lowercased form.
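// ds.columns() is evaluated once, so reassigning ds inside the loop is safe.
for (String col : ds.columns()) {
    // Each call returns a new Dataset with one column renamed.
    ds = ds.withColumnRenamed(col, col.toLowerCase());
}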
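The loop above calls withColumnRenamed() for every column, including those whose names are already lowercase. As a rough sketch, the renames that are actually needed can be computed first with plain JDK code. The class name RenamePlan and the method lowerCaseRenames() are hypothetical names introduced for this example, and Locale.ROOT is an assumption chosen so that the result does not depend on the JVM's default locale.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Spark: maps each column that needs renaming
// to its lowercase form, preserving the original column order.
final class RenamePlan {

    static Map<String, String> lowerCaseRenames(String[] columns) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (String name : columns) {
            String lowered = name.toLowerCase(Locale.ROOT);
            // Names that are already lowercase need no withColumnRenamed() call.
            if (!lowered.equals(name)) {
                renames.put(name, lowered);
            }
        }
        return renames;
    }

    public static void main(String[] args) {
        Map<String, String> renames =
                lowerCaseRenames(new String[] {"ID", "First_Name", "age"});
        System.out.println(renames); // prints {ID=id, First_Name=first_name}
    }
}
```

Each entry of the returned map can then be applied with ds = ds.withColumnRenamed(entry.getKey(), entry.getValue()), so only the columns that change are touched.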