How to Convert Date String to Milliseconds in a Java Spark Dataset


How can we convert a date string column in a Spark Dataset to a millisecond timestamp in Java?

Suppose we have a ts column in our Dataset<Row>, which holds a date string.

{"ts":"2022-06-27 00:46:31.990000000"}

This date string follows the format: yyyy-MM-dd HH:mm:ss.SSSSSSSSS.
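
To follow along locally, we can load this single JSON line into a Dataset<Row>. Below is a minimal sketch; the SparkSession setup and the ds variable name are just for illustration:

import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("DateStringToMillis")
        .master("local[*]")
        .getOrCreate();

// Read the sample JSON line into a one-row Dataset<Row>; ts is read as a plain string
Dataset<Row> ds = spark.read().json(
        spark.createDataset(
                Collections.singletonList("{\"ts\":\"2022-06-27 00:46:31.990000000\"}"),
                Encoders.STRING()));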

We can use unix_timestamp() to get the Unix timestamp, i.e. the number of seconds since 1970-01-01 00:00:00 UTC, returned as a long.

We’ll create a new column using withColumn() and set its value to the millisecond timestamp of the date string.

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.unix_timestamp;

ds = ds.withColumn("tsMillis", unix_timestamp(col("ts")).multiply(1000));

Note that we multiply the column value by 1000 to convert it from seconds to milliseconds.
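
One caveat: unix_timestamp() truncates to whole seconds, so the .990 fractional part of our sample value is lost before the multiplication. If we need true millisecond precision and are on Spark 3.1 or later, a sketch along these lines should preserve it (assuming the timestamp cast accepts the nine-digit fraction):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.to_timestamp;
import static org.apache.spark.sql.functions.unix_millis;

// to_timestamp() parses the string, keeping the fractional seconds,
// and unix_millis() converts the resulting timestamp to epoch milliseconds
ds = ds.withColumn("tsMillis", unix_millis(to_timestamp(col("ts"))));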