How to Convert Date String to Milliseconds in a Java Spark Dataset
How can we convert a date string to a millisecond timestamp in a Spark Dataset in Java?
Suppose we have a ts column in our Dataset<Row> that holds a date string.
{"ts":"2022-06-27 00:46:31.990000000"}
This date string follows the format yyyy-MM-dd HH:mm:ss.SSSSSSSSS.
We can easily use unix_timestamp() to get the Unix timestamp, the number of seconds since 1970-01-01 00:00:00 UTC, returned as a long.
We’ll create a new column using withColumn() and set its value to the millisecond timestamp of the date string.
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.unix_timestamp;
// Parse the date string into seconds since the epoch, then convert to milliseconds
ds = ds.withColumn("tsMillis", unix_timestamp(col("ts")).multiply(1000));
Note that we multiply the column value by 1000 because unix_timestamp() returns seconds; this gives us the timestamp in milliseconds.
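For context, here is a minimal, self-contained sketch of the whole flow. It assumes a local SparkSession and builds a one-row Dataset<Row> from the sample JSON record above; the class name, app name, and session setup are illustrative additions, not part of the original snippet.
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.unix_timestamp;
import java.util.Collections;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
public class TsMillisExample {
    public static void main(String[] args) {
        // Local session purely for demonstration (assumed setup)
        SparkSession spark = SparkSession.builder()
                .appName("ts-to-millis")
                .master("local[*]")
                .getOrCreate();
        // Build a one-row Dataset<Row> from the sample JSON record above
        Dataset<Row> ds = spark.read().json(
                spark.createDataset(
                        Collections.singletonList("{\"ts\":\"2022-06-27 00:46:31.990000000\"}"),
                        Encoders.STRING()));
        // unix_timestamp() parses the string into whole seconds since the epoch;
        // multiplying by 1000 expresses that value in milliseconds
        ds = ds.withColumn("tsMillis", unix_timestamp(col("ts")).multiply(1000));
        ds.show(false);
        spark.stop();
    }
}
Keep in mind that unix_timestamp() works in whole seconds, so the fractional part of the date string (.990000000 here) is not reflected in the result; the millisecond value lands on a whole-second boundary. If the default yyyy-MM-dd HH:mm:ss pattern does not parse your strings, unix_timestamp() also accepts an explicit format string as a second argument.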