How to Get the Day of Week from a Timestamp Column in a PySpark DataFrame
How can we get the day of week from a timestamp column in a PySpark DataFrame?
Suppose we have a DataFrame df with a column datetime, which is of type timestamp.
We can easily get the day of week using date_format().
Get the day of week in short form
We can get the day of week in short form using date_format() with the pattern E.
from pyspark.sql.functions import date_format
df = df.withColumn("day", date_format('datetime', 'E'))
+----------+---+
| datetime|day|
+----------+---+
|2022-01-10|Mon|
+----------+---+
Get the day of week in long form
We can get the day of week in long form using date_format() with the pattern EEEE.
from pyspark.sql.functions import date_format
df = df.withColumn("day", date_format('datetime', 'EEEE'))
+----------+------+
| datetime| day|
+----------+------+
|2022-01-10|Monday|
+----------+------+
Get the first letter of the day of week
We can get the first letter of the day of week using date_format() with the pattern EEEEE.
Note that on Spark 3.0 and later, five or more text pattern letters may raise an error unless the legacy time parser policy is enabled (spark.sql.legacy.timeParserPolicy=LEGACY); a portable alternative is to take the first character of the EEEE output with substring().
from pyspark.sql.functions import date_format
df = df.withColumn("day", date_format('datetime', 'EEEEE'))
+----------+---+
| datetime|day|
+----------+---+
|2022-01-10| M|
+----------+---+
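As a quick sanity check on the expected strings, the same three widths can be mimicked outside Spark with Python's standard library (note that strftime output is locale-dependent; the comments assume an English locale):

```python
from datetime import date

d = date(2022, 1, 10)  # a Monday, matching the sample row above

short = d.strftime("%a")   # short form, like 'E'      -> 'Mon'
full = d.strftime("%A")    # long form, like 'EEEE'    -> 'Monday'
first_letter = full[:1]    # first letter, like 'EEEEE' -> 'M'

print(short, full, first_letter)
```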
Read more about the datetime patterns that PySpark follows: Spark 3.0 and later use Spark's own Datetime Patterns for Formatting and Parsing, while earlier versions follow Java's SimpleDateFormat.