How to Get the Day of Week from a Timestamp Column in a PySpark DataFrame

How can we get the day of week from a timestamp column in a PySpark DataFrame?

Suppose we have a DataFrame df with the column datetime, which is of type timestamp.

We can easily get the day of week using date_format().

Get the day of week in short form

We can get the day of week in short form using date_format() and E.

from pyspark.sql.functions import date_format
df = df.withColumn("day", date_format('datetime', 'E'))
df.show()
+----------+---+
|  datetime|day|
+----------+---+
|2022-01-10|Mon|
+----------+---+

Get the day of week in long form

We can get the day of week in long form using date_format() and EEEE.

from pyspark.sql.functions import date_format
df = df.withColumn("day", date_format('datetime', 'EEEE'))
df.show()
+----------+------+
|  datetime|   day|
+----------+------+
|2022-01-10|Monday|
+----------+------+

Get the first letter of the day of week

We can get the first letter of the day of week using date_format() and EEEEE. Note that on Spark 3.0 and later this may raise an error, since the newer datetime patterns reject text fields with five or more pattern letters; a version-safe alternative is to take the first character of the long form, e.g. date_format('datetime', 'EEEE').substr(1, 1).

from pyspark.sql.functions import date_format
df = df.withColumn("day", date_format('datetime', 'EEEEE'))
df.show()
+----------+---+
|  datetime|day|
+----------+---+
|2022-01-10|  M|
+----------+---+
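As a quick sanity check outside Spark, Python's own strftime makes the same short/long distinction with %a and %A (there is no single-letter narrow code, but slicing the full name gives the first letter). This stdlib example is an aside for comparison, not part of the PySpark API; under the default C locale it yields the English names.

```python
from datetime import datetime

d = datetime(2022, 1, 10)  # a Monday
short = d.strftime("%a")   # abbreviated name, like pattern 'E'
full = d.strftime("%A")    # full name, like pattern 'EEEE'
first = full[0]            # first letter, like the narrow form
print(short, full, first)  # Mon Monday M
```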

Read more about the Datetime Patterns for Formatting and Parsing that Spark 3.0+ follows; earlier versions (and the LEGACY time parser policy) follow Java's SimpleDateFormat.