I am adding a new column, week of year, computed from a UTC date-time column:
df.withColumn("week_of_year", new org.apache.spark.sql.Column(dt weekOfWeekyear)).select("week_of_year", "utc_time").take(10).foreach(println)
It works great:
[25,2015-06-21T15:55:10.602Z]
[25,2015-06-21T21:28:42.056Z]
[25,2015-06-21T21:02:26.701Z]
[25,2015-06-21T02:45:58.263Z]
I don't know whether this is an expensive operation, or whether there are better alternatives. Any comments?
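For what it's worth, Spark ships a built-in `weekofyear` function in `org.apache.spark.sql.functions` (available since Spark 1.5), which avoids constructing a `Column` from a Catalyst expression by hand. A minimal sketch, assuming a DataFrame with a `utc_time` string column holding ISO-8601 timestamps like the ones shown above (Spark 2.x style with `SparkSession`; on 1.5 you would go through `SQLContext` instead):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, weekofyear}

object WeekOfYearExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WeekOfYearExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows mirroring the utc_time values from the question
    val df = Seq(
      "2015-06-21T15:55:10.602Z",
      "2015-06-21T21:28:42.056Z"
    ).toDF("utc_time")

    // weekofyear expects a date/timestamp, so cast the ISO-8601 string first
    val withWeek = df.withColumn(
      "week_of_year",
      weekofyear(col("utc_time").cast("timestamp"))
    )

    withWeek.select("week_of_year", "utc_time").show(false)

    spark.stop()
  }
}
```

Since `weekofyear` is a native Catalyst expression, it should cost no more than any other per-row column projection; I would not expect the joda-time-based version to be dramatically slower either, but the built-in keeps the code simpler and benefits from Spark's code generation.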