These are few ways to write If-Else
/ When-Then-Else
/ When-Otherwise
expression in pyspark
.
Sample dataframe
df = spark.createDataFrame([(1,1),(2,2),(3,3)],['id','value'])
df.show()
#+---+-----+
#| id|value|
#+---+-----+
#| 1| 1|
#| 2| 2|
#| 3| 3|
#+---+-----+
#Desired Output:
#+---+-----+----------+
#| id|value|value_desc|
#+---+-----+----------+
#| 1| 1| one|
#| 2| 2| two|
#| 3| 3| other|
#+---+-----+----------+
Option#1: withColumn()
using when-otherwise
from pyspark.sql.functions import when
df.withColumn("value_desc",when(df.value == 1, 'one').when(df.value == 2, 'two').otherwise('other')).show()
Option#2: select()
using when-otherwise
from pyspark.sql.functions import when
df.select("*",when(df.value == 1, 'one').when(df.value == 2, 'two').otherwise('other').alias('value_desc')).show()
Option3: selectExpr()
using SQL equivalent CASE expression
df.selectExpr("*","CASE WHEN value == 1 THEN 'one' WHEN value == 2 THEN 'two' ELSE 'other' END AS value_desc").show()
SQL like expression can also be written in withColumn()
and select()
using pyspark.sql.functions.expr function. Here are examples.
Option4: select()
using expr function
from pyspark.sql.functions import expr
df.select("*",expr("CASE WHEN value == 1 THEN 'one' WHEN value == 2 THEN 'two' ELSE 'other' END AS value_desc")).show()
Option5: withColumn()
using expr function
from pyspark.sql.functions import expr
df.withColumn("value_desc",expr("CASE WHEN value == 1 THEN 'one' WHEN value == 2 THEN 'two' ELSE 'other' END AS value_desc")).show()
Output:
#+---+-----+----------+
#| id|value|value_desc|
#+---+-----+----------+
#| 1| 1| one|
#| 2| 2| two|
#| 3| 3| other|
#+---+-----+----------+