apache spark - Why this nested "when" does not work in pyspark?

Question

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm trying to divide people into age ranges with the following code:

from pyspark import SparkFiles
from pyspark.sql import functions as fn

## Import data

url_users = "https://raw.githubusercontent.com/leanhdung1994/BigData/main/users.csv"
spark.sparkContext.addFile(url_users)
users_from_file = spark.read.csv("file://" + SparkFiles.get("users.csv"), header = True, sep = ",", inferSchema = True)

## Generate column age

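from datetime import date
from pyspark.sql.types import IntegerType
reference_date = date(2017, 12, 31)
def cal_age(born):
    # Subtract one year if the birthday has not yet occurred by the reference date.
    return reference_date.year - born.year - ((reference_date.month, reference_date.day) < (born.month, born.day))
# Wrap the Python function as a UDF that returns an integer column.
cal_age_udf = fn.udf(cal_age, IntegerType())
users_from_file = users_from_file.withColumn('age', cal_age_udf(fn.to_date(fn.col('birth_date'))))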

## Generate column range

users_from_file1 = users_from_file.withColumn('range', fn.when(fn.col("age") <= 25, 1)fn.when(fn.col("age") <= 35, 2).fn.otherwise(3))

users_from_file1.show()

Then it returns an error

SyntaxError: invalid syntax
  File "<command-2296735704765764>", line 3
    users_from_file1 = users_from_file.withColumn('range', fn.when(fn.col("age") <= 25, 1)fn.when(fn.col("age") <= 35, 2).fn.otherwise(3))
                                                                                           ^
SyntaxError: invalid syntax

Could you please explain what is wrong with this nested when? I took the syntax from this answer, but it does not work.

question from:https://stackoverflow.com/questions/65898638/why-this-nested-when-does-not-work-in-pyspark

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:16:03+0000

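It should be

fn.when(fn.col("age") <= 25, 1).when(fn.col("age") <= 35, 2).otherwise(3)

There is no need to write fn again after the first when. fn.when returns a Column, and the following when and otherwise are methods called on that Column. Your line is also missing the dot between the first two when calls, which is what triggers the SyntaxError.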
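
The error in the question is raised by the Python parser, before Spark evaluates anything. The sketch below uses only the standard library to check whether each expression is syntactically valid. The helper name parses is made up for illustration, and ast.parse only parses the text, so fn does not need to be defined here.

```python
import ast

# The expression from the question and the chained form from this answer.
broken = 'fn.when(fn.col("age") <= 25, 1)fn.when(fn.col("age") <= 35, 2).fn.otherwise(3)'
fixed = 'fn.when(fn.col("age") <= 25, 1).when(fn.col("age") <= 35, 2).otherwise(3)'


def parses(source):
    """Return True if source is a syntactically valid Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


print("broken parses:", parses(broken))
print("fixed parses:", parses(fixed))
```

The first expression fails because two calls sit next to each other with no operator or dot between them, so Python cannot read them as one expression.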
