I have a Spark job that applies a custom transformation function to every record of an RDD. This function is long and somewhat "fragile": it can fail unexpectedly. I also have a fallback function to use when the primary one fails; it is much faster and more stable. For reasons beyond the scope of this question, I can't split the primary function into smaller pieces, and I can't catch the primary function's failures inside the function itself (with try/except, for example), because I also want to handle failures caused by the execution environment, such as OOM, which kill the task before any in-function handler can run.
Here's a simple example of what I'm trying to achieve:
def primary_logic(*args):
    ...

def fallback_logic(*args):
    ...

def spark_map_function(*args):
    current_task_attempt_num = ...  # how do I get this?
    if current_task_attempt_num == 0:
        return primary_logic(*args)
    else:
        return fallback_logic(*args)

result = rdd.map(spark_map_function)
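For context, the attempt-based dispatch itself can be sketched in plain Python, independent of Spark. The record-level logic below is an illustrative stand-in, not my real functions; in an actual task the attempt number would have to come from Spark (PySpark exposes `pyspark.TaskContext.get().attemptNumber()` since Spark 2.2, which is what I'm asking about):

```python
def primary_logic(record):
    # Stand-in for the long, fragile transformation.
    if record < 0:
        raise ValueError("primary logic failed")
    return record * 2

def fallback_logic(record):
    # Stand-in for the fast, stable fallback.
    return 0

def make_map_function(attempt_number):
    # attempt_number is a plain parameter here; in a real job it would
    # be read inside the task, e.g. via TaskContext.get().attemptNumber().
    def mapper(record):
        # First attempt runs the primary path; any retry falls back.
        if attempt_number == 0:
            return primary_logic(record)
        return fallback_logic(record)
    return mapper
```

The intent is that Spark's own task-retry mechanism (bounded by `spark.task.maxFailures`) triggers the fallback path on the second attempt.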