
apache spark - What happens internally when we restart an Azure Databricks cluster?

When we get many stage failures, we generally restart the cluster to avoid them. I want to know:

1) What exactly happens when we restart it?

2) Does it remove metadata/cache from the cluster?

3) Is there any other way to meet the above requirement without restarting the cluster?

question from: https://stackoverflow.com/questions/65938325/what-happens-internally-when-we-restart-azure-databricks-cluster


1 Answer


When you restart the cluster, the Spark application is initialized again, literally from scratch: all caches on the cluster are wiped.

You can see this in the cluster driver logs when you restart: Spark initializes, loads all libraries, and attaches the metastore and DBFS.

One thing a quick restart (a gap of no more than ~5 minutes) does not do is deprovision the underlying VM instances hosting the application. If you think a VM is in a bad state, terminate the cluster, wait about 5 minutes, and start it again. (This does not work for clusters backed by a pool, since pools keep VMs alive even after termination.)
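On point 3: if the goal is only to drop cached data rather than fully reinitialize the Spark application, the standard Spark catalog APIs can clear caches on a running cluster. A minimal sketch in PySpark, assuming `spark` is the SparkSession that Databricks provides in the notebook, and the table name is hypothetical:

```python
# Drop every cached/persisted DataFrame and table from executor memory/disk:
spark.catalog.clearCache()

# Equivalent, via SQL:
spark.sql("CLEAR CACHE")

# To un-persist a single DataFrame instead of everything:
# df.unpersist()

# Refresh cached metadata for one table (e.g., after files changed
# underneath it) rather than wiping all caches:
spark.catalog.refreshTable("my_db.my_table")  # hypothetical table name
```

Note this only clears Spark's own caches and cached metadata; it does not reset JVM state, reload libraries, or recover genuinely unhealthy executors, for which a restart is still the right tool.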

