Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Spark DataFrame clarification on select

I have created a DataFrame, ordersDF. Below is the schema.

root
 |-- order_id: long (nullable = true)
 |-- order_date: string (nullable = true)
 |-- order_customer_id: long (nullable = true)
 |-- order_status: string (nullable = true)
 

In some places we use 'order_id', in others order_id or ordersDF.order_id. It is really confusing when to use which one. For example:

1) ordersDF.select(order_id).show()  -- NameError: name 'order_id' is not defined
   ordersDF.where('order_id==9').show()  -- No error here

2) ordersDF.select('order_id').show()  -- No error here

3) ordersDF.select(ordersDF.order_id).show()  -- No error here

4) ordersDF.where('ordersDF.order_id==9').show()  -- AnalysisException: cannot resolve '`ordersDF.order_id`' given input columns: [order_customer_id, order_date, order_id, order_status]; line 1 pos 0;

Question from: https://stackoverflow.com/questions/65914467/spark-dataframe-clarification-on-select


1 Answer


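From what I understand, the confusion is about how to pass a column.
select() accepts either a column name as a string, such as 'order_id', or a Column object, such as ordersDF.order_id (no quotes). Either form works.

Case 1 fails because a bare order_id is looked up as a Python variable, and no variable with that name exists. Case 4 fails because a string passed to where() is parsed as a SQL expression. Inside that string only the DataFrame's column names are visible, so the Python variable name ordersDF means nothing to Spark. Write 'order_id==9' inside the string, or build the condition from a Column object outside the string: ordersDF.order_id == 9.

This should solve your problem.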

