Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Add new rows to pyspark Dataframe

I am very new to PySpark but familiar with pandas. I have a PySpark DataFrame:

# instantiate Spark
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# make some test data
columns = ['id', 'dogs', 'cats']
vals = [
     (1, 2, 0),
     (2, 0, 1)
]

# create DataFrame
df = spark.createDataFrame(vals, columns)

I want to add a new row (4, 5, 7) so that the output is:

df.show()
+---+----+----+
| id|dogs|cats|
+---+----+----+
|  1|   2|   0|
|  2|   0|   1|
|  4|   5|   7|
+---+----+----+


1 Answer


As thebluephantom has already said, union is the way to go. I'm just answering your question to give you a PySpark example:

# if not already created automatically, instantiate the SparkSession
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

columns = ['id', 'dogs', 'cats']
vals = [(1, 2, 0), (2, 0, 1)]

df = spark.createDataFrame(vals, columns)

# build a one-row DataFrame with the same columns, then union it with df
newRow = spark.createDataFrame([(4, 5, 7)], columns)
appended = df.union(newRow)
appended.show()

Please also have a look at the Databricks FAQ: https://kb.databricks.com/data/append-a-row-to-rdd-or-dataframe.html

