This approach uses less code, avoids serialization to an RDD, and is likely easier to understand:
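from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType
# reuse the active session (the pyspark shell already provides one named spark)
spark = SparkSession.builder.getOrCreate()
# notice the variable name (more below)
mylist = [1, 2, 3, 4]
# notice the parens after the type name: the schema is an instance, not the class
spark.createDataFrame(mylist, IntegerType()).show()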
NOTE: About naming your variable list: the name list is a Python builtin, and it is strongly recommended not to use builtin names as variable names, because the variable then shadows things like the list() function for the rest of that scope. When prototyping something quick and dirty, many people use a name like mylist instead.
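To make the naming note concrete, here is a minimal sketch in plain Python (no Spark needed) of what goes wrong when a variable is called list. The helper name shadowing_demo is hypothetical and exists only for this illustration; the shadowing is confined to a function so the builtin stays usable afterwards.

```python
def shadowing_demo():
    """Hypothetical helper: shadow the builtin inside a function and try to call it."""
    list = [1, 2, 3, 4]  # the name list now refers to this list object, not the builtin
    try:
        list("abc")  # attempts to call a list instance
    except TypeError as exc:
        return str(exc)
    return None


# Inside the function the builtin is unreachable, so the call fails.
error_message = shadowing_demo()

# At module level nothing was overwritten, so the builtin still works.
mylist = [1, 2, 3, 4]
chars = list("abc")

print(error_message)
print(chars)
```

In an interactive session the same assignment at the top level would keep the builtin shadowed until you run del list, which is why a name such as mylist is the safer habit.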